[PATCH iproute2-next] devlink: Add missing region option to devlink man page

2018-11-08 Thread Alex Vesker
The region field was not added to the devlink man page.

Fixes: 8b4fbf0bed8e6 ("devlink: Add support for devlink-region access")
Signed-off-by: Alex Vesker 
---
 man/man8/devlink.8 | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 360031f..8d527e7 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -74,6 +74,10 @@ Turn on verbose output.
 .B resource
 - devlink device resource configuration.
 
+.TP
+.B region
+- devlink address region access
+
 .SS
 .I COMMAND
 
-- 
1.8.3.1



Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-29 Thread Alex Vesker




On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed wrote:

On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck wrote:
On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed wrote:

Hi Dave,

This series provides devlink parameters updates to both devlink API and
mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
Devlink" to address review comments [1].

Changes from the original series:
- According to the discussion outcome, we are keeping the congestion control
  setting as mlx5 device specific for the current HW generation.
- Changed the congestion_mode and congestion action param type to string
- Added patches to fix devlink handling of param type string
- Added a patch which adds extack messages support for param set.
- At the end of this series, I've added yet another mlx5 devlink related
  feature, firmware snapshot support.

For more information please see tag log below.

Please pull and let me know if there's any problem.

[1] https://patchwork.ozlabs.org/patch/945996/

Thanks,
Saeed.

---

The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:

  net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01

for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:

  net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)



mlx5-updates-2018-08-01

This series provides devlink parameters updates to both devlink API and
mlx5 driver,

1) Devlink changes: (Moshe Shemesh)
The first two patches fix devlink param infrastructure for string type
params.
The third patch adds a devlink helper function to safely copy string from
driver to devlink.
The fourth patch adds extack support for param set.

2) mlx5 specific congestion parameters: (Eran Ben Elisha)
Next three patches add new devlink driver specific params for controlling
congestion action and mode, using string type params and extack messages
support.


This congestion mode enables hw workaround in specific devices which is
controlled by devlink driver-specific params. The workaround is device
specific for this NIC generation, the next NIC will not need it.

Congestion parameters:
 - Congestion action
    HW W/A mechanism in the PCIe buffer which monitors the amount of
    consumed PCIe buffer per host.  This mechanism supports the
    following actions in case of threshold overflow:
    - Disabled - NOP (Default)
    - Drop
    - Mark - Mark CE bit in the CQE of received packet
 - Congestion mode
    - Aggressive - Aggressive static trigger threshold (Default)
    - Dynamic - Dynamically change the trigger threshold

3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
Last three patches add the support for capturing region snapshot of the
firmware crspace during critical errors, using devlink region_snapshot
parameter.

-Saeed.

----
Alex Vesker (3):
  net/mlx5: Add Vendor Specific Capability access gateway
  net/mlx5: Add Crdump FW snapshot support
  net/mlx5: Use devlink region_snapshot parameter

Eran Ben Elisha (3):
  net/mlx5: Move all devlink related functions calls to devlink.c
  net/mlx5: Add MPEGC register configuration functionality
  net/mlx5: Enable PCIe buffer congestion handling workaround via devlink

Moshe Shemesh (4):
  devlink: Fix param set handling for string type
  devlink: Fix param cmode driverinit for string type
  devlink: Add helper function for safely copy string param
  devlink: Add extack messages support to param set

 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
 .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +-
 include/linux/mlx5/driver.h    |   5 +
 include/net/devlink.h  |  15 +-
 net/core/devlink.c |  44 ++-
 14 files changed, 1076 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 d

[PATCH iproute2 net-next] devlink: Add support for devlink-region access

2018-07-17 Thread Alex Vesker
Devlink region allows access to driver defined address regions.
Each device can create its supported address regions and register
them. A device which exposes a region will allow access to it
using devlink.

This support allows reading and dumping region snapshots as well
as presenting information such as region size and currently available
snapshots.

A snapshot represents a memory image of a region taken by the driver.
If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analysis of the snapshots.

The dump command is designed to read the full address space of a
region or of a snapshot, unlike the read command, which allows
reading only a specific section of a region/snapshot indicated by
an address and a length. Current support is for reading and dumping
a previously taken snapshot ID.

New commands added:
 devlink region show [ DEV/REGION ]
 devlink region delete DEV/REGION snapshot SNAPSHOT_ID
 devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
 devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
                     address ADDRESS length LENGTH
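
For illustration only: the DEV/REGION handle used by these commands has the
form bus_name/dev_name/region (per the error message in the patch below), and
the patch splits it from the right in two steps. A minimal standalone sketch of
that idea follows. The name split_region_handle() is hypothetical and this is
not the iproute2 code; it only assumes that the region is the text after the
last slash and the device name is the text after the slash before it.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of iproute2): split "bus/dev/region"
 * in place, working from the right-hand side. On success the three
 * output pointers refer to NUL-terminated pieces inside 'str'.
 * Returns 0 on success, -1 if fewer than two slashes are present.
 */
int split_region_handle(char *str, char **bus, char **dev, char **region)
{
	char *last, *prev;

	assert(str && bus && dev && region);

	/* The region name is everything after the last slash. */
	last = strrchr(str, '/');
	if (!last)
		return -1;
	*last = '\0';

	/* The device name is everything after the remaining last slash. */
	prev = strrchr(str, '/');
	if (!prev) {
		*last = '/';	/* restore the input before failing */
		return -1;
	}
	*prev = '\0';

	*bus = str;
	*dev = prev + 1;
	*region = last + 1;
	return 0;
}
```

Splitting from the right means any extra slash ends up in the bus name rather
than in the region name, which matches the two-step right split in the patch.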

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 devlink/devlink.c | 485 +-
 man/man8/devlink-region.8 | 131 +
 man/man8/devlink.8|   1 +
 3 files changed, 616 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/devlink-region.8

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 42fa716..784bb84 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -194,6 +195,10 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_PARAM_NAME  BIT(18)
 #define DL_OPT_PARAM_VALUE BIT(19)
 #define DL_OPT_PARAM_CMODE BIT(20)
+#define DL_OPT_HANDLE_REGION   BIT(21)
+#define DL_OPT_REGION_SNAPSHOT_ID  BIT(22)
+#define DL_OPT_REGION_ADDRESS  BIT(23)
+#define DL_OPT_REGION_LENGTH   BIT(24)
 
 struct dl_opts {
uint32_t present; /* flags of present items */
@@ -221,6 +226,10 @@ struct dl_opts {
const char *param_name;
const char *param_value;
enum devlink_param_cmode cmode;
+   char *region_name;
+   uint32_t region_snapshot_id;
+   uint64_t region_address;
+   uint64_t region_length;
 };
 
 struct dl {
@@ -364,6 +373,16 @@ static const enum mnl_attr_data_type 
devlink_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PARAM_VALUES_LIST] = MNL_TYPE_NESTED,
[DEVLINK_ATTR_PARAM_VALUE] = MNL_TYPE_NESTED,
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = MNL_TYPE_U8,
+   [DEVLINK_ATTR_REGION_NAME] = MNL_TYPE_STRING,
+   [DEVLINK_ATTR_REGION_SIZE] = MNL_TYPE_U64,
+   [DEVLINK_ATTR_REGION_SNAPSHOTS] = MNL_TYPE_NESTED,
+   [DEVLINK_ATTR_REGION_SNAPSHOT] = MNL_TYPE_NESTED,
+   [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = MNL_TYPE_U32,
+   [DEVLINK_ATTR_REGION_CHUNKS] = MNL_TYPE_NESTED,
+   [DEVLINK_ATTR_REGION_CHUNK] = MNL_TYPE_NESTED,
+   [DEVLINK_ATTR_REGION_CHUNK_DATA] = MNL_TYPE_BINARY,
+   [DEVLINK_ATTR_REGION_CHUNK_ADDR] = MNL_TYPE_U64,
+   [DEVLINK_ATTR_REGION_CHUNK_LEN] = MNL_TYPE_U64,
 };
 
 static int attr_cb(const struct nlattr *attr, void *data)
@@ -502,6 +521,20 @@ static int strslashrsplit(char *str, char **before, char 
**after)
return 0;
 }
 
+static int strtouint64_t(const char *str, uint64_t *p_val)
+{
+   char *endptr;
+   unsigned long long int val;
+
+   val = strtoull(str, &endptr, 10);
+   if (endptr == str || *endptr != '\0')
+   return -EINVAL;
+   if (val > ULONG_MAX)
+   return -ERANGE;
+   *p_val = val;
+   return 0;
+}
+
 static int strtouint32_t(const char *str, uint32_t *p_val)
 {
char *endptr;
@@ -687,6 +720,64 @@ static int dl_argv_handle_both(struct dl *dl, char 
**p_bus_name,
return 0;
 }
 
+static int __dl_argv_handle_region(char *str, char **p_bus_name,
+  char **p_dev_name, char **p_region)
+{
+   char *handlestr;
+   int err;
+
+   err = strslashrsplit(str, &handlestr, p_region);
+   if (err) {
+   pr_err("Region identification \"%s\" is invalid\n", str);
+   return err;
+   }
+   err = strslashrsplit(handlestr, p_bus_name, p_dev_name);
+   if (err) {
+   pr_err("Region identification \"%s\" is invalid\n", str);
+   return err;
+   }
+   return 0;
+}
+
+static int dl_argv_handle_region(struct dl *dl, char **p_bus_name,
+   char **p_dev_name, char **p_region)
+{
+   char *str = dl_argv_next(dl);
+   unsigned int slash_count;
+
+   if (!str) {
+   pr_err("Expected \"bus_name/dev_name/region\&

Re: [PATCH net-next v3 02/11] devlink: Add callback to query for snapshot id before snapshot create

2018-07-15 Thread Alex Vesker




On 7/13/2018 3:51 AM, Jakub Kicinski wrote:

On Thu, 12 Jul 2018 15:13:09 +0300, Alex Vesker wrote:

To restrict the driver with the snapshot ID selection a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This will also allow giving the same ID for multiple
snapshots taken of different regions at the same time.

I'm not in position to criticize other people's commit messages :), but
I find this one hard to parse.  I think what you meant to say is that
you add a helper for numbering the snapshot per-devlink instance.
There is no callback to be seen here.  You *prevent* from giving the
same ID to multiple snapshot even if they are from different regions.

Let me try to clarify.
The idea is to have a simple helper function that assigns IDs, to provide a
more complete API. An example use case is when you want to add a new snapshot
to multiple regions from the same trigger: the helper should be called once
to get an ID, and this ID should be used on all new snapshots.
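
To make the use case concrete, here is a rough userspace sketch, not the
kernel code. All demo_* names are hypothetical, and a C11 atomic counter stands
in for the devlink->lock mutex used in the patch. One ID is fetched per trigger
and then applied to every region touched by that trigger.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-devlink-instance counter. */
struct demo_devlink {
	_Atomic uint32_t snapshot_id;
};

/* Hypothetical stand-in for a region; only tracks the newest ID. */
struct demo_region {
	const char *name;
	uint32_t last_snapshot_id;
};

void demo_devlink_init(struct demo_devlink *dl)
{
	assert(dl);
	atomic_init(&dl->snapshot_id, 0);
}

/* Pre-increment semantics as in the patch: the first ID handed out is 1. */
uint32_t demo_snapshot_id_get(struct demo_devlink *dl)
{
	assert(dl);
	return atomic_fetch_add(&dl->snapshot_id, 1) + 1;
}

/* One trigger: fetch the ID once, then reuse it for every region. */
void demo_trigger(struct demo_devlink *dl, struct demo_region *regions,
		  size_t n)
{
	uint32_t id = demo_snapshot_id_get(dl);
	size_t i;

	for (i = 0; i < n; i++)
		regions[i].last_snapshot_id = id;
}
```

Snapshots that share an ID can then be correlated across regions, while two
separate triggers never hand out the same ID until the counter wraps.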


diff --git a/net/core/devlink.c b/net/core/devlink.c
index cac8561..6c92ddd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4193,6 +4193,27 @@ void devlink_region_destroy(struct devlink_region 
*region)
  }
  EXPORT_SYMBOL_GPL(devlink_region_destroy);
  
+/**

+ * devlink_region_shapshot_id_get - get snapshot ID
+ *
+ * This callback should be called when adding a new snapshot,
+ * Driver should use the same id for multiple snapshots taken
+ * on multiple regions at the same time/by the same trigger.
+ *
+ * @devlink: devlink
+ */
+u32 devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   u32 id;
+
+   mutex_lock(&devlink->lock);
+   id = ++devlink->snapshot_id;

Any reason not to use an IDA?  The reuse may seem unlikely, OTOH IDA
isn't going to cost much, so why risk it...

As you mentioned, more than U32_MAX snapshots doesn't sound likely.
New snapshots will be created and old snapshots should be deleted by the
user, so a wraparound sounds unlikely. Let me think about it some more; I
might send a patch that changes this to an IDA.


+   mutex_unlock(&devlink->lock);
+
+   return id;
+}
+EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);

Sorry for only spotting this now.




[PATCH net-next v3 06/11] devlink: Add support for region snapshot delete command

2018-07-12 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_DEL used
for deleting a snapshot from a region. The snapshot ID is required.
Also added notification support for NEW and DEL of snapshots.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c   | 93 
 2 files changed, 95 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index abde4e3..d212e02 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -85,6 +85,8 @@ enum devlink_command {
 
DEVLINK_CMD_REGION_GET,
DEVLINK_CMD_REGION_SET,
+   DEVLINK_CMD_REGION_NEW,
+   DEVLINK_CMD_REGION_DEL,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cb75e26..fc08363 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3236,6 +3236,58 @@ static int devlink_nl_region_fill(struct sk_buff *msg, 
struct devlink *devlink,
return err;
 }
 
+static void devlink_nl_region_notify(struct devlink_region *region,
+struct devlink_snapshot *snapshot,
+enum devlink_command cmd)
+{
+   struct devlink *devlink = region->devlink;
+   struct sk_buff *msg;
+   void *hdr;
+   int err;
+
+   WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL);
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return;
+
+   hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd);
+   if (!hdr)
+   goto out_free_msg;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto out_cancel_msg;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
+region->name);
+   if (err)
+   goto out_cancel_msg;
+
+   if (snapshot) {
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID,
+ snapshot->id);
+   if (err)
+   goto out_cancel_msg;
+   } else {
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size, DEVLINK_ATTR_PAD);
+   if (err)
+   goto out_cancel_msg;
+   }
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+   msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+
+   return;
+
+out_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+out_free_msg:
+   nlmsg_free(msg);
+}
+
 static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
  struct genl_info *info)
 {
@@ -3307,6 +3359,35 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
return msg->len;
 }
 
+static int devlink_nl_cmd_region_del(struct sk_buff *skb,
+struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_snapshot *snapshot;
+   struct devlink_region *region;
+   const char *region_name;
+   u32 snapshot_id;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME] ||
+   !info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
+   devlink_region_snapshot_del(snapshot);
+   return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -3331,6 +3412,7 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
[DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 },
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
+   [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -3537,6 +3619,13 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+   {
+   .cmd = DEVLINK_CMD_REGION_DEL,
+   .doit = devlink_nl_cmd_region_del,
+   .policy = devlink_nl_policy,
+   .flags = GENL_ADMIN

[PATCH net-next v3 09/11] net/mlx4_core: Add Crdump FW snapshot support

2018-07-12 Thread Alex Vesker
Crdump allows the driver to create a snapshot of the FW PCI
crspace and health buffer during a critical FW issue.
In case of a FW command timeout, FW getting stuck or a non-zero
value on the catastrophic buffer, a snapshot will be taken.

The snapshot is exposed using devlink. The cr-space and fw-health
address regions are registered on init, and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 231 
 drivers/net/ethernet/mellanox/mlx4/main.c   |  10 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   6 +
 6 files changed, 255 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx4/Makefile 
b/drivers/net/ethernet/mellanox/mlx4/Makefile
index 16b10d0..3f40077 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx4/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o
 
 mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o fw_qos.o icm.o intf.o \
main.o mcg.o mr.o pd.o port.o profile.o qp.o reset.o sense.o \
-   srq.o resource_tracker.o
+   srq.o resource_tracker.o crdump.o
 
 obj-$(CONFIG_MLX4_EN)   += mlx4_en.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c 
b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 8afe4b5..c81d15b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -178,10 +178,12 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent 
*persist)
 
dev = persist->dev;
mlx4_err(dev, "device is going to be reset\n");
-   if (mlx4_is_slave(dev))
+   if (mlx4_is_slave(dev)) {
err = mlx4_reset_slave(dev);
-   else
+   } else {
+   mlx4_crdump_collect(dev);
err = mlx4_reset_master(dev);
+   }
 
if (!err) {
mlx4_err(dev, "device was reset successfully\n");
diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c 
b/drivers/net/ethernet/mellanox/mlx4/crdump.c
new file mode 100644
index 0000000..4d5524d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx4.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define HEALTH_BUFFER_SIZE 0x40
+#define CR_ENABLE_BIT  swab32(BIT(6))
+#define CR_ENABLE_BIT_OFFSET   0xF3F04
+#define MAX_NUM_OF_DUMPS_TO_STORE  (8)
+
+static const char *region_cr_space_str = "cr-space";
+static const char *region_fw_health_str = "fw-health";
+
+/* Set to true in case cr enable bit was set to true before crdump */
+static bool crdump_enbale_bit_set;
+
+static void crdump_enable_crspace_access(struct mlx4_dev *dev,
+u8 __iomem *cr_space)
+{
+   /* Get current enable bit value */
+   crdump_enbale_bit_set =
+   readl(cr_space + CR_ENABLE_BIT_OFFSET) & CR_ENABLE_BIT;
+
+   /* Enable FW CR filter (set bit6 to 0) */
+   if (crdump_enbale_bit_set)
+   writel(readl(cr_space + CR_ENABLE_BIT_OFFSET) &

[PATCH net-next v3 11/11] net/mlx4_core: Use devlink region_snapshot parameter

2018-07-12 Thread Alex Vesker
This parameter enables capturing a region snapshot of the crspace
during critical errors. The parameter is disabled by default and
can be enabled using devlink param commands.
It can be configured both at runtime and at driver init.
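
As a rough illustration of the behaviour described above (a standalone sketch,
not the mlx4 code; every demo_* name is hypothetical), the parameter acts as a
boolean gate that defaults to off and is checked before a snapshot is collected:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's crdump state. */
struct demo_crdump {
	bool snapshot_enable;		/* mirrors the devlink parameter */
	unsigned int snapshots_taken;	/* counts collected snapshots */
};

/* Init: the parameter defaults to disabled. */
void demo_crdump_init(struct demo_crdump *c)
{
	assert(c);
	c->snapshot_enable = false;
	c->snapshots_taken = 0;
}

/* Get/set pair, as a devlink param get/set callback pair would provide. */
bool demo_param_get(const struct demo_crdump *c)
{
	assert(c);
	return c->snapshot_enable;
}

void demo_param_set(struct demo_crdump *c, bool enable)
{
	assert(c);
	c->snapshot_enable = enable;
}

/* Collect path: a disabled parameter means skip, not an error. */
int demo_crdump_collect(struct demo_crdump *c)
{
	assert(c);
	if (!c->snapshot_enable)
		return 0;
	c->snapshots_taken++;
	return 0;
}
```

Returning 0 when the gate is off mirrors the patch, where a disabled parameter
is logged and skipped rather than reported as a failure.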

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c |  8 ++
 drivers/net/ethernet/mellanox/mlx4/main.c   | 41 +
 include/linux/mlx4/device.h |  1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c 
b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index 4d5524d..88316c7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -158,6 +158,7 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev 
*dev,
 int mlx4_crdump_collect(struct mlx4_dev *dev)
 {
struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
unsigned long cr_res_size;
u8 __iomem *cr_space;
@@ -168,6 +169,11 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
return 0;
}
 
+   if (!crdump->snapshot_enable) {
+   mlx4_info(dev, "crdump: devlink snapshot disabled, skipping\n");
+   return 0;
+   }
+
cr_res_size = pci_resource_len(pdev, 0);
 
cr_space = ioremap(pci_resource_start(pdev, 0), cr_res_size);
@@ -197,6 +203,8 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
 
+   crdump->snapshot_enable = false;
+
/* Create cr-space region */
crdump->region_crspace =
devlink_region_create(devlink,
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 46b0214..2d979a6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -191,6 +191,26 @@ static int mlx4_devlink_ierr_reset_set(struct devlink 
*devlink, u32 id,
return 0;
 }
 
+static int mlx4_devlink_crdump_snapshot_get(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   ctx->val.vbool = dev->persist->crdump.snapshot_enable;
+   return 0;
+}
+
+static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   dev->persist->crdump.snapshot_enable = ctx->val.vbool;
+   return 0;
+}
+
 static int
 mlx4_devlink_max_macs_validate(struct devlink *devlink, u32 id,
   union devlink_param_value val,
@@ -224,6 +244,11 @@ enum mlx4_devlink_param_id {
DEVLINK_PARAM_GENERIC(MAX_MACS,
  BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
  NULL, NULL, mlx4_devlink_max_macs_validate),
+   DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+ BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ mlx4_devlink_crdump_snapshot_get,
+ mlx4_devlink_crdump_snapshot_set, NULL),
DEVLINK_PARAM_DRIVER(MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE,
 "enable_64b_cqe_eqe", DEVLINK_PARAM_TYPE_BOOL,
 BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
@@ -270,6 +295,11 @@ static void mlx4_devlink_set_params_init_values(struct 
devlink *devlink)
mlx4_devlink_set_init_value(devlink,
MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR,
value);
+
+   value.vbool = false;
+   mlx4_devlink_set_init_value(devlink,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   value);
 }
 
 static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev,
@@ -3862,6 +3892,9 @@ static int mlx4_devlink_port_type_set(struct devlink_port 
*devlink_port,
 
 static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 {
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
union devlink_param_value saved_value;
int err;
 
@@ -3889,6 +3922,14 @@ static void 
mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 _va

[PATCH net-next v3 07/11] devlink: Add support for region snapshot read command

2018-07-12 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_READ_GET used for both reading
and dumping region data. Read allows reading from a specific region
address for a given length. Dump allows reading the full region.
If only a snapshot ID is provided, a snapshot dump will be done.
If snapshot ID, address and length are provided, a snapshot read
will be done.

This is currently used for snapshot access and will be used in the same
way to access current data on the region.
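
The read/dump distinction can be sketched in plain userspace C as below. This
is an illustration under assumptions, not the kernel implementation: the
demo_* names are hypothetical, the 256-byte chunk size is taken from the patch,
and the netlink message handling and dump resumption across messages are left
out. A dump covers the whole snapshot; a read covers [start, end) clamped to
the snapshot length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK_SIZE 256	/* same value as the patch's chunk size */

/* Hypothetical per-chunk consumer; returns non-zero to stop early. */
typedef int (*demo_chunk_cb)(const uint8_t *chunk, uint32_t len,
			     uint64_t addr, void *ctx);

/* Walk [start, end) of a snapshot in chunks of at most DEMO_CHUNK_SIZE. */
int demo_read_chunks(const uint8_t *data, uint64_t data_len,
		     uint64_t start, uint64_t end, bool dump,
		     demo_chunk_cb cb, void *ctx, uint64_t *new_offset)
{
	uint64_t cur = start;
	int err = 0;

	assert(data && cb && new_offset);

	/* A dump, or a read past the end, stops at the snapshot length. */
	if (dump || end > data_len)
		end = data_len;

	while (cur < end) {
		uint32_t len = DEMO_CHUNK_SIZE;

		if (end - cur < DEMO_CHUNK_SIZE)
			len = (uint32_t)(end - cur);

		err = cb(data + cur, len, cur, ctx);
		if (err)
			break;
		cur += len;
	}
	/* Report how far we got so a caller could resume from here. */
	*new_offset = cur;
	return err;
}

/* Example consumer: counts chunks and bytes. */
struct demo_stats {
	unsigned int chunks;
	uint64_t bytes;
};

int demo_count_cb(const uint8_t *chunk, uint32_t len, uint64_t addr,
		  void *ctx)
{
	struct demo_stats *s = ctx;

	(void)chunk;
	(void)addr;
	s->chunks++;
	s->bytes += len;
	return 0;
}
```

For a 600-byte snapshot a dump yields three chunks (256, 256 and 88 bytes),
while a read of 32 bytes at address 16 yields a single 32-byte chunk.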

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   7 ++
 net/core/devlink.c   | 182 +++
 2 files changed, 189 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d212e02..79407bb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -87,6 +87,7 @@ enum devlink_command {
DEVLINK_CMD_REGION_SET,
DEVLINK_CMD_REGION_NEW,
DEVLINK_CMD_REGION_DEL,
+   DEVLINK_CMD_REGION_READ,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
@@ -273,6 +274,12 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
+   DEVLINK_ATTR_REGION_CHUNKS, /* nested */
+   DEVLINK_ATTR_REGION_CHUNK,  /* nested */
+   DEVLINK_ATTR_REGION_CHUNK_DATA, /* binary */
+   DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
+   DEVLINK_ATTR_REGION_CHUNK_LEN,  /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fc08363..e5118db 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3388,6 +3388,181 @@ static int devlink_nl_cmd_region_del(struct sk_buff 
*skb,
return 0;
 }
 
+static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
+struct devlink *devlink,
+u8 *chunk, u32 chunk_size,
+u64 addr)
+{
+   struct nlattr *chunk_attr;
+   int err;
+
+   chunk_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_CHUNK);
+   if (!chunk_attr)
+   return -EINVAL;
+
+   err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, chunk_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, chunk_attr);
+   return err;
+}
+
+#define DEVLINK_REGION_READ_CHUNK_SIZE 256
+
+static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
+   struct devlink *devlink,
+   struct devlink_region *region,
+   struct nlattr **attrs,
+   u64 start_offset,
+   u64 end_offset,
+   bool dump,
+   u64 *new_offset)
+{
+   struct devlink_snapshot *snapshot;
+   u64 curr_offset = start_offset;
+   u32 snapshot_id;
+   int err = 0;
+
+   *new_offset = start_offset;
+
+   snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   if (end_offset > snapshot->data_len || dump)
+   end_offset = snapshot->data_len;
+
+   while (curr_offset < end_offset) {
+   u32 data_size;
+   u8 *data;
+
+   if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
+   data_size = end_offset - curr_offset;
+   else
+   data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+
+   data = &snapshot->data[curr_offset];
+   err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
+   data, data_size,
+   curr_offset);
+   if (err)
+   break;
+
+   curr_offset += data_size;
+   }
+   *new_offset = curr_offset;
+
+   return err;
+}
+
+static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
+struct netlink_callback *cb)
+{
+   u64 ret_offset, start_offset, end_offset = 0;
+   struct nlattr *attrs[DEVLINK_ATTR_MAX + 1];
+   const struct genl_ops *ops = cb->data;
+   struct d

[PATCH net-next v3 02/11] devlink: Add callback to query for snapshot id before snapshot create

2018-07-12 Thread Alex Vesker
To restrict the driver with the snapshot ID selection a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This will also allow giving the same ID for multiple
snapshots taken of different regions at the same time.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h |  8 
 net/core/devlink.c| 21 +
 2 files changed, 29 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index e539765..f27d859 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -29,6 +29,7 @@ struct devlink {
struct list_head resource_list;
struct list_head param_list;
struct list_head region_list;
+   u32 snapshot_id;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -551,6 +552,7 @@ struct devlink_region *devlink_region_create(struct devlink 
*devlink,
 u32 region_max_snapshots,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
+u32 devlink_region_shapshot_id_get(struct devlink *devlink);
 
 #else
 
@@ -792,6 +794,12 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline u32
+devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cac8561..6c92ddd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4193,6 +4193,27 @@ void devlink_region_destroy(struct devlink_region 
*region)
 }
 EXPORT_SYMBOL_GPL(devlink_region_destroy);
 
+/**
+ * devlink_region_shapshot_id_get - get snapshot ID
+ *
+ * This callback should be called when adding a new snapshot,
+ * Driver should use the same id for multiple snapshots taken
+ * on multiple regions at the same time/by the same trigger.
+ *
+ * @devlink: devlink
+ */
+u32 devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   u32 id;
+
+   mutex_lock(&devlink->lock);
+   id = ++devlink->snapshot_id;
+   mutex_unlock(&devlink->lock);
+
+   return id;
+}
+EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
+
 static int __init devlink_module_init(void)
 {
return genl_register_family(&devlink_nl_family);
-- 
1.8.3.1



[PATCH net-next v3 10/11] devlink: Add generic parameters region_snapshot

2018-07-12 Thread Alex Vesker
region_snapshot - When set enables capturing region snapshots

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 include/net/devlink.h | 4 
 net/core/devlink.c| 5 +
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 905f0bb..b9b89d6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -361,6 +361,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -376,6 +377,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME "enable_sriov"
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME "region_snapshot_enable"
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e5118db..65fc366 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2671,6 +2671,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME,
.type = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   .name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME,
+   .type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



[PATCH net-next v3 08/11] net/mlx4_core: Add health buffer address capability

2018-07-12 Thread Alex Vesker
The health buffer address is a 32-bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data
located in the shared CR space. The CR space is accessible to both
the driver and the FW and allows for different queries and configurations.
The health buffer always holds 64B of readable data, followed by a
lock which is used to block volatile CR space access.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 5 -
 drivers/net/ethernet/mellanox/mlx4/fw.h   | 1 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 include/linux/mlx4/device.h   | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 46dcbfb..babcfd9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -825,7 +825,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
-
+#define QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET 0xe4
 
dev_cap->flags2 = 0;
mailbox = mlx4_alloc_cmd_mailbox(dev);
@@ -1082,6 +1082,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev_cap->rl_caps.min_unit = size >> 14;
}
 
+   MLX4_GET(dev_cap->health_buffer_addrs, outbox,
+QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET);
+
MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
if (field32 & (1 << 16))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index cd6399c..650ae08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -128,6 +128,7 @@ struct mlx4_dev_cap {
u32 dmfs_high_rate_qpn_base;
u32 dmfs_high_rate_qpn_range;
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c42eddf..806d441 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -523,6 +523,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev->caps.max_rss_tbl_sz = dev_cap->max_rss_tbl_sz;
dev->caps.wol_port[1]  = dev_cap->wol_port[1];
dev->caps.wol_port[2]  = dev_cap->wol_port[2];
+   dev->caps.health_buffer_addrs  = dev_cap->health_buffer_addrs;
 
/* Save uar page shift */
if (!mlx4_is_slave(dev)) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 122e7e9..e3bfe76 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
u32 vf_caps;
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
 };
 
 struct mlx4_buf_list {
-- 
1.8.3.1
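
To illustrate what reading the new capability amounts to, here is a small userspace sketch. It assumes the QUERY_DEV_CAP output mailbox stores the field big-endian, as mlx4 firmware command fields usually are. The helper name get_be32() and the local macro name are hypothetical, and only the 0xe4 offset comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the health buffer address in the output mailbox (from the patch). */
#define HEALTH_BUFFER_ADDRESS_OFFSET 0xe4

/* Hypothetical helper: read a big-endian 32-bit field at a byte offset,
 * independent of host endianness.
 */
static uint32_t get_be32(const uint8_t *buf, size_t offset)
{
	const uint8_t *p;

	assert(buf != NULL);
	p = buf + offset;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Assembling the value byte by byte avoids unaligned loads and gives the same result on little-endian and big-endian hosts.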



[PATCH net-next v3 04/11] devlink: Add support for region get command

2018-07-12 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_GET command which is used for
querying for the supported DEV/REGION values of devlink devices.
The support is both for doit and dumpit.

Reply includes:
  BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   6 +++
 net/core/devlink.c   | 114 +++
 2 files changed, 120 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 68641fb..28bfa8a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -83,6 +83,9 @@ enum devlink_command {
DEVLINK_CMD_PARAM_NEW,
DEVLINK_CMD_PARAM_DEL,
 
+   DEVLINK_CMD_REGION_GET,
+   DEVLINK_CMD_REGION_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -262,6 +265,9 @@ enum devlink_attr {
DEVLINK_ATTR_PARAM_VALUE_DATA,  /* dynamic */
DEVLINK_ATTR_PARAM_VALUE_CMODE, /* u8 */
 
+   DEVLINK_ATTR_REGION_NAME,   /* string */
+   DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d09fe6..221ddb6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,111 @@ static void devlink_param_unregister_one(struct devlink *devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags,
+ struct devlink_region *region)
+{
+   void *hdr;
+   int err;
+
+   hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+nla_put_failure:
+   genlmsg_cancel(msg, hdr);
+   return err;
+}
+
+static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_region *region;
+   const char *region_name;
+   struct sk_buff *msg;
+   int err;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET,
+info->snd_portid, info->snd_seq, 0,
+region);
+   if (err) {
+   nlmsg_free(msg);
+   return err;
+   }
+
+   return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
+   struct netlink_callback *cb)
+{
+   struct devlink_region *region;
+   struct devlink *devlink;
+   int start = cb->args[0];
+   int idx = 0;
+   int err;
+
+   mutex_lock(&devlink_mutex);
+   list_for_each_entry(devlink, &devlink_list, list) {
+   if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+   continue;
+
+   mutex_lock(&devlink->lock);
+   list_for_each_entry(region, &devlink->region_list, list) {
+   if (idx < start) {
+   idx++;
+   continue;
+   }
+   err = devlink_nl_region_fill(msg, devlink,
+DEVLINK_CMD_REGION_GET,
+NETLINK_CB(cb->skb).portid,
+cb->nlh->nlmsg_seq,
+NLM_F_MULTI, region);
+   if (err) {
+   mutex_unlock(&devlink->lock);
+   goto out;
+   }
+   idx++;
+   }
+   mutex_unlock(&devlink->lock);

[PATCH net-next v3 05/11] devlink: Extend the support querying for region snapshot IDs

2018-07-12 Thread Alex Vesker
Extend the DEVLINK_CMD_REGION_GET command to also
return the IDs of the snapshots currently present on the region.
Each reply includes a nested snapshots attribute that
can contain multiple snapshot attributes, each with an ID.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  3 +++
 net/core/devlink.c   | 53 
 2 files changed, 56 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 28bfa8a..abde4e3 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -267,6 +267,9 @@ enum devlink_attr {
 
DEVLINK_ATTR_REGION_NAME,   /* string */
DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+   DEVLINK_ATTR_REGION_SNAPSHOTS,  /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 221ddb6..cb75e26 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,55 @@ static void devlink_param_unregister_one(struct devlink *devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg,
+struct devlink *devlink,
+struct devlink_snapshot *snapshot)
+{
+   struct nlattr *snap_attr;
+   int err;
+
+   snap_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOT);
+   if (!snap_attr)
+   return -EINVAL;
+
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, snap_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snap_attr);
+   return err;
+}
+
+static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg,
+ struct devlink *devlink,
+ struct devlink_region *region)
+{
+   struct devlink_snapshot *snapshot;
+   struct nlattr *snapshots_attr;
+   int err;
+
+   snapshots_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOTS);
+   if (!snapshots_attr)
+   return -EINVAL;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list) {
+   err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot);
+   if (err)
+   goto nla_put_failure;
+   }
+
+   nla_nest_end(msg, snapshots_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snapshots_attr);
+   return err;
+}
+
 static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
  enum devlink_command cmd, u32 portid,
  u32 seq, int flags,
@@ -3175,6 +3224,10 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
if (err)
goto nla_put_failure;
 
+   err = devlink_nl_region_snapshots_id_put(msg, devlink, region);
+   if (err)
+   goto nla_put_failure;
+
genlmsg_end(msg, hdr);
return 0;
 
-- 
1.8.3.1
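
The nesting used in the patch above (one SNAPSHOTS container holding one SNAPSHOT nest per ID) can be sketched in userspace like this. It is an illustration, not the kernel code: the buffer type, helper names and attribute type numbers are all hypothetical, and only the layout (a 16-bit length and 16-bit type header followed by the payload, with a nest's length patched in once its children are written) follows the usual netlink attribute format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4u	/* u16 length + u16 type, as in struct nlattr */

/* Hypothetical type numbers, for illustration only. */
enum { ATTR_SNAPSHOTS = 1, ATTR_SNAPSHOT = 2, ATTR_SNAPSHOT_ID = 3 };

struct msg_buf {
	uint8_t data[256];
	size_t len;
};

/* Append an attribute header; return its offset, or -1 if it does not fit. */
static long attr_hdr_put(struct msg_buf *m, uint16_t type, uint16_t len)
{
	long start = (long)m->len;

	if (m->len + ATTR_HDRLEN > sizeof(m->data))
		return -1;
	memcpy(m->data + m->len, &len, sizeof(len));
	memcpy(m->data + m->len + 2, &type, sizeof(type));
	m->len += ATTR_HDRLEN;
	return start;
}

static long nest_start(struct msg_buf *m, uint16_t type)
{
	return attr_hdr_put(m, type, ATTR_HDRLEN);
}

/* Patch the nest length now that all children have been written. */
static void nest_end(struct msg_buf *m, long start)
{
	uint16_t len;

	assert(start >= 0 && (size_t)start <= m->len);
	len = (uint16_t)(m->len - (size_t)start);
	memcpy(m->data + start, &len, sizeof(len));
}

/* Drop the nest and everything written inside it. */
static void nest_cancel(struct msg_buf *m, long start)
{
	m->len = (size_t)start;
}

static int put_u32(struct msg_buf *m, uint16_t type, uint32_t value)
{
	if (m->len + ATTR_HDRLEN + sizeof(value) > sizeof(m->data))
		return -1;
	attr_hdr_put(m, type, (uint16_t)(ATTR_HDRLEN + sizeof(value)));
	memcpy(m->data + m->len, &value, sizeof(value));
	m->len += sizeof(value);
	return 0;
}

/* SNAPSHOTS { SNAPSHOT { ID }, SNAPSHOT { ID }, ... } */
static int snapshots_put(struct msg_buf *m, const uint32_t *ids, size_t n)
{
	long outer, inner;
	size_t i;

	outer = nest_start(m, ATTR_SNAPSHOTS);
	if (outer < 0)
		return -1;

	for (i = 0; i < n; i++) {
		inner = nest_start(m, ATTR_SNAPSHOT);
		if (inner < 0 || put_u32(m, ATTR_SNAPSHOT_ID, ids[i]) < 0) {
			nest_cancel(m, outer);
			return -1;
		}
		nest_end(m, inner);
	}

	nest_end(m, outer);
	return 0;
}
```

Cancelling the outer nest on any failure mirrors the patch's error path: a reply either carries the complete snapshot list or none of it.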



[PATCH net-next v3 03/11] devlink: Add support for creating region snapshots

2018-07-12 Thread Alex Vesker
Each device address region can store multiple snapshots;
each snapshot is identified by a different numerical ID.
This ID is used when deleting a snapshot or showing a specific
snapshot of an address region. This patch exposes a callback to add
a new snapshot to an address region.
The snapshot is deleted using the destructor function
when the region is destroyed or when a snapshot delete command
is received from the devlink user tool.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 13 +++
 net/core/devlink.c| 95 +++
 2 files changed, 108 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f27d859..905f0bb 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -401,6 +401,8 @@ enum devlink_param_generic_id {
 
 struct devlink_region;
 
+typedef void devlink_snapshot_data_dest_t(const void *data);
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -553,6 +555,9 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_shapshot_id_get(struct devlink *devlink);
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor);
 
 #else
 
@@ -800,6 +805,14 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
return 0;
 }
 
+static inline int
+devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6c92ddd..7d09fe6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -336,6 +336,15 @@ struct devlink_region {
u64 size;
 };
 
+struct devlink_snapshot {
+   struct list_head list;
+   struct devlink_region *region;
+   devlink_snapshot_data_dest_t *data_destructor;
+   u64 data_len;
+   u8 *data;
+   u32 id;
+};
+
 static struct devlink_region *
 devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 {
@@ -348,6 +357,26 @@ struct devlink_region {
return NULL;
 }
 
+static struct devlink_snapshot *
+devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
+{
+   struct devlink_snapshot *snapshot;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list)
+   if (snapshot->id == id)
+   return snapshot;
+
+   return NULL;
+}
+
+static void devlink_region_snapshot_del(struct devlink_snapshot *snapshot)
+{
+   snapshot->region->cur_snapshots--;
+   list_del(&snapshot->list);
+   (*snapshot->data_destructor)(snapshot->data);
+   kfree(snapshot);
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SB    BIT(2)
@@ -4185,8 +4214,14 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 void devlink_region_destroy(struct devlink_region *region)
 {
struct devlink *devlink = region->devlink;
+   struct devlink_snapshot *snapshot, *ts;
 
mutex_lock(&devlink->lock);
+
+   /* Free all snapshots of region */
+   list_for_each_entry_safe(snapshot, ts, &region->snapshot_list, list)
+   devlink_region_snapshot_del(snapshot);
+
list_del(&region->list);
mutex_unlock(&devlink->lock);
kfree(region);
@@ -4214,6 +4249,66 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
 
+/**
+ * devlink_region_snapshot_create - create a new snapshot
+ * This will add a new snapshot of a region. The snapshot
+ * will be stored on the region struct and can be accessed
+ * from devlink. This is useful for future analyses of snapshots.
+ * Multiple snapshots can be created on a region.
+ * The @snapshot_id should be obtained using the getter function.
+ *
+ * @devlink_region: devlink region of the snapshot
+ * @data_len: size of snapshot data
+ * @data: snapshot data
+ * @snapshot_id: snapshot id to be created
+ * @data_destructor: pointer to destructor function to free data
+ */
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor)
+{
+   struct devlink *devlink = region->devlink;
+   struct devl

[PATCH net-next v3 00/11] devlink: Add support for region access

2018-07-12 Thread Alex Vesker
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions. 
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region, it can later be
exposed using the devlink region read or dump commands.
This allows the snapshots to be analyzed at a later time.

The major benefit of this support is not only to provide access to
internal address regions which were inaccessible to the user but also
to provide an additional way to debug complex error states using the
region snapshots.

Implemented commands:
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
    address ADDRESS length LENGTH

Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]

Delete a snapshot using:
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1

Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
 0014 95dc 0014 9514 0035 1670 0034 db30
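
For readers who want to post-process such output, the row layout of the dump and read examples can be reproduced with a short userspace sketch. The function name is hypothetical, and the 16-hex-digit offset column is an assumption (the offset column is not fully legible in the examples above). The grouping into two-byte words follows the sample output.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one row: the offset, then the bytes as two-byte hex groups.
 * Returns the number of characters written, or -1 if out is too small.
 */
static int format_region_row(char *out, size_t out_len, uint64_t addr,
			     const uint8_t *data, size_t len)
{
	size_t i;
	int n, used;

	assert(out != NULL && data != NULL);

	n = snprintf(out, out_len, "%016" PRIx64, addr);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	used = n;

	for (i = 0; i < len; i++) {
		/* A space starts each two-byte group. */
		n = snprintf(out + used, out_len - (size_t)used, "%s%02x",
			     (i % 2 == 0) ? " " : "", (unsigned int)data[i]);
		if (n < 0 || (size_t)n >= out_len - (size_t)used)
			return -1;
		used += n;
	}

	return used;
}
```

Calling it once per 16 bytes of snapshot data, with addr advanced by 16 each time, gives rows shaped like the ones shown in the examples.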

For more information you can check devlink-region.8 man page

Future:
There is a plan to extend the support to include a write command,
as well as read and dump of a live region.

v1->v2:
-Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
 double allocation

v2->v3:
-Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE
 from u32 to u64

Alex Vesker (11):
  devlink: Add support for creating and destroying regions
  devlink: Add callback to query for snapshot id before snapshot create
  devlink: Add support for creating region snapshots
  devlink: Add support for region get command
  devlink: Extend the support querying for region snapshot IDs
  devlink: Add support for region snapshot delete command
  devlink: Add support for region snapshot read command
  net/mlx4_core: Add health buffer address capability
  net/mlx4_core: Add Crdump FW snapshot support
  devlink: Add generic parameters region_snapshot
  net/mlx4_core: Use devlink region_snapshot parameter

 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 239 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c |   5 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  52 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   8 +
 include/net/devlink.h   |  47 ++
 include/uapi/linux/devlink.h|  18 +
 net/core/devlink.c  | 647 
 11 files changed, 1024 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

-- 
1.8.3.1
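
The region and snapshot bookkeeping described in this series can be modelled in userspace as below. This is a sketch under stated assumptions, not the kernel implementation: all names are hypothetical, a singly linked list replaces the kernel list_head, and the specific error codes (-ENOMEM when the region is full, -EEXIST for a duplicate ID, -ENOENT for a missing one) are assumptions rather than something taken from the patches. The point is the ownership model: the caller hands over a data buffer together with a destructor, and the region calls that destructor when the snapshot is deleted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef void snapshot_data_dest_t(const void *data);

struct snapshot {
	struct snapshot *next;
	snapshot_data_dest_t *data_destructor;
	uint8_t *data;
	uint64_t data_len;
	uint32_t id;
};

struct region {
	struct snapshot *snapshots;
	uint32_t max_snapshots;
	uint32_t cur_snapshots;
};

/* Destructor for buffers obtained from malloc(). */
static void free_snapshot_data(const void *data)
{
	free((void *)data);
}

static struct snapshot *region_snapshot_find(struct region *region, uint32_t id)
{
	struct snapshot *s;

	for (s = region->snapshots; s; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}

/* On success the region takes ownership of data. */
static int region_snapshot_create(struct region *region, uint64_t data_len,
				  uint8_t *data, uint32_t id,
				  snapshot_data_dest_t *data_destructor)
{
	struct snapshot *s;

	assert(region != NULL && data_destructor != NULL);

	if (region->cur_snapshots == region->max_snapshots)
		return -ENOMEM;		/* assumed error code */
	if (region_snapshot_find(region, id))
		return -EEXIST;		/* assumed error code */

	s = calloc(1, sizeof(*s));
	if (!s)
		return -ENOMEM;

	s->id = id;
	s->data = data;
	s->data_len = data_len;
	s->data_destructor = data_destructor;
	s->next = region->snapshots;
	region->snapshots = s;
	region->cur_snapshots++;
	return 0;
}

/* Unlink the snapshot, then release its data through the destructor. */
static int region_snapshot_del(struct region *region, uint32_t id)
{
	struct snapshot **link;

	for (link = &region->snapshots; *link; link = &(*link)->next) {
		struct snapshot *s = *link;

		if (s->id != id)
			continue;
		*link = s->next;
		region->cur_snapshots--;
		s->data_destructor(s->data);
		free(s);
		return 0;
	}
	return -ENOENT;			/* assumed error code */
}
```

Passing the destructor with the buffer lets the driver allocate the snapshot memory however it likes (the changelog mentions kvmalloc) without the core having to copy the data.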



[PATCH net-next v3 01/11] devlink: Add support for creating and destroying regions

2018-07-12 Thread Alex Vesker
This allows a device to register its supported address regions.
Each address region can be accessed directly, for example by reading
the snapshots taken of this address space.
Drivers are free to choose the names of their regions.
Examples of region names are: pci cr-space, register-space.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 22 ++
 net/core/devlink.c| 84 +++
 2 files changed, 106 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f67c29c..e539765 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -28,6 +28,7 @@ struct devlink {
struct list_head dpipe_table_list;
struct list_head resource_list;
struct list_head param_list;
+   struct list_head region_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -397,6 +398,8 @@ enum devlink_param_generic_id {
.validate = _validate,  \
 }
 
+struct devlink_region;
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -543,6 +546,11 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
 int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
   union devlink_param_value init_val);
 void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size);
+void devlink_region_destroy(struct devlink_region *region);
 
 #else
 
@@ -770,6 +778,20 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+ const char *region_name,
+ u32 region_max_snapshots,
+ u64 region_size)
+{
+   return NULL;
+}
+
+static inline void
+devlink_region_destroy(struct devlink_region *region)
+{
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 470f3db..cac8561 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -326,6 +326,28 @@ static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb,
  pool_type, p_tc_index);
 }
 
+struct devlink_region {
+   struct devlink *devlink;
+   struct list_head list;
+   const char *name;
+   struct list_head snapshot_list;
+   u32 max_snapshots;
+   u32 cur_snapshots;
+   u64 size;
+};
+
+static struct devlink_region *
+devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
+{
+   struct devlink_region *region;
+
+   list_for_each_entry(region, &devlink->region_list, list)
+   if (!strcmp(region->name, region_name))
+   return region;
+
+   return NULL;
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SB    BIT(2)
@@ -3358,6 +3380,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
INIT_LIST_HEAD(&devlink->resource_list);
INIT_LIST_HEAD(&devlink->param_list);
+   INIT_LIST_HEAD(&devlink->region_list);
mutex_init(&devlink->lock);
return devlink;
 }
@@ -4109,6 +4132,67 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id)
 }
 EXPORT_SYMBOL_GPL(devlink_param_value_changed);
 
+/**
+ * devlink_region_create - create a new address region
+ *
+ * @devlink: devlink
+ * @region_name: region name
+ * @region_max_snapshots: Maximum supported number of snapshots for region
+ * @region_size: size of region
+ */
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size)
+{
+   struct devlink_region *region;
+   int err = 0;
+
+   mutex_lock(&devlink->lock);
+
+   if (devlink_region_get_by_name(devlink, region_name)) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   region->devlink = devlink;
+   region->max_snapshots = region_max_snapshots;
+   

[PATCH net-next v2 11/11] net/mlx4_core: Use devlink region_snapshot parameter

2018-07-11 Thread Alex Vesker
This parameter enables capturing a region snapshot of the crspace
during critical errors. The parameter is disabled by default
and can be enabled using devlink param commands.
It can be configured both at runtime and at driver init.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c |  8 ++
 drivers/net/ethernet/mellanox/mlx4/main.c   | 41 +
 include/linux/mlx4/device.h |  1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index 4d5524d..88316c7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -158,6 +158,7 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev *dev,
 int mlx4_crdump_collect(struct mlx4_dev *dev)
 {
struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
unsigned long cr_res_size;
u8 __iomem *cr_space;
@@ -168,6 +169,11 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
return 0;
}
 
+   if (!crdump->snapshot_enable) {
+   mlx4_info(dev, "crdump: devlink snapshot disabled, skipping\n");
+   return 0;
+   }
+
cr_res_size = pci_resource_len(pdev, 0);
 
cr_space = ioremap(pci_resource_start(pdev, 0), cr_res_size);
@@ -197,6 +203,8 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
 
+   crdump->snapshot_enable = false;
+
/* Create cr-space region */
crdump->region_crspace =
devlink_region_create(devlink,
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 46b0214..2d979a6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -191,6 +191,26 @@ static int mlx4_devlink_ierr_reset_set(struct devlink *devlink, u32 id,
return 0;
 }
 
+static int mlx4_devlink_crdump_snapshot_get(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   ctx->val.vbool = dev->persist->crdump.snapshot_enable;
+   return 0;
+}
+
+static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   dev->persist->crdump.snapshot_enable = ctx->val.vbool;
+   return 0;
+}
+
 static int
 mlx4_devlink_max_macs_validate(struct devlink *devlink, u32 id,
   union devlink_param_value val,
@@ -224,6 +244,11 @@ enum mlx4_devlink_param_id {
DEVLINK_PARAM_GENERIC(MAX_MACS,
  BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
  NULL, NULL, mlx4_devlink_max_macs_validate),
+   DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+ BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ mlx4_devlink_crdump_snapshot_get,
+ mlx4_devlink_crdump_snapshot_set, NULL),
DEVLINK_PARAM_DRIVER(MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE,
 "enable_64b_cqe_eqe", DEVLINK_PARAM_TYPE_BOOL,
 BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
@@ -270,6 +295,11 @@ static void mlx4_devlink_set_params_init_values(struct devlink *devlink)
mlx4_devlink_set_init_value(devlink,
MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR,
value);
+
+   value.vbool = false;
+   mlx4_devlink_set_init_value(devlink,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   value);
 }
 
 static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev,
@@ -3862,6 +3892,9 @@ static int mlx4_devlink_port_type_set(struct devlink_port *devlink_port,
 
 static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 {
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
union devlink_param_value saved_value;
int err;
 
@@ -3889,6 +3922,14 @@ static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 _va

[PATCH net-next v2 04/11] devlink: Add support for region get command

2018-07-11 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_GET command which is used for
querying for the supported DEV/REGION values of devlink devices.
The support is both for doit and dumpit.

Reply includes:
  BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   6 +++
 net/core/devlink.c   | 114 +++
 2 files changed, 120 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 68641fb..d1dbc5d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -83,6 +83,9 @@ enum devlink_command {
DEVLINK_CMD_PARAM_NEW,
DEVLINK_CMD_PARAM_DEL,
 
+   DEVLINK_CMD_REGION_GET,
+   DEVLINK_CMD_REGION_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -262,6 +265,9 @@ enum devlink_attr {
DEVLINK_ATTR_PARAM_VALUE_DATA,  /* dynamic */
DEVLINK_ATTR_PARAM_VALUE_CMODE, /* u8 */
 
+   DEVLINK_ATTR_REGION_NAME,   /* string */
+   DEVLINK_ATTR_REGION_SIZE,   /* u32 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d09fe6..221ddb6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,111 @@ static void devlink_param_unregister_one(struct devlink *devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags,
+ struct devlink_region *region)
+{
+   void *hdr;
+   int err;
+
+   hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+nla_put_failure:
+   genlmsg_cancel(msg, hdr);
+   return err;
+}
+
+static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_region *region;
+   const char *region_name;
+   struct sk_buff *msg;
+   int err;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET,
+info->snd_portid, info->snd_seq, 0,
+region);
+   if (err) {
+   nlmsg_free(msg);
+   return err;
+   }
+
+   return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
+   struct netlink_callback *cb)
+{
+   struct devlink_region *region;
+   struct devlink *devlink;
+   int start = cb->args[0];
+   int idx = 0;
+   int err;
+
+   mutex_lock(&devlink_mutex);
+   list_for_each_entry(devlink, &devlink_list, list) {
+   if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+   continue;
+
+   mutex_lock(&devlink->lock);
+   list_for_each_entry(region, &devlink->region_list, list) {
+   if (idx < start) {
+   idx++;
+   continue;
+   }
+   err = devlink_nl_region_fill(msg, devlink,
+DEVLINK_CMD_REGION_GET,
+NETLINK_CB(cb->skb).portid,
+cb->nlh->nlmsg_seq,
+NLM_F_MULTI, region);
+   if (err) {
+   mutex_unlock(&devlink->lock);
+   goto out;
+   }
+   idx++;
+   }
+   mutex_unlock(&devlink->lock);

[PATCH net-next v2 03/11] devlink: Add support for creating region snapshots

2018-07-11 Thread Alex Vesker
Each device address region can store multiple snapshots, and each
snapshot is identified by a distinct numerical ID. This ID is used
when deleting a snapshot or showing a specific snapshot of an address
region. This patch exposes a callback to add a new snapshot to an
address region. The snapshot data is freed using the destructor
function when the region is destroyed or when a snapshot delete
command is received from the devlink user tool.
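
The ownership model described above can be sketched in user space as
follows. This is only an illustration, not the kernel code: the struct
and function names are hypothetical, a singly linked list stands in for
the kernel list helpers, and the error codes for a full region and a
duplicate ID are assumptions, since the body of the create function is
cut off in this archive.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Destructor type, modelled on devlink_snapshot_data_dest_t. */
typedef void snapshot_data_dest_t(const void *data);

struct snapshot {
	struct snapshot *next;
	snapshot_data_dest_t *data_destructor;
	uint64_t data_len;
	uint8_t *data;
	uint32_t id;
};

struct region {
	struct snapshot *snapshots;
	uint32_t max_snapshots;
	uint32_t cur_snapshots;
};

static struct snapshot *snapshot_get_by_id(struct region *r, uint32_t id)
{
	for (struct snapshot *s = r->snapshots; s; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}

/* On success the region owns @data and frees it via @dest later.
 * The error codes below are assumptions made for this sketch.
 */
static int snapshot_create(struct region *r, uint64_t data_len, uint8_t *data,
			   uint32_t id, snapshot_data_dest_t *dest)
{
	struct snapshot *s;

	if (r->cur_snapshots == r->max_snapshots)
		return -ENOMEM;
	if (snapshot_get_by_id(r, id))
		return -EEXIST;

	s = calloc(1, sizeof(*s));
	if (!s)
		return -ENOMEM;

	s->id = id;
	s->data = data;
	s->data_len = data_len;
	s->data_destructor = dest;
	s->next = r->snapshots;
	r->snapshots = s;
	r->cur_snapshots++;
	return 0;
}

/* Unlink one snapshot and release its data through the destructor. */
static int snapshot_del(struct region *r, uint32_t id)
{
	for (struct snapshot **pp = &r->snapshots; *pp; pp = &(*pp)->next) {
		struct snapshot *s = *pp;

		if (s->id != id)
			continue;
		*pp = s->next;
		r->cur_snapshots--;
		s->data_destructor(s->data);
		free(s);
		return 0;
	}
	return -EINVAL;
}

/* Region teardown: every remaining snapshot goes through the destructor. */
static void region_destroy_snapshots(struct region *r)
{
	while (r->snapshots)
		snapshot_del(r, r->snapshots->id);
}

/* Example destructor for data allocated with malloc(). */
static void free_data(const void *data)
{
	free((void *)data);
}
```

Passing the destructor together with the buffer lets the producer
choose the allocator, so the snapshot data never has to be copied.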

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 13 +++
 net/core/devlink.c| 95 +++
 2 files changed, 108 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f27d859..905f0bb 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -401,6 +401,8 @@ enum devlink_param_generic_id {
 
 struct devlink_region;
 
+typedef void devlink_snapshot_data_dest_t(const void *data);
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -553,6 +555,9 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_shapshot_id_get(struct devlink *devlink);
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor);
 
 #else
 
@@ -800,6 +805,14 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
return 0;
 }
 
+static inline int
+devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6c92ddd..7d09fe6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -336,6 +336,15 @@ struct devlink_region {
u64 size;
 };
 
+struct devlink_snapshot {
+   struct list_head list;
+   struct devlink_region *region;
+   devlink_snapshot_data_dest_t *data_destructor;
+   u64 data_len;
+   u8 *data;
+   u32 id;
+};
+
 static struct devlink_region *
 devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 {
@@ -348,6 +357,26 @@ struct devlink_region {
return NULL;
 }
 
+static struct devlink_snapshot *
+devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
+{
+   struct devlink_snapshot *snapshot;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list)
+   if (snapshot->id == id)
+   return snapshot;
+
+   return NULL;
+}
+
+static void devlink_region_snapshot_del(struct devlink_snapshot *snapshot)
+{
+   snapshot->region->cur_snapshots--;
+   list_del(&snapshot->list);
+   (*snapshot->data_destructor)(snapshot->data);
+   kfree(snapshot);
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SB BIT(2)
@@ -4185,8 +4214,14 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 void devlink_region_destroy(struct devlink_region *region)
 {
struct devlink *devlink = region->devlink;
+   struct devlink_snapshot *snapshot, *ts;
 
mutex_lock(&devlink->lock);
+
+   /* Free all snapshots of region */
+   list_for_each_entry_safe(snapshot, ts, &region->snapshot_list, list)
+   devlink_region_snapshot_del(snapshot);
+
list_del(&region->list);
mutex_unlock(&devlink->lock);
kfree(region);
@@ -4214,6 +4249,66 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
 
+/**
+ * devlink_region_snapshot_create - create a new snapshot
+ * This will add a new snapshot of a region. The snapshot
+ * will be stored on the region struct and can be accessed
+ * from devlink. This is useful for future analyses of snapshots.
+ * Multiple snapshots can be created on a region.
+ * The @snapshot_id should be obtained using the getter function.
+ *
+ * @devlink_region: devlink region of the snapshot
+ * @data_len: size of snapshot data
+ * @data: snapshot data
+ * @snapshot_id: snapshot id to be created
+ * @data_destructor: pointer to destructor function to free data
+ */
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor)
+{
+   struct devlink *devlink = region->devlink;
+   struct devl

[PATCH net-next v2 05/11] devlink: Extend the support querying for region snapshot IDs

2018-07-11 Thread Alex Vesker
Extend the DEVLINK_CMD_REGION_GET command to also return the IDs
of the snapshots currently present on the region. Each reply
includes a nested snapshots attribute that can contain multiple
snapshot attributes, each carrying an ID.
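
As a rough illustration of the nesting described above, the following
user-space sketch encodes a list of snapshot IDs as length/type/value
attributes with 4-byte alignment, which is the general layout netlink
attributes use. It does not use the kernel nla_* helpers; the buffer
type, the function names and the numeric attribute types are all
hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute type numbers, for illustration only. */
enum { ATTR_SNAPSHOTS = 1, ATTR_SNAPSHOT = 2, ATTR_SNAPSHOT_ID = 3 };

#define ATTR_HDRLEN 4u			/* u16 length + u16 type */
#define NEST_FAIL ((size_t)-1)

struct buf {
	uint8_t data[256];
	size_t len;
};

static size_t attr_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Append one attribute; returns -1 if it does not fit. */
static int attr_put(struct buf *b, uint16_t type, const void *payload,
		    uint16_t plen)
{
	size_t total = ATTR_HDRLEN + plen;
	uint16_t len16 = (uint16_t)total;

	if (b->len + attr_align(total) > sizeof(b->data))
		return -1;

	memcpy(b->data + b->len, &len16, sizeof(len16));
	memcpy(b->data + b->len + 2, &type, sizeof(type));
	if (plen)
		memcpy(b->data + b->len + ATTR_HDRLEN, payload, plen);
	memset(b->data + b->len + total, 0, attr_align(total) - total);
	b->len += attr_align(total);
	return 0;
}

/* Open a nested attribute; its length is patched in by nest_end(). */
static size_t nest_start(struct buf *b, uint16_t type)
{
	size_t off = b->len;

	if (attr_put(b, type, NULL, 0) < 0)
		return NEST_FAIL;
	return off;
}

static void nest_end(struct buf *b, size_t off)
{
	uint16_t len16 = (uint16_t)(b->len - off);

	memcpy(b->data + off, &len16, sizeof(len16));
}

/* Drop everything written since the nest was opened. */
static void nest_cancel(struct buf *b, size_t off)
{
	b->len = off;
}

/* SNAPSHOTS { SNAPSHOT { SNAPSHOT_ID }, SNAPSHOT { SNAPSHOT_ID }, ... } */
static int put_snapshot_ids(struct buf *b, const uint32_t *ids, size_t n)
{
	size_t outer = nest_start(b, ATTR_SNAPSHOTS);

	if (outer == NEST_FAIL)
		return -1;

	for (size_t i = 0; i < n; i++) {
		size_t inner = nest_start(b, ATTR_SNAPSHOT);

		if (inner == NEST_FAIL ||
		    attr_put(b, ATTR_SNAPSHOT_ID, &ids[i], sizeof(ids[i])) < 0) {
			nest_cancel(b, outer);
			return -1;
		}
		nest_end(b, inner);
	}

	nest_end(b, outer);
	return 0;
}
```

Cancelling the outer nest on any failure leaves the buffer exactly as
it was before the call, which is the same pattern the patch follows
with its nla_put_failure labels.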

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  3 +++
 net/core/devlink.c   | 53 
 2 files changed, 56 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d1dbc5d..42fcb55 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -267,6 +267,9 @@ enum devlink_attr {
 
DEVLINK_ATTR_REGION_NAME,   /* string */
DEVLINK_ATTR_REGION_SIZE,   /* u32 */
+   DEVLINK_ATTR_REGION_SNAPSHOTS,  /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 221ddb6..cb75e26 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,55 @@ static void devlink_param_unregister_one(struct devlink *devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg,
+struct devlink *devlink,
+struct devlink_snapshot *snapshot)
+{
+   struct nlattr *snap_attr;
+   int err;
+
+   snap_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOT);
+   if (!snap_attr)
+   return -EINVAL;
+
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, snap_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snap_attr);
+   return err;
+}
+
+static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg,
+ struct devlink *devlink,
+ struct devlink_region *region)
+{
+   struct devlink_snapshot *snapshot;
+   struct nlattr *snapshots_attr;
+   int err;
+
+   snapshots_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOTS);
+   if (!snapshots_attr)
+   return -EINVAL;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list) {
+   err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot);
+   if (err)
+   goto nla_put_failure;
+   }
+
+   nla_nest_end(msg, snapshots_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snapshots_attr);
+   return err;
+}
+
 static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
  enum devlink_command cmd, u32 portid,
  u32 seq, int flags,
@@ -3175,6 +3224,10 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
if (err)
goto nla_put_failure;
 
+   err = devlink_nl_region_snapshots_id_put(msg, devlink, region);
+   if (err)
+   goto nla_put_failure;
+
genlmsg_end(msg, hdr);
return 0;
 
-- 
1.8.3.1



[PATCH net-next v2 07/11] devlink: Add support for region snapshot read command

2018-07-11 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_READ, used for both reading
and dumping region data. A read returns a given length of data
starting at a specific address in the region. A dump returns the
full region.
If only a snapshot ID is provided, a snapshot dump is done.
If a snapshot ID, address and length are provided, a snapshot read
is done.

The command currently serves snapshot access and will be used in the
same way to access current data on the region.
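
The read and dump behaviour described above comes down to walking a
byte range in fixed-size chunks. The sketch below models that loop in
user space, following the shape of the fill function later in this
patch. The callback type and all function names are hypothetical; the
256-byte chunk size mirrors DEVLINK_REGION_READ_CHUNK_SIZE from the
diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define READ_CHUNK_SIZE 256

/* Hypothetical sink for one chunk; a non-zero return stops the walk. */
typedef int chunk_fn(void *ctx, const uint8_t *chunk, uint32_t size,
		     uint64_t addr);

/* Emit [start, end) of a snapshot in chunks. A dump ignores @end and
 * always runs to the end of the data. @new_offset reports how far the
 * walk got, so that an interrupted dump could resume from there.
 */
static int read_snapshot(const uint8_t *data, uint64_t data_len,
			 uint64_t start, uint64_t end, int dump,
			 chunk_fn *emit, void *ctx, uint64_t *new_offset)
{
	uint64_t cur = start;
	int err = 0;

	*new_offset = start;

	if (end > data_len || dump)
		end = data_len;

	while (cur < end) {
		uint32_t size;

		if (end - cur < READ_CHUNK_SIZE)
			size = (uint32_t)(end - cur);
		else
			size = READ_CHUNK_SIZE;

		err = emit(ctx, &data[cur], size, cur);
		if (err)
			break;

		cur += size;
	}
	*new_offset = cur;

	return err;
}

/* Example sink that only counts what it is given. */
struct chunk_stats {
	unsigned int count;
	uint64_t bytes;
	uint32_t last_size;
};

static int count_chunk(void *ctx, const uint8_t *chunk, uint32_t size,
		       uint64_t addr)
{
	struct chunk_stats *st = ctx;

	(void)chunk;
	(void)addr;
	st->count++;
	st->bytes += size;
	st->last_size = size;
	return 0;
}
```

With this chunk size a 600-byte snapshot is dumped as chunks of 256,
256 and 88 bytes, while a read of bytes 16..32 produces a single
16-byte chunk.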

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   7 ++
 net/core/devlink.c   | 182 +++
 2 files changed, 189 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 029bd7f..d573393 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -87,6 +87,7 @@ enum devlink_command {
DEVLINK_CMD_REGION_SET,
DEVLINK_CMD_REGION_NEW,
DEVLINK_CMD_REGION_DEL,
+   DEVLINK_CMD_REGION_READ,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
@@ -273,6 +274,12 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
+   DEVLINK_ATTR_REGION_CHUNKS, /* nested */
+   DEVLINK_ATTR_REGION_CHUNK,  /* nested */
+   DEVLINK_ATTR_REGION_CHUNK_DATA, /* binary */
+   DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
+   DEVLINK_ATTR_REGION_CHUNK_LEN,  /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fc08363..e5118db 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3388,6 +3388,181 @@ static int devlink_nl_cmd_region_del(struct sk_buff *skb,
return 0;
 }
 
+static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
+struct devlink *devlink,
+u8 *chunk, u32 chunk_size,
+u64 addr)
+{
+   struct nlattr *chunk_attr;
+   int err;
+
+   chunk_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_CHUNK);
+   if (!chunk_attr)
+   return -EINVAL;
+
+   err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, chunk_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, chunk_attr);
+   return err;
+}
+
+#define DEVLINK_REGION_READ_CHUNK_SIZE 256
+
+static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
+   struct devlink *devlink,
+   struct devlink_region *region,
+   struct nlattr **attrs,
+   u64 start_offset,
+   u64 end_offset,
+   bool dump,
+   u64 *new_offset)
+{
+   struct devlink_snapshot *snapshot;
+   u64 curr_offset = start_offset;
+   u32 snapshot_id;
+   int err = 0;
+
+   *new_offset = start_offset;
+
+   snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   if (end_offset > snapshot->data_len || dump)
+   end_offset = snapshot->data_len;
+
+   while (curr_offset < end_offset) {
+   u32 data_size;
+   u8 *data;
+
+   if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
+   data_size = end_offset - curr_offset;
+   else
+   data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+
+   data = &snapshot->data[curr_offset];
+   err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
+   data, data_size,
+   curr_offset);
+   if (err)
+   break;
+
+   curr_offset += data_size;
+   }
+   *new_offset = curr_offset;
+
+   return err;
+}
+
+static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
+struct netlink_callback *cb)
+{
+   u64 ret_offset, start_offset, end_offset = 0;
+   struct nlattr *attrs[DEVLINK_ATTR_MAX + 1];
+   const struct genl_ops *ops = cb->data;
+   struct d

[PATCH net-next v2 02/11] devlink: Add callback to query for snapshot id before snapshot create

2018-07-11 Thread Alex Vesker
To restrict the driver's choice of snapshot ID, a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This also allows giving the same ID to multiple
snapshots taken of different regions at the same time.
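
A minimal user-space sketch of this ID scheme is shown below. Note the
swapped-in technique: the patch serializes the counter with the devlink
mutex, while this example uses a C11 atomic counter instead to stay
self-contained. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Per-device state; one counter is shared by all regions of the device. */
struct device {
	atomic_uint_least32_t snapshot_id;
};

/* Return the next snapshot ID. atomic_fetch_add() yields the old value,
 * so adding one gives the same result as "id = ++devlink->snapshot_id"
 * in the patch.
 */
static uint32_t snapshot_id_get(struct device *dev)
{
	return (uint32_t)atomic_fetch_add(&dev->snapshot_id, 1) + 1;
}
```

A driver capturing two regions for the same trigger would call the
getter once and pass the returned ID to both snapshot create calls, so
that related snapshots can be matched by ID afterwards.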

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h |  8 
 net/core/devlink.c| 21 +
 2 files changed, 29 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index e539765..f27d859 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -29,6 +29,7 @@ struct devlink {
struct list_head resource_list;
struct list_head param_list;
struct list_head region_list;
+   u32 snapshot_id;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -551,6 +552,7 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 u32 region_max_snapshots,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
+u32 devlink_region_shapshot_id_get(struct devlink *devlink);
 
 #else
 
@@ -792,6 +794,12 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline u32
+devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cac8561..6c92ddd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4193,6 +4193,27 @@ void devlink_region_destroy(struct devlink_region *region)
 }
 EXPORT_SYMBOL_GPL(devlink_region_destroy);
 
+/**
+ * devlink_region_shapshot_id_get - get snapshot ID
+ *
+ * This callback should be called when adding a new snapshot,
+ * Driver should use the same id for multiple snapshots taken
+ * on multiple regions at the same time/by the same trigger.
+ *
+ * @devlink: devlink
+ */
+u32 devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   u32 id;
+
+   mutex_lock(&devlink->lock);
+   id = ++devlink->snapshot_id;
+   mutex_unlock(&devlink->lock);
+
+   return id;
+}
+EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
+
 static int __init devlink_module_init(void)
 {
return genl_register_family(&devlink_nl_family);
-- 
1.8.3.1



[PATCH net-next v2 06/11] devlink: Add support for region snapshot delete command

2018-07-11 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_DEL used
for deleting a snapshot from a region. The snapshot ID is required.
Also added notification support for NEW and DEL of snapshots.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c   | 93 
 2 files changed, 95 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 42fcb55..029bd7f 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -85,6 +85,8 @@ enum devlink_command {
 
DEVLINK_CMD_REGION_GET,
DEVLINK_CMD_REGION_SET,
+   DEVLINK_CMD_REGION_NEW,
+   DEVLINK_CMD_REGION_DEL,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cb75e26..fc08363 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3236,6 +3236,58 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
return err;
 }
 
+static void devlink_nl_region_notify(struct devlink_region *region,
+struct devlink_snapshot *snapshot,
+enum devlink_command cmd)
+{
+   struct devlink *devlink = region->devlink;
+   struct sk_buff *msg;
+   void *hdr;
+   int err;
+
+   WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL);
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return;
+
+   hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd);
+   if (!hdr)
+   goto out_free_msg;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto out_cancel_msg;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
+region->name);
+   if (err)
+   goto out_cancel_msg;
+
+   if (snapshot) {
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID,
+ snapshot->id);
+   if (err)
+   goto out_cancel_msg;
+   } else {
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size, DEVLINK_ATTR_PAD);
+   if (err)
+   goto out_cancel_msg;
+   }
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+   msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+
+   return;
+
+out_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+out_free_msg:
+   nlmsg_free(msg);
+}
+
 static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
  struct genl_info *info)
 {
@@ -3307,6 +3359,35 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
return msg->len;
 }
 
+static int devlink_nl_cmd_region_del(struct sk_buff *skb,
+struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_snapshot *snapshot;
+   struct devlink_region *region;
+   const char *region_name;
+   u32 snapshot_id;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME] ||
+   !info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
+   devlink_region_snapshot_del(snapshot);
+   return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -3331,6 +3412,7 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
[DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 },
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
+   [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -3537,6 +3619,13 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+   {
+   .cmd = DEVLINK_CMD_REGION_DEL,
+   .doit = devlink_nl_cmd_region_del,
+   .policy = devlink_nl_policy,
+   .flags = GENL_ADMIN

[PATCH net-next v2 08/11] net/mlx4_core: Add health buffer address capability

2018-07-11 Thread Alex Vesker
The health buffer address is a 32-bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data
located in the shared CR space. The CR space is accessible to both
the driver and the FW and allows for different queries and
configurations. The health buffer is always 64B of readable data
followed by a lock which is used to block volatile CR space access.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 5 -
 drivers/net/ethernet/mellanox/mlx4/fw.h   | 1 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 include/linux/mlx4/device.h   | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 46dcbfb..babcfd9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -825,7 +825,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
-
+#define QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET 0xe4
 
dev_cap->flags2 = 0;
mailbox = mlx4_alloc_cmd_mailbox(dev);
@@ -1082,6 +1082,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev_cap->rl_caps.min_unit = size >> 14;
}
 
+   MLX4_GET(dev_cap->health_buffer_addrs, outbox,
+QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET);
+
MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
if (field32 & (1 << 16))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index cd6399c..650ae08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -128,6 +128,7 @@ struct mlx4_dev_cap {
u32 dmfs_high_rate_qpn_base;
u32 dmfs_high_rate_qpn_range;
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c42eddf..806d441 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -523,6 +523,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev->caps.max_rss_tbl_sz = dev_cap->max_rss_tbl_sz;
dev->caps.wol_port[1]  = dev_cap->wol_port[1];
dev->caps.wol_port[2]  = dev_cap->wol_port[2];
+   dev->caps.health_buffer_addrs  = dev_cap->health_buffer_addrs;
 
/* Save uar page shift */
if (!mlx4_is_slave(dev)) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 122e7e9..e3bfe76 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
u32 vf_caps;
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
 };
 
 struct mlx4_buf_list {
-- 
1.8.3.1



[PATCH net-next v2 09/11] net/mlx4_core: Add Crdump FW snapshot support

2018-07-11 Thread Alex Vesker
Crdump allows the driver to create a snapshot of the FW PCI
crspace and health buffer during a critical FW issue.
A snapshot is taken in case of a FW command timeout, the FW getting
stuck, or a nonzero value in the catastrophic buffer.

The snapshot is exposed using devlink. The cr-space and fw-health
address regions are registered on init, and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 231 
 drivers/net/ethernet/mellanox/mlx4/main.c   |  10 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   6 +
 6 files changed, 255 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx4/Makefile b/drivers/net/ethernet/mellanox/mlx4/Makefile
index 16b10d0..3f40077 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx4/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o
 
 mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o fw_qos.o icm.o intf.o \
main.o mcg.o mr.o pd.o port.o profile.o qp.o reset.o sense.o \
-   srq.o resource_tracker.o
+   srq.o resource_tracker.o crdump.o
 
 obj-$(CONFIG_MLX4_EN)   += mlx4_en.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 8afe4b5..c81d15b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -178,10 +178,12 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist)
 
dev = persist->dev;
mlx4_err(dev, "device is going to be reset\n");
-   if (mlx4_is_slave(dev))
+   if (mlx4_is_slave(dev)) {
err = mlx4_reset_slave(dev);
-   else
+   } else {
+   mlx4_crdump_collect(dev);
err = mlx4_reset_master(dev);
+   }
 
if (!err) {
mlx4_err(dev, "device was reset successfully\n");
diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
new file mode 100644
index 0000000..4d5524d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx4.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define HEALTH_BUFFER_SIZE 0x40
+#define CR_ENABLE_BIT  swab32(BIT(6))
+#define CR_ENABLE_BIT_OFFSET   0xF3F04
+#define MAX_NUM_OF_DUMPS_TO_STORE  (8)
+
+static const char *region_cr_space_str = "cr-space";
+static const char *region_fw_health_str = "fw-health";
+
+/* Set to true in case cr enable bit was set to true before crdump */
+static bool crdump_enbale_bit_set;
+
+static void crdump_enable_crspace_access(struct mlx4_dev *dev,
+u8 __iomem *cr_space)
+{
+   /* Get current enable bit value */
+   crdump_enbale_bit_set =
+   readl(cr_space + CR_ENABLE_BIT_OFFSET) & CR_ENABLE_BIT;
+
+   /* Enable FW CR filter (set bit6 to 0) */
+   if (crdump_enbale_bit_set)
+   writel(readl(cr_space + CR_ENABLE_BIT_OFFSET) &

[PATCH net-next v2 01/11] devlink: Add support for creating and destroying regions

2018-07-11 Thread Alex Vesker
This allows a device to register its supported address regions.
Each address region can be accessed directly, for example by reading
the snapshots taken of this address space.
Drivers are not limited in the name selection for different regions.
An example of a region-name can be: pci cr-space, register-space.
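
The registration model can be illustrated with the user-space sketch
below: regions live on a per-device list, are looked up by name, and a
second region with the same name is rejected. All names here are
hypothetical, a singly linked list replaces the kernel list helpers,
and the error is returned through an out parameter purely to keep the
example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct region {
	struct region *next;
	const char *name;
	uint32_t max_snapshots;
	uint64_t size;
};

struct device {
	struct region *regions;
};

static struct region *region_get_by_name(struct device *dev, const char *name)
{
	for (struct region *r = dev->regions; r; r = r->next)
		if (!strcmp(r->name, name))
			return r;
	return NULL;
}

/* Register a new region; region names must be unique per device. */
static struct region *region_create(struct device *dev, const char *name,
				    uint32_t max_snapshots, uint64_t size,
				    int *err)
{
	struct region *r;

	if (region_get_by_name(dev, name)) {
		*err = -EEXIST;
		return NULL;
	}

	r = calloc(1, sizeof(*r));
	if (!r) {
		*err = -ENOMEM;
		return NULL;
	}

	r->name = name;
	r->max_snapshots = max_snapshots;
	r->size = size;
	r->next = dev->regions;
	dev->regions = r;
	*err = 0;
	return r;
}

/* Unlink the region from its device and free it. */
static void region_destroy(struct device *dev, struct region *r)
{
	for (struct region **pp = &dev->regions; *pp; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			break;
		}
	}
	free(r);
}
```

Keying regions by name is what lets the user tool address a region as
DEV/REGION without knowing anything else about the driver.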

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 22 ++
 net/core/devlink.c| 84 +++
 2 files changed, 106 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f67c29c..e539765 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -28,6 +28,7 @@ struct devlink {
struct list_head dpipe_table_list;
struct list_head resource_list;
struct list_head param_list;
+   struct list_head region_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -397,6 +398,8 @@ enum devlink_param_generic_id {
.validate = _validate,  \
 }
 
+struct devlink_region;
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -543,6 +546,11 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
 int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
   union devlink_param_value init_val);
 void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size);
+void devlink_region_destroy(struct devlink_region *region);
 
 #else
 
@@ -770,6 +778,20 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+ const char *region_name,
+ u32 region_max_snapshots,
+ u64 region_size)
+{
+   return NULL;
+}
+
+static inline void
+devlink_region_destroy(struct devlink_region *region)
+{
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 470f3db..cac8561 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -326,6 +326,28 @@ static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb,
  pool_type, p_tc_index);
 }
 
+struct devlink_region {
+   struct devlink *devlink;
+   struct list_head list;
+   const char *name;
+   struct list_head snapshot_list;
+   u32 max_snapshots;
+   u32 cur_snapshots;
+   u64 size;
+};
+
+static struct devlink_region *
+devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
+{
+   struct devlink_region *region;
+
+   list_for_each_entry(region, &devlink->region_list, list)
+   if (!strcmp(region->name, region_name))
+   return region;
+
+   return NULL;
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SB BIT(2)
@@ -3358,6 +3380,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
INIT_LIST_HEAD(&devlink->resource_list);
INIT_LIST_HEAD(&devlink->param_list);
+   INIT_LIST_HEAD(&devlink->region_list);
mutex_init(&devlink->lock);
return devlink;
 }
@@ -4109,6 +4132,67 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id)
 }
 EXPORT_SYMBOL_GPL(devlink_param_value_changed);
 
+/**
+ * devlink_region_create - create a new address region
+ *
+ * @devlink: devlink
+ * @region_name: region name
+ * @region_max_snapshots: Maximum supported number of snapshots for region
+ * @region_size: size of region
+ */
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size)
+{
+   struct devlink_region *region;
+   int err = 0;
+
+   mutex_lock(&devlink->lock);
+
+   if (devlink_region_get_by_name(devlink, region_name)) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   region->devlink = devlink;
+   region->max_snapshots = region_max_snapshots;
+   

[PATCH net-next v2 00/11] devlink: Add support for region access

2018-07-11 Thread Alex Vesker
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions. 
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region, the snapshot can
later be exposed using the devlink region read or dump commands.
This allows the snapshots to be analyzed at a later time.

The major benefit of this support is not only to provide access to
internal address regions which were inaccessible to the user but also
to provide an additional way to debug complex error states using the
region snapshots.

Implemented commands:
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH

Show all of the exposed regions with region sizes:
$ devlink region show
pci/:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/:00:05.0/fw-health: size 64 snapshot [1 2]

Delete a snapshot using:
$ devlink region del pci/:00:05.0/cr-space snapshot 1

Dump a snapshot:
$ devlink region dump pci/:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/:00:05.0/fw-health snapshot 1 address 0 length 16
 0014 95dc 0014 9514 0035 1670 0034 db30

For more information, see the devlink-region.8 man page.

Future:
There is a plan to extend the support to include a write command,
as well as read and dump of live regions.

v1->v2:
-Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
 double allocation

Alex Vesker (11):
  devlink: Add support for creating and destroying regions
  devlink: Add callback to query for snapshot id before snapshot create
  devlink: Add support for creating region snapshots
  devlink: Add support for region get command
  devlink: Extend the support querying for region snapshot IDs
  devlink: Add support for region snapshot delete command
  devlink: Add support for region snapshot read command
  net/mlx4_core: Add health buffer address capability
  net/mlx4_core: Add Crdump FW snapshot support
  devlink: Add generic parameters region_snapshot
  net/mlx4_core: Use devlink region_snapshot parameter

 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 239 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c |   5 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  52 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   8 +
 include/net/devlink.h   |  47 ++
 include/uapi/linux/devlink.h|  18 +
 net/core/devlink.c  | 647 
 11 files changed, 1024 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

-- 
1.8.3.1
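
The snapshot bookkeeping described in this cover letter (a driver-set maximum number of snapshots per region, numeric snapshot IDs, deletion by ID, and a destructor so the driver can hand over its buffer without a second allocation) can be sketched in userspace C as below. This is an illustration only, not the devlink implementation: the struct layout, the function names and the error codes are assumptions chosen for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical destructor type: releases a buffer owned by a snapshot. */
typedef void (*snapshot_data_dest_fn)(void *data);

struct snapshot {
    struct snapshot *next;
    uint32_t id;
    uint64_t data_len;
    uint8_t *data;              /* owned by the snapshot once created */
    snapshot_data_dest_fn dest; /* called when the snapshot is deleted */
};

struct region {
    struct snapshot *snapshots; /* singly linked list, newest first */
    uint32_t max_snapshots;     /* limit chosen by the driver */
    uint32_t cur_snapshots;
};

static struct snapshot *region_snapshot_find(struct region *r, uint32_t id)
{
    for (struct snapshot *s = r->snapshots; s; s = s->next)
        if (s->id == id)
            return s;
    return NULL;
}

/* Takes ownership of data on success; the caller keeps it on failure. */
static int region_snapshot_create(struct region *r, uint64_t data_len,
                                  uint8_t *data, uint32_t id,
                                  snapshot_data_dest_fn dest)
{
    struct snapshot *s;

    if (r->cur_snapshots == r->max_snapshots)
        return -ENOMEM;         /* region is full */
    if (region_snapshot_find(r, id))
        return -EEXIST;         /* ID already used on this region */

    s = calloc(1, sizeof(*s));
    if (!s)
        return -ENOMEM;

    s->id = id;
    s->data_len = data_len;
    s->data = data;
    s->dest = dest;
    s->next = r->snapshots;
    r->snapshots = s;
    r->cur_snapshots++;
    return 0;
}

/* Unlinks the snapshot with the given ID and frees its buffer. */
static int region_snapshot_del(struct region *r, uint32_t id)
{
    for (struct snapshot **pp = &r->snapshots; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct snapshot *s = *pp;

            *pp = s->next;
            r->cur_snapshots--;
            s->dest(s->data);
            free(s);
            return 0;
        }
    }
    return -EINVAL;
}
```

Passing the destructor along with the buffer means the core never copies the data; it only records the pointer and releases it through the callback when the snapshot is deleted.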



[PATCH net-next v2 10/11] devlink: Add generic parameters region_snapshot

2018-07-11 Thread Alex Vesker
region_snapshot - When set enables capturing region snapshots

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 include/net/devlink.h | 4 
 net/core/devlink.c| 5 +
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 905f0bb..b9b89d6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -361,6 +361,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -376,6 +377,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME "enable_sriov"
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME "region_snapshot_enable"
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e5118db..65fc366 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2671,6 +2671,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, 
struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME,
.type = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   .name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME,
+   .type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



Re: [PATCH net-next 0/9] devlink: Add support for region access

2018-04-04 Thread Alex Vesker



On 3/31/2018 8:21 PM, David Ahern wrote:

On 3/31/18 9:53 AM, Andrew Lunn wrote:

I want to be able to log in to a customer system and access this snapshot
without any previous configuration from the user and not asking for
enabling the feature and then waiting for a repro...this will help
debugging issues that are hard to reproduce, I don't see any reason
to disable this.

The likely reality is 99.9% of these snapshots will never be seen or
used. But they take up memory sitting there doing nothing. And if the
snapshot is 2GB, that is a lot of memory. I expect a system admin
wants to be able to choose to enable this feature or not, because of
that memory. You should also consider implementing the memory pressure
callbacks, so you can discard snapshots, rather than OOM the machine.


That is exactly my point. Nobody wants one rogue device triggering
snapshots, consuming system resources and with no options to disable it.



OK, currently there is a task to support persistent/permanent configuration
to devlink. Once this support is in I will add my code on top of it.
This will allow a user to enable the snapshot functionality on the driver.
Regarding the double continuous allocation of memory, I will fix to a single
vmalloc on the driver in case of adding a snapshot. Tell me what you think.




Re: [PATCH net-next v2 1/2] fs/crashdd: add API to collect hardware dump in second kernel

2018-04-02 Thread Alex Vesker



On 4/2/2018 12:12 PM, Jiri Pirko wrote:

Fri, Mar 30, 2018 at 05:11:29PM CEST, and...@lunn.ch wrote:

Please see:
http://patchwork.ozlabs.org/project/netdev/list/?series=36524

I believe that the solution in the patchset could be used for
your usecase too.

Hi Jiri

https://lkml.org/lkml/2018/3/20/436

How well does this API work for a 2Gbyte snapshot?

Ccing Alex who did the tests.


I didn't check the performance for such a large snapshot.
From my measurement it takes 0.09s for 1 MB of data, which means
about 3 minutes for a 2 GB snapshot.
This can be tuned and improved since this is a socket application.
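
The estimate above is a linear extrapolation from the measured rate. A small C sketch of the arithmetic follows, assuming the rate stays constant at 0.09 s (90 ms) per MB and taking 2 GB as 2048 MB; dump_time_ms() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Linear extrapolation of the dump time, in milliseconds. */
static uint64_t dump_time_ms(uint64_t size_mb, uint64_t ms_per_mb)
{
    return size_mb * ms_per_mb;
}
```

For 2048 MB at 90 ms per MB this gives 184320 ms, which is roughly 184 seconds, or about 3 minutes.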


Andrew




Re: [PATCH net-next 0/9] devlink: Add support for region access

2018-03-31 Thread Alex Vesker



On 3/31/2018 1:26 AM, David Ahern wrote:

On 3/30/18 1:39 PM, Alex Vesker wrote:


On 3/30/2018 7:57 PM, David Ahern wrote:

On 3/30/18 8:34 AM, Andrew Lunn wrote:

And it seems to want contiguous pages. How well does that work after
the system has been running for a while and memory is fragmented?

The allocation can be changed; there is no real need for contiguous
pages.
It is important to note that the number of snapshots is limited by the
driver; this can be based on the dump size or expected frequency of
collection.
I also prefer not to pre-allocate this memory.

The driver code also asks for a 1MB contiguous chunk of memory!  You
really should think about this API, how can you avoid double memory
allocations. And can kvmalloc be used. But then you get into the
problem for DMA'ing the memory from the device...

This API also does not scale. 1MB is actually quite small. I'm sure
there is firmware running on CPUs with a lot more than 1MB of RAM.
How well does this API work with 64MB? Say i wanted to snapshot my
GPU? Or the MC/BMC?


That and the drivers control the number of snapshots. The user should be
able to control the number of snapshots, and an option to remove all
snapshots to free up that memory.

There is an option to free up this memory, using a delete command.
The reason I added the option to control the number of snapshots from
the driver side only is because the driver knows the size of the snapshots
and when/why they will be taken.
For example in our mlx4 driver the snapshots are taken on rare failures,
the snapshot is quite large and from past analyses the first dump is
usually
the important one, this means that 8 is more than enough in my case.
If a user wants more than that, they can always monitor notifications, read
the snapshot and delete it once it is backed up; there is no reason for
keeping all of this data in the kernel.



I was thinking less. ie., a user says keep only 1 or 2 snapshots or
disable snapshots altogether.

Devlink configuration is not persistent if the driver is reloaded, currently
there is no way to sync this. One or two might not be enough time to
read, delete and make room for the next one, as I said each driver should
do its calculations here based on frequency, size and even the time it takes
capturing it. The user can't know if one snapshot is enough for debug
I saw cases in which debug requires more than one snapshot to make
sure a health clock is incremented and the FW is alive.

I want to be able to log in to a customer system and access this snapshot
without any previous configuration from the user and not asking for
enabling the feature and then waiting for a repro...this will help
debugging issues that are hard to reproduce, I don't see any reason
to disable this.




Re: [PATCH net-next 0/9] devlink: Add support for region access

2018-03-30 Thread Alex Vesker



On 3/30/2018 7:57 PM, David Ahern wrote:

On 3/30/18 8:34 AM, Andrew Lunn wrote:

And it seems to want contiguous pages. How well does that work after
the system has been running for a while and memory is fragmented?

The allocation can be changed; there is no real need for contiguous pages.
It is important to note that the number of snapshots is limited by the
driver; this can be based on the dump size or expected frequency of
collection.
I also prefer not to pre-allocate this memory.

The driver code also asks for a 1MB contiguous chunk of memory!  You
really should think about this API, how can you avoid double memory
allocations. And can kvmalloc be used. But then you get into the
problem for DMA'ing the memory from the device...

This API also does not scale. 1MB is actually quite small. I'm sure
there is firmware running on CPUs with a lot more than 1MB of RAM.
How well does this API work with 64MB? Say i wanted to snapshot my
GPU? Or the MC/BMC?


That and the drivers control the number of snapshots. The user should be
able to control the number of snapshots, and an option to remove all
snapshots to free up that memory.


There is an option to free up this memory, using a delete command.
The reason I added the option to control the number of snapshots from
the driver side only is because the driver knows the size of the snapshots
and when/why they will be taken.
For example in our mlx4 driver the snapshots are taken on rare failures,
the snapshot is quite large and from past analyses the first dump is usually
the important one, this means that 8 is more than enough in my case.
If a user wants more than that, they can always monitor notifications, read
the snapshot and delete it once it is backed up; there is no reason for
keeping all of this data in the kernel.




Re: [PATCH net-next 0/9] devlink: Add support for region access

2018-03-29 Thread Alex Vesker



On 3/29/2018 10:51 PM, Andrew Lunn wrote:

Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]

So you have 2Mbytes of snapshot data. Is this held in the device, or
kernel memory?

This is allocated in devlink, the maximum number of snapshots is set by the
driver.

And it seems to want contiguous pages. How well does that work after
the system has been running for a while and memory is fragmented?


The allocation can be changed; there is no real need for contiguous pages.
It is important to note that the number of snapshots is limited by the
driver; this can be based on the dump size or expected frequency of
collection.
I also prefer not to pre-allocate this memory.

Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
 0014 95dc 0014 9514 0035 1670 0034 db30

Why a separate command? It seems to be just a subset of dump.

This is useful when debugging values on specific addresses, this also
brings the API one step closer for a read and write API.

The functionality is useful, yes. But why two commands? Why not one
command, dump, which takes optional parameters?


Dump in devlink means provide all the data; saying dump address x length y
sounds confusing. Do you see this as a critical issue?


Also, i doubt write support will be accepted. That sounds like the
start of an API to allow a user space driver.


If this will be an issue we will stay with read access only.



   Andrew




Re: [PATCH net-next 0/9] devlink: Add support for region access

2018-03-29 Thread Alex Vesker



On 3/29/2018 8:13 PM, Andrew Lunn wrote:

On Thu, Mar 29, 2018 at 07:07:43PM +0300, Alex Vesker wrote:

This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions.
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots to be
done.

Hi Alex

So the device is in charge of making a snapshot? A user cannot
initiate it?

Hi,
Correct, currently the user cannot initiate saving a snapshot, but
as I said in the cover letter, support for dumping "live" regions is planned.



Seems like if i'm trying to debug something, i want to take a snapshot
in the good state, issue the command which breaks things, and then
take another snapshot. Looking at the diff then gives me an idea what
happened.


Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]

So you have 2Mbytes of snapshot data. Is this held in the device, or
kernel memory?
This is allocated in devlink; the maximum number of snapshots is set by
the driver.



Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
 0014 95dc 0014 9514 0035 1670 0034 db30

Why a separate command? It seems to be just a subset of dump.


This is useful when debugging values on specific addresses, this also
brings the API one step closer for a read and write API.



 Andrew




[PATCH net-next 1/9] devlink: Add support for creating and destroying regions

2018-03-29 Thread Alex Vesker
This allows a device to register its supported address regions.
Each address region can be accessed directly for example reading
the snapshots taken of this address space.
Drivers are not limited in the name selection for different regions.
An example of a region-name can be: pci cr-space, register-space.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/devlink.h | 22 ++
 net/core/devlink.c| 84 +++
 2 files changed, 106 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index e21d8ca..784a33c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -28,6 +28,7 @@ struct devlink {
struct list_head dpipe_table_list;
struct list_head resource_list;
struct devlink_dpipe_headers *dpipe_headers;
+   struct list_head region_list;
const struct devlink_ops *ops;
struct device *dev;
possible_net_t _net;
@@ -294,6 +295,8 @@ struct devlink_resource {
 
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
+struct devlink_region;
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -419,6 +422,11 @@ int devlink_resource_size_get(struct devlink *devlink,
 int devlink_dpipe_table_resource_set(struct devlink *devlink,
 const char *table_name, u64 resource_id,
 u64 resource_units);
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size);
+void devlink_region_destroy(struct devlink_region *region);
 
 #else
 
@@ -589,6 +597,20 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
return -EOPNOTSUPP;
 }
 
+static inline struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+ const char *region_name,
+ u32 region_max_snapshots,
+ u64 region_size)
+{
+   return NULL;
+}
+
+static inline void
+devlink_region_destroy(struct devlink_region *region)
+{
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9236e42..fd5b9f6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -326,6 +326,28 @@ static int devlink_sb_pool_index_get_from_info(struct 
devlink_sb *devlink_sb,
  pool_type, p_tc_index);
 }
 
+struct devlink_region {
+   struct devlink *devlink;
+   struct list_head list;
+   const char *name;
+   struct list_head snapshot_list;
+   u32 max_snapshots;
+   u32 cur_snapshots;
+   u64 size;
+};
+
+static struct devlink_region *
+devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
+{
+   struct devlink_region *region;
+
+   list_for_each_entry(region, &devlink->region_list, list)
+   if (!strcmp(region->name, region_name))
+   return region;
+
+   return NULL;
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SBBIT(2)
@@ -2820,6 +2842,7 @@ struct devlink *devlink_alloc(const struct devlink_ops 
*ops, size_t priv_size)
INIT_LIST_HEAD(&devlink->sb_list);
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
INIT_LIST_HEAD(&devlink->resource_list);
+   INIT_LIST_HEAD(&devlink->region_list);
mutex_init(&devlink->lock);
return devlink;
 }
@@ -3315,6 +3338,67 @@ int devlink_dpipe_table_resource_set(struct devlink 
*devlink,
 }
 EXPORT_SYMBOL_GPL(devlink_dpipe_table_resource_set);
 
+/**
+ * devlink_region_create - create a new address region
+ *
+ * @devlink: devlink
+ * @region_name: region name
+ * @region_max_snapshots: Maximum supported number of snapshots for region
+ * @region_size: size of region
+ */
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size)
+{
+   struct devlink_region *region;
+   int err = 0;
+
+   mutex_lock(&devlink->lock);
+
+   if (devlink_region_get_by_name(devlink, region_name)) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   region->devlink = devlink;
+   region->max_snapshots = region_max_snapshots;
+   region->name = region_name;
+   region->size = re

[PATCH net-next 2/9] devlink: Add callback to query for snapshot id before snapshot create

2018-03-29 Thread Alex Vesker
To restrict the driver with the snapshot ID selection a new callback
is introduced for the driver to get the snapshot ID before creating
a new snapshot. This will also allow giving the same ID for multiple
snapshots taken of different regions at the same time.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/devlink.h |  8 
 net/core/devlink.c| 21 +
 2 files changed, 29 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 784a33c..5697c55 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -29,6 +29,7 @@ struct devlink {
struct list_head resource_list;
struct devlink_dpipe_headers *dpipe_headers;
struct list_head region_list;
+   u32 snapshot_id;
const struct devlink_ops *ops;
struct device *dev;
possible_net_t _net;
@@ -427,6 +428,7 @@ struct devlink_region *devlink_region_create(struct devlink 
*devlink,
 u32 region_max_snapshots,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
+u32 devlink_region_shapshot_id_get(struct devlink *devlink);
 
 #else
 
@@ -611,6 +613,12 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline u32
+devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fd5b9f6..4822a08 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3399,6 +3399,27 @@ void devlink_region_destroy(struct devlink_region 
*region)
 }
 EXPORT_SYMBOL_GPL(devlink_region_destroy);
 
+/**
+ * devlink_region_shapshot_id_get - get snapshot ID
+ *
+ * This callback should be called when adding a new snapshot,
+ * Driver should use the same id for multiple snapshots taken
+ * on multiple regions at the same time/by the same trigger.
+ *
+ * @devlink: devlink
+ */
+u32 devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   u32 id;
+
+   mutex_lock(&devlink->lock);
+   id = ++devlink->snapshot_id;
+   mutex_unlock(&devlink->lock);
+
+   return id;
+}
+EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
+
 static int __init devlink_module_init(void)
 {
return genl_register_family(&devlink_nl_family);
-- 
1.8.3.1
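
The ID allocation in this patch is a counter incremented under the devlink lock. A userspace sketch of the same idea is shown below, using a pthread mutex in place of the kernel mutex; struct dev_model and snapshot_id_get() are hypothetical names for illustration only. A driver that snapshots several regions for one trigger would fetch the ID once and reuse it for each region.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state that owns the counter. */
struct dev_model {
    pthread_mutex_t lock;
    uint32_t snapshot_id;   /* last ID handed out; 0 means none yet */
};

/* Returns a new snapshot ID; IDs start at 1 and grow by one per call. */
static uint32_t snapshot_id_get(struct dev_model *dev)
{
    uint32_t id;

    pthread_mutex_lock(&dev->lock);
    id = ++dev->snapshot_id;
    pthread_mutex_unlock(&dev->lock);

    return id;
}
```

Because the counter is pre-incremented, the first snapshot gets ID 1, which matches the "snapshot [1 2]" listings shown by devlink region show in the cover letter.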



[PATCH net-next 7/9] devlink: Add support for region snapshot read command

2018-03-29 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_READ_GET used for both reading
and dumping region data. Read allows reading from a region specific
address for given length. Dump allows reading the full region.
If only snapshot ID is provided a snapshot dump will be done.
If snapshot ID, Address and Length are provided a snapshot read
will be done.

This is used for both snapshot access and will be used in the same
way to access current data on the region.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/devlink.h |   7 ++
 net/core/devlink.c   | 182 +++
 2 files changed, 189 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 8662a03..e9e94dd 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -82,6 +82,7 @@ enum devlink_command {
DEVLINK_CMD_REGION_SET,
DEVLINK_CMD_REGION_NEW,
DEVLINK_CMD_REGION_DEL,
+   DEVLINK_CMD_REGION_READ,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
@@ -235,6 +236,12 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
+   DEVLINK_ATTR_REGION_CHUNKS, /* nested */
+   DEVLINK_ATTR_REGION_CHUNK,  /* nested */
+   DEVLINK_ATTR_REGION_CHUNK_DATA, /* binary */
+   DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
+   DEVLINK_ATTR_REGION_CHUNK_LEN,  /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f5c90a8..101c6ef 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2869,6 +2869,181 @@ static int devlink_nl_cmd_region_del(struct sk_buff 
*skb,
return 0;
 }
 
+static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
+struct devlink *devlink,
+u8 *chunk, u32 chunk_size,
+u64 addr)
+{
+   struct nlattr *chunk_attr;
+   int err;
+
+   chunk_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_CHUNK);
+   if (!chunk_attr)
+   return -EINVAL;
+
+   err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, chunk_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, chunk_attr);
+   return err;
+}
+
+#define DEVLINK_REGION_READ_CHUNK_SIZE 256
+
+static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
+   struct devlink *devlink,
+   struct devlink_region *region,
+   struct nlattr **attrs,
+   u64 start_offset,
+   u64 end_offset,
+   bool dump,
+   u64 *new_offset)
+{
+   struct devlink_snapshot *snapshot;
+   u64 curr_offset = start_offset;
+   u32 snapshot_id;
+   int err = 0;
+
+   *new_offset = start_offset;
+
+   snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   if (end_offset > snapshot->data_len || dump)
+   end_offset = snapshot->data_len;
+
+   while (curr_offset < end_offset) {
+   u32 data_size;
+   u8 *data;
+
+   if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
+   data_size = end_offset - curr_offset;
+   else
+   data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+
+   data = &snapshot->data[curr_offset];
+   err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
+   data, data_size,
+   curr_offset);
+   if (err)
+   break;
+
+   curr_offset += data_size;
+   }
+   *new_offset = curr_offset;
+
+   return err;
+}
+
+static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
+struct netlink_callback *cb)
+{
+   u64 ret_offset, start_offset, end_offset = 0;
+   struct nlattr *attrs[DEVLINK_ATTR_MAX + 1];
+   const s

[PATCH net-next 5/9] devlink: Extend the support querying for region snapshot IDs

2018-03-29 Thread Alex Vesker
Extend the support for DEVLINK_CMD_REGION_GET command to also
return the IDs of the snapshots currently present on the region.
Each reply will include a nested snapshots attribute that
can contain multiple snapshot attributes each with an ID.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/devlink.h |  3 +++
 net/core/devlink.c   | 53 
 2 files changed, 56 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 8d24f49..786185a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -229,6 +229,9 @@ enum devlink_attr {
 
DEVLINK_ATTR_REGION_NAME,   /* string */
DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+   DEVLINK_ATTR_REGION_SNAPSHOTS,  /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 20d243d..915bb33 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2630,6 +2630,55 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, 
struct genl_info *info)
return devlink->ops->reload(devlink);
 }
 
+static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg,
+struct devlink *devlink,
+struct devlink_snapshot *snapshot)
+{
+   struct nlattr *snap_attr;
+   int err;
+
+   snap_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOT);
+   if (!snap_attr)
+   return -EINVAL;
+
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, snap_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snap_attr);
+   return err;
+}
+
+static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg,
+ struct devlink *devlink,
+ struct devlink_region *region)
+{
+   struct devlink_snapshot *snapshot;
+   struct nlattr *snapshots_attr;
+   int err;
+
+   snapshots_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOTS);
+   if (!snapshots_attr)
+   return -EINVAL;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list) {
+   err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot);
+   if (err)
+   goto nla_put_failure;
+   }
+
+   nla_nest_end(msg, snapshots_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snapshots_attr);
+   return err;
+}
+
 static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
  enum devlink_command cmd, u32 portid,
  u32 seq, int flags,
@@ -2656,6 +2705,10 @@ static int devlink_nl_region_fill(struct sk_buff *msg, 
struct devlink *devlink,
if (err)
goto nla_put_failure;
 
+   err = devlink_nl_region_snapshots_id_put(msg, devlink, region);
+   if (err)
+   goto nla_put_failure;
+
genlmsg_end(msg, hdr);
return 0;
 
-- 
1.8.3.1



[PATCH net-next 4/9] devlink: Add support for region get command

2018-03-29 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_GET command which is used for
querying for the supported DEV/REGION values of devlink devices.
The support is both for doit and dumpit.

Reply includes:
  BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/devlink.h |   6 +++
 net/core/devlink.c   | 114 +++
 2 files changed, 120 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1df65a4..8d24f49 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -78,6 +78,9 @@ enum devlink_command {
 */
DEVLINK_CMD_RELOAD,
 
+   DEVLINK_CMD_REGION_GET,
+   DEVLINK_CMD_REGION_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -224,6 +227,9 @@ enum devlink_attr {
DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID,   /* u64 */
DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_UNITS,/* u64 */
 
+   DEVLINK_ATTR_REGION_NAME,   /* string */
+   DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 785e87d..20d243d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2630,6 +2630,111 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, 
struct genl_info *info)
return devlink->ops->reload(devlink);
 }
 
+static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags,
+ struct devlink_region *region)
+{
+   void *hdr;
+   int err;
+
+   hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+nla_put_failure:
+   genlmsg_cancel(msg, hdr);
+   return err;
+}
+
+static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_region *region;
+   const char *region_name;
+   struct sk_buff *msg;
+   int err;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET,
+info->snd_portid, info->snd_seq, 0,
+region);
+   if (err) {
+   nlmsg_free(msg);
+   return err;
+   }
+
+   return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
+   struct netlink_callback *cb)
+{
+   struct devlink_region *region;
+   struct devlink *devlink;
+   int start = cb->args[0];
+   int idx = 0;
+   int err;
+
+   mutex_lock(&devlink_mutex);
+   list_for_each_entry(devlink, &devlink_list, list) {
+   if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+   continue;
+
+   mutex_lock(&devlink->lock);
+   list_for_each_entry(region, &devlink->region_list, list) {
+   if (idx < start) {
+   idx++;
+   continue;
+   }
+   err = devlink_nl_region_fill(msg, devlink,
+DEVLINK_CMD_REGION_GET,
+NETLINK_CB(cb->skb).portid,
+cb->nlh->nlmsg_seq,
+NLM_F_MULTI, region);
+   if (err) {
+   mutex_unlock(&devlink->lock);
+   goto out;
+   }
+

[PATCH net-next 3/9] devlink: Add support for creating region snapshots

2018-03-29 Thread Alex Vesker
Each device address region can store multiple snapshots,
each snapshot is identified using a different numerical ID.
This ID is used when deleting a snapshot or showing an address
region specific snapshot. This patch exposes a callback to add
a new snapshot (data, data length and ID) to an address region.
Snapshots can be deleted from the devlink user tool or when
destroying a region.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/devlink.h |  9 +
 net/core/devlink.c| 99 +++
 2 files changed, 108 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5697c55..83e569f 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -429,6 +429,8 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_shapshot_id_get(struct devlink *devlink);
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id);
 
 #else
 
@@ -619,6 +621,13 @@ static inline bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
return 0;
 }
 
+static inline int
+devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4822a08..785e87d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -336,6 +336,14 @@ struct devlink_region {
u64 size;
 };
 
+struct devlink_snapshot {
+   struct list_head list;
+   struct devlink_region *region;
+   u64 data_len;
+   u8 *data;
+   u32 id;
+};
+
 static struct devlink_region *
 devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 {
@@ -348,6 +356,26 @@ struct devlink_region {
return NULL;
 }
 
+static struct devlink_snapshot *
+devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
+{
+   struct devlink_snapshot *snapshot;
+
+   list_for_each_entry(snapshot, &region->snapshot_list, list)
+   if (snapshot->id == id)
+   return snapshot;
+
+   return NULL;
+}
+
+static void devlink_region_snapshot_del(struct devlink_snapshot *snapshot)
+{
+   snapshot->region->cur_snapshots--;
+   list_del(&snapshot->list);
+   kfree(snapshot->data);
+   kfree(snapshot);
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SB BIT(2)
@@ -3391,8 +3419,14 @@ struct devlink_region *devlink_region_create(struct devlink *devlink,
 void devlink_region_destroy(struct devlink_region *region)
 {
struct devlink *devlink = region->devlink;
+   struct devlink_snapshot *snapshot, *ts;
 
mutex_lock(&devlink->lock);
+
+   /* Free all snapshots of region */
+   list_for_each_entry_safe(snapshot, ts, &region->snapshot_list, list)
+   devlink_region_snapshot_del(snapshot);
+
list_del(&region->list);
mutex_unlock(&devlink->lock);
kfree(region);
@@ -3420,6 +3454,71 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
 
+/**
+ * devlink_region_snapshot_create - create a new snapshot
+ * This will add a new snapshot of a region. The snapshot
+ * will be stored on the region struct and can be accessed
+ * from devlink. This is useful for future analyses of snapshots.
+ * Multiple snapshots can be created on a region.
+ * The @snapshot_id should be obtained using the getter function.
+ *
+ * @devlink_region: devlink region of the snapshot
+ * @data_len: size of snapshot data
+ * @data: snapshot data
+ * @snapshot_id: snapshot id to be created
+ */
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id)
+{
+   struct devlink *devlink = region->devlink;
+   struct devlink_snapshot *snapshot;
+   int err;
+
+   mutex_lock(&devlink->lock);
+
+   /* check if region can hold one more snapshot */
+   if (region->cur_snapshots == region->max_snapshots) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   if (devlink_region_snapshot_get_by_id(region, snapshot_id)) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL);
+   if (!snapshot) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   snapshot->data = kzalloc(data_len, GFP_KERNEL);
+   if (!snapshot->data) {
+   err = -ENOMEM;
+

[PATCH net-next 8/9] net/mlx4_core: Add health buffer address capability

2018-03-29 Thread Alex Vesker
Health buffer address is a 32 bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data
located in the shared CR space. The CR space is accessible to both the
driver and the FW and allows for different queries and configurations.
Health buffer size is always 64B of readable data followed by a
lock which is used to block volatile CR space access.
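
As an illustration of how a 32-bit field such as this address is pulled out of the QUERY_DEV_CAP response, here is a minimal userspace sketch. It assumes the response buffer stores fields in big-endian byte order; `get_be32_field` is a hypothetical helper and is not the driver's `MLX4_GET` macro. Only the 0xe4 offset comes from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset taken from the patch; the surrounding buffer layout is assumed. */
#define HEALTH_BUFFER_ADDRESS_OFFSET 0xe4

/* Read a big-endian 32-bit field at a byte offset of a response buffer. */
uint32_t get_be32_field(const uint8_t *buf, size_t buf_len, size_t offset)
{
    assert(offset + 4 <= buf_len);
    return ((uint32_t)buf[offset] << 24) |
           ((uint32_t)buf[offset + 1] << 16) |
           ((uint32_t)buf[offset + 2] << 8) |
           (uint32_t)buf[offset + 3];
}
```

Assembling the value byte by byte keeps the helper independent of host endianness and of buffer alignment.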

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 5 -
 drivers/net/ethernet/mellanox/mlx4/fw.h   | 1 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 include/linux/mlx4/device.h   | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 634f603..4bb266e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -823,7 +823,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
-
+#define QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET 0xe4
 
dev_cap->flags2 = 0;
mailbox = mlx4_alloc_cmd_mailbox(dev);
@@ -1078,6 +1078,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev_cap->rl_caps.min_unit = size >> 14;
}
 
+   MLX4_GET(dev_cap->health_buffer_addrs, outbox,
+QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET);
+
MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
if (field32 & (1 << 16))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index cd6399c..650ae08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -128,6 +128,7 @@ struct mlx4_dev_cap {
u32 dmfs_high_rate_qpn_base;
u32 dmfs_high_rate_qpn_range;
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 100ded5..acc6ccc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -427,6 +427,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev->caps.max_rss_tbl_sz = dev_cap->max_rss_tbl_sz;
dev->caps.wol_port[1]  = dev_cap->wol_port[1];
dev->caps.wol_port[2]  = dev_cap->wol_port[2];
+   dev->caps.health_buffer_addrs  = dev_cap->health_buffer_addrs;
 
/* Save uar page shift */
if (!mlx4_is_slave(dev)) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index b2423ba..1e4b0f1 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -633,6 +633,7 @@ struct mlx4_caps {
u32 vf_caps;
bool wol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
 };
 
 struct mlx4_buf_list {
-- 
1.8.3.1



[PATCH net-next 9/9] net/mlx4_core: Add Crdump FW snapshot support

2018-03-29 Thread Alex Vesker
Crdump allows the driver to create a snapshot of the FW PCI
crspace and health buffer during a critical FW issue.
In case of a FW command timeout, FW getting stuck or a non zero
value on the catastrophic buffer, a snapshot will be taken.

The snapshot is exposed using devlink: the cr-space and fw-health
address regions are registered on init, and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 224 
 drivers/net/ethernet/mellanox/mlx4/main.c   |  10 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   6 +
 6 files changed, 248 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx4/Makefile b/drivers/net/ethernet/mellanox/mlx4/Makefile
index 16b10d0..3f40077 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx4/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o
 
 mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o fw_qos.o icm.o intf.o \
main.o mcg.o mr.o pd.o port.o profile.o qp.o reset.o sense.o \
-   srq.o resource_tracker.o
+   srq.o resource_tracker.o crdump.o
 
 obj-$(CONFIG_MLX4_EN)   += mlx4_en.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index e2b6b0c..e9fdf14 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -178,10 +178,12 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist)
 
dev = persist->dev;
mlx4_err(dev, "device is going to be reset\n");
-   if (mlx4_is_slave(dev))
+   if (mlx4_is_slave(dev)) {
err = mlx4_reset_slave(dev);
-   else
+   } else {
+   mlx4_crdump_collect(dev);
err = mlx4_reset_master(dev);
+   }
 
if (!err) {
mlx4_err(dev, "device was reset successfully\n");
diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c b/drivers/net/ethernet/mellanox/mlx4/crdump.c
new file mode 100644
index 0000000..677d2d9
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -0,0 +1,224 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx4.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define HEALTH_BUFFER_SIZE 0x40
+#define CR_ENABLE_BIT  swab32(BIT(6))
+#define CR_ENABLE_BIT_OFFSET   0xF3F04
+#define MAX_NUM_OF_DUMPS_TO_STORE  (8)
+
+const char *region_cr_space_str = "cr-space";
+const char *region_fw_health_str = "fw-health";
+
+/* Set to true in case cr enable bit was set to true before crdump */
+bool crdump_enbale_bit_set;
+
+static void crdump_enable_crspace_access(struct mlx4_dev *dev, u8 *cr_space)
+{
+   /* Get current enable bit value */
+   crdump_enbale_bit_set =
+   readl(cr_space + CR_ENABLE_BIT_OFFSET) & CR_ENABLE_BIT;
+
+   /* Enable FW CR filter (set bit6 to 0) */
+   if (crdump_enbale_bit_set)
+   writel(readl(cr_space + CR_ENABLE_BIT_OFFSET) & ~CR_ENABL

[PATCH net-next 0/9] devlink: Add support for region access

2018-03-29 Thread Alex Vesker
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions. 
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region it can be later
exposed using devlink region read or dump commands.
This functionality allows for future analyses on the snapshots to be
done.

The major benefit of this support is not only to provide access to
internal address regions which were inaccessible to the user but also
to provide an additional way to debug complex error states using the
region snapshots.

Implemented commands:
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
address ADDRESS length LENGTH

Show all of the exposed regions with region sizes:
$ devlink region show
pci/:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/:00:05.0/fw-health: size 64 snapshot [1 2]

Delete a snapshot using:
$ devlink region del pci/:00:05.0/cr-space snapshot 1

Dump a snapshot:
$ devlink region dump pci/:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/:00:05.0/fw-health snapshot 1 address 0 length 16
 0014 95dc 0014 9514 0035 1670 0034 db30

For more information, see the devlink-region.8 man page.
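
The dump and read output above prints each row as an offset followed by two-byte hex groups, 16 bytes per row. A minimal sketch of that row layout in C follows; `format_dump_row` is a hypothetical helper and not the iproute2 implementation, and the 16-digit zero-padded offset is an assumption, since the archived output does not preserve the original padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format up to 16 bytes as one dump row: a hex offset followed by
 * two-byte groups separated by spaces. Returns the length snprintf
 * reports, which exceeds out_len - 1 if the row was truncated.
 */
int format_dump_row(char *out, size_t out_len, uint64_t addr,
                    const uint8_t *data, size_t len)
{
    size_t i;
    int n;

    assert(len <= 16);
    n = snprintf(out, out_len, "%016llx", (unsigned long long)addr);
    for (i = 0; i < len && n >= 0 && (size_t)n < out_len; i++) {
        /* a space before every even byte starts a new two-byte group */
        n += snprintf(out + n, out_len - (size_t)n, "%s%02x",
                      (i % 2 == 0) ? " " : "", data[i]);
    }
    return n;
}
```

Calling it once per 16-byte chunk, with `addr` advanced by 16 each time, gives one row per line in the style of the dump shown above.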

Future:
There is a plan to extend the support to include a write command,
as well as read and dump of a live region.

Alex Vesker (9):
  devlink: Add support for creating and destroying regions
  devlink: Add callback to query for snapshot id before snapshot create
  devlink: Add support for creating region snapshots
  devlink: Add support for region get command
  devlink: Extend the support querying for region snapshot IDs
  devlink: Add support for region snapshot delete command
  devlink: Add support for region snapshot read command
  net/mlx4_core: Add health buffer address capability
  net/mlx4_core: Add Crdump FW snapshot support

 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 224 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c |   5 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  11 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   7 +
 include/net/devlink.h   |  39 ++
 include/uapi/linux/devlink.h|  18 +
 net/core/devlink.c  | 646 
 11 files changed, 958 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

-- 
1.8.3.1



[PATCH net-next 6/9] devlink: Add support for region snapshot delete command

2018-03-29 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_DEL used
for deleting a snapshot from a region. The snapshot ID is required.
Also added notification support for NEW and DEL of snapshots.

Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c   | 93 
 2 files changed, 95 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 786185a..8662a03 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -80,6 +80,8 @@ enum devlink_command {
 
DEVLINK_CMD_REGION_GET,
DEVLINK_CMD_REGION_SET,
+   DEVLINK_CMD_REGION_NEW,
+   DEVLINK_CMD_REGION_DEL,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 915bb33..f5c90a8 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2717,6 +2717,58 @@ static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
return err;
 }
 
+static void devlink_nl_region_notify(struct devlink_region *region,
+struct devlink_snapshot *snapshot,
+enum devlink_command cmd)
+{
+   struct devlink *devlink = region->devlink;
+   struct sk_buff *msg;
+   void *hdr;
+   int err;
+
+   WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL);
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return;
+
+   hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd);
+   if (!hdr)
+   goto out_free_msg;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto out_cancel_msg;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
+region->name);
+   if (err)
+   goto out_cancel_msg;
+
+   if (snapshot) {
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID,
+ snapshot->id);
+   if (err)
+   goto out_cancel_msg;
+   } else {
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size, DEVLINK_ATTR_PAD);
+   if (err)
+   goto out_cancel_msg;
+   }
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+   msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+
+   return;
+
+out_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+out_free_msg:
+   nlmsg_free(msg);
+}
+
 static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
  struct genl_info *info)
 {
@@ -2788,6 +2840,35 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
return msg->len;
 }
 
+static int devlink_nl_cmd_region_del(struct sk_buff *skb,
+struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_snapshot *snapshot;
+   struct devlink_region *region;
+   const char *region_name;
+   u32 snapshot_id;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME] ||
+   !info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
+   devlink_region_snapshot_del(snapshot);
+   return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -2809,6 +2890,7 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
[DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64},
[DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64},
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
+   [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -2999,6 +3081,13 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+   {
+   .cmd = DEVLINK_CMD_REGION_DEL,
+   .doit = devlink_nl_cmd_region_del,
+   .

Re: [for-next 08/12] net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev

2017-10-24 Thread Alex Vesker



On 10/23/2017 6:47 PM, Jason Gunthorpe wrote:

On Sat, Oct 14, 2017 at 11:48:23AM -0700, Saeed Mahameed wrote:

From: Alex Vesker <va...@mellanox.com>

This change is needed for PKEY support, since the RQs are shared
between the child interface and the parent. The parent is responsible
for NAPI and the processing of RX completions. Using the dqpn in the
completion descriptor we set the corresponding child IPoIB netdevice
on the SKB.
The mapping between the dqpn and the netdevice is done using a HT,
each mlx5 IPoIB interface registers its mapping on creation.

It seems really really weird to share the receive Q across all of the
children and do the sorting in software.. why is this done like this?

Wouldn't it be better to allow the children to progress concurrently,
potentially on multiple cores? They all have their own QPs after all..

Jason

The child interface RQs are kept at minimum size since they are not used.
The reason is to save resources: we want to be able to support many PKEYs
while allocating less memory and reusing HW resources such as the flow
steering table. This solution allowed us to save memory on allocating the
child RQs and to use the same HW flow steering table, TIR and RQT. Using
the hash table doesn't impact performance.

The parent has multiple RQs on different cores running in parallel.
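
To make the lookup concrete, here is a minimal userspace sketch of a QPN-to-netdev hash table of the kind described above. It is an illustration only, not the mlx5 code: all names (`qpn_table`, `qpn_table_add`, and so on) are hypothetical, the netdev is an opaque pointer, the table size and hash constant are arbitrary choices, and no locking or RCU is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QPN_HT_BITS 8
#define QPN_HT_SIZE (1u << QPN_HT_BITS)

struct qpn_node {
    struct qpn_node *next;
    uint32_t qpn;
    void *netdev;   /* opaque handle standing in for struct net_device */
};

struct qpn_table {
    struct qpn_node *buckets[QPN_HT_SIZE];
};

/* Multiplicative hash: keep the top QPN_HT_BITS bits of the product. */
static unsigned int qpn_hash(uint32_t qpn)
{
    return (uint32_t)(qpn * 2654435761u) >> (32 - QPN_HT_BITS);
}

/* Register a child interface's QPN on creation. */
int qpn_table_add(struct qpn_table *t, uint32_t qpn, void *netdev)
{
    struct qpn_node *n;
    unsigned int b = qpn_hash(qpn);

    assert(t != NULL);
    n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->qpn = qpn;
    n->netdev = netdev;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return 0;
}

/* RX path: map the dqpn from a completion to its netdev, or NULL. */
void *qpn_table_lookup(const struct qpn_table *t, uint32_t qpn)
{
    const struct qpn_node *n;

    for (n = t->buckets[qpn_hash(qpn)]; n; n = n->next)
        if (n->qpn == qpn)
            return n->netdev;
    return NULL;
}

/* Remove the mapping when the child interface goes away. */
void qpn_table_del(struct qpn_table *t, uint32_t qpn)
{
    struct qpn_node **pp;

    for (pp = &t->buckets[qpn_hash(qpn)]; *pp; pp = &(*pp)->next) {
        if ((*pp)->qpn == qpn) {
            struct qpn_node *n = *pp;

            *pp = n->next;
            free(n);
            return;
        }
    }
}
```

The per-packet cost is one hash and a short chain walk, which is the reason given above for the lookup not hurting performance.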