Re: [PATCH blktests 1/9] blktests: add hepler functions for new md tests

2018-06-06 Thread bingjingc

Hi Johannes & Jens,

Thank you for the information and the positive feedback on this patch.

I was inspired by xfstests as well. Most conventional filesystems
have their test cases in it, and I believe the block layer can have a
similar suite.

I'm sorry, this is the first time I realized there is a test suite in
the mdadm project. I am now looking for documentation to get it
working. I can then evaluate the effort of migrating those tests, or
of simply triggering them from blktests.

Any ideas from linux-block and linux-raid channels are welcome. :)

All the best,
BingJing

Johannes Thumshirn wrote on 2018-06-06 22:33:

On Wed, Jun 06, 2018 at 08:29:25AM -0600, Jens Axboe wrote:

Hopefully this can be the start of migrating over those tests!


Yes this would be great. I just wanted to connect the submitter and
the md developers and make them aware of possibly duplicated efforts
;-).




Re: [GIT PULL] Block changes for 4.18-rc

2018-06-06 Thread Jens Axboe
On 6/4/18 6:56 PM, Kent Overstreet wrote:
> On Mon, Jun 04, 2018 at 05:42:04PM -0700, Linus Torvalds wrote:
>> On Mon, Jun 4, 2018 at 12:04 PM Kent Overstreet
>>  wrote:
>>>
>>> However, that's not correct as is because mddev_delayed_put() calls
>>> kobject_put(), and the kobject isn't initialized when the mddev is first
>>> allocated, it's initialized when the gendisk is allocated... that isn't
>>> hard to fix but that's getting into real refactoring that I'll need to
>>> put actual work into testing.
>>
>> Well, it also removes the bioset_exit() calls entirely.
> 
> Yeah, I realized that when I went back to finish that patch
>>
>> How about just the attached?
>>
>> It simply does it as two different cases, and adds the bioset_exit()
>> calls to mddev_delayed_delete().
> 
> Oh right, just taking advantage of the fact that just the queue_work() needs
> to be under the spinlock, not the actual free in the other case.
> 
> I like your patch for a less invasive version, but I did finish and test my
> version, which deletes more code :)
> 
> I've already gone to the trouble of coming up with a VM smoketest, so I can
> test yours too... I don't really have a strong opinion on which patch should
> go in.

Kent, care to submit a proper version? We should get this in.

-- 
Jens Axboe



[PATCH v3 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-06-06 Thread Roman Pen
Hi all,

here is v3 of IBNBD/IBTRS patches, which have minor changes:
sparse fixes, comments, sysfs API changes, etc.

 Changelog
 ---------

v3:
  o Sparse fixes:
 - le32 -> le16 conversion
 - pcpu and RCU wrong declaration
 - sysfs: dynamically alloc array of sockaddr structures to reduce
   size of a stack frame

  o Rename sysfs folder on client and server sides to show source and
destination addresses of the connection, i.e.:
   ...//paths//

  o Remove external inclusions from Makefiles.

v2:
  o IBNBD:
 - No legacy request IO mode, only MQ is left.

  o IBTRS:
 - No FMR registration, only FR is left.

 - By default memory is always registered for the sake of the security,
   i.e. by default no pd is created with IB_PD_UNSAFE_GLOBAL_RKEY.

 - Server side (target) always does memory registration and exchanges
   MRs dma addresses with client for direct writes from client side.

 - Client side (initiator) has `noreg_cnt` module option, which specifies
   sg number, from which read IO should be registered.  By default 0
   is set, i.e. always register memory for read IOs. (IBTRS protocol
   does not require registration for writes, which always go directly
   to server memory).

 - Proper DMA sync with ib_dma_sync_single_for_(cpu|device) calls.

 - Do signalled IB_WR_LOCAL_INV.

 - Avoid open-coding of string conversion to IPv4/6 sockaddr,
   inet_pton_with_scope() is used instead.

 - Introduced block device namespaces configuration on server side
   (target) to avoid security gap in not trusted environment, when
   client can map a block device which does not belong to him.
   When device namespaces are enabled on server side, server opens
   device using client's session name in the device path, where
   session name is a random token, e.g. GUID.  If server is configured
   to find device namespaces in a folder /run/ibnbd-guid/, then
   request to map device 'sda1' from client with session 'A' (or any
   token) will be resolved by path /run/ibnbd-guid/A/sda1.

 - README is extended with description of IBTRS and IBNBD protocol,
   e.g. how IB IMM field is used to acknowledge IO requests or
   heartbeats.

 - IBTRS/IBNBD client and server modules are registered as devices in
   the kernel in order to have all sysfs configuration entries under
   /sys/devices/virtual/ in order not to spoil /sys/kernel directory.
   I failed to switch the configuration to configfs, for several
   reasons:

   a) configfs entries created from kernel side using
  configfs_register_group() API call can't be removed from
  userspace side using rmdir() syscall.  That is required
  behaviour for IBTRS when session is created by API call and
  not from userspace.
   
  Actually, I have a patch for configfs to solve a), but then
  b) comes.

   b) configfs show/store callbacks are racy by design (in
  contradiction to kernfs), i.e. even dentry is unhashed, opener
  of it can be faster and in few moments later those callbacks
  can be invoked.  To guarantee that all openers left and nobody
  is able to access an entry after configfs_drop_dentry() is
  returned additional hairy code should be written with wait
  queues, locks, etc.  I didn't like at all what I eventually
  got, gave up and left as is, i.e. sysfs.


  What is left unchanged on IBTRS side but was suggested to modify:

 - Bart suggested to use sbitmap instead of calling find_first_zero_bit()
   and friends.  I found calling pure bit API is more explicit in
   comparison to sbitmap - there is no need in using sbitmap_queue
   and all the power of wait queues, no benefits in terms of LoC
   as well.
   
 - I made several attempts to unify the approach of wrapping ib_device
   with a ULP device structure (e.g. device pool or using ib_client
   API), but it turned out that none of these approaches brings
   simplicity, so IBTRS still creates a ULP specific device on demand
   and keeps it in the list.

 - Sagi suggested to extend inet_pton_with_scope() with gid to
   sockaddr conversion, but after IPv6 conversion (gid is compliant
   with IPv6) special RDMA magic should be done in order to setup
   IB port space range, which is very specific and does not fit to
   be some generic library helper.  And am I right that gid is not
   used and seems dying?
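
As an illustration of the first item above (allocating a free slot with the plain bit API instead of sbitmap), here is a minimal userspace sketch. It is not the IBTRS code: `struct tag_map`, `find_first_zero()`, `tag_alloc()` and `tag_free()` are hypothetical stand-ins for the kernel bitmap helpers, and the sketch is single-threaded, whereas kernel code would need atomic bit operations.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NTAGS         128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS        ((NTAGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical tag bitmap: bit i set means tag i is in use. */
struct tag_map {
	unsigned long bits[NWORDS];
};

/* Return the index of the first clear bit, or NTAGS if all are set. */
static size_t find_first_zero(const struct tag_map *m)
{
	for (size_t i = 0; i < NTAGS; i++)
		if (!(m->bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD))))
			return i;
	return NTAGS;
}

/* Claim the lowest free tag; returns -1 when the map is full. */
static int tag_alloc(struct tag_map *m)
{
	size_t i = find_first_zero(m);

	if (i == NTAGS)
		return -1;
	m->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
	return (int)i;
}

/* Release a previously allocated tag. */
static void tag_free(struct tag_map *m, int tag)
{
	size_t i = (size_t)tag;

	assert(tag >= 0 && tag < NTAGS);
	m->bits[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}
```

The sketch has no waiters: when the map is full the caller simply gets -1, which is the case where sbitmap_queue and its wait queues would otherwise come into play.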

  * https://lwn.net/Articles/755075/

v1:
  - IBTRS: load-balancing and IO 

[PATCH v3 02/25] ibtrs: public interface header to establish RDMA connections

2018-06-06 Thread Roman Pen
Introduce public header which provides set of API functions to
establish RDMA connections from client to server machine using
IBTRS protocol, which manages RDMA connections for each session,
does multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open()    - Creates a set of RDMA connections encapsulated
                       in an IBTRS session and returns a pointer to the
                       IBTRS session object.
 ibtrs_clt_close()   - Closes RDMA connections associated with the IBTRS
                       session.
 ibtrs_clt_request() - Requests zero-copy RDMA transfer to/from the
                       server.

Main functions for server (passive) side:

 ibtrs_srv_open()  - Starts listening for IBTRS clients on the specified
                     port and invokes IBTRS callbacks for incoming
                     RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.
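
For readers of the client API, the following userspace sketch shows how a user of the header might react to link events and interpret the reconnect limit. The enum and the `link_clt_ev_fn` typedef are copied from the header in this patch; `struct my_session`, `my_link_ev()` and `should_reconnect()` are hypothetical names, and treating every negative limit as "forever" is an assumption (the kernel-doc only mentions -1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copied from ibtrs.h in this patch. */
enum ibtrs_clt_link_ev {
	IBTRS_CLT_LINK_EV_RECONNECTED,
	IBTRS_CLT_LINK_EV_DISCONNECTED,
};

typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);

/* Hypothetical per-session state owned by the API user. */
struct my_session {
	bool connected;
	unsigned int disconnects;
};

/* Callback of type link_clt_ev_fn; priv is what the user handed over. */
static void my_link_ev(void *priv, enum ibtrs_clt_link_ev ev)
{
	struct my_session *s = priv;

	assert(s != NULL);
	switch (ev) {
	case IBTRS_CLT_LINK_EV_RECONNECTED:
		s->connected = true;
		break;
	case IBTRS_CLT_LINK_EV_DISCONNECTED:
		s->connected = false;
		s->disconnects++;
		break;
	}
}

/*
 * One reading of @max_reconnect_attempts: 0 disables reconnecting,
 * a negative value (assumed: any, not only -1) means retry forever.
 */
static bool should_reconnect(short max_attempts, unsigned int attempts_done)
{
	if (max_attempts < 0)
		return true;
	return attempts_done < (unsigned int)max_attempts;
}
```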

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 325 +++
 1 file changed, 325 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index 000000000000..24a1e18816d7
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,325 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include 
+#include 
+
+struct ibtrs_tag;
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED  Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+   IBTRS_CLT_LINK_EV_RECONNECTED,
+   IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+   struct sockaddr_storage *src;
+   struct sockaddr_storage *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to an IBTRS server
+ * @priv:  User supplied private data.
+ * @link_ev:   Event notification for connection state changes
+ * @priv:  user supplied data that was passed to
+ * ibtrs_clt_open()
+ * @ev:Occurred event
+ * @sessname: name of the session
+ * @paths: Paths to be established defined by their src and dst addresses
+ * @path_cnt: Number of elements in the @paths array
+ * @port: port to be used by the IBTRS session
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ * up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */
+struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev,
+const char *sessname,
+const struct ibtrs_addr *paths,
+size_t path_cnt, short port,
+size_t pdu_sz, u8 reconnect_delay_sec,
+u16 max_segments,
+s16 max_reconnect_attempts);
+
+/**
+ * ibtrs_clt_close() - Close a session
+ * @sess: Session handler, is freed on return
+ */
+void ibtrs_clt_close(struct ibtrs_clt *sess);
+
+/**
+ * 

[PATCH v3 23/25] ibnbd: include client and server modules into kernel compilation

2018-06-06 Thread Roman Pen
Add IBNBD Makefile, Kconfig and also corresponding lines into upper
block layer files.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/Kconfig|  2 ++
 drivers/block/Makefile   |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 11 +++
 4 files changed, 36 insertions(+)
 create mode 100644 drivers/block/ibnbd/Kconfig
 create mode 100644 drivers/block/ibnbd/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687a236a..d8c1590411c8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -481,4 +481,6 @@ config BLK_DEV_RSXX
  To compile this driver as a module, choose M here: the
  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)+= ibnbd/
 
 skd-y  := skd_main.o
 swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index 000000000000..b381c6c084d2
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+   bool
+
+config BLK_DEV_IBNBD_CLIENT
+   tristate "Network block device driver on top of IBTRS transport"
+   depends on INFINIBAND_IBTRS_CLIENT
+   select BLK_DEV_IBNBD
+   help
+ IBNBD client allows for mapping of remote block devices over
+ IBTRS protocol from a target system where IBNBD server is running.
+
+ If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+   tristate "Network block device over RDMA Infiniband server support"
+   depends on INFINIBAND_IBTRS_SERVER
+   select BLK_DEV_IBNBD
+   help
+ IBNBD server allows for exporting local block devices to a remote client
+ over IBTRS protocol.
+
+ If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index 000000000000..ac906036310e
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,11 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+ ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+ ibnbd-srv-dev.o \
+ ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
-- 
2.13.1



[PATCH v3 18/25] ibnbd: client: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBNBD block devices on client side:

  /sys/devices/virtual/ibnbd-client/ctl/
|- map_device
|  *** maps remote device
|
|- devices/
   *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
|- unmap_device
|  *** unmaps device
|
|- state
|  *** device state
|
|- session
|  *** session name
|
|- mapping_path
   *** path of the dev that was mapped on server
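
The string written to map_device is a space-separated list of key=value options, some of which are mandatory. The sketch below shows that parsing idea in plain userspace C; it is not the driver's parser (which uses strsep() and match_token()), the names `struct map_opts`, `take()` and `parse_map_options()` are hypothetical, and it keeps only the last "path=" and silently truncates long values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Option bits, mirroring the mandatory options of the patch. */
enum {
	OPT_PATH     = 1 << 0,
	OPT_DEV_PATH = 1 << 1,
	OPT_SESSNAME = 1 << 6,
};
#define OPT_MANDATORY (OPT_PATH | OPT_DEV_PATH | OPT_SESSNAME)

struct map_opts {
	char sessname[64];
	char path[64];
	char device_path[64];
};

/* If the token starts with key and has a value, copy the value to dst. */
static int take(const char *tok, size_t len, const char *key,
		char *dst, size_t dst_sz)
{
	size_t klen = strlen(key);

	if (len <= klen || strncmp(tok, key, klen) != 0)
		return 0;
	snprintf(dst, dst_sz, "%.*s", (int)(len - klen), tok + klen);
	return 1;
}

/* Returns 0 when all mandatory options were seen, -1 otherwise. */
static int parse_map_options(const char *buf, struct map_opts *o)
{
	int mask = 0;

	assert(buf != NULL && o != NULL);
	memset(o, 0, sizeof(*o));
	while (*buf) {
		size_t len = strcspn(buf, " \n");	/* length of next token */

		if (len) {
			if (take(buf, len, "sessname=", o->sessname,
				 sizeof(o->sessname)))
				mask |= OPT_SESSNAME;
			else if (take(buf, len, "path=", o->path,
				      sizeof(o->path)))
				mask |= OPT_PATH;
			else if (take(buf, len, "device_path=", o->device_path,
				      sizeof(o->device_path)))
				mask |= OPT_DEV_PATH;
			else
				return -1;	/* unknown or empty option */
		}
		buf += len;
		if (*buf)
			buf++;			/* skip the separator */
	}
	return (mask & OPT_MANDATORY) == OPT_MANDATORY ? 0 : -1;
}
```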

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 685 ++
 1 file changed, 685 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index 000000000000..3d3659a74e94
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,685 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+enum {
+   IBNBD_OPT_ERR   = 0,
+   IBNBD_OPT_PATH  = 1 << 0,
+   IBNBD_OPT_DEV_PATH  = 1 << 1,
+   IBNBD_OPT_ACCESS_MODE   = 1 << 3,
+   IBNBD_OPT_IO_MODE   = 1 << 5,
+   IBNBD_OPT_SESSNAME  = 1 << 6,
+};
+
+static unsigned int ibnbd_opt_mandatory[] = {
+   IBNBD_OPT_PATH,
+   IBNBD_OPT_DEV_PATH,
+   IBNBD_OPT_SESSNAME,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+   {   IBNBD_OPT_PATH, "path=%s"   },
+   {   IBNBD_OPT_DEV_PATH, "device_path=%s"},
+   {   IBNBD_OPT_ACCESS_MODE,  "access_mode=%s"},
+   {   IBNBD_OPT_IO_MODE,  "io_mode=%s"},
+   {   IBNBD_OPT_SESSNAME, "sessname=%s"   },
+   {   IBNBD_OPT_ERR,  NULL},
+};
+
+/* remove new line from string */
+static void strip(char *s)
+{
+   char *p = s;
+
+   while (*s != '\0') {
+   if (*s != '\n')
+   *p++ = *s++;
+   else
+   ++s;
+   }
+   *p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf,
+  char *sessname,
+  struct ibtrs_addr *paths,
+  size_t *path_cnt,
+  size_t max_path_cnt,
+  char *pathname,
+  enum ibnbd_access_mode *access_mode,
+  enum ibnbd_io_mode *io_mode)
+{
+   char *options, *sep_opt;
+   char *p;
+   substring_t args[MAX_OPT_ARGS];
+   int opt_mask = 0;
+   int token;
+   int ret = -EINVAL;
+   int i;
+   int p_cnt = 0;
+
+   options = kstrdup(buf, GFP_KERNEL);
+   if (!options)
+   return -ENOMEM;
+
+   sep_opt = strstrip(options);
+   strip(sep_opt);
+   while ((p = strsep(&sep_opt, " ")) != NULL) {
+   if (!*p)
+   continue;
+
+   token = match_token(p, ibnbd_opt_tokens, args);
+   opt_mask |= token;
+
+   switch (token) {
+   case IBNBD_OPT_SESSNAME:
+   p = match_strdup(args);
+   if (!p) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (strlen(p) > NAME_MAX) {
+   pr_err("map_device: sessname too long\n");
+   ret 

[PATCH v3 20/25] ibnbd: server: main functionality

2018-06-06 Thread Roman Pen
This is main functionality of ibnbd-server module, which handles IBTRS
events and IBNBD protocol requests, like map (open) or unmap (close)
device.  Also server side is responsible for processing incoming IBTRS
IO requests and forward them to local mapped devices.
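
The dev_search_path module parameter in this patch may contain %SESSNAME%, in which case devices are looked up in a per-session namespace. A minimal userspace sketch of that expansion follows. `resolve_dev_path()` is a hypothetical helper, not the function used by ibnbd-server, and the rejection of ".." and of '/' in the session name is an added assumption about how path traversal could be refused.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SESSNAME_TOKEN "%SESSNAME%"

/*
 * Build "<search_path with %SESSNAME% replaced>/<dev>" into out.
 * Returns 0 on success, -1 on rejected input or truncation.
 */
static int resolve_dev_path(const char *search_path, const char *sessname,
			    const char *dev, char *out, size_t out_sz)
{
	const char *tok = strstr(search_path, SESSNAME_TOKEN);
	int n;

	assert(out != NULL && out_sz > 0);

	/* Assumption: refuse anything that could escape the namespace. */
	if (strstr(dev, "..") || strchr(sessname, '/'))
		return -1;

	if (tok)
		n = snprintf(out, out_sz, "%.*s%s%s/%s",
			     (int)(tok - search_path), search_path,
			     sessname, tok + strlen(SESSNAME_TOKEN), dev);
	else
		n = snprintf(out, out_sz, "%s/%s", search_path, dev);

	return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```

With dev_search_path=/run/ibnbd-devs/%SESSNAME%, session name "blya" and device "sda", the sketch yields /run/ibnbd-devs/blya/sda, the example given in the README of this series.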

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv.c | 946 
 1 file changed, 946 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.c

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index 000000000000..b045f8071ab0
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,946 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+#include "ibnbd-srv-dev.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+   char *dup;
+
+   if (strlen(val) >= sizeof(dev_search_path))
+   return -EINVAL;
+
+   dup = kstrdup(val, GFP_KERNEL);
+   if (!dup)
+   return -ENOMEM;
+
+   if (dup[strlen(dup) - 1] == '\n')
+   dup[strlen(dup) - 1] = '\0';
+
+   strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+   kfree(dup);
+   pr_info("dev_search_path changed to '%s'\n", dev_search_path);
+
+   return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+   .maxlen = sizeof(dev_search_path),
+   .string = dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+   .set= dev_search_path_set,
+   .get= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+   &dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path."
+" When a device is mapped this path is prepended to the"
+" device path from the map device operation.  If %SESSNAME%"
+" is specified in a path, then device will be searched in a"
+" session namespace."
+" (default: " DEFAULT_DEV_SEARCH_PATH ")");
+
+static int def_io_mode = IBNBD_BLOCKIO;
+
+static int def_io_mode_set(const char *val, const struct kernel_param *kp)
+{
+   int io_mode, rc;
+
+   rc = kstrtoint(val, 0, &io_mode);
+   if (unlikely(rc))
+   return rc;
+
+   switch (io_mode) {
+   case IBNBD_FILEIO:
+   case IBNBD_BLOCKIO:
+   def_io_mode = io_mode;
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
+static const struct kernel_param_ops def_io_mode_ops = {
+   .set= def_io_mode_set,
+   .get= param_get_int,
+};
+module_param_cb(def_io_mode, &def_io_mode_ops, &def_io_mode, 0444);
+MODULE_PARM_DESC(def_io_mode, "By default, export devices in"
+" blockio(" __stringify(_IBNBD_BLOCKIO) ") or"
+" fileio(" __stringify(_IBNBD_FILEIO) ") mode."
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))");
+
+static DEFINE_MUTEX(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+
+static LIST_HEAD(sess_list);
+static LIST_HEAD(dev_list);
+
+struct ibnbd_io_private {
+   struct ibtrs_srv_op *id;
+   struct ibnbd_srv_sess_dev   *sess_dev;
+};
+
+static void ibnbd_sess_dev_release(struct kref *kref)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref);
+   complete(sess_dev->destroy_comp);
+}
+
+static inline void ibnbd_put_sess_dev(struct ibnbd_srv_sess_dev *sess_dev)
+{
+   

[PATCH v3 24/25] ibnbd: a bit of documentation

2018-06-06 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/README | 299 +
 1 file changed, 299 insertions(+)
 create mode 100644 drivers/block/ibnbd/README

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index 000000000000..bbaddd02c1c5
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,299 @@
+***************************************
+Infiniband Network Block Device (IBNBD)
+***************************************
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+------------
+
+  IBTRS kernel modules
+
+Quick Start
+-----------
+
+Server side:
+  # modprobe ibnbd_server
+
+Client side:
+  # modprobe ibnbd_client
+  # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+/sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  Where "sessname=" is a session name, a string to identify the session
+  on client and on server sides; "path=" is a destination IP address or
+  a pair of a source and a destination IPs, separated by comma.  Multiple
+  "path=" options can be specified in order to use multipath  (see IBTRS
+  description for details); "device_path=" is the block device to be
+  mapped from the server side. After the session to the server machine is
+  established, the mapped device will appear on the client side under
+  /dev/ibnbd.
+
+
+======================
+Client Sysfs Interface
+======================
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+  # cat /sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+  > [path=<[srcaddr,]dstaddr>] device_path=
+  > [access_mode=]
+  > [io_mode=]" > map_device
+  >
+  > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/devices/virtual/ibnbd-client/ctl/
+====================================================
+
+map_device (RW)
+---------------
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=]
+[io_mode=]
+
+Where:
+
+sessname: accepts a string not bigger than 256 chars, which identifies
+  a given session on the client and on the server.
+  I.e. "clt_hostname-srv_hostname" could be a natural choice.
+
+path: describes a connection between the client and the server by
+  specifying destination and, when required, the source address.
+  The addresses are to be provided in the following format:
+
+ip:
+ip:
+gid:
+
+  for example:
+
+  path=ip:10.0.0.66
+ The single addr is treated as the destination.
+ The connection will be established to this
+ server from any client IP address.
+
+  path=ip:10.0.0.66,ip:10.0.1.66
+ First addr is the source address and the second
+ is the destination.
+
+  If multiple "path=" options are specified multiple connections
+  will be established and data will be sent according to
+  the selected multipath policy (see IBTRS mp_policy sysfs entry
+  description).
+
+device_path: Path to the block device on the server side. Path is specified
+ relative to the directory on server side configured in the
+ 'dev_search_path' module parameter of the ibnbd_server.
+ The ibnbd_server prepends the  received from client
+ with  and tries to open the
+ / block device.  On success,
+ a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+ directory and an entry in /sys/devices/virtual/ibnbd-client/ctl/devices
+ will be created.
+
+ If 'dev_search_path' contains '%SESSNAME%', then each session can
+ have different devices namespace, e.g. server was configured with
+ the following parameter "dev_search_path=/run/ibnbd-devs/%SESSNAME%",
+ client has this string "sessname=blya device_path=sda", then server
+ will try to open: /run/ibnbd-devs/blya/sda.
+
+access_mode: the access_mode parameter specifies if the device is to be
+ mapped as "ro" read-only or "rw" read-write. The server allows
+ a device to be exported in rw mode only once. The "migration"
+ access mode has to be specified if a second mapping 

[PATCH v3 19/25] ibnbd: server: private header with server structs and functions

2018-06-06 Thread Roman Pen
This header describes main structs and functions used by ibnbd-server
module, namely structs for managing sessions from different clients
and mapped (opened) devices.
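
The relation between sessions and devices is many-to-many, with one link object per (session, device) pair that sits on two lists at once. The userspace sketch below illustrates that layout with plain singly linked lists instead of the kernel's list_head; all names in it (`srv_dev`, `srv_session`, `sess_dev`, `sess_dev_bind()`, `dev_session_count()`) are simplified, hypothetical counterparts of the structs in this header, without locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct sess_dev;

/* An exported device: knows every session that has it open. */
struct srv_dev {
	const char *id;
	struct sess_dev *links;
	int open_write_cnt;
};

/* A client session: knows every device it has opened. */
struct srv_session {
	const char *sessname;
	struct sess_dev *links;
};

/* Link object: one per (session, device) pair, member of both lists. */
struct sess_dev {
	struct srv_dev *dev;
	struct srv_session *sess;
	struct sess_dev *next_in_dev;	/* next link of the same device */
	struct sess_dev *next_in_sess;	/* next link of the same session */
	int writable;
};

/* Attach the link to both the device list and the session list. */
static void sess_dev_bind(struct sess_dev *sd, struct srv_session *s,
			  struct srv_dev *d, int writable)
{
	assert(sd != NULL && s != NULL && d != NULL);
	sd->dev = d;
	sd->sess = s;
	sd->writable = writable;
	sd->next_in_dev = d->links;
	d->links = sd;
	sd->next_in_sess = s->links;
	s->links = sd;
	if (writable)
		d->open_write_cnt++;
}

/* Count how many sessions currently have the device open. */
static size_t dev_session_count(const struct srv_dev *d)
{
	size_t n = 0;

	for (const struct sess_dev *sd = d->links; sd; sd = sd->next_in_dev)
		n++;
	return n;
}
```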

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv.h | 100 
 1 file changed, 100 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.h

diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h
new file mode 100644
index 000000000000..191a1650bc1d
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.h
@@ -0,0 +1,100 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+   /* Entry inside global sess_list */
+   struct list_head    list;
+   struct ibtrs_srv    *ibtrs;
+   char                sessname[NAME_MAX];
+   int                 queue_depth;
+   struct bio_set      *sess_bio_set;
+
+   rwlock_t            index_lock ____cacheline_aligned;
+   struct idr          index_idr;
+   /* List of struct ibnbd_srv_sess_dev */
+   struct list_head    sess_dev_list;
+   struct mutex        lock;
+   u8                  ver;
+};
+
+struct ibnbd_srv_dev {
+   /* Entry inside global dev_list */
+   struct list_head    list;
+   struct kobject      dev_kobj;
+   struct kobject      dev_sessions_kobj;
+   struct kref         kref;
+   char                id[NAME_MAX];
+   /* List of ibnbd_srv_sess_dev structs */
+   struct list_head    sess_dev_list;
+   struct mutex        lock;
+   int                 open_write_cnt;
+   enum ibnbd_io_mode  mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+   /* Entry inside ibnbd_srv_dev struct */
+   struct list_head            dev_list;
+   /* Entry inside ibnbd_srv_session struct */
+   struct list_head            sess_list;
+   struct ibnbd_dev            *ibnbd_dev;
+   struct ibnbd_srv_session    *sess;
+   struct ibnbd_srv_dev        *dev;
+   struct kobject              kobj;
+   struct completion           *sysfs_release_compl;
+   u32                         device_id;
+   fmode_t                     open_flags;
+   struct kref                 kref;
+   struct completion           *destroy_comp;
+   char                        pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1



[PATCH v3 03/25] ibtrs: private headers with IBTRS protocol structs and helpers

2018-06-06 Thread Roman Pen
These are common private headers with IBTRS protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.
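
The logging helpers in this patch pick the session name from whatever object type is passed in, using a chain of __builtin_choose_expr() calls. For comparison only, the same type-based dispatch can be sketched with C11 _Generic, which is a different technique from the one in the patch. The types `demo_sess` and `demo_con` and the macros `demo_prefix()` and `demo_log()` are hypothetical and write to a buffer instead of calling pr_err() and friends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the session and connection types. */
struct demo_sess { char sessname[32]; };
struct demo_con  { struct demo_sess *sess; };

static const char *sess_name(const struct demo_sess *s)
{
	return s->sessname;
}

static const char *con_name(const struct demo_con *c)
{
	assert(c->sess != NULL);
	return c->sess->sessname;
}

/* Select the accessor at compile time from the static type of obj. */
#define demo_prefix(obj) _Generic((obj),		\
	struct demo_con *:  con_name,			\
	struct demo_sess *: sess_name)(obj)

/* Format "<sessname>: message" into a char array (not a pointer). */
#define demo_log(buf, obj, fmt, ...)			\
	snprintf(buf, sizeof(buf), "<%s>: " fmt, demo_prefix(obj), __VA_ARGS__)
```

Passing an object of an unlisted type fails at compile time, which plays the role of the undefined unknown_type() reference in the patch, where the failure shows up at link time.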

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-log.h |  91 ++
 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 470 +++
 2 files changed, 561 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-log.h
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
new file mode 100644
index 000000000000..f56257eabdee
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
@@ -0,0 +1,91 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBTRS_LOG_H
+#define IBTRS_LOG_H
+
+#define P1 )
+#define P2 ))
+#define P3 )))
+#define P4 
+#define P(N) P ## N
+
+#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
+#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
+
+#define LIST(...)  \
+   __VA_ARGS__,\
+   ({ unknown_type(); NULL; }) \
+   CAT(P, COUNT_ARGS(__VA_ARGS__)) \
+
+#define EMPTY()
+#define DEFER(id) id EMPTY()
+
+#define _CASE(obj, type, member)   \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(   \
+   typeof(obj), type), \
+   ((type)obj)->member
+#define CASE(o, t, m) DEFER(_CASE)(o,t,m)
+
+/*
+ * Below we define retrieving of sessname from common IBTRS types.
+ * Client or server related types have to be defined by special
+ * TYPES_TO_SESSNAME macro.
+ */
+
+void unknown_type(void);
+
+#ifndef TYPES_TO_SESSNAME
+#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; })
+#endif
+
+#define ibtrs_prefix(obj)  \
+   _CASE(obj, struct ibtrs_con *,  sess->sessname),\
+   _CASE(obj, struct ibtrs_sess *, sessname),  \
+   TYPES_TO_SESSNAME(obj)  \
+   ))
+
+#define ibtrs_log(fn, obj, fmt, ...)   \
+   fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__)
+
+#define ibtrs_err(obj, fmt, ...)   \
+   ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_err_rl(obj, fmt, ...)\
+   ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn(obj, fmt, ...)   \
+   ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn_rl(obj, fmt, ...) \
+   ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info(obj, fmt, ...) \
+   ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info_rl(obj, fmt, ...) \
+   ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__)
+
+#endif /* IBTRS_LOG_H */
diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
new file mode 100644
index 000000000000..f56652a46a8d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
@@ -0,0 +1,470 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the 
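
The type dispatch in ibtrs-log.h is easy to lose among the macro layers, so
here is a minimal userspace sketch of the same idea. It assumes GCC or Clang,
since __builtin_choose_expr() and __builtin_types_compatible_p() are compiler
extensions. The names demo_sess, demo_con, demo_sessname() and demo_log() are
illustrative stand-ins and not the IBTRS definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct ibtrs_sess / struct ibtrs_con. */
struct demo_sess { char sessname[32]; };
struct demo_con  { struct demo_sess *sess; };

/*
 * Declared but never defined. The call is only kept by the compiler when
 * no type matches, so an unsupported type shows up as a link-time error.
 */
const char *demo_unknown_type(void);

/* Pick the member access that fits the static type of obj. */
#define demo_sessname(obj)                                              \
	__builtin_choose_expr(                                          \
		__builtin_types_compatible_p(__typeof__(obj),           \
					     struct demo_con *),        \
		((struct demo_con *)(obj))->sess->sessname,             \
	__builtin_choose_expr(                                          \
		__builtin_types_compatible_p(__typeof__(obj),           \
					     struct demo_sess *),       \
		((struct demo_sess *)(obj))->sessname,                  \
		demo_unknown_type()))

/* Format "<sessname>: message" into buf for either pointer type. */
#define demo_log(buf, len, obj, fmt, ...)                               \
	snprintf(buf, len, "<%s>: " fmt, demo_sessname(obj), ##__VA_ARGS__)

/* Usage: a connection and its session resolve to the same prefix. */
void demo_log_example(void)
{
	struct demo_sess s = { .sessname = "sess0" };
	struct demo_con c = { .sess = &s };
	char buf[64];

	demo_log(buf, sizeof(buf), &c, "retry %d\n", 3);
	assert(strcmp(buf, "<sess0>: retry 3\n") == 0);

	demo_log(buf, sizeof(buf), &s, "closed\n");
	assert(strcmp(buf, "<sess0>: closed\n") == 0);
}
```

The selection happens at compile time, so the unchosen branch costs nothing at
run time. The price is that every branch must still parse for every argument
type, which is why each branch casts obj before dereferencing it.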

[PATCH v3 21/25] ibnbd: server: functionality for IO submission to file or block dev

2018-06-06 Thread Roman Pen
This provides helper functions for IO submission to a file or block device.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv-dev.c | 413 
 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 +
 2 files changed, 562 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.c
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.h

diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c
new file mode 100644
index 000000000000..aefa10fcafc3
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-dev.c
@@ -0,0 +1,413 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibnbd-srv-dev.h"
+#include "ibnbd-log.h"
+
+#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0
+
+struct ibnbd_dev_file_io_work {
+   struct ibnbd_dev    *dev;
+   void                *priv;
+
+   sector_t            sector;
+   void                *data;
+   size_t              len;
+   size_t              bi_size;
+   enum ibnbd_io_flags flags;
+
+   struct work_struct  work;
+};
+
+struct ibnbd_dev_blk_io {
+   struct ibnbd_dev *dev;
+   void *priv;
+};
+
+static struct workqueue_struct *fileio_wq;
+
+int ibnbd_dev_init(void)
+{
+   fileio_wq = alloc_workqueue("%s", WQ_UNBOUND,
+   IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS,
+   "ibnbd_server_fileio_wq");
+   if (!fileio_wq)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void ibnbd_dev_destroy(void)
+{
+   destroy_workqueue(fileio_wq);
+}
+
+static inline struct block_device *ibnbd_dev_open_bdev(const char *path,
+  fmode_t flags)
+{
+   return blkdev_get_by_path(path, flags, THIS_MODULE);
+}
+
+static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   dev->bdev = ibnbd_dev_open_bdev(path, flags);
+   return PTR_ERR_OR_ZERO(dev->bdev);
+}
+
+static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   int oflags = O_DSYNC; /* enable write-through */
+
+   if (flags & FMODE_WRITE)
+   oflags |= O_RDWR;
+   else if (flags & FMODE_READ)
+   oflags |= O_RDONLY;
+   else
+   return -EINVAL;
+
+   dev->file = filp_open(path, oflags, 0);
+   return PTR_ERR_OR_ZERO(dev->file);
+}
+
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+enum ibnbd_io_mode mode, struct bio_set *bs,
+ibnbd_dev_io_fn io_cb)
+{
+   struct ibnbd_dev *dev;
+   int ret;
+
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   if (mode == IBNBD_BLOCKIO) {
+   dev->blk_open_flags = flags;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+   } else if (mode == IBNBD_FILEIO) {
+   dev->blk_open_flags = FMODE_READ;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+
+   ret = ibnbd_dev_vfs_open(dev, path, flags);
+   if (ret)
+   goto blk_put;
+   } else {
+   ret = -EINVAL;
+   goto err;
+   }
+
+   dev->blk_open_flags = flags;
+   dev->mode   = mode;
+   dev->io_cb  = io_cb;
+   bdevname(dev->bdev, dev->name);
+   dev->ibd_bio_set= bs;
+
+   return dev;
+
+blk_put:
+   blkdev_put(dev->bdev, dev->blk_open_flags);
+err:
+   
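
ibnbd_dev_open() combines two kernel idioms: error codes carried inside
pointers (ERR_PTR() and PTR_ERR_OR_ZERO()) and goto-based unwinding that
releases only what was already acquired. The following is a userspace sketch of
both, under the assumption that a pointer in the top 4095 addresses is never
valid. All demo_* names are hypothetical, and malloc()/free() stand in for the
real open and put calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, like the kernel's ERR_PTR(). */
static inline void *demo_err_ptr(long err)
{
	assert(err < 0 && err >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top DEMO_MAX_ERRNO addresses. */
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* 0 for a valid pointer, the encoded errno otherwise. */
static inline int demo_ptr_err_or_zero(const void *p)
{
	return demo_is_err(p) ? (int)(intptr_t)p : 0;
}

struct demo_dev {
	void *bdev;
	void *file;
};

/* Hypothetical resource opener that fails with -ENOENT on request. */
static void *demo_open(int fail)
{
	void *res;

	if (fail)
		return demo_err_ptr(-ENOENT);
	res = malloc(1);
	return res ? res : demo_err_ptr(-ENOMEM);
}

/* Acquire two resources; on failure release only what was acquired. */
struct demo_dev *demo_dev_open(int fail_bdev, int fail_file)
{
	struct demo_dev *dev;
	int ret;

	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return demo_err_ptr(-ENOMEM);

	dev->bdev = demo_open(fail_bdev);
	ret = demo_ptr_err_or_zero(dev->bdev);
	if (ret)
		goto err;

	dev->file = demo_open(fail_file);
	ret = demo_ptr_err_or_zero(dev->file);
	if (ret)
		goto blk_put;

	return dev;

blk_put:
	free(dev->bdev);	/* undo the first acquisition only */
err:
	free(dev);
	return demo_err_ptr(ret);
}
```

The labels are ordered in reverse of the acquisitions, so a later failure falls
through every earlier cleanup step. A caller checks the result with
demo_is_err() rather than comparing against NULL.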

[PATCH v3 22/25] ibnbd: server: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBNBD mapped devices on the server side:

  /sys/devices/virtual/ibnbd-server/ctl/devices//
|- block_dev
|  *** link pointing to the corresponding block device sysfs entry
|
|- sessions//
|  *** sessions directory
   |
   |- read_only
   |  *** whether the device is mapped read-only
   |
   |- mapping_path
  *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 242 ++
 1 file changed, 242 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index 000000000000..5bf77cdb09c8
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+   NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+   .attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = &kobj_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name)
+{
+   struct kobject *bdev_kobj;
+   int ret;
+
+   ret = kobject_init_and_add(&dev->dev_kobj, &ktype,
+                              ibnbd_devs_kobj, dir_name);
+   if (ret)
+       return ret;
+
+   ret = kobject_init_and_add(&dev->dev_sessions_kobj,
+                              &ktype,
+                              &dev->dev_kobj, "sessions");
+   if (ret)
+       goto err;
+
+   ret = sysfs_create_group(&dev->dev_kobj,
+                            &ibnbd_srv_default_dev_attr_group);
+   if (ret)
+       goto err2;
+
+   bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+   ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+   if (ret)
+       goto err3;
+
+   return 0;
+
+err3:
+   sysfs_remove_group(&dev->dev_kobj,
+                      &ibnbd_srv_default_dev_attr_group);
+err2:
+   kobject_del(&dev->dev_sessions_kobj);
+   kobject_put(&dev->dev_sessions_kobj);
+err:
+   kobject_del(&dev->dev_kobj);
+   kobject_put(&dev->dev_kobj);
+   return ret;
+}
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev)
+{
+   sysfs_remove_link(&dev->dev_kobj, "block_dev");
+   sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group);
+   kobject_del(&dev->dev_sessions_kobj);
+   kobject_put(&dev->dev_sessions_kobj);
+   kobject_del(&dev->dev_kobj);
+   kobject_put(&dev->dev_kobj);
+}
+
+static ssize_t ibnbd_srv_dev_session_ro_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+   return scnprintf(page, PAGE_SIZE, "%s\n",
+(sess_dev->open_flags & FMODE_WRITE) ? "0" : "1");
+}
+
+static struct kobj_attribute ibnbd_srv_dev_session_ro_attr =
+   __ATTR(read_only, 0444,
+  ibnbd_srv_dev_session_ro_show,
+  NULL);
+
+static ssize_t
+ibnbd_srv_dev_session_mapping_path_show(struct kobject *kobj,
+   struct kobj_attribute *attr, char *page)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kobj, struct 
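
The show callbacks above recover the session device from the kobject embedded
in it. A minimal userspace sketch of that container_of() step follows. The
demo_* types and the DEMO_FMODE_WRITE value are assumptions made for the
example, and snprintf() stands in for the kernel's scnprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_FMODE_WRITE 0x2u	/* illustrative flag value */

struct demo_kobject {
	const char *name;
};

struct demo_sess_dev {
	unsigned int		open_flags;
	struct demo_kobject	kobj;	/* embedded, not a pointer */
};

/* Write "1" for a read-only mapping and "0" for a writable one. */
int demo_ro_show(struct demo_kobject *kobj, char *page, size_t size)
{
	struct demo_sess_dev *sess_dev;

	assert(kobj && page && size > 0);

	/* Step back from the member address to the enclosing struct. */
	sess_dev = demo_container_of(kobj, struct demo_sess_dev, kobj);

	return snprintf(page, size, "%s\n",
			(sess_dev->open_flags & DEMO_FMODE_WRITE) ? "0" : "1");
}
```

Because the kobject is embedded rather than referenced through a pointer, the
callback needs no lookup table: subtracting the member offset is enough to get
from the sysfs object back to the driver state.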

[PATCH v3 15/25] ibnbd: private headers with IBNBD protocol structs and helpers

2018-06-06 Thread Roman Pen
These are common private headers with IBNBD protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-log.h   |  71 
 drivers/block/ibnbd/ibnbd-proto.h | 364 ++
 2 files changed, 435 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-log.h
 create mode 100644 drivers/block/ibnbd/ibnbd-proto.h

diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h
new file mode 100644
index 000000000000..489343a61171
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-log.h
@@ -0,0 +1,71 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBNBD_LOG_H
+#define IBNBD_LOG_H
+
+#include "ibnbd-clt.h"
+#include "ibnbd-srv.h"
+
+#define ibnbd_diskname(dev) ({ \
+   struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \
+   gd ? gd->disk_name : "";\
+})
+
+void unknown_type(void);
+
+#define ibnbd_log(fn, dev, fmt, ...) ({                         \
+   __builtin_choose_expr(                                      \
+       __builtin_types_compatible_p(                           \
+           typeof(dev), struct ibnbd_clt_dev *),               \
+       fn("<%s@%s> %s: " fmt, (dev)->pathname,                 \
+          (dev)->sess->sessname, ibnbd_diskname(dev),          \
+          ##__VA_ARGS__),                                      \
+       __builtin_choose_expr(                                  \
+           __builtin_types_compatible_p(typeof(dev),           \
+               struct ibnbd_srv_sess_dev *),                   \
+           fn("<%s@%s>: " fmt, (dev)->pathname,                \
+              (dev)->sess->sessname, ##__VA_ARGS__),           \
+           unknown_type()));                                   \
+})
+
+#define ibnbd_err(dev, fmt, ...)   \
+   ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_err_rl(dev, fmt, ...)\
+   ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn(dev, fmt, ...)   \
+   ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn_rl(dev, fmt, ...) \
+   ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info(dev, fmt, ...) \
+   ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info_rl(dev, fmt, ...) \
+   ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
+
+#endif /* IBNBD_LOG_H */
diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h
new file mode 100644
index 000000000000..050d3fa4c1bf
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-proto.h
@@ -0,0 +1,364 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+

[PATCH v3 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

2018-06-06 Thread Roman Pen
Signed-off-by: Roman Pen 
Cc: Danil Kipnis 
Cc: Jack Wang 
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ca4afd68530c..201c6c8e039e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6782,6 +6782,20 @@ IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M: IBNBD/IBTRS Storage Team 
+L: linux-block@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M: IBNBD/IBTRS Storage Team 
+L: linux-r...@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M: Peter Tyser 
 S: Maintained
-- 
2.13.1



[PATCH v3 10/25] ibtrs: server: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibtrs-server module. It accepts a
set of RDMA connections (a so-called IBTRS session), creates and destroys
the sysfs entries associated with an IBTRS session, and notifies the upper
layer (the user of the IBTRS API) about RDMA requests and link events.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 2003 ++
 1 file changed, 2003 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index 000000000000..22c965cd5c8b
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,2003 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Server");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/* Must be power of 2, see mask from mr->page_size in ib_sg_to_pages() */
+#define DEFAULT_MAX_CHUNK_SIZE (128 << 10)
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+#define MAX_HDR_SIZE PAGE_SIZE
+#define MAX_SG_COUNT ((MAX_HDR_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \
+ / sizeof(struct ibtrs_sg_desc))
+
+/* We guarantee to serve 10 paths at least */
+#define CHUNK_POOL_SZ 10
+
+static struct ibtrs_ib_dev_pool dev_pool;
+static mempool_t *chunk_pool;
+struct class *ibtrs_dev_class;
+
+static int retry_count = 7;
+static int __read_mostly max_chunk_size = DEFAULT_MAX_CHUNK_SIZE;
+static int __read_mostly sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+
+module_param_named(max_chunk_size, max_chunk_size, int, 0444);
+MODULE_PARM_DESC(max_chunk_size,
+                "Max size for each IO request, in bytes"
+                " (default: " __stringify(DEFAULT_MAX_CHUNK_SIZE_KB) "KB)");
+
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+"Number of buffers for pending I/O requests to allocate"
+" per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+       return err;
+
+   if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+       pr_err("Invalid retry count value %d, has to be"
+              " >= %d, <= %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT);
+       return -EINVAL;
+   }
+
+   retry_count = ival;
+   pr_info("QP retry count changed to %d\n", ival);
+
+   return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+   .set= retry_count_set,
+   .get= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+                " remote side didn't respond with Ack or Nack (default: 7,"
+                " min: " __stringify(MIN_RTR_CNT) ", max: "
+                __stringify(MAX_RTR_CNT) ")");
+
+static char cq_affinity_list[256] = "";
+static cpumask_t cq_affinity_mask = { CPU_BITS_ALL };
+
+static void init_cq_affinity(void)
+{
+   sprintf(cq_affinity_list, "0-%d", nr_cpu_ids - 1);
+}
+
+static int cq_affinity_list_set(const char *val, const struct kernel_param *kp)
+{
+   int ret = 0, len = strlen(val);
+   cpumask_var_t new_value;
+
+   if (!strlen(cq_affinity_list))
+   init_cq_affinity();
+
+   if (len >= sizeof(cq_affinity_list))
+   return -EINVAL;
+   if (!alloc_cpumask_var(&new_value, GFP_KERNEL))
+   
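
The retry_count setter follows a common pattern for writable module parameters:
parse the string, reject values outside the allowed range, and only then update
the live setting. Below is a userspace sketch of that pattern, with strtol()
standing in for kstrtoint(). The bounds DEMO_MIN_RTR_CNT and DEMO_MAX_RTR_CNT
are assumed values chosen for the example; the patch takes its real limits from
a header that is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Illustrative bounds, not the values used by IBTRS. */
#define DEMO_MIN_RTR_CNT 1
#define DEMO_MAX_RTR_CNT 7

static int demo_retry_count = DEMO_MAX_RTR_CNT;

/* Parse val, range-check it, and only then update the live setting. */
int demo_retry_count_set(const char *val)
{
	char *end;
	long ival;

	assert(val != NULL);

	errno = 0;
	ival = strtol(val, &end, 0);
	if (end == val)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE || ival < INT_MIN || ival > INT_MAX)
		return -ERANGE;		/* does not fit in an int */
	if (*end == '\n')
		end++;			/* tolerate a trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	/* Both bounds are inclusive. */
	if (ival < DEMO_MIN_RTR_CNT || ival > DEMO_MAX_RTR_CNT)
		return -EINVAL;

	demo_retry_count = (int)ival;
	return 0;
}

int demo_retry_count_get(void)
{
	return demo_retry_count;
}
```

Validating before assigning means a rejected write leaves the previous value in
place, which is what a reader of the parameter would expect after seeing the
write fail.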

[PATCH v3 09/25] ibtrs: server: private header with server structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-server module, mainly for accepting IBTRS sessions, creating and
destroying sysfs entries, and accounting statistics on the server side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 177 +++
 1 file changed, 177 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
new file mode 100644
index 000000000000..b1e32136f352
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
@@ -0,0 +1,177 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBTRS_SRV_H
+#define IBTRS_SRV_H
+
+#include 
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_srv_state - Server states.
+ */
+enum ibtrs_srv_state {
+   IBTRS_SRV_CONNECTING,
+   IBTRS_SRV_CONNECTED,
+   IBTRS_SRV_CLOSING,
+   IBTRS_SRV_CLOSED,
+};
+
+static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state)
+{
+   switch (state) {
+   case IBTRS_SRV_CONNECTING:
+   return "IBTRS_SRV_CONNECTING";
+   case IBTRS_SRV_CONNECTED:
+   return "IBTRS_SRV_CONNECTED";
+   case IBTRS_SRV_CLOSING:
+   return "IBTRS_SRV_CLOSING";
+   case IBTRS_SRV_CLOSED:
+   return "IBTRS_SRV_CLOSED";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+struct ibtrs_stats_wc_comp {
+   atomic64_t  calls;
+   atomic64_t  total_wc_cnt;
+};
+
+struct ibtrs_srv_stats_rdma_stats {
+   struct {
+   atomic64_t  cnt;
+   atomic64_t  size_total;
+   } dir[2];
+};
+
+struct ibtrs_srv_stats {
+   struct ibtrs_srv_stats_rdma_stats   rdma_stats;
+   atomic_t                            apm_cnt;
+   struct ibtrs_stats_wc_comp          wc_comp;
+};
+
+struct ibtrs_srv_con {
+   struct ibtrs_con    c;
+   atomic_t            wr_cnt;
+};
+
+struct ibtrs_srv_op {
+   struct ibtrs_srv_con        *con;
+   u32                         msg_id;
+   u8                          dir;
+   struct ibtrs_msg_rdma_read  *rd_msg;
+   struct ib_rdma_wr           *tx_wr;
+   struct ib_sge               *tx_sg;
+};
+
+struct ibtrs_srv_mr {
+   struct ib_mr*mr;
+   struct sg_table sgt;
+};
+
+struct ibtrs_srv_sess {
+   struct ibtrs_sess       s;
+   struct ibtrs_srv        *srv;
+   struct work_struct      close_work;
+   enum ibtrs_srv_state    state;
+   spinlock_t              state_lock;
+   int                     cur_cq_vector;
+   struct ibtrs_srv_op     **ops_ids;
+   atomic_t                ids_inflight;
+   wait_queue_head_t       ids_waitq;
+   struct ibtrs_srv_mr     *mrs;
+   unsigned int            mrs_num;
+   dma_addr_t              *dma_addr;
+   bool                    established;
+   unsigned int            mem_bits;
+   struct kobject          kobj;
+   struct kobject          kobj_stats;
+   struct ibtrs_srv_stats  stats;
+};
+
+struct ibtrs_srv {
+   struct list_head        paths_list;
+   int                     paths_up;
+   struct mutex            paths_ev_mutex;
+   size_t                  paths_num;
+   struct mutex            paths_mutex;
+   uuid_t                  paths_uuid;
+   refcount_t              refcount;
+   struct ibtrs_srv_ctx    *ctx;
+   struct list_head        ctx_list;
+   void                    *priv;
+   size_t                  queue_depth;
+   struct page             **chunks;
+   struct device           dev;
+   unsigned                dev_ref;
+   struct kobject          kobj_paths;
+};
+
+struct ibtrs_srv_ctx {
+   rdma_ev_fn *rdma_ev;
+   link_ev_fn *link_ev;
+
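
The header keeps its RDMA statistics in a two-element array indexed by transfer
direction, with one request counter and one byte total per direction. How the
server updates them is not visible in this excerpt, so the following is only a
sketch of one plausible update path, written with C11 atomics in place of the
kernel's atomic64_t. The index assignment (0 for reads, 1 for writes) and the
function name demo_update_rdma_stats() are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed direction indices for the dir[] array. */
enum { DEMO_READ = 0, DEMO_WRITE = 1 };

struct demo_rdma_stats {
	struct {
		atomic_llong cnt;		/* number of requests */
		atomic_llong size_total;	/* bytes transferred */
	} dir[2];
};

/* Account one request of the given size in the given direction. */
void demo_update_rdma_stats(struct demo_rdma_stats *s, size_t size, int dir)
{
	assert(dir == DEMO_READ || dir == DEMO_WRITE);

	/* Counters are independent, so relaxed ordering is sufficient. */
	atomic_fetch_add_explicit(&s->dir[dir].cnt, 1,
				  memory_order_relaxed);
	atomic_fetch_add_explicit(&s->dir[dir].size_total, (long long)size,
				  memory_order_relaxed);
}
```

Indexing by direction keeps the hot path free of branches, and atomic counters
let several completion contexts update the same session without taking a lock.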

[PATCH v3 16/25] ibnbd: client: private header with client structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-client module, mainly for managing IBNBD sessions and mapped block
devices and for creating and destroying sysfs entries.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt.h | 172 
 1 file changed, 172 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.h

diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h
new file mode 100644
index 000000000000..c5f6f08ec338
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.h
@@ -0,0 +1,172 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBNBD_CLT_H
+#define IBNBD_CLT_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_clt_dev_state {
+   DEV_STATE_INIT,
+   DEV_STATE_MAPPED,
+   DEV_STATE_MAPPED_DISCONNECTED,
+   DEV_STATE_UNMAPPED,
+};
+
+struct ibnbd_iu_comp {
+   wait_queue_head_t wait;
+   int errno;
+};
+
+struct ibnbd_iu {
+   union {
+       struct request *rq; /* for block io */
+       void *buf; /* for user messages */
+   };
+   struct ibtrs_tag        *tag;
+   union {
+       /* use to send msg associated with a dev */
+       struct ibnbd_clt_dev *dev;
+       /* use to send msg associated with a sess */
+       struct ibnbd_clt_session *sess;
+   };
+   blk_status_t            status;
+   struct scatterlist      sglist[BMAX_SEGMENTS];
+   struct work_struct      work;
+   int                     errno;
+   struct ibnbd_iu_comp    *comp;
+};
+
+struct ibnbd_cpu_qlist {
+   struct list_head    requeue_list;
+   spinlock_t          requeue_lock;
+   unsigned int        cpu;
+};
+
+struct ibnbd_clt_session {
+   struct list_head        list;
+   struct ibtrs_clt        *ibtrs;
+   wait_queue_head_t       ibtrs_waitq;
+   bool                    ibtrs_ready;
+   struct ibnbd_cpu_qlist  __percpu
+                           *cpu_queues;
+   DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+   int __percpu            *cpu_rr; /* per-cpu var for CPU round-robin */
+   atomic_t                busy;
+   int                     queue_depth;
+   u32                     max_io_size;
+   struct blk_mq_tag_set   tag_set;
+   struct mutex            lock; /* protects state and devs_list */
+   struct list_head        devs_list; /* list of struct ibnbd_clt_dev */
+   refcount_t              refcount;
+   char                    sessname[NAME_MAX];
+   u8                      ver; /* protocol version */
+};
+
+/**
+ * Submission queues.
+ */
+struct ibnbd_queue {
+   struct list_head        requeue_list;
+   unsigned long           in_list;
+   struct ibnbd_clt_dev    *dev;
+   struct blk_mq_hw_ctx    *hctx;
+};
+
+struct ibnbd_clt_dev {
+   struct ibnbd_clt_session    *sess;
+   struct request_queue        *queue;
+   struct ibnbd_queue          *hw_queues;
+   u32                         device_id;
+   /* local Idr index - used to track minor number allocations. */
+   u32                         clt_device_id;
+   struct mutex                lock;
+   enum ibnbd_clt_dev_state    dev_state;
+   enum ibnbd_io_mode          io_mode; /* user requested */
+   enum ibnbd_io_mode          remote_io_mode; /* server really used */
+   char                        pathname[NAME_MAX];
+   enum ibnbd_access_mode      access_mode;
+   bool                        read_only;
+   bool                        rotational;
+   u32                         max_hw_sectors;
+   u32                         max_write_same_sectors;
+   u32 

[PATCH v3 17/25] ibnbd: client: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibnbd-client module, which provides
an interface to map a remote device as a local block device /dev/ibnbd
and feeds IBTRS with IO requests.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt.c | 1817 +++
 1 file changed, 1817 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.c

diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c
new file mode 100644
index 000000000000..d665e144a253
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.c
@@ -0,0 +1,1817 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Network Block Device Client");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/*
+ * This is for closing devices when unloading the module:
+ * we might be closing a lot (>256) of devices in parallel
+ * and it is better not to use the system_wq.
+ */
+static struct workqueue_struct *unload_wq;
+static int ibnbd_client_major;
+static DEFINE_IDA(index_ida);
+static DEFINE_MUTEX(ida_lock);
+static DEFINE_MUTEX(sess_lock);
+static LIST_HEAD(sess_list);
+
+static bool softirq_enable;
+module_param(softirq_enable, bool, 0444);
+MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn."
+" (default: 0)");
+/*
+ * Maximum number of partitions an instance can have.
+ * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself)
+ */
+#define IBNBD_PART_BITS 6
+#define KERNEL_SECTOR_SIZE  512
+
+static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess)
+{
+   return refcount_inc_not_zero(&sess->refcount);
+}
+
+static void free_sess(struct ibnbd_clt_session *sess);
+
+static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(&sess->refcount))
+   free_sess(sess);
+}
+
+static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev)
+{
+   return dev->dev_state == DEV_STATE_MAPPED;
+}
+
+static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(&dev->refcount)) {
+   mutex_lock(&ida_lock);
+   ida_simple_remove(&index_ida, dev->clt_device_id);
+   mutex_unlock(&ida_lock);
+   kfree(dev->hw_queues);
+   ibnbd_clt_put_sess(dev->sess);
+   kfree(dev);
+   }
+}
+
+static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev)
+{
+   return refcount_inc_not_zero(&dev->refcount);
+}
+
+static int ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev,
+ const struct ibnbd_msg_open_rsp *rsp)
+{
+   struct ibnbd_clt_session *sess = dev->sess;
+
+   if (unlikely(!rsp->logical_block_size))
+   return -EINVAL;
+
+   dev->device_id  = le32_to_cpu(rsp->device_id);
+   dev->nsectors   = le64_to_cpu(rsp->nsectors);
+   dev->logical_block_size = le16_to_cpu(rsp->logical_block_size);
+   dev->physical_block_size= le16_to_cpu(rsp->physical_block_size);
+   dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors);
+   dev->max_discard_sectors= le32_to_cpu(rsp->max_discard_sectors);
+   dev->discard_granularity= le32_to_cpu(rsp->discard_granularity);
+   dev->discard_alignment  = le32_to_cpu(rsp->discard_alignment);
+   dev->secure_discard = le16_to_cpu(rsp->secure_discard);
+   dev->rotational = rsp->rotational;
+   dev->remote_io_mode = rsp->io_mode;
+
+   dev->max_hw_sectors = sess->max_io_size / dev->logical_block_size;
+   dev->max_segments = 
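
The get/put helpers in the excerpt above follow the usual kernel pattern: a reference may only be taken while the count is still non-zero, and whoever drops the last reference frees the object. A minimal userspace sketch of that pattern is shown here, assuming C11 atomics as a stand-in for the kernel's refcount_t. The names struct obj, obj_get() and obj_put() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an object guarded by refcount_t. */
struct obj {
	atomic_uint refcount;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool obj_get(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A caller that sees obj_put() return true is the one responsible for freeing the object, which is the role free_sess() and kfree(dev) play in the patch.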

[PATCH v3 13/25] ibtrs: include client and server modules into kernel compilation

2018-06-06 Thread Roman Pen
Add the IBTRS Makefile and Kconfig, and the corresponding lines to the
upper-layer infiniband/ulp files.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/Kconfig|  1 +
 drivers/infiniband/ulp/Makefile   |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20 
 drivers/infiniband/ulp/ibtrs/Makefile | 13 +
 4 files changed, 35 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 2a972ed6851b..10df5d2bb8fe 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -97,6 +97,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)   += srpt/
 obj-$(CONFIG_INFINIBAND_ISER)  += iser/
 obj-$(CONFIG_INFINIBAND_ISERT) += isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)  += opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS) += ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+   tristate
+   depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+   tristate "IBTRS client module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS client allows for simplified data transfer and connection
+ establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+ READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+   tristate "IBTRS server module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS server module processing connection and IO requests received
+ from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..2a145f8d252a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,13 @@
+ibtrs-client-y := ibtrs-clt.o \
+ ibtrs-clt-stats.o \
+ ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+ ibtrs-srv-stats.o \
+ ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)+= ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT) += ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER) += ibtrs-server.o
-- 
2.13.1



[PATCH v3 05/25] ibtrs: client: private header with client structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-client module, mainly for managing IBTRS sessions,
creating/destroying sysfs entries, and accounting statistics on the
client side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 315 +++
 1 file changed, 315 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
new file mode 100644
index ..3212a33a0bf5
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
@@ -0,0 +1,315 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef IBTRS_CLT_H
+#define IBTRS_CLT_H
+
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_clt_state - Client states.
+ */
+enum ibtrs_clt_state {
+   IBTRS_CLT_CONNECTING,
+   IBTRS_CLT_CONNECTING_ERR,
+   IBTRS_CLT_RECONNECTING,
+   IBTRS_CLT_CONNECTED,
+   IBTRS_CLT_CLOSING,
+   IBTRS_CLT_CLOSED,
+   IBTRS_CLT_DEAD,
+};
+
+static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state)
+{
+   switch (state) {
+   case IBTRS_CLT_CONNECTING:
+   return "IBTRS_CLT_CONNECTING";
+   case IBTRS_CLT_CONNECTING_ERR:
+   return "IBTRS_CLT_CONNECTING_ERR";
+   case IBTRS_CLT_RECONNECTING:
+   return "IBTRS_CLT_RECONNECTING";
+   case IBTRS_CLT_CONNECTED:
+   return "IBTRS_CLT_CONNECTED";
+   case IBTRS_CLT_CLOSING:
+   return "IBTRS_CLT_CLOSING";
+   case IBTRS_CLT_CLOSED:
+   return "IBTRS_CLT_CLOSED";
+   case IBTRS_CLT_DEAD:
+   return "IBTRS_CLT_DEAD";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+enum ibtrs_mp_policy {
+   MP_POLICY_RR,
+   MP_POLICY_MIN_INFLIGHT,
+};
+
+struct ibtrs_clt_stats_reconnects {
+   int successful_cnt;
+   int fail_cnt;
+};
+
+struct ibtrs_clt_stats_wc_comp {
+   u32 cnt;
+   u64 total_cnt;
+};
+
+struct ibtrs_clt_stats_cpu_migr {
+   atomic_t from;
+   int to;
+};
+
+struct ibtrs_clt_stats_rdma {
+   struct {
+   u64 cnt;
+   u64 size_total;
+   } dir[2];
+
+   u64 failover_cnt;
+};
+
+struct ibtrs_clt_stats_rdma_lat {
+   u64 read;
+   u64 write;
+};
+
+#define MIN_LOG_SG 2
+#define MAX_LOG_SG 5
+#define MAX_LIN_SG BIT(MIN_LOG_SG)
+#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)
+
+#define MAX_LOG_LAT 16
+#define MIN_LOG_LAT 0
+#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)
+
+struct ibtrs_clt_stats_pcpu {
+   struct ibtrs_clt_stats_cpu_migr cpu_migr;
+   struct ibtrs_clt_stats_rdma rdma;
+   u64 sg_list_total;
+   u64 sg_list_distr[SG_DISTR_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_max;
+   struct ibtrs_clt_stats_wc_comp  wc_comp;
+};
+
+struct ibtrs_clt_stats {
+   bool enable_rdma_lat;
+   struct ibtrs_clt_stats_pcpu __percpu *pcpu_stats;
+   struct ibtrs_clt_stats_reconnects reconnects;
+   atomic_t inflight;
+};
+
+struct ibtrs_clt_con {
+   struct ibtrs_con c;
+   unsigned cpu;
+   atomic_t io_cnt;
+   int cm_err;
+};
+
+/**
+ * ibtrs_tag - tags the memory allocation for future RDMA operation
+ */
+struct ibtrs_tag {
+   enum ibtrs_clt_con_type con_type;
+   unsigned int cpu_id;
+   unsigned int mem_id;
+   unsigned int mem_off;
+};
+
+struct ibtrs_clt_io_req {
+   struct list_head list;
+   struct ibtrs_iu *iu;
+   struct scatterlist  *sglist; 
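
The sizes of the two distribution arrays in struct ibtrs_clt_stats_pcpu follow directly from the macros in the header excerpt. As a quick check of the arithmetic, the sketch below copies those macros into a userspace file, with BIT() redefined locally as a stand-in for the kernel macro. The two accessor functions are hypothetical and exist only to expose the values.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/* Macros copied from the header excerpt above. */
#define MIN_LOG_SG 2
#define MAX_LOG_SG 5
#define MAX_LIN_SG BIT(MIN_LOG_SG)
#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)

#define MAX_LOG_LAT 16
#define MIN_LOG_LAT 0
#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)

/* 5 - 2 + 4 + 2 = 9 slots in sg_list_distr[] */
_Static_assert(SG_DISTR_SZ == 9, "unexpected sg distribution size");
/* 16 - 0 + 2 = 18 slots in rdma_lat_distr[] */
_Static_assert(LOG_LAT_SZ == 18, "unexpected latency distribution size");

/* Hypothetical accessors, only so the values can be inspected at run time. */
static unsigned long sg_distr_sz(void)
{
	return SG_DISTR_SZ;
}

static int log_lat_sz(void)
{
	return LOG_LAT_SZ;
}
```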

[PATCH v3 11/25] ibtrs: server: statistics functions

2018-06-06 Thread Roman Pen
This introduces a set of functions used on the server side to account
for statistics of RDMA data sent/received.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..5933cfc03f95
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+size_t size, int d)
+{
+   atomic64_inc(&s->rdma_stats.dir[d].cnt);
+   atomic64_add(size, &s->rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+   atomic64_inc(&s->wc_comp.calls);
+   atomic64_inc(&s->wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+   struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+
+   memset(r, 0, sizeof(*r));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+   char *page, size_t len)
+{
+   struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(stats, typeof(*sess), stats);
+
+   return scnprintf(page, len, "%lld %lld %lld %lld %u\n",
+(s64)atomic64_read(&r->dir[READ].cnt),
+(s64)atomic64_read(&r->dir[READ].size_total),
+(s64)atomic64_read(&r->dir[WRITE].cnt),
+(s64)atomic64_read(&r->dir[WRITE].size_total),
+atomic_read(&sess->ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+   bool enable)
+{
+   if (enable) {
+   memset(&stats->wc_comp, 0, sizeof(stats->wc_comp));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+char *buf, size_t len)
+{
+   return snprintf(buf, len, "%lld %lld\n",
+   (s64)atomic64_read(&stats->wc_comp.total_wc_cnt),
+   (s64)atomic64_read(&stats->wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+char *page, size_t len)
+{
+   return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+   ibtrs_srv_reset_wc_completion_stats(stats, enable);
+   ibtrs_srv_reset_rdma_stats(stats, enable);
+   return 0;
+   }
+
+   return -EINVAL;
+}
-- 
2.13.1
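
The *_to_str helpers in this patch format into a fixed-size buffer, so the return convention of the formatting call matters: the kernel's scnprintf() returns the number of characters actually stored, while snprintf() returns the length the full output would have had. The following is a userspace sketch of that difference. sketch_scnprintf() is a hypothetical stand-in built on vsnprintf(), not the kernel implementation itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf():
 * returns the number of characters stored in buf, not counting
 * the trailing NUL, even when the output was truncated.
 */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* encoding error: nothing usable */
	if ((size_t)n < size)
		return n;		/* everything fit */
	return size ? (int)(size - 1) : 0;	/* truncated output */
}
```

With an 8-byte buffer and the 12-character output "12345 67890\n", snprintf() reports 12 while the sketch reports 7, which is the value a sysfs show callback would want to return.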



[PATCH v3 12/25] ibtrs: server: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on server side:

  /sys/devices/virtual/ibtrs-server//
*** IBTRS session accepted from a client peer
|
|- paths//
   *** established paths from a client in a session
   |
   |- disconnect
   |  *** disconnect path
   |
   |- hca_name
   |  *** HCA name
   |
   |- hca_port
   |  *** HCA port
   |
   |- stats/
  *** current path statistics
  |
  |- rdma
  |- reset_all
  |- wc_completions

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 307 +
 1 file changed, 307 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..91f664b7eb66
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,307 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = &kobj_sysfs_ops,
+};
+
+static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+attr->attr.name);
+}
+
+static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ibtrs_srv_sess *sess;
+   char str[MAXHOSTNAMELEN];
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+   if (!sysfs_streq(buf, "1")) {
+   ibtrs_err(sess, "%s: invalid value: '%s'\n",
+ attr->attr.name, buf);
+   return -EINVAL;
+   }
+
+   sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str));
+
+   ibtrs_info(sess, "disconnect for path %s requested\n", str);
+   ibtrs_srv_queue_close(sess);
+
+   return count;
+}
+
+static struct kobj_attribute ibtrs_srv_disconnect_attr =
+   __ATTR(disconnect, 0644,
+  ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store);
+
+static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+   struct ibtrs_con *usr_con;
+
+   sess = container_of(kobj, typeof(*sess), kobj);
+   usr_con = sess->s.con[0];
+
+   return scnprintf(page, PAGE_SIZE, "%u\n",
+usr_con->cm_id->port_num);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_port_attr =
+   __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL);
+
+static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+
+   return scnprintf(page, PAGE_SIZE, "%s\n",
+sess->s.dev->ib_dev->name);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_name_attr =
+   __ATTR(hca_name, 0444, ibtrs_srv_hca_name_show, NULL);
+
+static ssize_t ibtrs_srv_src_addr_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+   int cnt;
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+   cnt = 
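
Every show/store callback in the excerpt above recovers its session from the embedded kobject with container_of(). A small userspace sketch of that pointer arithmetic follows, assuming a simplified container_of() without the kernel's type checking. struct kobj_stub, struct sess_stub and to_sess() are hypothetical stand-ins for struct kobject, struct ibtrs_srv_sess and the inline lookups in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): no type checking, unlike the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct kobject. */
struct kobj_stub {
	int dummy;
};

/* Hypothetical stand-in for struct ibtrs_srv_sess. */
struct sess_stub {
	int id;
	struct kobj_stub kobj;	/* embedded, as in the patch */
};

/* Map a pointer to the embedded member back to its enclosing struct. */
static struct sess_stub *to_sess(struct kobj_stub *kobj)
{
	return container_of(kobj, struct sess_stub, kobj);
}
```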

[PATCH v3 14/25] ibtrs: a bit of documentation

2018-06-06 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/README | 390 
 1 file changed, 390 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/README

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..d9d8cd69d44f
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,390 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (Infiniband Network Block Device) modules.
+
+==
+Client Sysfs Interface
+==
+
+This chapter describes only the most important files of sysfs interface
+on client side.
+
+Entries under /sys/devices/virtual/ibtrs-client/
+
+
+When a user of IBTRS API creates a new session, a directory entry with
+the name of that session is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//
+===
+
+add_path (RW)
+-
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+  <[source addr,]destination addr>
+
+  *addr ::= [ ip: | gid: ]
+
+max_reconnect_attempts (RW)
+---
+
+Maximum number of reconnect attempts the client should make before giving up
+after the connection breaks unexpectedly.
+
+mp_policy (RW)
+--
+
+Multipath policy specifies which path should be selected on each IO:
+
+   round-robin (0):
+   select path in per CPU round-robin manner.
+
+   min-inflight (1):
+   select path with minimum inflights.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths/
+=
+
+
+Each path belonging to a given session is listed here by its source and
+destination address. When a new path is added to a session by writing to
+the "add_path" entry, a directory  is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//
+===
+
+state (R)
+-
+
+Contains "connected" if the session is connected to the peer and fully
+functional.  Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--
+
+Write "1" to the file in order to reconnect the path.
+Operation is blocking and returns 0 if reconnect was successful.
+
+disconnect (RW)
+---
+
+Write "1" to the file in order to disconnect the path.
+Operation blocks until IBTRS path is disconnected.
+
+remove_path (RW)
+
+
+Write "1" to the file in order to disconnect and remove the path
+from the session.  Operation blocks until the path is disconnected
+and removed from the session.
+
+hca_name (R)
+
+
+Contains the name of the HCA the connection is established on.
+
+hca_port (R)
+
+
+Contains the port number of the active port that traffic is going through.
+
+src_addr (R)
+
+
+Contains the source address of the path
+
+dst_addr (R)
+
+
+Contains the destination address of the path
+
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//stats/
+=
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---
+
+Data to be transferred via RDMA is passed to IBTRS as scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists, that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+--
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If that is
+not the case, an I/O response could be processed on a different CPU
+than the one where it was originally submitted.  This file shows
+how many interrupts 
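
The two "mp_policy" values described earlier in the README (round-robin and min-inflight) can be illustrated with a short sketch. This is not the IBTRS implementation: struct path_stub and both pick_* functions are hypothetical, and the round-robin variant keeps a single cursor, whereas the README describes a per-CPU round-robin that would need one cursor per CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-path state: number of requests currently in flight. */
struct path_stub {
	unsigned int inflight;
};

/* min-inflight: pick the path with the fewest outstanding requests. */
static size_t pick_min_inflight(const struct path_stub *paths, size_t n)
{
	size_t i, best = 0;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (paths[i].inflight < paths[best].inflight)
			best = i;
	return best;
}

/* round-robin: hand out paths in turn, advancing the caller's cursor. */
static size_t pick_round_robin(size_t *cursor, size_t n)
{
	size_t idx;

	assert(n > 0);
	idx = *cursor % n;
	*cursor = idx + 1;
	return idx;
}
```

On ties, the min-inflight sketch keeps the lowest-numbered path. That tie-break is a choice made for the example, not something the README specifies.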

[PATCH v3 07/25] ibtrs: client: statistics functions

2018-06-06 Thread Roman Pen
This introduces a set of functions used on the client side to account
for statistics of RDMA data sent/received, the number of IOs inflight,
latency, CPU migrations, etc.  Almost all statistics are collected
using percpu variables.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+   int id = ms ? ilog2(ms) - MIN_LOG_LAT + 1 : 0;
+
+   return clamp(id, 0, LOG_LAT_SZ - 1);
+}
+
+void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read,
+  unsigned long ms)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+   int id;
+
+   id = ibtrs_clt_ms_to_id(ms);
+   s = this_cpu_ptr(stats->pcpu_stats);
+   if (read) {
+   s->rdma_lat_distr[id].read++;
+   if (s->rdma_lat_max.read < ms)
+   s->rdma_lat_max.read = ms;
+   } else {
+   s->rdma_lat_distr[id].write++;
+   if (s->rdma_lat_max.write < ms)
+   s->rdma_lat_max.write = ms;
+   }
+}
+
+void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats)
+{
+   atomic_dec(&stats->inflight);
+}
+
+void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con)
+{
+   struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess);
+   struct ibtrs_clt_stats *stats = &sess->stats;
+   struct ibtrs_clt_stats_pcpu *s;
+   int cpu;
+
+   cpu = raw_smp_processor_id();
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->wc_comp.cnt++;
+   s->wc_comp.total_cnt++;
+   if (unlikely(con->cpu != cpu)) {
+   s->cpu_migr.to++;
+
+   /* Careful here, override s pointer */
+   s = per_cpu_ptr(stats->pcpu_stats, con->cpu);
+   atomic_inc(&s->cpu_migr.from);
+   }
+}
+
+void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->rdma.failover_cnt++;
+}
+
+static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats)
+{
+   u32 cnt = 0;
+   u64 sum = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = per_cpu_ptr(stats->pcpu_stats, cpu);
+   sum += s->wc_comp.total_cnt;
+   cnt += s->wc_comp.cnt;
+   }
+
+   return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats,
+char *buf, size_t len)
+{
+   return scnprintf(buf, len, "%u\n",
+ibtrs_clt_stats_get_avg_wc_cnt(stats));
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats,
+ char *page, size_t len)
+{
+   struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat max;
+   struct ibtrs_clt_stats_pcpu *s;
+
+   ssize_t cnt = 0;
+   int i, cpu;
+
+   max.write = 0;
+   max.read = 0;
+   for_each_possible_cpu(cpu) {
+   s = per_cpu_ptr(stats->pcpu_stats, cpu);
+
+   if (max.write < s->rdma_lat_max.write)
+   max.write = s->rdma_lat_max.write;
+   if (max.read < s->rdma_lat_max.read)
+   max.read = s->rdma_lat_max.read;
+   
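
The latency histogram above is indexed by ibtrs_clt_ms_to_id(), which maps a latency in milliseconds to a logarithmic bucket: 0 ms goes to bucket 0, and every other value goes to ilog2(ms) + 1, clamped to the last bucket. A userspace sketch of the same mapping follows. ilog2_sketch() is a hypothetical loop-based replacement for the kernel's ilog2(), and the clamp is written out by hand.

```c
#include <assert.h>

/* Macros copied from the client header in this series. */
#define MAX_LOG_LAT 16
#define MIN_LOG_LAT 0
#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)

/* floor(log2(v)) for v > 0; stand-in for the kernel's ilog2(). */
static int ilog2_sketch(unsigned long v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/* Same mapping as ibtrs_clt_ms_to_id(), with clamp() expanded. */
static int ms_to_id(unsigned long ms)
{
	int id = ms ? ilog2_sketch(ms) - MIN_LOG_LAT + 1 : 0;

	if (id < 0)
		id = 0;
	if (id > LOG_LAT_SZ - 1)
		id = LOG_LAT_SZ - 1;
	return id;
}
```

With these constants, bucket 1 holds 1 ms, bucket 2 holds 2 to 3 ms, bucket 3 holds 4 to 7 ms, and everything from 65536 ms upward lands in the last bucket, 17.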

[PATCH v3 06/25] ibtrs: client: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session and does multipathing,
load balancing and failover of RDMA requests.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 2844 ++
 1 file changed, 2844 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..dc0327a95ef6
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,2844 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MAX_SEGMENTS 31
+#define IBTRS_CONNECT_TIMEOUT_MS 5000
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Client");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static ushort nr_cons_per_session;
+module_param(nr_cons_per_session, ushort, 0444);
+MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session."
+" (default: nr_cpu_ids)");
+
+static int retry_cnt = 7;
+module_param_named(retry_cnt, retry_cnt, int, 0644);
+MODULE_PARM_DESC(retry_cnt, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR_CNT) ", max: "
+__stringify(MAX_RTR_CNT) ")");
+
+static int __read_mostly noreg_cnt = 0;
+module_param_named(noreg_cnt, noreg_cnt, int, 0444);
+MODULE_PARM_DESC(noreg_cnt, "Max number of SG entries when MR registration "
+"does not happen (default: 0)");
+
+static const struct ibtrs_ib_dev_pool_ops dev_pool_ops;
+static struct ibtrs_ib_dev_pool dev_pool = {
+   .ops = &dev_pool_ops
+};
+static struct workqueue_struct *ibtrs_wq;
+static struct class *ibtrs_dev_class;
+
+static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con);
+static int ibtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id,
+struct rdma_cm_event *ev);
+static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc);
+static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno,
+ bool notify, bool can_wait);
+static int ibtrs_clt_write_req(struct ibtrs_clt_io_req *req);
+static int ibtrs_clt_read_req(struct ibtrs_clt_io_req *req);
+
+bool ibtrs_clt_sess_is_connected(const struct ibtrs_clt_sess *sess)
+{
+   return sess->state == IBTRS_CLT_CONNECTED;
+}
+
+static inline bool ibtrs_clt_is_connected(const struct ibtrs_clt *clt)
+{
+   struct ibtrs_clt_sess *sess;
+   bool connected = false;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
+   connected |= ibtrs_clt_sess_is_connected(sess);
+   rcu_read_unlock();
+
+   return connected;
+}
+
+static inline struct ibtrs_tag *
+__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type)
+{
+   size_t max_depth = clt->queue_depth;
+   struct ibtrs_tag *tag;
+   int cpu, bit;
+
+   cpu = get_cpu();
+   do {
+   bit = find_first_zero_bit(clt->tags_map, max_depth);
+   if (unlikely(bit >= max_depth)) {
+   put_cpu();
+   return NULL;
+   }
+
+   } while (unlikely(test_and_set_bit_lock(bit, clt->tags_map)));
+   put_cpu();
+
+   tag = GET_TAG(clt, bit);
+   WARN_ON(tag->mem_id != bit);
+   tag->cpu_id = cpu;
+   tag->con_type = con_type;
+
+   return tag;
+}
+
+static inline void __ibtrs_put_tag(struct ibtrs_clt *clt,
+  struct ibtrs_tag *tag)
+{
+   clear_bit_unlock(tag->mem_id, 
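
__ibtrs_get_tag() above allocates a tag by searching the bitmap for a clear bit and then claiming it atomically, retrying if another CPU won the race. The sketch below shows the same search-then-claim loop in userspace, assuming a single 64-bit word and C11 atomics in place of find_first_zero_bit() and test_and_set_bit_lock(). The names tag_get(), tag_put() and MAX_TAGS are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_TAGS 64	/* one 64-bit word; the kernel bitmap can be larger */

/* Returns the claimed bit index, or -1 if all 'depth' tags are in use. */
static int tag_get(_Atomic unsigned long long *map, unsigned int depth)
{
	assert(depth <= MAX_TAGS);

	for (;;) {
		unsigned long long cur = atomic_load(map);
		unsigned int bit = 0;

		/* Find the first clear bit in the snapshot. */
		while (bit < depth && (cur & (1ULL << bit)))
			bit++;
		if (bit >= depth)
			return -1;

		/* Claim it; if the bit was already set, someone raced us. */
		if (!(atomic_fetch_or(map, 1ULL << bit) & (1ULL << bit)))
			return (int)bit;
	}
}

/* Release a previously claimed tag. */
static void tag_put(_Atomic unsigned long long *map, unsigned int bit)
{
	atomic_fetch_and(map, ~(1ULL << bit));
}
```

The sketch uses sequentially consistent operations for simplicity, where the kernel code uses the acquire/release variants test_and_set_bit_lock() and clear_bit_unlock().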

[PATCH v3 08/25] ibtrs: client: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on client side:

  /sys/devices/virtual/ibtrs-client//
*** IBTRS session created by ibtrs_clt_open() API call
|
|- max_reconnect_attempts
|  *** number of reconnect attempts for session
|
|- add_path
|  *** adds another connection path into IBTRS session
|
|- paths//
   *** established paths to server in a session
   |
   |- disconnect
   |  *** disconnect path
   |
   |- reconnect
   |  *** reconnect path
   |
   |- remove_path
   |  *** remove current path
   |
   |- state
   |  *** retrieve current path state
   |
   |- hca_port
   |  *** HCA port number
   |
   |- hca_name
   |  *** HCA name
   |
   |- stats/
  *** current path statistics
  |
  |- cpu_migration
  |- rdma
  |- rdma_lat
  |- reconnects
  |- reset_all
  |- sg_entries
  |- wc_completions

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 520 +
 1 file changed, 520 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
new file mode 100644
index ..a25763a29a17
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
@@ -0,0 +1,520 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MIN_MAX_RECONN_ATT -1
+#define MAX_MAX_RECONN_ATT 
+
+static struct kobj_type ktype = {
+   .sysfs_ops = &kobj_sysfs_ops,
+};
+
+static ssize_t max_reconnect_attempts_show(struct device *dev,
+  struct device_attribute *attr,
+  char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt));
+}
+
+static ssize_t max_reconnect_attempts_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf,
+   size_t count)
+{
+   struct ibtrs_clt *clt;
+   int value;
+   int ret;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   ret = kstrtoint(buf, 10, &value);
+   if (unlikely(ret)) {
+   ibtrs_err(clt, "%s: failed to convert string '%s' to int\n",
+ attr->attr.name, buf);
+   return ret;
+   }
+   if (unlikely(value > MAX_MAX_RECONN_ATT ||
+value < MIN_MAX_RECONN_ATT)) {
+   ibtrs_err(clt, "%s: invalid range"
+ " (provided: '%s', accepted: min: %d, max: %d)\n",
+ attr->attr.name, buf, MIN_MAX_RECONN_ATT,
+ MAX_MAX_RECONN_ATT);
+   return -EINVAL;
+   }
+   ibtrs_clt_set_max_reconnect_attempts(clt, value);
+
+   return count;
+}
+
+static DEVICE_ATTR_RW(max_reconnect_attempts);
+
+static ssize_t mpath_policy_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   switch (clt->mp_policy) {
+   case MP_POLICY_RR:
+   return sprintf(page, "round-robin (RR: %d)\n", clt->mp_policy);
+   case MP_POLICY_MIN_INFLIGHT:
+   return sprintf(page, "min-inflight (MI: %d)\n", clt->mp_policy);
+   default:
+   return sprintf(page, "Unknown (%d)\n", 

[PATCH v3 04/25] ibtrs: core: lib functions shared between client and server modules

2018-06-06 Thread Roman Pen
This is a set of library functions that live in the ibtrs-core module
and are used by the client and server modules.

Mainly these functions wrap IB and RDMA calls and provide a slightly
higher-level abstraction for implementing the IBTRS protocol on the
client or server side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 611 +++
 1 file changed, 611 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..11302408b13c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,611 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-pri.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Core");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask,
+   struct ib_device *dma_dev,
+   enum dma_data_direction direction,
+   void (*done)(struct ib_cq *cq,
+struct ib_wc *wc))
+{
+   struct ibtrs_iu *iu;
+
+   iu = kmalloc(sizeof(*iu), gfp_mask);
+   if (unlikely(!iu))
+   return NULL;
+
+   iu->buf = kzalloc(size, gfp_mask);
+   if (unlikely(!iu->buf))
+   goto err1;
+
+   iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction);
+   if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr)))
+   goto err2;
+
+   iu->cqe.done  = done;
+   iu->size  = size;
+   iu->direction = direction;
+   iu->tag   = tag;
+
+   return iu;
+
+err2:
+   kfree(iu->buf);
+err1:
+   kfree(iu);
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_alloc);
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+  struct ib_device *ibdev)
+{
+   if (!iu)
+   return;
+
+   ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir);
+   kfree(iu->buf);
+   kfree(iu);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_free);
+
+int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+   struct ibtrs_sess *sess = con->sess;
+   struct ib_recv_wr wr, *bad_wr;
+   struct ib_sge list;
+
+   list.addr   = iu->dma_addr;
+   list.length = iu->size;
+   list.lkey   = sess->dev->ib_pd->local_dma_lkey;
+
+   if (WARN_ON(list.length == 0)) {
+   ibtrs_wrn(con, "Posting receive work request failed,"
+ " sg list is empty\n");
+   return -EINVAL;
+   }
+
+   wr.next= NULL;
+   wr.wr_cqe  = &iu->cqe;
+   wr.sg_list = &list;
+   wr.num_sge = 1;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv);
+
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr, *bad_wr;
+
+   wr.next= NULL;
+   wr.wr_cqe  = cqe;
+   wr.sg_list = NULL;
+   wr.num_sge = 0;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty);
+
+int ibtrs_post_recv_empty_x2(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr_arr[2], *wr, *bad_wr;
+   int i;
+
+   memset(wr_arr, 0, sizeof(wr_arr));
+   for (i = 0; i < ARRAY_SIZE(wr_arr); i++) {
+   wr = &wr_arr[i];
+   wr->wr_cqe  = cqe;
+   if (i)
+   /* Chain backwards */
+   wr->next = &wr_arr[i - 1];
+   }
+
+   return ib_post_recv(con->qp, wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty_x2);
+
+int ibtrs_iu_post_send(struct ibtrs_con *con, 

Re: [PATCH] block: always set partition number to '0' in blk_partition_remap()

2018-06-06 Thread Hannes Reinecke
On Wed, 6 Jun 2018 08:26:56 -0600
Jens Axboe  wrote:

> On 6/6/18 8:22 AM, Hannes Reinecke wrote:
> > blk_partition_remap() will only clear bi_partno if an actual
> > remapping has happened. But flush requests et al. don't have an
> > actual size, so the remapping doesn't happen and bi_partno is never
> > cleared. So for stacked devices blk_partition_remap() will be
> > called on each level. If (as is the case for native nvme
> > multipathing) one of the lower-level devices does _not_ support
> > partitioning, a spurious I/O error is generated.
> 
> Just move it down, we're now clearing it for both cases.
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3f56be15f17e..cf0ee764b908 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2220,10 +2220,10 @@ static inline int blk_partition_remap(struct bio *bio)
>   if (bio_check_eod(bio, part_nr_sects_read(p)))
>   goto out;
>   bio->bi_iter.bi_sector += p->start_sect;
> - bio->bi_partno = 0;
>   trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
>   bio->bi_iter.bi_sector - p->start_sect);
>   }
> + bio->bi_partno = 0;
>   ret = 0;
>  out:
>   rcu_read_unlock();
> 

Okay, will be resending.

Cheers,

Hannes



Re: [PATCH] block: pass failfast and driver-specific flags to flush requests

2018-06-06 Thread Christoph Hellwig
On Wed, Jun 06, 2018 at 04:21:40PM +0200, Hannes Reinecke wrote:
> If flush requests are being sent to the device we need to inherit the
> failfast and driver-specific flags, too, otherwise I/O will fail.

Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH] block: always set partition number to '0' in blk_partition_remap()

2018-06-06 Thread Christoph Hellwig
On Wed, Jun 06, 2018 at 08:26:56AM -0600, Jens Axboe wrote:
> On 6/6/18 8:22 AM, Hannes Reinecke wrote:
> > blk_partition_remap() will only clear bi_partno if an actual remapping
> > has happened. But flush requests et al. don't have an actual size, so
> > the remapping doesn't happen and bi_partno is never cleared.
> > So for stacked devices blk_partition_remap() will be called on each level.
> > If (as is the case for native nvme multipathing) one of the lower-level
> > devices does _not_ support partitioning, a spurious I/O error is generated.
> 
> Just move it down, we're now clearing it for both cases.

Agreed.


Re: [PATCH blktests 1/9] blktests: add hepler functions for new md tests

2018-06-06 Thread Johannes Thumshirn
On Wed, Jun 06, 2018 at 08:29:25AM -0600, Jens Axboe wrote:
> Hopefully this can be the start of migrating over those tests!

Yes this would be great. I just wanted to connect the submitter and
the md developers and make them aware of possibly duplicated efforts
;-).

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH blktests 1/9] blktests: add hepler functions for new md tests

2018-06-06 Thread Jens Axboe
On 6/6/18 2:44 AM, Johannes Thumshirn wrote:
> On Wed, Jun 06, 2018 at 04:06:40PM +0800, bingjingc wrote:
>> We'd like to leverage this test framework for testing linux raid
>> software. There are several resync tasks in md/raid. For this commit,
>> we are trying to add creation resync and basic recovery tests for
>> every raid type.
>>
>> RAID is different from other block devices. It requires several
>> raid devices and hotspare devices in order to be assembled, disassembled,
>> expanded or recovered at runtime. So we don't test devices
>> iteratively from the TEST_DEVS list. We define RAID_DEVS and
>> RAID_SPARE_DEVS lists for providing block devices instead.
>>
>> We want to test the software not devices. We also provide a
>> LIMIT_DEV_SIZE option for limiting the tested array size by limiting
>> used space for each block device.
>>
>> [Getting Started]
>>
>> Additional dependencies are also minimal:
>> - mdadm
>> - cmp
>>
>> And please provide a file named config:
>> RAID_DEVS=(/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4)
>> RAID_SPARE_DEVS=(/dev/loop100 /dev/loop101)
>> LIMIT_DEV_SIZE=20480 # optional
>>
>> And as root, you can run the md set of tests by typing
>> ./check md
>>
>> Anyone who doesn't want to run the md tests can leave RAID_DEVS
>> and RAID_SPARE_DEVS unassigned; all md tests will then be skipped.
> 
> While I'm very much in support of the idea, please be aware that md
> has its own test suite, and please Cc the md mailing list as
> well.

Hopefully this can be the start of migrating over those tests!

-- 
Jens Axboe



Re: [PATCH] block: pass failfast and driver-specific flags to flush requests

2018-06-06 Thread Jens Axboe
On 6/6/18 8:21 AM, Hannes Reinecke wrote:
> If flush requests are being sent to the device we need to inherit the
> failfast and driver-specific flags, too, otherwise I/O will fail.

Looks good to me.

-- 
Jens Axboe



Re: [PATCH] block: always set partition number to '0' in blk_partition_remap()

2018-06-06 Thread Jens Axboe
On 6/6/18 8:22 AM, Hannes Reinecke wrote:
> blk_partition_remap() will only clear bi_partno if an actual remapping
> has happened. But flush requests et al. don't have an actual size, so
> the remapping doesn't happen and bi_partno is never cleared.
> So for stacked devices blk_partition_remap() will be called on each level.
> If (as is the case for native nvme multipathing) one of the lower-level
> devices does _not_ support partitioning, a spurious I/O error is generated.

Just move it down, we're now clearing it for both cases.

diff --git a/block/blk-core.c b/block/blk-core.c
index 3f56be15f17e..cf0ee764b908 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2220,10 +2220,10 @@ static inline int blk_partition_remap(struct bio *bio)
if (bio_check_eod(bio, part_nr_sects_read(p)))
goto out;
bio->bi_iter.bi_sector += p->start_sect;
-   bio->bi_partno = 0;
trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
  bio->bi_iter.bi_sector - p->start_sect);
}
+   bio->bi_partno = 0;
ret = 0;
 out:
rcu_read_unlock();

-- 
Jens Axboe



Re: blktests block/019 lead system hang

2018-06-06 Thread Keith Busch
On Wed, Jun 06, 2018 at 01:42:15PM +0800, Yi Zhang wrote:
> Here is the output, and I can see "HotPlug+ Surprise+" on SltCap

Thanks. That looks like a perfectly capable port. I even have the same
switch in one of my machines, but the test doesn't trigger fatal
firmware-first errors.

Might need to query something about the platform to know how it treats
link-downs before proceeding with the test (don't know off the top of
my head; will do some digging).


[PATCH] block: always set partition number to '0' in blk_partition_remap()

2018-06-06 Thread Hannes Reinecke
blk_partition_remap() will only clear bi_partno if an actual remapping
has happened. But flush requests et al. don't have an actual size, so
the remapping doesn't happen and bi_partno is never cleared.
So for stacked devices blk_partition_remap() will be called on each level.
If (as is the case for native nvme multipathing) one of the lower-level
devices does _not_ support partitioning, a spurious I/O error is generated.
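
To make the failure mode concrete, here is a small userspace toy model
of the behaviour described above. It is a sketch, not kernel code: the
toy_* types and functions are hypothetical stand-ins, and both the early
return for partno == 0 and the failing lookup on a device without
partitions are modelling assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bio. */
struct toy_bio {
	unsigned long sector;	/* start sector, relative to partno */
	unsigned int nr_sects;	/* 0 models a flush-like request */
	unsigned int partno;	/* 0 means "whole device" */
};

/* Hypothetical stand-in for one level of a device stack. */
struct toy_disk {
	bool has_partitions;		/* false: no partition support */
	unsigned long start_sect[4];	/* start of partition n, [0] unused */
};

/*
 * Model of the remap step. Returns 0 on success and -1 for the
 * spurious I/O error. With always_clear == false, partno is only
 * cleared when a real remap happened; with always_clear == true
 * it is cleared in both cases.
 */
static int toy_remap(struct toy_bio *bio, const struct toy_disk *disk,
		     bool always_clear)
{
	if (bio->partno == 0)
		return 0;	/* already addressed to the whole device */
	if (!disk->has_partitions)
		return -1;	/* partition lookup fails on this level */

	assert(bio->partno < 4);
	if (bio->nr_sects) {
		bio->sector += disk->start_sect[bio->partno];
		bio->partno = 0;
	} else if (always_clear) {
		bio->partno = 0;
	}
	return 0;
}

/* Send one bio through an upper device and then a lower one. */
static int toy_submit_stacked(struct toy_bio *bio,
			      const struct toy_disk *upper,
			      const struct toy_disk *lower,
			      bool always_clear)
{
	if (toy_remap(bio, upper, always_clear))
		return -1;
	return toy_remap(bio, lower, always_clear);
}
```

In this model a zero-sized bio aimed at partition 1 fails on the lower
device when always_clear is false and passes when it is true, while a
bio with a real size is remapped once and passes either way.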

Signed-off-by: Hannes Reinecke 
---
 block/blk-core.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cee03cad99f2..8a2c3a474234 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2225,7 +2225,10 @@ static inline int blk_partition_remap(struct bio *bio)
bio->bi_partno = 0;
trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
  bio->bi_iter.bi_sector - p->start_sect);
-   }
+   } else
+   /* Set partition number to '0' to avoid repetitive calls */
+   bio->bi_partno = 0;
+
ret = 0;
 out:
rcu_read_unlock();
-- 
2.12.3



[PATCH] block: pass failfast and driver-specific flags to flush requests

2018-06-06 Thread Hannes Reinecke
If flush requests are being sent to the device we need to inherit the
failfast and driver-specific flags, too, otherwise I/O will fail.
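
As a rough illustration of the flag inheritance, the masking can be
modelled in plain C as below. The TOY_* constants are made-up bit
positions chosen for this sketch and do not match the kernel's real
REQ_* values; toy_flush_flags() is a hypothetical helper.

```c
#include <assert.h>

/* Made-up flag layout for this sketch; not the kernel's REQ_* values. */
#define TOY_OP_MASK		0xffu
#define TOY_OP_FLUSH		2u
#define TOY_FAILFAST_DEV	(1u << 8)
#define TOY_FAILFAST_TRANSPORT	(1u << 9)
#define TOY_FAILFAST_DRIVER	(1u << 10)
#define TOY_SYNC		(1u << 11)
#define TOY_PREFLUSH		(1u << 12)
#define TOY_DRV			(1u << 13)

#define TOY_FAILFAST_MASK \
	(TOY_FAILFAST_DEV | TOY_FAILFAST_TRANSPORT | TOY_FAILFAST_DRIVER)

/*
 * Build the flags for the flush request from the flags of the request
 * that triggered it: start from flush + preflush, then copy over only
 * the failfast and driver-specific bits.
 */
static unsigned int toy_flush_flags(unsigned int orig_flags)
{
	unsigned int flags = TOY_OP_FLUSH | TOY_PREFLUSH;

	flags |= orig_flags & (TOY_DRV | TOY_FAILFAST_MASK);

	/* the inherited bits must never disturb the opcode */
	assert((flags & TOY_OP_MASK) == TOY_OP_FLUSH);
	return flags;
}
```

Unrelated bits of the original request (TOY_SYNC in this model) are
deliberately not copied.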

Signed-off-by: Hannes Reinecke 
---
 block/blk-flush.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index f17170675917..058abdb50f31 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -94,7 +94,7 @@ enum {
 };
 
 static bool blk_kick_flush(struct request_queue *q,
-  struct blk_flush_queue *fq);
+  struct blk_flush_queue *fq, unsigned int flags);
 
 static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
 {
@@ -212,7 +212,7 @@ static bool blk_flush_complete_seq(struct request *rq,
BUG();
}
 
-   kicked = blk_kick_flush(q, fq);
+   kicked = blk_kick_flush(q, fq, rq->cmd_flags);
return kicked | queued;
 }
 
@@ -281,6 +281,7 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
  * @fq: flush queue
+ * @flags: cmd_flags of the original request
  *
  * Flush related states of @q have changed, consider issuing flush request.
  * Please read the comment at the top of this file for more info.
@@ -291,7 +292,8 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
  * RETURNS:
  * %true if flush was issued, %false otherwise.
  */
-static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
+static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
+  unsigned int flags)
 {
 struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
struct request *first_rq =
@@ -346,6 +348,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
}
 
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
+   flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
flush_rq->rq_flags |= RQF_FLUSH_SEQ;
flush_rq->rq_disk = first_rq->rq_disk;
flush_rq->end_io = flush_end_io;
-- 
2.12.3



Re: [PATCH blktests 1/9] blktests: add hepler functions for new md tests

2018-06-06 Thread Johannes Thumshirn
On Wed, Jun 06, 2018 at 04:06:40PM +0800, bingjingc wrote:
> We'd like to leverage this test framework for testing linux raid
> software. There are several resync tasks in md/raid. For this commit,
> we are trying to add creation resync and basic recovery tests for
> every raid type.
> 
> RAID is different from other block devices. It requires several
> raid devices and hotspare devices in order to be assembled, disassembled,
> expanded or recovered at runtime. So we don't test devices
> iteratively from the TEST_DEVS list. We define RAID_DEVS and
> RAID_SPARE_DEVS lists for providing block devices instead.
> 
> We want to test the software not devices. We also provide a
> LIMIT_DEV_SIZE option for limiting the tested array size by limiting
> used space for each block device.
> 
> [Getting Started]
> 
> Additional dependencies are also minimal:
> - mdadm
> - cmp
> 
> And please provide a file named config:
> RAID_DEVS=(/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4)
> RAID_SPARE_DEVS=(/dev/loop100 /dev/loop101)
> LIMIT_DEV_SIZE=20480 # optional
> 
> And as root, you can run the md set of tests by typing
> ./check md
> 
> Anyone who doesn't want to run the md tests can leave RAID_DEVS
> and RAID_SPARE_DEVS unassigned; all md tests will then be skipped.

While I'm very much in support of the idea, please be aware that md
has its own test suite, and please Cc the md mailing list as
well.

Thanks,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH blktests 9/9] blktests: add a regression test for raid6 recovery

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/008 | 69 
 tests/md/008.out |  4 
 2 files changed, 73 insertions(+)
 create mode 100755 tests/md/008
 create mode 100644 tests/md/008.out

diff --git a/tests/md/008 b/tests/md/008
new file mode 100755
index 000..0c85201
--- /dev/null
+++ b/tests/md/008
@@ -0,0 +1,69 @@
+#!/bin/bash
+#
+# RAID6 recovery test
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid6 recovery test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests8 && \
+   _have_program cmp && _have_spares && _meet_raid6_requirement && \
+   _check_raid_devs_available && _check_raid_spares_available
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests8 -R -l6 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests8 -R -l6 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests8)")
+   _wait_sync_completed "$md"
+
+   # blktests doesn't support regex on *.out, so ignore the result.
+   mdadm /dev/md/blktests8 -f "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests8 -r "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests8 -a "${RAID_SPARE_DEVS[0]}" 2>/dev/null
+   _wait_recovery_completed "$md"
+
+   local skip1
+   local skip2
+   local size
+   skip1=$(mdadm -E "${RAID_DEVS[0]}" | grep "Data Offset" | grep -o "[0-9]*")
+   skip2=$(mdadm -E "${RAID_SPARE_DEVS[0]}" | grep "Data Offset" | grep -o "[0-9]*")
+   size=$(cat /sys/block/"$md"/md/component_size)
+   mdadm -S /dev/md/blktests8
+   skip1=$((skip1 * 512))
+   skip2=$((skip2 * 512))
+   size=$((size * 1024))
+   cmp -i "$skip1:$skip2" -n "$size" "${RAID_DEVS[0]}" "${RAID_SPARE_DEVS[0]}"
+
+   echo "Test complete"
+}
diff --git a/tests/md/008.out b/tests/md/008.out
new file mode 100644
index 000..16b68a7
--- /dev/null
+++ b/tests/md/008.out
@@ -0,0 +1,4 @@
+Running md/008
+mdadm: array /dev/md/blktests8 started.
+mdadm: stopped /dev/md/blktests8
+Test complete
-- 
2.7.4



[PATCH blktests 8/9] blktests: add a regression test for raid5 recovery

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/007 | 72 
 tests/md/007.out |  4 
 2 files changed, 76 insertions(+)
 create mode 100755 tests/md/007
 create mode 100644 tests/md/007.out

diff --git a/tests/md/007 b/tests/md/007
new file mode 100755
index 000..ae80229
--- /dev/null
+++ b/tests/md/007
@@ -0,0 +1,72 @@
+#!/bin/bash
+#
+# RAID5 recovery test
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid5 recovery test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests7 && \
+   _have_program cmp && _have_spares && _meet_raid5_requirement && \
+   _check_raid_devs_available && _check_raid_spares_available
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests7 -R --force -l5 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests7 -R --force -l5 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests7)")
+   _wait_sync_completed "$md"
+
+   # blktests doesn't support regex on *.out, so ignore the result.
+   mdadm /dev/md/blktests7 -f "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests7 -r "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests7 -a "${RAID_SPARE_DEVS[0]}" 2>/dev/null
+   _wait_recovery_completed "$md"
+
+   local skip1
+   local skip2
+   local size
+   skip1=$(mdadm -E "${RAID_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   skip2=$(mdadm -E "${RAID_SPARE_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   size=$(cat /sys/block/"$md"/md/component_size)
+   mdadm -S /dev/md/blktests7
+   skip1=$((skip1 * 512))
+   skip2=$((skip2 * 512))
+   size=$((size * 1024))
+   cmp -i "$skip1:$skip2" -n "$size" "${RAID_DEVS[0]}" \
+   "${RAID_SPARE_DEVS[0]}"
+
+   echo "Test complete"
+}
diff --git a/tests/md/007.out b/tests/md/007.out
new file mode 100644
index 000..1c79d9b
--- /dev/null
+++ b/tests/md/007.out
@@ -0,0 +1,4 @@
+Running md/007
+mdadm: array /dev/md/blktests7 started.
+mdadm: stopped /dev/md/blktests7
+Test complete
-- 
2.7.4



[PATCH blktests 7/9] blktests: add a regression test for raid10 recovery

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/006 | 80 
 tests/md/006.out |  4 +++
 2 files changed, 84 insertions(+)
 create mode 100755 tests/md/006
 create mode 100644 tests/md/006.out

diff --git a/tests/md/006 b/tests/md/006
new file mode 100755
index 000..c124913
--- /dev/null
+++ b/tests/md/006
@@ -0,0 +1,80 @@
+#!/bin/bash
+#
+# RAID10 recovery test
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid10 recovery test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests6 && \
+   _have_program cmp && _have_spares && _meet_raid10_requirement && \
+   _check_raid_devs_available && _check_raid_spares_available
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   local size=${#RAID_DEVS[@]}
+   local devices=( "${RAID_DEVS[@]}" )
+
+   if [ $((size % 2)) -eq 1 ]; then
+   unset "devices[${#devices[@]}-1]"
+   size=$((size - 1))
+   fi
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests6 -R -l10 "-n$size" -pn2 \
+   "${devices[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests6 -R -l10 "-n$size" -pn2 \
+   "${devices[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests6)")
+   _wait_sync_completed "$md"
+
+   # blktests doesn't support regex on *.out, so ignore the result.
+   mdadm /dev/md/blktests6 -f "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests6 -r "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests6 -a "${RAID_SPARE_DEVS[0]}" 2>/dev/null
+   _wait_recovery_completed "$md"
+
+   local skip1
+   local skip2
+   local size
+   skip1=$(mdadm -E "${RAID_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   skip2=$(mdadm -E "${RAID_SPARE_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   size=$(cat /sys/block/"$md"/md/component_size)
+   mdadm -S /dev/md/blktests6
+   skip1=$((skip1 * 512))
+   skip2=$((skip2 * 512))
+   size=$((size * 1024))
+   cmp -i "$skip1:$skip2" -n "$size" "${RAID_DEVS[0]}" \
+   "${RAID_SPARE_DEVS[0]}"
+
+   echo "Test complete"
+}
diff --git a/tests/md/006.out b/tests/md/006.out
new file mode 100644
index 000..2eff0b1
--- /dev/null
+++ b/tests/md/006.out
@@ -0,0 +1,4 @@
+Running md/006
+mdadm: array /dev/md/blktests6 started.
+mdadm: stopped /dev/md/blktests6
+Test complete
-- 
2.7.4



[PATCH blktests 6/9] blktests: add a regression test for raid1 recovery

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/005 | 72 
 tests/md/005.out |  4 
 2 files changed, 76 insertions(+)
 create mode 100755 tests/md/005
 create mode 100644 tests/md/005.out

diff --git a/tests/md/005 b/tests/md/005
new file mode 100755
index 000..e526e74
--- /dev/null
+++ b/tests/md/005
@@ -0,0 +1,72 @@
+#!/bin/bash
+#
+# RAID1 recovery test
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid1 recovery test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests5 && \
+   _have_program cmp && _have_spares && _meet_raid1_requirement && \
+   _check_raid_devs_available && _check_raid_spares_available
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests5 -R -l1 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests5 -R -l1 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests5)")
+   _wait_sync_completed "$md"
+
+   # blktests doesn't support regex on *.out, so ignore the result.
+   mdadm /dev/md/blktests5 -f "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests5 -r "${RAID_DEVS[0]}" 2>/dev/null
+   mdadm /dev/md/blktests5 -a "${RAID_SPARE_DEVS[0]}" 2>/dev/null
+   _wait_recovery_completed "$md"
+
+   local skip1
+   local skip2
+   local size
+   skip1=$(mdadm -E "${RAID_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   skip2=$(mdadm -E "${RAID_SPARE_DEVS[0]}" | grep "Data Offset" | \
+   grep -o "[0-9]*")
+   size=$(cat /sys/block/"$md"/md/component_size)
+   mdadm -S /dev/md/blktests5
+   skip1=$((skip1 * 512))
+   skip2=$((skip2 * 512))
+   size=$((size * 1024))
+   cmp -i "$skip1:$skip2" -n "$size" "${RAID_DEVS[0]}" \
+   "${RAID_SPARE_DEVS[0]}"
+
+   echo "Test complete"
+}
diff --git a/tests/md/005.out b/tests/md/005.out
new file mode 100644
index 000..ec5b2bd
--- /dev/null
+++ b/tests/md/005.out
@@ -0,0 +1,4 @@
+Running md/005
+mdadm: array /dev/md/blktests5 started.
+mdadm: stopped /dev/md/blktests5
+Test complete
-- 
2.7.4



[PATCH blktests 5/9] blktests: add a regression test for raid6 creation resync

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/004 | 60 
 tests/md/004.out |  4 
 2 files changed, 64 insertions(+)
 create mode 100755 tests/md/004
 create mode 100644 tests/md/004.out

diff --git a/tests/md/004 b/tests/md/004
new file mode 100755
index 000..591e096
--- /dev/null
+++ b/tests/md/004
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Create a raid6 device from all given devices, and check that they are
+# all in-sync afterwards.
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid6 creation resync test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests4 && \
+   _check_raid_devs_available && _meet_raid6_requirement
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests4 -R -l6 "-n${#RAID_DEVS[@]}"\
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests4 -R -l6 "-n${#RAID_DEVS[@]}"\
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests4)")
+   _wait_sync_completed "$md"
+
+   local mismatched
+   _check_integrity "$md"
+   mismatched=$(cat /sys/block/"$md"/md/mismatch_cnt)
+   mdadm -S /dev/md/blktests4
+
+   if [ "$mismatched" -ne 0 ]; then
+   echo "Array is not synced."
+   fi
+
+   echo "Test complete"
+}
\ No newline at end of file
diff --git a/tests/md/004.out b/tests/md/004.out
new file mode 100644
index 000..1c9b7a8
--- /dev/null
+++ b/tests/md/004.out
@@ -0,0 +1,4 @@
+Running md/004
+mdadm: array /dev/md/blktests4 started.
+mdadm: stopped /dev/md/blktests4
+Test complete
-- 
2.7.4
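
The creation resync tests all end by reading mismatch_cnt and printing a message only when it is non-zero, so the expected .out file stays stable. A small sketch of that check is below. The function name and the SYSFS_ROOT variable are assumptions added so the logic can be tried against a fake directory tree; the body of _check_integrity is not shown in this part of the thread and is not reproduced here.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch. SYSFS_ROOT defaults to the
# real sysfs but can point at a scratch directory for experiments.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Return 0 when mismatch_cnt is zero, 1 when it is not,
# and 2 when the attribute cannot be read.
md_is_synced() {
	local md="$1" mismatched
	mismatched=$(cat "$SYSFS_ROOT/block/$md/md/mismatch_cnt") || return 2
	if [ "$mismatched" -ne 0 ]; then
		echo "Array is not synced. (mismatch_cnt=$mismatched)"
		return 1
	fi
	return 0
}
```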



[PATCH blktests 4/9] blktests: add a regression test for raid5 creation resync

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/003 | 60 
 tests/md/003.out |  4 
 2 files changed, 64 insertions(+)
 create mode 100755 tests/md/003
 create mode 100644 tests/md/003.out

diff --git a/tests/md/003 b/tests/md/003
new file mode 100755
index 000..71c2e56
--- /dev/null
+++ b/tests/md/003
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Create a raid5 device from all given devices, and check that they are
+# all in-sync afterwards.
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid5 creation resync test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests3 && \
+   _check_raid_devs_available && _meet_raid5_requirement
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests3 -R --force -l5 "-n${#RAID_DEVS[@]}"\
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests3 -R --force -l5 "-n${#RAID_DEVS[@]}"\
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests3)")
+   _wait_sync_completed "$md"
+
+   local mismatched
+   _check_integrity "$md"
+   mismatched=$(cat /sys/block/"$md"/md/mismatch_cnt)
+   mdadm -S /dev/md/blktests3
+
+   if [ "$mismatched" -ne 0 ]; then
+   echo "Array is not synced."
+   fi
+
+   echo "Test complete"
+}
\ No newline at end of file
diff --git a/tests/md/003.out b/tests/md/003.out
new file mode 100644
index 000..ae7c984
--- /dev/null
+++ b/tests/md/003.out
@@ -0,0 +1,4 @@
+Running md/003
+mdadm: array /dev/md/blktests3 started.
+mdadm: stopped /dev/md/blktests3
+Test complete
-- 
2.7.4
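
Each test waits for the initial resync to finish before checking the array. One way such a wait could be written is to poll the md sync_action attribute until it reads "idle". This is only a sketch under that assumption: the real _wait_sync_completed lives in common/md and may work differently, and wait_md_idle and SYSFS_ROOT are hypothetical names.

```shell
#!/bin/bash
# Hypothetical stand-in for a wait helper, not the patch's implementation.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Poll sync_action once per second until it reads "idle".
# $1: md device name (e.g. md127), $2: maximum number of polls (default 600).
# Returns 0 when idle, 1 on timeout, 2 if the attribute cannot be read.
wait_md_idle() {
	local md="$1" tries="${2:-600}" action
	while [ "$tries" -gt 0 ]; do
		action=$(cat "$SYSFS_ROOT/block/$md/md/sync_action") || return 2
		if [ "$action" = "idle" ]; then
			return 0
		fi
		sleep 1
		tries=$((tries - 1))
	done
	return 1
}
```

A polling loop like this can return too early if it samples the attribute before the resync has actually started, so a real helper may need an extra check.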



[PATCH blktests 3/9] blktests: add a regression test for raid10 creation resync

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/002 | 68 
 tests/md/002.out |  4 
 2 files changed, 72 insertions(+)
 create mode 100755 tests/md/002
 create mode 100644 tests/md/002.out

diff --git a/tests/md/002 b/tests/md/002
new file mode 100755
index 000..9da322a
--- /dev/null
+++ b/tests/md/002
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Create a raid10 (nearcopy=2) device from all given devices, and check
+# that they are all in-sync afterwards.
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid10 (nearcopy=2) creation resync test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests2 && \
+   _check_raid_devs_available && _meet_raid10_requirement
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   local size=${#RAID_DEVS[@]}
+   local devices=( "${RAID_DEVS[@]}" )
+
+   if [ $((size % 2)) -eq 1 ]; then
+   unset "devices[${#devices[@]}-1]"
+   size=$((size - 1))
+   fi
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests2 -R -l10 "-n$size" -pn2 \
+   "${devices[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests2 -R -l10 "-n$size" -pn2 \
+   "${devices[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests2)")
+   _wait_sync_completed "$md"
+
+   local mismatched
+   _check_integrity "$md"
+   mismatched=$(cat /sys/block/"$md"/md/mismatch_cnt)
+   mdadm -S /dev/md/blktests2
+
+   if [ "$mismatched" -ne 0 ]; then
+   echo "Array is not synced."
+   fi
+
+   echo "Test complete"
+}
diff --git a/tests/md/002.out b/tests/md/002.out
new file mode 100644
index 000..356f3fe
--- /dev/null
+++ b/tests/md/002.out
@@ -0,0 +1,4 @@
+Running md/002
+mdadm: array /dev/md/blktests2 started.
+mdadm: stopped /dev/md/blktests2
+Test complete
-- 
2.7.4
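
md/002 drops the last entry of RAID_DEVS when the list has an odd length, so the near-2 array is always created with an even number of members. The same trimming can be sketched as a standalone function; even_device_list is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: print the given devices,
# dropping the last one when an odd number is supplied.
even_device_list() {
	local devices=( "$@" )
	local size=${#devices[@]}

	if [ $((size % 2)) -eq 1 ]; then
		unset "devices[$((size - 1))]"
	fi
	echo "${devices[@]}"
}
```

For example, calling it with five loop devices prints the first four, which matches the "-n$size" value the test passes to mdadm after the adjustment.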



[PATCH blktests 2/9] blktests: add a regression test for raid1 creation resync

2018-06-06 Thread bingjingc
Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 tests/md/001 | 60 
 tests/md/001.out |  4 
 2 files changed, 64 insertions(+)
 create mode 100755 tests/md/001
 create mode 100644 tests/md/001.out

diff --git a/tests/md/001 b/tests/md/001
new file mode 100755
index 000..8b3070d
--- /dev/null
+++ b/tests/md/001
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Create a raid1 device from all given devices, and check that they are
+# all in-sync afterwards.
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/md
+
+DESCRIPTION="run raid1 creation resync test"
+QUICK=1
+
+requires() {
+   _check_md_devname_available /dev/md/blktests1 && \
+   _check_raid_devs_available && _meet_raid1_requirement
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if [ -n "$LIMIT_DEV_SIZE" ]; then
+   mdadm -C /dev/md/blktests1 -R -l1 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" -z "$LIMIT_DEV_SIZE" 2>&1 | grep started
+   else
+   mdadm -C /dev/md/blktests1 -R -l1 "-n${#RAID_DEVS[@]}" \
+   "${RAID_DEVS[@]}" 2>&1 | grep started
+   fi
+   if [ $? -ne 0 ]; then
+   echo "Array is not started."
+   return
+   fi
+
+   local md
+   md=$(basename "$(readlink /dev/md/blktests1)")
+   _wait_sync_completed "$md"
+
+   local mismatched
+   _check_integrity "$md"
+   mismatched=$(cat /sys/block/"$md"/md/mismatch_cnt)
+   mdadm -S /dev/md/blktests1
+
+   if [ "$mismatched" -ne 0 ]; then
+   echo "Array is not synced."
+   fi
+
+   echo "Test complete"
+}
diff --git a/tests/md/001.out b/tests/md/001.out
new file mode 100644
index 000..0dd94d8
--- /dev/null
+++ b/tests/md/001.out
@@ -0,0 +1,4 @@
+Running md/001
+mdadm: array /dev/md/blktests1 started.
+mdadm: stopped /dev/md/blktests1
+Test complete
-- 
2.7.4



[PATCH blktests 1/9] blktests: add helper functions for new md tests

2018-06-06 Thread bingjingc
We'd like to leverage this test framework for testing Linux software
RAID. There are several resync tasks in md/raid. For this commit,
we add creation resync and basic recovery tests for every raid type.

RAID is different from other block devices. It requires several
raid devices and hot-spare devices so that arrays can be assembled,
disassembled, expanded or recovered at runtime. So we don't iterate
over the devices in the TEST_DEVS list. Instead, we define the
RAID_DEVS and RAID_SPARE_DEVS lists to provide the block devices.

We want to test the software, not the devices. We also provide a
LIMIT_DEV_SIZE option that limits the size of the tested array by
limiting the space used on each block device.

[Getting Started]

Additional dependencies are also minimal:
- mdadm
- cmp

And please provide a file named config:
RAID_DEVS=(/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4)
RAID_SPARE_DEVS=(/dev/loop100 /dev/loop101)
LIMIT_DEV_SIZE=20480 # optional

And as root, you can run the md set of tests by typing
./check md

If you don't want to run the md tests, leave RAID_DEVS and
RAID_SPARE_DEVS unassigned and all md tests will be skipped.
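
As an illustration of the config format described above, the following sketch prints the three settings from a flat list of devices. render_md_config is a hypothetical helper, not something blktests provides, and it assumes device paths contain no whitespace.

```shell
#!/bin/bash
# Hypothetical helper: print config lines for the md tests.
# Usage: render_md_config SPARE_COUNT LIMIT DEV...
# The last SPARE_COUNT devices become RAID_SPARE_DEVS; pass an empty
# LIMIT to leave LIMIT_DEV_SIZE out.
render_md_config() {
	local spare_count="$1" limit="$2"
	shift 2
	local devs=( "$@" )
	local raid_count=$(( ${#devs[@]} - spare_count ))

	# Refuse to build a config with more spares than devices.
	if [ "$raid_count" -lt 0 ]; then
		return 1
	fi
	echo "RAID_DEVS=(${devs[*]:0:raid_count})"
	echo "RAID_SPARE_DEVS=(${devs[*]:raid_count})"
	if [ -n "$limit" ]; then
		echo "LIMIT_DEV_SIZE=$limit"
	fi
}
```

Redirecting the output to a file named config in the blktests directory would give a file of the shape shown above.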

Reviewed-by: Chung-Chiang Cheng 
Signed-off-by: BingJing Chang 
---
 common/md  | 198 +
 tests/md/group |  24 +++
 2 files changed, 222 insertions(+)
 create mode 100644 common/md
 create mode 100644 tests/md/group

diff --git a/common/md b/common/md
new file mode 100644
index 000..38c2554
--- /dev/null
+++ b/common/md
@@ -0,0 +1,198 @@
+#!/bin/bash
+#
+# Default helper functions for MD devices.
+#
+# Copyright (C) 2018 Synology Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+
+_check_md_devname_available() {
+   local dev="$1"
+
+   if [ -b "$dev" ]; then
+   SKIP_REASON="Detected that $1 exists. Stop it before running this test."
+   return 1
+   fi
+
+   return 0
+}
+
+_dev_is_available() {
+   local dev="$1"
+
+   if grep -qw "$dev" /proc/mounts; then
+   SKIP_REASON="Detected that $1 is mounted. (see mount)"
+   return 1
+   fi
+
+   dev=$(basename "$dev")
+   if grep -qw "$dev" /proc/mdstat; then
+   SKIP_REASON="Detected that $1 is in use. (see cat /proc/mdstat)"
+   return 1
+   fi
+
+   return 0
+}
+
+_check_raid_devs_available() {
+   for dev in "${RAID_DEVS[@]}"
+   do
+   if ! _dev_is_available "$dev"; then
+   return 1
+   fi
+   done
+   return 0
+}
+
+_check_raid_spares_available() {
+   for spare in "${RAID_SPARE_DEVS[@]}"
+   do
+   if ! _dev_is_available "$spare"; then
+   return 1
+   fi
+   done
+   return 0
+}
+
+_have_spares() {
+   local size=${#RAID_SPARE_DEVS[@]}
+
+   if [ -z "${RAID_SPARE_DEVS[0]}" ] || [ "$size" -eq 0 ]; then
+   SKIP_REASON="There are no spare devices."
+   SKIP_REASON+=" (RAID_SPARE_DEVS=${RAID_SPARE_DEVS[*]})"
+   return 1
+   fi
+   return 0
+}
+
+_meet_raid1_requirement() {
+   local size=${#RAID_DEVS[@]}
+
+   if ! grep -qw raid1 /proc/mdstat; then
+   SKIP_REASON="RAID1 is not available in /proc/mdstat."
+   return 1
+   fi
+
+   if [ "$size" -lt 2 ]; then
+   SKIP_REASON="RAID1 requires at least 2 devices."
+   SKIP_REASON+=" (RAID_DEVS=${RAID_DEVS[*]})"
+   return 1
+   fi
+   return 0
+}
+
+_meet_raid10_requirement() {
+   local size=${#RAID_DEVS[@]}
+
+   if ! grep -qw raid10 /proc/mdstat; then
+   SKIP_REASON="RAID10 is not available in /proc/mdstat."
+   return 1
+   fi
+
+   if [ "$size" -lt 4 ]; then
+   SKIP_REASON="RAID10 requires at least 4 devices."
+   SKIP_REASON+=" (RAID_DEVS=${RAID_DEVS[*]})"
+   return 1
+   fi
+   return 0
+}
+
+_meet_raid5_requirement() {
+   local size=${#RAID_DEVS[@]}
+
+   if ! grep -qw raid5 /proc/mdstat; then
+   SKIP_REASON="RAID5 is not available in /proc/mdstat."
+   return 1
+   fi
+
+   if [ "$size" -lt 3 ]; then
+   SKIP_REASON="RAID5 requires at least 3 devices."
+   SKIP_REASON+=" (RAID_DEVS=${RAID_DEVS[*]})"
+   return 1
+