Re: [dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.

2017-07-23 Thread Yang Feng
Dear Martin,

Thanks, I got it.
Regards,
-Yang

On 2017/7/21 20:26, Martin Wilck wrote:
> Dear Yang,
> 
> On Fri, 2017-07-21 at 11:38 +0800, Yang Feng wrote:
>>
>> Sorry, I don't find this similar syntax. Could you give a example of
>> the
>> other prioritizer.
> 
> Please look at the prio_args section in multipath.conf. "weighted" uses
> just space separated args. "iet" uses "parm=value" syntax. That's
> actually most intuitive IMO. No current prio algorithm uses anything
> but space to separate options.
> 
> Regards
> Martin
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.

2017-07-20 Thread Yang Feng
Dear Martin,

Thank you very much for your guidance and reviews.
Please find my replys as follows.
The up-to-date patch will be sent later.

Regards,
-Yang

On 2017/7/21 2:07, Martin Wilck wrote:
> Dear Yang,
> 
> On Thu, 2017-07-20 at 11:36 +0800, Yang Feng wrote:
>>   Add args min_avg_latency of logarithmic scale, for
>> prioritizers/path_latency.c.
>> Min average latency is not constant 1us, and is set by user.
>> Certainly, max average
>> latency value is still 100s. It make support better for more scenes,
>> because it can
>> deal better with the normal standard deviation of path latency. For
>> example, when the
>> standard deviation value is 200us and the average latency of the
>> normal paths is 1ms,
>> args base_num can be set to 5 and args min_avg_latency can be set to
>> 2ms, so the paths
>> will be grouped in priority groups with path latency <=2ms, (2ms,
>> 10ms], (10ms, 50ms],
>> etc.
> 
> Nack. You need this because you are using a wrong calculation for
> standard deviation. If your scale is logarithmic, you need to calculate
> the standard deviation on a log scale, too. The scientific term is
> "geometric standard deviation".
> 
> https://en.wikipedia.org/wiki/Geometric_standard_deviation
> 
> Suppose you really have 3 classes of devices with us, ms, and seconds
> average latency, respectively. The latency of the devices in the
> "seconds" class will also vary on the order of seconds (e.g. 3.5-5s).
> That's obviously impossible for the us and ms class devices. Rather,
> it's reasonable to assume that the us devices will have us (or sub-us)
> standard deviation and the ms devices have ms (or sub-ms) standard
> deviation, or in other words, the uncertainty is roughly in the same
> order of magnitude as the average.
>> This is exactly how the geometric standard deviation behaves.
> 
> I think I actually remarked this on your previous patch, but the patch
> has been  applied without that having been fixed. It would be good if
> you could fix it now.
>
Sorry, there is a mistake. I will fix it later.
>>
>> Signed-off-by: Yang Feng <philip.y...@huawei.com>
>> Reviewed-by: Martin Wilck <mwi...@suse.com>
> 
> Please don't add my Reviewed-by: tag for patches I haven't reviewed, or
> have negatively reviewed. Reviewed-by: is supposed to indicate
> approval.
> 
Sorry, it will fixed.
> Wrt your prio_args parser, could you perhaps support a syntax that is
> similar to other prioritizers, e.g. "base_num=5 io_num=10"?
> 
Sorry, I don't find this similar syntax. Could you give a example of the
other prioritizer.
> Regards
> Martin
> 
> 
>> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
>> ---
>>  libmultipath/prioritizers/path_latency.c | 61
>> ++--
>>  multipath/multipath.conf.5   | 14 +---
>>  2 files changed, 52 insertions(+), 23 deletions(-)
>>
>> diff --git a/libmultipath/prioritizers/path_latency.c
>> b/libmultipath/prioritizers/path_latency.c
>> index 34b734b..a71faff 100644
>> --- a/libmultipath/prioritizers/path_latency.c
>> +++ b/libmultipath/prioritizers/path_latency.c
>> @@ -66,7 +66,7 @@ static int do_readsector0(int fd, unsigned int
>> timeout)
>>  return ret;
>>  }
>>  
>> -int check_args_valid(int io_num, int base_num)
>> +int check_args_valid(int io_num, int base_num, int min_avg_latency)
>>  {
>>  if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM))
>>  {
>> @@ -80,24 +80,33 @@ int check_args_valid(int io_num, int base_num)
>>  return 0;
>>  }
>>  
>> +if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency >
>> MAX_AVG_LATENCY))
>> +{
>> +pp_pl_log(0, "args min_avg_latency is outside the valid
>> range");
>> +return 0;
>> +}
>> +
>>  return 1;
>>  }
>>  
>> -/* In multipath.conf, args form: io_num|base_num. For example,
>> -*  args is "20|10", this function can get io_num value 20, and
>> -   base_num value 10.
>> +/* In multipath.conf, args form: io_num|base_num|min_avg_latency.
>> +*  For example, args is "20|10|1", this function can get io_num
>> +*  value 20, base_num value 10, min_avg_latency value 1 (us).
>>  */
>> -static int get_ionum_and_basenum(char *args,
>> - int *ionum,
>> - int *basenum)
>> +static int get_prio_args(char *args,
>> +int *ionum,
>> +int *basen

Re: [dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.

2017-07-20 Thread Yang Feng
Dear Martin,

Thanks a lot for your reviews.
Please find my replys as follows.
And the up-to-date patch will be sent later.

Regards,
-Yang

On 2017/7/21 1:50, Martin Wilck wrote:
> Dear Yang,
> 
> On Thu, 2017-07-20 at 11:36 +0800, Yang Feng wrote:
>> 1. The SCSI-to-NVMe translations have been removed in the patch
>> "nvme:
>> Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl
>> command should be supported in the multipath-tools.
> 
>> 2. In the prioritizers/path_latency.c, modify the func
>> do_readsector0():
>> send a native NVMe Read Ioctl command to the nvme device, and send a
>> SG
>> Read Ioctl command to the scsi device.
> 
> As noted previously, I would prefer to use a directio command here. I
> see no advantage of using sg or nvme ioctls over directio.
> 
Thanks, insteadly, directio will be used.

>> 3. In the tur checker, add support for the native NVMe Keep Alive
>> Ioctl
>> command to the nvme device.
> 
> No, please don't. I'm still with Xose here. There's no need to use the
> same checker for NVMe and SCSI devices. This is what we have the
> hwtable for. Keep the tur checker as it was before, create a keepalive
> checker for NVMe, and use hwtable entries to match them appropriately
> to devices.
> 
OK, it will be fixed.

> See below for some details. 
> NAK from my side for the patch in this form.
> 
> Martin
> 
>>
>> Signed-off-by: Yang Feng <philip.y...@huawei.com>
>> Reviewed-by: Martin Wilck <mwi...@suse.com>
>> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
>> ---
>>  libmultipath/checkers.c  |   7 ++
>>  libmultipath/checkers.h  |   4 +
>>  libmultipath/checkers/Makefile   |   4 +-
>>  libmultipath/checkers/emc_clariion.c |   4 +-
>>  libmultipath/checkers/libsg.c|  94 ---
> 
> IIRC you did this in your previous patch already. Can you give a good
> reason for moving this code? It makes your patch unnecessarily big and
> messes up git history. If it's REALLY necessary, separate it out,
> please.
>
OK, the libsg.c/libsg.h will not be moved.
> 
>> ---
>>  libmultipath/checkers/libsg.h|   9 ---
>>  libmultipath/checkers/readsector0.c  |   4 +-
>>  libmultipath/checkers/tur.c  |  64 ++-
>>  libmultipath/discovery.c |   1 +
>>  libmultipath/libnvme.c   | 130
>> +++
>>  libmultipath/libnvme.h   |  10 +++
>>  libmultipath/libsg.c | 113
>> +++
>>  libmultipath/libsg.h |  13 
>>  libmultipath/prioritizers/Makefile   |   2 +-
>>  libmultipath/prioritizers/path_latency.c |  34 +---
>>  multipath/multipath.conf.5   |   2 +-
>>  16 files changed, 354 insertions(+), 141 deletions(-)
>>  delete mode 100644 libmultipath/checkers/libsg.c
>>  delete mode 100644 libmultipath/checkers/libsg.h
>>  create mode 100644 libmultipath/libnvme.c
>>  create mode 100644 libmultipath/libnvme.h
>>  create mode 100644 libmultipath/libsg.c
>>  create mode 100644 libmultipath/libsg.h
>>
>> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
>> index 05e024f..00fbd6e 100644
>> --- a/libmultipath/checkers.c
>> +++ b/libmultipath/checkers.c
>> @@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd)
>>  c->fd = fd;
>>  }
>>  
>> +void checker_set_dev(struct checker *c, char *dev)
>> +{
>> +if (!c)
>> +return;
>> +strncpy(c->dev, dev, strlen(dev)+1);
>> +}
>> +
>>  void checker_set_sync (struct checker * c)
>>  {
>>  if (!c)
>> diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
>> index 1d225de..f924624 100644
>> --- a/libmultipath/checkers.h
>> +++ b/libmultipath/checkers.h
>> @@ -97,6 +97,8 @@ enum path_check_state {
>>  #define CHECKER_DEV_LEN 256
>>  #define LIB_CHECKER_NAMELEN 256
>>  
>> +#define FILE_NAME_SIZE  256
>> +
>>  struct checker {
>>  struct list_head node;
>>  void *handle;
>> @@ -107,6 +109,7 @@ struct checker {
>>  int disable;
>>  char name[CHECKER_NAME_LEN];
>>  char message[CHECKER_MSG_LEN];   /* comm with callers */
>> +char dev[FILE_NAME_SIZE];
> 
> FILE_NAME_SIZE more memory usage only to distinguish SCSI from NVMe.
> Please, no.
> 
OK, it will be fixed.
> 
>>  void * context;  /* st

[dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.

2017-07-19 Thread Yang Feng
1. The SCSI-to-NVMe translations have been removed in the patch "nvme:
Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl
command should be supported in the multipath-tools.
2. In the prioritizers/path_latency.c, modify the func do_readsector0():
send a native NVMe Read Ioctl command to the nvme device, and send a SG
Read Ioctl command to the scsi device.
3. In the tur checker, add support for the native NVMe Keep Alive Ioctl
command to the nvme device.

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
---
 libmultipath/checkers.c  |   7 ++
 libmultipath/checkers.h  |   4 +
 libmultipath/checkers/Makefile   |   4 +-
 libmultipath/checkers/emc_clariion.c |   4 +-
 libmultipath/checkers/libsg.c|  94 --
 libmultipath/checkers/libsg.h|   9 ---
 libmultipath/checkers/readsector0.c  |   4 +-
 libmultipath/checkers/tur.c  |  64 ++-
 libmultipath/discovery.c |   1 +
 libmultipath/libnvme.c   | 130 +++
 libmultipath/libnvme.h   |  10 +++
 libmultipath/libsg.c | 113 +++
 libmultipath/libsg.h |  13 
 libmultipath/prioritizers/Makefile   |   2 +-
 libmultipath/prioritizers/path_latency.c |  34 +---
 multipath/multipath.conf.5   |   2 +-
 16 files changed, 354 insertions(+), 141 deletions(-)
 delete mode 100644 libmultipath/checkers/libsg.c
 delete mode 100644 libmultipath/checkers/libsg.h
 create mode 100644 libmultipath/libnvme.c
 create mode 100644 libmultipath/libnvme.h
 create mode 100644 libmultipath/libsg.c
 create mode 100644 libmultipath/libsg.h

diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index 05e024f..00fbd6e 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd)
c->fd = fd;
 }
 
+void checker_set_dev(struct checker *c, char *dev)
+{
+if (!c)
+return;
+strncpy(c->dev, dev, strlen(dev)+1);
+}
+
 void checker_set_sync (struct checker * c)
 {
if (!c)
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index 1d225de..f924624 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -97,6 +97,8 @@ enum path_check_state {
 #define CHECKER_DEV_LEN 256
 #define LIB_CHECKER_NAMELEN 256
 
+#define FILE_NAME_SIZE  256
+
 struct checker {
struct list_head node;
void *handle;
@@ -107,6 +109,7 @@ struct checker {
int disable;
char name[CHECKER_NAME_LEN];
char message[CHECKER_MSG_LEN];   /* comm with callers */
+char dev[FILE_NAME_SIZE];
void * context;  /* store for persistent data */
void ** mpcontext;   /* store for persistent data shared
multipath-wide. Use MALLOC if
@@ -132,6 +135,7 @@ void checker_reset (struct checker *);
 void checker_set_sync (struct checker *);
 void checker_set_async (struct checker *);
 void checker_set_fd (struct checker *, int);
+void checker_set_dev(struct checker *c, char *dev);
 void checker_enable (struct checker *);
 void checker_disable (struct checker *);
 void checker_repair (struct checker *);
diff --git a/libmultipath/checkers/Makefile b/libmultipath/checkers/Makefile
index bce6b8b..a9be6b6 100644
--- a/libmultipath/checkers/Makefile
+++ b/libmultipath/checkers/Makefile
@@ -24,10 +24,10 @@ all: $(LIBS)
 libcheckrbd.so: rbd.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lrados -ludev
 
-libcheckdirectio.so: libsg.o directio.o
+libcheckdirectio.so: ../libsg.o ../libnvme.o directio.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -laio
 
-libcheck%.so: libsg.o %.o
+libcheck%.so: ../libsg.o ../libnvme.o %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
 install:
diff --git a/libmultipath/checkers/emc_clariion.c 
b/libmultipath/checkers/emc_clariion.c
index 9c1ffed..12c1e3e 100644
--- a/libmultipath/checkers/emc_clariion.c
+++ b/libmultipath/checkers/emc_clariion.c
@@ -12,7 +12,7 @@
 #include 
 
 #include "../libmultipath/sg_include.h"
-#include "libsg.h"
+#include "../libmultipath/libsg.h"
 #include "checkers.h"
 #include "debug.h"
 #include "memory.h"
@@ -21,6 +21,8 @@
 #define INQUIRY_CMDLEN  6
 #define HEAVY_CHECK_COUNT   10
 
+#define SENSE_BUFF_LEN  32
+
 /*
  * Mechanism to track CLARiiON inactive snapshot LUs.
  * This is done so that we can fail passive paths
diff --git a/libmultipath/checkers/libsg.c b/libmultipath/checkers/libsg.c
deleted file mode 100644
index 958ea92..000
--- a/libmultipath/checkers/libsg.c
+++ /dev/null
@@ -1,94 +0,0 @@
-/*
- * Copyright 

[dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.

2017-07-19 Thread Yang Feng
  Add args min_avg_latency of logarithmic scale, for 
prioritizers/path_latency.c.
Min average latency is not constant 1us, and is set by user. Certainly, max 
average
latency value is still 100s. It make support better for more scenes, because it 
can
deal better with the normal standard deviation of path latency. For example, 
when the
standard deviation value is 200us and the average latency of the normal paths 
is 1ms,
args base_num can be set to 5 and args min_avg_latency can be set to 2ms, so 
the paths
will be grouped in priority groups with path latency <=2ms, (2ms, 10ms], (10ms, 
50ms],
etc.

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
---
 libmultipath/prioritizers/path_latency.c | 61 ++--
 multipath/multipath.conf.5   | 14 +---
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
index 34b734b..a71faff 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -66,7 +66,7 @@ static int do_readsector0(int fd, unsigned int timeout)
return ret;
 }
 
-int check_args_valid(int io_num, int base_num)
+int check_args_valid(int io_num, int base_num, int min_avg_latency)
 {
 if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM))
 {
@@ -80,24 +80,33 @@ int check_args_valid(int io_num, int base_num)
 return 0;
 }
 
+if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency > 
MAX_AVG_LATENCY))
+{
+pp_pl_log(0, "args min_avg_latency is outside the valid range");
+return 0;
+}
+
 return 1;
 }
 
-/* In multipath.conf, args form: io_num|base_num. For example,
-*  args is "20|10", this function can get io_num value 20, and
-   base_num value 10.
+/* In multipath.conf, args form: io_num|base_num|min_avg_latency.
+*  For example, args is "20|10|1", this function can get io_num
+*  value 20, base_num value 10, min_avg_latency value 1 (us).
 */
-static int get_ionum_and_basenum(char *args,
- int *ionum,
- int *basenum)
+static int get_prio_args(char *args,
+int *ionum,
+int *basenum,
+int *minavglatency)
 {
 char source[MAX_CHAR_SIZE];
 char vertica = '|';
-char *endstrbefore = NULL;
-char *endstrafter = NULL;
+char *endstr1 = NULL;
+char *endstr2 = NULL;
+char *endstr3 = NULL;
 unsigned int size = strlen(args);
 
-if ((args == NULL) || (ionum == NULL) || (basenum == NULL))
+if ((args == NULL) || (ionum == NULL)
+|| (basenum == NULL) || (minavglatency == NULL))
 {
 pp_pl_log(0, "args string is NULL");
 return 0;
@@ -117,21 +126,34 @@ static int get_ionum_and_basenum(char *args,
 return 0;
 }
 
-*ionum = (int)strtoul(source, , 10);
-if (endstrbefore[0] != vertica)
+*ionum = (int)strtoul(source, , 10);
+if (endstr1[0] != vertica)
+{
+pp_pl_log(0, "invalid prio_args format: %s", source);
+return 0;
+}
+
+if (!isdigit(endstr1[1]))
+{
+pp_pl_log(0, "invalid prio_args format: %s", source);
+return 0;
+}
+
+*basenum = (long long)strtol([1], , 10);
+if (endstr2[0] != vertica)
 {
 pp_pl_log(0, "invalid prio_args format: %s", source);
 return 0;
 }
 
-if (!isdigit(endstrbefore[1]))
+if (!isdigit(endstr2[1]))
 {
 pp_pl_log(0, "invalid prio_args format: %s", source);
 return 0;
 }
 
-*basenum = (long long)strtol([1], , 10);
-if (check_args_valid(*ionum, *basenum) == 0)
+*minavglatency = (long long)strtol([1], , 10);
+if (check_args_valid(*ionum, *basenum, *minavglatency) == 0)
 {
 return 0;
 }
@@ -194,6 +216,7 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 int index = 0;
 int io_num;
 int base_num;
+int min_avg_latency;
 long long avglatency;
 long long latency_interval;
 long long standard_deviation;
@@ -204,7 +227,7 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 if (pp->fd < 0)
 return -1;
 
-if (get_ionum_and_basenum(args, _num, _num) == 0)
+if (get_prio_args(args, _num, _num, _avg_latency) == 0)
 {
 pp_pl_log(0, "%s: get path_latency args fail", pp->dev);
 return DEFAULT_PRIORITY;
@@ -245,13 +268,13 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 set can change latency_interval value corresponding to avglatency and is 
not constant.
 Warn the user if latency_interval is smaller than (2 * 
standard_deviation), or equal */
 standard_deviation = 

Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.

2017-07-18 Thread Yang Feng
Hi Xose,

Thanks for your reviews

The up-to-date patch will be sent later.

Regards!
-Yang

On 2017/7/18 7:00, Xose Vazquez Perez wrote:
> On 07/17/2017 04:23 AM, Yang Feng wrote:
> 
>> I think there is a mistake:
>> 1. In fact, the tur checker feature has never been removed. The tur checker
>> has just been renamed to "ping" checker and the keep alive command checker
>> feature is added into this checker. So, the renamed checker remains "tur"
>> and add "keep alive". Absolutely, I agree that SCSI is going to stay for
>> some time to come, So it can support a SCSI device by "tur" and support a
>> NVMe device by "keep alive".
>> 2. If add a new checker for "keep alive", it maybe increase the redundancy
>> of code. And, when the SCSI LUN and the NVMe namespace are maped in the
>> same array for the same host, no matter which checker is choosed, it will
>> not be able to support the scenario very well.
>> 3. Using read() with O_DIRECT for path_latency can avoid this problem, but
>> can not fix this problem.
> There is no need to rename tur to ping. Same work can be done over tur.c.
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.

2017-07-16 Thread Yang Feng
Hi Martin,

Thanks for your reviews.

I think there is a mistake:
1. In fact, the tur checker feature has never been removed. The tur checker
has just been renamed to "ping" checker and the keep alive command checker
feature is added into this checker. So, the renamed checker remains "tur"
and add "keep alive". Absolutely, I agree that SCSI is going to stay for
some time to come, So it can support a SCSI device by "tur" and support a
NVMe device by "keep alive".
2. If add a new checker for "keep alive", it maybe increase the redundancy
of code. And, when the SCSI LUN and the NVMe namespace are maped in the
same array for the same host, no matter which checker is choosed, it will
not be able to support the scenario very well.
3. Using read() with O_DIRECT for path_latency can avoid this problem, but
can not fix this problem.

Respect and regards.
-Yang

On 2017/7/14 18:45, Martin Wilck wrote:
> On Fri, 2017-07-14 at 09:47 +0800, Yang Feng wrote:
>> Hi Xose,
>>
>> But tur can not support NVMe device, and if the default checker
>> change to ping,
>> then it can support NVMe device by a keep alive command and SCSI
>> device by a
>> tur command.
> 
> I agree with Xose. This is what we have the hwtable for. It's a good
> and necessary thing to implement a checker equivalent to TUR for NVME,
> but not a reason to remove TUR. SCSI is going to stay for some time to
> come.
> 
> As for the path latency checker and your readsector0 function, I don't
> quite understand why you are using SG_IO at all. Why can't you just
> read() with O_DIRECT?
> 
> Saying that without having had the time for a deeper review of your
> patch.
> 
> Martin
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.

2017-07-13 Thread Yang Feng
Hi Xose,

But tur can not support NVMe device, and if the default checker change to ping,
then it can support NVMe device by a keep alive command and SCSI device by a
tur command.

Regards!
-Yang

On 2017/7/14 0:44, Xose Vazquez Perez wrote:
> On 07/13/2017 09:51 AM, Yang Feng wrote:
> 
>> [...]
>>In the checkers, delete the file tur.c and create the new file ping.c:
>> ping.c can support the native NVMe Keep Alive Ioctl command to the nvme
>> device, and can support the SG TUR Ioctl command to the scsi device.
>> [...]
>>  delete mode 100644 libmultipath/checkers/tur.c
>>  delete mode 100644 libmultipath/checkers/tur.h
> 
> tur can not be deleted. It breaks current configs, and replacing
> the default checker so fast is a bit delicate.
> 
> .
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH]prioritizers/path_latency: Fix failure of sg_read to a nvme device

2017-07-13 Thread Yang Feng
Closed and traced by another patch.
-Yang

On 2017/6/28 12:42, Yang Feng wrote:
> Fixed failure of sg_read to a nvme device:
> 1. Send sg_read command to a nvme device failed,
> because the block size in user-mode is 4096 bytes,
> not equal to the bolck size in kernel-mode, who is
> 512 bytes. According to the func nvme_trans_io of
> scsi.c in the dir "drivers/nvme/host". The code
> fragmentS as follows:
> /* If block count and actual data buffer size dont match, error out */
> if (xfer_bytes != (cdb_info.xfer_len << ns->lba_shift)) {
> res = -EINVAL;
> goto out;
> }
> 2. And the SCSI-to-NVMe translations will be removed in
> the patch "nvme: Remove SCSI translations" in the linux-nvme,
> so using the nvme_read command is a good idea.
> 
> Signed-off-by: Yang Feng <philip.y...@huawei.com>
> ---
>  libmultipath/prioritizers/path_latency.c | 78 
> +++-
>  1 file changed, 66 insertions(+), 12 deletions(-)
> 
> diff --git a/libmultipath/prioritizers/path_latency.c 
> b/libmultipath/prioritizers/path_latency.c
> index 34b734b..8f633e0 100644
> --- a/libmultipath/prioritizers/path_latency.c
> +++ b/libmultipath/prioritizers/path_latency.c
> @@ -23,14 +23,32 @@
>  #include 
>  #include 
>  #include 
> -
>  #include "debug.h"
>  #include "prio.h"
>  #include "structs.h"
> +#include 
> +#include 
>  #include "../checkers/libsg.h"
>  
>  #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " 
> fmt, ##args)
>  
> +struct nvme_user_io {
> +__u8 opcode;
> +__u8 flags;
> +__u16 control;
> +__u16 nblocks;
> +__u16 rsvd;
> +__u64 metadata;
> +__u64 addr;
> +__u64 slba;
> +__u32 dsmgmt;
> +__u32 reftag;
> +__u16 apptag;
> +__u16 appmask;
> +};
> +
> +#define NVME_IOCTL_SUBMIT_IO _IOW('N', 0x42, struct nvme_user_io)
> +
>  #define MAX_IO_NUM  200
>  #define MIN_IO_NUM  2
>  
> @@ -51,19 +69,55 @@ static long long path_latency[MAX_IO_NUM];
>  
>  static inline long long timeval_to_us(const struct timespec *tv)
>  {
> - return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / 
> NSEC_PER_USEC);
> +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / 
> NSEC_PER_USEC);
> +}
> +
> +int nvme_io(int fd, __u8 opcode, __u64 slba, __u16 nblocks, __u16 control,
> +__u32 dsmgmt, __u32 reftag, __u16 apptag, __u16 appmask, void 
> *data, void *metadata)
> +{
> +struct nvme_user_io io = {
> +.opcode = opcode,
> +.flags = 0,
> +.control = control,
> +.nblocks = nblocks,
> +.rsvd = 0,
> +.metadata = (__u64)(uintptr_t) metadata,
> +.addr = (__u64)(uintptr_t) data,
> +.slba = slba,
> +.dsmgmt = dsmgmt,
> +.reftag = reftag,
> +.appmask = apptag,
> +.apptag = appmask,
> +};
> +
> +return ioctl(fd, NVME_IOCTL_SUBMIT_IO, );
> +}
> +
> +int nvme_read(int fd, __u64 slba, __u16 nblocks, __u16 control, __u32 dsmgmt,
> +__u32 reftag, __u16 apptag, __u16 appmask, void *data, void 
> *metadata)
> +{
> +return nvme_io(fd, 0x2, slba, nblocks, control, dsmgmt,
> +reftag, apptag, appmask, data, metadata);
>  }
>  
> -static int do_readsector0(int fd, unsigned int timeout)
> +static int do_readsector0(struct path *pp, unsigned int timeout)
>  {
> - unsigned char buf[4096];
> - unsigned char sbuf[SENSE_BUFF_LEN];
> - int ret;
> +unsigned char buf[4096];
> +unsigned char mbuf[512];
> +unsigned char sbuf[SENSE_BUFF_LEN];
>  
> - ret = sg_read(fd, [0], 4096, [0],
> -   SENSE_BUFF_LEN, timeout);
> +if (!strncmp(pp->dev, "nvme", 4))
> +{
> +if (nvme_read(pp->fd, 0, 1, 0, 0, 0, 0, 0, buf, mbuf) < 0)
> +return 0;
> +}
> +else
> +{
> +if (sg_read(pp->fd, [0], 4096, [0],SENSE_BUFF_LEN, timeout) 
> == 2)
> +return 0;
> +}
>  
> - return ret;
> +return 1;
>  }
>  
>  int check_args_valid(int io_num, int base_num)
> @@ -218,9 +272,9 @@ int getprio (struct path *pp, char *args, unsigned int 
> timeout)
>  (void)clock_gettime(CLOCK_MONOTONIC, );
>  before = timeval_to_us();
>  
> -if (do_readsector0(pp->fd, timeout) == 2)
> +if (do_readsector0(pp, timeout) == 0)
>  {
> -pp_pl_log(0, "%s: path down", pp->dev);
> +pp_pl_log(0, "%s: send read sector0 command fail", pp->dev);
>  

[dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.

2017-07-13 Thread Yang Feng
  Add args min_avg_latency of logarithmic scale, for 
prioritizers/path_latency.c.
Min average latency is not constant 1us, and is set by user. Certainly, max 
average
latency value is still 100s. It make support better for more scenes, because it 
can
deal better with the normal standard deviation of path latency. For example, 
when the
standard deviation value is 200us and the average latency of the normal paths 
is 1ms,
args base_num can be set to 5 and args min_avg_latency can be set to 2ms, so 
the paths
will be grouped in priority groups with path latency <=2ms, (2ms, 10ms], (10ms, 
50ms],
etc.

Signed-off-by: Yang Feng <philip.y...@huawei.com>
---
 libmultipath/prioritizers/path_latency.c | 61 ++--
 multipath/multipath.conf.5   | 14 +---
 2 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
index 21209ff..c8fe318 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -76,7 +76,7 @@ static int do_readsector0(struct path *pp, unsigned int 
timeout)
 return 1;
 }
 
-int check_args_valid(int io_num, int base_num)
+int check_args_valid(int io_num, int base_num, int min_avg_latency)
 {
 if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM))
 {
@@ -90,24 +90,33 @@ int check_args_valid(int io_num, int base_num)
 return 0;
 }
 
+if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency > 
MAX_AVG_LATENCY))
+{
+pp_pl_log(0, "args min_avg_latency is outside the valid range");
+return 0;
+}
+
 return 1;
 }
 
-/* In multipath.conf, args form: io_num|base_num. For example,
-*  args is "20|10", this function can get io_num value 20, and
-   base_num value 10.
+/* In multipath.conf, args form: io_num|base_num|min_avg_latency.
+*  For example, args is "20|10|1", this function can get io_num
+*  value 20, base_num value 10, min_avg_latency value 1 (us).
 */
-static int get_ionum_and_basenum(char *args,
- int *ionum,
- int *basenum)
+static int get_prio_args(char *args,
+int *ionum,
+int *basenum,
+int *minavglatency)
 {
 char source[MAX_CHAR_SIZE];
 char vertica = '|';
-char *endstrbefore = NULL;
-char *endstrafter = NULL;
+char *endstr1 = NULL;
+char *endstr2 = NULL;
+char *endstr3 = NULL;
 unsigned int size = strlen(args);
 
-if ((args == NULL) || (ionum == NULL) || (basenum == NULL))
+if ((args == NULL) || (ionum == NULL)
+|| (basenum == NULL) || (minavglatency == NULL))
 {
 pp_pl_log(0, "args string is NULL");
 return 0;
@@ -127,21 +136,34 @@ static int get_ionum_and_basenum(char *args,
 return 0;
 }
 
-*ionum = (int)strtoul(source, , 10);
-if (endstrbefore[0] != vertica)
+*ionum = (int)strtoul(source, , 10);
+if (endstr1[0] != vertica)
+{
+pp_pl_log(0, "invalid prio_args format: %s", source);
+return 0;
+}
+
+if (!isdigit(endstr1[1]))
+{
+pp_pl_log(0, "invalid prio_args format: %s", source);
+return 0;
+}
+
+*basenum = (long long)strtol([1], , 10);
+if (endstr2[0] != vertica)
 {
 pp_pl_log(0, "invalid prio_args format: %s", source);
 return 0;
 }
 
-if (!isdigit(endstrbefore[1]))
+if (!isdigit(endstr2[1]))
 {
 pp_pl_log(0, "invalid prio_args format: %s", source);
 return 0;
 }
 
-*basenum = (long long)strtol([1], , 10);
-if (check_args_valid(*ionum, *basenum) == 0)
+*minavglatency = (long long)strtol([1], , 10);
+if (check_args_valid(*ionum, *basenum, *minavglatency) == 0)
 {
 return 0;
 }
@@ -204,6 +226,7 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 int index = 0;
 int io_num;
 int base_num;
+int min_avg_latency;
 long long avglatency;
 long long latency_interval;
 long long standard_deviation;
@@ -214,7 +237,7 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 if (pp->fd < 0)
 return -1;
 
-if (get_ionum_and_basenum(args, _num, _num) == 0)
+if (get_prio_args(args, _num, _num, _avg_latency) == 0)
 {
 pp_pl_log(0, "%s: get path_latency args fail", pp->dev);
 return DEFAULT_PRIORITY;
@@ -255,13 +278,13 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 set can change latency_interval value corresponding to avglatency and is 
not constant.
 Warn the user if latency_interval is smaller than (2 * 
standard_deviation), or equal */
 standard_deviation = calc_standard_deviation(path_latency, index, 
avglatency);
-latency_interval = calc_latency_

[dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.

2017-07-13 Thread Yang Feng
1. The SCSI-to-NVMe translations have been removed in the patch "nvme:
Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl
command should be supported in the multipath-tools.
   In the prioritizers/path_latency.c, modify the func do_readsector0():
send a native NVMe Read Ioctl command to the nvme device, and send a SG
Read Ioctl command to the scsi device.
   In the checkers, delete the file tur.c and create the new file ping.c:
ping.c can support the native NVMe Keep Alive Ioctl command to the nvme
device, and can support the SG TUR Ioctl command to the scsi device.
2. Add args min_avg_latency of logarithmic scale, for prioritizers/
path_latency.c. Min average latency is not constant 1us, and can be set
by user. Certainly, max average latency value is still constant 100s.
It make support better for more scenes, because it can deal better with
the normal standard deviation of path latency.
   For example, when the standard deviation value is 200us and the average
latency of the normal paths is 1ms, args base_num can be set to 5 and args
min_avg_latency can be set to 2ms, so the paths will be grouped in priority
groups with path latency <=2ms, (2ms, 10ms], (10ms, 50ms], etc.

Yang Feng (1):
  multipath-tools/libmultipath: Support for the native NVMe Ioctl command
and add args min_avg_latency for path_latency.

 libmultipath/checkers.c  |   7 +
 libmultipath/checkers.h  |   6 +-
 libmultipath/checkers/Makefile   |   6 +-
 libmultipath/checkers/emc_clariion.c |   4 +-
 libmultipath/checkers/libsg.c|  94 ---
 libmultipath/checkers/libsg.h|   9 -
 libmultipath/checkers/ping.c | 453 +++
 libmultipath/checkers/readsector0.c  |   4 +-
 libmultipath/checkers/tur.c  | 427 -
 libmultipath/checkers/tur.h  |   8 -
 libmultipath/defaults.h  |   2 +-
 libmultipath/discovery.c |   1 +
 libmultipath/hwtable.c   |   2 +-
 libmultipath/libnvme.c   | 130 +
 libmultipath/libnvme.h   |  10 +
 libmultipath/libsg.c | 113 
 libmultipath/libsg.h |  13 +
 libmultipath/prioritizers/Makefile   |   2 +-
 libmultipath/prioritizers/path_latency.c |  95 ---
 libmultipath/propsel.c   |   2 +-
 multipath/multipath.conf.5   |  18 +-
 21 files changed, 821 insertions(+), 585 deletions(-)
 delete mode 100644 libmultipath/checkers/libsg.c
 delete mode 100644 libmultipath/checkers/libsg.h
 create mode 100644 libmultipath/checkers/ping.c
 delete mode 100644 libmultipath/checkers/tur.c
 delete mode 100644 libmultipath/checkers/tur.h
 create mode 100644 libmultipath/libnvme.c
 create mode 100644 libmultipath/libnvme.h
 create mode 100644 libmultipath/libsg.c
 create mode 100644 libmultipath/libsg.h

-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.

2017-07-13 Thread Yang Feng
1. The SCSI-to-NVMe translations have been removed in the patch "nvme:
Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl
command should be supported in the multipath-tools.
2. In the prioritizers/path_latency.c, modify the func do_readsector0():
send a native NVMe Read Ioctl command to the nvme device, and send a SG
Read Ioctl command to the scsi device.
3. In the checkers, delete the file tur.c and create the new file ping.c:
ping.c can support the native NVMe Keep Alive Ioctl command to the nvme
device, and can support the SG TUR Ioctl command to the scsi device.

Signed-off-by: Yang Feng <philip.y...@huawei.com>
---
 libmultipath/checkers.c  |   7 +
 libmultipath/checkers.h  |   6 +-
 libmultipath/checkers/Makefile   |   6 +-
 libmultipath/checkers/emc_clariion.c |   4 +-
 libmultipath/checkers/libsg.c|  94 ---
 libmultipath/checkers/libsg.h|   9 -
 libmultipath/checkers/ping.c | 453 +++
 libmultipath/checkers/readsector0.c  |   4 +-
 libmultipath/checkers/tur.c  | 427 -
 libmultipath/checkers/tur.h  |   8 -
 libmultipath/defaults.h  |   2 +-
 libmultipath/discovery.c |   1 +
 libmultipath/hwtable.c   |   2 +-
 libmultipath/libnvme.c   | 130 +
 libmultipath/libnvme.h   |  10 +
 libmultipath/libsg.c | 113 
 libmultipath/libsg.h |  13 +
 libmultipath/prioritizers/Makefile   |   2 +-
 libmultipath/prioritizers/path_latency.c |  58 +---
 libmultipath/propsel.c   |   2 +-
 multipath/multipath.conf.5   |   4 +-
 21 files changed, 754 insertions(+), 601 deletions(-)
 delete mode 100644 libmultipath/checkers/libsg.c
 delete mode 100644 libmultipath/checkers/libsg.h
 create mode 100644 libmultipath/checkers/ping.c
 delete mode 100644 libmultipath/checkers/tur.c
 delete mode 100644 libmultipath/checkers/tur.h
 create mode 100644 libmultipath/libnvme.c
 create mode 100644 libmultipath/libnvme.h
 create mode 100644 libmultipath/libsg.c
 create mode 100644 libmultipath/libsg.h

diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index 05e024f..00fbd6e 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd)
c->fd = fd;
 }
 
+void checker_set_dev(struct checker *c, char *dev)
+{
+if (!c)
+return;
+strncpy(c->dev, dev, strlen(dev)+1);
+}
+
 void checker_set_sync (struct checker * c)
 {
if (!c)
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index 1d225de..506dd4c 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -79,7 +79,7 @@ enum path_check_state {
 };
 
 #define DIRECTIO "directio"
-#define TUR  "tur"
+#define PING "ping"
 #define HP_SW"hp_sw"
 #define RDAC "rdac"
 #define EMC_CLARIION "emc_clariion"
@@ -97,6 +97,8 @@ enum path_check_state {
 #define CHECKER_DEV_LEN 256
 #define LIB_CHECKER_NAMELEN 256
 
+#define FILE_NAME_SIZE  256
+
 struct checker {
struct list_head node;
void *handle;
@@ -107,6 +109,7 @@ struct checker {
int disable;
char name[CHECKER_NAME_LEN];
char message[CHECKER_MSG_LEN];   /* comm with callers */
+char dev[FILE_NAME_SIZE];
void * context;  /* store for persistent data */
void ** mpcontext;   /* store for persistent data shared
multipath-wide. Use MALLOC if
@@ -132,6 +135,7 @@ void checker_reset (struct checker *);
 void checker_set_sync (struct checker *);
 void checker_set_async (struct checker *);
 void checker_set_fd (struct checker *, int);
+void checker_set_dev(struct checker *c, char *dev);
 void checker_enable (struct checker *);
 void checker_disable (struct checker *);
 void checker_repair (struct checker *);
diff --git a/libmultipath/checkers/Makefile b/libmultipath/checkers/Makefile
index bce6b8b..3ab04ef 100644
--- a/libmultipath/checkers/Makefile
+++ b/libmultipath/checkers/Makefile
@@ -9,7 +9,7 @@ CFLAGS += $(LIB_CFLAGS) -I..
 LIBS= \
libcheckcciss_tur.so \
libcheckreadsector0.so \
-   libchecktur.so \
+   libcheckping.so \
libcheckdirectio.so \
libcheckemc_clariion.so \
libcheckhp_sw.so \
@@ -24,10 +24,10 @@ all: $(LIBS)
 libcheckrbd.so: rbd.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lrados -ludev
 
-libcheckdirectio.so: libsg.o directio.o
+libcheckdirectio.so: ../libsg.o ../libnvme.o directio.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -laio
 
-libcheck%.so: libsg.o %.o
+libcheck%.so: ../libsg.o ../libnvme.o %.o
$(CC) $(LDFLAGS) $(SH

[dm-devel] [PATCH]prioritizers/path_latency: Fix failure of sg_read to a nvme device

2017-06-27 Thread Yang Feng
Fixed failure of sg_read to a nvme device:
1. Send sg_read command to a nvme device failed,
because the block size in user-mode is 4096 bytes,
not equal to the bolck size in kernel-mode, who is
512 bytes. According to the func nvme_trans_io of
scsi.c in the dir "drivers/nvme/host". The code
fragmentS as follows:
/* If block count and actual data buffer size dont match, error out */
if (xfer_bytes != (cdb_info.xfer_len << ns->lba_shift)) {
res = -EINVAL;
goto out;
}
2. And the SCSI-to-NVMe translations will be removed in
the patch "nvme: Remove SCSI translations" in the linux-nvme,
so using the nvme_read command is a good idea.

Signed-off-by: Yang Feng <philip.y...@huawei.com>
---
 libmultipath/prioritizers/path_latency.c | 78 +++-
 1 file changed, 66 insertions(+), 12 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
index 34b734b..8f633e0 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -23,14 +23,32 @@
 #include 
 #include 
 #include 
-
 #include "debug.h"
 #include "prio.h"
 #include "structs.h"
+#include 
+#include 
 #include "../checkers/libsg.h"
 
 #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, 
##args)
 
+struct nvme_user_io {
+__u8 opcode;
+__u8 flags;
+__u16 control;
+__u16 nblocks;
+__u16 rsvd;
+__u64 metadata;
+__u64 addr;
+__u64 slba;
+__u32 dsmgmt;
+__u32 reftag;
+__u16 apptag;
+__u16 appmask;
+};
+
+#define NVME_IOCTL_SUBMIT_IO _IOW('N', 0x42, struct nvme_user_io)
+
 #define MAX_IO_NUM  200
 #define MIN_IO_NUM  2
 
@@ -51,19 +69,55 @@ static long long path_latency[MAX_IO_NUM];
 
 static inline long long timeval_to_us(const struct timespec *tv)
 {
-   return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / 
NSEC_PER_USEC);
+return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / 
NSEC_PER_USEC);
+}
+
+int nvme_io(int fd, __u8 opcode, __u64 slba, __u16 nblocks, __u16 control,
+__u32 dsmgmt, __u32 reftag, __u16 apptag, __u16 appmask, void 
*data, void *metadata)
+{
+struct nvme_user_io io = {
+.opcode = opcode,
+.flags = 0,
+.control = control,
+.nblocks = nblocks,
+.rsvd = 0,
+.metadata = (__u64)(uintptr_t) metadata,
+.addr = (__u64)(uintptr_t) data,
+.slba = slba,
+.dsmgmt = dsmgmt,
+.reftag = reftag,
+.appmask = apptag,
+.apptag = appmask,
+};
+
+return ioctl(fd, NVME_IOCTL_SUBMIT_IO, );
+}
+
+int nvme_read(int fd, __u64 slba, __u16 nblocks, __u16 control, __u32 dsmgmt,
+__u32 reftag, __u16 apptag, __u16 appmask, void *data, void 
*metadata)
+{
+return nvme_io(fd, 0x2, slba, nblocks, control, dsmgmt,
+reftag, apptag, appmask, data, metadata);
 }
 
-static int do_readsector0(int fd, unsigned int timeout)
+static int do_readsector0(struct path *pp, unsigned int timeout)
 {
-   unsigned char buf[4096];
-   unsigned char sbuf[SENSE_BUFF_LEN];
-   int ret;
+unsigned char buf[4096];
+unsigned char mbuf[512];
+unsigned char sbuf[SENSE_BUFF_LEN];
 
-   ret = sg_read(fd, [0], 4096, [0],
- SENSE_BUFF_LEN, timeout);
+if (!strncmp(pp->dev, "nvme", 4))
+{
+if (nvme_read(pp->fd, 0, 1, 0, 0, 0, 0, 0, buf, mbuf) < 0)
+return 0;
+}
+else
+{
+if (sg_read(pp->fd, [0], 4096, [0],SENSE_BUFF_LEN, timeout) 
== 2)
+return 0;
+}
 
-   return ret;
+return 1;
 }
 
 int check_args_valid(int io_num, int base_num)
@@ -218,9 +272,9 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 (void)clock_gettime(CLOCK_MONOTONIC, );
 before = timeval_to_us();
 
-if (do_readsector0(pp->fd, timeout) == 2)
+if (do_readsector0(pp, timeout) == 0)
 {
-pp_pl_log(0, "%s: path down", pp->dev);
+pp_pl_log(0, "%s: send read sector0 command fail", pp->dev);
 return -1;
 }
 
@@ -246,7 +300,7 @@ int getprio (struct path *pp, char *args, unsigned int 
timeout)
 Warn the user if latency_interval is smaller than (2 * 
standard_deviation), or equal */
 standard_deviation = calc_standard_deviation(path_latency, index, 
avglatency);
 latency_interval = calc_latency_interval(avglatency, MAX_AVG_LATENCY, 
MIN_AVG_LATENCY, base_num);
-if ((latency_interval!= 0)
+if ((latency_interval != 0)
 && (latency_interval <= (2 * standard_deviation)))
 pp_pl_log(3, "%s: latency interval (%lld) according to average latency 
(%lld us) is smaller than "
 "2 * standard deviation (%lld us), or equal, args base_num (%d) 
needs to be set bigger value",
-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: beautify path_latency.c code

2017-06-24 Thread Yang Feng
Hi Xose,
Thanks for your work.

ACK
-Yang

On 2017/6/23 0:46, Xose Vazquez Perez wrote:
> Mainly running scripts/Lindent, from kernel dir, to replace indent spaces
> by tabs.
> 
> Cc: Yang Feng <philip.y...@huawei.com>
> Cc: Christophe Varoqui <christophe.varo...@opensvc.com>
> Cc: device-mapper development <dm-devel@redhat.com>
> Signed-off-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
> ---
>  libmultipath/prioritizers/path_latency.c | 354 
> +++
>  1 file changed, 177 insertions(+), 177 deletions(-)
> 
> diff --git a/libmultipath/prioritizers/path_latency.c 
> b/libmultipath/prioritizers/path_latency.c
> index 34b734b..9fc2dfc 100644
> --- a/libmultipath/prioritizers/path_latency.c
> +++ b/libmultipath/prioritizers/path_latency.c
> @@ -17,8 +17,8 @@
>   * Author(s): Yang Feng <philip.y...@huawei.com>
>   *
>   * This file is released under the GPL version 2, or any later version.
> - *
>   */
> +
>  #include 
>  #include 
>  #include 
> @@ -31,27 +31,28 @@
>  
>  #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " 
> fmt, ##args)
>  
> -#define MAX_IO_NUM  200
> -#define MIN_IO_NUM  2
> +#define MAX_IO_NUM   200
> +#define MIN_IO_NUM   2
>  
> -#define MAX_BASE_NUM10
> -#define MIN_BASE_NUM2
> +#define MAX_BASE_NUM 10
> +#define MIN_BASE_NUM 2
>  
> -#define MAX_AVG_LATENCY 1.  /*Unit: us*/
> -#define MIN_AVG_LATENCY 1.  /*Unit: us*/
> +#define MAX_AVG_LATENCY  1.  /* Unit: us */
> +#define MIN_AVG_LATENCY  1.  /* Unit: us */
>  
> -#define DEFAULT_PRIORITY0
> +#define DEFAULT_PRIORITY 0
>  
> -#define MAX_CHAR_SIZE   30
> +#define MAX_CHAR_SIZE30
>  
> -#define USEC_PER_SEC100LL
> -#define NSEC_PER_USEC   1000LL
> +#define USEC_PER_SEC 100LL
> +#define NSEC_PER_USEC1000LL
>  
>  static long long path_latency[MAX_IO_NUM];
>  
>  static inline long long timeval_to_us(const struct timespec *tv)
>  {
> - return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / 
> NSEC_PER_USEC);
> + return ((long long)tv->tv_sec * USEC_PER_SEC) +
> + (tv->tv_nsec / NSEC_PER_USEC);
>  }
>  
>  static int do_readsector0(int fd, unsigned int timeout)
> @@ -60,198 +61,197 @@ static int do_readsector0(int fd, unsigned int timeout)
>   unsigned char sbuf[SENSE_BUFF_LEN];
>   int ret;
>  
> - ret = sg_read(fd, [0], 4096, [0],
> -   SENSE_BUFF_LEN, timeout);
> + ret = sg_read(fd, [0], 4096, [0], SENSE_BUFF_LEN, timeout);
>  
>   return ret;
>  }
>  
>  int check_args_valid(int io_num, int base_num)
>  {
> -if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM))
> -{
> -pp_pl_log(0, "args io_num is outside the valid range");
> -return 0;
> -}
> -
> -if ((base_num < MIN_BASE_NUM) || (base_num > MAX_BASE_NUM))
> -{
> -pp_pl_log(0, "args base_num is outside the valid range");
> -return 0;
> -}
> -
> -return 1;
> + if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) {
> + pp_pl_log(0, "args io_num is outside the valid range");
> + return 0;
> + }
> +
> + if ((base_num < MIN_BASE_NUM) || (base_num > MAX_BASE_NUM)) {
> + pp_pl_log(0, "args base_num is outside the valid range");
> + return 0;
> + }
> +
> + return 1;
>  }
>  
> -/* In multipath.conf, args form: io_num|base_num. For example,
> -*  args is "20|10", this function can get io_num value 20, and
> -   base_num value 10.
> -*/
> -static int get_ionum_and_basenum(char *args,
> - int *ionum,
> - int *basenum)
> +/*
> + * In multipath.conf, args form: io_num|base_num. For example,
> + * args is "20|10", this function can get io_num value 20, and
> + * base_num value 10.
> + */
> +static int get_ionum_and_basenum(char *args, int *ionum, int *basenum)
>  {
> -char source[MAX_CHAR_SIZE];
> -char vertica = '|';
> -char *endstrbefore = NULL;
> -char *endstrafter = NULL;
> -unsigned int size = strlen(args);
> -
> -if ((args == NULL) || (ionum == NULL) || (basenum == NULL))
> -{
> -pp_pl_log(0, "args string is NULL");
> -return 0;
> -}
> -
> -i

Re: [dm-devel] [PATCH v6 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-21 Thread Yang Feng
Hi Christophe,

OK, thanks a lot.

Regards,
Yang

---
On 2017/6/21 18:22, Christophe Varoqui wrote:
> This version is now merged.
> Thanks.
> 
> Incremental enhancements can go from there if needed.
> 
> On Wed, Jun 21, 2017 at 11:40 AM, Yang Feng <philip.y...@huawei.com 
> <mailto:philip.y...@huawei.com>> wrote:
> 
> libmultipath/prioritizers: Prioritizer for device mapper multipath,
> where the corresponding priority values of specific paths are provided
> by a latency algorithm on the logarithmic scale. And the latency algorithm
> is dependent on the following arguments(io_num and base_num).
> The principle of the algorithm is illustrated as follows:
> 1. By sending a certain number "io_num" of read IOs to the current
> path continuously, the IOs' average latency can be calculated.
> 2. Max average latency value is 100s, and min value is 1us. According
> to the average latency of each path and the "base_number" of logarithmic
> scale, the priority "rc" of each path can be provided.
> 
> For example: If base_num=10, the paths will be grouped in priority groups
> with path latency <=1us, (1us, 10us], (10us, 100us], (100us, 1ms], (1ms, 
> 10ms],
> (10ms, 100ms], (100ms, 1s], (1s, 10s], (10s, 100s], >100s. As follows:
> 
><=1us(1us, 10us]  (10us, 100us]   >100s
> 
> |--|--|--|...|--|
> |  priority rank 9 |  priority rank 8 |  priority rank 7 |...|  priority 
> rank 0 |
> 
> |--|------|--|...|--|
>Priority Rank Partitioning
> 
> Signed-off-by: Yang Feng <philip.y...@huawei.com 
> <mailto:philip.y...@huawei.com>>
> Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com 
> <mailto:bmarz...@redhat.com>>
> Reviewed-by: Martin Wilck <mwi...@suse.com <mailto:mwi...@suse.com>>
> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com 
> <mailto:xose.vazq...@gmail.com>>
> Reviewed-by: Hannes Reinecke <h...@suse.de <mailto:h...@suse.de>>
> ---
>  libmultipath/prio.h  |   1 +
>  libmultipath/prioritizers/Makefile   |   4 +
>  libmultipath/prioritizers/path_latency.c | 257 
> +++
>  multipath/multipath.conf.5   |  20 +++
>  4 files changed, 282 insertions(+)
>  create mode 100644 libmultipath/prioritizers/path_latency.c
> 
---

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH v6 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-21 Thread Yang Feng
libmultipath/prioritizers: Prioritizer for device mapper multipath,
where the corresponding priority values of specific paths are provided
by a latency algorithm on the logarithmic scale. And the latency algorithm
is dependent on the following arguments(io_num and base_num).
The principle of the algorithm is illustrated as follows:
1. By sending a certain number "io_num" of read IOs to the current
path continuously, the IOs' average latency can be calculated.
2. Max average latency value is 100s, and min value is 1us. According
to the average latency of each path and the "base_number" of logarithmic
scale, the priority "rc" of each path can be provided.

For example: If base_num=10, the paths will be grouped in priority groups
with path latency <=1us, (1us, 10us], (10us, 100us], (100us, 1ms], (1ms, 10ms],
(10ms, 100ms], (100ms, 1s], (1s, 10s], (10s, 100s], >100s. As follows:

   <=1us(1us, 10us]  (10us, 100us]   >100s
|--|--|--|...|--|
|  priority rank 9 |  priority rank 8 |  priority rank 7 |...|  priority rank 0 
|
|--|--|--|...|--|
   Priority Rank Partitioning

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
Reviewed-by: Hannes Reinecke <h...@suse.de>
---
 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 257 +++
 multipath/multipath.conf.5   |  20 +++
 4 files changed, 282 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

diff --git a/libmultipath/prio.h b/libmultipath/prio.h
index 0193c52..c97fe39 100644
--- a/libmultipath/prio.h
+++ b/libmultipath/prio.h
@@ -29,6 +29,7 @@ struct path;
 #define PRIO_RDAC  "rdac"
 #define PRIO_WEIGHTED_PATH "weightedpath"
 #define PRIO_SYSFS "sysfs"
+#define PRIO_PATH_LATENCY  "path_latency"
 
 /*
  * Value used to mark the fact prio was not defined
diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..ca47cdf 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,6 +18,7 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
+   libpriopath_latency.so \
libpriosysfs.so
 
 all: $(LIBS)
@@ -25,6 +26,9 @@ all: $(LIBS)
 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
+libpriopath_latency.so: path_latency.o  ../checkers/libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
new file mode 100644
index 000..046e13b
--- /dev/null
+++ b/libmultipath/prioritizers/path_latency.c
@@ -0,0 +1,257 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved.
+ *
+ * path_latency.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority
+ * values of specific paths are provided by a latency algorithm. And the
+ * latency algorithm is dependent on arguments("io_num" and "base_num").
+ *
+ * The principle of the algorithm as follows:
+ * 1. By sending a certain number "io_num" of read IOs to the current path
+ *continuously, the IOs' average latency can be calculated.
+ * 2. Max value and min value of average latency are constant. According to
+ *the average latency of each path and the "base_num" of logarithmic
+ *scale, the priority "rc" of each path can be provided.
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *
+ * This file is released under the GPL version 2, or any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../checkers/libsg.h"
+
+#define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, 
##args)
+
+#define MAX_IO_NUM  200
+#define MIN_IO_NUM  2
+
+#define MAX_BASE_NUM10
+#define MIN_BASE_NUM2
+
+#define MAX_AVG_LATENCY 1.  /*Unit: us*/
+#define MIN_AVG_LATENCY 1.  /*Unit: us*/
+
+#define DEFAULT_PRIORITY0
+
+#define MAX_CHAR_SIZE   30
+
+#define USEC_PER_SEC100LL
+#define NSEC_PER_USEC   1000LL
+
+static long long path_latency[MAX_IO_NUM];
+
+static inline long long timeval_to

[dm-devel] [PATCH v6 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-21 Thread Yang Feng
This patch value is in the following: 
1. In the Storage-Backup environment of HyperCluster, includes
one storage array near to the host and one remote storage array,
and the two storage arrays have the same hardware. The same LUN is
writed or readed by the two storage arrays. However, usually, the
average latency of the paths of the remote storage array is higher
than the near storage array's. Apparently, the prioritizer can be
a good automatic solution. And the current selectors don't solve it,
IOs will send to the paths of the remote storage array, IOPS will
be influenced unavoidably.

2. In the environment of single storage array, the prioritizer can
automatically separate the paths who's latency is higher, IOs will
not send to this paths. But the current selectors don't solve this
problem, IOPS will be influenced unavoidably.

Changes from v5:
* Max average latency value is set to 100s, and min value is set to
1us. Use logarithmic scale to partition different path priority ranks
instead of linear scale, while use args "base_num" instead of
"latency_interval". Fix according to Martin's reviews.
* Make log to be more regular. Fix according to Martin's reviews.

Changes from v4:
* Args "latency_interval" is set by using logarithmic scale, 
where base number is 10. Fix according to Martin's reviews.
* The value of "MAX_LATENCY_INTERVAL" is set 10 from 60.

Changes from v3:
* Delete Copyright time "2021", Fix according to Xose's reviews.
* Add version for GPL, Fix according to Xose's reviews.

Changes from v2:
* Reorganize the commit comment and patch format.
* Added Benjamin, Martin and Xose's Reviewed-by.

Changes from v1:
* The value of "MIN_IO_NUM" is set 2 from 10.
* Fix according to Benjamin, Martin and Xose's reviews.

Yang Feng (1):
  multipath-tools: Prioritizer based on a latency algorithm

 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 257 +++
 multipath/multipath.conf.5   |  20 +++
 4 files changed, 282 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-16 Thread Yang Feng
Hi Martin,

Thanks for your reviews.
Please find my replys.
And the patch v6 will be sent later.

Regards,
Yang

---
> 
>> +#define MAX_LATENCY_INTERVAL10/*unit: s*/
>> +#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/
> 
> Why not use the same unit or both values?
> 
It's easy to check the valid range of the "latency_interval" value.
For unit: us, or ms, or s, the min digital value is the same, is 1.
And for all unit, the max conversion value is the same, is 10 seconds.

>> +static inline long long timeval_to_us(const struct timespec *tv)
>> +{
>> +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv-
>>> tv_nsec >> 10);
> 
> Please calculate correctly, divide by 1000.
> 
Thanks, it will be fixed.

>> +int check_args_valid(int io_num, long long latency_interval, int
>> type)
>> +{
>> +if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM))
>> +{
>> +condlog(0, "args io_num is more than the valid values
>> range");
>> +return 0;
>> +}
> 
> Please write "outside the valid range" rather than "more than the valid
> values range".
> 
Thanks, it will be fixed.

> In general, when you use condlog(), please prefix all messages with a
> string that clearly identifies the path_latency prioritizer, e.g.
> PRIO_PATH_LATENCY.
> I know we're not currently doing that consistently in multipath-tools,
> but that shouldn't be a reason for not doing when adding new code.
> 
Good, thanks, it will be fixed.

>> +
>> +/* check value is set by using logarithmic scale, and base
>> number is 10 */
>> +if ((latency_interval % BASE_NUMBER) != 0)
>> +{
>> +condlog(0, "args latency_interval is not set by using
>> logarithmic scale , where base number is 10");
>> +return 0;
>> +}
> 
> This is not what I meant with "logarithmic scale". See below.
> 
>> +
>> +/* s:[1, 10], ms:[1, 1], us:[1, 1000] */
>> +if ((latency_interval < MIN_LATENCY_INTERVAL)
>> +|| (latency_interval > (MAX_LATENCY_INTERVAL * USEC_PER_SEC
>> / conversion_ratio[type])))
>> +{
>> +condlog(0, "args latency_interval is more than the valid
>> values range");
>> +return 0;
>> +}
> 
> Again, please use "outside".
> 
Thanks, it will be fixed.

>> +/* In multipath.conf, args form: io_num|latency_interval. For
>> example,
>> +*  args is "20|10ms", this function can get 20, 10.
>> +*/
>> +static int get_interval_and_ionum(char *args,
>> +int *ionum,
>> +long long *interval)
>> +{
>> +char source[MAX_CHAR_SIZE];
>> +char vertica = '|';
>> +char *endstrbefore = NULL;
>> +char *endstrafter = NULL;
>> +int type;
>> +unsigned int size = strlen(args);
>> +long long ratio;
>> +
>> +if ((args == NULL) || (ionum == NULL) || (interval == NULL))
>> +{
>> +condlog(0, "args string is NULL");
>> +return 0;
>> +}
>> +
>> +if ((size < 1) || (size > MAX_CHAR_SIZE-1))
>> +{
>> +condlog(0, "args string's size is too long");
>> +return 0;
>> +}
> 
> "too long" is not correct for the first (more likely) error condition.
> 
>> +
>> +memcpy(source, args, size+1);
>> +
>> +if (!isdigit(source[0]))
>> +{
>> +condlog(0, "args io_num string's first char is not digit");
>> +return 0;
>> +}
> 
> It would be ok to catch all these errors with a single line of code
> such as "path_latency: invalid prio_args format: %s".
> 
OK, it will be fixed.

>> +
>> +*ionum = (int)strtoul(source, , 10);
>> +if (endstrbefore[0] != vertica)
>> +{
>> +condlog(0, "segmentation char is invalid");
>> +return 0;
>> +}
>> +
>> +if (!isdigit(endstrbefore[1]))
>> +{
>> +condlog(0, "args latency_interval string's first char is not
>> digit");
>> +return 0;
>> +}
>> +
>> +*interval = (long long)strtol([1], ,
>> 10);
>> +type = get_interval_type(endstrafter);
>> +if (type == INTERVAL_INVALID)
>> +{
>> +condlog(0, "args latency_interval type is invalid");
>> +return 0;
>> +}
>> +
>> +if (check_args_valid(*ionum, *interval, type) == 0)
>> +{
>> +return 0;
>> +}
>> +
>> +ratio = get_conversion_ratio(type);
>> +*interval *= (long long)ratio;
>> +
>> +return 1;
>> +}
>> +
>> +long long calc_standard_deviation(long long *path_latency, int size,
>> long long avglatency)
>> +{
>> +int index;
>> +long long total = 0;
>> +
>> +for (index = 0; index < size; index++)
>> +{
>> +total += (path_latency[index] - avglatency) *
>> (path_latency[index] - avglatency);
>> +}
>> +
>> +total /= (size-1);
>> +
>> +return (long long)sqrt((double)total);
>> +}
>> +
>> +int getprio (struct path *pp, char *args, unsigned int timeout)
>> +{
>> +int rc, temp;
>> +int index = 0;
>> +int io_num;
>> +long long latency_interval;
>> +long long 

[dm-devel] [PATCH v5 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-09 Thread Yang Feng
This patch value is in the following: 
1. In the Storage-Backup environment of HyperCluster, includes
one storage array near to the host and one remote storage array,
and the two storage arrays have the same hardware.The same LUN is
writed or readed by the two storage arrays. However, usually, the
average latency of the paths of the remote storage array is much 
higher than the near storage array's. Apparently, the prioritizer
can be a good automatic solution. And the current selectors don't
solve it, IOs will send to the paths of the remote storage array,
IOPS will be influenced unavoidably.

2. In the environment of single storage array, the prioritizer can
automatically separate the paths who's latency is much higher, IOs
will not send to this paths. But the current selectors don't solve
this problem, IOPS will be influenced unavoidably.

Changes from v4:
* Argument "latency_interval" is set by using logarithmic scale, 
where base number is 10. Fix according to Martin's reviews.
* The value of "MAX_LATENCY_INTERVAL" is set 10 from 60.

Changes from v3:
* Delete Copyright time "2021", Fix according to Xose's reviews.
* Add version for GPL, Fix according to Xose's reviews.

Changes from v2:
* Reorganize the commit comment and patch format.
* Added Benjamin, Martin and Xose's Reviewed-by.

Changes from v1:
* The value of "MIN_IO_NUM" is set 2 from 10.
* Fix according to Benjamin, Martin and Xose's reviews.

Yang Feng (1):
  multipath-tools: Prioritizer based on a latency algorithm

 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 282 +++
 multipath/multipath.conf.5   |  20 +++
 4 files changed, 307 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH v5 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-09 Thread Yang Feng
libmultipath/prioritizers: Prioritizer for device mapper multipath,
where the corresponding priority values of specific paths are provided
by a latency algorithm. And the latency algorithm is dependent on the
following arguments(latency_interval and io_num).
The principle of the algorithm is illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current
path continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value
"latency_interval", the priority "rc" of each path can be provided.

   latency_interval   latency_interval   latency_interval   latency_interval
 
|--|--|--|...|--|
 |  priority rank 1 |  priority rank 2 |  priority rank 3 |...|  priority rank 
x |
 
|--|--|--|...|--|
  Priority Rank Partitioning

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
Reviewed-by: Hannes Reinecke <h...@suse.de>
---
 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 282 +++
 multipath/multipath.conf.5   |  20 +++
 4 files changed, 307 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

diff --git a/libmultipath/prio.h b/libmultipath/prio.h
index 0193c52..c97fe39 100644
--- a/libmultipath/prio.h
+++ b/libmultipath/prio.h
@@ -29,6 +29,7 @@ struct path;
 #define PRIO_RDAC  "rdac"
 #define PRIO_WEIGHTED_PATH "weightedpath"
 #define PRIO_SYSFS "sysfs"
+#define PRIO_PATH_LATENCY  "path_latency"
 
 /*
  * Value used to mark the fact prio was not defined
diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..ca47cdf 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,6 +18,7 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
+   libpriopath_latency.so \
libpriosysfs.so
 
 all: $(LIBS)
@@ -25,6 +26,9 @@ all: $(LIBS)
 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
+libpriopath_latency.so: path_latency.o  ../checkers/libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
new file mode 100644
index 000..e9cf679
--- /dev/null
+++ b/libmultipath/prioritizers/path_latency.c
@@ -0,0 +1,282 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved.
+ *
+ * path_latency.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority
+ * values of specific paths are provided by a latency algorithm. And the
+ * latency algorithm is dependent on arguments.
+ *
+ * The principle of the algorithm as follows:
+ * 1. By sending a certain number "io_num" of read IOs to the current path
+ *continuously, the IOs' average latency can be calculated.
+ * 2. According to the average latency of each path and the weight value
+ *"latency_interval", the priority "rc" of each path can be provided.
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *Guan Junxiong <guanjunxi...@huawei.com>
+ *Zou Ming <zouming.zoum...@huawei.com>
+ *
+ * This file is released under the GPL version 2, or any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../checkers/libsg.h"
+
+#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/
+#define BASE_NUMBER 10
+
+#define MAX_IO_NUM  200
+#define MIN_IO_NUM  2
+
+#define MAX_LATENCY_INTERVAL10/*unit: s*/
+#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/
+
+#define DEFAULT_PRIORITY0
+
+#define MAX_CHAR_SIZE   30
+
+#define CHAR_USEC   "us"
+#define CHAR_MSEC   "ms"
+#define CHAR_SEC"s"
+
+enum interval_type {
+INTERVAL_USEC,
+INTERVAL_MSEC,
+INTERVAL_SEC,
+INTERVAL_INVALID
+};
+
+/* interval_unit_str and interval_unit_type keep the same assignment sequence 
*/
+static const char *interval_unit_str[MAX_CHAR_SIZE] = {
+CHAR_USEC, CHAR_MSEC, CHAR_SEC
+};
+static const int interval_unit_type[] = {
+INTERVAL_USEC, INTERVAL

Re: [dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-09 Thread Yang Feng
Hi Martin,

Thanks a lot.
It's a good idea.
The updated patch will be sent later.

Regards,
-Yang


On 2017/6/9 16:05, Martin Wilck wrote:
> Hello Yang,
> 
>>> Actually, you're not alone here; several other storage array setups
>>> suffer from the same problem.
>>>
>>> Eg if you have a site-failover setup with two storage arrays at
>>> different locations the problem is more-or-less the same;
>>> both arrays potentially will be displaying identical priority
>>> information, despite one array being remote.
>>>
>>
>> It's up to the value set of the argument "latency_interval".For
>> example,
>> If latency_interval=10ms, the paths will be grouped in priority
>> groups
>> with path latency 0-10ms, 10-20ms, 20-30ms, etc. If the argument
>> "latency_interval" is set to appropriate value and the distance
>> between
>> two arrays is not enough far, two priorities may be the same, But
>> it's
>> OK, because between two arrays, the gap of average path latency is
>> very
>> small and tolerable.
> 
> I wonder if it would make sense to use "logarithmically" scaled latency
> intervals here. It wouldn't make a large difference whether the latency
> is 1ms or 2ms, but if we have paths where the latencies differ by order
> of magnitude, it would be very important to make a distinction. With
> the current linear intervals, it would be hard to get this right (us
> interval size would result in too many intervals, and sec interval size
> wouldn't allow a distinction between us and ms latencies).
> 
> By using logarithmic scale, you could setup the latency intevals e.g
> like this: 
> 
>   < 10us, 10us-100us, 100us-1ms, 1ms-10ms, 10ms-100ms, > 100ms
> 
> IMO that would be better, in particular if the latencies differ
> strongly between paths.
> 
> Regards
> Martin
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-09 Thread Yang Feng
Hi Hannes,

Thanks a lot.
Please find my replys as follows.

regards.
-Yang

On 2017/6/8 23:37, Hannes Reinecke wrote:
> On 06/06/2017 04:43 AM, Yang Feng wrote:
>> This patch value is in the following: 
>> 1. In the Storage-Backup environment of HyperCluster, includes
>> one storage array near to the host and one remote storage array,
>> and the two storage arrays have the same hardware.The same LUN is
>> writed or readed by the two storage arrays. However, usually, the
>> average latency of the paths of the remote storage array is much 
>> higher than the near storage array's. Apparently, the prioritizer
>> can be a good automatic solution. And the current selectors don't
>> solve it, IOs will send to the paths of the remote storage array,
>> IOPS will be influenced unavoidably.
>>
> Actually, you're not alone here; several other storage array setups
> suffer from the same problem.
> 
> Eg if you have a site-failover setup with two storage arrays at
> different locations the problem is more-or-less the same;
> both arrays potentially will be displaying identical priority
> information, despite one array being remote.
> 

It's up to the value set of the argument "latency_interval".For example,
If latency_interval=10ms, the paths will be grouped in priority groups
with path latency 0-10ms, 10-20ms, 20-30ms, etc. If the argument
"latency_interval" is set to appropriate value and the distance between
two arrays is not enough far, two priorities may be the same, But it's
OK, because between two arrays, the gap of average path latency is very
small and tolerable.

> Similarly, if you have a failover scenario where the individual paths
> are accessed via different protocols you're facing the same problem.
> 

Usually, the number of paths grouped into one priority is not only one.
And Mostly, the average latency of paths via FC protocol is much smaller
than the average latency of paths via iSCSI protocol. Unless all paths
via FC are fault, the failover scenario to paths via iSCSI will not happen.

> The underlying reason for this difficulty is the two-stage topology of
> the current multipath implementation:
> 
> - Each path is grouped into a path group
> - Priorities are attached to a path group
> - Each path group belongs to a multipath device
> 
> And as we only have a _single_ prioritizer per path we cannot easily
> handle this situation.
> 
> Ideally we should be able to 'stack' prioritizers, which would work by
> combining the priority numbers from the stacked/combined prioritizers.
> 

In this prioritizer, all paths will be grouped into several priorities,
and each path group has own priority who's different from the others.

1. By sending a certain number "cons_num" of read IOs to the current
path continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value
"latency_interval", the priority "rc" of each path can be provided.

Then all paths can be divided into different path groups. For example,
If latency_interval=10ms, the paths will be grouped in priority groups
with path latency 0-10ms, 10-20ms, 20-30ms, etc, as follows:

   latency_interval   latency_interval   latency_interval   latency_interval
 
|--|--|--|...|--|
 |  priority rank 1 |  priority rank 2 |  priority rank 3 |...|  priority rank 
x |
 
|--|--|--|...|--|
 | 0~10ms   | 10~20ms  | 20~30ms  |...| 10*(x-1)~10*x 
ms |
  Priority Rank Partitioning

> But that would be requiring quite some rework, of course.
> 
> Cheers,
> 
> Hannes
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH v3 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-06 Thread Yang Feng
This patch value is in the following: 
1. In the Storage-Backup environment of HyperCluster, includes
one storage array near to the host and one remote storage array,
and the two storage arrays have the same hardware.The same LUN is
writed or readed by the two storage arrays. However, usually, the
average latency of the paths of the remote storage array is much 
higher than the near storage array's. Apparently, the prioritizer
can be a good automatic solution. And the current selectors don't
solve it, IOs will send to the paths of the remote storage array,
IOPS will be influenced unavoidably.

2. In the environment of single storage array, the prioritizer can
automatically separate the paths who's latency is much higher, IOs
will not send to this paths. But the current selectors don't solve
this problem, IOPS will be influenced unavoidably.


Changes to v2:
* Reorganize the commit comment and patch format
* Added Benjamin, Martin and Xose's Reviewed-by

Changes to v1:
* The value of "MIN_IO_NUM" is set 2 from 10
* Fix according to Benjamin, Martin and Xose's reviews.

Yang Feng (1):
  multipath-tools: Prioritizer based on a latency algorithm

 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 272 +++
 multipath/multipath.conf.5   |  18 ++
 4 files changed, 295 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-05 Thread Yang Feng
This patch value is in the following: 
1. In the Storage-Backup environment of HyperCluster, includes
one storage array near to the host and one remote storage array,
and the two storage arrays have the same hardware.The same LUN is
writed or readed by the two storage arrays. However, usually, the
average latency of the paths of the remote storage array is much 
higher than the near storage array's. Apparently, the prioritizer
can be a good automatic solution. And the current selectors don't
solve it, IOs will send to the paths of the remote storage array,
IOPS will be influenced unavoidably.

2. In the environment of single storage array, the prioritizer can
automatically separate the paths who's latency is much higher, IOs
will not send to this paths. But the current selectors don't solve
this problem, IOPS will be influenced unavoidably.

Changes from v3:
* Delete Copyright time "2021", Fix according to Xose's reviews.
* Add version for GPL, Fix according to Xose's reviews.

Changes from v2:
* Reorganize the commit comment and patch format.
* Added Benjamin, Martin and Xose's Reviewed-by.

Changes from v1:
* The value of "MIN_IO_NUM" is set 2 from 10.
* Fix according to Benjamin, Martin and Xose's reviews.

Yang Feng (1):
  multipath-tools: Prioritizer based on a latency algorithm

 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 272 +++
 multipath/multipath.conf.5   |  18 ++
 4 files changed, 295 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

-- 
2.6.4.windows.1


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH v4 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-05 Thread Yang Feng
libmultipath/prioritizers: Prioritizer for device mapper multipath,
where the corresponding priority values of specific paths are provided
by a latency algorithm. And the latency algorithm is dependent on the
following arguments(latency_interval and io_num).
The principle of the algorithm is illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current
path continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value
"latency_interval", the priority "rc" of each path can be provided.

   latency_interval   latency_interval   latency_interval   latency_interval
 
|--|--|--|...|--|
 |  priority rank 1 |  priority rank 2 |  priority rank 3 |...|  priority rank 
x |
 
|--|--|--|...|--|
  Priority Rank Partitioning

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
---
 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 272 +++
 multipath/multipath.conf.5   |  18 ++
 4 files changed, 295 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

diff --git a/libmultipath/prio.h b/libmultipath/prio.h
index 0193c52..c97fe39 100644
--- a/libmultipath/prio.h
+++ b/libmultipath/prio.h
@@ -29,6 +29,7 @@ struct path;
 #define PRIO_RDAC  "rdac"
 #define PRIO_WEIGHTED_PATH "weightedpath"
 #define PRIO_SYSFS "sysfs"
+#define PRIO_PATH_LATENCY  "path_latency"
 
 /*
  * Value used to mark the fact prio was not defined
diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..ca47cdf 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,6 +18,7 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
+   libpriopath_latency.so \
libpriosysfs.so
 
 all: $(LIBS)
@@ -25,6 +26,9 @@ all: $(LIBS)
 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
+libpriopath_latency.so: path_latency.o  ../checkers/libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
new file mode 100644
index 000..f8c9347
--- /dev/null
+++ b/libmultipath/prioritizers/path_latency.c
@@ -0,0 +1,272 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved.
+ *
+ * main.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority
+ * values of specific paths are provided by a latency algorithm. And the
+ * latency algorithm is dependent on arguments.
+ *
+ * The principle of the algorithm as follows:
+ * 1. By sending a certain number "io_num" of read IOs to the current path
+ *continuously, the IOs' average latency can be calculated.
+ * 2. According to the average latency of each path and the weight value
+ *"latency_interval", the priority "rc" of each path can be provided.
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *Zou Ming <zouming.zoum...@huawei.com>
+ *Guan Junxiong <guanjunxi...@huawei.com>
+ *
+ * This file is released under the GPL version 2, or any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../checkers/libsg.h"
+
+#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/
+
+#define MAX_IO_NUM  200
+#define MIN_IO_NUM  2
+
+#define MAX_LATENCY_INTERVAL60/*unit: s*/
+#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/
+
+#define DEFAULT_PRIORITY0
+
+#define MAX_CHAR_SIZE   30
+
+#define CHAR_USEC   "us"
+#define CHAR_MSEC   "ms"
+#define CHAR_SEC"s"
+
+enum interval_type {
+INTERVAL_USEC,
+INTERVAL_MSEC,
+INTERVAL_SEC,
+INTERVAL_INVALID
+};
+
+/* interval_unit_str and interval_unit_type keep the same assignment sequence 
*/
+static const char *interval_unit_str[MAX_CHAR_SIZE] = {
+CHAR_USEC, CHAR_MSEC, CHAR_SEC
+};
+static const int interval_unit_type[] = {
+INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC
+};
+
+#define USEC_PER_SEC  100LL
+#define USEC_PER_MSEC 1000LL

[dm-devel] [PATCH v3 1/1] multipath-tools: Prioritizer based on a latency algorithm

2017-06-04 Thread Yang Feng
libmultipath/prioritizers: Prioritizer for device mapper multipath,
where the corresponding priority values of specific paths are provided
by a latency algorithm. And the latency algorithm is dependent on the
following arguments(latency_interval and io_num).
The principle of the algorithm is illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current
path continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value
"latency_interval", the priority "rc" of each path can be provided.

 latency_interval   latency_interval   latency_interval   latency_interval
 
|--|--|--|...|--|
 |  priority rank 1 |  priority rank 2 |  priority rank 3 |...|  priority rank 
x |
 
|--|--|--|...|--|
  Priority Rank Partitioning

Signed-off-by: Yang Feng <philip.y...@huawei.com>
Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com>
Reviewed-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com>
---
 libmultipath/prio.h  |   1 +
 libmultipath/prioritizers/Makefile   |   4 +
 libmultipath/prioritizers/path_latency.c | 272 +++
 multipath/multipath.conf.5   |  18 ++
 4 files changed, 295 insertions(+)
 create mode 100644 libmultipath/prioritizers/path_latency.c

diff --git a/libmultipath/prio.h b/libmultipath/prio.h
index 0193c52..c97fe39 100644
--- a/libmultipath/prio.h
+++ b/libmultipath/prio.h
@@ -29,6 +29,7 @@ struct path;
 #define PRIO_RDAC  "rdac"
 #define PRIO_WEIGHTED_PATH "weightedpath"
 #define PRIO_SYSFS "sysfs"
+#define PRIO_PATH_LATENCY  "path_latency"
 
 /*
  * Value used to mark the fact prio was not defined
diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..ca47cdf 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,6 +18,7 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
+   libpriopath_latency.so \
libpriosysfs.so
 
 all: $(LIBS)
@@ -25,6 +26,9 @@ all: $(LIBS)
 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
+libpriopath_latency.so: path_latency.o  ../checkers/libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
new file mode 100644
index 000..081f546
--- /dev/null
+++ b/libmultipath/prioritizers/path_latency.c
@@ -0,0 +1,272 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, 2021   All Rights Reserved.
+ *
+ * main.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority
+ * values of specific paths are provided by a latency algorithm. And the
+ * latency algorithm is dependent on arguments.
+ *
+ * The principle of the algorithm as follows:
+ * 1. By sending a certain number "io_num" of read IOs to the current path
+ *continuously, the IOs' average latency can be calculated.
+ * 2. According to the average latency of each path and the weight value
+ *"latency_interval", the priority "rc" of each path can be provided.
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *Zou Ming <zouming.zoum...@huawei.com>
+ *Guan Junxiong <guanjunxi...@huawei.com>
+ *
+ * This file is released under the GPL.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../checkers/libsg.h"
+
+#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/
+
+#define MAX_IO_NUM  200
+#define MIN_IO_NUM  2
+
+#define MAX_LATENCY_INTERVAL60/*unit: s*/
+#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/
+
+#define DEFAULT_PRIORITY0
+
+#define MAX_CHAR_SIZE   30
+
+#define CHAR_USEC   "us"
+#define CHAR_MSEC   "ms"
+#define CHAR_SEC"s"
+
+enum interval_type {
+INTERVAL_USEC,
+INTERVAL_MSEC,
+INTERVAL_SEC,
+INTERVAL_INVALID
+};
+
+/* interval_unit_str and interval_unit_type keep the same assignment sequence 
*/
+static const char *interval_unit_str[MAX_CHAR_SIZE] = {
+CHAR_USEC, CHAR_MSEC, CHAR_SEC
+};
+static const int interval_unit_type[] = {
+INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC
+};
+
+#define USEC_PER_SEC  100LL
+#define USEC_PER_MSEC 1000LL
+#define USEC_PER_USEC 1LL
+
+static c

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm

2017-05-24 Thread Yang Feng


On 2017/5/25 3:57, Benjamin Marzinski wrote:
> On Wed, May 24, 2017 at 10:50:18AM +0800, Yang Feng wrote:
>> Hello Christophe,
> ACK
> 
> -Ben
>  

Hello Christophe,

Any advance for this patch?
Thanks,Best!

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm

2017-05-23 Thread Yang Feng
Hello Christophe,

Our technology team is working on a open source contribution for multipath.
We think that this patch is essential to isolate the high latency paths and
avoid high fluctuation of io performance, particularly, for the Storage-Backup
environment of HyperCluster.

Look forward your reply very much.
Respects and regards!


1. In the Storage-Backup environment of HyperCluster, includes one storage 
array near
to the host and one remote storage array, and the two storage arrays have the 
same hardware.
The same LUN is writed or readed by the two storage arrays.However, usually, 
the average
latency of the paths of the remote storage array is much higher than the near 
storage array's.
Apparently, the prioritizer can be a good automatic solution. And the current 
selectors don't
solve it, IOs will send to the paths of the remote storage array, IOPS will be 
influenced unavoidably.
2. In the environment of single storage array, the prioritizer can 
automatically separate the paths
who's latency is much higher, IOs will not send to this paths. But the current 
selectors don't solve
this problem, io performance will be influenced unavoidably.


The up-to-date patch as follows:
---
>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001
From: Yang Feng <philip.y...@huawei.com>
Date: Fri, 19 May 2017 16:09:07 +0800
Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper 
multipath, where the corresponding priority
values of specific paths are provided by a latency algorithm. And the latency 
algorithm is dependent on the following
arguments(latency_interval and io_num). The principle of the algorithm is 
illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current path 
continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value 
"latency_interval", the priority "rc" of each path can be provided.

   latency_interval   latency_interval   latency_interval   
latency_interval
 
|--|--|--|...|--|
 |  priority rank 1 |  priority rank 2 |  priority rank 3 |...| 
 priority rank x |
 
|--|--|--|...|--|
  Priority Rank Partitioning
---
 libmultipath/prioritizers/Makefile   |   6 +-
 libmultipath/prioritizers/path_latency.c | 271 +++
 multipath/multipath.conf.5   |  18 ++
 libmultipath/prio.h  |   1 +
 4 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100644 libmultipath/prioritizers/path_latency.c

diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..d2f20f6 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,13 +18,17 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
-   libpriosysfs.so
+   libpriopath_latency.so \
+   libpriosysfs.so

 all: $(LIBS)

 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^

+libpriopath_latency.so: path_latency.o  ../checkers/libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^

diff --git a/libmultipath/prioritizers/path_latency.c 
b/libmultipath/prioritizers/path_latency.c
new file mode 100644
index 000..a666b6c
--- /dev/null
+++ b/libmultipath/prioritizers/path_latency.c
@@ -0,0 +1,271 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, 2021   All Rights Reserved.
+ *
+ * main.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority
+ * values of specific paths are provided by a latency algorithm. And the
+ * latency algorithm is dependent on arguments.
+ *
+ * The principle of the algorithm as follows:
+ * 1. By sending a certain number "io_num" of read IOs to the current path
+ *continuously, the IOs' average latency can be calculated.
+ * 2. According to the average latency of each path and the weight value
+ *"latency_interval", the priority "rc" of each path can be provided.
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *Zou Ming <zouming.zoum...@huawei.com>
+ *
+ * This file is released under the GPL.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../checkers/libsg.h"
+
+#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/
+
+#define MAX_IO_NUM  200
+#define MIN_IO_NUM  10
+
+#define MAX_LATENCY_INTERVAL60/*unit: s*/
+#define MIN_LATENCY_INTERVAL1   

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm

2017-05-23 Thread Yang Feng
/prioritizers/delayedpath.h
>> new file mode 100644
>> index 000..d8213e9
>> --- /dev/null
>> +++ b/libmultipath/prioritizers/delayedpath.h
>> @@ -0,0 +1,17 @@
>> +#ifndef _DELAYEDPATH_H
>> +#define _DELAYEDPATH_H
>> +
>> +#define PRIO_DELAYED_PATH "delayedpath"
> 
> In order for the rest of the code to refer to this prioritizer, this
> define should be in prio.h with the other prioritizer names, and as long
> as delayedpath.c includes prio.h, there's no need to put it in
> delayedpath.h.
OK, as the following patch.
> 
>> +
>> +#define PRIO_NO_INFORMATION 5
> 
> The rest of the multipath code only cares if getprio returns a negative
> number of not. It doesn't check what the specific negative number is.  I
> realize the the alua prioritizer returns a set of error codes, but they
> aren't used, or even usable in their present form. If we wanted to have
> better error reporting, we should set up a universal set of error codes
> in prio.h, and have all prioritizers use them, instead of having each
> prioritizer define its own error codes. There's no reason why your
> prioritizer needs to return this error code instead of -1.
OK, as the following patch.
> 
>> +
>> +#define USEC_PER_SEC  100LL
>> +#define USEC_PER_MSEC 1000LL
>> +#define USEC_PER_USEC 1LL
>> +
>> +static inline long long timeval_to_us(const struct timespec *tv)
>> +{
>> +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10);
>> +}
> 
> No other file besides delayedpath.c will likely be including this .h
> file, so I don't see any purpose for these being defined here.  In fact,
> I don't see why you can't just have a .c file without a .h file like the
> majority of prioritizers.  I'm pretty sure that none of the prioritizers
> really need their own .h file.
OK, as the following patch.
> 
>> +#endif
>> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
>> index 5939688..f1e126e 100644
>> --- a/multipath/multipath.conf.5
>> +++ b/multipath/multipath.conf.5
>> @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10.
>>  Generate the path priority based on the regular expression and the
>>  priority provided as argument. Requires prio_args keyword.
>>  .TP
>> +.I delayedpath
>> +Generate the path priority based on a time-delay algorithm.
>> +Requires prio_args keyword.
> 
> Really it doesn't require prio_args if you want to use the default
> values, and should probably say so.
The default args is discarded, as the following patch.

>> +.I delayed
>> +Needs a value of the form
>> +\fI"<delay_interval|cons_num>"\fR
>> +.RS
>> +.TP 8
>> +.I delay_interval
>> +The interval values of average IO-time-delay between two different 
>> neighbour ranks of path priority, used to partition different priority ranks.
>> +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or 
>> Millisecond. Valid Values: Integer, s [1, 60], ms [1, 1000), us [1, 1000),
>> +For example: 10s, or 100us, or 100ms. The default is: 10ms.
>> +.TP
>> +.I cons_num
>> +The number of read IOs sent to the current path continuously, used to 
>> calculate the average IO-time-delay. Valid Values: Integer, [3, 1000].
>> +For example: 30. The default is: 20.
>> +.RE
>> +.TP 12
> 
> Looking at the "weighted" prio_args definition just above your "delayed"
> definition, the pipe character "|" is being used to say that any of a
> set of options is allowed.  Your definition has it being a literal
> character, but it's still inside the angle brackets that usually
> delineate a variable.  perhaps "|" would be
> easier to understand, or even "[delayed_interval]|[io_num]" if you can
> omit these to use the defaults.
OK, as the following patch.
The up-to-date patch as follows:

---
>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001
From: Yang Feng <philip.y...@huawei.com>
Date: Fri, 19 May 2017 16:09:07 +0800
Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper 
multipath, where the corresponding priority
values of specific paths are provided by a latency algorithm. And the latency 
algorithm is dependent on the following
arguments(latency_interval and io_num). The principle of the algorithm is 
illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current path 
continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value 
"latency_interval", the priority "rc&q

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-23 Thread Yang Feng
Hello Benjamin,

How about this patch?
Look forward your replies.
Thank you very much.
Best regards!





On 2017/5/19 17:45, Yang Feng wrote:
> Hi Benjamin,
> 
> Thank you very much for your comments.
> Please find my replys and the up-to-date patch.
> Best regards!
> 
>>
>> First, one overall question. We have dynamic path selectors available to
>> deal with paths that are just simply slower that other paths, but can
>> still be used together.  Is there specific hardware or a specific setup
>> where this isn't good enough and we really need to seperate these paths
>> into different pathgroups, but we can't find out deterministically how
>> the groups should be set up?  It just seems like there could be a less
>> hacky solution to this problem, but perhaps there are some situations
>> where this is truly the best option. I'm just wondering what those are.1. In 
>> the Storage-Backup environment of HyperCluster,includs one storage array near
> to the host and one remote storage array, and the two storage arrays have the 
> same hardware.
> The same LUN is writed or readed by the two storage arrays.
> However, usually, the average latency of the paths of the remote storage 
> array is much higher than the
> near storage array's.
> apparently, the prioritizer can be a good automatic solution.
> And the current selectors don't solve it, IOs will send to the paths of the 
> remote storage array, IOPS will be influenced unavoidably.
> 2. In the environment of single storage array, the prioritizer can 
> automatically separate the paths who's latency is much higher,
> IOs will not send to this paths.
> But the current selectors don't solve this problem, IOPS will be influenced 
> unavoidably.
> 
>>> +
>>> +/* interval_unit_str and interval_unit_type keep the same assignment 
>>> sequence */
>>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = {
>>> +CHAR_USEC, CHAR_MSEC, CHAR_SEC
>>
>> This is a nit, but for constant strings, could you please use "char
>> *var" instead of "char var[]", to be consistent with the rest of the
>> multipath code.
> Thanks, as the following patch.
> 
>>> +if ((args == NULL) || (interval == NULL)
>>> +|| (consnum == NULL) || (type == NULL))
>>> +return 0;
>>> +
>>> +/* int type */
>>> +if ((size < 1) || (size > MAX_CHAR_SIZE-1))
>>> +return 0;
>>
>> You should probably have log messages for these error returns.
> Thanks, as the following patch.
> 
>>> +
>>> +memcpy(source, args, size+1);
>>> +if (strstr(source, vertica) == NULL)
>>> +return 0;
>>> +
>>> +*type = get_interval_type(source, typestr);
>>> +if (*type == INTERVAL_INVALID)
>>> +{
>>> +condlog(0, "delay_interval type is invalid");
>>> +return 0;
>>> +}
>>
>> I'm confused here. How do you get to use the default interval. Shouldn't
>> you accept "20s|" and "|30" and as valid inputs that use the defaults
>> for the part they don't specify. 
> OK,the default arguments value is removed. If get inputs failed, return 
> default priority "0".
> As the following patch.
>>
>>> +tokenbefore = strtok(source, vertica);
>>> +tokenafter = strtok(NULL, vertica);
>>> +typestr[1] = '\0';
>>> +tokenbefore = strtok(tokenbefore, typestr);
>>> +if ((tokenbefore == NULL) || (tokenafter == NULL))
>>> +return 0;
>>> +
>>> +tmp = tokenbefore;
>>> +while (*tmp != '\0')
>>> +if (!isdigit(*tmp++))
>>> +{
>>> +condlog(0, "delay_interval string include invalid char");
>>> +return 0;
>>> +}
>>> +
>>> +tmp = tokenafter;
>>> +while (*tmp != '\0')
>>> +if (!isdigit(*tmp++))
>>> +{
>>> +condlog(0, "cons_num string include invalid char");
>>> +return 0;
>>> +}
>>> +
>>> +*interval = atoi(tokenbefore);
>>
>> Why do you keep track of the type and the interval seperately? Can't you
>> just find out the type, and use that to multiply the interval once you
>> read it, and then just use that value, instead of keeping track of two
>> values across multiple functions?
> Thanks, as the following patch.
> 
>>> +
>>> +   if (pp->fd < 0)
>>> +   return -PRIO_

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm

2017-05-23 Thread Yang Feng
Hello Martin,

How about this patch?
Look forward your replies.
Thank you very much.
Best regards!




On 2017/5/19 16:43, Yang Feng wrote:
> Hello Martin,
> 
> Firstly, thank you very much for your comments.
> And find my replys and the up-to-date patch.
> 
> Best regards!
> 
> 
>> Please think about the name once again. Maybe you should call it
>> "io_latency" or "path_latency" instead of "delayedpath"?
> OK, as the following patch.
> 
>>
>> Hm, I can't see a lot of difference in the parsing code wrt the
>> previous version. IMO it's still non-straightforward and hard to
>> comprehend. Maybe I didn't express myself clearly enough. Here is how
>> I'd code this:
>>
>>  1. Verify that the string starts with a digit. Error if it does not.
>>  2. Parse the delay interval using strtoul().
>>  3. The "end" pointer of strtoul() points to the unit, which has to be
>> "s", "ms" or "us". Verify, and set the unit accordingly.
>>  4. Verify that the next character is '|', and that it's followed by a
>> digit.
>>  5. Parse the number with strtoul()
>>  6. Verify that there's no garbage at the end of the string. 
> Thank you , as the following patch.
> 
>>
>> Please follow the https://en.wikipedia.org/wiki/Robustness_principle.
>> If a user enters "1500ms" here, the parsing will silently fail, and
>> with it the whole prio algorithm. This will cause user confusion.
>> Please don't do this
> Thank you , as the following patch.
> 
>>
>> However please consider lowering the upper bound, I kind of doubt that
>> 1000 IOs will finish quickly. More often than not, a lot of paths will
>> appear at the same time (e.g. if a port of a storage array is enabled)
>> and we'll have to send 1000 IOs to each one.
>>
> OK, the upper bound lower to 200, as the following patch.
> 
>>> +while (temp-- > 0)
>>> +{
>>> +(void)clock_gettime(CLOCK_MONOTONIC, );
>>> +before = timeval_to_us();   
>>> +
>>> +if (do_readsector0(pp->fd, timeout) == 2)
>>> +{
>>> +condlog(0, "%s: path down", pp->dev);
>>> +return -1;
>>> +}
>>> +
>>> +(void)clock_gettime(CLOCK_MONOTONIC, );
>>> +after = timeval_to_us();
>>> +
>>> +delay = after - before;
>>> +   
>>> +min = (min <= delay) ? min : delay;
>>> +max = (max >= delay) ? max : delay;
>>> +
>>> +toldelay += delay;
>>> +}
>>> +
>>> +toldelay -= min + max;
>>
>> Why are you doing this? If you want to discard "extreme" values, this
>> is probably not sufficient. If cons == 3, this will have the effect to
>> use a single measurement rather than an average, is that intended?
>>
>> Btw, as you are doing statistics here anyway, you may want to calculate
>> the estimate of the standard deviation and warn the user if the
>> delay_interval is smaller than, say, 2 * standard deviation.
>>
>> Please consider printing a message with the measured value at debug
>> level 3 or higher.
> OK, as the following patch.
> 
>>>  "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* 
>>> , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.*
>>>  .RE
>>>  .TP 12
>>> +.I delayed
>>
>> should be "delayedpath" here?
> OK, as the following patch.
>>
>>> +Needs a value of the form
>>> +\fI"<delay_interval|cons_num>"\fR
>>> +.RS
>>> +.TP 8
>>> +.I delay_interval
>>> +The interval values of average IO-time-delay between two different
>>> neighbour ranks of path priority, used to partition different
>>> priority ranks.
>>
>> It might be good to give an example here, like this:
>>
>> "If delay_interval=10ms, the paths will be grouped in priority groups
>> with path latency 0-10ms, 10-20ms, 20-30ms, etc." 
> OK, as the following patch.>
> 
> ---
>>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001
> From: Yang Feng <philip.y...@huawei.com>
> Date: Fri, 19 May 2017 16:09:07 +0800
> Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper 
> multipath, where the corresponding priority
> values of specific paths are provided by a latency algorithm. And the latency 
> algorithm is dependent on the

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-22 Thread Yang Feng
Hello Benjamin and Christophe,

How about this patch?
Thanks a lot.
Best.



On 2017/5/19 17:45, Yang Feng wrote:
> Hi Benjamin,
> 
> Thank you very much for your comments.
> Please find my replys and the up-to-date patch.
> Best regards!
> 
>>
>> First, one overall question. We have dynamic path selectors available to
>> deal with paths that are just simply slower that other paths, but can
>> still be used together.  Is there specific hardware or a specific setup
>> where this isn't good enough and we really need to seperate these paths
>> into different pathgroups, but we can't find out deterministically how
>> the groups should be set up?  It just seems like there could be a less
>> hacky solution to this problem, but perhaps there are some situations
>> where this is truly the best option. I'm just wondering what those are.1. In 
>> the Storage-Backup environment of HyperCluster,includs one storage array near
> to the host and one remote storage array, and the two storage arrays have the 
> same hardware.
> The same LUN is writed or readed by the two storage arrays.
> However, usually, the average latency of the paths of the remote storage 
> array is much higher than the
> near storage array's.
> apparently, the prioritizer can be a good automatic solution.
> And the current selectors don't solve it, IOs will send to the paths of the 
> remote storage array, IOPS will be influenced unavoidably.
> 2. In the environment of single storage array, the prioritizer can 
> automatically separate the paths who's latency is much higher,
> IOs will not send to this paths.
> But the current selectors don't solve this problem, IOPS will be influenced 
> unavoidably.
> 
>>> +
>>> +/* interval_unit_str and interval_unit_type keep the same assignment 
>>> sequence */
>>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = {
>>> +CHAR_USEC, CHAR_MSEC, CHAR_SEC
>>
>> This is a nit, but for constant strings, could you please use "char
>> *var" instead of "char var[]", to be consistent with the rest of the
>> multipath code.
> Thanks, as the following patch.
> 
>>> +if ((args == NULL) || (interval == NULL)
>>> +|| (consnum == NULL) || (type == NULL))
>>> +return 0;
>>> +
>>> +/* int type */
>>> +if ((size < 1) || (size > MAX_CHAR_SIZE-1))
>>> +return 0;
>>
>> You should probably have log messages for these error returns.
> Thanks, as the following patch.
> 
>>> +
>>> +memcpy(source, args, size+1);
>>> +if (strstr(source, vertica) == NULL)
>>> +return 0;
>>> +
>>> +*type = get_interval_type(source, typestr);
>>> +if (*type == INTERVAL_INVALID)
>>> +{
>>> +condlog(0, "delay_interval type is invalid");
>>> +return 0;
>>> +}
>>
>> I'm confused here. How do you get to use the default interval. Shouldn't
>> you accept "20s|" and "|30" and as valid inputs that use the defaults
>> for the part they don't specify. 
> OK,the default arguments value is removed. If get inputs failed, return 
> default priority "0".
> As the following patch.
>>
>>> +tokenbefore = strtok(source, vertica);
>>> +tokenafter = strtok(NULL, vertica);
>>> +typestr[1] = '\0';
>>> +tokenbefore = strtok(tokenbefore, typestr);
>>> +if ((tokenbefore == NULL) || (tokenafter == NULL))
>>> +return 0;
>>> +
>>> +tmp = tokenbefore;
>>> +while (*tmp != '\0')
>>> +if (!isdigit(*tmp++))
>>> +{
>>> +condlog(0, "delay_interval string include invalid char");
>>> +return 0;
>>> +}
>>> +
>>> +tmp = tokenafter;
>>> +while (*tmp != '\0')
>>> +if (!isdigit(*tmp++))
>>> +{
>>> +condlog(0, "cons_num string include invalid char");
>>> +return 0;
>>> +}
>>> +
>>> +*interval = atoi(tokenbefore);
>>
>> Why do you keep track of the type and the interval seperately? Can't you
>> just find out the type, and use that to multiply the interval once you
>> read it, and then just use that value, instead of keeping track of two
>> values across multiple functions?
> Thanks, as the following patch.
> 
>>> +
>>> +   if (pp->fd < 0)
>>> +   return -PRIO_NO_INFORMATION;
>>

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-19 Thread Yang Feng
Hello Martin,

Firstly, thank you very much for your comments.
And find my replys and the up-to-date patch.

Best regards!


> Please think about the name once again. Maybe you should call it
> "io_latency" or "path_latency" instead of "delayedpath"?
OK, as the following patch.

> 
> Hm, I can't see a lot of difference in the parsing code wrt the
> previous version. IMO it's still non-straightforward and hard to
> comprehend. Maybe I didn't express myself clearly enough. Here is how
> I'd code this:
> 
>  1. Verify that the string starts with a digit. Error if it does not.
>  2. Parse the delay interval using strtoul().
>  3. The "end" pointer of strtoul() points to the unit, which has to be
> "s", "ms" or "us". Verify, and set the unit accordingly.
>  4. Verify that the next character is '|', and that it's followed by a
> digit.
>  5. Parse the number with strtoul()
>  6. Verify that there's no garbage at the end of the string. 
Thank you , as the following patch.

> 
> Please follow the https://en.wikipedia.org/wiki/Robustness_principle.
> If a user enters "1500ms" here, the parsing will silently fail, and
> with it the whole prio algorithm. This will cause user confusion.
> Please don't do this
Thank you , as the following patch.

> 
> However please consider lowering the upper bound, I kind of doubt that
> 1000 IOs will finish quickly. More often than not, a lot of paths will
> appear at the same time (e.g. if a port of a storage array is enabled)
> and we'll have to send 1000 IOs to each one.
> 
OK, the upper bound lower to 200, as the following patch.

>> +while (temp-- > 0)
>> +{
>> +(void)clock_gettime(CLOCK_MONOTONIC, );
>> +before = timeval_to_us();
>> +
>> +if (do_readsector0(pp->fd, timeout) == 2)
>> +{
>> +condlog(0, "%s: path down", pp->dev);
>> +return -1;
>> +}
>> +
>> +(void)clock_gettime(CLOCK_MONOTONIC, );
>> +after = timeval_to_us();
>> +
>> +delay = after - before;
>> +
>> +min = (min <= delay) ? min : delay;
>> +max = (max >= delay) ? max : delay;
>> +
>> +toldelay += delay;
>> +}
>> +
>> +toldelay -= min + max;
> 
> Why are you doing this? If you want to discard "extreme" values, this
> is probably not sufficient. If cons == 3, this will have the effect to
> use a single measurement rather than an average, is that intended?
> 
> Btw, as you are doing statistics here anyway, you may want to calculate
> the estimate of the standard deviation and warn the user if the
> delay_interval is smaller than, say, 2 * standard deviation.
> 
> Please consider printing a message with the measured value at debug
> level 3 or higher.
OK, as the following patch.

>>  "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* 
>> , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.*
>>  .RE
>>  .TP 12
>> +.I delayed
> 
> should be "delayedpath" here?
OK, as the following patch.
> 
>> +Needs a value of the form
>> +\fI"<delay_interval|cons_num>"\fR
>> +.RS
>> +.TP 8
>> +.I delay_interval
>> +The interval values of average IO-time-delay between two different
>> neighbour ranks of path priority, used to partition different
>> priority ranks.
> 
> It might be good to give an example here, like this:
> 
> "If delay_interval=10ms, the paths will be grouped in priority groups
> with path latency 0-10ms, 10-20ms, 20-30ms, etc." 
OK, as the following patch.>

---
>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001
From: Yang Feng <philip.y...@huawei.com>
Date: Fri, 19 May 2017 16:09:07 +0800
Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper 
multipath, where the corresponding priority
values of specific paths are provided by a latency algorithm. And the latency 
algorithm is dependent on the following
arguments(latency_interval and io_num). The principle of the algorithm is 
illustrated as follows:
1. By sending a certain number "cons_num" of read IOs to the current path 
continuously, the IOs' average latency can be calculated.
2. According to the average latency of each path and the weight value 
"latency_interval", the priority "rc" of each path can be provided.

   latency_interval   latency_interval   latency_interval   
latency_interval
 
|--|--|--|...|--

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-16 Thread Yang Feng
Hello Martin,

How about this patch?
Thanks a lot,
Best regards.


On 2017/5/15 18:44, Yang Feng wrote:
> Hello Martin,
> 
> Thank you very much for your remarks. I am sorry for late reply, Please find 
> my answer and the updated patch.
> 
>> Hello Yang,
>>
>> thank you for your work. Please find my remarks below.
>>
>> On Mon, 2017-05-08 at 11:58 +0800, Yang Feng wrote:
>>> Prioritizer for device mapper multipath, where the corresponding
>>> priority
>>> values of specific paths are provided by a time-delay algorithm. And
>>> the
>>> time-delay algorithm is dependent on the following
>>> arguments(delay_interval,
>>> cons_num).
>>> The principle of the algorithm is illustrated as follows:
>>> 1. By sending a certain number "cons_num" of read IOs to the current
>>> path
>>>continuously, the IOs' average delay can be calculated.
>>> 2. According to the average delay of each path and the weight value
>>>"delay_interval", the priority "rc" of each path can be provided.
>>>
>>>  delay_interval  delay_interval  delay_interval   delay_inter
>>
>> How does this algorithm behave under load? Can we be sure that
>> priorities don't start to fluctuate wildly because busy paths will
>> usually have longer latencies than idle ones?
> I have a lot of test under load. When the appropriate value of argument 
> "delay_interval" is set,
> this algorithm behave well and can separate the paths who's average delay is 
> more than others.
> When add a new path or the path's state change from down to up, getprio() of 
> the prioritizer is triggered, and
> the current path is not under IOs.
> 
>>>  libmultipath/Makefile   |   2 +-
>>>  libmultipath/checkers/Makefile  |   7 +-
>>>  libmultipath/checkers/emc_clariion.c|   2 +-
>>>  libmultipath/checkers/libsg.c   |  94 
>>>  libmultipath/checkers/libsg.h   |   9 --
>>>  libmultipath/checkers/readsector0.c |   2 +-
>>>  libmultipath/libsg.c|  94 
>>>  libmultipath/libsg.h|   9 ++
>>>  libmultipath/prioritizers/Makefile  |   6 +-
>>>  libmultipath/prioritizers/delayedpath.c | 246 
>>
>> Why do you have to move libsg for this? It's already used by various
>> checkers, why can't your checker do the same? If you really need to do
>> it, you should at least separate that part of the patch from the added
>> code.
> OK, this time, libsg will not be moved.
> 
>>> +
>>> +#define CHAR_SEC"SEC"
>>> +#define CHAR_MSEC   "MSEC"
>>> +#define CHAR_USEC   "USEC"
>>
>> I suggest to use "s", "ms", and "us" here instead.
> OK, as the following patch.
> 
>> If you create an array of "const char*" instead like you did for
>> conversion_ratio below, you could implement get_interval_type() more
>> elegantly using a loop over that array.
> OK, as the following patch.
> 
>>> +static int get_interval_type(char *source, char *type)
>>> +{  
>>> +/*is USEC*/
>>> +if ((strstr(source, CHAR_USEC) != NULL)
>>> +&& (strstr(source, CHAR_USEC)[4] == '_'))
>>
>> Please avoid these double strstr() invocation. The compiler may
>> optimize it away, but it just looks strange. The following would 
>> look better to me, and I find it actually more readable:
>>
>> if (((p = strstr(source, CHAR_USEC)) != NULL) && p[4] == '_')
> OK, as the following patch.
> 
>>> +static int get_string_from_under(char *args,
>>> +char *beforestring,
>>> +char *afterstring,
>>> +int *type)
>>
>> Maybe you could figure out a more descriptive name for this function?
>>
>> A comment in the code showing how the string to be parsed typically
>> looks like would be helpful for the reader.
> OK, as the following patch.
> 
>>> +token = strtok_r(source, under, );
>>> +token = strtok(token, char_type);
>>
>> I'm pretty sure this is is not what you intended to write. If char_type
>> is "usec", this would split the string at the possible delimiters 'u',
>> 's', 'e', and 'c' (the 2nd argument of strtok(3) is not a sequence, but
>> a 'set' of bytes). It might accidentally work w

Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-15 Thread Yang Feng
Hello Martin,

Thank you very much for your remarks. I am sorry for late reply, Please find my 
answer and the updated patch.

> Hello Yang,
> 
> thank you for your work. Please find my remarks below.
> 
> On Mon, 2017-05-08 at 11:58 +0800, Yang Feng wrote:
>> Prioritizer for device mapper multipath, where the corresponding
>> priority
>> values of specific paths are provided by a time-delay algorithm. And
>> the
>> time-delay algorithm is dependent on the following
>> arguments(delay_interval,
>> cons_num).
>> The principle of the algorithm is illustrated as follows:
>> 1. By sending a certain number "cons_num" of read IOs to the current
>> path
>>continuously, the IOs' average delay can be calculated.
>> 2. According to the average delay of each path and the weight value
>>"delay_interval", the priority "rc" of each path can be provided.
>>
>>  delay_interval  delay_interval  delay_interval   delay_inter
> 
> How does this algorithm behave under load? Can we be sure that
> priorities don't start to fluctuate wildly because busy paths will
> usually have longer latencies than idle ones?
I have a lot of test under load. When the appropriate value of argument 
"delay_interval" is set,
this algorithm behave well and can separate the paths who's average delay is 
more than others.
When add a new path or the path's state change from down to up, getprio() of 
the prioritizer is triggered, and
the current path is not under IOs.

>>  libmultipath/Makefile   |   2 +-
>>  libmultipath/checkers/Makefile  |   7 +-
>>  libmultipath/checkers/emc_clariion.c|   2 +-
>>  libmultipath/checkers/libsg.c   |  94 
>>  libmultipath/checkers/libsg.h   |   9 --
>>  libmultipath/checkers/readsector0.c |   2 +-
>>  libmultipath/libsg.c|  94 
>>  libmultipath/libsg.h|   9 ++
>>  libmultipath/prioritizers/Makefile  |   6 +-
>>  libmultipath/prioritizers/delayedpath.c | 246 
> 
> Why do you have to move libsg for this? It's already used by various
> checkers, why can't your checker do the same? If you really need to do
> it, you should at least separate that part of the patch from the added
> code.
OK, this time, libsg will not be moved.

>> +
>> +#define CHAR_SEC"SEC"
>> +#define CHAR_MSEC   "MSEC"
>> +#define CHAR_USEC   "USEC"
> 
> I suggest to use "s", "ms", and "us" here instead.
OK, as the following patch.

> If you create an array of "const char*" instead like you did for
> conversion_ratio below, you could implement get_interval_type() more
> elegantly using a loop over that array.
OK, as the following patch.

>> +static int get_interval_type(char *source, char *type)
>> +{  
>> +/*is USEC*/
>> +if ((strstr(source, CHAR_USEC) != NULL)
>> +&& (strstr(source, CHAR_USEC)[4] == '_'))
> 
> Please avoid these double strstr() invocation. The compiler may
> optimize it away, but it just looks strange. The following would 
> look better to me, and I find it actually more readable:
> 
> if (((p = strstr(source, CHAR_USEC)) != NULL) && p[4] == '_')
OK, as the following patch.

>> +static int get_string_from_under(char *args,
>> +char *beforestring,
>> +char *afterstring,
>> +int *type)
> 
> Maybe you could figure out a more descriptive name for this function?
> 
> A comment in the code showing how the string to be parsed typically
> looks like would be helpful for the reader.
OK, as the following patch.

>> +token = strtok_r(source, under, );
>> +token = strtok(token, char_type);
> 
> I'm pretty sure this is is not what you intended to write. If char_type
> is "usec", this would split the string at the possible delimiters 'u',
> 's', 'e', and 'c' (the 2nd argument of strtok(3) is not a sequence, but
> a 'set' of bytes). It might accidentally work with the input strings
> you are using (in particular because you only look at the first token),
> but nevertheless it's wrong.
OK, as the following patch.

>> +if ((token == NULL) || (saveptr == NULL))
>> +return 0;
>> +
>> +tmp = token;
>> +while (*tmp != '\0')
>> +if (!isdigit(*tmp++))
>> +return 0;
>> +
>> +tmp = saveptr;
>> +while (*tmp != '\0')
>> +if (!isdigit(*tm

[dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm

2017-05-07 Thread Yang Feng
errno));
+
+   if (res < 0) {
+   if (ENOMEM == errno) {
+   return PATH_UP;
+   }
+   return PATH_DOWN;
+   }
+
+   if ((0 == io_hdr.status) &&
+   (0 == io_hdr.host_status) &&
+   (0 == io_hdr.driver_status)) {
+   return PATH_UP;
+   } else {
+   int key = 0;
+
+   if (io_hdr.sb_len_wr > 3) {
+   if (sbb[0] == 0x72 || sbb[0] == 0x73)
+   key = sbb[1] & 0x0f;
+   else if (io_hdr.sb_len_wr > 13 &&
+((sbb[0] & 0x7f) == 0x70 ||
+ (sbb[0] & 0x7f) == 0x71))
+   key = sbb[2] & 0x0f;
+   }
+
+   /*
+* Retry if UNIT_ATTENTION check condition.
+*/
+   if (key == 0x6) {
+   if (--retry_count)
+   goto retry;
+   }
+   return PATH_DOWN;
+   }
+}
diff --git a/libmultipath/libsg.h b/libmultipath/libsg.h
new file mode 100644
index 000..3994f45
--- /dev/null
+++ b/libmultipath/libsg.h
@@ -0,0 +1,9 @@
+#ifndef _LIBSG_H
+#define _LIBSG_H
+
+#define SENSE_BUFF_LEN 32
+
+int sg_read (int sg_fd, unsigned char * buff, int buff_len,
+unsigned char * sense, int sense_len, unsigned int timeout);
+
+#endif /* _LIBSG_H */
diff --git a/libmultipath/prioritizers/Makefile 
b/libmultipath/prioritizers/Makefile
index 36b42e4..7e3da51 100644
--- a/libmultipath/prioritizers/Makefile
+++ b/libmultipath/prioritizers/Makefile
@@ -18,13 +18,17 @@ LIBS = \
libpriorandom.so \
libpriordac.so \
libprioweightedpath.so \
-   libpriosysfs.so
+   libpriodelayedpath.so \
+   libpriosysfs.so 
 
 all: $(LIBS)
 
 libprioalua.so: alua.o alua_rtpg.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
+libpriodelayedpath.so: delayedpath.o  ../libsg.o
+   $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
+
 libprio%.so: %.o
$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^
 
diff --git a/libmultipath/prioritizers/delayedpath.c 
b/libmultipath/prioritizers/delayedpath.c
new file mode 100644
index 000..4c1cfea
--- /dev/null
+++ b/libmultipath/prioritizers/delayedpath.c
@@ -0,0 +1,246 @@
+/*
+ * (C) Copyright HUAWEI Technology Corp. 2017, 2021   All Rights Reserved.
+ *
+ * main.c
+ *
+ * Prioritizer for device mapper multipath, where the corresponding priority 
+ * values of specific paths are provided by a time-delay algorithm. And the
+ * time-delay algorithm is dependent on arguments.
+ * 
+ * The principle of the algorithm as follows: 
+ * 1. By sending a certain number "cons_num" of read IOs to the current path 
+ *continuously, the IOs' average delay can be calculated. 
+ * 2. According to the average delay of each path and the weight value 
+ *"delay_interval", the priority "rc" of each path can be provided. 
+ *
+ * Author(s): Yang Feng <philip.y...@huawei.com>
+ *Zou Ming <zouming.zoum...@huawei.com>
+ *
+ * This file is released under the GPL.
+ */
+#include 
+#include 
+#include 
+
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
+#include "../libmultipath/libsg.h"
+
+#include "delayedpath.h"
+
+#define THRES_USEC_VALUE3LL/*USEC, 300SEC*/
+#define DEFAULT_DELAY_INTERVAL  10 /*MSEC*/
+#define DEFAULT_CONS_NUM20
+
+#define MAX_CHAR_SIZE   30
+
+#define CHAR_SEC"SEC"
+#define CHAR_MSEC   "MSEC"
+#define CHAR_USEC   "USEC"
+
+enum interval_type {
+INTERVAL_SEC,
+INTERVAL_MSEC,
+INTERVAL_USEC,
+INTERVAL_INVALID
+};
+
+static int conversion_ratio[] = {
+   [INTERVAL_SEC]  = USEC_PER_SEC,
+   [INTERVAL_MSEC] = USEC_PER_MSEC,
+   [INTERVAL_USEC] = USEC_PER_USEC,
+   [INTERVAL_INVALID]  = 0,
+};
+
+
+static int do_readsector0(int fd, unsigned int timeout)
+{
+   unsigned char buf[4096];
+   unsigned char sbuf[SENSE_BUFF_LEN];
+   int ret;
+
+   ret = sg_read(fd, [0], 4096, [0],
+ SENSE_BUFF_LEN, timeout);
+
+   return ret;
+}
+
+static int get_interval_type(char *source, char *type)
+{  
+/*is USEC*/
+if ((strstr(source, CHAR_USEC) != NULL)
+&& (strstr(source, CHAR_USEC)[4] == '_'))
+{
+memcpy(type, CHAR_USEC, strlen(CHAR_USEC)+1);
+return INTERVAL_USEC;
+}
+
+/*is MSEC*/
+if ((strstr(source, CHAR_MSEC) != NULL) 
+&& (strstr(source, CHAR_MSEC)[4] == '_'))
+{
+memcpy(type, CHAR_MSEC, strlen(CHAR_MSEC)+1);
+return INTERVAL_MSEC;
+}
+
+/*is SEC*/
+   if ((strstr(source, CHAR_SEC) != NULL)
+