Re: [dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.
Dear Martin, Thanks, I got it. Regards, -Yang On 2017/7/21 20:26, Martin Wilck wrote: > Dear Yang, > > On Fri, 2017-07-21 at 11:38 +0800, Yang Feng wrote: >> >> Sorry, I don't find this similar syntax. Could you give a example of >> the >> other prioritizer. > > Please look at the prio_args section in multipath.conf. "weighted" uses > just space separated args. "iet" uses "parm=value" syntax. That's > actually most intuitive IMO. No current prio algorithm uses anything > but space to separate options. > > Regards > Martin > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.
Dear Martin, Thank you very much for your guidance and reviews. Please find my replys as follows. The up-to-date patch will be sent later. Regards, -Yang On 2017/7/21 2:07, Martin Wilck wrote: > Dear Yang, > > On Thu, 2017-07-20 at 11:36 +0800, Yang Feng wrote: >> Add args min_avg_latency of logarithmic scale, for >> prioritizers/path_latency.c. >> Min average latency is not constant 1us, and is set by user. >> Certainly, max average >> latency value is still 100s. It make support better for more scenes, >> because it can >> deal better with the normal standard deviation of path latency. For >> example, when the >> standard deviation value is 200us and the average latency of the >> normal paths is 1ms, >> args base_num can be set to 5 and args min_avg_latency can be set to >> 2ms, so the paths >> will be grouped in priority groups with path latency <=2ms, (2ms, >> 10ms], (10ms, 50ms], >> etc. > > Nack. You need this because you are using a wrong calculation for > standard deviation. If your scale is logarithmic, you need to calculate > the standard deviation on a log scale, too. The scientific term is > "geometric standard deviation". > > https://en.wikipedia.org/wiki/Geometric_standard_deviation > > Suppose you really have 3 classes of devices with us, ms, and seconds > average latency, respectively. The latency of the devices in the > "seconds" class will also vary on the order of seconds (e.g. 3.5-5s). > That's obviously impossible for the us and ms class devices. Rather, > it's reasonable to assume that the us devices will have us (or sub-us) > standard deviation and the ms devices have ms (or sub-ms) standard > deviation, or in other words, the uncertainty is roughly in the same > order of magnitude as the average. >> This is exactly how the geometric standard deviation behaves. > > I think I actually remarked this on your previous patch, but the patch > has been applied without that having been fixed. It would be good if > you could fix it now. > Sorry, there is a mistake. I will fix it later. >> >> Signed-off-by: Yang Feng <philip.y...@huawei.com> >> Reviewed-by: Martin Wilck <mwi...@suse.com> > > Please don't add my Reviewed-by: tag for patches I haven't reviewed, or > have negatively reviewed. Reviewed-by: is supposed to indicate > approval. > Sorry, it will fixed. > Wrt your prio_args parser, could you perhaps support a syntax that is > similar to other prioritizers, e.g. "base_num=5 io_num=10"? > Sorry, I don't find this similar syntax. Could you give a example of the other prioritizer. > Regards > Martin > > >> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> >> --- >> libmultipath/prioritizers/path_latency.c | 61 >> ++-- >> multipath/multipath.conf.5 | 14 +--- >> 2 files changed, 52 insertions(+), 23 deletions(-) >> >> diff --git a/libmultipath/prioritizers/path_latency.c >> b/libmultipath/prioritizers/path_latency.c >> index 34b734b..a71faff 100644 >> --- a/libmultipath/prioritizers/path_latency.c >> +++ b/libmultipath/prioritizers/path_latency.c >> @@ -66,7 +66,7 @@ static int do_readsector0(int fd, unsigned int >> timeout) >> return ret; >> } >> >> -int check_args_valid(int io_num, int base_num) >> +int check_args_valid(int io_num, int base_num, int min_avg_latency) >> { >> if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) >> { >> @@ -80,24 +80,33 @@ int check_args_valid(int io_num, int base_num) >> return 0; >> } >> >> +if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency > >> MAX_AVG_LATENCY)) >> +{ >> +pp_pl_log(0, "args min_avg_latency is outside the valid >> range"); >> +return 0; >> +} >> + >> return 1; >> } >> >> -/* In multipath.conf, args form: io_num|base_num. For example, >> -* args is "20|10", this function can get io_num value 20, and >> - base_num value 10. >> +/* In multipath.conf, args form: io_num|base_num|min_avg_latency. >> +* For example, args is "20|10|1", this function can get io_num >> +* value 20, base_num value 10, min_avg_latency value 1 (us). >> */ >> -static int get_ionum_and_basenum(char *args, >> - int *ionum, >> - int *basenum) >> +static int get_prio_args(char *args, >> +int *ionum, >> +int *basen
Re: [dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.
Dear Martin, Thanks a lot for your reviews. Please find my replys as follows. And the up-to-date patch will be sent later. Regards, -Yang On 2017/7/21 1:50, Martin Wilck wrote: > Dear Yang, > > On Thu, 2017-07-20 at 11:36 +0800, Yang Feng wrote: >> 1. The SCSI-to-NVMe translations have been removed in the patch >> "nvme: >> Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl >> command should be supported in the multipath-tools. > >> 2. In the prioritizers/path_latency.c, modify the func >> do_readsector0(): >> send a native NVMe Read Ioctl command to the nvme device, and send a >> SG >> Read Ioctl command to the scsi device. > > As noted previously, I would prefer to use a directio command here. I > see no advantage of using sg or nvme ioctls over directio. > Thanks, insteadly, directio will be used. >> 3. In the tur checker, add support for the native NVMe Keep Alive >> Ioctl >> command to the nvme device. > > No, please don't. I'm still with Xose here. There's no need to use the > same checker for NVMe and SCSI devices. This is what we have the > hwtable for. Keep the tur checker as it was before, create a keepalive > checker for NVMe, and use hwtable entries to match them appropriately > to devices. > OK, it will be fixed. > See below for some details. > NAK from my side for the patch in this form. > > Martin > >> >> Signed-off-by: Yang Feng <philip.y...@huawei.com> >> Reviewed-by: Martin Wilck <mwi...@suse.com> >> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> >> --- >> libmultipath/checkers.c | 7 ++ >> libmultipath/checkers.h | 4 + >> libmultipath/checkers/Makefile | 4 +- >> libmultipath/checkers/emc_clariion.c | 4 +- >> libmultipath/checkers/libsg.c| 94 --- > > IIRC you did this in your previous patch already. Can you give a good > reason for moving this code? It makes your patch unnecessarily big and > messes up git history. If it's REALLY necessary, separate it out, > please. > OK, the libsg.c/libsg.h will not be moved. > >> --- >> libmultipath/checkers/libsg.h| 9 --- >> libmultipath/checkers/readsector0.c | 4 +- >> libmultipath/checkers/tur.c | 64 ++- >> libmultipath/discovery.c | 1 + >> libmultipath/libnvme.c | 130 >> +++ >> libmultipath/libnvme.h | 10 +++ >> libmultipath/libsg.c | 113 >> +++ >> libmultipath/libsg.h | 13 >> libmultipath/prioritizers/Makefile | 2 +- >> libmultipath/prioritizers/path_latency.c | 34 +--- >> multipath/multipath.conf.5 | 2 +- >> 16 files changed, 354 insertions(+), 141 deletions(-) >> delete mode 100644 libmultipath/checkers/libsg.c >> delete mode 100644 libmultipath/checkers/libsg.h >> create mode 100644 libmultipath/libnvme.c >> create mode 100644 libmultipath/libnvme.h >> create mode 100644 libmultipath/libsg.c >> create mode 100644 libmultipath/libsg.h >> >> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c >> index 05e024f..00fbd6e 100644 >> --- a/libmultipath/checkers.c >> +++ b/libmultipath/checkers.c >> @@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd) >> c->fd = fd; >> } >> >> +void checker_set_dev(struct checker *c, char *dev) >> +{ >> +if (!c) >> +return; >> +strncpy(c->dev, dev, strlen(dev)+1); >> +} >> + >> void checker_set_sync (struct checker * c) >> { >> if (!c) >> diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h >> index 1d225de..f924624 100644 >> --- a/libmultipath/checkers.h >> +++ b/libmultipath/checkers.h >> @@ -97,6 +97,8 @@ enum path_check_state { >> #define CHECKER_DEV_LEN 256 >> #define LIB_CHECKER_NAMELEN 256 >> >> +#define FILE_NAME_SIZE 256 >> + >> struct checker { >> struct list_head node; >> void *handle; >> @@ -107,6 +109,7 @@ struct checker { >> int disable; >> char name[CHECKER_NAME_LEN]; >> char message[CHECKER_MSG_LEN]; /* comm with callers */ >> +char dev[FILE_NAME_SIZE]; > > FILE_NAME_SIZE more memory usage only to distinguish SCSI from NVMe. > Please, no. > OK, it will be fixed. > >> void * context; /* st
[dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.
1. The SCSI-to-NVMe translations have been removed in the patch "nvme: Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl command should be supported in the multipath-tools. 2. In the prioritizers/path_latency.c, modify the func do_readsector0(): send a native NVMe Read Ioctl command to the nvme device, and send a SG Read Ioctl command to the scsi device. 3. In the tur checker, add support for the native NVMe Keep Alive Ioctl command to the nvme device. Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> --- libmultipath/checkers.c | 7 ++ libmultipath/checkers.h | 4 + libmultipath/checkers/Makefile | 4 +- libmultipath/checkers/emc_clariion.c | 4 +- libmultipath/checkers/libsg.c| 94 -- libmultipath/checkers/libsg.h| 9 --- libmultipath/checkers/readsector0.c | 4 +- libmultipath/checkers/tur.c | 64 ++- libmultipath/discovery.c | 1 + libmultipath/libnvme.c | 130 +++ libmultipath/libnvme.h | 10 +++ libmultipath/libsg.c | 113 +++ libmultipath/libsg.h | 13 libmultipath/prioritizers/Makefile | 2 +- libmultipath/prioritizers/path_latency.c | 34 +--- multipath/multipath.conf.5 | 2 +- 16 files changed, 354 insertions(+), 141 deletions(-) delete mode 100644 libmultipath/checkers/libsg.c delete mode 100644 libmultipath/checkers/libsg.h create mode 100644 libmultipath/libnvme.c create mode 100644 libmultipath/libnvme.h create mode 100644 libmultipath/libsg.c create mode 100644 libmultipath/libsg.h diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c index 05e024f..00fbd6e 100644 --- a/libmultipath/checkers.c +++ b/libmultipath/checkers.c @@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd) c->fd = fd; } +void checker_set_dev(struct checker *c, char *dev) +{ +if (!c) +return; +strncpy(c->dev, dev, strlen(dev)+1); +} + void checker_set_sync (struct checker * c) { if (!c) diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h index 1d225de..f924624 100644 --- a/libmultipath/checkers.h +++ b/libmultipath/checkers.h @@ -97,6 +97,8 @@ enum path_check_state { #define CHECKER_DEV_LEN 256 #define LIB_CHECKER_NAMELEN 256 +#define FILE_NAME_SIZE 256 + struct checker { struct list_head node; void *handle; @@ -107,6 +109,7 @@ struct checker { int disable; char name[CHECKER_NAME_LEN]; char message[CHECKER_MSG_LEN]; /* comm with callers */ +char dev[FILE_NAME_SIZE]; void * context; /* store for persistent data */ void ** mpcontext; /* store for persistent data shared multipath-wide. Use MALLOC if @@ -132,6 +135,7 @@ void checker_reset (struct checker *); void checker_set_sync (struct checker *); void checker_set_async (struct checker *); void checker_set_fd (struct checker *, int); +void checker_set_dev(struct checker *c, char *dev); void checker_enable (struct checker *); void checker_disable (struct checker *); void checker_repair (struct checker *); diff --git a/libmultipath/checkers/Makefile b/libmultipath/checkers/Makefile index bce6b8b..a9be6b6 100644 --- a/libmultipath/checkers/Makefile +++ b/libmultipath/checkers/Makefile @@ -24,10 +24,10 @@ all: $(LIBS) libcheckrbd.so: rbd.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lrados -ludev -libcheckdirectio.so: libsg.o directio.o +libcheckdirectio.so: ../libsg.o ../libnvme.o directio.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -laio -libcheck%.so: libsg.o %.o +libcheck%.so: ../libsg.o ../libnvme.o %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ install: diff --git a/libmultipath/checkers/emc_clariion.c b/libmultipath/checkers/emc_clariion.c index 9c1ffed..12c1e3e 100644 --- a/libmultipath/checkers/emc_clariion.c +++ b/libmultipath/checkers/emc_clariion.c @@ -12,7 +12,7 @@ #include #include "../libmultipath/sg_include.h" -#include "libsg.h" +#include "../libmultipath/libsg.h" #include "checkers.h" #include "debug.h" #include "memory.h" @@ -21,6 +21,8 @@ #define INQUIRY_CMDLEN 6 #define HEAVY_CHECK_COUNT 10 +#define SENSE_BUFF_LEN 32 + /* * Mechanism to track CLARiiON inactive snapshot LUs. * This is done so that we can fail passive paths diff --git a/libmultipath/checkers/libsg.c b/libmultipath/checkers/libsg.c deleted file mode 100644 index 958ea92..000 --- a/libmultipath/checkers/libsg.c +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Copyright
[dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.
Add args min_avg_latency of logarithmic scale, for prioritizers/path_latency.c. Min average latency is not constant 1us, and is set by user. Certainly, max average latency value is still 100s. It make support better for more scenes, because it can deal better with the normal standard deviation of path latency. For example, when the standard deviation value is 200us and the average latency of the normal paths is 1ms, args base_num can be set to 5 and args min_avg_latency can be set to 2ms, so the paths will be grouped in priority groups with path latency <=2ms, (2ms, 10ms], (10ms, 50ms], etc. Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> --- libmultipath/prioritizers/path_latency.c | 61 ++-- multipath/multipath.conf.5 | 14 +--- 2 files changed, 52 insertions(+), 23 deletions(-) diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c index 34b734b..a71faff 100644 --- a/libmultipath/prioritizers/path_latency.c +++ b/libmultipath/prioritizers/path_latency.c @@ -66,7 +66,7 @@ static int do_readsector0(int fd, unsigned int timeout) return ret; } -int check_args_valid(int io_num, int base_num) +int check_args_valid(int io_num, int base_num, int min_avg_latency) { if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) { @@ -80,24 +80,33 @@ int check_args_valid(int io_num, int base_num) return 0; } +if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency > MAX_AVG_LATENCY)) +{ +pp_pl_log(0, "args min_avg_latency is outside the valid range"); +return 0; +} + return 1; } -/* In multipath.conf, args form: io_num|base_num. For example, -* args is "20|10", this function can get io_num value 20, and - base_num value 10. +/* In multipath.conf, args form: io_num|base_num|min_avg_latency. +* For example, args is "20|10|1", this function can get io_num +* value 20, base_num value 10, min_avg_latency value 1 (us). */ -static int get_ionum_and_basenum(char *args, - int *ionum, - int *basenum) +static int get_prio_args(char *args, +int *ionum, +int *basenum, +int *minavglatency) { char source[MAX_CHAR_SIZE]; char vertica = '|'; -char *endstrbefore = NULL; -char *endstrafter = NULL; +char *endstr1 = NULL; +char *endstr2 = NULL; +char *endstr3 = NULL; unsigned int size = strlen(args); -if ((args == NULL) || (ionum == NULL) || (basenum == NULL)) +if ((args == NULL) || (ionum == NULL) +|| (basenum == NULL) || (minavglatency == NULL)) { pp_pl_log(0, "args string is NULL"); return 0; @@ -117,21 +126,34 @@ static int get_ionum_and_basenum(char *args, return 0; } -*ionum = (int)strtoul(source, , 10); -if (endstrbefore[0] != vertica) +*ionum = (int)strtoul(source, , 10); +if (endstr1[0] != vertica) +{ +pp_pl_log(0, "invalid prio_args format: %s", source); +return 0; +} + +if (!isdigit(endstr1[1])) +{ +pp_pl_log(0, "invalid prio_args format: %s", source); +return 0; +} + +*basenum = (long long)strtol([1], , 10); +if (endstr2[0] != vertica) { pp_pl_log(0, "invalid prio_args format: %s", source); return 0; } -if (!isdigit(endstrbefore[1])) +if (!isdigit(endstr2[1])) { pp_pl_log(0, "invalid prio_args format: %s", source); return 0; } -*basenum = (long long)strtol([1], , 10); -if (check_args_valid(*ionum, *basenum) == 0) +*minavglatency = (long long)strtol([1], , 10); +if (check_args_valid(*ionum, *basenum, *minavglatency) == 0) { return 0; } @@ -194,6 +216,7 @@ int getprio (struct path *pp, char *args, unsigned int timeout) int index = 0; int io_num; int base_num; +int min_avg_latency; long long avglatency; long long latency_interval; long long standard_deviation; @@ -204,7 +227,7 @@ int getprio (struct path *pp, char *args, unsigned int timeout) if (pp->fd < 0) return -1; -if (get_ionum_and_basenum(args, _num, _num) == 0) +if (get_prio_args(args, _num, _num, _avg_latency) == 0) { pp_pl_log(0, "%s: get path_latency args fail", pp->dev); return DEFAULT_PRIORITY; @@ -245,13 +268,13 @@ int getprio (struct path *pp, char *args, unsigned int timeout) set can change latency_interval value corresponding to avglatency and is not constant. Warn the user if latency_interval is smaller than (2 * standard_deviation), or equal */ standard_deviation =
Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.
Hi Xose, Thanks for your reviews The up-to-date patch will be sent later. Regards! -Yang On 2017/7/18 7:00, Xose Vazquez Perez wrote: > On 07/17/2017 04:23 AM, Yang Feng wrote: > >> I think there is a mistake: >> 1. In fact, the tur checker feature has never been removed. The tur checker >> has just been renamed to "ping" checker and the keep alive command checker >> feature is added into this checker. So, the renamed checker remains "tur" >> and add "keep alive". Absolutely, I agree that SCSI is going to stay for >> some time to come, So it can support a SCSI device by "tur" and support a >> NVMe device by "keep alive". >> 2. If add a new checker for "keep alive", it maybe increase the redundancy >> of code. And, when the SCSI LUN and the NVMe namespace are maped in the >> same array for the same host, no matter which checker is choosed, it will >> not be able to support the scenario very well. >> 3. Using read() with O_DIRECT for path_latency can avoid this problem, but >> can not fix this problem. > There is no need to rename tur to ping. Same work can be done over tur.c. > > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.
Hi Martin, Thanks for your reviews. I think there is a mistake: 1. In fact, the tur checker feature has never been removed. The tur checker has just been renamed to "ping" checker and the keep alive command checker feature is added into this checker. So, the renamed checker remains "tur" and add "keep alive". Absolutely, I agree that SCSI is going to stay for some time to come, So it can support a SCSI device by "tur" and support a NVMe device by "keep alive". 2. If add a new checker for "keep alive", it maybe increase the redundancy of code. And, when the SCSI LUN and the NVMe namespace are maped in the same array for the same host, no matter which checker is choosed, it will not be able to support the scenario very well. 3. Using read() with O_DIRECT for path_latency can avoid this problem, but can not fix this problem. Respect and regards. -Yang On 2017/7/14 18:45, Martin Wilck wrote: > On Fri, 2017-07-14 at 09:47 +0800, Yang Feng wrote: >> Hi Xose, >> >> But tur can not support NVMe device, and if the default checker >> change to ping, >> then it can support NVMe device by a keep alive command and SCSI >> device by a >> tur command. > > I agree with Xose. This is what we have the hwtable for. It's a good > and necessary thing to implement a checker equivalent to TUR for NVME, > but not a reason to remove TUR. SCSI is going to stay for some time to > come. > > As for the path latency checker and your readsector0 function, I don't > quite understand why you are using SG_IO at all. Why can't you just > read() with O_DIRECT? > > Saying that without having had the time for a deeper review of your > patch. > > Martin > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.
Hi Xose, But tur can not support NVMe device, and if the default checker change to ping, then it can support NVMe device by a keep alive command and SCSI device by a tur command. Regards! -Yang On 2017/7/14 0:44, Xose Vazquez Perez wrote: > On 07/13/2017 09:51 AM, Yang Feng wrote: > >> [...] >>In the checkers, delete the file tur.c and create the new file ping.c: >> ping.c can support the native NVMe Keep Alive Ioctl command to the nvme >> device, and can support the SG TUR Ioctl command to the scsi device. >> [...] >> delete mode 100644 libmultipath/checkers/tur.c >> delete mode 100644 libmultipath/checkers/tur.h > > tur can not be deleted. It breaks current configs, and replacing > the default checker so fast is a bit delicate. > > . > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH]prioritizers/path_latency: Fix failure of sg_read to a nvme device
Closed and traced by another patch. -Yang On 2017/6/28 12:42, Yang Feng wrote: > Fixed failure of sg_read to a nvme device: > 1. Send sg_read command to a nvme device failed, > because the block size in user-mode is 4096 bytes, > not equal to the bolck size in kernel-mode, who is > 512 bytes. According to the func nvme_trans_io of > scsi.c in the dir "drivers/nvme/host". The code > fragmentS as follows: > /* If block count and actual data buffer size dont match, error out */ > if (xfer_bytes != (cdb_info.xfer_len << ns->lba_shift)) { > res = -EINVAL; > goto out; > } > 2. And the SCSI-to-NVMe translations will be removed in > the patch "nvme: Remove SCSI translations" in the linux-nvme, > so using the nvme_read command is a good idea. > > Signed-off-by: Yang Feng <philip.y...@huawei.com> > --- > libmultipath/prioritizers/path_latency.c | 78 > +++- > 1 file changed, 66 insertions(+), 12 deletions(-) > > diff --git a/libmultipath/prioritizers/path_latency.c > b/libmultipath/prioritizers/path_latency.c > index 34b734b..8f633e0 100644 > --- a/libmultipath/prioritizers/path_latency.c > +++ b/libmultipath/prioritizers/path_latency.c > @@ -23,14 +23,32 @@ > #include > #include > #include > - > #include "debug.h" > #include "prio.h" > #include "structs.h" > +#include > +#include > #include "../checkers/libsg.h" > > #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " > fmt, ##args) > > +struct nvme_user_io { > +__u8 opcode; > +__u8 flags; > +__u16 control; > +__u16 nblocks; > +__u16 rsvd; > +__u64 metadata; > +__u64 addr; > +__u64 slba; > +__u32 dsmgmt; > +__u32 reftag; > +__u16 apptag; > +__u16 appmask; > +}; > + > +#define NVME_IOCTL_SUBMIT_IO _IOW('N', 0x42, struct nvme_user_io) > + > #define MAX_IO_NUM 200 > #define MIN_IO_NUM 2 > > @@ -51,19 +69,55 @@ static long long path_latency[MAX_IO_NUM]; > > static inline long long timeval_to_us(const struct timespec *tv) > { > - return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / > NSEC_PER_USEC); > +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / > NSEC_PER_USEC); > +} > + > +int nvme_io(int fd, __u8 opcode, __u64 slba, __u16 nblocks, __u16 control, > +__u32 dsmgmt, __u32 reftag, __u16 apptag, __u16 appmask, void > *data, void *metadata) > +{ > +struct nvme_user_io io = { > +.opcode = opcode, > +.flags = 0, > +.control = control, > +.nblocks = nblocks, > +.rsvd = 0, > +.metadata = (__u64)(uintptr_t) metadata, > +.addr = (__u64)(uintptr_t) data, > +.slba = slba, > +.dsmgmt = dsmgmt, > +.reftag = reftag, > +.appmask = apptag, > +.apptag = appmask, > +}; > + > +return ioctl(fd, NVME_IOCTL_SUBMIT_IO, ); > +} > + > +int nvme_read(int fd, __u64 slba, __u16 nblocks, __u16 control, __u32 dsmgmt, > +__u32 reftag, __u16 apptag, __u16 appmask, void *data, void > *metadata) > +{ > +return nvme_io(fd, 0x2, slba, nblocks, control, dsmgmt, > +reftag, apptag, appmask, data, metadata); > } > > -static int do_readsector0(int fd, unsigned int timeout) > +static int do_readsector0(struct path *pp, unsigned int timeout) > { > - unsigned char buf[4096]; > - unsigned char sbuf[SENSE_BUFF_LEN]; > - int ret; > +unsigned char buf[4096]; > +unsigned char mbuf[512]; > +unsigned char sbuf[SENSE_BUFF_LEN]; > > - ret = sg_read(fd, [0], 4096, [0], > - SENSE_BUFF_LEN, timeout); > +if (!strncmp(pp->dev, "nvme", 4)) > +{ > +if (nvme_read(pp->fd, 0, 1, 0, 0, 0, 0, 0, buf, mbuf) < 0) > +return 0; > +} > +else > +{ > +if (sg_read(pp->fd, [0], 4096, [0],SENSE_BUFF_LEN, timeout) > == 2) > +return 0; > +} > > - return ret; > +return 1; > } > > int check_args_valid(int io_num, int base_num) > @@ -218,9 +272,9 @@ int getprio (struct path *pp, char *args, unsigned int > timeout) > (void)clock_gettime(CLOCK_MONOTONIC, ); > before = timeval_to_us(); > > -if (do_readsector0(pp->fd, timeout) == 2) > +if (do_readsector0(pp, timeout) == 0) > { > -pp_pl_log(0, "%s: path down", pp->dev); > +pp_pl_log(0, "%s: send read sector0 command fail", pp->dev); >
[dm-devel] [PATCH 2/2] multipath-tools/libmultipath: Add args min_avg_latency for path_latency.
Add args min_avg_latency of logarithmic scale, for prioritizers/path_latency.c. Min average latency is not constant 1us, and is set by user. Certainly, max average latency value is still 100s. It make support better for more scenes, because it can deal better with the normal standard deviation of path latency. For example, when the standard deviation value is 200us and the average latency of the normal paths is 1ms, args base_num can be set to 5 and args min_avg_latency can be set to 2ms, so the paths will be grouped in priority groups with path latency <=2ms, (2ms, 10ms], (10ms, 50ms], etc. Signed-off-by: Yang Feng <philip.y...@huawei.com> --- libmultipath/prioritizers/path_latency.c | 61 ++-- multipath/multipath.conf.5 | 14 +--- 2 files changed, 52 insertions(+), 23 deletions(-) diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c index 21209ff..c8fe318 100644 --- a/libmultipath/prioritizers/path_latency.c +++ b/libmultipath/prioritizers/path_latency.c @@ -76,7 +76,7 @@ static int do_readsector0(struct path *pp, unsigned int timeout) return 1; } -int check_args_valid(int io_num, int base_num) +int check_args_valid(int io_num, int base_num, int min_avg_latency) { if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) { @@ -90,24 +90,33 @@ int check_args_valid(int io_num, int base_num) return 0; } +if ((min_avg_latency < MIN_AVG_LATENCY) || (min_avg_latency > MAX_AVG_LATENCY)) +{ +pp_pl_log(0, "args min_avg_latency is outside the valid range"); +return 0; +} + return 1; } -/* In multipath.conf, args form: io_num|base_num. For example, -* args is "20|10", this function can get io_num value 20, and - base_num value 10. +/* In multipath.conf, args form: io_num|base_num|min_avg_latency. +* For example, args is "20|10|1", this function can get io_num +* value 20, base_num value 10, min_avg_latency value 1 (us). */ -static int get_ionum_and_basenum(char *args, - int *ionum, - int *basenum) +static int get_prio_args(char *args, +int *ionum, +int *basenum, +int *minavglatency) { char source[MAX_CHAR_SIZE]; char vertica = '|'; -char *endstrbefore = NULL; -char *endstrafter = NULL; +char *endstr1 = NULL; +char *endstr2 = NULL; +char *endstr3 = NULL; unsigned int size = strlen(args); -if ((args == NULL) || (ionum == NULL) || (basenum == NULL)) +if ((args == NULL) || (ionum == NULL) +|| (basenum == NULL) || (minavglatency == NULL)) { pp_pl_log(0, "args string is NULL"); return 0; @@ -127,21 +136,34 @@ static int get_ionum_and_basenum(char *args, return 0; } -*ionum = (int)strtoul(source, , 10); -if (endstrbefore[0] != vertica) +*ionum = (int)strtoul(source, , 10); +if (endstr1[0] != vertica) +{ +pp_pl_log(0, "invalid prio_args format: %s", source); +return 0; +} + +if (!isdigit(endstr1[1])) +{ +pp_pl_log(0, "invalid prio_args format: %s", source); +return 0; +} + +*basenum = (long long)strtol([1], , 10); +if (endstr2[0] != vertica) { pp_pl_log(0, "invalid prio_args format: %s", source); return 0; } -if (!isdigit(endstrbefore[1])) +if (!isdigit(endstr2[1])) { pp_pl_log(0, "invalid prio_args format: %s", source); return 0; } -*basenum = (long long)strtol([1], , 10); -if (check_args_valid(*ionum, *basenum) == 0) +*minavglatency = (long long)strtol([1], , 10); +if (check_args_valid(*ionum, *basenum, *minavglatency) == 0) { return 0; } @@ -204,6 +226,7 @@ int getprio (struct path *pp, char *args, unsigned int timeout) int index = 0; int io_num; int base_num; +int min_avg_latency; long long avglatency; long long latency_interval; long long standard_deviation; @@ -214,7 +237,7 @@ int getprio (struct path *pp, char *args, unsigned int timeout) if (pp->fd < 0) return -1; -if (get_ionum_and_basenum(args, _num, _num) == 0) +if (get_prio_args(args, _num, _num, _avg_latency) == 0) { pp_pl_log(0, "%s: get path_latency args fail", pp->dev); return DEFAULT_PRIORITY; @@ -255,13 +278,13 @@ int getprio (struct path *pp, char *args, unsigned int timeout) set can change latency_interval value corresponding to avglatency and is not constant. Warn the user if latency_interval is smaller than (2 * standard_deviation), or equal */ standard_deviation = calc_standard_deviation(path_latency, index, avglatency); -latency_interval = calc_latency_
[dm-devel] [PATCH 0/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency.
1. The SCSI-to-NVMe translations have been removed in the patch "nvme: Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl command should be supported in the multipath-tools. In the prioritizers/path_latency.c, modify the func do_readsector0(): send a native NVMe Read Ioctl command to the nvme device, and send a SG Read Ioctl command to the scsi device. In the checkers, delete the file tur.c and create the new file ping.c: ping.c can support the native NVMe Keep Alive Ioctl command to the nvme device, and can support the SG TUR Ioctl command to the scsi device. 2. Add args min_avg_latency of logarithmic scale, for prioritizers/ path_latency.c. Min average latency is not constant 1us, and can be set by user. Certainly, max average latency value is still constant 100s. It make support better for more scenes, because it can deal better with the normal standard deviation of path latency. For example, when the standard deviation value is 200us and the average latency of the normal paths is 1ms, args base_num can be set to 5 and args min_avg_latency can be set to 2ms, so the paths will be grouped in priority groups with path latency <=2ms, (2ms, 10ms], (10ms, 50ms], etc. Yang Feng (1): multipath-tools/libmultipath: Support for the native NVMe Ioctl command and add args min_avg_latency for path_latency. libmultipath/checkers.c | 7 + libmultipath/checkers.h | 6 +- libmultipath/checkers/Makefile | 6 +- libmultipath/checkers/emc_clariion.c | 4 +- libmultipath/checkers/libsg.c| 94 --- libmultipath/checkers/libsg.h| 9 - libmultipath/checkers/ping.c | 453 +++ libmultipath/checkers/readsector0.c | 4 +- libmultipath/checkers/tur.c | 427 - libmultipath/checkers/tur.h | 8 - libmultipath/defaults.h | 2 +- libmultipath/discovery.c | 1 + libmultipath/hwtable.c | 2 +- libmultipath/libnvme.c | 130 + libmultipath/libnvme.h | 10 + libmultipath/libsg.c | 113 libmultipath/libsg.h | 13 + libmultipath/prioritizers/Makefile | 2 +- libmultipath/prioritizers/path_latency.c | 95 --- libmultipath/propsel.c | 2 +- multipath/multipath.conf.5 | 18 +- 21 files changed, 821 insertions(+), 585 deletions(-) delete mode 100644 libmultipath/checkers/libsg.c delete mode 100644 libmultipath/checkers/libsg.h create mode 100644 libmultipath/checkers/ping.c delete mode 100644 libmultipath/checkers/tur.c delete mode 100644 libmultipath/checkers/tur.h create mode 100644 libmultipath/libnvme.c create mode 100644 libmultipath/libnvme.h create mode 100644 libmultipath/libsg.c create mode 100644 libmultipath/libsg.h -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 1/2] multipath-tools/libmultipath: Support for the native NVMe Ioctl command.
1. The SCSI-to-NVMe translations have been removed in the patch "nvme: Remove SCSI translations" in the linux-nvme, so the native NVMe Ioctl command should be supported in the multipath-tools. 2. In the prioritizers/path_latency.c, modify the func do_readsector0(): send a native NVMe Read Ioctl command to the nvme device, and send a SG Read Ioctl command to the scsi device. 3. In the checkers, delete the file tur.c and create the new file ping.c: ping.c can support the native NVMe Keep Alive Ioctl command to the nvme device, and can support the SG TUR Ioctl command to the scsi device. Signed-off-by: Yang Feng <philip.y...@huawei.com> --- libmultipath/checkers.c | 7 + libmultipath/checkers.h | 6 +- libmultipath/checkers/Makefile | 6 +- libmultipath/checkers/emc_clariion.c | 4 +- libmultipath/checkers/libsg.c| 94 --- libmultipath/checkers/libsg.h| 9 - libmultipath/checkers/ping.c | 453 +++ libmultipath/checkers/readsector0.c | 4 +- libmultipath/checkers/tur.c | 427 - libmultipath/checkers/tur.h | 8 - libmultipath/defaults.h | 2 +- libmultipath/discovery.c | 1 + libmultipath/hwtable.c | 2 +- libmultipath/libnvme.c | 130 + libmultipath/libnvme.h | 10 + libmultipath/libsg.c | 113 libmultipath/libsg.h | 13 + libmultipath/prioritizers/Makefile | 2 +- libmultipath/prioritizers/path_latency.c | 58 +--- libmultipath/propsel.c | 2 +- multipath/multipath.conf.5 | 4 +- 21 files changed, 754 insertions(+), 601 deletions(-) delete mode 100644 libmultipath/checkers/libsg.c delete mode 100644 libmultipath/checkers/libsg.h create mode 100644 libmultipath/checkers/ping.c delete mode 100644 libmultipath/checkers/tur.c delete mode 100644 libmultipath/checkers/tur.h create mode 100644 libmultipath/libnvme.c create mode 100644 libmultipath/libnvme.h create mode 100644 libmultipath/libsg.c create mode 100644 libmultipath/libsg.h diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c index 05e024f..00fbd6e 100644 --- a/libmultipath/checkers.c +++ b/libmultipath/checkers.c @@ -162,6 +162,13 @@ void checker_set_fd (struct checker * c, int fd) c->fd = fd; } +void checker_set_dev(struct checker *c, char *dev) +{ +if (!c) +return; +strncpy(c->dev, dev, strlen(dev)+1); +} + void checker_set_sync (struct checker * c) { if (!c) diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h index 1d225de..506dd4c 100644 --- a/libmultipath/checkers.h +++ b/libmultipath/checkers.h @@ -79,7 +79,7 @@ enum path_check_state { }; #define DIRECTIO "directio" -#define TUR "tur" +#define PING "ping" #define HP_SW"hp_sw" #define RDAC "rdac" #define EMC_CLARIION "emc_clariion" @@ -97,6 +97,8 @@ enum path_check_state { #define CHECKER_DEV_LEN 256 #define LIB_CHECKER_NAMELEN 256 +#define FILE_NAME_SIZE 256 + struct checker { struct list_head node; void *handle; @@ -107,6 +109,7 @@ struct checker { int disable; char name[CHECKER_NAME_LEN]; char message[CHECKER_MSG_LEN]; /* comm with callers */ +char dev[FILE_NAME_SIZE]; void * context; /* store for persistent data */ void ** mpcontext; /* store for persistent data shared multipath-wide. Use MALLOC if @@ -132,6 +135,7 @@ void checker_reset (struct checker *); void checker_set_sync (struct checker *); void checker_set_async (struct checker *); void checker_set_fd (struct checker *, int); +void checker_set_dev(struct checker *c, char *dev); void checker_enable (struct checker *); void checker_disable (struct checker *); void checker_repair (struct checker *); diff --git a/libmultipath/checkers/Makefile b/libmultipath/checkers/Makefile index bce6b8b..3ab04ef 100644 --- a/libmultipath/checkers/Makefile +++ b/libmultipath/checkers/Makefile @@ -9,7 +9,7 @@ CFLAGS += $(LIB_CFLAGS) -I.. LIBS= \ libcheckcciss_tur.so \ libcheckreadsector0.so \ - libchecktur.so \ + libcheckping.so \ libcheckdirectio.so \ libcheckemc_clariion.so \ libcheckhp_sw.so \ @@ -24,10 +24,10 @@ all: $(LIBS) libcheckrbd.so: rbd.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lrados -ludev -libcheckdirectio.so: libsg.o directio.o +libcheckdirectio.so: ../libsg.o ../libnvme.o directio.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -laio -libcheck%.so: libsg.o %.o +libcheck%.so: ../libsg.o ../libnvme.o %.o $(CC) $(LDFLAGS) $(SH
[dm-devel] [PATCH]prioritizers/path_latency: Fix failure of sg_read to a nvme device
Fixed failure of sg_read to a nvme device: 1. Send sg_read command to a nvme device failed, because the block size in user-mode is 4096 bytes, not equal to the bolck size in kernel-mode, who is 512 bytes. According to the func nvme_trans_io of scsi.c in the dir "drivers/nvme/host". The code fragmentS as follows: /* If block count and actual data buffer size dont match, error out */ if (xfer_bytes != (cdb_info.xfer_len << ns->lba_shift)) { res = -EINVAL; goto out; } 2. And the SCSI-to-NVMe translations will be removed in the patch "nvme: Remove SCSI translations" in the linux-nvme, so using the nvme_read command is a good idea. Signed-off-by: Yang Feng <philip.y...@huawei.com> --- libmultipath/prioritizers/path_latency.c | 78 +++- 1 file changed, 66 insertions(+), 12 deletions(-) diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c index 34b734b..8f633e0 100644 --- a/libmultipath/prioritizers/path_latency.c +++ b/libmultipath/prioritizers/path_latency.c @@ -23,14 +23,32 @@ #include #include #include - #include "debug.h" #include "prio.h" #include "structs.h" +#include +#include #include "../checkers/libsg.h" #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, ##args) +struct nvme_user_io { +__u8 opcode; +__u8 flags; +__u16 control; +__u16 nblocks; +__u16 rsvd; +__u64 metadata; +__u64 addr; +__u64 slba; +__u32 dsmgmt; +__u32 reftag; +__u16 apptag; +__u16 appmask; +}; + +#define NVME_IOCTL_SUBMIT_IO _IOW('N', 0x42, struct nvme_user_io) + #define MAX_IO_NUM 200 #define MIN_IO_NUM 2 @@ -51,19 +69,55 @@ static long long path_latency[MAX_IO_NUM]; static inline long long timeval_to_us(const struct timespec *tv) { - return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / NSEC_PER_USEC); +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / NSEC_PER_USEC); +} + +int nvme_io(int fd, __u8 opcode, __u64 slba, __u16 nblocks, __u16 control, +__u32 dsmgmt, __u32 reftag, __u16 apptag, __u16 appmask, void *data, void *metadata) +{ +struct nvme_user_io io = { +.opcode = opcode, +.flags = 0, +.control = control, +.nblocks = nblocks, +.rsvd = 0, +.metadata = (__u64)(uintptr_t) metadata, +.addr = (__u64)(uintptr_t) data, +.slba = slba, +.dsmgmt = dsmgmt, +.reftag = reftag, +.appmask = apptag, +.apptag = appmask, +}; + +return ioctl(fd, NVME_IOCTL_SUBMIT_IO, ); +} + +int nvme_read(int fd, __u64 slba, __u16 nblocks, __u16 control, __u32 dsmgmt, +__u32 reftag, __u16 apptag, __u16 appmask, void *data, void *metadata) +{ +return nvme_io(fd, 0x2, slba, nblocks, control, dsmgmt, +reftag, apptag, appmask, data, metadata); } -static int do_readsector0(int fd, unsigned int timeout) +static int do_readsector0(struct path *pp, unsigned int timeout) { - unsigned char buf[4096]; - unsigned char sbuf[SENSE_BUFF_LEN]; - int ret; +unsigned char buf[4096]; +unsigned char mbuf[512]; +unsigned char sbuf[SENSE_BUFF_LEN]; - ret = sg_read(fd, [0], 4096, [0], - SENSE_BUFF_LEN, timeout); +if (!strncmp(pp->dev, "nvme", 4)) +{ +if (nvme_read(pp->fd, 0, 1, 0, 0, 0, 0, 0, buf, mbuf) < 0) +return 0; +} +else +{ +if (sg_read(pp->fd, [0], 4096, [0],SENSE_BUFF_LEN, timeout) == 2) +return 0; +} - return ret; +return 1; } int check_args_valid(int io_num, int base_num) @@ -218,9 +272,9 @@ int getprio (struct path *pp, char *args, unsigned int timeout) (void)clock_gettime(CLOCK_MONOTONIC, ); before = timeval_to_us(); -if (do_readsector0(pp->fd, timeout) == 2) +if (do_readsector0(pp, timeout) == 0) { -pp_pl_log(0, "%s: path down", pp->dev); +pp_pl_log(0, "%s: send read sector0 command fail", pp->dev); return -1; } @@ -246,7 +300,7 @@ int getprio (struct path *pp, char *args, unsigned int timeout) Warn the user if latency_interval is smaller than (2 * standard_deviation), or equal */ standard_deviation = calc_standard_deviation(path_latency, index, avglatency); latency_interval = calc_latency_interval(avglatency, MAX_AVG_LATENCY, MIN_AVG_LATENCY, base_num); -if ((latency_interval!= 0) +if ((latency_interval != 0) && (latency_interval <= (2 * standard_deviation))) pp_pl_log(3, "%s: latency interval (%lld) according to average latency (%lld us) is smaller than " "2 * standard deviation (%lld us), or equal, args base_num (%d) needs to be set bigger value", -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] multipath-tools: beautify path_latency.c code
Hi Xose, Thanks for your work. ACK -Yang On 2017/6/23 0:46, Xose Vazquez Perez wrote: > Mainly running scripts/Lindent, from kernel dir, to replace indent spaces > by tabs. > > Cc: Yang Feng <philip.y...@huawei.com> > Cc: Christophe Varoqui <christophe.varo...@opensvc.com> > Cc: device-mapper development <dm-devel@redhat.com> > Signed-off-by: Xose Vazquez Perez <xose.vazq...@gmail.com> > --- > libmultipath/prioritizers/path_latency.c | 354 > +++ > 1 file changed, 177 insertions(+), 177 deletions(-) > > diff --git a/libmultipath/prioritizers/path_latency.c > b/libmultipath/prioritizers/path_latency.c > index 34b734b..9fc2dfc 100644 > --- a/libmultipath/prioritizers/path_latency.c > +++ b/libmultipath/prioritizers/path_latency.c > @@ -17,8 +17,8 @@ > * Author(s): Yang Feng <philip.y...@huawei.com> > * > * This file is released under the GPL version 2, or any later version. > - * > */ > + > #include > #include > #include > @@ -31,27 +31,28 @@ > > #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " > fmt, ##args) > > -#define MAX_IO_NUM 200 > -#define MIN_IO_NUM 2 > +#define MAX_IO_NUM 200 > +#define MIN_IO_NUM 2 > > -#define MAX_BASE_NUM10 > -#define MIN_BASE_NUM2 > +#define MAX_BASE_NUM 10 > +#define MIN_BASE_NUM 2 > > -#define MAX_AVG_LATENCY 1. /*Unit: us*/ > -#define MIN_AVG_LATENCY 1. /*Unit: us*/ > +#define MAX_AVG_LATENCY 1. /* Unit: us */ > +#define MIN_AVG_LATENCY 1. /* Unit: us */ > > -#define DEFAULT_PRIORITY0 > +#define DEFAULT_PRIORITY 0 > > -#define MAX_CHAR_SIZE 30 > +#define MAX_CHAR_SIZE30 > > -#define USEC_PER_SEC100LL > -#define NSEC_PER_USEC 1000LL > +#define USEC_PER_SEC 100LL > +#define NSEC_PER_USEC1000LL > > static long long path_latency[MAX_IO_NUM]; > > static inline long long timeval_to_us(const struct timespec *tv) > { > - return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec / > NSEC_PER_USEC); > + return ((long long)tv->tv_sec * USEC_PER_SEC) + > + (tv->tv_nsec / NSEC_PER_USEC); > } > > static int do_readsector0(int fd, unsigned int timeout) > @@ -60,198 +61,197 @@ static int do_readsector0(int fd, unsigned int timeout) > unsigned char sbuf[SENSE_BUFF_LEN]; > int ret; > > - ret = sg_read(fd, [0], 4096, [0], > - SENSE_BUFF_LEN, timeout); > + ret = sg_read(fd, [0], 4096, [0], SENSE_BUFF_LEN, timeout); > > return ret; > } > > int check_args_valid(int io_num, int base_num) > { > -if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) > -{ > -pp_pl_log(0, "args io_num is outside the valid range"); > -return 0; > -} > - > -if ((base_num < MIN_BASE_NUM) || (base_num > MAX_BASE_NUM)) > -{ > -pp_pl_log(0, "args base_num is outside the valid range"); > -return 0; > -} > - > -return 1; > + if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) { > + pp_pl_log(0, "args io_num is outside the valid range"); > + return 0; > + } > + > + if ((base_num < MIN_BASE_NUM) || (base_num > MAX_BASE_NUM)) { > + pp_pl_log(0, "args base_num is outside the valid range"); > + return 0; > + } > + > + return 1; > } > > -/* In multipath.conf, args form: io_num|base_num. For example, > -* args is "20|10", this function can get io_num value 20, and > - base_num value 10. > -*/ > -static int get_ionum_and_basenum(char *args, > - int *ionum, > - int *basenum) > +/* > + * In multipath.conf, args form: io_num|base_num. For example, > + * args is "20|10", this function can get io_num value 20, and > + * base_num value 10. > + */ > +static int get_ionum_and_basenum(char *args, int *ionum, int *basenum) > { > -char source[MAX_CHAR_SIZE]; > -char vertica = '|'; > -char *endstrbefore = NULL; > -char *endstrafter = NULL; > -unsigned int size = strlen(args); > - > -if ((args == NULL) || (ionum == NULL) || (basenum == NULL)) > -{ > -pp_pl_log(0, "args string is NULL"); > -return 0; > -} > - > -i
Re: [dm-devel] [PATCH v6 1/1] multipath-tools: Prioritizer based on a latency algorithm
Hi Christophe, OK, thanks a lot. Regards, Yang --- On 2017/6/21 18:22, Christophe Varoqui wrote: > This version is now merged. > Thanks. > > Incremental enhancements can go from there if needed. > > On Wed, Jun 21, 2017 at 11:40 AM, Yang Feng <philip.y...@huawei.com > <mailto:philip.y...@huawei.com>> wrote: > > libmultipath/prioritizers: Prioritizer for device mapper multipath, > where the corresponding priority values of specific paths are provided > by a latency algorithm on the logarithmic scale. And the latency algorithm > is dependent on the following arguments(io_num and base_num). > The principle of the algorithm is illustrated as follows: > 1. By sending a certain number "io_num" of read IOs to the current > path continuously, the IOs' average latency can be calculated. > 2. Max average latency value is 100s, and min value is 1us. According > to the average latency of each path and the "base_number" of logarithmic > scale, the priority "rc" of each path can be provided. > > For example: If base_num=10, the paths will be grouped in priority groups > with path latency <=1us, (1us, 10us], (10us, 100us], (100us, 1ms], (1ms, > 10ms], > (10ms, 100ms], (100ms, 1s], (1s, 10s], (10s, 100s], >100s. As follows: > ><=1us(1us, 10us] (10us, 100us] >100s > > |--|--|--|...|--| > | priority rank 9 | priority rank 8 | priority rank 7 |...| priority > rank 0 | > > |--|------|--|...|--| >Priority Rank Partitioning > > Signed-off-by: Yang Feng <philip.y...@huawei.com > <mailto:philip.y...@huawei.com>> > Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com > <mailto:bmarz...@redhat.com>> > Reviewed-by: Martin Wilck <mwi...@suse.com <mailto:mwi...@suse.com>> > Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com > <mailto:xose.vazq...@gmail.com>> > Reviewed-by: Hannes Reinecke <h...@suse.de <mailto:h...@suse.de>> > --- > libmultipath/prio.h | 1 + > libmultipath/prioritizers/Makefile | 4 + > libmultipath/prioritizers/path_latency.c | 257 > +++ > multipath/multipath.conf.5 | 20 +++ > 4 files changed, 282 insertions(+) > create mode 100644 libmultipath/prioritizers/path_latency.c > --- -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v6 1/1] multipath-tools: Prioritizer based on a latency algorithm
libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm on the logarithmic scale. And the latency algorithm is dependent on the following arguments(io_num and base_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "io_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. Max average latency value is 100s, and min value is 1us. According to the average latency of each path and the "base_number" of logarithmic scale, the priority "rc" of each path can be provided. For example: If base_num=10, the paths will be grouped in priority groups with path latency <=1us, (1us, 10us], (10us, 100us], (100us, 1ms], (1ms, 10ms], (10ms, 100ms], (100ms, 1s], (1s, 10s], (10s, 100s], >100s. As follows: <=1us(1us, 10us] (10us, 100us] >100s |--|--|--|...|--| | priority rank 9 | priority rank 8 | priority rank 7 |...| priority rank 0 | |--|--|--|...|--| Priority Rank Partitioning Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> Reviewed-by: Hannes Reinecke <h...@suse.de> --- libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 257 +++ multipath/multipath.conf.5 | 20 +++ 4 files changed, 282 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c diff --git a/libmultipath/prio.h b/libmultipath/prio.h index 0193c52..c97fe39 100644 --- a/libmultipath/prio.h +++ b/libmultipath/prio.h @@ -29,6 +29,7 @@ struct path; #define PRIO_RDAC "rdac" #define PRIO_WEIGHTED_PATH "weightedpath" #define PRIO_SYSFS "sysfs" +#define PRIO_PATH_LATENCY "path_latency" /* * Value used to mark the fact prio was not defined diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..ca47cdf 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,6 +18,7 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ + libpriopath_latency.so \ libpriosysfs.so all: $(LIBS) @@ -25,6 +26,9 @@ all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 000..046e13b --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,257 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved. + * + * path_latency.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments("io_num" and "base_num"). + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + *continuously, the IOs' average latency can be calculated. + * 2. Max value and min value of average latency are constant. According to + *the average latency of each path and the "base_num" of logarithmic + *scale, the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + * + * This file is released under the GPL version 2, or any later version. + * + */ +#include +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, ##args) + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 2 + +#define MAX_BASE_NUM10 +#define MIN_BASE_NUM2 + +#define MAX_AVG_LATENCY 1. /*Unit: us*/ +#define MIN_AVG_LATENCY 1. /*Unit: us*/ + +#define DEFAULT_PRIORITY0 + +#define MAX_CHAR_SIZE 30 + +#define USEC_PER_SEC100LL +#define NSEC_PER_USEC 1000LL + +static long long path_latency[MAX_IO_NUM]; + +static inline long long timeval_to
[dm-devel] [PATCH v6 0/1] multipath-tools: Prioritizer based on a latency algorithm
This patch value is in the following: 1. In the Storage-Backup environment of HyperCluster, includes one storage array near to the host and one remote storage array, and the two storage arrays have the same hardware. The same LUN is writed or readed by the two storage arrays. However, usually, the average latency of the paths of the remote storage array is higher than the near storage array's. Apparently, the prioritizer can be a good automatic solution. And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is higher, IOs will not send to this paths. But the current selectors don't solve this problem, IOPS will be influenced unavoidably. Changes from v5: * Max average latency value is set to 100s, and min value is set to 1us. Use logarithmic scale to partition different path priority ranks instead of linear scale, while use args "base_num" instead of "latency_interval". Fix according to Martin's reviews. * Make log to be more regular. Fix according to Martin's reviews. Changes from v4: * Args "latency_interval" is set by using logarithmic scale, where base number is 10. Fix according to Martin's reviews. * The value of "MAX_LATENCY_INTERVAL" is set 10 from 60. Changes from v3: * Delete Copyright time "2021", Fix according to Xose's reviews. * Add version for GPL, Fix according to Xose's reviews. Changes from v2: * Reorganize the commit comment and patch format. * Added Benjamin, Martin and Xose's Reviewed-by. Changes from v1: * The value of "MIN_IO_NUM" is set 2 from 10. * Fix according to Benjamin, Martin and Xose's reviews. Yang Feng (1): multipath-tools: Prioritizer based on a latency algorithm libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 257 +++ multipath/multipath.conf.5 | 20 +++ 4 files changed, 282 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 1/1] multipath-tools: Prioritizer based on a latency algorithm
Hi Martin, Thanks for your reviews. Please find my replys. And the patch v6 will be sent later. Regards, Yang --- > >> +#define MAX_LATENCY_INTERVAL10/*unit: s*/ >> +#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/ > > Why not use the same unit or both values? > It's easy to check the valid range of the "latency_interval" value. For unit: us, or ms, or s, the min digital value is the same, is 1. And for all unit, the max conversion value is the same, is 10 seconds. >> +static inline long long timeval_to_us(const struct timespec *tv) >> +{ >> +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv- >>> tv_nsec >> 10); > > Please calculate correctly, divide by 1000. > Thanks, it will be fixed. >> +int check_args_valid(int io_num, long long latency_interval, int >> type) >> +{ >> +if ((io_num < MIN_IO_NUM) || (io_num > MAX_IO_NUM)) >> +{ >> +condlog(0, "args io_num is more than the valid values >> range"); >> +return 0; >> +} > > Please write "outside the valid range" rather than "more than the valid > values range". > Thanks, it will be fixed. > In general, when you use condlog(), please prefix all messages with a > string that clearly identifies the path_latency prioritizer, e.g. > PRIO_PATH_LATENCY. > I know we're not currently doing that consistently in multipath-tools, > but that shouldn't be a reason for not doing when adding new code. > Good, thanks, it will be fixed. >> + >> +/* check value is set by using logarithmic scale, and base >> number is 10 */ >> +if ((latency_interval % BASE_NUMBER) != 0) >> +{ >> +condlog(0, "args latency_interval is not set by using >> logarithmic scale , where base number is 10"); >> +return 0; >> +} > > This is not what I meant with "logarithmic scale". See below. > >> + >> +/* s:[1, 10], ms:[1, 1], us:[1, 1000] */ >> +if ((latency_interval < MIN_LATENCY_INTERVAL) >> +|| (latency_interval > (MAX_LATENCY_INTERVAL * USEC_PER_SEC >> / conversion_ratio[type]))) >> +{ >> +condlog(0, "args latency_interval is more than the valid >> values range"); >> +return 0; >> +} > > Again, please use "outside". > Thanks, it will be fixed. >> +/* In multipath.conf, args form: io_num|latency_interval. For >> example, >> +* args is "20|10ms", this function can get 20, 10. >> +*/ >> +static int get_interval_and_ionum(char *args, >> +int *ionum, >> +long long *interval) >> +{ >> +char source[MAX_CHAR_SIZE]; >> +char vertica = '|'; >> +char *endstrbefore = NULL; >> +char *endstrafter = NULL; >> +int type; >> +unsigned int size = strlen(args); >> +long long ratio; >> + >> +if ((args == NULL) || (ionum == NULL) || (interval == NULL)) >> +{ >> +condlog(0, "args string is NULL"); >> +return 0; >> +} >> + >> +if ((size < 1) || (size > MAX_CHAR_SIZE-1)) >> +{ >> +condlog(0, "args string's size is too long"); >> +return 0; >> +} > > "too long" is not correct for the first (more likely) error condition. > >> + >> +memcpy(source, args, size+1); >> + >> +if (!isdigit(source[0])) >> +{ >> +condlog(0, "args io_num string's first char is not digit"); >> +return 0; >> +} > > It would be ok to catch all these errors with a single line of code > such as "path_latency: invalid prio_args format: %s". > OK, it will be fixed. >> + >> +*ionum = (int)strtoul(source, , 10); >> +if (endstrbefore[0] != vertica) >> +{ >> +condlog(0, "segmentation char is invalid"); >> +return 0; >> +} >> + >> +if (!isdigit(endstrbefore[1])) >> +{ >> +condlog(0, "args latency_interval string's first char is not >> digit"); >> +return 0; >> +} >> + >> +*interval = (long long)strtol([1], , >> 10); >> +type = get_interval_type(endstrafter); >> +if (type == INTERVAL_INVALID) >> +{ >> +condlog(0, "args latency_interval type is invalid"); >> +return 0; >> +} >> + >> +if (check_args_valid(*ionum, *interval, type) == 0) >> +{ >> +return 0; >> +} >> + >> +ratio = get_conversion_ratio(type); >> +*interval *= (long long)ratio; >> + >> +return 1; >> +} >> + >> +long long calc_standard_deviation(long long *path_latency, int size, >> long long avglatency) >> +{ >> +int index; >> +long long total = 0; >> + >> +for (index = 0; index < size; index++) >> +{ >> +total += (path_latency[index] - avglatency) * >> (path_latency[index] - avglatency); >> +} >> + >> +total /= (size-1); >> + >> +return (long long)sqrt((double)total); >> +} >> + >> +int getprio (struct path *pp, char *args, unsigned int timeout) >> +{ >> +int rc, temp; >> +int index = 0; >> +int io_num; >> +long long latency_interval; >> +long long
[dm-devel] [PATCH v5 0/1] multipath-tools: Prioritizer based on a latency algorithm
This patch value is in the following: 1. In the Storage-Backup environment of HyperCluster, includes one storage array near to the host and one remote storage array, and the two storage arrays have the same hardware.The same LUN is writed or readed by the two storage arrays. However, usually, the average latency of the paths of the remote storage array is much higher than the near storage array's. Apparently, the prioritizer can be a good automatic solution. And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, IOs will not send to this paths. But the current selectors don't solve this problem, IOPS will be influenced unavoidably. Changes from v4: * Argument "latency_interval" is set by using logarithmic scale, where base number is 10. Fix according to Martin's reviews. * The value of "MAX_LATENCY_INTERVAL" is set 10 from 60. Changes from v3: * Delete Copyright time "2021", Fix according to Xose's reviews. * Add version for GPL, Fix according to Xose's reviews. Changes from v2: * Reorganize the commit comment and patch format. * Added Benjamin, Martin and Xose's Reviewed-by. Changes from v1: * The value of "MIN_IO_NUM" is set 2 from 10. * Fix according to Benjamin, Martin and Xose's reviews. Yang Feng (1): multipath-tools: Prioritizer based on a latency algorithm libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 282 +++ multipath/multipath.conf.5 | 20 +++ 4 files changed, 307 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v5 1/1] multipath-tools: Prioritizer based on a latency algorithm
libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--| | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | |--|--|--|...|--| Priority Rank Partitioning Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> Reviewed-by: Hannes Reinecke <h...@suse.de> --- libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 282 +++ multipath/multipath.conf.5 | 20 +++ 4 files changed, 307 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c diff --git a/libmultipath/prio.h b/libmultipath/prio.h index 0193c52..c97fe39 100644 --- a/libmultipath/prio.h +++ b/libmultipath/prio.h @@ -29,6 +29,7 @@ struct path; #define PRIO_RDAC "rdac" #define PRIO_WEIGHTED_PATH "weightedpath" #define PRIO_SYSFS "sysfs" +#define PRIO_PATH_LATENCY "path_latency" /* * Value used to mark the fact prio was not defined diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..ca47cdf 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,6 +18,7 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ + libpriopath_latency.so \ libpriosysfs.so all: $(LIBS) @@ -25,6 +26,9 @@ all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 000..e9cf679 --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,282 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved. + * + * path_latency.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + *continuously, the IOs' average latency can be calculated. + * 2. According to the average latency of each path and the weight value + *"latency_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + *Guan Junxiong <guanjunxi...@huawei.com> + *Zou Ming <zouming.zoum...@huawei.com> + * + * This file is released under the GPL version 2, or any later version. + * + */ +#include +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/ +#define BASE_NUMBER 10 + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 2 + +#define MAX_LATENCY_INTERVAL10/*unit: s*/ +#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/ + +#define DEFAULT_PRIORITY0 + +#define MAX_CHAR_SIZE 30 + +#define CHAR_USEC "us" +#define CHAR_MSEC "ms" +#define CHAR_SEC"s" + +enum interval_type { +INTERVAL_USEC, +INTERVAL_MSEC, +INTERVAL_SEC, +INTERVAL_INVALID +}; + +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ +static const char *interval_unit_str[MAX_CHAR_SIZE] = { +CHAR_USEC, CHAR_MSEC, CHAR_SEC +}; +static const int interval_unit_type[] = { +INTERVAL_USEC, INTERVAL
Re: [dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm
Hi Martin, Thanks a lot. It's a good idea. The updated patch will be sent later. Regards, -Yang On 2017/6/9 16:05, Martin Wilck wrote: > Hello Yang, > >>> Actually, you're not alone here; several other storage array setups >>> suffer from the same problem. >>> >>> Eg if you have a site-failover setup with two storage arrays at >>> different locations the problem is more-or-less the same; >>> both arrays potentially will be displaying identical priority >>> information, despite one array being remote. >>> >> >> It's up to the value set of the argument "latency_interval".For >> example, >> If latency_interval=10ms, the paths will be grouped in priority >> groups >> with path latency 0-10ms, 10-20ms, 20-30ms, etc. If the argument >> "latency_interval" is set to appropriate value and the distance >> between >> two arrays is not enough far, two priorities may be the same, But >> it's >> OK, because between two arrays, the gap of average path latency is >> very >> small and tolerable. > > I wonder if it would make sense to use "logarithmically" scaled latency > intervals here. It wouldn't make a large difference whether the latency > is 1ms or 2ms, but if we have paths where the latencies differ by order > of magnitude, it would be very important to make a distinction. With > the current linear intervals, it would be hard to get this right (us > interval size would result in too many intervals, and sec interval size > wouldn't allow a distinction between us and ms latencies). > > By using logarithmic scale, you could setup the latency intevals e.g > like this: > > < 10us, 10us-100us, 100us-1ms, 1ms-10ms, 10ms-100ms, > 100ms > > IMO that would be better, in particular if the latencies differ > strongly between paths. > > Regards > Martin > > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm
Hi Hannes, Thanks a lot. Please find my replys as follows. regards. -Yang On 2017/6/8 23:37, Hannes Reinecke wrote: > On 06/06/2017 04:43 AM, Yang Feng wrote: >> This patch value is in the following: >> 1. In the Storage-Backup environment of HyperCluster, includes >> one storage array near to the host and one remote storage array, >> and the two storage arrays have the same hardware.The same LUN is >> writed or readed by the two storage arrays. However, usually, the >> average latency of the paths of the remote storage array is much >> higher than the near storage array's. Apparently, the prioritizer >> can be a good automatic solution. And the current selectors don't >> solve it, IOs will send to the paths of the remote storage array, >> IOPS will be influenced unavoidably. >> > Actually, you're not alone here; several other storage array setups > suffer from the same problem. > > Eg if you have a site-failover setup with two storage arrays at > different locations the problem is more-or-less the same; > both arrays potentially will be displaying identical priority > information, despite one array being remote. > It's up to the value set of the argument "latency_interval".For example, If latency_interval=10ms, the paths will be grouped in priority groups with path latency 0-10ms, 10-20ms, 20-30ms, etc. If the argument "latency_interval" is set to appropriate value and the distance between two arrays is not enough far, two priorities may be the same, But it's OK, because between two arrays, the gap of average path latency is very small and tolerable. > Similarly, if you have a failover scenario where the individual paths > are accessed via different protocols you're facing the same problem. > Usually, the number of paths grouped into one priority is not only one. And Mostly, the average latency of paths via FC protocol is much smaller than the average latency of paths via iSCSI protocol. Unless all paths via FC are fault, the failover scenario to paths via iSCSI will not happen. > The underlying reason for this difficulty is the two-stage topology of > the current multipath implementation: > > - Each path is grouped into a path group > - Priorities are attached to a path group > - Each path group belongs to a multipath device > > And as we only have a _single_ prioritizer per path we cannot easily > handle this situation. > > Ideally we should be able to 'stack' prioritizers, which would work by > combining the priority numbers from the stacked/combined prioritizers. > In this prioritizer, all paths will be grouped into several priorities, and each path group has own priority who's different from the others. 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. Then all paths can be divided into different path groups. For example, If latency_interval=10ms, the paths will be grouped in priority groups with path latency 0-10ms, 10-20ms, 20-30ms, etc, as follows: latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--| | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | |--|--|--|...|--| | 0~10ms | 10~20ms | 20~30ms |...| 10*(x-1)~10*x ms | Priority Rank Partitioning > But that would be requiring quite some rework, of course. > > Cheers, > > Hannes > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v3 0/1] multipath-tools: Prioritizer based on a latency algorithm
This patch value is in the following: 1. In the Storage-Backup environment of HyperCluster, includes one storage array near to the host and one remote storage array, and the two storage arrays have the same hardware.The same LUN is writed or readed by the two storage arrays. However, usually, the average latency of the paths of the remote storage array is much higher than the near storage array's. Apparently, the prioritizer can be a good automatic solution. And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, IOs will not send to this paths. But the current selectors don't solve this problem, IOPS will be influenced unavoidably. Changes to v2: * Reorganize the commit comment and patch format * Added Benjamin, Martin and Xose's Reviewed-by Changes to v1: * The value of "MIN_IO_NUM" is set 2 from 10 * Fix according to Benjamin, Martin and Xose's reviews. Yang Feng (1): multipath-tools: Prioritizer based on a latency algorithm libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 272 +++ multipath/multipath.conf.5 | 18 ++ 4 files changed, 295 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v4 0/1] multipath-tools: Prioritizer based on a latency algorithm
This patch value is in the following: 1. In the Storage-Backup environment of HyperCluster, includes one storage array near to the host and one remote storage array, and the two storage arrays have the same hardware.The same LUN is writed or readed by the two storage arrays. However, usually, the average latency of the paths of the remote storage array is much higher than the near storage array's. Apparently, the prioritizer can be a good automatic solution. And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, IOs will not send to this paths. But the current selectors don't solve this problem, IOPS will be influenced unavoidably. Changes from v3: * Delete Copyright time "2021", Fix according to Xose's reviews. * Add version for GPL, Fix according to Xose's reviews. Changes from v2: * Reorganize the commit comment and patch format. * Added Benjamin, Martin and Xose's Reviewed-by. Changes from v1: * The value of "MIN_IO_NUM" is set 2 from 10. * Fix according to Benjamin, Martin and Xose's reviews. Yang Feng (1): multipath-tools: Prioritizer based on a latency algorithm libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 272 +++ multipath/multipath.conf.5 | 18 ++ 4 files changed, 295 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c -- 2.6.4.windows.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v4 1/1] multipath-tools: Prioritizer based on a latency algorithm
libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--| | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | |--|--|--|...|--| Priority Rank Partitioning Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> --- libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 272 +++ multipath/multipath.conf.5 | 18 ++ 4 files changed, 295 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c diff --git a/libmultipath/prio.h b/libmultipath/prio.h index 0193c52..c97fe39 100644 --- a/libmultipath/prio.h +++ b/libmultipath/prio.h @@ -29,6 +29,7 @@ struct path; #define PRIO_RDAC "rdac" #define PRIO_WEIGHTED_PATH "weightedpath" #define PRIO_SYSFS "sysfs" +#define PRIO_PATH_LATENCY "path_latency" /* * Value used to mark the fact prio was not defined diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..ca47cdf 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,6 +18,7 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ + libpriopath_latency.so \ libpriosysfs.so all: $(LIBS) @@ -25,6 +26,9 @@ all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 000..f8c9347 --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,272 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, All Rights Reserved. + * + * main.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + *continuously, the IOs' average latency can be calculated. + * 2. According to the average latency of each path and the weight value + *"latency_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + *Zou Ming <zouming.zoum...@huawei.com> + *Guan Junxiong <guanjunxi...@huawei.com> + * + * This file is released under the GPL version 2, or any later version. + */ +#include +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/ + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 2 + +#define MAX_LATENCY_INTERVAL60/*unit: s*/ +#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/ + +#define DEFAULT_PRIORITY0 + +#define MAX_CHAR_SIZE 30 + +#define CHAR_USEC "us" +#define CHAR_MSEC "ms" +#define CHAR_SEC"s" + +enum interval_type { +INTERVAL_USEC, +INTERVAL_MSEC, +INTERVAL_SEC, +INTERVAL_INVALID +}; + +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ +static const char *interval_unit_str[MAX_CHAR_SIZE] = { +CHAR_USEC, CHAR_MSEC, CHAR_SEC +}; +static const int interval_unit_type[] = { +INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC +}; + +#define USEC_PER_SEC 100LL +#define USEC_PER_MSEC 1000LL
[dm-devel] [PATCH v3 1/1] multipath-tools: Prioritizer based on a latency algorithm
libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--| | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | |--|--|--|...|--| Priority Rank Partitioning Signed-off-by: Yang Feng <philip.y...@huawei.com> Reviewed-by: Benjamin Marzinski <bmarz...@redhat.com> Reviewed-by: Martin Wilck <mwi...@suse.com> Reviewed-by: Xose Vazquez Perez <xose.vazq...@gmail.com> --- libmultipath/prio.h | 1 + libmultipath/prioritizers/Makefile | 4 + libmultipath/prioritizers/path_latency.c | 272 +++ multipath/multipath.conf.5 | 18 ++ 4 files changed, 295 insertions(+) create mode 100644 libmultipath/prioritizers/path_latency.c diff --git a/libmultipath/prio.h b/libmultipath/prio.h index 0193c52..c97fe39 100644 --- a/libmultipath/prio.h +++ b/libmultipath/prio.h @@ -29,6 +29,7 @@ struct path; #define PRIO_RDAC "rdac" #define PRIO_WEIGHTED_PATH "weightedpath" #define PRIO_SYSFS "sysfs" +#define PRIO_PATH_LATENCY "path_latency" /* * Value used to mark the fact prio was not defined diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..ca47cdf 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,6 +18,7 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ + libpriopath_latency.so \ libpriosysfs.so all: $(LIBS) @@ -25,6 +26,9 @@ all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 000..081f546 --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,272 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. + * + * main.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + *continuously, the IOs' average latency can be calculated. + * 2. According to the average latency of each path and the weight value + *"latency_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + *Zou Ming <zouming.zoum...@huawei.com> + *Guan Junxiong <guanjunxi...@huawei.com> + * + * This file is released under the GPL. + */ +#include +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/ + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 2 + +#define MAX_LATENCY_INTERVAL60/*unit: s*/ +#define MIN_LATENCY_INTERVAL1 /*unit: us, or ms, or s*/ + +#define DEFAULT_PRIORITY0 + +#define MAX_CHAR_SIZE 30 + +#define CHAR_USEC "us" +#define CHAR_MSEC "ms" +#define CHAR_SEC"s" + +enum interval_type { +INTERVAL_USEC, +INTERVAL_MSEC, +INTERVAL_SEC, +INTERVAL_INVALID +}; + +/* interval_unit_str and interval_unit_type keep the same assignment sequence */ +static const char *interval_unit_str[MAX_CHAR_SIZE] = { +CHAR_USEC, CHAR_MSEC, CHAR_SEC +}; +static const int interval_unit_type[] = { +INTERVAL_USEC, INTERVAL_MSEC, INTERVAL_SEC +}; + +#define USEC_PER_SEC 100LL +#define USEC_PER_MSEC 1000LL +#define USEC_PER_USEC 1LL + +static c
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm
On 2017/5/25 3:57, Benjamin Marzinski wrote: > On Wed, May 24, 2017 at 10:50:18AM +0800, Yang Feng wrote: >> Hello Christophe, > ACK > > -Ben > Hello Christophe, Any advance for this patch? Thanks,Best! -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm
Hello Christophe, Our technology team is working on a open source contribution for multipath. We think that this patch is essential to isolate the high latency paths and avoid high fluctuation of io performance, particularly, for the Storage-Backup environment of HyperCluster. Look forward your reply very much. Respects and regards! 1. In the Storage-Backup environment of HyperCluster, includes one storage array near to the host and one remote storage array, and the two storage arrays have the same hardware. The same LUN is writed or readed by the two storage arrays.However, usually, the average latency of the paths of the remote storage array is much higher than the near storage array's. Apparently, the prioritizer can be a good automatic solution. And the current selectors don't solve it, IOs will send to the paths of the remote storage array, IOPS will be influenced unavoidably. 2. In the environment of single storage array, the prioritizer can automatically separate the paths who's latency is much higher, IOs will not send to this paths. But the current selectors don't solve this problem, io performance will be influenced unavoidably. The up-to-date patch as follows: --- >From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 From: Yang Feng <philip.y...@huawei.com> Date: Fri, 19 May 2017 16:09:07 +0800 Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--| | priority rank 1 | priority rank 2 | priority rank 3 |...| priority rank x | |--|--|--|...|--| Priority Rank Partitioning --- libmultipath/prioritizers/Makefile | 6 +- libmultipath/prioritizers/path_latency.c | 271 +++ multipath/multipath.conf.5 | 18 ++ libmultipath/prio.h | 1 + 4 files changed, 295 insertions(+), 1 deletion(-) create mode 100644 libmultipath/prioritizers/path_latency.c diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..d2f20f6 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,13 +18,17 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ - libpriosysfs.so + libpriopath_latency.so \ + libpriosysfs.so all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriopath_latency.so: path_latency.o ../checkers/libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lm + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c new file mode 100644 index 000..a666b6c --- /dev/null +++ b/libmultipath/prioritizers/path_latency.c @@ -0,0 +1,271 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. + * + * main.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a latency algorithm. And the + * latency algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "io_num" of read IOs to the current path + *continuously, the IOs' average latency can be calculated. + * 2. According to the average latency of each path and the weight value + *"latency_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + *Zou Ming <zouming.zoum...@huawei.com> + * + * This file is released under the GPL. + */ +#include +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../checkers/libsg.h" + +#define THRES_USEC_VALUE12000LL/*unit: us, =120s*/ + +#define MAX_IO_NUM 200 +#define MIN_IO_NUM 10 + +#define MAX_LATENCY_INTERVAL60/*unit: s*/ +#define MIN_LATENCY_INTERVAL1
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm
/prioritizers/delayedpath.h >> new file mode 100644 >> index 000..d8213e9 >> --- /dev/null >> +++ b/libmultipath/prioritizers/delayedpath.h >> @@ -0,0 +1,17 @@ >> +#ifndef _DELAYEDPATH_H >> +#define _DELAYEDPATH_H >> + >> +#define PRIO_DELAYED_PATH "delayedpath" > > In order for the rest of the code to refer to this prioritizer, this > define should be in prio.h with the other prioritizer names, and as long > as delayedpath.c includes prio.h, there's no need to put it in > delayedpath.h. OK, as the following patch. > >> + >> +#define PRIO_NO_INFORMATION 5 > > The rest of the multipath code only cares if getprio returns a negative > number of not. It doesn't check what the specific negative number is. I > realize the the alua prioritizer returns a set of error codes, but they > aren't used, or even usable in their present form. If we wanted to have > better error reporting, we should set up a universal set of error codes > in prio.h, and have all prioritizers use them, instead of having each > prioritizer define its own error codes. There's no reason why your > prioritizer needs to return this error code instead of -1. OK, as the following patch. > >> + >> +#define USEC_PER_SEC 100LL >> +#define USEC_PER_MSEC 1000LL >> +#define USEC_PER_USEC 1LL >> + >> +static inline long long timeval_to_us(const struct timespec *tv) >> +{ >> +return ((long long) tv->tv_sec * USEC_PER_SEC) + (tv->tv_nsec >> 10); >> +} > > No other file besides delayedpath.c will likely be including this .h > file, so I don't see any purpose for these being defined here. In fact, > I don't see why you can't just have a .c file without a .h file like the > majority of prioritizers. I'm pretty sure that none of the prioritizers > really need their own .h file. OK, as the following patch. > >> +#endif >> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5 >> index 5939688..f1e126e 100644 >> --- a/multipath/multipath.conf.5 >> +++ b/multipath/multipath.conf.5 >> @@ -293,6 +293,10 @@ Generate a random priority between 1 and 10. >> Generate the path priority based on the regular expression and the >> priority provided as argument. Requires prio_args keyword. >> .TP >> +.I delayedpath >> +Generate the path priority based on a time-delay algorithm. >> +Requires prio_args keyword. > > Really it doesn't require prio_args if you want to use the default > values, and should probably say so. The default args is discarded, as the following patch. >> +.I delayed >> +Needs a value of the form >> +\fI"<delay_interval|cons_num>"\fR >> +.RS >> +.TP 8 >> +.I delay_interval >> +The interval values of average IO-time-delay between two different >> neighbour ranks of path priority, used to partition different priority ranks. >> +Form: XXs, or XXXus, or XXXms. Unit: Second, or Microsecond, or >> Millisecond. Valid Values: Integer, s [1, 60], ms [1, 1000), us [1, 1000), >> +For example: 10s, or 100us, or 100ms. The default is: 10ms. >> +.TP >> +.I cons_num >> +The number of read IOs sent to the current path continuously, used to >> calculate the average IO-time-delay. Valid Values: Integer, [3, 1000]. >> +For example: 30. The default is: 20. >> +.RE >> +.TP 12 > > Looking at the "weighted" prio_args definition just above your "delayed" > definition, the pipe character "|" is being used to say that any of a > set of options is allowed. Your definition has it being a literal > character, but it's still inside the angle brackets that usually > delineate a variable. perhaps "|" would be > easier to understand, or even "[delayed_interval]|[io_num]" if you can > omit these to use the defaults. OK, as the following patch. The up-to-date patch as follows: --- >From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 From: Yang Feng <philip.y...@huawei.com> Date: Fri, 19 May 2017 16:09:07 +0800 Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc&q
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
Hello Benjamin, How about this patch? Look forward your replies. Thank you very much. Best regards! On 2017/5/19 17:45, Yang Feng wrote: > Hi Benjamin, > > Thank you very much for your comments. > Please find my replys and the up-to-date patch. > Best regards! > >> >> First, one overall question. We have dynamic path selectors available to >> deal with paths that are just simply slower that other paths, but can >> still be used together. Is there specific hardware or a specific setup >> where this isn't good enough and we really need to seperate these paths >> into different pathgroups, but we can't find out deterministically how >> the groups should be set up? It just seems like there could be a less >> hacky solution to this problem, but perhaps there are some situations >> where this is truly the best option. I'm just wondering what those are.1. In >> the Storage-Backup environment of HyperCluster,includs one storage array near > to the host and one remote storage array, and the two storage arrays have the > same hardware. > The same LUN is writed or readed by the two storage arrays. > However, usually, the average latency of the paths of the remote storage > array is much higher than the > near storage array's. > apparently, the prioritizer can be a good automatic solution. > And the current selectors don't solve it, IOs will send to the paths of the > remote storage array, IOPS will be influenced unavoidably. > 2. In the environment of single storage array, the prioritizer can > automatically separate the paths who's latency is much higher, > IOs will not send to this paths. > But the current selectors don't solve this problem, IOPS will be influenced > unavoidably. > >>> + >>> +/* interval_unit_str and interval_unit_type keep the same assignment >>> sequence */ >>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = { >>> +CHAR_USEC, CHAR_MSEC, CHAR_SEC >> >> This is a nit, but for constant strings, could you please use "char >> *var" instead of "char var[]", to be consistent with the rest of the >> multipath code. > Thanks, as the following patch. > >>> +if ((args == NULL) || (interval == NULL) >>> +|| (consnum == NULL) || (type == NULL)) >>> +return 0; >>> + >>> +/* int type */ >>> +if ((size < 1) || (size > MAX_CHAR_SIZE-1)) >>> +return 0; >> >> You should probably have log messages for these error returns. > Thanks, as the following patch. > >>> + >>> +memcpy(source, args, size+1); >>> +if (strstr(source, vertica) == NULL) >>> +return 0; >>> + >>> +*type = get_interval_type(source, typestr); >>> +if (*type == INTERVAL_INVALID) >>> +{ >>> +condlog(0, "delay_interval type is invalid"); >>> +return 0; >>> +} >> >> I'm confused here. How do you get to use the default interval. Shouldn't >> you accept "20s|" and "|30" and as valid inputs that use the defaults >> for the part they don't specify. > OK,the default arguments value is removed. If get inputs failed, return > default priority "0". > As the following patch. >> >>> +tokenbefore = strtok(source, vertica); >>> +tokenafter = strtok(NULL, vertica); >>> +typestr[1] = '\0'; >>> +tokenbefore = strtok(tokenbefore, typestr); >>> +if ((tokenbefore == NULL) || (tokenafter == NULL)) >>> +return 0; >>> + >>> +tmp = tokenbefore; >>> +while (*tmp != '\0') >>> +if (!isdigit(*tmp++)) >>> +{ >>> +condlog(0, "delay_interval string include invalid char"); >>> +return 0; >>> +} >>> + >>> +tmp = tokenafter; >>> +while (*tmp != '\0') >>> +if (!isdigit(*tmp++)) >>> +{ >>> +condlog(0, "cons_num string include invalid char"); >>> +return 0; >>> +} >>> + >>> +*interval = atoi(tokenbefore); >> >> Why do you keep track of the type and the interval seperately? Can't you >> just find out the type, and use that to multiply the interval once you >> read it, and then just use that value, instead of keeping track of two >> values across multiple functions? > Thanks, as the following patch. > >>> + >>> + if (pp->fd < 0) >>> + return -PRIO_
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a latency algorithm
Hello Martin, How about this patch? Look forward your replies. Thank you very much. Best regards! On 2017/5/19 16:43, Yang Feng wrote: > Hello Martin, > > Firstly, thank you very much for your comments. > And find my replys and the up-to-date patch. > > Best regards! > > >> Please think about the name once again. Maybe you should call it >> "io_latency" or "path_latency" instead of "delayedpath"? > OK, as the following patch. > >> >> Hm, I can't see a lot of difference in the parsing code wrt the >> previous version. IMO it's still non-straightforward and hard to >> comprehend. Maybe I didn't express myself clearly enough. Here is how >> I'd code this: >> >> 1. Verify that the string starts with a digit. Error if it does not. >> 2. Parse the delay interval using strtoul(). >> 3. The "end" pointer of strtoul() points to the unit, which has to be >> "s", "ms" or "us". Verify, and set the unit accordingly. >> 4. Verify that the next character is '|', and that it's followed by a >> digit. >> 5. Parse the number with strtoul() >> 6. Verify that there's no garbage at the end of the string. > Thank you , as the following patch. > >> >> Please follow the https://en.wikipedia.org/wiki/Robustness_principle. >> If a user enters "1500ms" here, the parsing will silently fail, and >> with it the whole prio algorithm. This will cause user confusion. >> Please don't do this > Thank you , as the following patch. > >> >> However please consider lowering the upper bound, I kind of doubt that >> 1000 IOs will finish quickly. More often than not, a lot of paths will >> appear at the same time (e.g. if a port of a storage array is enabled) >> and we'll have to send 1000 IOs to each one. >> > OK, the upper bound lower to 200, as the following patch. > >>> +while (temp-- > 0) >>> +{ >>> +(void)clock_gettime(CLOCK_MONOTONIC, ); >>> +before = timeval_to_us(); >>> + >>> +if (do_readsector0(pp->fd, timeout) == 2) >>> +{ >>> +condlog(0, "%s: path down", pp->dev); >>> +return -1; >>> +} >>> + >>> +(void)clock_gettime(CLOCK_MONOTONIC, ); >>> +after = timeval_to_us(); >>> + >>> +delay = after - before; >>> + >>> +min = (min <= delay) ? min : delay; >>> +max = (max >= delay) ? max : delay; >>> + >>> +toldelay += delay; >>> +} >>> + >>> +toldelay -= min + max; >> >> Why are you doing this? If you want to discard "extreme" values, this >> is probably not sufficient. If cons == 3, this will have the effect to >> use a single measurement rather than an average, is that intended? >> >> Btw, as you are doing statistics here anyway, you may want to calculate >> the estimate of the standard deviation and warn the user if the >> delay_interval is smaller than, say, 2 * standard deviation. >> >> Please consider printing a message with the measured value at debug >> level 3 or higher. > OK, as the following patch. > >>> "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* >>> , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.* >>> .RE >>> .TP 12 >>> +.I delayed >> >> should be "delayedpath" here? > OK, as the following patch. >> >>> +Needs a value of the form >>> +\fI"<delay_interval|cons_num>"\fR >>> +.RS >>> +.TP 8 >>> +.I delay_interval >>> +The interval values of average IO-time-delay between two different >>> neighbour ranks of path priority, used to partition different >>> priority ranks. >> >> It might be good to give an example here, like this: >> >> "If delay_interval=10ms, the paths will be grouped in priority groups >> with path latency 0-10ms, 10-20ms, 20-30ms, etc." > OK, as the following patch.> > > --- >>From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 > From: Yang Feng <philip.y...@huawei.com> > Date: Fri, 19 May 2017 16:09:07 +0800 > Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper > multipath, where the corresponding priority > values of specific paths are provided by a latency algorithm. And the latency > algorithm is dependent on the
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
Hello Benjamin and Christophe, How about this patch? Thanks a lot. Best. On 2017/5/19 17:45, Yang Feng wrote: > Hi Benjamin, > > Thank you very much for your comments. > Please find my replys and the up-to-date patch. > Best regards! > >> >> First, one overall question. We have dynamic path selectors available to >> deal with paths that are just simply slower that other paths, but can >> still be used together. Is there specific hardware or a specific setup >> where this isn't good enough and we really need to seperate these paths >> into different pathgroups, but we can't find out deterministically how >> the groups should be set up? It just seems like there could be a less >> hacky solution to this problem, but perhaps there are some situations >> where this is truly the best option. I'm just wondering what those are.1. In >> the Storage-Backup environment of HyperCluster,includs one storage array near > to the host and one remote storage array, and the two storage arrays have the > same hardware. > The same LUN is writed or readed by the two storage arrays. > However, usually, the average latency of the paths of the remote storage > array is much higher than the > near storage array's. > apparently, the prioritizer can be a good automatic solution. > And the current selectors don't solve it, IOs will send to the paths of the > remote storage array, IOPS will be influenced unavoidably. > 2. In the environment of single storage array, the prioritizer can > automatically separate the paths who's latency is much higher, > IOs will not send to this paths. > But the current selectors don't solve this problem, IOPS will be influenced > unavoidably. > >>> + >>> +/* interval_unit_str and interval_unit_type keep the same assignment >>> sequence */ >>> +static const char interval_unit_str[][MAX_CHAR_SIZE] = { >>> +CHAR_USEC, CHAR_MSEC, CHAR_SEC >> >> This is a nit, but for constant strings, could you please use "char >> *var" instead of "char var[]", to be consistent with the rest of the >> multipath code. > Thanks, as the following patch. > >>> +if ((args == NULL) || (interval == NULL) >>> +|| (consnum == NULL) || (type == NULL)) >>> +return 0; >>> + >>> +/* int type */ >>> +if ((size < 1) || (size > MAX_CHAR_SIZE-1)) >>> +return 0; >> >> You should probably have log messages for these error returns. > Thanks, as the following patch. > >>> + >>> +memcpy(source, args, size+1); >>> +if (strstr(source, vertica) == NULL) >>> +return 0; >>> + >>> +*type = get_interval_type(source, typestr); >>> +if (*type == INTERVAL_INVALID) >>> +{ >>> +condlog(0, "delay_interval type is invalid"); >>> +return 0; >>> +} >> >> I'm confused here. How do you get to use the default interval. Shouldn't >> you accept "20s|" and "|30" and as valid inputs that use the defaults >> for the part they don't specify. > OK,the default arguments value is removed. If get inputs failed, return > default priority "0". > As the following patch. >> >>> +tokenbefore = strtok(source, vertica); >>> +tokenafter = strtok(NULL, vertica); >>> +typestr[1] = '\0'; >>> +tokenbefore = strtok(tokenbefore, typestr); >>> +if ((tokenbefore == NULL) || (tokenafter == NULL)) >>> +return 0; >>> + >>> +tmp = tokenbefore; >>> +while (*tmp != '\0') >>> +if (!isdigit(*tmp++)) >>> +{ >>> +condlog(0, "delay_interval string include invalid char"); >>> +return 0; >>> +} >>> + >>> +tmp = tokenafter; >>> +while (*tmp != '\0') >>> +if (!isdigit(*tmp++)) >>> +{ >>> +condlog(0, "cons_num string include invalid char"); >>> +return 0; >>> +} >>> + >>> +*interval = atoi(tokenbefore); >> >> Why do you keep track of the type and the interval seperately? Can't you >> just find out the type, and use that to multiply the interval once you >> read it, and then just use that value, instead of keeping track of two >> values across multiple functions? > Thanks, as the following patch. > >>> + >>> + if (pp->fd < 0) >>> + return -PRIO_NO_INFORMATION; >>
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
Hello Martin, Firstly, thank you very much for your comments. And find my replys and the up-to-date patch. Best regards! > Please think about the name once again. Maybe you should call it > "io_latency" or "path_latency" instead of "delayedpath"? OK, as the following patch. > > Hm, I can't see a lot of difference in the parsing code wrt the > previous version. IMO it's still non-straightforward and hard to > comprehend. Maybe I didn't express myself clearly enough. Here is how > I'd code this: > > 1. Verify that the string starts with a digit. Error if it does not. > 2. Parse the delay interval using strtoul(). > 3. The "end" pointer of strtoul() points to the unit, which has to be > "s", "ms" or "us". Verify, and set the unit accordingly. > 4. Verify that the next character is '|', and that it's followed by a > digit. > 5. Parse the number with strtoul() > 6. Verify that there's no garbage at the end of the string. Thank you , as the following patch. > > Please follow the https://en.wikipedia.org/wiki/Robustness_principle. > If a user enters "1500ms" here, the parsing will silently fail, and > with it the whole prio algorithm. This will cause user confusion. > Please don't do this Thank you , as the following patch. > > However please consider lowering the upper bound, I kind of doubt that > 1000 IOs will finish quickly. More often than not, a lot of paths will > appear at the same time (e.g. if a port of a storage array is enabled) > and we'll have to send 1000 IOs to each one. > OK, the upper bound lower to 200, as the following patch. >> +while (temp-- > 0) >> +{ >> +(void)clock_gettime(CLOCK_MONOTONIC, ); >> +before = timeval_to_us(); >> + >> +if (do_readsector0(pp->fd, timeout) == 2) >> +{ >> +condlog(0, "%s: path down", pp->dev); >> +return -1; >> +} >> + >> +(void)clock_gettime(CLOCK_MONOTONIC, ); >> +after = timeval_to_us(); >> + >> +delay = after - before; >> + >> +min = (min <= delay) ? min : delay; >> +max = (max >= delay) ? max : delay; >> + >> +toldelay += delay; >> +} >> + >> +toldelay -= min + max; > > Why are you doing this? If you want to discard "extreme" values, this > is probably not sufficient. If cons == 3, this will have the effect to > use a single measurement rather than an average, is that intended? > > Btw, as you are doing statistics here anyway, you may want to calculate > the estimate of the standard deviation and warn the user if the > delay_interval is smaller than, say, 2 * standard deviation. > > Please consider printing a message with the measured value at debug > level 3 or higher. OK, as the following patch. >> "%N:%R:%n:%r"\fR. For example: 0x200100e08ba0aea0:0x210100e08ba0aea0:.*:.* >> , .*:.*:iqn.2009-10.com.redhat.msp.lab.ask-06:.* >> .RE >> .TP 12 >> +.I delayed > > should be "delayedpath" here? OK, as the following patch. > >> +Needs a value of the form >> +\fI"<delay_interval|cons_num>"\fR >> +.RS >> +.TP 8 >> +.I delay_interval >> +The interval values of average IO-time-delay between two different >> neighbour ranks of path priority, used to partition different >> priority ranks. > > It might be good to give an example here, like this: > > "If delay_interval=10ms, the paths will be grouped in priority groups > with path latency 0-10ms, 10-20ms, 20-30ms, etc." OK, as the following patch.> --- >From 58d718fdd34550bd9c4a32c6e9a87099c1e45a9f Mon Sep 17 00:00:00 2001 From: Yang Feng <philip.y...@huawei.com> Date: Fri, 19 May 2017 16:09:07 +0800 Subject: [PATCH] libmultipath/prioritizers: Prioritizer for device mapper multipath, where the corresponding priority values of specific paths are provided by a latency algorithm. And the latency algorithm is dependent on the following arguments(latency_interval and io_num). The principle of the algorithm is illustrated as follows: 1. By sending a certain number "cons_num" of read IOs to the current path continuously, the IOs' average latency can be calculated. 2. According to the average latency of each path and the weight value "latency_interval", the priority "rc" of each path can be provided. latency_interval latency_interval latency_interval latency_interval |--|--|--|...|--
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
Hello Martin, How about this patch? Thanks a lot, Best regards. On 2017/5/15 18:44, Yang Feng wrote: > Hello Martin, > > Thank you very much for your remarks. I am sorry for late reply, Please find > my answer and the updated patch. > >> Hello Yang, >> >> thank you for your work. Please find my remarks below. >> >> On Mon, 2017-05-08 at 11:58 +0800, Yang Feng wrote: >>> Prioritizer for device mapper multipath, where the corresponding >>> priority >>> values of specific paths are provided by a time-delay algorithm. And >>> the >>> time-delay algorithm is dependent on the following >>> arguments(delay_interval, >>> cons_num). >>> The principle of the algorithm is illustrated as follows: >>> 1. By sending a certain number "cons_num" of read IOs to the current >>> path >>>continuously, the IOs' average delay can be calculated. >>> 2. According to the average delay of each path and the weight value >>>"delay_interval", the priority "rc" of each path can be provided. >>> >>> delay_interval delay_interval delay_interval delay_inter >> >> How does this algorithm behave under load? Can we be sure that >> priorities don't start to fluctuate wildly because busy paths will >> usually have longer latencies than idle ones? > I have a lot of test under load. When the appropriate value of argument > "delay_interval" is set, > this algorithm behave well and can separate the paths who's average delay is > more than others. > When add a new path or the path's state change from down to up, getprio() of > the prioritizer is triggered, and > the current path is not under IOs. > >>> libmultipath/Makefile | 2 +- >>> libmultipath/checkers/Makefile | 7 +- >>> libmultipath/checkers/emc_clariion.c| 2 +- >>> libmultipath/checkers/libsg.c | 94 >>> libmultipath/checkers/libsg.h | 9 -- >>> libmultipath/checkers/readsector0.c | 2 +- >>> libmultipath/libsg.c| 94 >>> libmultipath/libsg.h| 9 ++ >>> libmultipath/prioritizers/Makefile | 6 +- >>> libmultipath/prioritizers/delayedpath.c | 246 >> >> Why do you have to move libsg for this? It's already used by various >> checkers, why can't your checker do the same? If you really need to do >> it, you should at least separate that part of the patch from the added >> code. > OK, this time, libsg will not be moved. > >>> + >>> +#define CHAR_SEC"SEC" >>> +#define CHAR_MSEC "MSEC" >>> +#define CHAR_USEC "USEC" >> >> I suggest to use "s", "ms", and "us" here instead. > OK, as the following patch. > >> If you create an array of "const char*" instead like you did for >> conversion_ratio below, you could implement get_interval_type() more >> elegantly using a loop over that array. > OK, as the following patch. > >>> +static int get_interval_type(char *source, char *type) >>> +{ >>> +/*is USEC*/ >>> +if ((strstr(source, CHAR_USEC) != NULL) >>> +&& (strstr(source, CHAR_USEC)[4] == '_')) >> >> Please avoid these double strstr() invocation. The compiler may >> optimize it away, but it just looks strange. The following would >> look better to me, and I find it actually more readable: >> >> if (((p = strstr(source, CHAR_USEC)) != NULL) && p[4] == '_') > OK, as the following patch. > >>> +static int get_string_from_under(char *args, >>> +char *beforestring, >>> +char *afterstring, >>> +int *type) >> >> Maybe you could figure out a more descriptive name for this function? >> >> A comment in the code showing how the string to be parsed typically >> looks like would be helpful for the reader. > OK, as the following patch. > >>> +token = strtok_r(source, under, ); >>> +token = strtok(token, char_type); >> >> I'm pretty sure this is is not what you intended to write. If char_type >> is "usec", this would split the string at the possible delimiters 'u', >> 's', 'e', and 'c' (the 2nd argument of strtok(3) is not a sequence, but >> a 'set' of bytes). It might accidentally work w
Re: [dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
Hello Martin, Thank you very much for your remarks. I am sorry for late reply, Please find my answer and the updated patch. > Hello Yang, > > thank you for your work. Please find my remarks below. > > On Mon, 2017-05-08 at 11:58 +0800, Yang Feng wrote: >> Prioritizer for device mapper multipath, where the corresponding >> priority >> values of specific paths are provided by a time-delay algorithm. And >> the >> time-delay algorithm is dependent on the following >> arguments(delay_interval, >> cons_num). >> The principle of the algorithm is illustrated as follows: >> 1. By sending a certain number "cons_num" of read IOs to the current >> path >>continuously, the IOs' average delay can be calculated. >> 2. According to the average delay of each path and the weight value >>"delay_interval", the priority "rc" of each path can be provided. >> >> delay_interval delay_interval delay_interval delay_inter > > How does this algorithm behave under load? Can we be sure that > priorities don't start to fluctuate wildly because busy paths will > usually have longer latencies than idle ones? I have a lot of test under load. When the appropriate value of argument "delay_interval" is set, this algorithm behave well and can separate the paths who's average delay is more than others. When add a new path or the path's state change from down to up, getprio() of the prioritizer is triggered, and the current path is not under IOs. >> libmultipath/Makefile | 2 +- >> libmultipath/checkers/Makefile | 7 +- >> libmultipath/checkers/emc_clariion.c| 2 +- >> libmultipath/checkers/libsg.c | 94 >> libmultipath/checkers/libsg.h | 9 -- >> libmultipath/checkers/readsector0.c | 2 +- >> libmultipath/libsg.c| 94 >> libmultipath/libsg.h| 9 ++ >> libmultipath/prioritizers/Makefile | 6 +- >> libmultipath/prioritizers/delayedpath.c | 246 > > Why do you have to move libsg for this? It's already used by various > checkers, why can't your checker do the same? If you really need to do > it, you should at least separate that part of the patch from the added > code. OK, this time, libsg will not be moved. >> + >> +#define CHAR_SEC"SEC" >> +#define CHAR_MSEC "MSEC" >> +#define CHAR_USEC "USEC" > > I suggest to use "s", "ms", and "us" here instead. OK, as the following patch. > If you create an array of "const char*" instead like you did for > conversion_ratio below, you could implement get_interval_type() more > elegantly using a loop over that array. OK, as the following patch. >> +static int get_interval_type(char *source, char *type) >> +{ >> +/*is USEC*/ >> +if ((strstr(source, CHAR_USEC) != NULL) >> +&& (strstr(source, CHAR_USEC)[4] == '_')) > > Please avoid these double strstr() invocation. The compiler may > optimize it away, but it just looks strange. The following would > look better to me, and I find it actually more readable: > > if (((p = strstr(source, CHAR_USEC)) != NULL) && p[4] == '_') OK, as the following patch. >> +static int get_string_from_under(char *args, >> +char *beforestring, >> +char *afterstring, >> +int *type) > > Maybe you could figure out a more descriptive name for this function? > > A comment in the code showing how the string to be parsed typically > looks like would be helpful for the reader. OK, as the following patch. >> +token = strtok_r(source, under, ); >> +token = strtok(token, char_type); > > I'm pretty sure this is is not what you intended to write. If char_type > is "usec", this would split the string at the possible delimiters 'u', > 's', 'e', and 'c' (the 2nd argument of strtok(3) is not a sequence, but > a 'set' of bytes). It might accidentally work with the input strings > you are using (in particular because you only look at the first token), > but nevertheless it's wrong. OK, as the following patch. >> +if ((token == NULL) || (saveptr == NULL)) >> +return 0; >> + >> +tmp = token; >> +while (*tmp != '\0') >> +if (!isdigit(*tmp++)) >> +return 0; >> + >> +tmp = saveptr; >> +while (*tmp != '\0') >> +if (!isdigit(*tm
[dm-devel] [PATCH] multipath-tools:Prioritizer based on a time-delay algorithm
errno)); + + if (res < 0) { + if (ENOMEM == errno) { + return PATH_UP; + } + return PATH_DOWN; + } + + if ((0 == io_hdr.status) && + (0 == io_hdr.host_status) && + (0 == io_hdr.driver_status)) { + return PATH_UP; + } else { + int key = 0; + + if (io_hdr.sb_len_wr > 3) { + if (sbb[0] == 0x72 || sbb[0] == 0x73) + key = sbb[1] & 0x0f; + else if (io_hdr.sb_len_wr > 13 && +((sbb[0] & 0x7f) == 0x70 || + (sbb[0] & 0x7f) == 0x71)) + key = sbb[2] & 0x0f; + } + + /* +* Retry if UNIT_ATTENTION check condition. +*/ + if (key == 0x6) { + if (--retry_count) + goto retry; + } + return PATH_DOWN; + } +} diff --git a/libmultipath/libsg.h b/libmultipath/libsg.h new file mode 100644 index 000..3994f45 --- /dev/null +++ b/libmultipath/libsg.h @@ -0,0 +1,9 @@ +#ifndef _LIBSG_H +#define _LIBSG_H + +#define SENSE_BUFF_LEN 32 + +int sg_read (int sg_fd, unsigned char * buff, int buff_len, +unsigned char * sense, int sense_len, unsigned int timeout); + +#endif /* _LIBSG_H */ diff --git a/libmultipath/prioritizers/Makefile b/libmultipath/prioritizers/Makefile index 36b42e4..7e3da51 100644 --- a/libmultipath/prioritizers/Makefile +++ b/libmultipath/prioritizers/Makefile @@ -18,13 +18,17 @@ LIBS = \ libpriorandom.so \ libpriordac.so \ libprioweightedpath.so \ - libpriosysfs.so + libpriodelayedpath.so \ + libpriosysfs.so all: $(LIBS) libprioalua.so: alua.o alua_rtpg.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ +libpriodelayedpath.so: delayedpath.o ../libsg.o + $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ + libprio%.so: %.o $(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ diff --git a/libmultipath/prioritizers/delayedpath.c b/libmultipath/prioritizers/delayedpath.c new file mode 100644 index 000..4c1cfea --- /dev/null +++ b/libmultipath/prioritizers/delayedpath.c @@ -0,0 +1,246 @@ +/* + * (C) Copyright HUAWEI Technology Corp. 2017, 2021 All Rights Reserved. + * + * main.c + * + * Prioritizer for device mapper multipath, where the corresponding priority + * values of specific paths are provided by a time-delay algorithm. And the + * time-delay algorithm is dependent on arguments. + * + * The principle of the algorithm as follows: + * 1. By sending a certain number "cons_num" of read IOs to the current path + *continuously, the IOs' average delay can be calculated. + * 2. According to the average delay of each path and the weight value + *"delay_interval", the priority "rc" of each path can be provided. + * + * Author(s): Yang Feng <philip.y...@huawei.com> + *Zou Ming <zouming.zoum...@huawei.com> + * + * This file is released under the GPL. + */ +#include +#include +#include + +#include "debug.h" +#include "prio.h" +#include "structs.h" +#include "../libmultipath/libsg.h" + +#include "delayedpath.h" + +#define THRES_USEC_VALUE3LL/*USEC, 300SEC*/ +#define DEFAULT_DELAY_INTERVAL 10 /*MSEC*/ +#define DEFAULT_CONS_NUM20 + +#define MAX_CHAR_SIZE 30 + +#define CHAR_SEC"SEC" +#define CHAR_MSEC "MSEC" +#define CHAR_USEC "USEC" + +enum interval_type { +INTERVAL_SEC, +INTERVAL_MSEC, +INTERVAL_USEC, +INTERVAL_INVALID +}; + +static int conversion_ratio[] = { + [INTERVAL_SEC] = USEC_PER_SEC, + [INTERVAL_MSEC] = USEC_PER_MSEC, + [INTERVAL_USEC] = USEC_PER_USEC, + [INTERVAL_INVALID] = 0, +}; + + +static int do_readsector0(int fd, unsigned int timeout) +{ + unsigned char buf[4096]; + unsigned char sbuf[SENSE_BUFF_LEN]; + int ret; + + ret = sg_read(fd, [0], 4096, [0], + SENSE_BUFF_LEN, timeout); + + return ret; +} + +static int get_interval_type(char *source, char *type) +{ +/*is USEC*/ +if ((strstr(source, CHAR_USEC) != NULL) +&& (strstr(source, CHAR_USEC)[4] == '_')) +{ +memcpy(type, CHAR_USEC, strlen(CHAR_USEC)+1); +return INTERVAL_USEC; +} + +/*is MSEC*/ +if ((strstr(source, CHAR_MSEC) != NULL) +&& (strstr(source, CHAR_MSEC)[4] == '_')) +{ +memcpy(type, CHAR_MSEC, strlen(CHAR_MSEC)+1); +return INTERVAL_MSEC; +} + +/*is SEC*/ + if ((strstr(source, CHAR_SEC) != NULL) +