Re: [dm-devel] multipath-tools: add ANA support for NVMe device

2018-11-12 Thread Martin Wilck
On Mon, 2018-11-12 at 16:53 -0500, Mike Snitzer wrote:
> 
> I (and others) think it makes sense to at least triple check with the
> NVMe developers (now cc'd) to see if we could get agreement on the
> nvme
> driver providing the ANA state via sysfs (when modparam
> nvme_core.multipath=N is set), like Hannes proposed here:
> http://lists.infradead.org/pipermail/linux-nvme/2018-November/020765.html
> 
> Then the userspace multipath-tools ANA support could just read sysfs
> rather than reinvent harvesting the ANA state via ioctl.

If some kernels expose the sysfs attributes and some don't, what are we
supposed to do? In order to be portable, multipath-tools (and other
user space tools, for that matter) need to support both, and maintain
multiple code paths. Just like SCSI :-)

I'd really like to see this abstracted away with a "libnvme" (you name
it), which user space tools could simply call without having to worry
how it actually talks to the kernel.

Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH] dm integrity: Document size and format of superblock fields

2018-11-12 Thread Andy Grover

On 11/9/18 12:21 PM, Andy Grover wrote:

As mentioned elsewhere in dm-integrity.txt, creating a new integrity
device requires creating a small integrity device on top of the base
device that formats the base device, reading the provided data sectors
out of the superblock, and then recreating the integrity device with the
correct size. For this, userspace must know the offset, length, and
endianness of the provided_data_sectors field in the superblock.

Document all fields mentioned in the txt to include this, based on struct
superblock in dm-integrity.c. Extra fields in struct superblock not
already mentioned in the txt remain undocumented.


In 4.19 I just noticed provided_data_sectors is now included in dm 
status. I'm assuming that is now the preferred way for userspace to 
discover this value? Thus making reading it from the on-disk superblock 
unnecessary, and thus *documenting* the superblock format unnecessary. 
Sounds good.


So please disregard this patch, although some different documentation 
changes are probably now needed.


Thanks -- Regards -- Andy

p.s. I'd just run across an issue where creating an integrity device on 
a loopback device would result in the superblock still reading as all 
zeroes. Another reason to do it the new way :)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] multipath-tools: add ANA support for NVMe device

2018-11-12 Thread Mike Snitzer
On Mon, Nov 12 2018 at 11:23am -0500,
Martin Wilck  wrote:

> Hello Lijie,
> 
> On Thu, 2018-11-08 at 14:09 +0800, lijie wrote:
> > Add support for Asynchronous Namespace Access as specified in NVMe
> > 1.3
> > TP 4004. The states are updated through reading the ANA log page.
> > 
> > By default, the native nvme multipath takes over the nvme device.
> > We can pass a false to the parameter 'multipath' of the nvme-core.ko
> > module,when we want to use multipath-tools.
> 
> Thank you for the patch. It looks quite good to me. I've tested it with
> a Linux target and found no problems so far.
> 
> I have a few questions and comments inline below.
> 
> I suggest you also have a look at detect_prio(); it seems to make sense
> to use the ana prioritizer for NVMe paths automatically if ANA is
> supported (with your patch, "detect_prio no" and "prio ana" have to be
> configured explicitly). But that can be done in a later patch.

I (and others) think it makes sense to at least triple check with the
NVMe developers (now cc'd) to see if we could get agreement on the nvme
driver providing the ANA state via sysfs (when modparam
nvme_core.multipath=N is set), like Hannes proposed here:
http://lists.infradead.org/pipermail/linux-nvme/2018-November/020765.html

Then the userspace multipath-tools ANA support could just read sysfs
rather than reinvent harvesting the ANA state via ioctl.

But if we continue to hit a wall on that type of sharing of the nvme
driver's state then I'm fine with reinventing ANA state inquiry and
tracking like has been proposed here.

Mike

p.s. thanks for your review Martin, we really need to determine the way
forward for full multipath-tools support of NVMe with ANA.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: add ANA support for NVMe device

2018-11-12 Thread Martin Wilck
Hello Lijie,

On Thu, 2018-11-08 at 14:09 +0800, lijie wrote:
> Add support for Asynchronous Namespace Access as specified in NVMe
> 1.3
> TP 4004. The states are updated through reading the ANA log page.
> 
> By default, the native nvme multipath takes over the nvme device.
> We can pass a false to the parameter 'multipath' of the nvme-core.ko
> module,when we want to use multipath-tools.

Thank you for the patch. It looks quite good to me. I've tested it with
a Linux target and found no problems so far.

I have a few questions and comments inline below.

I suggest you also have a look at detect_prio(); it seems to make sense
to use the ana prioritizer for NVMe paths automatically if ANA is
supported (with your patch, "detect_prio no" and "prio ana" have to be
configured explicitly). But that can be done in a later patch.

Martin

> ---
>  libmultipath/prio.h|   1 +
>  libmultipath/prioritizers/Makefile |   1 +
>  libmultipath/prioritizers/ana.c| 292
> +
>  libmultipath/prioritizers/ana.h| 221
> 
>  multipath/multipath.conf.5 |   8 +
>  5 files changed, 523 insertions(+)
>  create mode 100644 libmultipath/prioritizers/ana.c
>  create mode 100644 libmultipath/prioritizers/ana.h
> 
> diff --git a/libmultipath/prio.h b/libmultipath/prio.h
> index aa587cc..599d1d8 100644
> --- a/libmultipath/prio.h
> +++ b/libmultipath/prio.h
> @@ -30,6 +30,7 @@ struct path;
>  #define PRIO_WEIGHTED_PATH   "weightedpath"
>  #define PRIO_SYSFS   "sysfs"
>  #define PRIO_PATH_LATENCY"path_latency"
> +#define PRIO_ANA "ana"
>  
>  /*
>   * Value used to mark the fact prio was not defined
> diff --git a/libmultipath/prioritizers/Makefile
> b/libmultipath/prioritizers/Makefile
> index ab7bc07..15afaba 100644
> --- a/libmultipath/prioritizers/Makefile
> +++ b/libmultipath/prioritizers/Makefile
> @@ -19,6 +19,7 @@ LIBS = \
>   libpriordac.so \
>   libprioweightedpath.so \
>   libpriopath_latency.so \
> + libprioana.so \
>   libpriosysfs.so
>  
>  all: $(LIBS)
> diff --git a/libmultipath/prioritizers/ana.c
> b/libmultipath/prioritizers/ana.c
> new file mode 100644
> index 000..c5aaa5f
> --- /dev/null
> +++ b/libmultipath/prioritizers/ana.c
> @@ -0,0 +1,292 @@
> +/*
> + * (C) Copyright HUAWEI Technology Corp. 2017   All Rights Reserved.
> + *
> + * ana.c
> + * Version 1.00
> + *
> + * Tool to make use of a NVMe-feature called  Asymmetric Namespace
> Access.
> + * It determines the ANA state of a device and prints a priority
> value to stdout.
> + *
> + * Author(s): Cheng Jike 
> + *Li Jie 
> + *
> + * This file is released under the GPL version 2, or any later
> version.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "debug.h"
> +#include "prio.h"
> +#include "structs.h"
> +#include "ana.h"
> +
> +enum {
> + ANA_PRIO_OPTIMIZED  = 50,
> + ANA_PRIO_NONOPTIMIZED   = 10,
> + ANA_PRIO_INACCESSIBLE   = 5,
> + ANA_PRIO_PERSISTENT_LOSS= 1,
> + ANA_PRIO_CHANGE = 0,
> + ANA_PRIO_RESERVED   = 0,
> + ANA_PRIO_GETCTRL_FAILED = -1,
> + ANA_PRIO_NOT_SUPPORTED  = -2,
> + ANA_PRIO_GETANAS_FAILED = -3,
> + ANA_PRIO_GETANALOG_FAILED   = -4,
> + ANA_PRIO_GETNSID_FAILED = -5,
> + ANA_PRIO_GETNS_FAILED   = -6,
> + ANA_PRIO_NO_MEMORY  = -7,
> + ANA_PRIO_NO_INFORMATION = -8,
> +};
> +
> +static const char * anas_string[] = {
> + [NVME_ANA_OPTIMIZED]= "ANA Optimized
> State",
> + [NVME_ANA_NONOPTIMIZED] = "ANA Non-Optimized
> State",
> + [NVME_ANA_INACCESSIBLE] = "ANA Inaccessible
> State",
> + [NVME_ANA_PERSISTENT_LOSS]  = "ANA Persistent
> Loss State",
> + [NVME_ANA_CHANGE]   = "ANA Change state",
> + [NVME_ANA_RESERVED] = "Invalid namespace
> group state!",
> +};
> +
> +static const char *aas_print_string(int rc)
> +{
> + rc &= 0xff;
> +
> + switch(rc) {
> + case NVME_ANA_OPTIMIZED:
> + case NVME_ANA_NONOPTIMIZED:
> + case NVME_ANA_INACCESSIBLE:
> + case NVME_ANA_PERSISTENT_LOSS:
> + case NVME_ANA_CHANGE:
> + return anas_string[rc];
> + default:
> + return anas_string[NVME_ANA_RESERVED];
> + }
> +
> + return anas_string[NVME_ANA_RESERVED];
> +}

nit: the last statement is superfluous.

> +
> +static int nvme_get_nsid(int fd, unsigned *nsid)
> +{
> + static struct stat nvme_stat;
> + int err = fstat(fd, &nvme_stat);
> + if (err < 0)
> + return 1;
> +
> + if (!S_ISBLK(nvme_stat.st_mode)) {
> + condlog(0, "Error: requesting namespace-id from non-
> block device\n");
> + return 1;
> + }
> +
> + *nsid = ioctl(fd, NVME_IOCTL_ID);
> + retu

[dm-devel] unnecessary "memset(n, 0, block_size); " in dm_btree_empty ?

2018-11-12 Thread shenghui
Hi,

Sorry to trouble you.

I'm reading the code of persistent data, and noticed:

dm_btree_empty() -> new_block() -> dm_tm_new_block() to get a new block 
allocated.
And the new block is zeroed with write lock held.

"memset(n, 0, block_size);" in dm_btree_empty() seems unnecessary. Right?

Thanks,
shenghui

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel