Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta wrote:
> > >
> > > Don't support 'MAP_SYNC' with non-DAX files and DAX files
> > > with an asynchronous dax_device. Virtio pmem provides an
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > Darrick, does this look ok to take through the nvdimm tree?
>
> forgot about this, sorry. :/
>
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >  	struct file		*filp,
> > >  	struct vm_area_struct	*vma)
> > >  {
> > > +	struct dax_device	*dax_dev;
> > > +
> > > +	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >  	/*
> > > -	 * We don't support synchronous mappings for non-DAX files. At least
> > > -	 * until someone comes with a sensible use case.
> > > +	 * We don't support synchronous mappings for non-DAX files and
> > > +	 * for DAX files if underneath dax_device is not synchronous.
> > >  	 */
> > > -	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +	if (!daxdev_mapping_supported(vma, dax_dev))
> > >  		return -EOPNOTSUPP;
>
> LGTM, and I'm fine with it going through nvdimm. Nothing in
> xfs-5.2-merge touches that function so it should be clean.
>
> Reviewed-by: Darrick J. Wong

Thank you for the review.

Pankaj

> --D
>
> > >  	file_accessed(filp);
> > > --
> > > 2.20.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH ndctl] ndctl, region: return the rc value in region_action
Hi Elliott

Thanks for your review.

On 4/16/19 2:29 AM, Elliott, Robert (Servers) wrote:
> > -----Original Message-----
> > From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of Dan Williams
> > Sent: Monday, April 15, 2019 11:29 AM
> > Subject: Re: [PATCH ndctl] ndctl, region: return the rc value in region_action
> >
> > On Mon, Apr 15, 2019 at 1:24 AM Yi Zhang wrote:
> > > if region_action failed, return the rc valude instead of 0 valude
>
> s/b value

Sorry for the late response, I just found this patch was merged:
https://git.kernel.org/pub/scm/utils/kernel/ndctl/ndctl.git/commit/?id=9840942d5f6dc0d0e3693956f6733bbbc4e06e3a

> > > Signed-off-by: Yi Zhang
> > > ---
> > >  ndctl/region.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/ndctl/region.c b/ndctl/region.c
> > > index 993b3f8..7945007 100644
> > > --- a/ndctl/region.c
> > > +++ b/ndctl/region.c
> > > @@ -89,7 +89,7 @@ static int region_action(struct ndctl_region *region, enum device_action mode)
> > >  		break;
> > >  	}
> > >
> > > -	return 0;
> > > +	return rc;
> > >  }
> >
> > Looks good to me:
> >
> > Reviewed-by: Dan Williams
>
> Perhaps a similar change should be made to ndctl_namespace_disable_safe(),
> one of the functions called by region_action() whose rc is propagated, to
> propagate the rc from ndctl_namespace_disable_invalidate().

I'm not familiar with the ndctl code, could you help send a patch to fix
it if the change is needed.

Thanks
Yi
[ndctl PATCH] contrib: add an example qemu setup script for HMAT emulation
Add a script to demonstrate HMAT emulation using qemu by injecting the
initramfs with a made-up ACPI HMAT table.

Cc: Dan Williams
Originally-authored-by: Keith Busch
[vishal: minor shellcheck fixups]
Signed-off-by: Vishal Verma
---
 contrib/daxctl-qemu-hmat-setup | 211 +++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100755 contrib/daxctl-qemu-hmat-setup

diff --git a/contrib/daxctl-qemu-hmat-setup b/contrib/daxctl-qemu-hmat-setup
new file mode 100755
index 000..0670e99
--- /dev/null
+++ b/contrib/daxctl-qemu-hmat-setup
@@ -0,0 +1,211 @@
+#!/bin/bash -e
+
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2019 Intel Corporation. All rights reserved.
+
+KERNEL=${KERNEL:-$(uname -r)}
+ROOT_IMAGE=${ROOT_IMAGE:-root.img}
+
+INITIATORS=4
+TARGETS=128
+TARGET_SIZE_MB=128
+TARGET_SIZE="$((TARGET_SIZE_MB * 1024 * 1024))"
+QEMU_MEM="$((TARGET_SIZE_MB * TARGETS))"
+
+for i in $(seq 0 $((TARGETS - 1))); do
+	qemu-img create -f raw "dimm-$i" "${TARGET_SIZE}"
+done;
+
+# Address Range Structures
+cat <<EOF > hmat.dsl
+             Signature : "HMAT"    [Heterogeneous Memory Attributes Table]
+          Table Length : 0200
+              Revision : 01
+              Checksum : F4
+                Oem ID : "BOCHS "
+          Oem Table ID : "BXPCHMAT"
+          Oem Revision : 0001
+       Asl Compiler ID : "INTL"
+ Asl Compiler Revision : 20170929
+              Reserved :
+
+        Structure Type : [Memory Subystem Address Range]
+              Reserved :
+                Length : 0028
+ Flags (decoded below) : 0003
+ Processor Proximity Domain Valid : 1
+ Memory Proximity Domain Valid : 1
+      Reservation Hint : 0
+             Reserved1 :
+ Processor Proximity Domain :
+ Memory Proximity Domain :
+             Reserved2 :
+ Physical Address Range Base :
+ Physical Address Range Size : 000A
+
+        Structure Type : [Memory Subystem Address Range]
+              Reserved :
+                Length : 0028
+ Flags (decoded below) : 0003
+ Processor Proximity Domain Valid : 1
+ Memory Proximity Domain Valid : 1
+      Reservation Hint : 0
+             Reserved1 :
+ Processor Proximity Domain :
+ Memory Proximity Domain :
+             Reserved2 :
+ Physical Address Range Base : 0010
+ Physical Address Range Size : 07F0
+EOF
+
+for i in $(seq 1 $((TARGETS - 1))); do
+	BASE=$(printf "%016x" $((0x800 * i)))
+cat <<EOF >> hmat.dsl
+
+        Structure Type : [Memory Subystem Address Range]
+              Reserved :
+                Length : 0028
+ Flags (decoded below) : 0003
+ Processor Proximity Domain Valid : 1
+ Memory Proximity Domain Valid : 1
+      Reservation Hint : 0
+             Reserved1 :
+ Processor Proximity Domain : $(printf "%08x" $((i % INITIATORS)))
+ Memory Proximity Domain : $(printf "%08x" "${i}")
+             Reserved2 :
+ Physical Address Range Base : ${BASE}
+ Physical Address Range Size : 0800
+EOF
+done
+
+# System Locality Latency
+TABLE_SIZE="$(printf "%08x" $((40 + 4 * INITIATORS + 4 * TARGETS + 2 * INITIATORS * TARGETS)))"
+cat <<EOF >> hmat.dsl
+
+        Structure Type : 0001 [System Locality Latency and Bandwidth Information]
+              Reserved :
+                Length : ${TABLE_SIZE}
+ Flags (decoded below) : 00
+      Memory Hierarchy : 0
+             Data Type : 00
+             Reserved1 :
+ Initiator Proximity Domains # : $(printf "%08x" ${INITIATORS})
+ Target Proximity Domains # : $(printf "%08x" ${TARGETS})
+             Reserved2 :
+       Entry Base Unit : 0001
+EOF
+
+for i in $(seq 0 $((INITIATORS - 1))); do
+cat <<EOF >> hmat.dsl
+ Initiator Proximity Domain List : $(printf "%08x" "${i}")
+EOF
+done
+
+for i in $(seq 0 $((TARGETS - 1))); do
+cat <<EOF >> hmat.dsl
+ Target Proximity Domain List : $(printf "%08x" "${i}")
+EOF
+done
+
+LOCAL_START=0x10
+REMOTE_START=0x20
+for i in $(seq 0 $((INITIATORS - 1))); do
+	for j in $(seq 0 $((TARGETS - 1))); do
+		if [ "$((j % INITIATORS))" -eq "${i}" ]; then
+			cat <<EOF >> hmat.dsl
+                 Entry : $(printf "%04x" $((LOCAL_START + j * 10)))
+EOF
+		else
+			cat <<EOF >> hmat.dsl
+                 Entry : $(printf "%04x" $((REMOTE_START + j * 10)))
+EOF
+		fi
+	done
+done
+
+# System Locality Bandwidth
+TABLE_SIZE=$(printf "%08x" $((40 + 4 * INITIATORS + 4 * TARGETS + 2 *
[ndctl PATCH v2 06/10] libdaxctl: add an interface to get the mode for a dax device
In preparation for a reconfigure-device command, add an interface to
retrieve the 'mode' of a dax device. This will allow the
reconfigure-device command (and, via daxctl_dev_to_json(), also
daxctl-list) to print the mode on device listings, either via a list
command or immediately after a mode change.

Cc: Dave Hansen
Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 daxctl/lib/libdaxctl.c   | 43 ++++++++++++++++++++++++++++++++++++++++
 daxctl/lib/libdaxctl.sym |  1 +
 daxctl/libdaxctl.h       |  1 +
 util/json.c              | 14 ++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index aab2364..a4919e0 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -12,6 +12,8 @@
  */
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -840,6 +842,47 @@ DAXCTL_EXPORT int daxctl_dev_disable(struct daxctl_dev *dev)
 	return 0;
 }
 
+DAXCTL_EXPORT enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+	int rc = DAXCTL_DEV_MODE_UNKNOWN;
+	char path[200];
+	const int len = sizeof(path);
+	char *mod_path, *mod_base;
+
+	if (!daxctl_dev_is_enabled(dev))
+		return rc;
+
+	if (ctx->subsys != DAX_BUS) {
+		err(ctx, "%s: invalid operation for dax subsystem\n", devname);
+		err(ctx, "%s: see daxctl-migrate-device-model(1)\n", devname);
+		return -ENXIO;
+	}
+
+	if (snprintf(path, len, "%s/driver/module", dev->dev_path) >= len) {
+		err(ctx, "%s: buffer too small!\n", devname);
+		return -ENXIO;
+	}
+
+	mod_path = realpath(path, NULL);
+	if (!mod_path) {
+		rc = -errno;
+		err(ctx, "%s: unable to determine module: %s\n", devname,
+				strerror(errno));
+		return rc;
+	}
+
+	mod_base = basename(mod_path);
+	if (strcmp(mod_base, dax_modules[DAXCTL_DEV_MODE_RAM]) == 0)
+		rc = DAXCTL_DEV_MODE_RAM;
+	else if (strcmp(mod_base, dax_modules[DAXCTL_DEV_MODE_DEVDAX]) == 0)
+		rc = DAXCTL_DEV_MODE_DEVDAX;
+
+	free(mod_path);
+	return rc;
+}
+
 DAXCTL_EXPORT struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev)
 {
 	return dev->region->ctx;
diff --git a/daxctl/lib/libdaxctl.sym b/daxctl/lib/libdaxctl.sym
index cc47ed6..d53976d 100644
--- a/daxctl/lib/libdaxctl.sym
+++ b/daxctl/lib/libdaxctl.sym
@@ -62,4 +62,5 @@ global:
 	daxctl_dev_online_node;
 	daxctl_dev_offline_node;
 	daxctl_dev_node_is_online;
+	daxctl_dev_get_mode;
 } LIBDAXCTL_5;
diff --git a/daxctl/libdaxctl.h b/daxctl/libdaxctl.h
index db0d4ea..4f9088f 100644
--- a/daxctl/libdaxctl.h
+++ b/daxctl/libdaxctl.h
@@ -76,6 +76,7 @@ int daxctl_dev_enable_ram(struct daxctl_dev *dev);
 int daxctl_dev_online_node(struct daxctl_dev *dev);
 int daxctl_dev_offline_node(struct daxctl_dev *dev);
 int daxctl_dev_node_is_online(struct daxctl_dev *dev);
+enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev);
 
 #define daxctl_dev_foreach(region, dev) \
 	for (dev = daxctl_dev_get_first(region); \
diff --git a/util/json.c b/util/json.c
index b7ce719..4f13222 100644
--- a/util/json.c
+++ b/util/json.c
@@ -271,6 +271,7 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
 {
 	const char *devname = daxctl_dev_get_devname(dev);
 	struct json_object *jdev, *jobj;
+	enum daxctl_dev_mode mode;
 	int node;
 
 	jdev = json_object_new_object();
@@ -292,6 +293,19 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
 		json_object_object_add(jdev, "numa_node", jobj);
 	}
 
+	mode = daxctl_dev_get_mode(dev);
+	if (mode > 0) {
+		jobj = NULL;
+		if (mode == DAXCTL_DEV_MODE_RAM)
+			jobj = json_object_new_string("system-ram");
+		else if (mode == DAXCTL_DEV_MODE_DEVDAX)
+			jobj = json_object_new_string("devdax");
+		else
+			jobj = json_object_new_string("unknown");
+		if (jobj)
+			json_object_object_add(jdev, "mode", jobj);
+	}
+
 	return jdev;
 }
-- 
2.20.1
[ndctl PATCH v2 00/10] daxctl: add a new reconfigure-device command
Changes in v2:
- Add examples to the documentation page (Dave Hansen)
- Clarify documentation regarding the conversion from system-ram to devdax
- Remove any references to a persistent config from the documentation -
  those can be added when the feature is added.
- device.c: validate option compatibility
- daxctl-list: display numa_node for device listings
- daxctl-list: display mode for device listings
- make the options more consistent by adding a '-O' short option for
  --attempt-offline

Add a new daxctl-reconfigure-device command that lets us reconfigure
DAX devices back and forth between 'system-ram' and 'device-dax' modes.
It also includes facilities to online any newly hot-plugged memory
(default), and attempt to offline memory before converting away from
the system-ram mode (not default, requires a --attempt-offline option).

Currently missing from this series is a way to persistently store which
devices have been 'marked' for use as system-ram. This depends on a
config system overhaul in ndctl, and patches for those will follow
separately and are independent of this work.

Example invocations:

1. Reconfigure dax0.0 to system-ram mode, don't online the memory

# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
[
  {
    "chardev":"dax0.0",
    "size":16777216000,
    "numa_node":2,
    "mode":"system-ram"
  }
]

2. Reconfigure dax0.0 to devdax mode, attempt to offline the memory

# daxctl reconfigure-device --human --mode=devdax --attempt-offline dax0.0
{
  "chardev":"dax0.0",
  "size":"15.63 GiB (16.78 GB)",
  "numa_node":2,
  "mode":"devdax"
}

3. Reconfigure all dax devices on region0 to system-ram mode

# daxctl reconfigure-device --mode=system-ram --region=0 all
[
  {
    "chardev":"dax0.0",
    "size":16777216000,
    "numa_node":2,
    "mode":"system-ram"
  },
  {
    "chardev":"dax0.1",
    "size":16777216000,
    "numa_node":3,
    "mode":"system-ram"
  }
]

These patches can also be found in the 'kmem-pending' branch on github:
https://github.com/pmem/ndctl/tree/kmem-pending

Cc: Dan Williams
Cc: Dave Hansen
Cc: Pavel Tatashin

Vishal Verma (10):
  libdaxctl: add interfaces in support of device modes
  libdaxctl: cache 'subsystem' in daxctl_ctx
  libdaxctl: add interfaces to enable/disable devices
  libdaxctl: add interfaces to get/set the online state for a node
  daxctl/list: add numa_node for device listings
  libdaxctl: add an interface to get the mode for a dax device
  daxctl: add a new reconfigure-device command
  Documentation/daxctl: add a man page for daxctl-reconfigure-device
  contrib/ndctl: fix region-id completions for daxctl
  contrib/ndctl: add bash-completion for daxctl-reconfigure-device

 Documentation/daxctl/Makefile.am              |   3 +-
 .../daxctl/daxctl-reconfigure-device.txt      | 118 ++++
 contrib/ndctl                                 |  34 +-
 daxctl/Makefile.am                            |   2 +
 daxctl/builtin.h                              |   1 +
 daxctl/daxctl.c                               |   1 +
 daxctl/device.c                               | 237 ++++++++
 daxctl/lib/Makefile.am                        |   3 +-
 daxctl/lib/libdaxctl-private.h                |  21 +
 daxctl/lib/libdaxctl.c                        | 554 +++++++++++++++++-
 daxctl/lib/libdaxctl.sym                      |  14 +
 daxctl/libdaxctl.h                            |  16 +
 util/json.c                                   |  22 +
 13 files changed, 1015 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt
 create mode 100644 daxctl/device.c

-- 
2.20.1
[ndctl PATCH v2 01/10] libdaxctl: add interfaces in support of device modes
In preparation for libdaxctl and daxctl to grow operational modes for
DAX devices, add the following supporting APIs:

  daxctl_dev_get_ctx
  daxctl_dev_is_enabled

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 daxctl/lib/libdaxctl.c   | 30 ++++++++++++++++++++++++++++++
 daxctl/lib/libdaxctl.sym |  6 ++++++
 daxctl/libdaxctl.h       |  2 ++
 3 files changed, 38 insertions(+)

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index c2e3a52..70f896b 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -559,6 +559,36 @@ static void dax_regions_init(struct daxctl_ctx *ctx)
 	}
 }
 
+static int is_enabled(const char *drvpath)
+{
+	struct stat st;
+
+	if (lstat(drvpath, &st) < 0 || !S_ISLNK(st.st_mode))
+		return 0;
+	else
+		return 1;
+}
+
+DAXCTL_EXPORT int daxctl_dev_is_enabled(struct daxctl_dev *dev)
+{
+	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+	char *path = dev->dev_buf;
+	int len = dev->buf_len;
+
+	if (snprintf(path, len, "%s/driver", dev->dev_path) >= len) {
+		err(ctx, "%s: buffer too small!\n",
+				daxctl_dev_get_devname(dev));
+		return 0;
+	}
+
+	return is_enabled(path);
+}
+
+DAXCTL_EXPORT struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev)
+{
+	return dev->region->ctx;
+}
+
 DAXCTL_EXPORT struct daxctl_dev *daxctl_dev_get_first(struct daxctl_region *region)
 {
 	dax_devices_init(region);
diff --git a/daxctl/lib/libdaxctl.sym b/daxctl/lib/libdaxctl.sym
index 84d3a69..c4af9a7 100644
--- a/daxctl/lib/libdaxctl.sym
+++ b/daxctl/lib/libdaxctl.sym
@@ -50,3 +50,9 @@ LIBDAXCTL_5 {
 global:
 	daxctl_region_get_path;
 } LIBDAXCTL_4;
+
+LIBDAXCTL_6 {
+global:
+	daxctl_dev_get_ctx;
+	daxctl_dev_is_enabled;
+} LIBDAXCTL_5;
diff --git a/daxctl/libdaxctl.h b/daxctl/libdaxctl.h
index 1d13ea2..e20ccb4 100644
--- a/daxctl/libdaxctl.h
+++ b/daxctl/libdaxctl.h
@@ -67,6 +67,8 @@ const char *daxctl_dev_get_devname(struct daxctl_dev *dev);
 int daxctl_dev_get_major(struct daxctl_dev *dev);
 int daxctl_dev_get_minor(struct daxctl_dev *dev);
 unsigned long long daxctl_dev_get_size(struct daxctl_dev *dev);
+struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev);
+int daxctl_dev_is_enabled(struct daxctl_dev *dev);
 
 #define daxctl_dev_foreach(region, dev) \
 	for (dev = daxctl_dev_get_first(region); \
-- 
2.20.1
[ndctl PATCH v2 10/10] contrib/ndctl: add bash-completion for daxctl-reconfigure-device
Add bash completion helpers for the new daxctl-reconfigure-device
command.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 contrib/ndctl | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/contrib/ndctl b/contrib/ndctl
index d1f8bd6..32c4731 100755
--- a/contrib/ndctl
+++ b/contrib/ndctl
@@ -544,7 +544,7 @@ __daxctlcomp()
 
 	COMPREPLY=( $( compgen -W "$1" -- "$2" ) )
 	for cword in "${COMPREPLY[@]}"; do
-		if [[ "$cword" == @(--region|--dev) ]]; then
+		if [[ "$cword" == @(--region|--dev|--mode) ]]; then
 			COMPREPLY[$i]="${cword}="
 		else
 			COMPREPLY[$i]="${cword} "
@@ -569,6 +569,9 @@ __daxctl_comp_options()
 		--dev)
 			opts="$(__daxctl_get_devs -i)"
 			;;
+		--mode)
+			opts="system-ram devdax"
+			;;
 		*)
 			return
 			;;
@@ -579,8 +582,19 @@ __daxctl_comp_options()
 
 __daxctl_comp_non_option_args()
 {
-	# there aren't any commands that accept non option arguments yet
-	return
+	local subcmd=$1
+	local cur=$2
+	local opts
+
+	case $subcmd in
+	reconfigure-device)
+		opts="$(__daxctl_get_devs -i) all"
+		;;
+	*)
+		return
+		;;
+	esac
+	__daxctlcomp "$opts" "$cur"
 }
 
 __daxctl_main()
-- 
2.20.1
[ndctl PATCH v2 04/10] libdaxctl: add interfaces to get/set the online state for a node
In preparation for converting DAX devices for use as system-ram via the
kernel's 'kmem' facility, add libndctl helpers to manipulate the sysfs
interfaces to get the target_node of a DAX device, and to online,
offline, and query the state of hotplugged memory sections associated
with a given node.

This adds the following new interfaces:

  daxctl_dev_get_target_node
  daxctl_dev_online_node
  daxctl_dev_offline_node
  daxctl_dev_node_is_online

Cc: Pavel Tatashin
Cc: Dave Hansen
Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 daxctl/lib/libdaxctl-private.h |   6 +
 daxctl/lib/libdaxctl.c         | 228 +++++++++++++++++++++++++++++++++
 daxctl/lib/libdaxctl.sym       |   4 +
 daxctl/libdaxctl.h             |   4 +
 4 files changed, 242 insertions(+)

diff --git a/daxctl/lib/libdaxctl-private.h b/daxctl/lib/libdaxctl-private.h
index e64d0a7..ef443aa 100644
--- a/daxctl/lib/libdaxctl-private.h
+++ b/daxctl/lib/libdaxctl-private.h
@@ -33,6 +33,11 @@ static const char *dax_modules[] = {
 	[DAXCTL_DEV_MODE_RAM] = "kmem",
 };
 
+enum node_state {
+	NODE_OFFLINE,
+	NODE_ONLINE,
+};
+
 /**
  * struct daxctl_region - container for dax_devices
  */
@@ -63,6 +68,7 @@ struct daxctl_dev {
 	struct kmod_module *module;
 	struct kmod_list *kmod_list;
 	struct daxctl_region *region;
+	int target_node;
 };
 
 static inline int check_kmod(struct kmod_ctx *kmod_ctx)
diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index d50b321..aab2364 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -28,6 +28,7 @@
 #include "libdaxctl-private.h"
 
 static const char *attrs = "dax_region";
+static const char *node_base = "/sys/devices/system/node";
 
 static void free_region(struct daxctl_region *region, struct list_head *head);
 
@@ -397,6 +398,12 @@ static void *add_dax_dev(void *parent, int id, const char *daxdev_base)
 	} else
 		dbg(ctx, "%s: modalias attribute missing\n", devname);
 
+	sprintf(path, "%s/target_node", daxdev_base);
+	if (sysfs_read_attr(ctx, path, buf) == 0)
+		dev->target_node = strtol(buf, NULL, 0);
+	else
+		dev->target_node = -1;
+
 	daxctl_dev_foreach(region, dev_dup)
 		if (dev_dup->id == dev->id) {
 			free_dev(dev, NULL);
@@ -897,3 +904,224 @@ DAXCTL_EXPORT unsigned long long daxctl_dev_get_size(struct daxctl_dev *dev)
 {
 	return dev->size;
 }
+
+DAXCTL_EXPORT int daxctl_dev_get_target_node(struct daxctl_dev *dev)
+{
+	return dev->target_node;
+}
+
+static int online_one_memblock(struct daxctl_dev *dev, char *path)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+	const char *mode = "online_movable";
+	char buf[SYSFS_ATTR_SIZE];
+	int rc;
+
+	rc = sysfs_read_attr(ctx, path, buf);
+	if (rc) {
+		err(ctx, "%s: Failed to read %s: %s\n",
+			devname, path, strerror(-rc));
+		return rc;
+	}
+
+	/*
+	 * if already online, possibly due to kernel config or a udev rule,
+	 * there is nothing to do and we can skip over the memblock
+	 */
+	if (strncmp(buf, "online", 6) == 0)
+		return 0;
+
+	rc = sysfs_write_attr_quiet(ctx, path, mode);
+	if (rc)
+		err(ctx, "%s: Failed to online %s: %s\n",
+			devname, path, strerror(-rc));
+	return rc;
+}
+
+static int offline_one_memblock(struct daxctl_dev *dev, char *path)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+	const char *mode = "offline";
+	char buf[SYSFS_ATTR_SIZE];
+	int rc;
+
+	rc = sysfs_read_attr(ctx, path, buf);
+	if (rc) {
+		err(ctx, "%s: Failed to read %s: %s\n",
+			devname, path, strerror(-rc));
+		return rc;
+	}
+
+	/* if already offline, there is nothing to do */
+	if (strncmp(buf, "offline", 6) == 0)
+		return 0;
+
+	rc = sysfs_write_attr_quiet(ctx, path, mode);
+	if (rc)
+		err(ctx, "%s: Failed to offline %s: %s\n",
+			devname, path, strerror(-rc));
+	return rc;
+}
+
+static int daxctl_dev_node_set_state(struct daxctl_dev *dev,
+		enum node_state state)
+{
+	struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+	int target_node, rc, changed = 0;
+	struct dirent *de;
+	char *node_path;
+	DIR *node_dir;
+
+	target_node = daxctl_dev_get_target_node(dev);
+	if (target_node < 0) {
+		err(ctx, "%s: Unable to get target node\n",
+			daxctl_dev_get_devname(dev));
+		return -ENXIO;
+	}
+
+	rc = asprintf(&node_path, "%s/node%d", node_base, target_node);
+	if (rc < 0)
+		return
[ndctl PATCH v2 02/10] libdaxctl: cache 'subsystem' in daxctl_ctx
The 'DAX subsystem' in effect is determined at region or device init
time, and dictates the sysfs base paths for all device/region
operations. In preparation for adding bind/unbind functionality, cache
the subsystem as determined at init time in the library context.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 daxctl/lib/libdaxctl.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index 70f896b..f8f5b8c 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -46,6 +46,7 @@ struct daxctl_ctx {
 	void *userdata;
 	int regions_init;
 	struct list_head regions;
+	enum dax_subsystem subsys;
 };
 
 /**
@@ -96,6 +97,7 @@ DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx)
 	dbg(c, "log_priority=%d\n", c->ctx.log_priority);
 	*ctx = c;
 	list_head_init(&c->regions);
+	c->subsys = DAX_UNKNOWN;
 
 	return 0;
 }
@@ -454,14 +456,18 @@ static void dax_devices_init(struct daxctl_region *region)
 	for (i = 0; i < ARRAY_SIZE(dax_subsystems); i++) {
 		char *region_path;
 
-		if (i == DAX_BUS)
+		if (i == DAX_BUS) {
 			region_path = region->region_path;
-		else if (i == DAX_CLASS) {
+			if (ctx->subsys == DAX_UNKNOWN)
+				ctx->subsys = DAX_BUS;
+		} else if (i == DAX_CLASS) {
 			if (asprintf(&region_path, "%s/dax",
 					region->region_path) < 0) {
 				dbg(ctx, "region path alloc fail\n");
 				continue;
 			}
+			if (ctx->subsys == DAX_UNKNOWN)
+				ctx->subsys = DAX_CLASS;
 		} else
 			continue;
 		sysfs_device_parse(ctx, region_path, daxdev_fmt, region,
@@ -539,6 +545,8 @@ static void __dax_regions_init(struct daxctl_ctx *ctx, enum dax_subsystem subsys
 		free(dev_path);
 		if (!region)
 			err(ctx, "add_dax_region() for %s failed\n", de->d_name);
+		if (ctx->subsys == DAX_UNKNOWN)
+			ctx->subsys = subsys;
 	}
 	closedir(dir);
 }
-- 
2.20.1
[ndctl PATCH v2 07/10] daxctl: add a new reconfigure-device command
Add a new command 'daxctl-reconfigure-device'. This is used to switch
the mode of a dax device between regular 'device_dax' and
'system-memory'. The command also uses the memory hotplug sysfs
interfaces to online the newly available memory when converting to
'system-ram', and to attempt to offline the memory when converting back
to a DAX device.

Cc: Pavel Tatashin
Cc: Dave Hansen
Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 daxctl/Makefile.am |   2 +
 daxctl/builtin.h   |   1 +
 daxctl/daxctl.c    |   1 +
 daxctl/device.c    | 237 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 241 insertions(+)
 create mode 100644 daxctl/device.c

diff --git a/daxctl/Makefile.am b/daxctl/Makefile.am
index 94f73f9..66dcc7f 100644
--- a/daxctl/Makefile.am
+++ b/daxctl/Makefile.am
@@ -15,10 +15,12 @@ daxctl_SOURCES =\
 	daxctl.c \
 	list.c \
 	migrate.c \
+	device.c \
 	../util/json.c
 
 daxctl_LDADD =\
 	lib/libdaxctl.la \
 	../libutil.a \
 	$(UUID_LIBS) \
+	$(KMOD_LIBS) \
 	$(JSON_LIBS)
diff --git a/daxctl/builtin.h b/daxctl/builtin.h
index 00ef5e9..756ba2a 100644
--- a/daxctl/builtin.h
+++ b/daxctl/builtin.h
@@ -6,4 +6,5 @@ struct daxctl_ctx;
 int cmd_list(int argc, const char **argv, struct daxctl_ctx *ctx);
 int cmd_migrate(int argc, const char **argv, struct daxctl_ctx *ctx);
+int cmd_reconfig_device(int argc, const char **argv, struct daxctl_ctx *ctx);
 #endif /* _DAXCTL_BUILTIN_H_ */
diff --git a/daxctl/daxctl.c b/daxctl/daxctl.c
index 2e41747..e1ba7b8 100644
--- a/daxctl/daxctl.c
+++ b/daxctl/daxctl.c
@@ -71,6 +71,7 @@ static struct cmd_struct commands[] = {
 	{ "list", .d_fn = cmd_list },
 	{ "help", .d_fn = cmd_help },
 	{ "migrate-device-model", .d_fn = cmd_migrate },
+	{ "reconfigure-device", .d_fn = cmd_reconfig_device },
 };
 
 int main(int argc, const char **argv)
diff --git a/daxctl/device.c b/daxctl/device.c
new file mode 100644
index 000..12644c5
--- /dev/null
+++ b/daxctl/device.c
@@ -0,0 +1,237 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2019 Intel Corporation. All rights reserved. */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static struct {
+	const char *dev;
+	const char *mode;
+	int region_id;
+	bool no_online;
+	bool do_offline;
+	bool human;
+	bool verbose;
+} param = {
+	.region_id = -1,
+};
+
+static int dev_disable(struct daxctl_dev *dev)
+{
+	int rc;
+
+	if (!daxctl_dev_is_enabled(dev))
+		return 0;
+
+	rc = daxctl_dev_disable(dev);
+	if (rc)
+		fprintf(stderr, "%s: disable failed: %s\n",
+			daxctl_dev_get_devname(dev), strerror(-rc));
+
+	return rc;
+}
+
+static int reconfig_mode_ram(struct daxctl_dev *dev)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	int rc;
+
+	rc = dev_disable(dev);
+	if (rc)
+		return rc;
+	rc = daxctl_dev_enable_ram(dev);
+	if (rc)
+		return rc;
+
+	if (param.no_online)
+		return 0;
+
+	rc = daxctl_dev_online_node(dev);
+	if (rc < 0) {
+		fprintf(stderr, "%s: unable to online memory: %s\n",
+			devname, strerror(-rc));
+		return rc;
+	}
+	if (param.verbose)
+		fprintf(stderr, "%s: onlined %d memory sections\n",
+			devname, rc);
+
+	return 0;
+}
+
+static int reconfig_mode_devdax(struct daxctl_dev *dev)
+{
+	const char *devname = daxctl_dev_get_devname(dev);
+	int rc;
+
+	if (param.do_offline) {
+		rc = daxctl_dev_offline_node(dev);
+		if (rc < 0) {
+			fprintf(stderr, "%s: unable to offline memory: %s\n",
+				devname, strerror(-rc));
+			return rc;
+		}
+		if (param.verbose)
+			fprintf(stderr, "%s: offlined %d memory sections\n",
+				devname, rc);
+	}
+
+	rc = daxctl_dev_node_is_online(dev);
+	if (rc < 0) {
+		fprintf(stderr, "%s: unable to determine node state: %s\n",
+			devname, strerror(-rc));
+		return rc;
+	}
+	if (rc > 0) {
+		if (param.verbose) {
+			fprintf(stderr, "%s: found %d memory sections online\n",
+				devname, rc);
+			fprintf(stderr, "%s: refusing to change modes\n",
+				devname);
+		}
+		return -EBUSY;
+	}
+
+	rc = dev_disable(dev);
+	if (rc)
+		return rc;
+
+	rc = daxctl_dev_enable_devdax(dev);
+	if (rc)
+		return rc;
+
+	return 0;
[ndctl PATCH v2 08/10] Documentation/daxctl: add a man page for daxctl-reconfigure-device
Add a man page describing the new daxctl-reconfigure-device command.

Cc: Pavel Tatashin
Cc: Dave Hansen
Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 Documentation/daxctl/Makefile.am                   |   3 +-
 .../daxctl/daxctl-reconfigure-device.txt           | 118 +++++++++++++++++
 2 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt

diff --git a/Documentation/daxctl/Makefile.am b/Documentation/daxctl/Makefile.am
index 6aba035..715fbad 100644
--- a/Documentation/daxctl/Makefile.am
+++ b/Documentation/daxctl/Makefile.am
@@ -28,7 +28,8 @@ endif
 man1_MANS = \
 	daxctl.1 \
 	daxctl-list.1 \
-	daxctl-migrate-device-model.1
+	daxctl-migrate-device-model.1 \
+	daxctl-reconfigure-device.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/daxctl/daxctl-reconfigure-device.txt b/Documentation/daxctl/daxctl-reconfigure-device.txt
new file mode 100644
index 000..b575091
--- /dev/null
+++ b/Documentation/daxctl/daxctl-reconfigure-device.txt
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+
+daxctl-reconfigure-device(1)
+============================
+
+NAME
+----
+daxctl-reconfigure-device - Reconfigure a dax device into a different mode
+
+SYNOPSIS
+--------
+[verse]
+'daxctl reconfigure-device' [...] []
+
+EXAMPLES
+--------
+
+* Reconfigure dax0.0 to system-ram mode, don't online the memory
+
+# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
+[
+  {
+    "chardev":"dax0.0",
+    "size":16777216000,
+    "numa_node":2,
+    "mode":"system-ram"
+  }
+]
+
+* Reconfigure dax0.0 to devdax mode, attempt to offline the memory
+
+# daxctl reconfigure-device --human --mode=devdax --attempt-offline dax0.0
+{
+  "chardev":"dax0.0",
+  "size":"15.63 GiB (16.78 GB)",
+  "numa_node":2,
+  "mode":"devdax"
+}
+
+* Reconfigure all dax devices on region0 to system-ram mode
+
+# daxctl reconfigure-device --mode=system-ram --region=0 all
+[
+  {
+    "chardev":"dax0.0",
+    "size":16777216000,
+    "numa_node":2,
+    "mode":"system-ram"
+  },
+  {
+    "chardev":"dax0.1",
+    "size":16777216000,
+    "numa_node":3,
+    "mode":"system-ram"
+  }
+]
+
+DESCRIPTION
+-----------
+
+Reconfigure the operational mode of a dax device. This can be used to convert
+a regular 'devdax' mode device to the 'system-ram' mode which allows for the dax
+range to be hot-plugged into the system as regular memory.
+
+NOTE: This is a destructive operation. Any data on the dax device *will* be
+lost.
+
+OPTIONS
+-------
+-r::
+--region=::
+	Restrict the reconfigure operation to devices belonging to the
+	specified region(s). A device-dax region is a contiguous range of
+	memory that hosts one or more /dev/daxX.Y devices, where X is the
+	region id and Y is the device instance id.
+
+-m::
+--mode=::
+	Specify the mode to which the dax device(s) should be reconfigured.
+	- "system-ram": hotplug the device into system memory.
+
+	- "devdax": switch to the normal "device dax" mode. This may not work
+	  on kernels prior to v5.2. In such a case, a reboot is the only way to
+	  switch back to 'devdax' mode.
+
+-N::
+--no-online::
+	By default, memory sections provided by system-ram devices will be
+	brought online automatically and immediately. Use this option to
+	disable the automatic onlining behavior.
+
+-O::
+--attempt-offline::
+	When converting from "system-ram" mode to "devdax", it is expected
+	that all the memory sections are first made offline. By default,
+	daxctl won't touch online memory. However with this option, attempt
+	to offline the memory on the NUMA node associated with the dax device
+	before converting it back to "devdax" mode.
+
+-u::
+--human::
+	By default the command will output machine-friendly raw-integer
+	data. Instead, with this flag, numbers representing storage size
+	will be formatted as human readable strings with units, other
+	fields are converted to hexadecimal strings.
+
+-v::
+--verbose::
+	Emit more debug messages for the reconfiguration process
+
+include::../copyright.txt[]
+
+SEE ALSO
+--------
+linkdaxctl:daxctl-list[1]
-- 
2.20.1
[ndctl PATCH v2 09/10] contrib/ndctl: fix region-id completions for daxctl
The completion helpers for daxctl assumed the region arguments for specifying daxctl regions were the same as ndctl regions, i.e. "regionX". This is not true - daxctl region arguments are a simple numeric 'id'. Add a new helper __daxctl_get_regions() to complete daxctl region IDs properly. While at it, fix a useless use of 'echo' in __daxctl_get_devs() and quoting in __daxctl_comp_options(). Fixes: d6790a32f32c ("daxctl: Add bash-completion") Signed-off-by: Vishal Verma --- contrib/ndctl | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/contrib/ndctl b/contrib/ndctl index e17fb0b..d1f8bd6 100755 --- a/contrib/ndctl +++ b/contrib/ndctl @@ -528,8 +528,14 @@ _ndctl() __daxctl_get_devs() { - local opts="--devices $*" - echo "$(daxctl list $opts | grep -E "^\s*\"chardev\":" | cut -d\" -f4)" + local opts=("--devices" "$*") + daxctl list "${opts[@]}" | grep -E "^\s*\"chardev\":" | cut -d'"' -f4 +} + +__daxctl_get_regions() +{ + local opts=("--regions" "$*") + daxctl list "${opts[@]}" | grep -E "^\s*\"id\":" | grep -Eo "[0-9]+" } __daxctlcomp() @@ -558,10 +564,10 @@ __daxctl_comp_options() local cur_arg=${cur##*=} case $cur_subopt in --region) - opts=$(__ndctl_get_regions -i) + opts="$(__daxctl_get_regions -i)" ;; --dev) - opts=$(__daxctl_get_devs -i) + opts="$(__daxctl_get_devs -i)" ;; *) return -- 2.20.1
[ndctl PATCH v2 03/10] libdaxctl: add interfaces to enable/disable devices
Add new libdaxctl interfaces to disable a device_dax based devices, and to enable it into a given mode. The modes available are 'device_dax', and 'system-ram', where device_dax is the normal device DAX mode used via a character device, and 'system-ram' uses the kernel's 'kmem' facility to hotplug the device into the system's memory space, and can be used as normal system memory. This adds the following new interfaces: daxctl_dev_disable; daxctl_dev_enable_devdax; daxctl_dev_enable_ram; Cc: Dave Hansen Cc: Dan Williams Signed-off-by: Vishal Verma --- daxctl/lib/Makefile.am | 3 +- daxctl/lib/libdaxctl-private.h | 15 ++ daxctl/lib/libdaxctl.c | 241 + daxctl/lib/libdaxctl.sym | 3 + daxctl/libdaxctl.h | 9 ++ 5 files changed, 270 insertions(+), 1 deletion(-) diff --git a/daxctl/lib/Makefile.am b/daxctl/lib/Makefile.am index d3d4852..9f0e444 100644 --- a/daxctl/lib/Makefile.am +++ b/daxctl/lib/Makefile.am @@ -16,7 +16,8 @@ libdaxctl_la_SOURCES =\ libdaxctl.c libdaxctl_la_LIBADD =\ - $(UUID_LIBS) + $(UUID_LIBS) \ + $(KMOD_LIBS) daxctl_modprobe_data_DATA = daxctl.conf diff --git a/daxctl/lib/libdaxctl-private.h b/daxctl/lib/libdaxctl-private.h index 4a462e7..e64d0a7 100644 --- a/daxctl/lib/libdaxctl-private.h +++ b/daxctl/lib/libdaxctl-private.h @@ -13,6 +13,8 @@ #ifndef _LIBDAXCTL_PRIVATE_H_ #define _LIBDAXCTL_PRIVATE_H_ +#include + #define DAXCTL_EXPORT __attribute__ ((visibility("default"))) enum dax_subsystem { @@ -26,6 +28,11 @@ static const char *dax_subsystems[] = { [DAX_BUS] = "/sys/bus/dax/devices", }; +static const char *dax_modules[] = { + [DAXCTL_DEV_MODE_DEVDAX] = "device_dax", + [DAXCTL_DEV_MODE_RAM] = "kmem", +}; + /** * struct daxctl_region - container for dax_devices */ @@ -53,6 +60,14 @@ struct daxctl_dev { char *dev_path; struct list_node list; unsigned long long size; + struct kmod_module *module; + struct kmod_list *kmod_list; struct daxctl_region *region; }; + +static inline int check_kmod(struct kmod_ctx *kmod_ctx) +{ + return kmod_ctx ? 
0 : -ENXIO; +} + #endif /* _LIBDAXCTL_PRIVATE_H_ */ diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c index f8f5b8c..d50b321 100644 --- a/daxctl/lib/libdaxctl.c +++ b/daxctl/lib/libdaxctl.c @@ -47,6 +47,7 @@ struct daxctl_ctx { int regions_init; struct list_head regions; enum dax_subsystem subsys; + struct kmod_ctx *kmod_ctx; }; /** @@ -85,12 +86,20 @@ DAXCTL_EXPORT void daxctl_set_userdata(struct daxctl_ctx *ctx, void *userdata) */ DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx) { + struct kmod_ctx *kmod_ctx; struct daxctl_ctx *c; + int rc = 0; c = calloc(1, sizeof(struct daxctl_ctx)); if (!c) return -ENOMEM; + kmod_ctx = kmod_new(NULL, NULL); + if (check_kmod(kmod_ctx) != 0) { + rc = -ENXIO; + goto out; + } + c->refcount = 1; log_init(>ctx, "libdaxctl", "DAXCTL_LOG"); info(c, "ctx %p created\n", c); @@ -98,8 +107,12 @@ DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx) *ctx = c; list_head_init(>regions); c->subsys = DAX_UNKNOWN; + c->kmod_ctx = kmod_ctx; return 0; +out: + free(c); + return rc; } /** @@ -134,6 +147,7 @@ DAXCTL_EXPORT void daxctl_unref(struct daxctl_ctx *ctx) list_for_each_safe(>regions, region, _r, list) free_region(region, >regions); + kmod_unref(ctx->kmod_ctx); info(ctx, "context %p released\n", ctx); free(ctx); } @@ -191,6 +205,7 @@ static void free_dev(struct daxctl_dev *dev, struct list_head *head) { if (head) list_del_from(head, >list); + kmod_module_unref_list(dev->kmod_list); free(dev->dev_buf); free(dev->dev_path); free(dev); @@ -308,6 +323,27 @@ DAXCTL_EXPORT struct daxctl_region *daxctl_new_region(struct daxctl_ctx *ctx, return region; } +static struct kmod_list *to_module_list(struct daxctl_ctx *ctx, + const char *alias) +{ + struct kmod_list *list = NULL; + int rc; + + if (!ctx->kmod_ctx || !alias) + return NULL; + if (alias[0] == 0) + return NULL; + + rc = kmod_module_new_from_lookup(ctx->kmod_ctx, alias, ); + if (rc < 0 || !list) { + dbg(ctx, "failed to find modules for alias: %s %d list: %s\n", + alias, rc, list 
? "populated" : "empty"); + return NULL; + } + + return list; +} + static void *add_dax_dev(void *parent, int id, const char *daxdev_base) { const char *devname = devpath_to_devname(daxdev_base); @@ -317,6 +353,7 @@ static void *add_dax_dev(void *parent, int id, const char *daxdev_base)
[ndctl PATCH v2 05/10] daxctl/list: add numa_node for device listings
The kernel provides a 'target_node' attribute for dax devices. When converting a dax device to the system-ram mode, the memory is hotplugged into this numa node. It would be helpful to print this in device listings so that it is easy for applications to detect the numa node to which the new memory belongs. Cc: Dan Williams Cc: Dave Hansen Signed-off-by: Vishal Verma --- util/json.c | 8 1 file changed, 8 insertions(+) diff --git a/util/json.c b/util/json.c index babdc8c..b7ce719 100644 --- a/util/json.c +++ b/util/json.c @@ -271,6 +271,7 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev, { const char *devname = daxctl_dev_get_devname(dev); struct json_object *jdev, *jobj; + int node; jdev = json_object_new_object(); if (!devname || !jdev) @@ -284,6 +285,13 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev, if (jobj) json_object_object_add(jdev, "size", jobj); + node = daxctl_dev_get_target_node(dev); + if (node >= 0) { + jobj = json_object_new_int(node); + if (jobj) + json_object_object_add(jdev, "numa_node", jobj); + } + return jdev; } -- 2.20.1
[PATCH v2 6/6] mm/devm_memremap_pages: Fix final page put race
Logan noticed that devm_memremap_pages_release() kills the percpu_ref drops all the page references that were acquired at init and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens *after* devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma. Reported-by: Logan Gunthorpe Fixes: 41e94a851304 ("add devm_memremap_pages") Cc: Bjorn Helgaas Cc: "Jérôme Glisse" Cc: Christoph Hellwig Reviewed-by: Ira Weiny Signed-off-by: Dan Williams --- drivers/dax/device.c | 13 +++-- drivers/nvdimm/pmem.c | 17 + drivers/pci/p2pdma.c | 17 +++-- include/linux/memremap.h |2 ++ kernel/memremap.c | 17 - mm/hmm.c | 14 +++--- tools/testing/nvdimm/test/iomap.c |2 ++ 7 files changed, 38 insertions(+), 44 deletions(-) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index e428468ab661..e3aa78dd1bb0 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -27,9 +27,8 @@ static void dev_dax_percpu_release(struct percpu_ref *ref) complete(_dax->cmp); } -static void dev_dax_percpu_exit(void *data) +static void dev_dax_percpu_exit(struct percpu_ref *ref) { - struct percpu_ref *ref = data; struct dev_dax *dev_dax = ref_to_dev_dax(ref); dev_dbg(_dax->dev, "%s\n", __func__); @@ -468,18 +467,12 @@ int dev_dax_probe(struct device *dev) if (rc) return rc; - rc = devm_add_action_or_reset(dev, dev_dax_percpu_exit, _dax->ref); - if (rc) - return rc; - dev_dax->pgmap.ref = _dax->ref; dev_dax->pgmap.kill = dev_dax_percpu_kill; + 
dev_dax->pgmap.cleanup = dev_dax_percpu_exit; addr = devm_memremap_pages(dev, _dax->pgmap); - if (IS_ERR(addr)) { - devm_remove_action(dev, dev_dax_percpu_exit, _dax->ref); - percpu_ref_exit(_dax->ref); + if (IS_ERR(addr)) return PTR_ERR(addr); - } inode = dax_inode(dax_dev); cdev = inode->i_cdev; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 0279eb1da3ef..1c9181712fa4 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -304,11 +304,19 @@ static const struct attribute_group *pmem_attribute_groups[] = { NULL, }; -static void pmem_release_queue(void *q) +static void __pmem_release_queue(struct percpu_ref *ref) { + struct request_queue *q; + + q = container_of(ref, typeof(*q), q_usage_counter); blk_cleanup_queue(q); } +static void pmem_release_queue(void *ref) +{ + __pmem_release_queue(ref); +} + static void pmem_freeze_queue(struct percpu_ref *ref) { struct request_queue *q; @@ -400,12 +408,10 @@ static int pmem_attach_disk(struct device *dev, if (!q) return -ENOMEM; - if (devm_add_action_or_reset(dev, pmem_release_queue, q)) - return -ENOMEM; - pmem->pfn_flags = PFN_DEV; pmem->pgmap.ref = >q_usage_counter; pmem->pgmap.kill = pmem_freeze_queue; + pmem->pgmap.cleanup = __pmem_release_queue; if (is_nd_pfn(dev)) { if (setup_pagemap_fsdax(dev, >pgmap)) return -ENOMEM; @@ -426,6 +432,9 @@ static int pmem_attach_disk(struct device *dev, pmem->pfn_flags |= PFN_MAP; memcpy(_res, >pgmap.res, sizeof(bb_res)); } else { + if (devm_add_action_or_reset(dev, pmem_release_queue, + >q_usage_counter)) + return -ENOMEM; addr = devm_memremap(dev, pmem->phys_addr, pmem->size, ARCH_MEMREMAP_PMEM); memcpy(_res, >res, sizeof(bb_res)); diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 54d475569058..a7a66b958720 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -95,7 +95,7 @@ static void pci_p2pdma_percpu_kill(struct percpu_ref *ref) percpu_ref_kill(ref); } -static void pci_p2pdma_percpu_cleanup(void *ref) +static void 
pci_p2pdma_percpu_cleanup(struct percpu_ref *ref) { struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref); @@ -198,16 +198,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, if (error) goto pgmap_free; - /* -
[PATCH v2 5/6] PCI/P2PDMA: Track pgmap references per resource, not globally
In preparation for fixing a race between devm_memremap_pages_release() and the final put of a page from the device-page-map, allocate a percpu-ref per p2pdma resource mapping. Cc: Logan Gunthorpe Cc: Bjorn Helgaas Cc: Christoph Hellwig Reviewed-by: Ira Weiny Signed-off-by: Dan Williams --- drivers/pci/p2pdma.c | 124 +- 1 file changed, 81 insertions(+), 43 deletions(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 595a534bd749..54d475569058 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -20,12 +20,16 @@ #include struct pci_p2pdma { - struct percpu_ref devmap_ref; - struct completion devmap_ref_done; struct gen_pool *pool; bool p2pmem_published; }; +struct p2pdma_pagemap { + struct dev_pagemap pgmap; + struct percpu_ref ref; + struct completion ref_done; +}; + static ssize_t size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -74,41 +78,45 @@ static const struct attribute_group p2pmem_group = { .name = "p2pmem", }; +static struct p2pdma_pagemap *to_p2p_pgmap(struct percpu_ref *ref) +{ + return container_of(ref, struct p2pdma_pagemap, ref); +} + static void pci_p2pdma_percpu_release(struct percpu_ref *ref) { - struct pci_p2pdma *p2p = - container_of(ref, struct pci_p2pdma, devmap_ref); + struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref); - complete_all(>devmap_ref_done); + complete(_pgmap->ref_done); } static void pci_p2pdma_percpu_kill(struct percpu_ref *ref) { - /* -* pci_p2pdma_add_resource() may be called multiple times -* by a driver and may register the percpu_kill devm action multiple -* times. We only want the first action to actually kill the -* percpu_ref. 
-*/ - if (percpu_ref_is_dying(ref)) - return; - percpu_ref_kill(ref); } +static void pci_p2pdma_percpu_cleanup(void *ref) +{ + struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref); + + wait_for_completion(_pgmap->ref_done); + percpu_ref_exit(_pgmap->ref); +} + static void pci_p2pdma_release(void *data) { struct pci_dev *pdev = data; + struct pci_p2pdma *p2pdma = pdev->p2pdma; - if (!pdev->p2pdma) + if (!p2pdma) return; - wait_for_completion(>p2pdma->devmap_ref_done); - percpu_ref_exit(>p2pdma->devmap_ref); + /* Flush and disable pci_alloc_p2p_mem() */ + pdev->p2pdma = NULL; + synchronize_rcu(); - gen_pool_destroy(pdev->p2pdma->pool); + gen_pool_destroy(p2pdma->pool); sysfs_remove_group(>dev.kobj, _group); - pdev->p2pdma = NULL; } static int pci_p2pdma_setup(struct pci_dev *pdev) @@ -124,12 +132,6 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) if (!p2p->pool) goto out; - init_completion(>devmap_ref_done); - error = percpu_ref_init(>devmap_ref, - pci_p2pdma_percpu_release, 0, GFP_KERNEL); - if (error) - goto out_pool_destroy; - error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev); if (error) goto out_pool_destroy; @@ -163,6 +165,7 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset) { + struct p2pdma_pagemap *p2p_pgmap; struct dev_pagemap *pgmap; void *addr; int error; @@ -185,14 +188,32 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, return error; } - pgmap = devm_kzalloc(>dev, sizeof(*pgmap), GFP_KERNEL); - if (!pgmap) + p2p_pgmap = devm_kzalloc(>dev, sizeof(*p2p_pgmap), GFP_KERNEL); + if (!p2p_pgmap) return -ENOMEM; + init_completion(_pgmap->ref_done); + error = percpu_ref_init(_pgmap->ref, + pci_p2pdma_percpu_release, 0, GFP_KERNEL); + if (error) + goto pgmap_free; + + /* +* FIXME: the percpu_ref_exit needs to be coordinated internal +* to devm_memremap_pages_release(). 
Duplicate the same ordering +* as other devm_memremap_pages() users for now. +*/ + error = devm_add_action(>dev, pci_p2pdma_percpu_cleanup, + _pgmap->ref); + if (error) + goto ref_cleanup; + + pgmap = _pgmap->pgmap; + pgmap->res.start = pci_resource_start(pdev, bar) + offset; pgmap->res.end = pgmap->res.start + size - 1; pgmap->res.flags = pci_resource_flags(pdev, bar); - pgmap->ref = >p2pdma->devmap_ref; + pgmap->ref = _pgmap->ref; pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -
[PATCH v2 3/6] PCI/P2PDMA: Fix the gen_pool_add_virt() failure path
The pci_p2pdma_add_resource() implementation immediately frees the pgmap if gen_pool_add_virt() fails. However, that means that when @dev triggers a devres release devm_memremap_pages_release() will crash trying to access the freed @pgmap. Use the new devm_memunmap_pages() to manually free the mapping in the error path. Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory") Cc: Logan Gunthorpe Cc: Christoph Hellwig Reviewed-by: Ira Weiny Acked-by: Bjorn Helgaas Signed-off-by: Dan Williams --- drivers/pci/p2pdma.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index c52298d76e64..595a534bd749 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -208,13 +208,15 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, pci_bus_address(pdev, bar) + offset, resource_size(&pgmap->res), dev_to_node(&pdev->dev)); if (error) - goto pgmap_free; + goto pages_free; pci_info(pdev, "added peer-to-peer DMA memory %pR\n", &pgmap->res); return 0; +pages_free: + devm_memunmap_pages(&pdev->dev, pgmap); pgmap_free: devm_kfree(&pdev->dev, pgmap); return error;
[PATCH v2 1/6] drivers/base/devres: Introduce devm_release_action()
The devm_add_action() facility allows a resource allocation routine to add custom devm semantics. One such user is devm_memremap_pages(). There is now a need to manually trigger devm_memremap_pages_release(). Introduce devm_release_action() so the release action can be triggered via a new devm_memunmap_pages() api in a follow-on change. Cc: Logan Gunthorpe Cc: Bjorn Helgaas Cc: Christoph Hellwig Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Reviewed-by: Ira Weiny Signed-off-by: Dan Williams --- drivers/base/devres.c | 24 +++- include/linux/device.h |1 + 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/base/devres.c b/drivers/base/devres.c index e038e2b3b7ea..0bbb328bd17f 100644 --- a/drivers/base/devres.c +++ b/drivers/base/devres.c @@ -755,10 +755,32 @@ void devm_remove_action(struct device *dev, void (*action)(void *), void *data) WARN_ON(devres_destroy(dev, devm_action_release, devm_action_match, &devres)); - } EXPORT_SYMBOL_GPL(devm_remove_action); +/** + * devm_release_action() - release previously added custom action + * @dev: Device that owns the action + * @action: Function implementing the action + * @data: Pointer to data passed to @action implementation + * + * Releases and removes instance of @action previously added by + * devm_add_action(). Both action and data should match one of the + * existing entries.
+ */ +void devm_release_action(struct device *dev, void (*action)(void *), void *data) +{ + struct action_devres devres = { + .data = data, + .action = action, + }; + + WARN_ON(devres_release(dev, devm_action_release, devm_action_match, + &devres)); + +} +EXPORT_SYMBOL_GPL(devm_release_action); + /* * Managed kmalloc/kfree */ diff --git a/include/linux/device.h b/include/linux/device.h index 4e6987e11f68..6d7fd5370f3d 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -713,6 +713,7 @@ void __iomem *devm_of_iomap(struct device *dev, /* allows to add/remove a custom action to devres stack */ int devm_add_action(struct device *dev, void (*action)(void *), void *data); void devm_remove_action(struct device *dev, void (*action)(void *), void *data); +void devm_release_action(struct device *dev, void (*action)(void *), void *data); static inline int devm_add_action_or_reset(struct device *dev, void (*action)(void *), void *data)
[PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race
Changes since v1 [1]: - Fix a NULL-pointer deref crash in pci_p2pdma_release() (Logan) - Refresh the p2pdma patch headers to match the format of other p2pdma patches (Bjorn) - Collect Ira's reviewed-by [1]: https://lore.kernel.org/lkml/155387324370.2443841.574715745262628837.st...@dwillia2-desk3.amr.corp.intel.com/ --- Logan audited the devm_memremap_pages() shutdown path and noticed that it was possible to proceed to arch_remove_memory() before all potential page references have been reaped. Introduce a new ->cleanup() callback to do the work of waiting for any straggling page references and then perform the percpu_ref_exit() in devm_memremap_pages_release() context. For p2pdma this involves some deeper reworks to reference count resources on a per-instance basis rather than a per pci-device basis. A modified genalloc api is introduced to convey a driver-private pointer through gen_pool_{alloc,free}() interfaces. Also, a devm_memunmap_pages() api is introduced since p2pdma does not auto-release resources on a setup failure. The dax and pmem changes pass the nvdimm unit tests, and the p2pdma changes should now pass testing with the pci_p2pdma_release() fix. Jérôme, how does this look for HMM? In general, I think these patches / fixes are suitable for v5.2-rc1 or v5.2-rc2, and since they touch kernel/memremap.c, and other various pieces of the core, they should go through the -mm tree. These patches merge cleanly with the current state of -next, pass the nvdimm unit tests, and are exposed to the 0day robot with no issues reported (https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending). 
--- Dan Williams (6): drivers/base/devres: Introduce devm_release_action() mm/devm_memremap_pages: Introduce devm_memunmap_pages PCI/P2PDMA: Fix the gen_pool_add_virt() failure path lib/genalloc: Introduce chunk owners PCI/P2PDMA: Track pgmap references per resource, not globally mm/devm_memremap_pages: Fix final page put race drivers/base/devres.c | 24 +++- drivers/dax/device.c | 13 +--- drivers/nvdimm/pmem.c | 17 - drivers/pci/p2pdma.c | 115 +++-- include/linux/device.h|1 include/linux/genalloc.h | 55 -- include/linux/memremap.h |8 +++ kernel/memremap.c | 23 ++- lib/genalloc.c| 51 mm/hmm.c | 14 + tools/testing/nvdimm/test/iomap.c |2 + 11 files changed, 217 insertions(+), 106 deletions(-)
[PATCH v2 2/6] mm/devm_memremap_pages: Introduce devm_memunmap_pages
Use the new devm_release_action() facility to allow devm_memremap_pages_release() to be manually triggered. Cc: Logan Gunthorpe Cc: Bjorn Helgaas Cc: Christoph Hellwig Reviewed-by: Ira Weiny Signed-off-by: Dan Williams --- include/linux/memremap.h |6 ++ kernel/memremap.c|6 ++ 2 files changed, 12 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index f0628660d541..7601ee314c4a 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -100,6 +100,7 @@ struct dev_pagemap { #ifdef CONFIG_ZONE_DEVICE void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); +void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap); struct dev_pagemap *get_dev_pagemap(unsigned long pfn, struct dev_pagemap *pgmap); @@ -118,6 +119,11 @@ static inline void *devm_memremap_pages(struct device *dev, return ERR_PTR(-ENXIO); } +static inline void devm_memunmap_pages(struct device *dev, + struct dev_pagemap *pgmap) +{ +} + static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn, struct dev_pagemap *pgmap) { diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..65afbacab44e 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -266,6 +266,12 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) } EXPORT_SYMBOL_GPL(devm_memremap_pages); +void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap) +{ + devm_release_action(dev, devm_memremap_pages_release, pgmap); +} +EXPORT_SYMBOL_GPL(devm_memunmap_pages); + unsigned long vmem_altmap_offset(struct vmem_altmap *altmap) { /* number of pfns from base where pfn_to_page() is valid */
[PATCH v2 4/6] lib/genalloc: Introduce chunk owners
The p2pdma facility enables a provider to publish a pool of dma addresses for a consumer to allocate. A genpool is used internally by p2pdma to collect dma resources, 'chunks', to be handed out to consumers. Whenever a consumer allocates a resource it needs to pin the 'struct dev_pagemap' instance that backs the chunk selected by pci_alloc_p2pmem(). Currently that reference is taken globally on the entire provider device. That sets up a lifetime mismatch whereby the p2pdma core needs to maintain hacks to make sure the percpu_ref is not released twice. This lifetime mismatch also stands in the way of a fix to devm_memremap_pages() whereby devm_memremap_pages_release() must wait for the percpu_ref ->release() callback to complete before it can proceed to teardown pages. So, towards fixing this situation, introduce the ability to store a 'chunk owner' at gen_pool_add() time, and a facility to retrieve the owner at gen_pool_{alloc,free}() time. For p2pdma this will be used to store and recall individual dev_pagemap reference counter instances per-chunk. 
Cc: Logan Gunthorpe Cc: Bjorn Helgaas Cc: "Jérôme Glisse" Cc: Christoph Hellwig Reviewed-by: Ira Weiny Signed-off-by: Dan Williams --- include/linux/genalloc.h | 55 +- lib/genalloc.c | 51 +-- 2 files changed, 74 insertions(+), 32 deletions(-) diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index dd0a452373e7..a337313e064f 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -75,6 +75,7 @@ struct gen_pool_chunk { struct list_head next_chunk;/* next chunk in pool */ atomic_long_t avail; phys_addr_t phys_addr; /* physical starting address of memory chunk */ + void *owner;/* private data to retrieve at alloc time */ unsigned long start_addr; /* start address of memory chunk */ unsigned long end_addr; /* end address of memory chunk (inclusive) */ unsigned long bits[0]; /* bitmap for allocating memory chunk */ @@ -96,8 +97,15 @@ struct genpool_data_fixed { extern struct gen_pool *gen_pool_create(int, int); extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long); -extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t, -size_t, int); +extern int gen_pool_add_owner(struct gen_pool *, unsigned long, phys_addr_t, +size_t, int, void *); + +static inline int gen_pool_add_virt(struct gen_pool *pool, unsigned long addr, + phys_addr_t phys, size_t size, int nid) +{ + return gen_pool_add_owner(pool, addr, phys, size, nid, NULL); +} + /** * gen_pool_add - add a new chunk of special memory to the pool * @pool: pool to add new memory chunk to @@ -116,12 +124,47 @@ static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr, return gen_pool_add_virt(pool, addr, -1, size, nid); } extern void gen_pool_destroy(struct gen_pool *); -extern unsigned long gen_pool_alloc(struct gen_pool *, size_t); -extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t, - genpool_algo_t algo, void *data); +unsigned long gen_pool_alloc_algo_owner(struct gen_pool *pool, size_t size, + genpool_algo_t algo, 
void *data, void **owner); + +static inline unsigned long gen_pool_alloc_owner(struct gen_pool *pool, + size_t size, void **owner) +{ + return gen_pool_alloc_algo_owner(pool, size, pool->algo, pool->data, + owner); +} + +static inline unsigned long gen_pool_alloc_algo(struct gen_pool *pool, + size_t size, genpool_algo_t algo, void *data) +{ + return gen_pool_alloc_algo_owner(pool, size, algo, data, NULL); +} + +/** + * gen_pool_alloc - allocate special memory from the pool + * @pool: pool to allocate from + * @size: number of bytes to allocate from the pool + * + * Allocate the requested number of bytes from the specified pool. + * Uses the pool allocation function (with first-fit algorithm by default). + * Can not be used in NMI handler on architectures without + * NMI-safe cmpxchg implementation. + */ +static inline unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size) +{ + return gen_pool_alloc_algo(pool, size, pool->algo, pool->data); +} + extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma); -extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); +extern void gen_pool_free_owner(struct gen_pool *pool, unsigned long addr, + size_t size, void **owner); +static inline void gen_pool_free(struct gen_pool *pool, unsigned long addr, +size_t size) +{ + gen_pool_free_owner(pool, addr, size, NULL); +} + extern void gen_pool_for_each_chunk(struct gen_pool *, void (*)(struct gen_pool
Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
On 4/25/19 10:00 PM, Pankaj Gupta wrote: > +void host_ack(struct virtqueue *vq) > +{ > + unsigned int len; > + unsigned long flags; > + struct virtio_pmem_request *req, *req_buf; > + struct virtio_pmem *vpmem = vq->vdev->priv; > + > + spin_lock_irqsave(>pmem_lock, flags); > + while ((req = virtqueue_get_buf(vq, )) != NULL) { > + req->done = true; > + wake_up(>host_acked); > + > + if (!list_empty(>req_list)) { > + req_buf = list_first_entry(>req_list, > + struct virtio_pmem_request, list); > + list_del(>req_list); Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to unlink first element of the list and `vpmem->req_list` is just the list head. > +int virtio_pmem_flush(struct nd_region *nd_region) > +{ > + int err; > + unsigned long flags; > + struct scatterlist *sgs[2], sg, ret; > + struct virtio_device *vdev = nd_region->provider_data; > + struct virtio_pmem *vpmem = vdev->priv; > + struct virtio_pmem_request *req; > + > + might_sleep(); > + req = kmalloc(sizeof(*req), GFP_KERNEL); > + if (!req) > + return -ENOMEM; > + > + req->done = req->wq_buf_avail = false; > + strcpy(req->name, "FLUSH"); > + init_waitqueue_head(>host_acked); > + init_waitqueue_head(>wq_buf); > + sg_init_one(, req->name, strlen(req->name)); > + sgs[0] = > + sg_init_one(, >ret, sizeof(req->ret)); > + sgs[1] = > + > + spin_lock_irqsave(>pmem_lock, flags); > + err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC); > + if (err) { > + dev_err(>dev, "failed to send command to virtio pmem > device\n"); > + > + list_add_tail(>req_list, >list); > + spin_unlock_irqrestore(>pmem_lock, flags); > + > + /* When host has read buffer, this completes via host_ack */ > + wait_event(req->wq_buf, req->wq_buf_avail); > + spin_lock_irqsave(>pmem_lock, flags); > + } Aren't the arguments in `list_add_tail` swapped? The element we are adding should be first, the list should be second. 
Also, shouldn't we resubmit the request after waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`? I propose rewriting it like that: diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c index 66b582f751a3..ff0556b04e86 100644 --- a/drivers/nvdimm/virtio_pmem.c +++ b/drivers/nvdimm/virtio_pmem.c @@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq) if (!list_empty(>req_list)) { req_buf = list_first_entry(>req_list, struct virtio_pmem_request, list); - list_del(>req_list); + list_del(vpmem->req_list.next); req_buf->wq_buf_avail = true; wake_up(_buf->wq_buf); } @@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region) sgs[1] = spin_lock_irqsave(>pmem_lock, flags); - err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC); - if (err) { - dev_err(>dev, "failed to send command to virtio pmem device\n"); + /* +* If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not +* have free descriptor slots. We add the request to req_list and wait +* for host_ack to wake us up when free slots are available. +*/ + while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC)) == -ENOSPC) { + dev_err(>dev, "failed to send command to virtio pmem device, no free slots in the virtqueue, postponing request\n"); + req->wq_buf_avail = false; - list_add_tail(>req_list, >list); + list_add_tail(>list, >req_list); spin_unlock_irqrestore(>pmem_lock, flags); /* When host has read buffer, this completes via host_ack */ wait_event(req->wq_buf, req->wq_buf_avail); spin_lock_irqsave(>pmem_lock, flags); } + + /* +* virtqueue_add_sgs failed with error different than -ENOSPC, we can't +* do anything about that. +*/ + if (err) { + dev_info(>dev, "failed to send command to virtio pmem device, error code %d\n", err); + spin_unlock_irqrestore(>pmem_lock, flags); + err = -EIO; + goto ret; + } err = virtqueue_kick(vpmem->req_vq); spin_unlock_irqrestore(>pmem_lock, flags); Let me know if it looks reasonable to you. 
Thank you, Jakub Staron ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
From: Pankaj Gupta
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +				struct dax_device *dax_dev)
> +{
> +	return !(vma->flags & VM_SYNC);
> +}

Shouldn't it rather be `return !(vma->vm_flags & VM_SYNC);`? There is no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub
RE: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests
Here is a bit of inline commentary on the TAP13/TAP14 discussion.

> -----Original Message-----
> From: Brendan Higgins
>
> > On 5/3/19 4:14 PM, Brendan Higgins wrote:
> > >> On 5/2/19 10:36 PM, Brendan Higgins wrote:
> > > In any case, it sounds like you and Greg are in agreement on the core
> > > libraries generating the output in TAP13, so I won't argue that point
> > > further.
> >
> > ## Analysis of using TAP13
> >
> > I have never looked at TAP version 13 in any depth at all, so do not
> > consider me to be any sort of expert.
> >
> > My entire TAP knowledge is based on:
> >
> >   https://testanything.org/tap-version-13-specification.html
> >
> > and the pull request to create the TAP version 14 specification:
> >
> >   https://github.com/TestAnything/testanything.github.io/pull/36/files
> >
> > You can see the full version 14 document in the submitter's repo:
> >
> >   $ git clone https://github.com/isaacs/testanything.github.io.git
> >   $ cd testanything.github.io
> >   $ git checkout tap14
> >   $ ls tap-version-14-specification.md
> >
> > My understanding is that the version 14 specification is not trying to
> > add new features, but instead capture what is already implemented in
> > the wild.
> >
> > > One of my earlier concerns was that TAP13 is a bit over constrained
> > > for what I would like to output from the KUnit core. It only allows
> > > data to be output as either:
> > > - test number
> > > - ok/not ok with single line description
> > > - directive
> > > - diagnostics
> > > - YAML block
> > >
> > > The test number must come before a set of ok/not ok lines, and does
> > > not contain any additional information. One annoying thing about this
> > > is it doesn't provide any kind of nesting or grouping.
> >
> > Greg's response mentions ktest (?) already does nesting.
>
> I think we are talking about kselftest.
>
> > Version 14 allows nesting through subtests. I have not looked at what
> > ktest does, so I do not know if it uses subtests, or something else.
> > Oh nice! That is new in version 14. I can use that.

We have run into the problem of subtests (or nested tests, both using TAP13) in Fuego. I recall that this issue came up in kselftest, and I believe we discussed a solution, but I don't recall what it was. Can someone remind me what kselftest does to handle nested tests (in terms of TAP13 output)?

> > > There is one ok/not ok line per test and it may have a short
> > > description of the test immediately after 'ok' or 'not ok'; this is
> > > problematic because it wants the first thing you say about a test to
> > > be after you know whether it passes or not.
> >
> > I think you could output a diagnostic line that says a test is starting.
> > This is important to me because printk() errors and warnings that are
> > related to a test can be output by a subsystem other than the subsystem
> > that I am testing. If there is no marker at the start of the test
> > then there is no way to attribute the printk()s to the test.
>
> I agree. This is a significant problem.

In Fuego we output each line with a test id prefix, which goes against the spec, but helps solve this. Test output should be kept separate from system output, but if I understand correctly, there are no channels in printk to use to keep different data streams separate. How does kselftest deal with this now?

> Technically conforms with the spec, and kselftest does that, but is
> also not part of the spec. Well, it *is* specified if you use
> subtests. I think the right approach is to make each
> "kunit_module/test suite" a test, and all the test cases will be
> subtests.
>
> > > Directives are just a way to specify skipped tests and TODOs.
> > >
> > > Diagnostics seem useful, it looks like you can put whatever
> > > information in them and print them out at any time. It looks like a lot
> > > of kselftests emit a lot of data this way.
> > > The YAML block seems to be the way that they prefer users to emit data
> > > beyond number of tests run and whether a test passed or failed. I
> > > could express most things I want to express in terms of YAML, but it
> > > is not the nicest format for displaying a lot of data like
> > > expectations, missed function calls, and other things which have a
> > > natural concise representation. Nevertheless, YAML readability is
> > > mostly a problem for those who won't be using the wrapper scripts.
> >
> > The examples in specification V13 and V14 look very simple and very
> > readable to me. (And I am not a fan of YAML.)
> >
> > > My biggest problem with the YAML block is that you can only have one,
> > > and TAP specifies that it must come after the corresponding ok/not ok
> > > line, which again has the issue that you have to hold on to a lot of
> > > diagnostic data longer than you ideally would. Another downside is
> > > that I now have to write a YAML serializer for the kernel.
> >
> > If a test generates diagnostic data, then I would
Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework
On Tue, May 07, 2019 at 10:01:19AM +0200, Greg KH wrote:
> > My understanding is that the intent of KUnit is to avoid booting a kernel on
> > real hardware or in a virtual machine. That seems to be a matter of semantics
> > to me because isn't invoking a UML Linux just running the Linux kernel in
> > a different form of virtualization?
> >
> > So I do not understand why KUnit is an improvement over kselftest.
> >
> > It seems to me that KUnit is just another piece of infrastructure that I
> > am going to have to be familiar with as a kernel developer. More overhead,
> > more information to stuff into my tiny little brain.
> >
> > I would guess that some developers will focus on just one of the two test
> > environments (and some will focus on both), splitting the development
> > resources instead of pooling them on a common infrastructure.
> >
> > What am I missing?
>
> kselftest provides no in-kernel framework for testing kernel code
> specifically. That should be what kunit provides, an "easy" way to
> write in-kernel tests for things.
>
> Brendan, did I get it right?

Yes, that's basically right. You don't *have* to use KUnit. It's supposed to be a simple way to run a large number of small tests for specific small components in a system.

For example, I currently use xfstests using KVM and GCE to test all of ext4. These tests require multiple 5 GB and 20 GB virtual disks, and they work by mounting ext4 file systems and exercising ext4 through the system call interfaces, using userspace tools such as fsstress, fsx, fio, etc. There is time overhead to start the VM, create and allocate virtual disks, etc. For example, to run a single three-second xfstest (generic/001), it takes a full 10 seconds to run it via kvm-xfstests.
KUnit is something else; it's specifically intended to allow you to create lightweight tests quickly and easily, and by reducing the effort needed to write and run unit tests, hopefully we'll have a lot more of them and thus improve kernel quality.

As an example, I have a volunteer working on developing KUnit tests for ext4. We're going to start by testing the ext4 extent status tree. The source code is at fs/ext4/extent_status.c; it's approximately 1800 LOC. The KUnit tests for the extent status tree will exercise all of the corner cases for the various extent status tree functions --- e.g., ext4_es_insert_delayed_block(), ext4_es_remove_extent(), ext4_es_cache_extent(), etc. And it will do this in isolation without our needing to create a test file system or use a test block device.

Next we'll test the ext4 block allocator, again in isolation. To test the block allocator we will have to write "mock functions" which simulate reading allocation bitmaps from disk. Again, this will allow the test writer to explicitly construct corner cases and validate that the block allocator works as expected, without having to reverse engineer file system data structures which will force a particular code path to be executed.

So this is why it's largely irrelevant to me that KUnit uses UML. In fact, it's a feature. We're not testing device drivers, or the scheduler, or anything else architecture-specific. UML is not about virtualization. What it's about in this context is allowing us to start running test code as quickly as possible. Booting KVM takes about 3-4 seconds, and this includes initializing virtio_scsi and other device drivers. If by using UML we can hold the amount of unnecessary kernel subsystem initialization down to the absolute minimum, and if it means that we can communicate to the test framework via a userspace "printf" from UML/KUnit code, as opposed to via a virtual serial port to KVM's virtual console, it all makes for lighter weight testing.
Why did I go looking for a volunteer to write KUnit tests for ext4? Well, I have a plan to make some changes restructuring how ext4's write path works, in order to support things like copy-on-write, a more efficient delayed allocation system, etc. This will require making changes to the extent status tree, and by having unit tests for the extent status tree, we'll be able to detect any bugs that we might accidentally introduce in the es tree far more quickly than if we didn't have those tests available.

Google has long found that having these sorts of unit tests is a real win for developer velocity for any non-trivial code module (or C++ class), even when you take into account the time it takes to create the unit tests.

- Ted

P.S. Many thanks to Brendan for finding such a volunteer for me; the person in question is an SRE from Switzerland who is interested in getting involved with kernel testing, and this is going to be their 20% project. :-)
Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta wrote:
> >
> > Don't support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta
> > ---
> >  fs/xfs/xfs_file.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
>
> Darrick, does this look ok to take through the nvdimm tree?

forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > 	struct file		*filp,
> > 	struct vm_area_struct	*vma)
> > {
> > +	struct dax_device	*dax_dev;
> > +
> > +	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > 	/*
> > -	 * We don't support synchronous mappings for non-DAX files. At least
> > -	 * until someone comes with a sensible use case.
> > +	 * We don't support synchronous mappings for non-DAX files and
> > +	 * for DAX files if underneath dax_device is not synchronous.
> > 	 */
> > -	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +	if (!daxdev_mapping_supported(vma, dax_dev))
> > 		return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm. Nothing in xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong

Thank you for the review.

Pankaj

> --D
>
> > 	file_accessed(filp);
> > --
> > 2.20.1
Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include
>  #include
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned long flags' rather than a bool.
Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta wrote:
>
> Don't support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta
> ---
>  fs/xfs/xfs_file.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> 	struct file		*filp,
> 	struct vm_area_struct	*vma)
> {
> +	struct dax_device	*dax_dev;
> +
> +	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> 	/*
> -	 * We don't support synchronous mappings for non-DAX files. At least
> -	 * until someone comes with a sensible use case.
> +	 * We don't support synchronous mappings for non-DAX files and
> +	 * for DAX files if underneath dax_device is not synchronous.
> 	 */
> -	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +	if (!daxdev_mapping_supported(vma, dax_dev))
> 		return -EOPNOTSUPP;
>
> 	file_accessed(filp);
> --
> 2.20.1
Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++
>  include/linux/virtio_pmem.h      |  60 
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index ..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}
> +	err = virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	if (!err) {
> +		err = -EIO;
> +		goto ret;
> +	}
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;
> +ret:
> +	kfree(req);
> +	return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +	int rc = 0;
> +
> +	/* Create child bio for asynchronous flush and chain with
> +	 * parent bio. Otherwise directly call nd_region flush.
> +	 */
> +	if (bio && bio->bi_iter.bi_sector != -1) {
> +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +		if (!child)
> +
Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework
On 5/7/19 2:01 AM, Greg KH wrote: On Mon, May 06, 2019 at 08:14:12PM -0700, Frank Rowand wrote: On 5/1/19 4:01 PM, Brendan Higgins wrote: ## TLDR I rebased the last patchset on 5.1-rc7 in hopes that we can get this in 5.2. Shuah, I think you, Greg KH, and myself talked off thread, and we agreed we would merge through your tree when the time came? Am I remembering correctly? ## Background This patch set proposes KUnit, a lightweight unit testing and mocking framework for the Linux kernel. Unlike Autotest and kselftest, KUnit is a true unit testing framework; it does not require installing the kernel on a test machine or in a VM and does not require tests to be written in userspace running on a host kernel. Additionally, KUnit is fast: From invocation to completion KUnit can run several dozen tests in under a second. Currently, the entire KUnit test suite for KUnit runs in under a second from the initial invocation (build time excluded). KUnit is heavily inspired by JUnit, Python's unittest.mock, and Googletest/Googlemock for C++. KUnit provides facilities for defining unit test cases, grouping related test cases into test suites, providing common infrastructure for running tests, mocking, spying, and much more. As a result of the emails replying to this patch thread, I am now starting to look at kselftest. My level of understanding is based on some slide presentations, an LWN article, https://kselftest.wiki.kernel.org/ and a _tiny_ bit of looking at kselftest code. tl;dr; I don't really understand kselftest yet. (1) why KUnit exists ## What's so special about unit testing? A unit test is supposed to test a single unit of code in isolation, hence the name. There should be no dependencies outside the control of the test; this means no external dependencies, which makes tests orders of magnitudes faster. Likewise, since there are no external dependencies, there are no hoops to jump through to run the tests. 
Additionally, this makes unit tests deterministic: a failing unit test always indicates a problem. Finally, because unit tests necessarily have finer granularity, they are able to test all code paths easily, solving the classic problem of difficulty in exercising error handling code.

(2) KUnit is not meant to replace kselftest

## Is KUnit trying to replace other testing frameworks for the kernel?

No. Most existing tests for the Linux kernel are end-to-end tests, which have their place. A well tested system has lots of unit tests, a reasonable number of integration tests, and some end-to-end tests. KUnit is just trying to address the unit test space which is currently not being addressed.

My understanding is that the intent of KUnit is to avoid booting a kernel on real hardware or in a virtual machine. That seems to be a matter of semantics to me because isn't invoking a UML Linux just running the Linux kernel in a different form of virtualization? So I do not understand why KUnit is an improvement over kselftest.

They are in two different categories. Kselftest falls into the black box regression test suite category: a collection of user-space tests, with a few kernel test modules back-ending the tests in some cases. Kselftest can be used by both kernel developers and users and provides a good way to regression test releases in test rings.

KUnit is in the white box category and is a better fit as a unit test framework for development, providing in-kernel testing. I wouldn't view one as replacing the other. They just provide coverage for different areas of testing. I wouldn't view KUnit as something that would be easily run in test rings, for example.

Brendan, does that sound about right?

It seems to me that KUnit is just another piece of infrastructure that I am going to have to be familiar with as a kernel developer. More overhead, more information to stuff into my tiny little brain.
I would guess that some developers will focus on just one of the two test environments (and some will focus on both), splitting the development resources instead of pooling them on a common infrastructure. What am I missing? kselftest provides no in-kernel framework for testing kernel code specifically. That should be what kunit provides, an "easy" way to write in-kernel tests for things. Brendan, did I get it right? thanks, -- Shuah ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v2] drivers/dax: Allow to include DEV_DAX_PMEM as builtin
On Tue, May 7, 2019 at 4:50 AM Aneesh Kumar K.V wrote:
>
> Hi Dan,
>
> "Aneesh Kumar K.V" writes:
>
> > This moves the dependency to DEV_DAX_PMEM_COMPAT such that only
> > if DEV_DAX_PMEM is built as module we can allow the compat support.
> >
> > This allows to test the new code easily in an emulation setup where we
> > often build things without module support.
> >
> > Signed-off-by: Aneesh Kumar K.V
>
> Any update on this. Can we merge this?

Applied for the v5.2 pull request.
Re: [PATCH v2] drivers/dax: Allow to include DEV_DAX_PMEM as builtin
Hi Dan,

"Aneesh Kumar K.V" writes:

> This moves the dependency to DEV_DAX_PMEM_COMPAT such that only
> if DEV_DAX_PMEM is built as module we can allow the compat support.
>
> This allows to test the new code easily in an emulation setup where we
> often build things without module support.
>
> Signed-off-by: Aneesh Kumar K.V

Any update on this. Can we merge this?

> ---
> Changes from V1:
> * Make sure we only build compat code as module
>
>  drivers/dax/Kconfig | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index 5ef624fe3934..a59f338f520f 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -23,7 +23,6 @@ config DEV_DAX
>  config DEV_DAX_PMEM
>  	tristate "PMEM DAX: direct access to persistent memory"
>  	depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
> -	depends on m # until we can kill DEV_DAX_PMEM_COMPAT
>  	default DEV_DAX
>  	help
>  	  Support raw access to persistent memory.  Note that this
> @@ -50,7 +49,7 @@ config DEV_DAX_KMEM
>
>  config DEV_DAX_PMEM_COMPAT
>  	tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
> -	depends on DEV_DAX_PMEM
> +	depends on m && DEV_DAX_PMEM=m
>  	default DEV_DAX_PMEM
>  	help
>  	  Older versions of the libdaxctl library expect to find all
> --
> 2.20.1
Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework
On Mon, May 06, 2019 at 08:14:12PM -0700, Frank Rowand wrote: > On 5/1/19 4:01 PM, Brendan Higgins wrote: > > ## TLDR > > > > I rebased the last patchset on 5.1-rc7 in hopes that we can get this in > > 5.2. > > > > Shuah, I think you, Greg KH, and myself talked off thread, and we agreed > > we would merge through your tree when the time came? Am I remembering > > correctly? > > > > ## Background > > > > This patch set proposes KUnit, a lightweight unit testing and mocking > > framework for the Linux kernel. > > > > Unlike Autotest and kselftest, KUnit is a true unit testing framework; > > it does not require installing the kernel on a test machine or in a VM > > and does not require tests to be written in userspace running on a host > > kernel. Additionally, KUnit is fast: From invocation to completion KUnit > > can run several dozen tests in under a second. Currently, the entire > > KUnit test suite for KUnit runs in under a second from the initial > > invocation (build time excluded). > > > > KUnit is heavily inspired by JUnit, Python's unittest.mock, and > > Googletest/Googlemock for C++. KUnit provides facilities for defining > > unit test cases, grouping related test cases into test suites, providing > > common infrastructure for running tests, mocking, spying, and much more. > > As a result of the emails replying to this patch thread, I am now > starting to look at kselftest. My level of understanding is based > on some slide presentations, an LWN article, > https://kselftest.wiki.kernel.org/ > and a _tiny_ bit of looking at kselftest code. > > tl;dr; I don't really understand kselftest yet. > > > (1) why KUnit exists > > > ## What's so special about unit testing? > > > > A unit test is supposed to test a single unit of code in isolation, > > hence the name. There should be no dependencies outside the control of > > the test; this means no external dependencies, which makes tests orders > > of magnitudes faster. 
Likewise, since there are no external dependencies, > > there are no hoops to jump through to run the tests. Additionally, this > > makes unit tests deterministic: a failing unit test always indicates a > > problem. Finally, because unit tests necessarily have finer granularity, > > they are able to test all code paths easily solving the classic problem > > of difficulty in exercising error handling code. > > (2) KUnit is not meant to replace kselftest > > > ## Is KUnit trying to replace other testing frameworks for the kernel? > > > > No. Most existing tests for the Linux kernel are end-to-end tests, which > > have their place. A well tested system has lots of unit tests, a > > reasonable number of integration tests, and some end-to-end tests. KUnit > > is just trying to address the unit test space which is currently not > > being addressed. > > My understanding is that the intent of KUnit is to avoid booting a kernel on > real hardware or in a virtual machine. That seems to be a matter of semantics > to me because isn't invoking a UML Linux just running the Linux kernel in > a different form of virtualization? > > So I do not understand why KUnit is an improvement over kselftest. > > It seems to me that KUnit is just another piece of infrastructure that I > am going to have to be familiar with as a kernel developer. More overhead, > more information to stuff into my tiny little brain. > > I would guess that some developers will focus on just one of the two test > environments (and some will focus on both), splitting the development > resources instead of pooling them on a common infrastructure. > > What am I missing? kselftest provides no in-kernel framework for testing kernel code specifically. That should be what kunit provides, an "easy" way to write in-kernel tests for things. Brendan, did I get it right? thanks, greg k-h ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm