Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush

2019-05-07 Thread Pankaj Gupta


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta  wrote:
> > >
> > > Don't support 'MAP_SYNC' with non-DAX files and DAX files
> > > with an asynchronous dax_device. Virtio pmem provides an
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> I forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > > struct file *filp,
> > > struct vm_area_struct *vma)
> > >  {
> > > +   struct dax_device   *dax_dev;
> > > +
> > > +   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > > /*
> > > -* We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -* until someone comes with a sensible use case.
> > > +* We don't support synchronous mappings for non-DAX files and
> > > +* for DAX files if underneath dax_device is not synchronous.
> > >  */
> > > -   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +   if (!daxdev_mapping_supported(vma, dax_dev))
> > > return -EOPNOTSUPP;
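
For reference, a sketch of the logic daxdev_mapping_supported() is expected
to implement per the changelog above; this assumes the dax_synchronous()
helper introduced earlier in this series, and the merged helper may differ
in detail:

    static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
                                                struct dax_device *dax_dev)
    {
            /* mappings without MAP_SYNC are always supported */
            if (!(vma->vm_flags & VM_SYNC))
                    return true;
            /* MAP_SYNC requires a DAX file... */
            if (!IS_DAX(file_inode(vma->vm_file)))
                    return false;
            /* ...backed by a synchronous (not virtio-pmem) dax_device */
            return dax_synchronous(dax_dev);
    }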
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong 

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > > file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 


Re: [PATCH ndctl] ndctl, region: return the rc value in region_action

2019-05-07 Thread Yi Zhang

Hi Elliott

Thanks for your review.

On 4/16/19 2:29 AM, Elliott, Robert (Servers) wrote:



-----Original Message-----
From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of Dan Williams
Sent: Monday, April 15, 2019 11:29 AM
Subject: Re: [PATCH ndctl] ndctl, region: return the rc value in region_action

On Mon, Apr 15, 2019 at 1:24 AM Yi Zhang  wrote:

if region_action failed, return the rc valude instead of 0

valude s/b value


Sorry for the late response, I just found this patch was merged:

https://git.kernel.org/pub/scm/utils/kernel/ndctl/ndctl.git/commit/?id=9840942d5f6dc0d0e3693956f6733bbbc4e06e3a


Signed-off-by: Yi Zhang 
---
  ndctl/region.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ndctl/region.c b/ndctl/region.c
index 993b3f8..7945007 100644
--- a/ndctl/region.c
+++ b/ndctl/region.c
@@ -89,7 +89,7 @@ static int region_action(struct ndctl_region *region, enum device_action mode)
 break;
 }

-   return 0;
+   return rc;
  }

Looks good to me:

Reviewed-by: Dan Williams 

Perhaps a similar change should be made to ndctl_namespace_disable_safe(),
one of the functions called by region_action() whose rc is propagated, to
propagate the rc from ndctl_namespace_disable_invalidate().


I'm not familiar with the ndctl code; could you help send a patch to fix
it if the change is needed?
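
A hypothetical sketch of the propagation Robert suggests; the body of the
real ndctl_namespace_disable_safe() is elided/invented here for
illustration:

    static int ndctl_namespace_disable_safe(struct ndctl_namespace *ndns)
    {
            /* ... checks for active mounts/users elided ... */

            /* propagate the rc instead of discarding it */
            return ndctl_namespace_disable_invalidate(ndns);
    }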



Thanks

Yi







[ndctl PATCH] contrib: add an example qemu setup script for HMAT emulation

2019-05-07 Thread Vishal Verma
Add a script to demonstrate HMAT emulation using qemu by injecting the
initramfs with a made-up ACPI HMAT table.

Cc: Dan Williams 
Originally-authored-by: Keith Busch 
[vishal: minor shellcheck fixups]
Signed-off-by: Vishal Verma 
---
 contrib/daxctl-qemu-hmat-setup | 211 +
 1 file changed, 211 insertions(+)
 create mode 100755 contrib/daxctl-qemu-hmat-setup

diff --git a/contrib/daxctl-qemu-hmat-setup b/contrib/daxctl-qemu-hmat-setup
new file mode 100755
index 000..0670e99
--- /dev/null
+++ b/contrib/daxctl-qemu-hmat-setup
@@ -0,0 +1,211 @@
+#!/bin/bash -e
+
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2019 Intel Corporation. All rights reserved.
+
+KERNEL=${KERNEL:-$(uname -r)}
+ROOT_IMAGE=${ROOT_IMAGE:-root.img}
+
+INITIATORS=4
+TARGETS=128
+TARGET_SIZE_MB=128
+TARGET_SIZE="$((TARGET_SIZE_MB * 1024 * 1024))"
+QEMU_MEM="$((TARGET_SIZE_MB * TARGETS))"
+
+for i in $(seq 0 $((TARGETS - 1))); do
+  qemu-img create -f raw "dimm-$i" "${TARGET_SIZE}"
+done;
+
+# Address Range Structures
+cat <<EOF > hmat.dsl
+   Signature : "HMAT"[Heterogeneous Memory Attributes Table]
+Table Length : 0200
+Revision : 01
+Checksum : F4
+  Oem ID : "BOCHS "
+Oem Table ID : "BXPCHMAT"
+Oem Revision : 0001
+ Asl Compiler ID : "INTL"
+   Asl Compiler Revision : 20170929
+Reserved : 
+
+  Structure Type :  [Memory Subystem Address Range]
+Reserved : 
+  Length : 0028
+   Flags (decoded below) : 0003
+Processor Proximity Domain Valid : 1
+   Memory Proximity Domain Valid : 1
+Reservation Hint : 0
+   Reserved1 : 
+  Processor Proximity Domain : 
+ Memory Proximity Domain : 
+   Reserved2 : 
+ Physical Address Range Base : 
+ Physical Address Range Size : 000A
+
+  Structure Type :  [Memory Subystem Address Range]
+Reserved : 
+  Length : 0028
+   Flags (decoded below) : 0003
+Processor Proximity Domain Valid : 1
+   Memory Proximity Domain Valid : 1
+Reservation Hint : 0
+   Reserved1 : 
+  Processor Proximity Domain : 
+ Memory Proximity Domain : 
+   Reserved2 : 
+ Physical Address Range Base : 0010
+ Physical Address Range Size : 07F0
+EOF
+
+for i in $(seq 1 $((TARGETS - 1))); do
+  BASE=$(printf "%016x" $((0x800 * i)))
+cat <<EOF >> hmat.dsl
+
+  Structure Type :  [Memory Subystem Address Range]
+Reserved : 
+  Length : 0028
+   Flags (decoded below) : 0003
+Processor Proximity Domain Valid : 1
+   Memory Proximity Domain Valid : 1
+Reservation Hint : 0
+   Reserved1 : 
+  Processor Proximity Domain : $(printf "%08x" $((i % INITIATORS)))
+ Memory Proximity Domain : $(printf "%08x" "${i}")
+   Reserved2 : 
+ Physical Address Range Base : ${BASE}
+ Physical Address Range Size : 0800
+EOF
+done
+
+# System Locality Latency
+TABLE_SIZE="$(printf "%08x" $((40 + 4 * INITIATORS + 4 * TARGETS + 2 * 
INITIATORS * TARGETS)))"
+cat <<EOF >> hmat.dsl
+
+  Structure Type : 0001 [System Locality Latency and Bandwidth Information]
+Reserved : 
+  Length : ${TABLE_SIZE}
+   Flags (decoded below) : 00
+Memory Hierarchy : 0
+   Data Type : 00
+   Reserved1 : 
+   Initiator Proximity Domains # : $(printf "%08x" ${INITIATORS})
+  Target Proximity Domains # : $(printf "%08x" ${TARGETS})
+   Reserved2 : 
+ Entry Base Unit : 0001
+EOF
+
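
(As a worked check of the size formula above, with the script defaults of
INITIATORS=4 and TARGETS=128: 40 + 4*4 + 4*128 + 2*4*128 = 40 + 16 + 512 +
1024 = 1592 bytes = 0x638, which is what the printf "%08x" renders into the
Length field.)
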
+for i in $(seq 0 $((INITIATORS - 1))); do
+cat <<EOF >> hmat.dsl
+ Initiator Proximity Domain List : $(printf "%08x" "${i}")
+EOF
+done
+
+for i in $(seq 0 $((TARGETS - 1))); do
+cat <<EOF >> hmat.dsl
+Target Proximity Domain List : $(printf "%08x" "${i}")
+EOF
+done
+
+LOCAL_START=0x10
+REMOTE_START=0x20
+for i in $(seq 0 $((INITIATORS - 1))); do
+  for j in $(seq 0 $((TARGETS - 1))); do
+if [ "$((j % INITIATORS))" -eq "${i}" ]; then
+  cat <<EOF >> hmat.dsl
+   Entry : $(printf "%04x" $((LOCAL_START + j * 10)))
+EOF
+else
+  cat <<EOF >> hmat.dsl
+   Entry : $(printf "%04x" $((REMOTE_START + j * 10)))
+EOF
+fi
+  done
+done
+
+# System Locality Bandwidth
+TABLE_SIZE="$(printf "%08x" $((40 + 4 * INITIATORS + 4 * TARGETS + 2 * INITIATORS * TARGETS)))"

[ndctl PATCH v2 06/10] libdaxctl: add an interface to get the mode for a dax device

2019-05-07 Thread Vishal Verma
In preparation for a reconfigure-device command, add an interface to
retrieve the 'mode' of a dax device. This will allow the
reconfigure-device command (and, via daxctl_dev_to_json(), also
daxctl-list) to print the mode on device listings, either via a list
command or immediately after a mode change.

Cc: Dave Hansen 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/lib/libdaxctl.c   | 43 
 daxctl/lib/libdaxctl.sym |  1 +
 daxctl/libdaxctl.h   |  1 +
 util/json.c  | 14 +
 4 files changed, 59 insertions(+)
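
As an illustrative caller of the new interface (hypothetical: the enum
values below are from this series, but the show_mode() wrapper is made up
and assumes the usual stdio/libdaxctl includes):

    static void show_mode(struct daxctl_dev *dev)
    {
            enum daxctl_dev_mode mode = daxctl_dev_get_mode(dev);

            if (mode == DAXCTL_DEV_MODE_RAM)
                    printf("%s: system-ram\n", daxctl_dev_get_devname(dev));
            else if (mode == DAXCTL_DEV_MODE_DEVDAX)
                    printf("%s: devdax\n", daxctl_dev_get_devname(dev));
            else    /* disabled device, wrong subsystem, or error (< 0) */
                    printf("%s: unknown\n", daxctl_dev_get_devname(dev));
    }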

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index aab2364..a4919e0 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -12,6 +12,8 @@
  */
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -840,6 +842,47 @@ DAXCTL_EXPORT int daxctl_dev_disable(struct daxctl_dev *dev)
return 0;
 }
 
+DAXCTL_EXPORT enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev)
+{
+   const char *devname = daxctl_dev_get_devname(dev);
+   struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+   int rc = DAXCTL_DEV_MODE_UNKNOWN;
+   char path[200];
+   const int len = sizeof(path);
+   char *mod_path, *mod_base;
+
+   if (!daxctl_dev_is_enabled(dev))
+   return rc;
+
+   if (ctx->subsys != DAX_BUS) {
+   err(ctx, "%s: invalid operation for dax subsystem\n", devname);
+   err(ctx, "%s:  see daxctl-migrate-device-model(1)\n", devname);
+   return -ENXIO;
+   }
+
+   if (snprintf(path, len, "%s/driver/module", dev->dev_path) >= len) {
+   err(ctx, "%s: buffer too small!\n", devname);
+   return -ENXIO;
+   }
+
+   mod_path = realpath(path, NULL);
+   if (!mod_path) {
+   rc = -errno;
+   err(ctx, "%s:  unable to determine module: %s\n", devname,
+   strerror(errno));
+   return rc;
+   }
+
+   mod_base = basename(mod_path);
+   if (strcmp(mod_base, dax_modules[DAXCTL_DEV_MODE_RAM]) == 0)
+   rc = DAXCTL_DEV_MODE_RAM;
+   else if (strcmp(mod_base, dax_modules[DAXCTL_DEV_MODE_DEVDAX]) == 0)
+   rc = DAXCTL_DEV_MODE_DEVDAX;
+
+   free(mod_path);
+   return rc;
+}
+
 DAXCTL_EXPORT struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev)
 {
return dev->region->ctx;
diff --git a/daxctl/lib/libdaxctl.sym b/daxctl/lib/libdaxctl.sym
index cc47ed6..d53976d 100644
--- a/daxctl/lib/libdaxctl.sym
+++ b/daxctl/lib/libdaxctl.sym
@@ -62,4 +62,5 @@ global:
daxctl_dev_online_node;
daxctl_dev_offline_node;
daxctl_dev_node_is_online;
+   daxctl_dev_get_mode;
 } LIBDAXCTL_5;
diff --git a/daxctl/libdaxctl.h b/daxctl/libdaxctl.h
index db0d4ea..4f9088f 100644
--- a/daxctl/libdaxctl.h
+++ b/daxctl/libdaxctl.h
@@ -76,6 +76,7 @@ int daxctl_dev_enable_ram(struct daxctl_dev *dev);
 int daxctl_dev_online_node(struct daxctl_dev *dev);
 int daxctl_dev_offline_node(struct daxctl_dev *dev);
 int daxctl_dev_node_is_online(struct daxctl_dev *dev);
+enum daxctl_dev_mode daxctl_dev_get_mode(struct daxctl_dev *dev);
 
 #define daxctl_dev_foreach(region, dev) \
 for (dev = daxctl_dev_get_first(region); \
diff --git a/util/json.c b/util/json.c
index b7ce719..4f13222 100644
--- a/util/json.c
+++ b/util/json.c
@@ -271,6 +271,7 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
 {
const char *devname = daxctl_dev_get_devname(dev);
struct json_object *jdev, *jobj;
+   enum daxctl_dev_mode mode;
int node;
 
jdev = json_object_new_object();
@@ -292,6 +293,19 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
json_object_object_add(jdev, "numa_node", jobj);
}
 
+   mode = daxctl_dev_get_mode(dev);
+   if (mode > 0) {
+   jobj = NULL;
+   if (mode == DAXCTL_DEV_MODE_RAM)
+   jobj = json_object_new_string("system-ram");
+   else if (mode == DAXCTL_DEV_MODE_DEVDAX)
+   jobj = json_object_new_string("devdax");
+   else
+   jobj = json_object_new_string("unknown");
+   if (jobj)
+   json_object_object_add(jdev, "mode", jobj);
+   }
+
return jdev;
 }
 
-- 
2.20.1



[ndctl PATCH v2 00/10] daxctl: add a new reconfigure-device command

2019-05-07 Thread Vishal Verma
Changes in v2:
 - Add examples to the documentation page (Dave Hansen)
 - Clarify documentation regarding the conversion from system-ram to devdax
 - Remove any references to a persistent config from the documentation -
   those can be added when the feature is added.
 - device.c: validate option compatibility
 - daxctl-list: display numa_node for device listings
 - daxctl-list: display mode for device listings
 - make the options more consistent by adding a '-O' short option
   for --attempt-offline

Add a new daxctl-reconfigure-device command that lets us reconfigure DAX
devices back and forth between 'system-ram' and 'device-dax' modes. It
also includes facilities to online any newly hot-plugged memory (the
default), and to attempt to offline memory before converting away from
the system-ram mode (not the default; this requires the --attempt-offline
option).

Currently missing from this series is a way to persistently store which
devices have been 'marked' for use as system-ram. This depends on a
config system overhaul in ndctl, and patches for those will follow
separately and are independent of this work.

Example invocations:

1. Reconfigure dax0.0 to system-ram mode, don’t online the memory
# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
[
  {
"chardev":"dax0.0",
"size":16777216000,
"numa_node":2,
"mode":"system-ram"
  }
]

2. Reconfigure dax0.0 to devdax mode, attempt to offline the memory
# daxctl reconfigure-device --human --mode=devdax --attempt-offline dax0.0
{
  "chardev":"dax0.0",
  "size":"15.63 GiB (16.78 GB)",
  "numa_node":2,
  "mode":"devdax"
}

3. Reconfigure all dax devices on region0 to system-ram mode
# daxctl reconfigure-device --mode=system-ram --region=0 all
[
  {
"chardev":"dax0.0",
"size":16777216000,
"numa_node":2,
"mode":"system-ram"
  },
  {
"chardev":"dax0.1",
"size":16777216000,
"numa_node":3,
"mode":"system-ram"
  }
]

These patches can also be found in the 'kmem-pending' branch on github:
https://github.com/pmem/ndctl/tree/kmem-pending

Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Pavel Tatashin 

Vishal Verma (10):
  libdaxctl: add interfaces in support of device modes
  libdaxctl: cache 'subsystem' in daxctl_ctx
  libdaxctl: add interfaces to enable/disable devices
  libdaxctl: add interfaces to get/set the online state for a node
  daxctl/list: add numa_node for device listings
  libdaxctl: add an interface to get the mode for a dax device
  daxctl: add a new reconfigure-device command
  Documentation/daxctl: add a man page for daxctl-reconfigure-device
  contrib/ndctl: fix region-id completions for daxctl
  contrib/ndctl: add bash-completion for daxctl-reconfigure-device

 Documentation/daxctl/Makefile.am  |   3 +-
 .../daxctl/daxctl-reconfigure-device.txt  | 118 
 contrib/ndctl |  34 +-
 daxctl/Makefile.am|   2 +
 daxctl/builtin.h  |   1 +
 daxctl/daxctl.c   |   1 +
 daxctl/device.c   | 237 
 daxctl/lib/Makefile.am|   3 +-
 daxctl/lib/libdaxctl-private.h|  21 +
 daxctl/lib/libdaxctl.c| 554 +-
 daxctl/lib/libdaxctl.sym  |  14 +
 daxctl/libdaxctl.h|  16 +
 util/json.c   |  22 +
 13 files changed, 1015 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt
 create mode 100644 daxctl/device.c

-- 
2.20.1



[ndctl PATCH v2 01/10] libdaxctl: add interfaces in support of device modes

2019-05-07 Thread Vishal Verma
In preparation for libdaxctl and daxctl to grow operational modes for
DAX devices, add the following supporting APIs:

  daxctl_dev_get_ctx
  daxctl_dev_is_enabled

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/lib/libdaxctl.c   | 30 ++
 daxctl/lib/libdaxctl.sym |  6 ++
 daxctl/libdaxctl.h   |  2 ++
 3 files changed, 38 insertions(+)

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index c2e3a52..70f896b 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -559,6 +559,36 @@ static void dax_regions_init(struct daxctl_ctx *ctx)
}
 }
 
+static int is_enabled(const char *drvpath)
+{
+   struct stat st;
+
+   if (lstat(drvpath, &st) < 0 || !S_ISLNK(st.st_mode))
+   return 0;
+   else
+   return 1;
+}
+
+DAXCTL_EXPORT int daxctl_dev_is_enabled(struct daxctl_dev *dev)
+{
+   struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+   char *path = dev->dev_buf;
+   int len = dev->buf_len;
+
+   if (snprintf(path, len, "%s/driver", dev->dev_path) >= len) {
+   err(ctx, "%s: buffer too small!\n",
+   daxctl_dev_get_devname(dev));
+   return 0;
+   }
+
+   return is_enabled(path);
+}
+
+DAXCTL_EXPORT struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev)
+{
+   return dev->region->ctx;
+}
+
DAXCTL_EXPORT struct daxctl_dev *daxctl_dev_get_first(struct daxctl_region *region)
 {
dax_devices_init(region);
diff --git a/daxctl/lib/libdaxctl.sym b/daxctl/lib/libdaxctl.sym
index 84d3a69..c4af9a7 100644
--- a/daxctl/lib/libdaxctl.sym
+++ b/daxctl/lib/libdaxctl.sym
@@ -50,3 +50,9 @@ LIBDAXCTL_5 {
 global:
daxctl_region_get_path;
 } LIBDAXCTL_4;
+
+LIBDAXCTL_6 {
+global:
+   daxctl_dev_get_ctx;
+   daxctl_dev_is_enabled;
+} LIBDAXCTL_5;
diff --git a/daxctl/libdaxctl.h b/daxctl/libdaxctl.h
index 1d13ea2..e20ccb4 100644
--- a/daxctl/libdaxctl.h
+++ b/daxctl/libdaxctl.h
@@ -67,6 +67,8 @@ const char *daxctl_dev_get_devname(struct daxctl_dev *dev);
 int daxctl_dev_get_major(struct daxctl_dev *dev);
 int daxctl_dev_get_minor(struct daxctl_dev *dev);
 unsigned long long daxctl_dev_get_size(struct daxctl_dev *dev);
+struct daxctl_ctx *daxctl_dev_get_ctx(struct daxctl_dev *dev);
+int daxctl_dev_is_enabled(struct daxctl_dev *dev);
 
 #define daxctl_dev_foreach(region, dev) \
 for (dev = daxctl_dev_get_first(region); \
-- 
2.20.1



[ndctl PATCH v2 10/10] contrib/ndctl: add bash-completion for daxctl-reconfigure-device

2019-05-07 Thread Vishal Verma
Add bash completion helpers for the new daxctl-reconfigure-device
command.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 contrib/ndctl | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/contrib/ndctl b/contrib/ndctl
index d1f8bd6..32c4731 100755
--- a/contrib/ndctl
+++ b/contrib/ndctl
@@ -544,7 +544,7 @@ __daxctlcomp()
 
COMPREPLY=( $( compgen -W "$1" -- "$2" ) )
for cword in "${COMPREPLY[@]}"; do
-   if [[ "$cword" == @(--region|--dev) ]]; then
+   if [[ "$cword" == @(--region|--dev|--mode) ]]; then
COMPREPLY[$i]="${cword}="
else
COMPREPLY[$i]="${cword} "
@@ -569,6 +569,9 @@ __daxctl_comp_options()
--dev)
opts="$(__daxctl_get_devs -i)"
;;
+   --mode)
+   opts="system-ram devdax"
+   ;;
*)
return
;;
@@ -579,8 +582,19 @@ __daxctl_comp_options()
 
 __daxctl_comp_non_option_args()
 {
-   # there aren't any commands that accept non option arguments yet
-   return
+   local subcmd=$1
+   local cur=$2
+   local opts
+
+   case $subcmd in
+   reconfigure-device)
+   opts="$(__daxctl_get_devs -i) all"
+   ;;
+   *)
+   return
+   ;;
+   esac
+   __daxctlcomp "$opts" "$cur"
 }
 
 __daxctl_main()
-- 
2.20.1



[ndctl PATCH v2 04/10] libdaxctl: add interfaces to get/set the online state for a node

2019-05-07 Thread Vishal Verma
In preparation for converting DAX devices for use as system-ram via the
kernel's 'kmem' facility, add libdaxctl helpers to manipulate the sysfs
interfaces to get the target_node of a DAX device, and to online,
offline, and query the state of hotplugged memory sections associated
with a given node.

This adds the following new interfaces:

  daxctl_dev_get_target_node
  daxctl_dev_online_node
  daxctl_dev_offline_node
  daxctl_dev_node_is_online

Cc: Pavel Tatashin 
Cc: Dave Hansen 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/lib/libdaxctl-private.h |   6 +
 daxctl/lib/libdaxctl.c | 228 +
 daxctl/lib/libdaxctl.sym   |   4 +
 daxctl/libdaxctl.h |   4 +
 4 files changed, 242 insertions(+)
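
(For reference, the online/offline helpers below work by walking the
memoryNNN links under the device's target node and writing to their
'state' attributes: conceptually, for a hypothetical node 2 and memory
block 81, the equivalent of
echo online_movable > /sys/devices/system/node/node2/memory81/state.)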

diff --git a/daxctl/lib/libdaxctl-private.h b/daxctl/lib/libdaxctl-private.h
index e64d0a7..ef443aa 100644
--- a/daxctl/lib/libdaxctl-private.h
+++ b/daxctl/lib/libdaxctl-private.h
@@ -33,6 +33,11 @@ static const char *dax_modules[] = {
[DAXCTL_DEV_MODE_RAM] = "kmem",
 };
 
+enum node_state {
+   NODE_OFFLINE,
+   NODE_ONLINE,
+};
+
 /**
  * struct daxctl_region - container for dax_devices
  */
@@ -63,6 +68,7 @@ struct daxctl_dev {
struct kmod_module *module;
struct kmod_list *kmod_list;
struct daxctl_region *region;
+   int target_node;
 };
 
 static inline int check_kmod(struct kmod_ctx *kmod_ctx)
diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index d50b321..aab2364 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -28,6 +28,7 @@
 #include "libdaxctl-private.h"
 
 static const char *attrs = "dax_region";
+static const char *node_base = "/sys/devices/system/node";
 
 static void free_region(struct daxctl_region *region, struct list_head *head);
 
@@ -397,6 +398,12 @@ static void *add_dax_dev(void *parent, int id, const char *daxdev_base)
} else
dbg(ctx, "%s: modalias attribute missing\n", devname);
 
+   sprintf(path, "%s/target_node", daxdev_base);
+   if (sysfs_read_attr(ctx, path, buf) == 0)
+   dev->target_node = strtol(buf, NULL, 0);
+   else
+   dev->target_node = -1;
+
daxctl_dev_foreach(region, dev_dup)
if (dev_dup->id == dev->id) {
free_dev(dev, NULL);
@@ -897,3 +904,224 @@ DAXCTL_EXPORT unsigned long long daxctl_dev_get_size(struct daxctl_dev *dev)
 {
return dev->size;
 }
+
+DAXCTL_EXPORT int daxctl_dev_get_target_node(struct daxctl_dev *dev)
+{
+   return dev->target_node;
+}
+
+static int online_one_memblock(struct daxctl_dev *dev, char *path)
+{
+   const char *devname = daxctl_dev_get_devname(dev);
+   struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+   const char *mode = "online_movable";
+   char buf[SYSFS_ATTR_SIZE];
+   int rc;
+
+   rc = sysfs_read_attr(ctx, path, buf);
+   if (rc) {
+   err(ctx, "%s: Failed to read %s: %s\n",
+   devname, path, strerror(-rc));
+   return rc;
+   }
+
+   /*
+* if already online, possibly due to kernel config or a udev rule,
+* there is nothing to do and we can skip over the memblock
+*/
+   if (strncmp(buf, "online", 6) == 0)
+   return 0;
+
+   rc = sysfs_write_attr_quiet(ctx, path, mode);
+   if (rc)
+   err(ctx, "%s: Failed to online %s: %s\n",
+   devname, path, strerror(-rc));
+   return rc;
+}
+
+static int offline_one_memblock(struct daxctl_dev *dev, char *path)
+{
+   const char *devname = daxctl_dev_get_devname(dev);
+   struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+   const char *mode = "offline";
+   char buf[SYSFS_ATTR_SIZE];
+   int rc;
+
+   rc = sysfs_read_attr(ctx, path, buf);
+   if (rc) {
+   err(ctx, "%s: Failed to read %s: %s\n",
+   devname, path, strerror(-rc));
+   return rc;
+   }
+
+   /* if already offline, there is nothing to do */
+   if (strncmp(buf, "offline", 6) == 0)
+   return 0;
+
+   rc = sysfs_write_attr_quiet(ctx, path, mode);
+   if (rc)
+   err(ctx, "%s: Failed to offline %s: %s\n",
+   devname, path, strerror(-rc));
+   return rc;
+}
+
+static int daxctl_dev_node_set_state(struct daxctl_dev *dev,
+   enum node_state state)
+{
+   struct daxctl_ctx *ctx = daxctl_dev_get_ctx(dev);
+   int target_node, rc, changed = 0;
+   struct dirent *de;
+   char *node_path;
+   DIR *node_dir;
+
+   target_node = daxctl_dev_get_target_node(dev);
+   if (target_node < 0) {
+   err(ctx, "%s: Unable to get target node\n",
+   daxctl_dev_get_devname(dev));
+   return -ENXIO;
+   }
+
+   rc = asprintf(&node_path, "%s/node%d", node_base, target_node);
+   if (rc < 0)
+   return -ENOMEM;

[ndctl PATCH v2 02/10] libdaxctl: cache 'subsystem' in daxctl_ctx

2019-05-07 Thread Vishal Verma
The 'DAX subsystem' in effect is determined at region or device init
time, and dictates the sysfs base paths for all device/region
operations. In preparation for adding bind/unbind functionality, cache
the subsystem as determined at init time in the library context.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/lib/libdaxctl.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index 70f896b..f8f5b8c 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -46,6 +46,7 @@ struct daxctl_ctx {
void *userdata;
int regions_init;
struct list_head regions;
+   enum dax_subsystem subsys;
 };
 
 /**
@@ -96,6 +97,7 @@ DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx)
dbg(c, "log_priority=%d\n", c->ctx.log_priority);
*ctx = c;
list_head_init(&c->regions);
+   c->subsys = DAX_UNKNOWN;
 
return 0;
 }
@@ -454,14 +456,18 @@ static void dax_devices_init(struct daxctl_region *region)
for (i = 0; i < ARRAY_SIZE(dax_subsystems); i++) {
char *region_path;
 
-   if (i == DAX_BUS)
+   if (i == DAX_BUS) {
region_path = region->region_path;
-   else if (i == DAX_CLASS) {
+   if (ctx->subsys == DAX_UNKNOWN)
+   ctx->subsys = DAX_BUS;
+   } else if (i == DAX_CLASS) {
if (asprintf(_path, "%s/dax",
region->region_path) < 0) {
dbg(ctx, "region path alloc fail\n");
continue;
}
+   if (ctx->subsys == DAX_UNKNOWN)
+   ctx->subsys = DAX_CLASS;
} else
continue;
sysfs_device_parse(ctx, region_path, daxdev_fmt, region,
@@ -539,6 +545,8 @@ static void __dax_regions_init(struct daxctl_ctx *ctx, enum dax_subsystem subsys
free(dev_path);
if (!region)
err(ctx, "add_dax_region() for %s failed\n", 
de->d_name);
+   if (ctx->subsys == DAX_UNKNOWN)
+   ctx->subsys = subsys;
}
closedir(dir);
 }
-- 
2.20.1



[ndctl PATCH v2 07/10] daxctl: add a new reconfigure-device command

2019-05-07 Thread Vishal Verma
Add a new command 'daxctl-reconfigure-device'. This is used to switch
the mode of a dax device between the regular 'devdax' mode and
'system-ram'. The command also uses the memory hotplug sysfs
interfaces to online the newly available memory when converting to
'system-ram', and to attempt to offline the memory when converting back
to a DAX device.

Cc: Pavel Tatashin 
Cc: Dave Hansen 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/Makefile.am |   2 +
 daxctl/builtin.h   |   1 +
 daxctl/daxctl.c|   1 +
 daxctl/device.c| 237 +
 4 files changed, 241 insertions(+)
 create mode 100644 daxctl/device.c

diff --git a/daxctl/Makefile.am b/daxctl/Makefile.am
index 94f73f9..66dcc7f 100644
--- a/daxctl/Makefile.am
+++ b/daxctl/Makefile.am
@@ -15,10 +15,12 @@ daxctl_SOURCES =\
daxctl.c \
list.c \
migrate.c \
+   device.c \
../util/json.c
 
 daxctl_LDADD =\
lib/libdaxctl.la \
../libutil.a \
$(UUID_LIBS) \
+   $(KMOD_LIBS) \
$(JSON_LIBS)
diff --git a/daxctl/builtin.h b/daxctl/builtin.h
index 00ef5e9..756ba2a 100644
--- a/daxctl/builtin.h
+++ b/daxctl/builtin.h
@@ -6,4 +6,5 @@
 struct daxctl_ctx;
 int cmd_list(int argc, const char **argv, struct daxctl_ctx *ctx);
 int cmd_migrate(int argc, const char **argv, struct daxctl_ctx *ctx);
+int cmd_reconfig_device(int argc, const char **argv, struct daxctl_ctx *ctx);
 #endif /* _DAXCTL_BUILTIN_H_ */
diff --git a/daxctl/daxctl.c b/daxctl/daxctl.c
index 2e41747..e1ba7b8 100644
--- a/daxctl/daxctl.c
+++ b/daxctl/daxctl.c
@@ -71,6 +71,7 @@ static struct cmd_struct commands[] = {
{ "list", .d_fn = cmd_list },
{ "help", .d_fn = cmd_help },
{ "migrate-device-model", .d_fn = cmd_migrate },
+   { "reconfigure-device", .d_fn = cmd_reconfig_device },
 };
 
 int main(int argc, const char **argv)
diff --git a/daxctl/device.c b/daxctl/device.c
new file mode 100644
index 000..12644c5
--- /dev/null
+++ b/daxctl/device.c
@@ -0,0 +1,237 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2019 Intel Corporation. All rights reserved. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct {
+   const char *dev;
+   const char *mode;
+   int region_id;
+   bool no_online;
+   bool do_offline;
+   bool human;
+   bool verbose;
+} param = {
+   .region_id = -1,
+};
+
+static int dev_disable(struct daxctl_dev *dev)
+{
+   int rc;
+
+   if (!daxctl_dev_is_enabled(dev))
+   return 0;
+
+   rc = daxctl_dev_disable(dev);
+   if (rc)
+   fprintf(stderr, "%s: disable failed: %s\n",
+   daxctl_dev_get_devname(dev), strerror(-rc));
+
+   return rc;
+}
+
+static int reconfig_mode_ram(struct daxctl_dev *dev)
+{
+   const char *devname = daxctl_dev_get_devname(dev);
+   int rc;
+
+   rc = dev_disable(dev);
+   if (rc)
+   return rc;
+   rc = daxctl_dev_enable_ram(dev);
+   if (rc)
+   return rc;
+
+   if (param.no_online)
+   return 0;
+
+   rc = daxctl_dev_online_node(dev);
+   if (rc < 0) {
+   fprintf(stderr, "%s: unable to online memory: %s\n",
+   devname, strerror(-rc));
+   return rc;
+   }
+   if (param.verbose)
+   fprintf(stderr, "%s: onlined %d memory sections\n",
+   devname, rc);
+
+   return 0;
+}
+
+static int reconfig_mode_devdax(struct daxctl_dev *dev)
+{
+   const char *devname = daxctl_dev_get_devname(dev);
+   int rc;
+
+   if (param.do_offline) {
+   rc = daxctl_dev_offline_node(dev);
+   if (rc < 0) {
+   fprintf(stderr, "%s: unable to offline memory: %s\n",
+   devname, strerror(-rc));
+   return rc;
+   }
+   if (param.verbose)
+   fprintf(stderr, "%s: offlined %d memory sections\n",
+   devname, rc);
+   }
+
+   rc = daxctl_dev_node_is_online(dev);
+   if (rc < 0) {
+   fprintf(stderr, "%s: unable to determine node state: %s\n",
+   devname, strerror(-rc));
+   return rc;
+   }
+   if (rc > 0) {
+   if (param.verbose) {
+   fprintf(stderr, "%s: found %d memory sections online\n",
+   devname, rc);
+   fprintf(stderr, "%s: refusing to change modes\n",
+   devname);
+   }
+   return -EBUSY;
+   }
+
+   rc = dev_disable(dev);
+   if (rc)
+   return rc;
+
+   rc = daxctl_dev_enable_devdax(dev);
+   if (rc)
+   return rc;
+
+   return 0;

[ndctl PATCH v2 08/10] Documentation/daxctl: add a man page for daxctl-reconfigure-device

2019-05-07 Thread Vishal Verma
Add a man page describing the new daxctl-reconfigure-device command.

Cc: Pavel Tatashin 
Cc: Dave Hansen 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 Documentation/daxctl/Makefile.am  |   3 +-
 .../daxctl/daxctl-reconfigure-device.txt  | 118 ++
 2 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt

diff --git a/Documentation/daxctl/Makefile.am b/Documentation/daxctl/Makefile.am
index 6aba035..715fbad 100644
--- a/Documentation/daxctl/Makefile.am
+++ b/Documentation/daxctl/Makefile.am
@@ -28,7 +28,8 @@ endif
 man1_MANS = \
daxctl.1 \
daxctl-list.1 \
-   daxctl-migrate-device-model.1
+   daxctl-migrate-device-model.1 \
+   daxctl-reconfigure-device.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/daxctl/daxctl-reconfigure-device.txt b/Documentation/daxctl/daxctl-reconfigure-device.txt
new file mode 100644
index 000..b575091
--- /dev/null
+++ b/Documentation/daxctl/daxctl-reconfigure-device.txt
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+
+daxctl-reconfigure-device(1)
+
+
+NAME
+
+daxctl-reconfigure-device - Reconfigure a dax device into a different mode
+
+SYNOPSIS
+
+[verse]
+'daxctl reconfigure-device'  [...] []
+
+EXAMPLES
+
+
+* Reconfigure dax0.0 to system-ram mode, don't online the memory
+
+# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
+[
+  {
+"chardev":"dax0.0",
+"size":16777216000,
+"numa_node":2,
+"mode":"system-ram"
+  }
+]
+
+
+* Reconfigure dax0.0 to devdax mode, attempt to offline the memory
+
+# daxctl reconfigure-device --human --mode=devdax --attempt-offline dax0.0
+{
+  "chardev":"dax0.0",
+  "size":"15.63 GiB (16.78 GB)",
+  "numa_node":2,
+  "mode":"devdax"
+}
+
+
+* Reconfigure all dax devices on region0 to system-ram mode
+
+# daxctl reconfigure-device --mode=system-ram --region=0 all
+[
+  {
+"chardev":"dax0.0",
+"size":16777216000,
+"numa_node":2,
+"mode":"system-ram"
+  },
+  {
+"chardev":"dax0.1",
+"size":16777216000,
+"numa_node":3,
+"mode":"system-ram"
+  }
+]
+
+
+DESCRIPTION
+---
+
+Reconfigure the operational mode of a dax device. This can be used to convert
+a regular 'devdax' mode device to the 'system-ram' mode which allows for the
+dax range to be hot-plugged into the system as regular memory.
+
+NOTE: This is a destructive operation. Any data on the dax device *will* be
+lost.
+
+OPTIONS
+---
+-r::
+--region=::
+   Restrict the reconfigure operation to devices belonging to the
+   specified region(s). A device-dax region is a contiguous range of
+   memory that hosts one or more /dev/daxX.Y devices, where X is the
+   region id and Y is the device instance id.
+
+-m::
+--mode=::
+   Specify the mode to which the dax device(s) should be reconfigured.
+   - "system-ram": hotplug the device into system memory.
+
+   - "devdax": switch to the normal "device dax" mode. This may not work
+ on kernels prior to v5.2. In such a case, a reboot is the only way to
+ switch back to 'devdax' mode.
+
+-N::
+--no-online::
+   By default, memory sections provided by system-ram devices will be
+   brought online automatically and immediately. Use this option to
+   disable the automatic onlining behavior.
+
+-O::
+--attempt-offline::
+   When converting from "system-ram" mode to "devdax", it is expected
+   that all the memory sections are first made offline. By default,
+   daxctl won't touch online memory. However with this option, attempt
+   to offline the memory on the NUMA node associated with the dax device
+   before converting it back to "devdax" mode.
+
+-u::
+--human::
+   By default the command will output machine-friendly raw-integer
+   data. Instead, with this flag, numbers representing storage size
+   will be formatted as human readable strings with units, other
+   fields are converted to hexadecimal strings.
+
+-v::
+--verbose::
+   Emit more debug messages for the reconfiguration process
+
+include::../copyright.txt[]
+
+SEE ALSO
+
+linkdaxctl:daxctl-list[1]
-- 
2.20.1



[ndctl PATCH v2 09/10] contrib/ndctl: fix region-id completions for daxctl

2019-05-07 Thread Vishal Verma
The completion helpers for daxctl assumed the region arguments for
specifying daxctl regions were the same as ndctl regions, i.e.
"regionX". This is not true - daxctl region arguments are a simple
numeric 'id'.

Add a new helper __daxctl_get_regions() to complete daxctl region IDs
properly.

While at it, fix a useless use of 'echo' in __daxctl_get_devs() and the
quoting in __daxctl_comp_options().

Fixes: d6790a32f32c ("daxctl: Add bash-completion")
Signed-off-by: Vishal Verma 
---
 contrib/ndctl | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/contrib/ndctl b/contrib/ndctl
index e17fb0b..d1f8bd6 100755
--- a/contrib/ndctl
+++ b/contrib/ndctl
@@ -528,8 +528,14 @@ _ndctl()
 
 __daxctl_get_devs()
 {
-   local opts="--devices $*"
-   echo "$(daxctl list $opts | grep -E "^\s*\"chardev\":" | cut -d\" -f4)"
+   local opts=("--devices" "$*")
+   daxctl list "${opts[@]}" | grep -E "^\s*\"chardev\":" | cut -d'"' -f4
+}
+
+__daxctl_get_regions()
+{
+   local opts=("--regions" "$*")
+   daxctl list "${opts[@]}" | grep -E "^\s*\"id\":" | grep -Eo "[0-9]+"
 }
 
 __daxctlcomp()
@@ -558,10 +564,10 @@ __daxctl_comp_options()
local cur_arg=${cur##*=}
case $cur_subopt in
--region)
-   opts=$(__ndctl_get_regions -i)
+   opts="$(__daxctl_get_regions -i)"
;;
--dev)
-   opts=$(__daxctl_get_devs -i)
+   opts="$(__daxctl_get_devs -i)"
;;
*)
return
-- 
2.20.1



[ndctl PATCH v2 03/10] libdaxctl: add interfaces to enable/disable devices

2019-05-07 Thread Vishal Verma
Add new libdaxctl interfaces to disable a device_dax based device, and
to enable it in a given mode. The available modes are 'device_dax' and
'system-ram': device_dax is the normal device-DAX mode used via a
character device, and 'system-ram' uses the kernel's 'kmem' facility to
hotplug the device into the system's memory space, where it can be used
as normal system memory.

This adds the following new interfaces:

  daxctl_dev_disable;
  daxctl_dev_enable_devdax;
  daxctl_dev_enable_ram;

Cc: Dave Hansen 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 daxctl/lib/Makefile.am |   3 +-
 daxctl/lib/libdaxctl-private.h |  15 ++
 daxctl/lib/libdaxctl.c | 241 +
 daxctl/lib/libdaxctl.sym   |   3 +
 daxctl/libdaxctl.h |   9 ++
 5 files changed, 270 insertions(+), 1 deletion(-)
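
(In broad strokes, "enabling" a device in a given mode below means making
sure the corresponding module, device_dax or kmem, is loaded via libkmod,
and then rebinding the device to that driver through the dax bus in sysfs;
conceptually the manual equivalent of unbinding a hypothetical dax0.0 from
/sys/bus/dax/drivers/device_dax and binding it to the kmem driver.)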

diff --git a/daxctl/lib/Makefile.am b/daxctl/lib/Makefile.am
index d3d4852..9f0e444 100644
--- a/daxctl/lib/Makefile.am
+++ b/daxctl/lib/Makefile.am
@@ -16,7 +16,8 @@ libdaxctl_la_SOURCES =\
libdaxctl.c
 
 libdaxctl_la_LIBADD =\
-   $(UUID_LIBS)
+   $(UUID_LIBS) \
+   $(KMOD_LIBS)
 
 daxctl_modprobe_data_DATA = daxctl.conf
 
diff --git a/daxctl/lib/libdaxctl-private.h b/daxctl/lib/libdaxctl-private.h
index 4a462e7..e64d0a7 100644
--- a/daxctl/lib/libdaxctl-private.h
+++ b/daxctl/lib/libdaxctl-private.h
@@ -13,6 +13,8 @@
 #ifndef _LIBDAXCTL_PRIVATE_H_
 #define _LIBDAXCTL_PRIVATE_H_
 
+#include 
+
 #define DAXCTL_EXPORT __attribute__ ((visibility("default")))
 
 enum dax_subsystem {
@@ -26,6 +28,11 @@ static const char *dax_subsystems[] = {
[DAX_BUS] = "/sys/bus/dax/devices",
 };
 
+static const char *dax_modules[] = {
+   [DAXCTL_DEV_MODE_DEVDAX] = "device_dax",
+   [DAXCTL_DEV_MODE_RAM] = "kmem",
+};
+
 /**
  * struct daxctl_region - container for dax_devices
  */
@@ -53,6 +60,14 @@ struct daxctl_dev {
char *dev_path;
struct list_node list;
unsigned long long size;
+   struct kmod_module *module;
+   struct kmod_list *kmod_list;
struct daxctl_region *region;
 };
+
+static inline int check_kmod(struct kmod_ctx *kmod_ctx)
+{
+   return kmod_ctx ? 0 : -ENXIO;
+}
+
 #endif /* _LIBDAXCTL_PRIVATE_H_ */
diff --git a/daxctl/lib/libdaxctl.c b/daxctl/lib/libdaxctl.c
index f8f5b8c..d50b321 100644
--- a/daxctl/lib/libdaxctl.c
+++ b/daxctl/lib/libdaxctl.c
@@ -47,6 +47,7 @@ struct daxctl_ctx {
int regions_init;
struct list_head regions;
enum dax_subsystem subsys;
+   struct kmod_ctx *kmod_ctx;
 };
 
 /**
@@ -85,12 +86,20 @@ DAXCTL_EXPORT void daxctl_set_userdata(struct daxctl_ctx *ctx, void *userdata)
  */
 DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx)
 {
+   struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *c;
+   int rc = 0;
 
c = calloc(1, sizeof(struct daxctl_ctx));
if (!c)
return -ENOMEM;
 
+   kmod_ctx = kmod_new(NULL, NULL);
+   if (check_kmod(kmod_ctx) != 0) {
+   rc = -ENXIO;
+   goto out;
+   }
+
c->refcount = 1;
log_init(&c->ctx, "libdaxctl", "DAXCTL_LOG");
info(c, "ctx %p created\n", c);
@@ -98,8 +107,12 @@ DAXCTL_EXPORT int daxctl_new(struct daxctl_ctx **ctx)
*ctx = c;
list_head_init(&c->regions);
c->subsys = DAX_UNKNOWN;
+   c->kmod_ctx = kmod_ctx;
 
return 0;
+out:
+   free(c);
+   return rc;
 }
 
 /**
@@ -134,6 +147,7 @@ DAXCTL_EXPORT void daxctl_unref(struct daxctl_ctx *ctx)
list_for_each_safe(&ctx->regions, region, _r, list)
free_region(region, >regions);
 
+   kmod_unref(ctx->kmod_ctx);
info(ctx, "context %p released\n", ctx);
free(ctx);
 }
@@ -191,6 +205,7 @@ static void free_dev(struct daxctl_dev *dev, struct list_head *head)
 {
if (head)
list_del_from(head, >list);
+   kmod_module_unref_list(dev->kmod_list);
free(dev->dev_buf);
free(dev->dev_path);
free(dev);
@@ -308,6 +323,27 @@ DAXCTL_EXPORT struct daxctl_region *daxctl_new_region(struct daxctl_ctx *ctx,
return region;
 }
 
+static struct kmod_list *to_module_list(struct daxctl_ctx *ctx,
+   const char *alias)
+{
+   struct kmod_list *list = NULL;
+   int rc;
+
+   if (!ctx->kmod_ctx || !alias)
+   return NULL;
+   if (alias[0] == 0)
+   return NULL;
+
+   rc = kmod_module_new_from_lookup(ctx->kmod_ctx, alias, &list);
+   if (rc < 0 || !list) {
+   dbg(ctx, "failed to find modules for alias: %s %d list: %s\n",
+   alias, rc, list ? "populated" : "empty");
+   return NULL;
+   }
+
+   return list;
+}
+
 static void *add_dax_dev(void *parent, int id, const char *daxdev_base)
 {
const char *devname = devpath_to_devname(daxdev_base);
@@ -317,6 +353,7 @@ static void *add_dax_dev(void *parent, int id, const char *daxdev_base)
*daxdev_base)

[ndctl PATCH v2 05/10] daxctl/list: add numa_node for device listings

2019-05-07 Thread Vishal Verma
The kernel provides a 'target_node' attribute for dax devices. When
converting a dax device to the system-ram mode, the memory is hotplugged
into this numa node. It would be helpful to print this in device
listings so that it is easy for applications to detect the numa node to
which the new memory belongs.

Cc: Dan Williams 
Cc: Dave Hansen 
Signed-off-by: Vishal Verma 
---
 util/json.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/util/json.c b/util/json.c
index babdc8c..b7ce719 100644
--- a/util/json.c
+++ b/util/json.c
@@ -271,6 +271,7 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
 {
const char *devname = daxctl_dev_get_devname(dev);
struct json_object *jdev, *jobj;
+   int node;
 
jdev = json_object_new_object();
if (!devname || !jdev)
@@ -284,6 +285,13 @@ struct json_object *util_daxctl_dev_to_json(struct daxctl_dev *dev,
if (jobj)
json_object_object_add(jdev, "size", jobj);
 
+   node = daxctl_dev_get_target_node(dev);
+   if (node >= 0) {
+   jobj = json_object_new_int(node);
+   if (jobj)
+   json_object_object_add(jdev, "numa_node", jobj);
+   }
+
return jdev;
 }
 
-- 
2.20.1



[PATCH v2 6/6] mm/devm_memremap_pages: Fix final page put race

2019-05-07 Thread Dan Williams
Logan noticed that devm_memremap_pages_release() kills the percpu_ref,
drops all the page references that were acquired at init, and then
immediately proceeds to unplug, via arch_remove_memory(), the backing
pages for the pagemap. If for some reason device shutdown actually
collides with a busy / elevated-ref-count page then arch_remove_memory()
should be deferred until after that reference is dropped.

As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.

Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback. Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.

Reported-by: Logan Gunthorpe 
Fixes: 41e94a851304 ("add devm_memremap_pages")
Cc: Bjorn Helgaas 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Reviewed-by: Ira Weiny 
Signed-off-by: Dan Williams 
---
 drivers/dax/device.c  |   13 +++--
 drivers/nvdimm/pmem.c |   17 +
 drivers/pci/p2pdma.c  |   17 +++--
 include/linux/memremap.h  |2 ++
 kernel/memremap.c |   17 -
 mm/hmm.c  |   14 +++---
 tools/testing/nvdimm/test/iomap.c |2 ++
 7 files changed, 38 insertions(+), 44 deletions(-)
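
The resulting contract for devm_memremap_pages() users, as a minimal
sketch; the names here are invented for illustration, and each driver's
actual callbacks are in the hunks below:

    static DECLARE_COMPLETION(my_pgmap_cmp);

    static void my_ref_release(struct percpu_ref *ref)
    {
            complete(&my_pgmap_cmp);                   /* final page put */
    }

    static void my_ref_kill(struct percpu_ref *ref)    /* pgmap->kill */
    {
            percpu_ref_kill(ref);                      /* drop init refs */
    }

    static void my_ref_cleanup(struct percpu_ref *ref) /* pgmap->cleanup */
    {
            /* runs in devm_memremap_pages_release(), before unplug */
            wait_for_completion(&my_pgmap_cmp);
            percpu_ref_exit(ref);
    }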

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index e428468ab661..e3aa78dd1bb0 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -27,9 +27,8 @@ static void dev_dax_percpu_release(struct percpu_ref *ref)
complete(&dev_dax->cmp);
 }
 
-static void dev_dax_percpu_exit(void *data)
+static void dev_dax_percpu_exit(struct percpu_ref *ref)
 {
-   struct percpu_ref *ref = data;
struct dev_dax *dev_dax = ref_to_dev_dax(ref);
 
dev_dbg(_dax->dev, "%s\n", __func__);
@@ -468,18 +467,12 @@ int dev_dax_probe(struct device *dev)
if (rc)
return rc;
 
-   rc = devm_add_action_or_reset(dev, dev_dax_percpu_exit, &dev_dax->ref);
-   if (rc)
-   return rc;
-
dev_dax->pgmap.ref = &dev_dax->ref;
dev_dax->pgmap.kill = dev_dax_percpu_kill;
+   dev_dax->pgmap.cleanup = dev_dax_percpu_exit;
addr = devm_memremap_pages(dev, _dax->pgmap);
-   if (IS_ERR(addr)) {
-   devm_remove_action(dev, dev_dax_percpu_exit, &dev_dax->ref);
-   percpu_ref_exit(&dev_dax->ref);
+   if (IS_ERR(addr))
return PTR_ERR(addr);
-   }
 
inode = dax_inode(dax_dev);
cdev = inode->i_cdev;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0279eb1da3ef..1c9181712fa4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -304,11 +304,19 @@ static const struct attribute_group *pmem_attribute_groups[] = {
NULL,
 };
 
-static void pmem_release_queue(void *q)
+static void __pmem_release_queue(struct percpu_ref *ref)
 {
+   struct request_queue *q;
+
+   q = container_of(ref, typeof(*q), q_usage_counter);
blk_cleanup_queue(q);
 }
 
+static void pmem_release_queue(void *ref)
+{
+   __pmem_release_queue(ref);
+}
+
 static void pmem_freeze_queue(struct percpu_ref *ref)
 {
struct request_queue *q;
@@ -400,12 +408,10 @@ static int pmem_attach_disk(struct device *dev,
if (!q)
return -ENOMEM;
 
-   if (devm_add_action_or_reset(dev, pmem_release_queue, q))
-   return -ENOMEM;
-
pmem->pfn_flags = PFN_DEV;
pmem->pgmap.ref = &q->q_usage_counter;
pmem->pgmap.kill = pmem_freeze_queue;
+   pmem->pgmap.cleanup = __pmem_release_queue;
if (is_nd_pfn(dev)) {
if (setup_pagemap_fsdax(dev, &pmem->pgmap))
return -ENOMEM;
@@ -426,6 +432,9 @@ static int pmem_attach_disk(struct device *dev,
pmem->pfn_flags |= PFN_MAP;
memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res));
} else {
+   if (devm_add_action_or_reset(dev, pmem_release_queue,
+   &q->q_usage_counter))
+   return -ENOMEM;
addr = devm_memremap(dev, pmem->phys_addr,
pmem->size, ARCH_MEMREMAP_PMEM);
memcpy(&bb_res, &pmem->res, sizeof(bb_res));
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 54d475569058..a7a66b958720 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -95,7 +95,7 @@ static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
percpu_ref_kill(ref);
 }
 
-static void pci_p2pdma_percpu_cleanup(void *ref)
+static void pci_p2pdma_percpu_cleanup(struct percpu_ref *ref)
 {
struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
 
@@ -198,16 +198,6 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
if (error)
goto pgmap_free;
 
-   /*
-

[PATCH v2 5/6] PCI/P2PDMA: Track pgmap references per resource, not globally

2019-05-07 Thread Dan Williams
In preparation for fixing a race between devm_memremap_pages_release()
and the final put of a page from the device-page-map, allocate a
percpu-ref per p2pdma resource mapping.

Cc: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Reviewed-by: Ira Weiny 
Signed-off-by: Dan Williams 
---
 drivers/pci/p2pdma.c |  124 +-
 1 file changed, 81 insertions(+), 43 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 595a534bd749..54d475569058 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,12 +20,16 @@
 #include 
 
 struct pci_p2pdma {
-   struct percpu_ref devmap_ref;
-   struct completion devmap_ref_done;
struct gen_pool *pool;
bool p2pmem_published;
 };
 
+struct p2pdma_pagemap {
+   struct dev_pagemap pgmap;
+   struct percpu_ref ref;
+   struct completion ref_done;
+};
+
 static ssize_t size_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
@@ -74,41 +78,45 @@ static const struct attribute_group p2pmem_group = {
.name = "p2pmem",
 };
 
+static struct p2pdma_pagemap *to_p2p_pgmap(struct percpu_ref *ref)
+{
+   return container_of(ref, struct p2pdma_pagemap, ref);
+}
+
 static void pci_p2pdma_percpu_release(struct percpu_ref *ref)
 {
-   struct pci_p2pdma *p2p =
-   container_of(ref, struct pci_p2pdma, devmap_ref);
+   struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
 
-   complete_all(&p2p->devmap_ref_done);
+   complete(&p2p_pgmap->ref_done);
 }
 
 static void pci_p2pdma_percpu_kill(struct percpu_ref *ref)
 {
-   /*
-* pci_p2pdma_add_resource() may be called multiple times
-* by a driver and may register the percpu_kill devm action multiple
-* times. We only want the first action to actually kill the
-* percpu_ref.
-*/
-   if (percpu_ref_is_dying(ref))
-   return;
-
percpu_ref_kill(ref);
 }
 
+static void pci_p2pdma_percpu_cleanup(void *ref)
+{
+   struct p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(ref);
+
+   wait_for_completion(&p2p_pgmap->ref_done);
+   percpu_ref_exit(&p2p_pgmap->ref);
+}
+
 static void pci_p2pdma_release(void *data)
 {
struct pci_dev *pdev = data;
+   struct pci_p2pdma *p2pdma = pdev->p2pdma;
 
-   if (!pdev->p2pdma)
+   if (!p2pdma)
return;
 
-   wait_for_completion(&pdev->p2pdma->devmap_ref_done);
-   percpu_ref_exit(&pdev->p2pdma->devmap_ref);
+   /* Flush and disable pci_alloc_p2p_mem() */
+   pdev->p2pdma = NULL;
+   synchronize_rcu();
 
-   gen_pool_destroy(pdev->p2pdma->pool);
+   gen_pool_destroy(p2pdma->pool);
sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
-   pdev->p2pdma = NULL;
 }
 
 static int pci_p2pdma_setup(struct pci_dev *pdev)
@@ -124,12 +132,6 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
if (!p2p->pool)
goto out;
 
-   init_completion(&p2p->devmap_ref_done);
-   error = percpu_ref_init(&p2p->devmap_ref,
-   pci_p2pdma_percpu_release, 0, GFP_KERNEL);
-   if (error)
-   goto out_pool_destroy;
-
error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
if (error)
goto out_pool_destroy;
@@ -163,6 +165,7 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset)
 {
+   struct p2pdma_pagemap *p2p_pgmap;
struct dev_pagemap *pgmap;
void *addr;
int error;
@@ -185,14 +188,32 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
return error;
}
 
-   pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
-   if (!pgmap)
+   p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
+   if (!p2p_pgmap)
return -ENOMEM;
 
+   init_completion(&p2p_pgmap->ref_done);
+   error = percpu_ref_init(&p2p_pgmap->ref,
+   pci_p2pdma_percpu_release, 0, GFP_KERNEL);
+   if (error)
+   goto pgmap_free;
+
+   /*
+* FIXME: the percpu_ref_exit needs to be coordinated internal
+* to devm_memremap_pages_release(). Duplicate the same ordering
+* as other devm_memremap_pages() users for now.
+*/
+   error = devm_add_action(&pdev->dev, pci_p2pdma_percpu_cleanup,
+   &p2p_pgmap->ref);
+   if (error)
+   goto ref_cleanup;
+
+   pgmap = &p2p_pgmap->pgmap;
+
pgmap->res.start = pci_resource_start(pdev, bar) + offset;
pgmap->res.end = pgmap->res.start + size - 1;
pgmap->res.flags = pci_resource_flags(pdev, bar);
-   pgmap->ref = &pdev->p2pdma->devmap_ref;
+   pgmap->ref = &p2p_pgmap->ref;
pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) -

[PATCH v2 3/6] PCI/P2PDMA: Fix the gen_pool_add_virt() failure path

2019-05-07 Thread Dan Williams
The pci_p2pdma_add_resource() implementation immediately frees the pgmap
if gen_pool_add_virt() fails. However, that means that when @dev
triggers a devres release devm_memremap_pages_release() will crash
trying to access the freed @pgmap.

Use the new devm_memunmap_pages() to manually free the mapping in the
error path.

Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory")
Cc: Logan Gunthorpe 
Cc: Christoph Hellwig 
Reviewed-by: Ira Weiny 
Acked-by: Bjorn Helgaas 
Signed-off-by: Dan Williams 
---
 drivers/pci/p2pdma.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index c52298d76e64..595a534bd749 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -208,13 +208,15 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
pci_bus_address(pdev, bar) + offset,
resource_size(&pgmap->res), dev_to_node(&pdev->dev));
if (error)
-   goto pgmap_free;
+   goto pages_free;
 
pci_info(pdev, "added peer-to-peer DMA memory %pR\n",
 &pgmap->res);
 
return 0;
 
+pages_free:
+   devm_memunmap_pages(&pdev->dev, pgmap);
 pgmap_free:
devm_kfree(&pdev->dev, pgmap);
return error;



[PATCH v2 1/6] drivers/base/devres: Introduce devm_release_action()

2019-05-07 Thread Dan Williams
The devm_add_action() facility allows a resource allocation routine to
add custom devm semantics. One such user is devm_memremap_pages().

There is now a need to manually trigger devm_memremap_pages_release().
Introduce devm_release_action() so the release action can be triggered
via a new devm_memunmap_pages() api in a follow-on change.

Cc: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Reviewed-by: Ira Weiny 
Signed-off-by: Dan Williams 
---
 drivers/base/devres.c  |   24 +++-
 include/linux/device.h |1 +
 2 files changed, 24 insertions(+), 1 deletion(-)
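
A hypothetical usage sketch; teardown() and @res are made up for
illustration, and devm_memunmap_pages() in the next patch is the first
real user:

    static void teardown(void *data)
    {
            /* undo whatever the paired setup did with @data */
    }

    static int example_setup(struct device *dev, void *res)
    {
            int rc = devm_add_action_or_reset(dev, teardown, res);

            if (rc)
                    return rc;
            /*
             * Later, an error or shutdown path can run and remove the
             * action immediately, rather than waiting for device unbind:
             */
            devm_release_action(dev, teardown, res);
            return 0;
    }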

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index e038e2b3b7ea..0bbb328bd17f 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -755,10 +755,32 @@ void devm_remove_action(struct device *dev, void (*action)(void *), void *data)
 
WARN_ON(devres_destroy(dev, devm_action_release, devm_action_match,
   &devres));
-
 }
 EXPORT_SYMBOL_GPL(devm_remove_action);
 
+/**
+ * devm_release_action() - release previously added custom action
+ * @dev: Device that owns the action
+ * @action: Function implementing the action
+ * @data: Pointer to data passed to @action implementation
+ *
+ * Releases and removes instance of @action previously added by
+ * devm_add_action().  Both action and data should match one of the
+ * existing entries.
+ */
+void devm_release_action(struct device *dev, void (*action)(void *), void *data)
+{
+   struct action_devres devres = {
+   .data = data,
+   .action = action,
+   };
+
+   WARN_ON(devres_release(dev, devm_action_release, devm_action_match,
+  &devres));
+
+}
+EXPORT_SYMBOL_GPL(devm_release_action);
+
 /*
  * Managed kmalloc/kfree
  */
diff --git a/include/linux/device.h b/include/linux/device.h
index 4e6987e11f68..6d7fd5370f3d 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -713,6 +713,7 @@ void __iomem *devm_of_iomap(struct device *dev,
 /* allows to add/remove a custom action to devres stack */
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
void devm_remove_action(struct device *dev, void (*action)(void *), void *data);
+void devm_release_action(struct device *dev, void (*action)(void *), void *data);
 
 static inline int devm_add_action_or_reset(struct device *dev,
   void (*action)(void *), void *data)



[PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race

2019-05-07 Thread Dan Williams
Changes since v1 [1]:
- Fix a NULL-pointer deref crash in pci_p2pdma_release() (Logan)

- Refresh the p2pdma patch headers to match the format of other p2pdma
  patches (Bjorn)

- Collect Ira's reviewed-by

[1]: 
https://lore.kernel.org/lkml/155387324370.2443841.574715745262628837.st...@dwillia2-desk3.amr.corp.intel.com/

---

Logan audited the devm_memremap_pages() shutdown path and noticed that
it was possible to proceed to arch_remove_memory() before all
potential page references have been reaped.

Introduce a new ->cleanup() callback to do the work of waiting for any
straggling page references and then perform the percpu_ref_exit() in
devm_memremap_pages_release() context.

For p2pdma this involves some deeper reworks to reference count
resources on a per-instance basis rather than a per pci-device basis. A
modified genalloc api is introduced to convey a driver-private pointer
through gen_pool_{alloc,free}() interfaces. Also, a
devm_memunmap_pages() api is introduced since p2pdma does not
auto-release resources on a setup failure.

The dax and pmem changes pass the nvdimm unit tests, and the p2pdma
changes should now pass testing with the pci_p2pdma_release() fix.
Jérôme, how does this look for HMM?

In general, I think these patches / fixes are suitable for v5.2-rc1 or
v5.2-rc2, and since they touch kernel/memremap.c, and other various
pieces of the core, they should go through the -mm tree. These patches
merge cleanly with the current state of -next, pass the nvdimm unit
tests, and are exposed to the 0day robot with no issues reported
(https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending).

---

Dan Williams (6):
  drivers/base/devres: Introduce devm_release_action()
  mm/devm_memremap_pages: Introduce devm_memunmap_pages
  PCI/P2PDMA: Fix the gen_pool_add_virt() failure path
  lib/genalloc: Introduce chunk owners
  PCI/P2PDMA: Track pgmap references per resource, not globally
  mm/devm_memremap_pages: Fix final page put race


 drivers/base/devres.c |   24 +++-
 drivers/dax/device.c  |   13 +---
 drivers/nvdimm/pmem.c |   17 -
 drivers/pci/p2pdma.c  |  115 +++--
 include/linux/device.h|1 
 include/linux/genalloc.h  |   55 --
 include/linux/memremap.h  |8 +++
 kernel/memremap.c |   23 ++-
 lib/genalloc.c|   51 
 mm/hmm.c  |   14 +
 tools/testing/nvdimm/test/iomap.c |2 +
 11 files changed, 217 insertions(+), 106 deletions(-)
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 2/6] mm/devm_memremap_pages: Introduce devm_memunmap_pages

2019-05-07 Thread Dan Williams
Use the new devm_release_action() facility to allow
devm_memremap_pages_release() to be manually triggered.

Cc: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: Christoph Hellwig 
Reviewed-by: Ira Weiny 
Signed-off-by: Dan Williams 
---
 include/linux/memremap.h |6 ++
 kernel/memremap.c|6 ++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f0628660d541..7601ee314c4a 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -100,6 +100,7 @@ struct dev_pagemap {
 
 #ifdef CONFIG_ZONE_DEVICE
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
+void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap);
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap);
 
@@ -118,6 +119,11 @@ static inline void *devm_memremap_pages(struct device *dev,
return ERR_PTR(-ENXIO);
 }
 
+static inline void devm_memunmap_pages(struct device *dev,
+   struct dev_pagemap *pgmap)
+{
+}
+
 static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap)
 {
diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..65afbacab44e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -266,6 +266,12 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 }
 EXPORT_SYMBOL_GPL(devm_memremap_pages);
 
+void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap)
+{
+   devm_release_action(dev, devm_memremap_pages_release, pgmap);
+}
+EXPORT_SYMBOL_GPL(devm_memunmap_pages);
+
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {
/* number of pfns from base where pfn_to_page() is valid */
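
A hedged usage sketch (hypothetical caller, error handling abbreviated) of
why a manual unmap helps when setup fails partway through:

	int rc;
	void *addr;

	addr = devm_memremap_pages(dev, pgmap);
	if (IS_ERR(addr))
		return PTR_ERR(addr);

	rc = my_follow_on_setup(dev);	/* hypothetical next setup step */
	if (rc)
		/* unwind the mapping immediately instead of waiting for
		 * automatic devres teardown at device detach */
		devm_memunmap_pages(dev, pgmap);
	return rc;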

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 4/6] lib/genalloc: Introduce chunk owners

2019-05-07 Thread Dan Williams
The p2pdma facility enables a provider to publish a pool of dma
addresses for a consumer to allocate. A genpool is used internally by
p2pdma to collect dma resources, 'chunks', to be handed out to
consumers. Whenever a consumer allocates a resource it needs to pin the
'struct dev_pagemap' instance that backs the chunk selected by
pci_alloc_p2pmem().

Currently that reference is taken globally on the entire provider
device. That sets up a lifetime mismatch whereby the p2pdma core needs
to maintain hacks to make sure the percpu_ref is not released twice.

This lifetime mismatch also stands in the way of a fix to
devm_memremap_pages() whereby devm_memremap_pages_release() must wait
for the percpu_ref ->release() callback to complete before it can
proceed to teardown pages.

So, towards fixing this situation, introduce the ability to store a
'chunk owner' at gen_pool_add() time, and a facility to retrieve the
owner at gen_pool_{alloc,free}() time. For p2pdma this will be used to
store and recall individual dev_pagemap reference counter instances
per-chunk.
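
The intended calling convention, roughly (a sketch; 'pool', 'pgmap', and the
chunk parameters are illustrative):

	void *owner;
	unsigned long addr;

	/* provider: publish a chunk, recording its pgmap as the owner */
	if (gen_pool_add_owner(pool, vaddr, phys, size, nid, pgmap))
		return -ENOMEM;

	/* consumer: the allocation hands the owning pgmap back */
	addr = gen_pool_alloc_owner(pool, size, &owner);
	if (addr)
		pgmap = owner;	/* pin this pgmap instance, not the device */

	/* free path: the owner is recalled so the same instance is unpinned */
	gen_pool_free_owner(pool, addr, size, &owner);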

Cc: Logan Gunthorpe 
Cc: Bjorn Helgaas 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Reviewed-by: Ira Weiny 
Signed-off-by: Dan Williams 
---
 include/linux/genalloc.h |   55 +-
 lib/genalloc.c   |   51 +--
 2 files changed, 74 insertions(+), 32 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dd0a452373e7..a337313e064f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -75,6 +75,7 @@ struct gen_pool_chunk {
 	struct list_head next_chunk;	/* next chunk in pool */
 	atomic_long_t avail;
 	phys_addr_t phys_addr;		/* physical starting address of memory chunk */
+	void *owner;			/* private data to retrieve at alloc time */
 	unsigned long start_addr;	/* start address of memory chunk */
 	unsigned long end_addr;		/* end address of memory chunk (inclusive) */
unsigned long bits[0];  /* bitmap for allocating memory chunk */
@@ -96,8 +97,15 @@ struct genpool_data_fixed {
 
 extern struct gen_pool *gen_pool_create(int, int);
 extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
-extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
-			     size_t, int);
+extern int gen_pool_add_owner(struct gen_pool *, unsigned long, phys_addr_t,
+			     size_t, int, void *);
+
+static inline int gen_pool_add_virt(struct gen_pool *pool, unsigned long addr,
+   phys_addr_t phys, size_t size, int nid)
+{
+   return gen_pool_add_owner(pool, addr, phys, size, nid, NULL);
+}
+
 /**
  * gen_pool_add - add a new chunk of special memory to the pool
  * @pool: pool to add new memory chunk to
@@ -116,12 +124,47 @@ static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr,
return gen_pool_add_virt(pool, addr, -1, size, nid);
 }
 extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
-   genpool_algo_t algo, void *data);
+unsigned long gen_pool_alloc_algo_owner(struct gen_pool *pool, size_t size,
+   genpool_algo_t algo, void *data, void **owner);
+
+static inline unsigned long gen_pool_alloc_owner(struct gen_pool *pool,
+   size_t size, void **owner)
+{
+   return gen_pool_alloc_algo_owner(pool, size, pool->algo, pool->data,
+   owner);
+}
+
+static inline unsigned long gen_pool_alloc_algo(struct gen_pool *pool,
+   size_t size, genpool_algo_t algo, void *data)
+{
+   return gen_pool_alloc_algo_owner(pool, size, algo, data, NULL);
+}
+
+/**
+ * gen_pool_alloc - allocate special memory from the pool
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses the pool allocation function (with first-fit algorithm by default).
+ * Can not be used in NMI handler on architectures without
+ * NMI-safe cmpxchg implementation.
+ */
+static inline unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+   return gen_pool_alloc_algo(pool, size, pool->algo, pool->data);
+}
+
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+extern void gen_pool_free_owner(struct gen_pool *pool, unsigned long addr,
+   size_t size, void **owner);
+static inline void gen_pool_free(struct gen_pool *pool, unsigned long addr,
+size_t size)
+{
+   gen_pool_free_owner(pool, addr, size, NULL);
+}
+
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool 

Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver

2019-05-07 Thread Jakub Staroń
On 4/25/19 10:00 PM, Pankaj Gupta wrote:

> +void host_ack(struct virtqueue *vq)
> +{
> + unsigned int len;
> + unsigned long flags;
> + struct virtio_pmem_request *req, *req_buf;
> + struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> + req->done = true;
> + wake_up(&req->host_acked);
> +
> + if (!list_empty(&vpmem->req_list)) {
> + req_buf = list_first_entry(&vpmem->req_list,
> + struct virtio_pmem_request, list);
> + list_del(&vpmem->req_list);

Shouldn't it rather be `list_del(vpmem->req_list.next)`? We are trying to unlink
the first element of the list, and `vpmem->req_list` is just the list head.

> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> + int err;
> + unsigned long flags;
> + struct scatterlist *sgs[2], sg, ret;
> + struct virtio_device *vdev = nd_region->provider_data;
> + struct virtio_pmem *vpmem = vdev->priv;
> + struct virtio_pmem_request *req;
> +
> + might_sleep();
> + req = kmalloc(sizeof(*req), GFP_KERNEL);
> + if (!req)
> + return -ENOMEM;
> +
> + req->done = req->wq_buf_avail = false;
> + strcpy(req->name, "FLUSH");
> + init_waitqueue_head(&req->host_acked);
> + init_waitqueue_head(&req->wq_buf);
> + sg_init_one(&sg, req->name, strlen(req->name));
> + sgs[0] = &sg;
> + sg_init_one(&ret, &req->ret, sizeof(req->ret));
> + sgs[1] = &ret;
> +
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> + if (err) {
> + dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> + list_add_tail(&vpmem->req_list, &req->list);
> + spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> + /* When host has read buffer, this completes via host_ack */
> + wait_event(req->wq_buf, req->wq_buf_avail);
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + }

Aren't the arguments in `list_add_tail` swapped? The element we are adding
should be first, the list should be second. Also, shouldn't we resubmit the
request after waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?
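
For reference, the prototype in include/linux/list.h is:

	/* insert @new before @head, i.e. append @new to the list */
	void list_add_tail(struct list_head *new, struct list_head *head);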

I propose rewriting it like this:

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 66b582f751a3..ff0556b04e86 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
 	if (!list_empty(&vpmem->req_list)) {
 		req_buf = list_first_entry(&vpmem->req_list,
 				struct virtio_pmem_request, list);
-		list_del(&vpmem->req_list);
+		list_del(vpmem->req_list.next);
 		req_buf->wq_buf_avail = true;
 		wake_up(&req_buf->wq_buf);
}
@@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
 	sgs[1] = &ret;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
-	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
-	if (err) {
-		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+	/*
+	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
+	 * have free descriptor slots. We add the request to req_list and wait
+	 * for host_ack to wake us up when free slots are available.
+	 */
+	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC)) == -ENOSPC) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue, postponing request\n");
+		req->wq_buf_avail = false;
 
-		list_add_tail(&vpmem->req_list, &req->list);
+		list_add_tail(&req->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* When host has read buffer, this completes via host_ack */
 		wait_event(req->wq_buf, req->wq_buf_avail);
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
+
+	/*
+	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
+	 * do anything about that.
+	 */
+	if (err) {
+		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error code %d\n", err);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+		err = -EIO;
+		goto ret;
+	}
 	err = virtqueue_kick(vpmem->req_vq);
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);


Let me know if it looks reasonable to you.

Thank you,
Jakub Staron

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported

2019-05-07 Thread Jakub Staroń
From: Pankaj Gupta 
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +   struct dax_device *dax_dev)
> +{
> +   return !(vma->flags & VM_SYNC);
> +}

Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: [PATCH v2 12/17] kunit: tool: add Python wrappers for running KUnit tests

2019-05-07 Thread Tim.Bird
Here is a bit of inline commentary on the TAP13/TAP14 discussion.

> -Original Message-
> From: Brendan Higgins 
> 
> > On 5/3/19 4:14 PM, Brendan Higgins wrote:
> > >> On 5/2/19 10:36 PM, Brendan Higgins wrote:
> > > In any case, it sounds like you and Greg are in agreement on the core
> > > libraries generating the output in TAP13, so I won't argue that point
> > > further.
> > >
> > > ## Analysis of using TAP13
> >
> > I have never looked at TAP version 13 in any depth at all, so do not 
> > consider
> > me to be any sort of expert.
> >
> > My entire TAP knowledge is based on:
> >
> >   https://testanything.org/tap-version-13-specification.html
> >
> > and the pull request to create the TAP version 14 specification:
> >
> >https://github.com/TestAnything/testanything.github.io/pull/36/files
> >
> > You can see the full version 14 document in the submitter's repo:
> >
> >   $ git clone https://github.com/isaacs/testanything.github.io.git
> >   $ cd testanything.github.io
> >   $ git checkout tap14
> >   $ ls tap-version-14-specification.md
> >
> > My understanding is the the version 14 specification is not trying to
> > add new features, but instead capture what is already implemented in
> > the wild.
> >
> >
> > > One of my earlier concerns was that TAP13 is a bit over constrained
> > > for what I would like to output from the KUnit core. It only allows
> > > data to be output as either:
> > >  - test number
> > >  - ok/not ok with single line description
> > >  - directive
> > >  - diagnostics
> > >  - YAML block
> > >
> > > The test number must become before a set of ok/not ok lines, and does
> > > not contain any additional information. One annoying thing about this
> > > is it doesn't provide any kind of nesting or grouping.
> >
> > Greg's response mentions ktest (?) already does nesting.
> 
> I think we are talking about kselftest.
> 
> > Version 14 allows nesting through subtests.  I have not looked at what
> > ktest does, so I do not know if it uses subtest, or something else.
> 
> Oh nice! That is new in version 14. I can use that.
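
For readers following along: from my reading of the draft v14 spec, a nested
subtest is indented and announced with a "# Subtest:" comment, roughly like
this (my sketch, not normative):

TAP version 14
1..2
# Subtest: example-suite
    1..2
    ok 1 - test_case_a
    not ok 2 - test_case_b
not ok 1 - example-suite
ok 2 - standalone-test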

We have run into the problem of subtests (or nested tests, both using
TAP13) in Fuego.  I recall that this issue came up in kselftest, and I believe
we discussed a solution, but I don't recall what it was.

Can someone remind me what kselftest does to handle nested tests
(in terms of TAP13 output)?

> 
> > > There is one ok/not ok line per test and it may have a short
> > > description of the test immediately after 'ok' or 'not ok'; this is
> > > problematic because it wants the first thing you say about a test to
> > > be after you know whether it passes or not.
> >
> > I think you could output a diagnostic line that says a test is starting.
> > This is important to me because printk() errors and warnings that are
> > related to a test can be output by a subsystem other than the subsystem
> > that I am testing.  If there is no marker at the start of the test
> > then there is no way to attribute the printk()s to the test.
> 
> I agree.

This is a significant problem.  In Fuego we output each line with a test id 
prefix,
which goes against the spec, but helps solve this.  Test output should be
kept separate from system output, but if I understand correctly, there are no
channels in printk to use to keep different data streams separate.

How does kselftest deal with this now?

> 
> Technically conforms with the spec, and kselftest does that, but is
> also not part of the spec. Well, it *is* specified if you use
> subtests. I think the right approach is to make each
> "kunit_module/test suite" a test, and all the test cases will be
> subtests.
> 
> > > Directives are just a way to specify skipped tests and TODOs.
> > >
> > > Diagnostics seem useful, it looks like you can put whatever
> > > information in them and print them out at anytime. It looks like a lot
> > > of kselftests emit a lot of data this way.
> > >
> > > The YAML block seems to be the way that they prefer users to emit data
> > > beyond number of tests run and whether a test passed or failed. I
> > > could express most things I want to express in terms of YAML, but it
> > > is not the nicest format for displaying a lot of data like
> > > expectations, missed function calls, and other things which have a
> > > natural concise representation. Nevertheless, YAML readability is
> > > mostly a problem who won't be using the wrapper scripts.
> >
> > The examples in specification V13 and V14 look very simple and very
> > readable to me.  (And I am not a fan of YAML.)
> >
> >
> > > My biggest
> > > problem with the YAML block is that you can only have one, and TAP
> > > specifies that it must come after the corresponding ok/not ok line,
> > > which again has the issue that you have to hold on to a lot of
> > > diagnostic data longer than you ideally would. Another downside is
> > > that I now have to write a YAML serializer for the kernel.
> >
> > If a test generates diagnostic data, then I would 

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-07 Thread Theodore Ts'o
On Tue, May 07, 2019 at 10:01:19AM +0200, Greg KH wrote:
> > My understanding is that the intent of KUnit is to avoid booting a kernel on
> > real hardware or in a virtual machine.  That seems to be a matter of 
> > semantics
> > to me because isn't invoking a UML Linux just running the Linux kernel in
> > a different form of virtualization?
> > 
> > So I do not understand why KUnit is an improvement over kselftest.
> > 
> > It seems to me that KUnit is just another piece of infrastructure that I
> > am going to have to be familiar with as a kernel developer.  More overhead,
> > more information to stuff into my tiny little brain.
> > 
> > I would guess that some developers will focus on just one of the two test
> > environments (and some will focus on both), splitting the development
> > resources instead of pooling them on a common infrastructure.
> > 
> > What am I missing?
> 
> kselftest provides no in-kernel framework for testing kernel code
> specifically.  That should be what kunit provides, an "easy" way to
> write in-kernel tests for things.
> 
> Brendan, did I get it right?

Yes, that's basically right.  You don't *have* to use KUnit.  It's
supposed to be a simple way to run a large number of small tests that
for specific small components in a system.

For example, I currently use xfstests using KVM and GCE to test all of
ext4.  These tests require using multiple 5 GB and 20GB virtual disks,
and it works by mounting ext4 file systems and exercising ext4 through
the system call interfaces, using userspace tools such as fsstress,
fsx, fio, etc.  It requires time overhead to start the VM, create and
allocate virtual disks, etc.  For example, running a single 3-second
xfstest (generic/001) takes a full 10 seconds via kvm-xfstests.

KUnit is something else; it's specifically intended to allow you to
create lightweight tests quickly and easily, and by reducing the
effort needed to write and run unit tests, hopefully we'll have a lot
more of them and thus improve kernel quality.

As an example, I have a volunteer working on developing KUnit tests
for ext4.  We're going to start by testing the ext4 extent status
tree.  The source code is at fs/ext4/extent_status.c; it's
approximately 1800 LOC.  The Kunit tests for the extent status tree
will exercise all of the corner cases for the various extent status
tree functions --- e.g., ext4_es_insert_delayed_block(),
ext4_es_remove_extent(), ext4_es_cache_extent(), etc.  And it will do
this in isolation without our needing to create a test file system or
using a test block device.
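
To make that concrete, a KUnit test for this looks roughly like the sketch
below (spelled with the API names as merged upstream; the exact names in the
patchset under review may differ, and the assertion body is a placeholder):

#include <kunit/test.h>

static void es_tree_smoke_test(struct kunit *test)
{
	/* a real test would build an in-memory inode fixture and call
	 * ext4_es_insert_delayed_block() and friends against it */
	KUNIT_EXPECT_EQ(test, 1 + 1, 2);
}

static struct kunit_case ext4_es_test_cases[] = {
	KUNIT_CASE(es_tree_smoke_test),
	{}
};

static struct kunit_suite ext4_es_test_suite = {
	.name = "ext4-extent-status",
	.test_cases = ext4_es_test_cases,
};
kunit_test_suite(ext4_es_test_suite);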

Next we'll test the ext4 block allocator, again in isolation.  To test
the block allocator we will have to write "mock functions" which
simulate reading allocation bitmaps from disk.  Again, this will allow
the test writer to explicitly construct corner cases and validate that
the block allocator works as expected without having to reverse
engineer file system data structures which will force a particular
code path to be executed.

So this is why it's largely irrelevant to me that KUnit uses UML.  In
fact, it's a feature.  We're not testing device drivers, or the
scheduler, or anything else architecture-specific.  UML is not about
virtualization.  What it's about in this context is allowing us to
start running test code as quickly as possible.  Booting KVM takes
about 3-4 seconds, and this includes initializing virtio_scsi and
other device drivers.  If by using UML we can hold the amount of
unnecessary kernel subsystem initialization down to the absolute
minimum, and if it means that we can communicate with the test
framework via a userspace "printf" from UML/KUnit code, as opposed to
via a virtual serial port to KVM's virtual console, it all makes for
lighter weight testing.

Why did I go looking for a volunteer to write KUnit tests for ext4?
Well, I have a plan to make some changes in restructing how ext4's
write path works, in order to support things like copy-on-write, a
more efficient delayed allocation system, etc.  This will require
making changes to the extent status tree, and by having unit tests for
the extent status tree, we'll be able to detect any bugs that we might
accidentally introduce in the es tree far more quickly than if we
didn't have those tests available.  Google has long found that having
these sorts of unit tests is a real win for developer velocity for any
non-trivial code module (or C++ class), even when you take into
account the time it takes to create the unit tests.

- Ted

P.S.  Many thanks to Brendan for finding such a volunteer for me; the
person in question is a SRE from Switzerland who is interested in
getting involved with kernel testing, and this is going to be their
20% project.  :-)

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 6/6] xfs: disable map_sync for async flush

2019-05-07 Thread Darrick J. Wong
On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta  wrote:
> >
> > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  fs/xfs/xfs_file.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Darrick, does this look ok to take through the nvdimm tree?

 forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > struct file *filp,
> > struct vm_area_struct *vma)
> >  {
> > +   struct dax_device   *dax_dev;
> > +
> > +   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > /*
> > -* We don't support synchronous mappings for non-DAX files. At least
> > -* until someone comes with a sensible use case.
> > +* We don't support synchronous mappings for non-DAX files and
> > +* for DAX files if underneath dax_device is not synchronous.
> >  */
> > -   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +   if (!daxdev_mapping_supported(vma, dax_dev))
> > return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm.  Nothing in
xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong 

--D

> >
> > file_accessed(filp);
> > --
> > 2.20.1
> >
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag

2019-05-07 Thread Dan Williams
On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta  wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystems for devices that don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta 
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include 
>  #include 
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.
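
That is, something like this sketch (matching the shape the API eventually
takes; treat it as illustrative, not the patch as posted):

#define DAXDEV_F_SYNC	(1UL << 0)

struct dax_device *alloc_dax(void *private, const char *host,
		const struct dax_operations *ops, unsigned long flags);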
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 6/6] xfs: disable map_sync for async flush

2019-05-07 Thread Dan Williams
On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta  wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  fs/xfs/xfs_file.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> struct file *filp,
> struct vm_area_struct *vma)
>  {
> +   struct dax_device   *dax_dev;
> +
> +   dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> /*
> -* We don't support synchronous mappings for non-DAX files. At least
> -* until someone comes with a sensible use case.
> +* We don't support synchronous mappings for non-DAX files and
> +* for DAX files if underneath dax_device is not synchronous.
>  */
> -   if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +   if (!daxdev_mapping_supported(vma, dax_dev))
> return -EOPNOTSUPP;
>
> file_accessed(filp);
> --
> 2.20.1
>
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver

2019-05-07 Thread Dan Williams
Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta  wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/nvdimm/virtio_pmem.c | 114 +
>  drivers/virtio/Kconfig   |  10 +++
>  drivers/virtio/Makefile  |   1 +
>  drivers/virtio/pmem.c| 118 +++
>  include/linux/virtio_pmem.h  |  60 
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index ..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +   unsigned int len;
> +   unsigned long flags;
> +   struct virtio_pmem_request *req, *req_buf;
> +   struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +   req->done = true;
> +   wake_up(&req->host_acked);
> +
> +   if (!list_empty(&vpmem->req_list)) {
> +   req_buf = list_first_entry(&vpmem->req_list,
> +   struct virtio_pmem_request, list);
> +   list_del(&vpmem->req_list);
> +   req_buf->wq_buf_avail = true;
> +   wake_up(&req_buf->wq_buf);
> +   }
> +   }
> +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +   int err;
> +   unsigned long flags;
> +   struct scatterlist *sgs[2], sg, ret;
> +   struct virtio_device *vdev = nd_region->provider_data;
> +   struct virtio_pmem *vpmem = vdev->priv;
> +   struct virtio_pmem_request *req;
> +
> +   might_sleep();
> +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> +   if (!req)
> +   return -ENOMEM;
> +
> +   req->done = req->wq_buf_avail = false;
> +   strcpy(req->name, "FLUSH");
> +   init_waitqueue_head(&req->host_acked);
> +   init_waitqueue_head(&req->wq_buf);
> +   sg_init_one(&sg, req->name, strlen(req->name));
> +   sgs[0] = &sg;
> +   sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +   sgs[1] = &ret;
> +
> +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +   if (err) {
> +   dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +   list_add_tail(&vpmem->req_list, &req->list);
> +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +   /* When host has read buffer, this completes via host_ack */
> +   wait_event(req->wq_buf, req->wq_buf_avail);
> +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +   }
> +   err = virtqueue_kick(vpmem->req_vq);
> +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +   if (!err) {
> +   err = -EIO;
> +   goto ret;
> +   }
> +   /* When host has read buffer, this completes via host_ack */
> +   wait_event(req->host_acked, req->done);
> +   err = req->ret;
> +ret:
> +   kfree(req);
> +   return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +   int rc = 0;
> +
> +   /* Create child bio for asynchronous flush and chain with
> +* parent bio. Otherwise directly call nd_region flush.
> +*/
> +   if (bio && bio->bi_iter.bi_sector != -1) {
> +   struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +   if (!child)
> +

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-07 Thread shuah

On 5/7/19 2:01 AM, Greg KH wrote:

On Mon, May 06, 2019 at 08:14:12PM -0700, Frank Rowand wrote:

On 5/1/19 4:01 PM, Brendan Higgins wrote:

## TLDR

I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
5.2.

Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
we would merge through your tree when the time came? Am I remembering
correctly?

## Background

This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.

Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).

KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.


As a result of the emails replying to this patch thread, I am now
starting to look at kselftest.  My level of understanding is based
on some slide presentations, an LWN article, https://kselftest.wiki.kernel.org/
and a _tiny_ bit of looking at kselftest code.

tl;dr; I don't really understand kselftest yet.


(1) why KUnit exists


## What's so special about unit testing?

A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.


(2) KUnit is not meant to replace kselftest


## Is KUnit trying to replace other testing frameworks for the kernel?

No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.


My understanding is that the intent of KUnit is to avoid booting a kernel on
real hardware or in a virtual machine.  That seems to be a matter of semantics
to me because isn't invoking a UML Linux just running the Linux kernel in
a different form of virtualization?

So I do not understand why KUnit is an improvement over kselftest.


They are in two different categories. Kselftest falls into the black-box
category: a regression test suite made up of user-space tests, with a
few kernel test modules back-ending the tests in some cases.

Kselftest can be used by both kernel developers and users and provides
a good way to regression test releases in test rings.

KUnit is in the white-box category and is a better fit as a unit test
framework for development, providing in-kernel testing. I wouldn't view
them as one replacing the other. They just provide coverage for
different areas of testing.

I wouldn't view KUnit as something that would be easily run in test 
rings for example.


Brendan, does that sound about right?



It seems to me that KUnit is just another piece of infrastructure that I
am going to have to be familiar with as a kernel developer.  More overhead,
more information to stuff into my tiny little brain.

I would guess that some developers will focus on just one of the two test
environments (and some will focus on both), splitting the development
resources instead of pooling them on a common infrastructure.



What am I missing?


kselftest provides no in-kernel framework for testing kernel code
specifically.  That should be what kunit provides, an "easy" way to
write in-kernel tests for things.

Brendan, did I get it right?

thanks,
-- Shuah
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2] drivers/dax: Allow to include DEV_DAX_PMEM as builtin

2019-05-07 Thread Dan Williams
On Tue, May 7, 2019 at 4:50 AM Aneesh Kumar K.V
 wrote:
>
>
> Hi Dan,
>
> "Aneesh Kumar K.V"  writes:
>
> > This moves the dependency to DEV_DAX_PMEM_COMPAT such that the compat
> > support is only allowed if DEV_DAX_PMEM is built as a module.
> >
> > This allows the new code to be tested easily in an emulation setup
> > where we often build things without module support.
> >
> > Signed-off-by: Aneesh Kumar K.V 
>
> Any update on this. Can we merge this?

Applied for the v5.2 pull request.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2] drivers/dax: Allow to include DEV_DAX_PMEM as builtin

2019-05-07 Thread Aneesh Kumar K.V


Hi Dan,

"Aneesh Kumar K.V"  writes:

> This moves the dependency to DEV_DAX_PMEM_COMPAT such that the compat
> support is only allowed if DEV_DAX_PMEM is built as a module.
>
> This allows the new code to be tested easily in an emulation setup
> where we often build things without module support.
>
> Signed-off-by: Aneesh Kumar K.V 

Any update on this. Can we merge this?

> ---
> Changes from V1:
> * Make sure we only build compat code as module
>
>  drivers/dax/Kconfig | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index 5ef624fe3934..a59f338f520f 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -23,7 +23,6 @@ config DEV_DAX
>  config DEV_DAX_PMEM
>   tristate "PMEM DAX: direct access to persistent memory"
>   depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
> - depends on m # until we can kill DEV_DAX_PMEM_COMPAT
>   default DEV_DAX
>   help
> Support raw access to persistent memory.  Note that this
> @@ -50,7 +49,7 @@ config DEV_DAX_KMEM
>  
>  config DEV_DAX_PMEM_COMPAT
>   tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
> - depends on DEV_DAX_PMEM
> + depends on m && DEV_DAX_PMEM=m
>   default DEV_DAX_PMEM
>   help
> Older versions of the libdaxctl library expect to find all
> -- 
> 2.20.1

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-07 Thread Greg KH
On Mon, May 06, 2019 at 08:14:12PM -0700, Frank Rowand wrote:
> On 5/1/19 4:01 PM, Brendan Higgins wrote:
> > ## TLDR
> > 
> > I rebased the last patchset on 5.1-rc7 in hopes that we can get this in
> > 5.2.
> > 
> > Shuah, I think you, Greg KH, and myself talked off thread, and we agreed
> > we would merge through your tree when the time came? Am I remembering
> > correctly?
> > 
> > ## Background
> > 
> > This patch set proposes KUnit, a lightweight unit testing and mocking
> > framework for the Linux kernel.
> > 
> > Unlike Autotest and kselftest, KUnit is a true unit testing framework;
> > it does not require installing the kernel on a test machine or in a VM
> > and does not require tests to be written in userspace running on a host
> > kernel. Additionally, KUnit is fast: From invocation to completion KUnit
> > can run several dozen tests in under a second. Currently, the entire
> > KUnit test suite for KUnit runs in under a second from the initial
> > invocation (build time excluded).
> > 
> > KUnit is heavily inspired by JUnit, Python's unittest.mock, and
> > Googletest/Googlemock for C++. KUnit provides facilities for defining
> > unit test cases, grouping related test cases into test suites, providing
> > common infrastructure for running tests, mocking, spying, and much more.
> 
> As a result of the emails replying to this patch thread, I am now
> starting to look at kselftest.  My level of understanding is based
> on some slide presentations, an LWN article, 
> https://kselftest.wiki.kernel.org/
> and a _tiny_ bit of looking at kselftest code.
> 
> tl;dr; I don't really understand kselftest yet.
> 
> 
> (1) why KUnit exists
> 
> > ## What's so special about unit testing?
> > 
> > A unit test is supposed to test a single unit of code in isolation,
> > hence the name. There should be no dependencies outside the control of
> > the test; this means no external dependencies, which makes tests orders
> > of magnitudes faster. Likewise, since there are no external dependencies,
> > there are no hoops to jump through to run the tests. Additionally, this
> > makes unit tests deterministic: a failing unit test always indicates a
> > problem. Finally, because unit tests necessarily have finer granularity,
> > they are able to test all code paths easily solving the classic problem
> > of difficulty in exercising error handling code.
> 
> (2) KUnit is not meant to replace kselftest
> 
> > ## Is KUnit trying to replace other testing frameworks for the kernel?
> > 
> > No. Most existing tests for the Linux kernel are end-to-end tests, which
> > have their place. A well tested system has lots of unit tests, a
> > reasonable number of integration tests, and some end-to-end tests. KUnit
> > is just trying to address the unit test space which is currently not
> > being addressed.
> 
> My understanding is that the intent of KUnit is to avoid booting a kernel on
> real hardware or in a virtual machine.  That seems to be a matter of semantics
> to me because isn't invoking a UML Linux just running the Linux kernel in
> a different form of virtualization?
> 
> So I do not understand why KUnit is an improvement over kselftest.
> 
> It seems to me that KUnit is just another piece of infrastructure that I
> am going to have to be familiar with as a kernel developer.  More overhead,
> more information to stuff into my tiny little brain.
> 
> I would guess that some developers will focus on just one of the two test
> environments (and some will focus on both), splitting the development
> resources instead of pooling them on a common infrastructure.
> 
> What am I missing?

kselftest provides no in-kernel framework for testing kernel code
specifically.  That should be what kunit provides, an "easy" way to
write in-kernel tests for things.

Brendan, did I get it right?

thanks,

greg k-h
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm