RE: [External] Re: A question about ndctl namespace

2018-07-09 Thread Huaisheng HS1 Ye
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Tuesday, July 10, 2018 9:42 AM
> On Mon, Jul 9, 2018 at 6:33 PM, Huaisheng HS1 Ye  wrote:
> > Hi All,
> >
> > I want to destroy some redundant namespaces (namespace0.0 and 0.1) from
> > my system.
> > But it still doesn’t work even I used ‘destroy-namespace’, the result
> > of command looks ok, but I still could find them there. How can I remove
> > them so they wouldn't appear in next command 'ndctl list'?
> 
> Short answer, you can't.
> 
> This is a general misconception about what the 'destroy-namespace'
> command does. It destroys any info-block metadata and if the namespace
> is not a 'seed' device or the 0th device it may additionally delete
> the kernel device object. See this comment from the kernel source.
> 
> /*
>  * Try to delete the namespace if we deleted all of its
>  * allocation, this is not the seed or 0th device for the
>  * region, and it is not actively claimed by a btt, pfn, or dax
>  * instance.
>  */
> if (val == 0 && id != 0 && nd_region->ns_seed != dev && !ndns->claim)
> nd_device_unregister(dev, ND_ASYNC);
> 
> The reason we don't delete the seed device is that we always need at
> least one seed device to configure the next namespace, and we keep the
> 0th device around to try to make sure the first namespace in a region
> gets the namespaceX.0 device.

Thanks for your quick reply and detailed explanation, and it looks like
NamespaceX.1 also couldn't be deleted.

dmesg gets information like this,
[72254.583858] nd namespace0.1: (0)
[72254.589424] nd namespace0.1: __size_store: uuid not set
[72254.596454] nd namespace0.1: 0 fail (-6)

That is to say, __size_store exit at here,
/*
 * We need a uuid for the allocation-label and dimm(s) on which
 * to store the label.
 */
if (uuid_not_set(uuid, dev, __func__))
return -ENXIO;


Cheers,
Huaisheng Ye
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [kbuild-all] [PATCH v6 2/4] resource: Use list_head to link sibling resource

2018-07-09 Thread Baoquan He
On 07/10/18 at 08:59am, Ye Xiaolong wrote:
> Hi,
> 
> On 07/08, Baoquan He wrote:
> >Hi,
> >
> >On 07/05/18 at 01:00am, kbuild test robot wrote:
> >> Hi Baoquan,
> >> 
> >> I love your patch! Yet something to improve:
> >> 
> >> [auto build test ERROR on linus/master]
> >> [also build test ERROR on v4.18-rc3 next-20180704]
> >> [if your patch is applied to the wrong git tree, please drop us a note to 
> >> help improve the system]
> >
> >Thanks for telling. 
> >
> >I cloned 0day-ci/linut to my local pc.
> >https://github.com/0day-ci/linux.git
> >
> >However, I didn't find below branch. And tried to open it in web
> >broswer, also failed.
> >
> 
> Sorry for the inconvenience, 0day bot didn't push the branch to github 
> successfully,
> Just push it manually, you can have a try again.

Thanks, Xiaolong, I have applied them on top of linux-next/master, and
copy the config file attached, and run the command to reproduce as
suggested. Now I have fixed all those issues reported, will repost.

> 
> >
> >> url:
> >> https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180704-121402
> >> config: mips-rb532_defconfig (attached as .config)
> >> compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> >> reproduce:
> >> wget 
> >> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
> >> -O ~/bin/make.cross
> >> chmod +x ~/bin/make.cross
> >> # save the attached .config to linux build tree
> >> GCC_VERSION=7.2.0 make.cross ARCH=mips 
> >
> >I did find a old one which is for the old version 5 post.
> >
> >[bhe@linux]$ git remote -v
> >0day-ci  https://github.com/0day-ci/linux.git (fetch)
> >0day-ci  https://github.com/0day-ci/linux.git (push)
> >[bhe@dhcp-128-28 linux]$ git branch -a| grep Baoquan| grep resource
> >  
> > remotes/0day-ci/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
> >
> >Could you help have a look at this?
> >
> >Thanks
> >Baoquan
> >
> >> 
> >> All error/warnings (new ones prefixed by >>):
> >> 
> >> >> arch/mips/pci/pci-rc32434.c:57:11: error: initialization from 
> >> >> incompatible pointer type [-Werror=incompatible-pointer-types]
> >>  .child = _res_pci_mem2
> >>   ^
> >>arch/mips/pci/pci-rc32434.c:57:11: note: (near initialization for 
> >> 'rc32434_res_pci_mem1.child.next')
> >> >> arch/mips/pci/pci-rc32434.c:51:47: warning: missing braces around 
> >> >> initializer [-Wmissing-braces]
> >> static struct resource rc32434_res_pci_mem1 = {
> >>   ^
> >>arch/mips/pci/pci-rc32434.c:60:47: warning: missing braces around 
> >> initializer [-Wmissing-braces]
> >> static struct resource rc32434_res_pci_mem2 = {
> >>   ^
> >>cc1: some warnings being treated as errors
> >> 
> >> vim +57 arch/mips/pci/pci-rc32434.c
> >> 
> >> 73b4390f Ralf Baechle 2008-07-16  50  
> >> 73b4390f Ralf Baechle 2008-07-16 @51  static struct resource 
> >> rc32434_res_pci_mem1 = {
> >> 73b4390f Ralf Baechle 2008-07-16  52   .name = "PCI MEM1",
> >> 73b4390f Ralf Baechle 2008-07-16  53   .start = 0x5000,
> >> 73b4390f Ralf Baechle 2008-07-16  54   .end = 0x5FFF,
> >> 73b4390f Ralf Baechle 2008-07-16  55   .flags = IORESOURCE_MEM,
> >> 73b4390f Ralf Baechle 2008-07-16  56   .sibling = NULL,
> >> 73b4390f Ralf Baechle 2008-07-16 @57   .child = _res_pci_mem2
> >> 73b4390f Ralf Baechle 2008-07-16  58  };
> >> 73b4390f Ralf Baechle 2008-07-16  59  
> >> 
> >> :: The code at line 57 was first introduced by commit
> >> :: 73b4390fb23456964201abda79f1210fe337d01a [MIPS] Routerboard 532: 
> >> Support for base system
> >> 
> >> :: TO: Ralf Baechle 
> >> :: CC: Ralf Baechle 
> >> 
> >> ---
> >> 0-DAY kernel test infrastructureOpen Source Technology 
> >> Center
> >> https://lists.01.org/pipermail/kbuild-all   Intel 
> >> Corporation
> >
> >
> >___
> >kbuild-all mailing list
> >kbuild-...@lists.01.org
> >https://lists.01.org/mailman/listinfo/kbuild-all
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: A question about ndctl namespace

2018-07-09 Thread Dan Williams
On Mon, Jul 9, 2018 at 6:33 PM, Huaisheng HS1 Ye  wrote:
> Hi All,
>
> I want to destroy some redundant namespaces (namespace0.0 and 0.1) from
> my system.
> But it still doesn’t work even I used ‘destroy-namespace’, the result
> of command looks ok, but I still could find them there. How can I remove
> them so they wouldn't appear in next command 'ndctl list'?

Short answer, you can't.

This is a general misconception about what the 'destroy-namespace'
command does. It destroys any info-block metadata and if the namespace
is not a 'seed' device or the 0th device it may additionally delete
the kernel device object. See this comment from the kernel source.

/*
 * Try to delete the namespace if we deleted all of its
 * allocation, this is not the seed or 0th device for the
 * region, and it is not actively claimed by a btt, pfn, or dax
 * instance.
 */
if (val == 0 && id != 0 && nd_region->ns_seed != dev && !ndns->claim)
nd_device_unregister(dev, ND_ASYNC);

The reason we don't delete the seed device is that we always need at
least one seed device to configure the next namespace, and we keep the
0th device around to try to make sure the first namespace in a region
gets the namespaceX.0 device.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


A question about ndctl namespace

2018-07-09 Thread Huaisheng HS1 Ye
Hi All,

I want to destroy some redundant namespaces (namespace0.0 and 0.1) from
my system.
But it still doesn’t work even I used ‘destroy-namespace’, the result
of command looks ok, but I still could find them there. How can I remove
them so they wouldn't appear in next command 'ndctl list'?

My questions are below,

1. Is there a bug of ndctl, or just working as designed?
2. If this is correct, why there are two namespaces as *.0 and *.1 for
every single NVDIMM? BTW, I have two NVDIMMs installed on my test platform.


[root@fedora25-03 ~]# ndctl destroy-namespace namespace0.0
destroyed 1 namespace
[root@fedora25-03 ~]# ndctl destroy-namespace namespace0.1
destroyed 1 namespace
[root@fedora25-03 ~]# ndctl list -i
[
  {
"dev":"namespace1.0",
"mode":"fsdax",
"size":132118478848,
"uuid":"454f9a8c-8d23-477b-a1b0-3aabb167d2f1",
"raw_uuid":"a9d87d6c-6bb8-40fa-bebc-41c0a8e029d2",
"sector_size":512,
"blockdev":"pmem1",
"name":"yhs_pmem1"
  },
  {
"dev":"namespace1.1",
"mode":"raw",
"size":0,
"uuid":"----",
"sector_size":512,
"state":"disabled"
  },
  {
"dev":"namespace0.1",
"mode":"raw",
"size":0,
"uuid":"----",
"sector_size":512,
"state":"disabled"
  },
  {
"dev":"namespace0.0",
"mode":"raw",
"size":0,
"uuid":"----",
"sector_size":512,
"state":"disabled"
  }
]

Cheers,
Huaisheng Ye | 叶怀胜
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [kbuild-all] [PATCH v6 2/4] resource: Use list_head to link sibling resource

2018-07-09 Thread Ye Xiaolong
Hi,

On 07/08, Baoquan He wrote:
>Hi,
>
>On 07/05/18 at 01:00am, kbuild test robot wrote:
>> Hi Baoquan,
>> 
>> I love your patch! Yet something to improve:
>> 
>> [auto build test ERROR on linus/master]
>> [also build test ERROR on v4.18-rc3 next-20180704]
>> [if your patch is applied to the wrong git tree, please drop us a note to 
>> help improve the system]
>
>Thanks for telling. 
>
>I cloned 0day-ci/linut to my local pc.
>https://github.com/0day-ci/linux.git
>
>However, I didn't find below branch. And tried to open it in web
>broswer, also failed.
>

Sorry for the inconvenience, 0day bot didn't push the branch to github 
successfully,
Just push it manually, you can have a try again.

Thanks,
Xiaolong


>
>> url:
>> https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180704-121402
>> config: mips-rb532_defconfig (attached as .config)
>> compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
>> reproduce:
>> wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
>> ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> # save the attached .config to linux build tree
>> GCC_VERSION=7.2.0 make.cross ARCH=mips 
>
>I did find a old one which is for the old version 5 post.
>
>[bhe@linux]$ git remote -v
>0day-cihttps://github.com/0day-ci/linux.git (fetch)
>0day-cihttps://github.com/0day-ci/linux.git (push)
>[bhe@dhcp-128-28 linux]$ git branch -a| grep Baoquan| grep resource
>  
> remotes/0day-ci/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
>
>Could you help have a look at this?
>
>Thanks
>Baoquan
>
>> 
>> All error/warnings (new ones prefixed by >>):
>> 
>> >> arch/mips/pci/pci-rc32434.c:57:11: error: initialization from 
>> >> incompatible pointer type [-Werror=incompatible-pointer-types]
>>  .child = _res_pci_mem2
>>   ^
>>arch/mips/pci/pci-rc32434.c:57:11: note: (near initialization for 
>> 'rc32434_res_pci_mem1.child.next')
>> >> arch/mips/pci/pci-rc32434.c:51:47: warning: missing braces around 
>> >> initializer [-Wmissing-braces]
>> static struct resource rc32434_res_pci_mem1 = {
>>   ^
>>arch/mips/pci/pci-rc32434.c:60:47: warning: missing braces around 
>> initializer [-Wmissing-braces]
>> static struct resource rc32434_res_pci_mem2 = {
>>   ^
>>cc1: some warnings being treated as errors
>> 
>> vim +57 arch/mips/pci/pci-rc32434.c
>> 
>> 73b4390f Ralf Baechle 2008-07-16  50  
>> 73b4390f Ralf Baechle 2008-07-16 @51  static struct resource 
>> rc32434_res_pci_mem1 = {
>> 73b4390f Ralf Baechle 2008-07-16  52 .name = "PCI MEM1",
>> 73b4390f Ralf Baechle 2008-07-16  53 .start = 0x5000,
>> 73b4390f Ralf Baechle 2008-07-16  54 .end = 0x5FFF,
>> 73b4390f Ralf Baechle 2008-07-16  55 .flags = IORESOURCE_MEM,
>> 73b4390f Ralf Baechle 2008-07-16  56 .sibling = NULL,
>> 73b4390f Ralf Baechle 2008-07-16 @57 .child = _res_pci_mem2
>> 73b4390f Ralf Baechle 2008-07-16  58  };
>> 73b4390f Ralf Baechle 2008-07-16  59  
>> 
>> :: The code at line 57 was first introduced by commit
>> :: 73b4390fb23456964201abda79f1210fe337d01a [MIPS] Routerboard 532: 
>> Support for base system
>> 
>> :: TO: Ralf Baechle 
>> :: CC: Ralf Baechle 
>> 
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology Center
>> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
>
>
>___
>kbuild-all mailing list
>kbuild-...@lists.01.org
>https://lists.01.org/mailman/listinfo/kbuild-all
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 04/11] nfit/libnvdimm: adding unlock of nvdimm support for Intel DIMMs

2018-07-09 Thread Dan Williams
On Thu, Jul 5, 2018 at 1:22 PM, Dave Jiang  wrote:
> Adding support to allow query the security status of the Intel nvdimms and
> also unlock the dimm via the kernel key management APIs. The passphrase is
> expected to be pulled from userspace through keyutils. Moving the Intel
> related bits to its own source file as well.
>
> Signed-off-by: Dave Jiang 
> ---
>  drivers/acpi/nfit/Makefile |2 -
>  drivers/acpi/nfit/core.c   |   11 +++
>  drivers/acpi/nfit/intel.c  |  146 
> 
>  drivers/acpi/nfit/intel.h  |2 +
>  drivers/nvdimm/dimm.c  |7 ++
>  drivers/nvdimm/dimm_devs.c |  108 -
>  drivers/nvdimm/nd-core.h   |2 +
>  drivers/nvdimm/nd.h|2 +
>  include/linux/libnvdimm.h  |   23 +++
>  9 files changed, 299 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/acpi/nfit/intel.c
>
> diff --git a/drivers/acpi/nfit/Makefile b/drivers/acpi/nfit/Makefile
> index a407e769f103..8287005f9226 100644
> --- a/drivers/acpi/nfit/Makefile
> +++ b/drivers/acpi/nfit/Makefile
> @@ -1,3 +1,3 @@
>  obj-$(CONFIG_ACPI_NFIT) := nfit.o
> -nfit-y := core.o
> +nfit-y := core.o intel.o

See below, this needs to be

nfit-$(CONFIG_X86) += intel.o

>  nfit-$(CONFIG_X86_MCE) += mce.o
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index d474d53473fe..8304ad6fcea4 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -1838,6 +1838,14 @@ static int acpi_nfit_get_dimm_id(struct 
> acpi_nfit_control_region *dcr,
> be32_to_cpu(dcr->serial_number));
>  }
>
> +static struct nvdimm_security_ops *acpi_nfit_get_security_ops(int family)
> +{
> +   if (family == NVDIMM_FAMILY_INTEL)
> +   return _security_ops;
> +   else
> +   return NULL;
> +}
> +
>  static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
>  {
> struct nfit_mem *nfit_mem;
> @@ -1908,7 +1916,8 @@ static int acpi_nfit_register_dimms(struct 
> acpi_nfit_desc *acpi_desc)
> nvdimm = nvdimm_create(acpi_desc->nvdimm_bus, nfit_mem,
> acpi_nfit_dimm_attribute_groups,
> flags, cmd_mask, flush ? flush->hint_count : 
> 0,
> -   nfit_mem->flush_wpq, nfit_mem->id);
> +   nfit_mem->flush_wpq, nfit_mem->id,
> +   acpi_nfit_get_security_ops(nfit_mem->family));
> if (!nvdimm)
> return -ENOMEM;
>
> diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
> new file mode 100644
> index ..c1140f6901da
> --- /dev/null
> +++ b/drivers/acpi/nfit/intel.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright(c) 2018 Intel Corporation. All rights reserved. */
> +/*
> + * Intel specific NFIT ops
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "intel.h"
> +#include "nfit.h"
> +
> +static int intel_dimm_security_unlock(struct nvdimm_bus *nvdimm_bus,
> +   struct nvdimm *nvdimm, struct nvdimm_key_data *nkey)
> +{
> +   struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
> +   int cmd_rc, rc = 0;
> +   struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm);
> +   struct {
> +   struct nd_cmd_pkg pkg;
> +   struct nd_intel_unlock_unit cmd;
> +   } nd_cmd = {
> +   .pkg = {
> +   .nd_command = NVDIMM_INTEL_UNLOCK_UNIT,
> +   .nd_family = NVDIMM_FAMILY_INTEL,
> +   .nd_size_in = ND_INTEL_PASSPHRASE_SIZE,
> +   .nd_size_out = ND_INTEL_STATUS_SIZE,
> +   .nd_fw_size = ND_INTEL_STATUS_SIZE,
> +   },
> +   .cmd = {
> +   .status = 0,
> +   },
> +   };
> +
> +   if (!test_bit(NVDIMM_INTEL_UNLOCK_UNIT, _mem->dsm_mask))
> +   return -ENOTTY;
> +
> +   memcpy(nd_cmd.cmd.passphrase, nkey->data, ND_INTEL_PASSPHRASE_SIZE);
> +   rc = nd_desc->ndctl(nd_desc, nvdimm, ND_CMD_CALL, _cmd,
> +   sizeof(nd_cmd), _rc);
> +   if (rc < 0)
> +   goto out;
> +   if (cmd_rc < 0) {
> +   rc = cmd_rc;
> +   goto out;
> +   }
> +
> +   switch (nd_cmd.cmd.status) {
> +   case 0:
> +   break;
> +   case ND_INTEL_STATUS_INVALID_PASS:
> +   rc = -EINVAL;
> +   goto out;
> +   case ND_INTEL_STATUS_INVALID_STATE:
> +   default:
> +   rc = -ENXIO;
> +   goto out;
> +   }
> +
> +   /* DIMM unlocked, invalidate all CPU caches before we read it */
> +   wbinvd();

This needs to be run on all cpus, there's a helper for that

Re: [PATCHv2 1/2] libnvdimm: Use max contiguous area for namespace size

2018-07-09 Thread Dan Williams
On Mon, Jul 9, 2018 at 8:44 AM, Keith Busch  wrote:
> On Fri, Jul 06, 2018 at 03:25:15PM -0700, Dan Williams wrote:
>> This is going in the right direction... but still needs to account for
>> the blk_overlap.
>>
>> So, on a given DIMM BLK capacity is allocated from the top of DPA
>> space going down and PMEM capacity is allocated from the bottom of the
>> DPA space going up.
>>
>> Since BLK capacity is single DIMM, and PMEM capacity is striped you
>> could get into the situation where one DIMM is fully allocated for BLK
>> usage and that would shade / remove the possibility to use the PMEM
>> capacity on the other DIMMs in the PMEM set. PMEM needs all the same
>> DPAs in all the DIMMs to be free.
>>
>> >
>> > ---
>> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
>> > index 8d348b22ba45..f30e0c3b0282 100644
>> > --- a/drivers/nvdimm/dimm_devs.c
>> > +++ b/drivers/nvdimm/dimm_devs.c
>> > @@ -536,6 +536,31 @@ resource_size_t nd_blk_available_dpa(struct nd_region 
>> > *nd_region)
>> > return info.available;
>> >  }
>> >
>> > +/**
>> > + * nd_pmem_max_contiguous_dpa - For the given dimm+region, return the max
>> > + * contiguous unallocated dpa range.
>> > + * @nd_region: constrain available space check to this reference region
>> > + * @nd_mapping: container of dpa-resource-root + labels
>> > + */
>> > +resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
>> > +  struct nd_mapping *nd_mapping)
>> > +{
>> > +   struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
>> > +   resource_size_t max = 0;
>> > +   struct resource *res;
>> > +
>> > +   if (!ndd)
>> > +   return 0;
>> > +
>> > +   for_each_dpa_resource(ndd, res) {
>> > +   if (strcmp(res->name, "pmem-reserve") != 0)
>> > +   continue;
>> > +   if (resource_size(res) > max)
>>
>> ...so instead straight resource_size() here you need trim the end of
>> this "pmem-reserve" resource to the start of the first BLK allocation
>> in any of the DIMMs in the set.
>>
>> See blk_start calculation in nd_pmem_available_dpa().
>
> Hmm, the resources defining this are a bit inconvenient given these
> constraints. If an unallocated portion of a DIMM may only be used for
> BLK because an overlapping range in another DIMM is allocated that way,
> would it make since to insert something like a "blk-reserve" resource
> in all the other DIMMs so we don't need multiple iterations to calculate
> which DPAs can be used for PMEM?

Do you mean temporarily allocate the blk-reserve? You could have a
different sized / offset BLK allocation on each DIMM in the PMEM set.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings

2018-07-09 Thread Jan Kara
On Mon 09-07-18 09:23:38, Darrick J. Wong wrote:
> On Mon, Jul 09, 2018 at 02:33:47PM +0200, Jan Kara wrote:
> > On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> > > On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > > > Follow the lead of xfs_break_dax_layouts() and add 
> > > > > > > > synchronization between
> > > > > > > > operations in ext4 which remove blocks from an inode (hole 
> > > > > > > > punch, truncate
> > > > > > > > down, etc.) and pages which are pinned due to DAX DMA 
> > > > > > > > operations.
> > > > > > > > 
> > > > > > > > Signed-off-by: Ross Zwisler 
> > > > > > > > Reviewed-by: Jan Kara 
> > > > > > > > Reviewed-by: Lukas Czerner 
> > > > > > > > ---
> > > > > > > > 
> > > > > > > > Changes since v2:
> > > > > > > >  * Added a comment to ext4_insert_range() explaining why we 
> > > > > > > > don't call
> > > > > > > >ext4_break_layouts(). (Jan)
> > > > > > > 
> > > > > > > Which I think is wrong and will cause data corruption.
> > > > > > > 
> > > > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode 
> > > > > > > > *inode, loff_t offset, loff_t len)
> > > > > > > > LLONG_MAX);
> > > > > > > > if (ret)
> > > > > > > > goto out_mmap;
> > > > > > > > +   /*
> > > > > > > > +* We don't need to call ext4_break_layouts() because 
> > > > > > > > we aren't
> > > > > > > > +* removing any blocks from the inode.  We are just 
> > > > > > > > changing their
> > > > > > > > +* offset by inserting a hole.
> > > > > > > > +*/
> > > > 
> > > > Does calling ext4_break_layouts from insert range not work?
> > > > 
> > > > It's my understanding that file leases work are a mechanism for the
> > > > filesystem to delegate some of its authority over physical space
> > > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > > directly into userspace and exporting space on shared storage over
> > > > pNFS.  Some day we might use the same mechanism for the similar things
> > > > that RDMA does, or the swapfile code since that's essentially how it
> > > > works today.
> > > > 
> > > > The other part of these file leases is that the filesystem revokes them
> > > > any time it wants to perform a mapping operation on a file.  This breaks
> > > > my mental model of how leases work, and if you commit to this for ext4
> > > > then I'll have to remember that leases are different between xfs and
> > > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > > be the implementation detail that "DAX won't care", then someone else
> > > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > > back into ext4 or else kaboom.
> > > > 
> > > > Granted, Dave said all these things already, but I actually feel
> > > > strongly enough to reiterate.
> > > 
> > > Jan, would you like me to call ext4_break_layouts() in 
> > > ext4_insert_range() to
> > > keep the lease mechanism consistent between ext4 and XFS, or would you 
> > > prefer
> > > the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?
> > 
> > Let's just call it from ext4_insert_range(). I think the simple semantics
> > Dave and Darrick defend is more maintainable and insert range isn't really
> > performance critical operation.
> > 
> > The question remains whether equivalent of BREAK_UNMAP is really required
> > also for allocation of new blocks using fallocate. Because that doesn't
> > really seem fundamentally different from normal write which uses
> > BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
> > so bothering with GUP synchronization when not needed could hurt there.
> > Dave, Darrick?
> 
> Hmm, IIRC BREAK_UNMAP is supposed to be for callers who are going to
> remove (or move) mappings that already exist, so that the caller blocks
> until the lessee acknowledges that they've forgotten all the mappings
> they knew about.  So I /think/ for fallocate mode 0 I think this could
> be BREAK_WRITE instead of _UNMAP, though (at least for xfs) the other
> modes all need _UNMAP.

OK, so we are on the same page here :)

> Side question: in xfs_file_aio_write_checks, do we need to do
> BREAK_UNMAP if is possible that writeback will end up performing a copy
> write?  Granted, the pnfs export and dax stuff don't support reflink or
> cow so I guess this is an academic question for now...

My naive understanding would be that yes, BREAK_UNMAP is needed in such
case...

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org

Re: [PATCH 00/13] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE

2018-07-09 Thread Dan Williams
On Mon, Jul 9, 2018 at 5:56 AM, Jan Kara  wrote:
> On Wed 04-07-18 23:49:02, Dan Williams wrote:
>> In order to keep pfn_to_page() a simple offset calculation the 'struct
>> page' memmap needs to be mapped and initialized in advance of any usage
>> of a page. This poses a problem for large memory systems as it delays
>> full availability of memory resources for 10s to 100s of seconds.
>>
>> For typical 'System RAM' the problem is mitigated by the fact that large
>> memory allocations tend to happen after the kernel has fully initialized
>> and userspace services / applications are launched. A small amount, 2GB
>> of memory, is initialized up front. The remainder is initialized in the
>> background and freed to the page allocator over time.
>>
>> Unfortunately, that scheme is not directly reusable for persistent
>> memory and dax because userspace has visibility to the entire resource
>> pool and can choose to access any offset directly at its choosing. In
>> other words there is no allocator indirection where the kernel can
>> satisfy requests with arbitrary pages as they become initialized.
>>
>> That said, we can approximate the optimization by performing the
>> initialization in the background, allow the kernel to fully boot the
>> platform, start up pmem block devices, mount filesystems in dax mode,
>> and only incur the delay at the first userspace dax fault.
>>
>> With this change an 8 socket system was observed to initialize pmem
>> namespaces in ~4 seconds whereas it was previously taking ~4 minutes.
>>
>> These patches apply on top of the HMM + devm_memremap_pages() reworks
>> [1]. Andrew, once the reviews come back, please consider this series for
>> -mm as well.
>>
>> [1]: https://lkml.org/lkml/2018/6/19/108
>
> One question: Why not (in addition to background initialization) have
> ->direct_access() initialize a block of struct pages around the pfn it
> needs if it finds it's not initialized yet? That would make devices usable
> immediately without waiting for init to complete...

Hmm, yes, relatively immediately... it would depend on the granularity
of the tracking where we can reliably steal initialization work from
the background thread. I'll give it a shot, I'm thinking dividing each
thread's work into 64 sub-units and track those units with a bitmap.
The worst case init time then becomes the time to initialize the pages
for a range that is namespace-size / (NR_MEMMAP_THREADS * 64).
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings

2018-07-09 Thread Darrick J. Wong
On Mon, Jul 09, 2018 at 02:33:47PM +0200, Jan Kara wrote:
> On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> > On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > > Follow the lead of xfs_break_dax_layouts() and add 
> > > > > > > synchronization between
> > > > > > > operations in ext4 which remove blocks from an inode (hole punch, 
> > > > > > > truncate
> > > > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > > > 
> > > > > > > Signed-off-by: Ross Zwisler 
> > > > > > > Reviewed-by: Jan Kara 
> > > > > > > Reviewed-by: Lukas Czerner 
> > > > > > > ---
> > > > > > > 
> > > > > > > Changes since v2:
> > > > > > >  * Added a comment to ext4_insert_range() explaining why we don't 
> > > > > > > call
> > > > > > >ext4_break_layouts(). (Jan)
> > > > > > 
> > > > > > Which I think is wrong and will cause data corruption.
> > > > > > 
> > > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, 
> > > > > > > loff_t offset, loff_t len)
> > > > > > >   LLONG_MAX);
> > > > > > >   if (ret)
> > > > > > >   goto out_mmap;
> > > > > > > + /*
> > > > > > > +  * We don't need to call ext4_break_layouts() because we aren't
> > > > > > > +  * removing any blocks from the inode.  We are just changing 
> > > > > > > their
> > > > > > > +  * offset by inserting a hole.
> > > > > > > +  */
> > > 
> > > Does calling ext4_break_layouts from insert range not work?
> > > 
> > > It's my understanding that file leases work are a mechanism for the
> > > filesystem to delegate some of its authority over physical space
> > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > directly into userspace and exporting space on shared storage over
> > > pNFS.  Some day we might use the same mechanism for the similar things
> > > that RDMA does, or the swapfile code since that's essentially how it
> > > works today.
> > > 
> > > The other part of these file leases is that the filesystem revokes them
> > > any time it wants to perform a mapping operation on a file.  This breaks
> > > my mental model of how leases work, and if you commit to this for ext4
> > > then I'll have to remember that leases are different between xfs and
> > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > be the implementation detail that "DAX won't care", then someone else
> > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > back into ext4 or else kaboom.
> > > 
> > > Granted, Dave said all these things already, but I actually feel
> > > strongly enough to reiterate.
> > 
> > Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() 
> > to
> > keep the lease mechanism consistent between ext4 and XFS, or would you 
> > prefer
> > the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?
> 
> Let's just call it from ext4_insert_range(). I think the simple semantics
> Dave and Darrick defend is more maintainable and insert range isn't really
> performance critical operation.
> 
> The question remains whether equivalent of BREAK_UNMAP is really required
> also for allocation of new blocks using fallocate. Because that doesn't
> really seem fundamentally different from normal write which uses
> BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
> so bothering with GUP synchronization when not needed could hurt there.
> Dave, Darrick?

Hmm, IIRC BREAK_UNMAP is supposed to be for callers who are going to
remove (or move) mappings that already exist, so that the caller blocks
until the lessee acknowledges that they've forgotten all the mappings
they knew about.  So I /think/ for fallocate mode 0 I think this could
be BREAK_WRITE instead of _UNMAP, though (at least for xfs) the other
modes all need _UNMAP.

Side question: in xfs_file_aio_write_checks, do we need to do
BREAK_UNMAP if is possible that writeback will end up performing a copy
write?  Granted, the pnfs export and dax stuff don't support reflink or
cow so I guess this is an academic question for now...

--D

>   Honza
> -- 
> Jan Kara 
> SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings

2018-07-09 Thread Darrick J. Wong
On Mon, Jul 09, 2018 at 11:59:07AM +0200, Lukas Czerner wrote:
> On Fri, Jul 06, 2018 at 09:29:34AM +1000, Dave Chinner wrote:
> > On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> > > On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong  
> > > wrote:
> > > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > >> > > > Follow the lead of xfs_break_dax_layouts() and add 
> > > >> > > > synchronization between
> > > >> > > > operations in ext4 which remove blocks from an inode (hole 
> > > >> > > > punch, truncate
> > > >> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > >> > > >
> > > >> > > > Signed-off-by: Ross Zwisler 
> > > >> > > > Reviewed-by: Jan Kara 
> > > >> > > > Reviewed-by: Lukas Czerner 
> > > >> > > > ---
> > > >> > > >
> > > >> > > > Changes since v2:
> > > >> > > >  * Added a comment to ext4_insert_range() explaining why we 
> > > >> > > > don't call
> > > >> > > >ext4_break_layouts(). (Jan)
> > > >> > >
> > > >> > > Which I think is wrong and will cause data corruption.
> > > >> > >
> > > >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode 
> > > >> > > > *inode, loff_t offset, loff_t len)
> > > >> > > > LLONG_MAX);
> > > >> > > > if (ret)
> > > >> > > > goto out_mmap;
> > > >> > > > +   /*
> > > >> > > > +* We don't need to call ext4_break_layouts() because we 
> > > >> > > > aren't
> > > >> > > > +* removing any blocks from the inode.  We are just 
> > > >> > > > changing their
> > > >> > > > +* offset by inserting a hole.
> > > >> > > > +*/
> > > >
> > > > Does calling ext4_break_layouts from insert range not work?
> > > >
> > > > It's my understanding that file leases work are a mechanism for the
> > > > filesystem to delegate some of its authority over physical space
> > > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > > directly into userspace and exporting space on shared storage over
> > > > pNFS.  Some day we might use the same mechanism for the similar things
> > > > that RDMA does, or the swapfile code since that's essentially how it
> > > > works today.
> > > >
> > > > The other part of these file leases is that the filesystem revokes them
> > > > any time it wants to perform a mapping operation on a file.  This breaks
> > > > my mental model of how leases work, and if you commit to this for ext4
> > > > then I'll have to remember that leases are different between xfs and
> > > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > > be the implementation detail that "DAX won't care", then someone else
> > > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > > back into ext4 or else kaboom.
> > > >
> > > > Granted, Dave said all these things already, but I actually feel
> > > > strongly enough to reiterate.
> > > 
> > > This patch kit is only for the DAX fix, this isn't full layout lease
> > > support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> > > So ext4 is achieving feature parity for BREAK_UNMAP, just not
> > > BREAK_WRITE, yet.
> > 
> > BREAK_UNMAP is issued unconditionally by XFS for all fallocate
> > operations. There is no special except for extent shifting (up or
> > down) in XFS as this patch set is making for ext4.  IOWs, this
> > patchset does not implement BREAK_UNMAP with the same semantics as
> > XFS.
> 
> If anything this is very usefull discussion ( at least for me ) and what
> I do take away from it is that there is no documentation, nor
> specification of the leases nor BREAK_UNMAP nor BREAK_WRITE.
> 
> grep -iR -e break_layout -e BREAK_UNMAP -e BREAK_WRITE Documentation/*
> 
> Maybe someone with a good understanding of how this stuff is supposed to
> be done could write it down so filesystem devs can make it behave the
> same.

Dan? :)

IIRC, BREAK_WRITE means "terminate all leases immediately" as the caller
prepares to write to a file range (which may or may not involve adding
more mappings), whereas BREAK_UNMAP means "terminate all leases and wait
until the lessee acknowledges" as the caller prepares to remove (or
move) file extent mappings.

--D

> Thanks!
> -Lukas
> 
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCHv2 1/2] libnvdimm: Use max contiguous area for namespace size

2018-07-09 Thread Keith Busch
On Fri, Jul 06, 2018 at 03:25:15PM -0700, Dan Williams wrote:
> This is going in the right direction... but still needs to account for
> the blk_overlap.
> 
> So, on a given DIMM BLK capacity is allocated from the top of DPA
> space going down and PMEM capacity is allocated from the bottom of the
> DPA space going up.
> 
> Since BLK capacity is single DIMM, and PMEM capacity is striped you
> could get into the situation where one DIMM is fully allocated for BLK
> usage and that would shade / remove the possibility to use the PMEM
> capacity on the other DIMMs in the PMEM set. PMEM needs all the same
> DPAs in all the DIMMs to be free.
> 
> >
> > ---
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 8d348b22ba45..f30e0c3b0282 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -536,6 +536,31 @@ resource_size_t nd_blk_available_dpa(struct nd_region 
> > *nd_region)
> > return info.available;
> >  }
> >
> > +/**
> > + * nd_pmem_max_contiguous_dpa - For the given dimm+region, return the max
> > + * contiguous unallocated dpa range.
> > + * @nd_region: constrain available space check to this reference region
> > + * @nd_mapping: container of dpa-resource-root + labels
> > + */
> > +resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
> > +  struct nd_mapping *nd_mapping)
> > +{
> > +   struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
> > +   resource_size_t max = 0;
> > +   struct resource *res;
> > +
> > +   if (!ndd)
> > +   return 0;
> > +
> > +   for_each_dpa_resource(ndd, res) {
> > +   if (strcmp(res->name, "pmem-reserve") != 0)
> > +   continue;
> > +   if (resource_size(res) > max)
> 
> ...so instead straight resource_size() here you need trim the end of
> this "pmem-reserve" resource to the start of the first BLK allocation
> in any of the DIMMs in the set.
> 
> See blk_start calculation in nd_pmem_available_dpa().

Hmm, the resources defining this are a bit inconvenient given these
constraints. If an unallocated portion of a DIMM may only be used for
BLK because an overlapping range in another DIMM is allocated that way,
would it make since to insert something like a "blk-reserve" resource
in all the other DIMMs so we don't need multiple iterations to calculate
which DPAs can be used for PMEM?
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH v3] ndctl, test: add a new unit test for monitor

2018-07-09 Thread Dan Williams
On Mon, Jul 9, 2018 at 4:38 AM, Qi, Fuli  wrote:
> Hi Dan,
> Thanks for your comments.
>
>> -Original Message-
>> From: Dan Williams [mailto:dan.j.willi...@intel.com]
>> Sent: Sunday, July 8, 2018 3:56 AM
>> To: Qi, Fuli/斉 福利 
>> Cc: linux-nvdimm 
>> Subject: Re: [ndctl PATCH v3] ndctl, test: add a new unit test for monitor
>>
>> On Fri, Jul 6, 2018 at 10:35 PM, QI Fuli  wrote:
>> > Add a new unit test to test the following options of the monitor command.
>> >--dimm
>> >--bus
>> >--region
>> >--namespace
>> >--logfile
>> >--config-file
>> >
>> > Based-on-patch-by: Yasunori Goto 
>> > Signed-off-by: QI Fuli 
>> > ---
>> > v2 -> v3:
>> >  - Add filter_obj instead of hard-coded values
>> > v1 -> v2:
>> >  - Add init()
>> >  - Add get_filter_dimm() to get the filter dimms by ndctl list command
>> >instead of hard-cording
>> >  - Add sleep to call_notify()
>> >
>> >  test/Makefile.am |   3 +-
>> >  test/monitor.sh  | 134
>> > +++
>> >  2 files changed, 136 insertions(+), 1 deletion(-)  create mode 100755
>> > test/monitor.sh
>>
>> This test is still failing for me, here is the tail of the 
>> test/test-suite.log output:
>>
>> + sync
>> + sleep 3
>> + check_result
>> ++ cat /tmp/tmp.dT3TJ4l5IE
>> + jlog='nmem0: no smart support
>>
>
> My idea of [--dimm option] unit test is try to get the first dimm on bus 
> nfit_test.0, then let
> monitor to run [-d] option with it, the same as:
> ndctl monitor -b nfit_test.0 -d nmem0
> but in your environment, since the "nmem0" didn't support smart, then the 
> monitor outputted the following error message and stopped.

Right, the test is fragile if it tries to guess nmemX device names.

>
>> no dimms to monitor'
>
>> ++ jq .dimm.dev
>> ++ sort
>> ++ uniq
>> ++ sed -e ':loop; N; $!b loop; s/\n/:/g'
>> ++ sed 's/\"//g'
>> parse error: Invalid literal at line 1, column 6
>> + notify_dimms=
>> + [[ '' == '' ]]
>> + stop_monitor
>> + kill 39414
>> ./monitor.sh: line 48: kill: (39414) - No such process
>> ++ err 48
>> +++ basename ./monitor.sh
>> ++ echo test/monitor.sh: failed at line 48
>> test/monitor.sh: failed at line 48
>> ++ '[' -n '' ']'
>> ++ exit 1
>> FAIL monitor.sh (exit status: 1)
>>
>> I notice errors around get_filter_dimm(). Why is that function needed, it 
>> seems
>> redundant now that $filter_obj is being used?
>>
>
> I am sorry that the "filter_dimm" is not a suitable name, maybe changing it 
> to "monitor_dimm" will be better to understand.
> This function is used to get dimms to be monitored from "ndctl list -D -b 
> nfit_test.0 $1".
> After "./smart-notify nfit_test.0", get "notify dimms" from output 
> notifications.
> Then compare the "monitor dimms" with the "notify dimms", if same that means 
> the unit test option works well.
> I will add the a filter to make sure that the "monitor dimms" support smart 
> event in next version.

I'm confused, how does checking for smart event support get around the
fact that the test is looking for the wrong DIMM names in the first
place?

>
>> Hmm, if I just delete get_filter_dimm() I get this;
>>
>> ++ jq .dimm.dev
>> ++ sort
>> ++ uniq
>> ++ sed -e ':loop; N; $!b loop; s/\n/:/g'
>> ++ sed 's/\"//g'
>> + notify_dimms=nmem3
>> + [[ '' == nmem3 ]]
>> ++ err 34
>> +++ basename ./monitor.sh
>> ++ echo test/monitor.sh: failed at line 34
>> test/monitor.sh: failed at line 34
>> ++ '[' -n '' ']'
>> ++ exit 1
>> FAIL monitor.sh (exit status: 1)
>>
>> I assume this test is passing for you? Can you try running it inside a 
>> virtual machine
>> that has a virtual NVDIMM so you can debug the collisions between the 
>> nfit_test.0
>> bus objects and the ACPI.NFIT ones?
>>
>
[..]

>> > +{
>> > +   jlist=$($NDCTL list -D -b $NFIT_TEST_BUS0 $1)
>> > +   filter_dimms=$(jq '.[]."dev"?, ."dev"?' <<<$jlist | sort |
>> > +uniq | sed -e ':loop; N; $!b loop; s/\n/:/g' | sed 's/\"//g') }
>>
>> Can we just remove this?
>
> As I mentioned above, I don't think so.

Again, I don't see how you can achieve the verification you want
without first discovering the names of the possible DIMMs on
nfit_test.0. There is no reliable way that you can hard code the DIMM
names.

>
>>
>> > +
>> > +call_notify()
>> > +{
>> > +   ./smart-notify $NFIT_TEST_BUS0
>> > +   sync; sleep 3
>> > +}
>> > +
>> > +check_result()
>> > +{
>> > +   jlog=$(cat $logfile)
>> > +   notify_dimms=$(jq ."dimm"."dev" <<<$jlog | sort | uniq | sed
>> > +-e ':loop; N; $!b loop; s/\n/:/g' | sed 's/\"//g')
>>
>> Can you add a comment here about that sed script is trying to do? If it's 
>> just filtering
>> json I'd rather it just jq directly.
>
> These sed scripts are used to format "monitor dimms" and "notify dimms".
> The first one is task to replace "\n" with ":", and the second one is used to 
> remove ' " '.
> After these processes, the format will become "nmem0:nmem1:nmem2".

That shell script just seems too dense for what it is trying to do.
How about something like:


[ndctl PATCH v10 1/4] ndctl, monitor: add a new command - monitor

2018-07-09 Thread QI Fuli
Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
When a smart event fires, monitor will output the notifications which
include dimm health status and event informations to syslog or a
logfile by setting [--logfile] option. The notifications follow json
format and can be consumed by log collectors like Fluentd.

The objects to monitor can be selected by setting [--dimm] [--region]
[--namespace] [--bus] options and the event type can be filtered by
setting [--dimm-event] option. These options support multiple
space-separated arguments.

Ndctl monitor can be forked as a daemon by using [--daemon] option,
such as:
   # ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log

Signed-off-by: QI Fuli 
---
 builtin.h  |   1 +
 ndctl/Makefile.am  |   3 +-
 ndctl/lib/libndctl.c   |  82 +++
 ndctl/lib/libndctl.sym |   4 +
 ndctl/libndctl.h   |  10 +
 ndctl/monitor.c| 508 +
 ndctl/ndctl.c  |   1 +
 util/filter.h  |   9 +
 8 files changed, 617 insertions(+), 1 deletion(-)
 create mode 100644 ndctl/monitor.c

diff --git a/builtin.h b/builtin.h
index d3cc723..675a6ce 100644
--- a/builtin.h
+++ b/builtin.h
@@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv, void *ctx);
 int cmd_wait_scrub(int argc, const char **argv, void *ctx);
 int cmd_start_scrub(int argc, const char **argv, void *ctx);
 int cmd_list(int argc, const char **argv, void *ctx);
+int cmd_monitor(int argc, const char **argv, void *ctx);
 #ifdef ENABLE_TEST
 int cmd_test(int argc, const char **argv, void *ctx);
 #endif
diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index d22a379..7dbf223 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
util/json-smart.c \
util/json-firmware.c \
inject-error.c \
-   inject-smart.c
+   inject-smart.c \
+   monitor.c
 
 if ENABLE_DESTRUCTIVE
 ndctl_SOURCES += ../test/blk_namespaces.c \
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 47e005e..969e4aa 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -1635,6 +1635,88 @@ NDCTL_EXPORT int ndctl_dimm_get_health_eventfd(struct 
ndctl_dimm *dimm)
return dimm->health_eventfd;
 }
 
+NDCTL_EXPORT unsigned int ndctl_dimm_get_health(struct ndctl_dimm *dimm)
+{
+   struct ndctl_cmd *cmd = NULL;
+   unsigned int health;
+   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
+   const char *devname = ndctl_dimm_get_devname(dimm);
+
+   cmd = ndctl_dimm_cmd_new_smart(dimm);
+   if (!cmd) {
+   err(ctx, "%s: no smart command support\n", devname);
+   return UINT_MAX;
+   }
+   if (ndctl_cmd_submit(cmd)) {
+   err(ctx, "%s: smart command failed\n", devname);
+   return UINT_MAX;
+   }
+
+   health = ndctl_cmd_smart_get_health(cmd);
+   ndctl_cmd_unref(cmd);
+   return health;
+}
+
+NDCTL_EXPORT unsigned int ndctl_dimm_get_flags(struct ndctl_dimm *dimm)
+{
+   struct ndctl_cmd *cmd = NULL;
+   unsigned int flags;
+   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
+   const char *devname = ndctl_dimm_get_devname(dimm);
+
+   cmd = ndctl_dimm_cmd_new_smart(dimm);
+   if (!cmd) {
+   dbg(ctx, "%s: no smart command support\n", devname);
+   return UINT_MAX;
+   }
+   if (ndctl_cmd_submit(cmd)) {
+   dbg(ctx, "%s: smart command failed\n", devname);
+   return UINT_MAX;
+   }
+
+   flags = ndctl_cmd_smart_get_flags(cmd);
+   ndctl_cmd_unref(cmd);
+   return flags;
+}
+
+NDCTL_EXPORT int ndctl_dimm_is_flag_supported(struct ndctl_dimm *dimm,
+   unsigned int flag)
+{
+   unsigned int flags = ndctl_dimm_get_flags(dimm);
+   return (flags ==  UINT_MAX) ? 0 : !!(flags & flag);
+}
+
+NDCTL_EXPORT unsigned int ndctl_dimm_get_event_flags(struct ndctl_dimm *dimm)
+{
+   struct ndctl_cmd *cmd = NULL;
+   unsigned int alarm_flags, event_flags = 0;
+   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(dimm);
+   const char *devname = ndctl_dimm_get_devname(dimm);
+
+   cmd = ndctl_dimm_cmd_new_smart(dimm);
+   if (!cmd) {
+   err(ctx, "%s: no smart command support\n", devname);
+   return UINT_MAX;
+   }
+   if (ndctl_cmd_submit(cmd)) {
+   err(ctx, "%s: smart command failed\n", devname);
+   return UINT_MAX;
+   }
+
+   alarm_flags = ndctl_cmd_smart_get_alarm_flags(cmd);
+   if (alarm_flags & ND_SMART_SPARE_TRIP)
+   event_flags |= ND_EVENT_SPARES_REMAINING;
+   if (alarm_flags & ND_SMART_MTEMP_TRIP)
+   event_flags |= ND_EVENT_MEDIA_TEMPERATURE;
+   if (alarm_flags & ND_SMART_CTEMP_TRIP)
+   event_flags |= ND_EVENT_CTRL_TEMPERATURE;
+   if 

[ndctl PATCH v10 3/4] ndctl, monitor: add the unit file of systemd for ndctl-monitor service

2018-07-09 Thread QI Fuli
This patch adds the systemd unit file for ndctl-monitor service.
The systemd unit directory can be configured by setting environment
variable "--with-systemd-unit-dir[=DIR]".

A monitor daemon can be started as a system service:
   # systemctl start ndctl-monitor.service

Signed-off-by: QI Fuli 
---
 autogen.sh  |  3 ++-
 configure.ac| 22 ++
 ndctl/Makefile.am   |  4 
 ndctl/ndctl-monitor.service |  7 +++
 4 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 ndctl/ndctl-monitor.service

diff --git a/autogen.sh b/autogen.sh
index 2a52688..21b0e25 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -17,7 +17,8 @@ libdir() {
 
 args="--prefix=/usr \
 --sysconfdir=/etc \
---libdir=$(libdir /usr/lib)"
+--libdir=$(libdir /usr/lib) \
+--with-systemd-unit-dir"
 
 echo
 echo ""
diff --git a/configure.ac b/configure.ac
index cf44260..a5ba9a1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -143,6 +143,27 @@ AC_CHECK_FUNCS([ \
secure_getenv\
 ])
 
+PKG_PROG_PKG_CONFIG
+AC_ARG_WITH([systemd-unit-dir],
+   AS_HELP_STRING([--with-systemd-unit-dir[=DIR]],
+   [Directory for systemd service files]),
+   [],
+   [with_systemd_unit_dir=yes])
+
+if test "x$with_systemd_unit_dir" = "xyes"; then
+   def_systemd_unit_dir=$($PKG_CONFIG --variable=systemdsystemunitdir 
systemd)
+   if test "x$def_systemd_unit_dir" = "x"; then
+   AC_MSG_ERROR([systemd support requested but pkg-config unable 
to query systemd package])
+   with_systemd_unit_dir=no
+   else
+   with_systemd_unit_dir="$def_systemd_unit_dir"
+   fi
+fi
+
+AS_IF([test "x$with_systemd_unit_dir" != "xno"],
+   [AC_SUBST([systemd_unitdir], [$with_systemd_unit_dir])])
+AM_CONDITIONAL([ENABLE_SYSTEMD_UNIT_DIR], [test "x$with_systemd_unit_dir" != 
"xno"])
+
 my_CFLAGS="\
 -Wall \
 -Wchar-subscripts \
@@ -184,6 +205,7 @@ AC_MSG_RESULT([
 sysconfdir: ${sysconfdir}
 libdir: ${libdir}
 includedir: ${includedir}
+   systemd-unit-dir:   ${systemd_unitdir}
 
 compiler:   ${CC}
 cflags: ${CFLAGS}
diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index ae3d894..9d008d5 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -47,3 +47,7 @@ monitor_config_file = monitor.conf
 monitor_configdir = /etc/ndctl/
 monitor_config_DATA = $(monitor_config_file)
 EXTRA_DIST += $(monitor_config_file)
+
+if ENABLE_SYSTEMD_UNIT_DIR
+systemd_unit_DATA = ndctl-monitor.service
+endif
diff --git a/ndctl/ndctl-monitor.service b/ndctl/ndctl-monitor.service
new file mode 100644
index 000..44f9326
--- /dev/null
+++ b/ndctl/ndctl-monitor.service
@@ -0,0 +1,7 @@
+[Unit]
+Description=Ndctl Monitor Daemon
+
+[Service]
+Type=forking
+ExecStart=/usr/bin/ndctl monitor --daemon
+ExecStop=/bin/kill ${MAINPID}
-- 
2.18.0


___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH v10 2/4] ndctl, monitor: add main ndctl monitor configuration file

2018-07-09 Thread QI Fuli
This patch adds the main configuration file(/etc/ndctl/monitor.conf)
of ndctl monitor. It contains the configuration directives that give
ndctl monitor instructions. Users can change the configuration by
editing this file or using [--config-file] option to override this
file. The changed value will work after restart ndctl monitor service.

Signed-off-by: QI Fuli 
---
 ndctl/Makefile.am  |   5 ++
 ndctl/monitor.c| 115 +
 ndctl/monitor.conf |  41 
 3 files changed, 161 insertions(+)
 create mode 100644 ndctl/monitor.conf

diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index 7dbf223..ae3d894 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -42,3 +42,8 @@ ndctl_SOURCES += ../test/libndctl.c \
 ../test/multi-pmem.c \
 ../test/core.c
 endif
+
+monitor_config_file = monitor.conf
+monitor_configdir = /etc/ndctl/
+monitor_config_DATA = $(monitor_config_file)
+EXTRA_DIST += $(monitor_config_file)
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
index 700bd22..83f22af 100644
--- a/ndctl/monitor.c
+++ b/ndctl/monitor.c
@@ -19,6 +19,7 @@
 
 static struct monitor {
const char *logfile;
+   const char *config_file;
const char *dimm_event;
bool daemon;
unsigned int event_flags;
@@ -429,6 +430,109 @@ dimm_event_all:
return 0;
 }
 
+static void parse_config(const char **arg, char *key, char *val, char *ident)
+{
+   struct strbuf value = STRBUF_INIT;
+   size_t arg_len = *arg ? strlen(*arg) : 0;
+
+   if (!ident || !key || (strcmp(ident, key) != 0))
+   return;
+
+   if (arg_len) {
+   strbuf_add(, *arg, arg_len);
+   strbuf_addstr(, " ");
+   }
+   strbuf_addstr(, val);
+   *arg = strbuf_detach(, NULL);
+}
+
+static int read_config_file(struct ndctl_ctx *ctx, struct monitor *_monitor,
+   struct util_filter_params *_param)
+{
+   FILE *f;
+   int line = 0;
+   size_t len = 0;
+   char *buf, *value, *config_file;
+   const char *def_config_file = "/etc/ndctl/monitor.conf";
+
+   if (_monitor->config_file)
+   config_file = strdup(_monitor->config_file);
+   else
+   config_file = strdup(def_config_file);
+   if (!config_file) {
+   fail("strdup default config file failed\n");
+   goto out;
+   }
+
+   buf = malloc(BUF_SIZE);
+   if (!buf) {
+   fail("malloc read config-file buf error\n");
+   goto out;
+   }
+
+   f = fopen(config_file, "r");
+   if (!f) {
+   fail("config-file: %s cannot be opened\n", config_file);
+   goto out;
+   }
+
+   while (fgets(buf, BUF_SIZE, f)) {
+   value = NULL;
+   line++;
+
+   while (isspace(*buf))
+   buf++;
+
+   if (*buf == '#' || *buf == '\0')
+   continue;
+
+   value = strchr(buf, '=');
+   if (!value) {
+   fail("config-file syntax error, skip line[%i]\n", line);
+   continue;
+   }
+
+   value[0] = '\0';
+   value++;
+
+   while (isspace(value[0]))
+   value++;
+
+   len = strlen(buf);
+   if (len == 0)
+   continue;
+   while (isspace(buf[len-1]))
+   len--;
+   buf[len] = '\0';
+
+   len = strlen(value);
+   if (len == 0)
+   continue;
+   while (isspace(value[len-1]))
+   len--;
+   value[len] = '\0';
+
+   if (len == 0)
+   continue;
+
+   parse_config(&_param->bus, "bus", value, buf);
+   parse_config(&_param->dimm, "dimm", value, buf);
+   parse_config(&_param->region, "region", value, buf);
+   parse_config(&_param->namespace, "namespace", value, buf);
+   parse_config(&_monitor->dimm_event, "dimm-event", value, buf);
+
+   if (!_monitor->logfile)
+   parse_config(&_monitor->logfile, "logfile", value, buf);
+   }
+   fclose(f);
+   free(config_file);
+   return 0;
+out:
+   if (config_file)
+   free(config_file);
+   return 1;
+}
+
 int cmd_monitor(int argc, const char **argv, void *ctx)
 {
const struct option options[] = {
@@ -441,6 +545,8 @@ int cmd_monitor(int argc, const char **argv, void *ctx)
"namespace-id", "filter by namespace id"),
OPT_FILENAME('l', "logfile", , "file | syslog",
"where to output the monitor's notification"),
+   OPT_FILENAME('c', "config-file", _file,
+   "config-file", "override the default 

[ndctl PATCH v10 4/4] ndctl, documentation: add manpage for monitor

2018-07-09 Thread QI Fuli
This patch is used to add manpage for ndctl monitor command.

Signed-off-by: QI Fuli 
---
 Documentation/ndctl/Makefile.am   |  3 +-
 Documentation/ndctl/ndctl-monitor.txt | 96 +++
 2 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ndctl/ndctl-monitor.txt

diff --git a/Documentation/ndctl/Makefile.am b/Documentation/ndctl/Makefile.am
index 4fd9636..a30b139 100644
--- a/Documentation/ndctl/Makefile.am
+++ b/Documentation/ndctl/Makefile.am
@@ -46,7 +46,8 @@ man1_MANS = \
ndctl-inject-error.1 \
ndctl-inject-smart.1 \
ndctl-update-firmware.1 \
-   ndctl-list.1
+   ndctl-list.1 \
+   ndctl-monitor.1
 
 CLEANFILES = $(man1_MANS)
 
diff --git a/Documentation/ndctl/ndctl-monitor.txt 
b/Documentation/ndctl/ndctl-monitor.txt
new file mode 100644
index 000..037fdc7
--- /dev/null
+++ b/Documentation/ndctl/ndctl-monitor.txt
@@ -0,0 +1,96 @@
+ndctl-monitor(1)
+
+
+NAME
+
+ndctl-monitor - Monitor the smart events of nvdimm objects
+
+SYNOPSIS
+
+[verse]
+'ndctl monitor' []
+
+DESCRIPTION
+---
+Ndctl monitor is used for monitoring the smart events of nvdimm
+objects and dumping the json format notifications to syslog or
+a logfile.
+
+The objects to monitor and smart evnets to notify can be selected
+by setting options and/or the default configuration file
+(/etc/ndctl/monitor.conf). Both of the values in configuration file
+and in options will work. If conflict, the values in options will
+override the values in configuration file. The changed values in
+configuration file will work after restart ndctl monitor.
+
+EXAMPLES
+
+
+Run a monitor as a daemon to monitor DIMMs on a provider named ndbus1
+[verse]
+ndctl monitor -b ndbus1 -f
+
+Run a monitor as a one-shot command and output the notifications to
+/var/log/ndctl.log
+[verse]
+ndctl monitor -l /var/log/ndctl.log
+
+Run a monitor daemon as a system service
+[verse]
+systemctl start ndctl-monitor.service
+
+OPTIONS
+---
+-b::
+--bus=::
+   Enforce that the operation only be carried on devices that are
+   attached to the given bus. Where 'bus' can be a provider name
+   or a bus id number.
+
+-d::
+--dimm=::
+   A 'nmemX' device name, or dimm id number. Select the devices to
+   monitor refrerence the given dimm.
+
+-r::
+--region=::
+   A 'regionX' device name, or a region id number. The keyword 'all'
+   can be specified to carry out the operation on every region in
+   the system, optionally filtered by bus id (see --bus= option).
+
+-n::
+--namespace=::
+   A 'namespaceX.Y' device name, or namespace region plus id tuple
+   'X.Y'.
+
+-l ::
+--logfile=::
+   Output notifications to  or syslog.
+
+-f::
+--daemon::
+   Run a monitor as a daemon.
+
+-D::
+--dimm-event=::
+   A name of smart events come NVDIMM DIMMs.
+   - "dimm-spares-remaining": Spare Blocks Remaining value has gone
+  below the pre-programmed threshold limit
+   - "dimm-media-temperature": NVDIMM Media temperature value has
+  gone above the pre-programmed threshold limit
+   - "dimm-controller-temperature": NVDIMM Controller temperature
+  value has gone above the pre-programmed threshold limit
+   - "dimm-health-state": NVDIMM Normal Health Status has changed
+   - "dimm-unclean-shutdown": NVDIMM Last Shutdown Status was a dirty
+  shutdown.
+
+COPYRIGHT
+-
+Copyright (c) 2018, FUJITSU LIMITED. License GPLv2: GNU GPL version 2
+. This is free software: you are
+free to change and redistribute it. There is NO WARRANTY, to the
+extent permitted by law.
+
+SEE ALSO
+
+linkndctl:ndctl-list[1]
-- 
2.18.0


___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH v10 0/4] ndctl, monitor: add ndctl monitor daemon

2018-07-09 Thread QI Fuli
This is the v10 patch for ndctl monitor, a tiny daemon to monitor
the smart events of nvdimm DIMMs. Since NVDIMM does not have a
feature like mirroring, if it breaks down, the data will be
impossible to restore. Ndctl monitor daemon will catch the smart
events notify from firmware and outputs notification to logfile,
therefore users can replace NVDIMM before it is completely broken.

Signed-off-by: QI Fuli 
---
Change log since v9:
 - Replacing ndctl_cmd_smart_get_event_flags() with
   ndctl_dimm_get_event_flags()
 - Adding ndctl_dimm_get_health() api
 - Adding ndctl_dimm_get_flags() api
 - Adding ndctl_dimm_is_flag_supported api
 - Adding manpage

Change log since v8:
 - Adding ndctl_cmd_smart_get_event_flags() api
 - Adding monitor_filter_arg to the union in util_filter_ctx
 - Removing is_dir()
 - Replacing malloc + vsprintf with vasprintf() in log_file() and log_syslog()
 - Adding parse_monitor_event()
 - Refactoring util_dimm_event_filter()
 - Adding event_flags to monitor
 - Refactoring dimm_event_to_json()
 - Adding check_dimm_supported_threshold_alarms()
 - Fixing fail token

Change log since v7:
 - Replacing logreport() with log_file() and log_syslog()
 - Refactoring read_config_file()
 - Replacing set_confile() with parse_config()
 - Fixing the ndctl/ndct.conf file

Change log since v6:
 - Changing License to GPL-2.0
 - Adding event object to output notification
 - Adding [--dimm-event] option to filter notification by event type
 - Rewriting read_config_file()
 - Replacing monitor_dimm_event() with monitor_event()
 - Renaming some variables

Change log since v5:
 - Fixing systemd unit file cannot be installed bug
 - Adding license to ./util/abspath.c

Change log since v4:
 - Adding OPTION_FILENAME to make sure filename is correct
 - Adding configuration file
 - Adding [--config-file] option to override the default configuration
 - Making some options support multiple space-seperated arguments
 - Making systemctl enable ndctl-monitor.service command work
 - Making systemctl restart ndctl-monitor.service command work
 - Making the directory of systemd unit file to be configurable
 - Changing log_file() and log_syslog() to logreport()
 - Changing date format in notification to nanoseconds since epoch
 - Changing select() to epoll()
 - Adding filter_bus() and filter_region()

Change log since v3:
 - Removing create-monitor, show-monitor, list-monitor, destroy-monitor
 - Adding [--daemon] option to run ndctl monitor as a daemon 
 - Using systemd to manage ndctl monitor daemon
 - Replacing filter_monitor_dimm() with filter_dimm()

Change log since v2:
 - Changing the interface of daemon to the ndctl command line
 - Changing the name of daemon form "nvdimmd" to "monitor"
 - Removing the config file, unit_file, nvdimmd dir
 - Removing nvdimmd_test program
 - Adding ndctl/monitor.c

Change log since v1:
 - Adding a config file(/etc/nvdimmd/nvdimmd.conf)
 - Using struct log_ctx instead of syslog()
- Using log_syslog() to save the notify messages to syslog
- Using log_file() to save the notify messages to special file
 - Adding LOG_NOTICE level to log_priority
 - Using automake instead of Makefile
 - Adding a new util file(nvdimmd/util.c) including helper functions
   needed for nvdimm daemon
 - Adding nvdimmd_test program

QI Fuli (4):
  ndctl, monitor: add a new command - monitor
  ndctl, monitor: add main ndctl monitor configuration file
  ndctl, monitor: add the unit file of systemd for ndctl-monitor service
  ndctl, documentation: add manpage for monitor

 Documentation/ndctl/Makefile.am   |   3 +-
 Documentation/ndctl/ndctl-monitor.txt |  96 
 autogen.sh|   3 +-
 builtin.h |   1 +
 configure.ac  |  22 +
 ndctl/Makefile.am |  12 +-
 ndctl/lib/libndctl.c  |  82 
 ndctl/lib/libndctl.sym|   4 +
 ndctl/libndctl.h  |  10 +
 ndctl/monitor.c   | 623 ++
 ndctl/monitor.conf|  41 ++
 ndctl/ndctl-monitor.service   |   7 +
 ndctl/ndctl.c |   1 +
 util/filter.h |   9 +
 14 files changed, 911 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/ndctl/ndctl-monitor.txt
 create mode 100644 ndctl/monitor.c
 create mode 100644 ndctl/monitor.conf
 create mode 100644 ndctl/ndctl-monitor.service

-- 
2.18.0


___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 00/13] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE

2018-07-09 Thread Jan Kara
On Wed 04-07-18 23:49:02, Dan Williams wrote:
> In order to keep pfn_to_page() a simple offset calculation the 'struct
> page' memmap needs to be mapped and initialized in advance of any usage
> of a page. This poses a problem for large memory systems as it delays
> full availability of memory resources for 10s to 100s of seconds.
> 
> For typical 'System RAM' the problem is mitigated by the fact that large
> memory allocations tend to happen after the kernel has fully initialized
> and userspace services / applications are launched. A small amount, 2GB
> of memory, is initialized up front. The remainder is initialized in the
> background and freed to the page allocator over time.
> 
> Unfortunately, that scheme is not directly reusable for persistent
> memory and dax because userspace has visibility to the entire resource
> pool and can choose to access any offset directly at its choosing. In
> other words there is no allocator indirection where the kernel can
> satisfy requests with arbitrary pages as they become initialized.
> 
> That said, we can approximate the optimization by performing the
> initialization in the background, allow the kernel to fully boot the
> platform, start up pmem block devices, mount filesystems in dax mode,
> and only incur the delay at the first userspace dax fault.
> 
> With this change an 8 socket system was observed to initialize pmem
> namespaces in ~4 seconds whereas it was previously taking ~4 minutes.
> 
> These patches apply on top of the HMM + devm_memremap_pages() reworks
> [1]. Andrew, once the reviews come back, please consider this series for
> -mm as well.
> 
> [1]: https://lkml.org/lkml/2018/6/19/108

One question: Why not (in addition to background initialization) have
->direct_access() initialize a block of struct pages around the pfn it
needs if it finds it's not initialized yet? That would make devices usable
immediately without waiting for init to complete...

Honza
> 
> ---
> 
> Dan Williams (9):
>   mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
>   mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
>   mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
>   mm: Multithread ZONE_DEVICE initialization
>   mm: Allow an external agent to wait for memmap initialization
>   filesystem-dax: Make mount time pfn validation a debug check
>   libnvdimm, pmem: Initialize the memmap in the background
>   device-dax: Initialize the memmap in the background
>   libnvdimm, namespace: Publish page structure init state / control
> 
> Huaisheng Ye (4):
>   nvdimm/pmem: check the validity of the pointer pfn
>   nvdimm/pmem-dax: check the validity of the pointer pfn
>   s390/block/dcssblk: check the validity of the pointer pfn
>   fs/dax: Assign NULL to pfn of dax_direct_access if useless
> 
> 
>  arch/ia64/mm/init.c |5 +
>  arch/powerpc/mm/mem.c   |5 +
>  arch/s390/mm/init.c |8 +
>  arch/sh/mm/init.c   |5 +
>  arch/x86/mm/init_32.c   |8 +
>  arch/x86/mm/init_64.c   |   27 +++--
>  drivers/dax/Kconfig |   10 ++
>  drivers/dax/dax-private.h   |2 
>  drivers/dax/device-dax.h|2 
>  drivers/dax/device.c|   16 +++
>  drivers/dax/pmem.c  |5 +
>  drivers/dax/super.c |   64 +++-
>  drivers/nvdimm/nd.h |2 
>  drivers/nvdimm/pfn_devs.c   |   54 --
>  drivers/nvdimm/pmem.c   |   17 ++-
>  drivers/nvdimm/pmem.h   |1 
>  drivers/s390/block/dcssblk.c|5 +
>  fs/dax.c|   10 +-
>  include/linux/memmap_async.h|   55 ++
>  include/linux/memory_hotplug.h  |   18 ++-
>  include/linux/memremap.h|   31 ++
>  include/linux/mm.h  |8 +
>  kernel/memremap.c   |   85 ---
>  mm/memory_hotplug.c |   73 ++---
>  mm/page_alloc.c |  215 
> +--
>  mm/sparse-vmemmap.c |   56 --
>  tools/testing/nvdimm/pmem-dax.c |   11 ++
>  27 files changed, 610 insertions(+), 188 deletions(-)
>  create mode 100644 include/linux/memmap_async.h
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 3/3] kvm: add a function to check if page is from NVDIMM pmem.

2018-07-09 Thread Jan Kara
On Thu 05-07-18 21:19:30, Zhang,Yi wrote:
> 
> 
> On 2018年07月04日 23:25, Paolo Bonzini wrote:
> > On 04/07/2018 17:30, Zhang Yi wrote:
> >> For device specific memory space, when we move these area of pfn to
> >> memory zone, we will set the page reserved flag at that time, some of
> >> these reserved for device mmio, and some of these are not, such as
> >> NVDIMM pmem.
> >>
> >> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> >> backend, since these pages are reserved. the check of
> >> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> >> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> >> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> >> as normal pages.
> >>
> >> Without this patch, Many operations will be missed due to this
> >> mistreatment to pmem pages. For example, a page may not have chance to
> >> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> >> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
> >>
> >> Signed-off-by: Zhang Yi 
> >> Signed-off-by: Zhang Yu 
> >> ---
> >>  virt/kvm/kvm_main.c | 17 +++--
> >>  1 file changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index afb2e6e..1365d18 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -140,10 +140,23 @@ __weak void 
> >> kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> >>  {
> >>  }
> >>  
> >> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> >> +{
> >> +  struct page *page = pfn_to_page(pfn);
> >> +
> >> +  return is_zone_device_page(page) &&
> >> +  ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> >> +   (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> >> +}
> > If the mm people agree, I'd prefer something that takes a struct page *
> > and is exported by include/linux/mm.h.  Then KVM can just do something like
> >
> > struct page *page;
> > if (!pfn_valid(pfn))
> > return true;
> >
> > page = pfn_to_page(pfn);
> > return PageReserved(page) && !is_dax_page(page);
> >
> > Thanks,
> >
> > Paolo
> Yeah, that could be much better. Thanks for your comments Paolo.
> 
> Hi Kara, Do u have Any opinions/ideas to add such definition in mm?

What Paolo suggests sounds good to me.

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings

2018-07-09 Thread Jan Kara
On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > Follow the lead of xfs_break_dax_layouts() and add synchronization 
> > > > > > between
> > > > > > operations in ext4 which remove blocks from an inode (hole punch, 
> > > > > > truncate
> > > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > > 
> > > > > > Signed-off-by: Ross Zwisler 
> > > > > > Reviewed-by: Jan Kara 
> > > > > > Reviewed-by: Lukas Czerner 
> > > > > > ---
> > > > > > 
> > > > > > Changes since v2:
> > > > > >  * Added a comment to ext4_insert_range() explaining why we don't 
> > > > > > call
> > > > > >ext4_break_layouts(). (Jan)
> > > > > 
> > > > > Which I think is wrong and will cause data corruption.
> > > > > 
> > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, 
> > > > > > loff_t offset, loff_t len)
> > > > > > LLONG_MAX);
> > > > > > if (ret)
> > > > > > goto out_mmap;
> > > > > > +   /*
> > > > > > +* We don't need to call ext4_break_layouts() because we aren't
> > > > > > +* removing any blocks from the inode.  We are just changing 
> > > > > > their
> > > > > > +* offset by inserting a hole.
> > > > > > +*/
> > 
> > Does calling ext4_break_layouts from insert range not work?
> > 
> > It's my understanding that file leases work are a mechanism for the
> > filesystem to delegate some of its authority over physical space
> > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > directly into userspace and exporting space on shared storage over
> > pNFS.  Some day we might use the same mechanism for the similar things
> > that RDMA does, or the swapfile code since that's essentially how it
> > works today.
> > 
> > The other part of these file leases is that the filesystem revokes them
> > any time it wants to perform a mapping operation on a file.  This breaks
> > my mental model of how leases work, and if you commit to this for ext4
> > then I'll have to remember that leases are different between xfs and
> > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > be the implementation detail that "DAX won't care", then someone else
> > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > back into ext4 or else kaboom.
> > 
> > Granted, Dave said all these things already, but I actually feel
> > strongly enough to reiterate.
> 
> Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() to
> keep the lease mechanism consistent between ext4 and XFS, or would you prefer
> the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?

Let's just call it from ext4_insert_range(). I think the simple semantics
Dave and Darrick defend is more maintainable and insert range isn't really
performance critical operation.

The question remains whether equivalent of BREAK_UNMAP is really required
also for allocation of new blocks using fallocate. Because that doesn't
really seem fundamentally different from normal write which uses
BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
so bothering with GUP synchronization when not needed could hurt there.
Dave, Darrick?

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: [ndctl PATCH v3] ndctl, test: add a new unit test for monitor

2018-07-09 Thread Qi, Fuli
Hi Dan,
Thanks for your comments.

> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Sunday, July 8, 2018 3:56 AM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [ndctl PATCH v3] ndctl, test: add a new unit test for monitor
> 
> On Fri, Jul 6, 2018 at 10:35 PM, QI Fuli  wrote:
> > Add a new unit test to test the following options of the monitor command.
> >--dimm
> >--bus
> >--region
> >--namespace
> >--logfile
> >--config-file
> >
> > Based-on-patch-by: Yasunori Goto 
> > Signed-off-by: QI Fuli 
> > ---
> > v2 -> v3:
> >  - Add filter_obj instead of hard-coded values
> > v1 -> v2:
> >  - Add init()
> >  - Add get_filter_dimm() to get the filter dimms by ndctl list command
> >instead of hard-cording
> >  - Add sleep to call_notify()
> >
> >  test/Makefile.am |   3 +-
> >  test/monitor.sh  | 134
> > +++
> >  2 files changed, 136 insertions(+), 1 deletion(-)  create mode 100755
> > test/monitor.sh
> 
> This test is still failing for me, here is the tail of the 
> test/test-suite.log output:
> 
> + sync
> + sleep 3
> + check_result
> ++ cat /tmp/tmp.dT3TJ4l5IE
> + jlog='nmem0: no smart support
> 

My idea of [--dimm option] unit test is try to get the first dimm on bus 
nfit_test.0, then let
monitor to run [-d] option with it, the same as:
ndctl monitor -b nfit_test.0 -d nmem0
but in your environment, since the "nmem0" didn't support smart, then the 
monitor outputted the following error message and stopped.

> no dimms to monitor'

> ++ jq .dimm.dev
> ++ sort
> ++ uniq
> ++ sed -e ':loop; N; $!b loop; s/\n/:/g'
> ++ sed 's/\"//g'
> parse error: Invalid literal at line 1, column 6
> + notify_dimms=
> + [[ '' == '' ]]
> + stop_monitor
> + kill 39414
> ./monitor.sh: line 48: kill: (39414) - No such process
> ++ err 48
> +++ basename ./monitor.sh
> ++ echo test/monitor.sh: failed at line 48
> test/monitor.sh: failed at line 48
> ++ '[' -n '' ']'
> ++ exit 1
> FAIL monitor.sh (exit status: 1)
> 
> I notice errors around get_filter_dimm(). Why is that function needed, it 
> seems
> redundant now that $filter_obj is being used?
> 

I am sorry that the "filter_dimm" is not a suitable name, maybe changing it to 
"monitor_dimm" will be better to understand.
This function is used to get dimms to be monitored from "ndctl list -D -b 
nfit_test.0 $1".
After "./smart-notify nfit_test.0", get "notify dimms" from output 
notifications.
Then compare the "monitor dimms" with the "notify dimms", if same that means 
the unit test option works well.
I will add the a filter to make sure that the "monitor dimms" support smart 
event in next version.

> Hmm, if I just delete get_filter_dimm() I get this;
> 
> ++ jq .dimm.dev
> ++ sort
> ++ uniq
> ++ sed -e ':loop; N; $!b loop; s/\n/:/g'
> ++ sed 's/\"//g'
> + notify_dimms=nmem3
> + [[ '' == nmem3 ]]
> ++ err 34
> +++ basename ./monitor.sh
> ++ echo test/monitor.sh: failed at line 34
> test/monitor.sh: failed at line 34
> ++ '[' -n '' ']'
> ++ exit 1
> FAIL monitor.sh (exit status: 1)
> 
> I assume this test is passing for you? Can you try running it inside a 
> virtual machine
> that has a virtual NVDIMM so you can debug the collisions between the 
> nfit_test.0
> bus objects and the ACPI.NFIT ones?
> 

I passed this unit test on a virtual machine, which has a virtual NVDIMM.
Here is the buses and dimms list of my test environment.

$ sudo ndctl list -BD
[
  {
"provider":"nfit_test.0",
"dev":"ndbus1",
"scrub_state":"idle",
"dimms":[
  {
"dev":"nmem1",
"id":"cdab-0a-07e0-feff",
"handle":1,
"phys_id":1
  },
  {
"dev":"nmem3",
"id":"cdab-0a-07e0-fefe",
"handle":257,
"phys_id":3
  },
  {
"dev":"nmem0",
"id":"cdab-0a-07e0-",
"handle":0,
"phys_id":0
  },
  {
"dev":"nmem2",
"id":"cdab-0a-07e0-fffe",
"handle":256,
"phys_id":2
  }
]
  },
  {
"provider":"e820",
"dev":"ndbus0"
  },
  {
"provider":"nfit_test.1",
"dev":"ndbus2",
"scrub_state":"idle",
"dimms":[
  {
"dev":"nmem5",
"id":"cdab-0a-07e0-fefffeff",
"handle":65537,
"phys_id":0,
"flag_failed_map":true
  },
  {
"dev":"nmem4",
"id":"cdab-0a-07e0-feff",
"handle":65536,
"phys_id":0,
"flag_failed_save":true,
"flag_failed_arm":true,
"flag_failed_restore":true,
"flag_failed_flush":true,
"flag_smart_event":true
  }
]
  }
]

> Here is my test environment where this unit test is failing:
> 
> https://gist.github.com/djbw/144dc5ddaf5e935ad58bd66a702a5ea8
> 
> >
> > diff --git a/test/Makefile.am b/test/Makefile.am index
> > cd451e9..8c76462 100644
> > --- a/test/Makefile.am
> > +++ b/test/Makefile.am
> > @@ -21,7 +21,8 @@ TESTS =\
> > btt-pad-compat.sh \
> >   

Re: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings

2018-07-09 Thread Lukas Czerner
On Fri, Jul 06, 2018 at 09:29:34AM +1000, Dave Chinner wrote:
> On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> > On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong  
> > wrote:
> > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > >> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization 
> > >> > > > between
> > >> > > > operations in ext4 which remove blocks from an inode (hole punch, 
> > >> > > > truncate
> > >> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > >> > > >
> > >> > > > Signed-off-by: Ross Zwisler 
> > >> > > > Reviewed-by: Jan Kara 
> > >> > > > Reviewed-by: Lukas Czerner 
> > >> > > > ---
> > >> > > >
> > >> > > > Changes since v2:
> > >> > > >  * Added a comment to ext4_insert_range() explaining why we don't 
> > >> > > > call
> > >> > > >ext4_break_layouts(). (Jan)
> > >> > >
> > >> > > Which I think is wrong and will cause data corruption.
> > >> > >
> > >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, 
> > >> > > > loff_t offset, loff_t len)
> > >> > > > LLONG_MAX);
> > >> > > > if (ret)
> > >> > > > goto out_mmap;
> > >> > > > +   /*
> > >> > > > +* We don't need to call ext4_break_layouts() because we 
> > >> > > > aren't
> > >> > > > +* removing any blocks from the inode.  We are just 
> > >> > > > changing their
> > >> > > > +* offset by inserting a hole.
> > >> > > > +*/
> > >
> > > Does calling ext4_break_layouts from insert range not work?
> > >
> > > It's my understanding that file leases work are a mechanism for the
> > > filesystem to delegate some of its authority over physical space
> > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > directly into userspace and exporting space on shared storage over
> > > pNFS.  Some day we might use the same mechanism for the similar things
> > > that RDMA does, or the swapfile code since that's essentially how it
> > > works today.
> > >
> > > The other part of these file leases is that the filesystem revokes them
> > > any time it wants to perform a mapping operation on a file.  This breaks
> > > my mental model of how leases work, and if you commit to this for ext4
> > > then I'll have to remember that leases are different between xfs and
> > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > be the implementation detail that "DAX won't care", then someone else
> > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > back into ext4 or else kaboom.
> > >
> > > Granted, Dave said all these things already, but I actually feel
> > > strongly enough to reiterate.
> > 
> > This patch kit is only for the DAX fix, this isn't full layout lease
> > support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> > So ext4 is achieving feature parity for BREAK_UNMAP, just not
> > BREAK_WRITE, yet.
> 
> BREAK_UNMAP is issued unconditionally by XFS for all fallocate
> operations. There is no special except for extent shifting (up or
> down) in XFS as this patch set is making for ext4.  IOWs, this
> patchset does not implement BREAK_UNMAP with the same semantics as
> XFS.

If anything this is very usefull discussion ( at least for me ) and what
I do take away from it is that there is no documentation, nor
specification of the leases nor BREAK_UNMAP nor BREAK_WRITE.

grep -iR -e break_layout -e BREAK_UNMAP -e BREAK_WRITE Documentation/*

Maybe someone with a good understanding of how this stuff is supposed to
be done could write it down so filesystem devs can make it behave the
same.

Thanks!
-Lukas


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor

2018-07-09 Thread Qi, Fuli
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Monday, July 9, 2018 3:27 PM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
> 
> On Sun, Jul 8, 2018 at 11:22 PM, Qi, Fuli  wrote:
> >> -Original Message-
> >> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> >> Sent: Monday, July 9, 2018 3:04 PM
> >> To: Qi, Fuli/斉 福利 
> >> Cc: linux-nvdimm 
> >> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
> >>
> >> On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli  wrote:
> >> >> -Original Message-
> >> >> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> >> >> Sent: Sunday, July 8, 2018 5:06 AM
> >> >> To: Qi, Fuli/斉 福利 
> >> >> Cc: linux-nvdimm 
> >> >> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
> >> >>
> >> >> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli  wrote:
> >> >> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
> >> >> > When a smart event fires, monitor will output the notifications
> >> >> > which include dimm health status and evnet informations to
> >> >> > syslog or a logfile by setting [--logfile] option. The
> >> >> > notifications follow json format and can be consumed by log 
> >> >> > collectors like
> Fluentd.
> >> >> >
> >> >> > The objects to monitor can be selected by setting [--dimm]
> >> >> > [--region] [--namespace] [--bus] options and the event type can
> >> >> > be filtered by setting [--dimm-event] option. These options
> >> >> > support multiple space-separated arguments.
> >> >> >
> >> >> > Ndctl monitor can be forked as a daemon by using [--daemon]
> >> >> > option, such as:
> >> >> ># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
> >> >> >
> >> >> > Signed-off-by: QI Fuli 
> >> >> > ---
> >> >> >  builtin.h  |   1 +
> >> >> >  ndctl/Makefile.am  |   3 +-
> >> >> >  ndctl/lib/libndctl.sym |   1 +
> >> >> >  ndctl/lib/smart.c  |  17 ++
> >> >> >  ndctl/libndctl.h   |   6 +
> >> >> >  ndctl/monitor.c| 531 
> >> >> > +
> >> >> >  ndctl/ndctl.c  |   1 +
> >> >> >  util/filter.h  |   9 +
> >> >> >  8 files changed, 568 insertions(+), 1 deletion(-)  create mode
> >> >> > 100644 ndctl/monitor.c
> >> >> >
> >> >> > diff --git a/builtin.h b/builtin.h index d3cc723..675a6ce 100644
> >> >> > --- a/builtin.h
> >> >> > +++ b/builtin.h
> >> >> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char
> >> >> > **argv, void *ctx);  int cmd_wait_scrub(int argc, const char
> >> >> > **argv, void *ctx);  int cmd_start_scrub(int argc, const char
> >> >> > **argv, void *ctx); int cmd_list(int argc, const char **argv,
> >> >> > void *ctx);
> >> >> > +int cmd_monitor(int argc, const char **argv, void *ctx);
> >> >> >  #ifdef ENABLE_TEST
> >> >> >  int cmd_test(int argc, const char **argv, void *ctx);  #endif
> >> >> > diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
> >> >> > d22a379..7dbf223
> >> >> > 100644
> >> >> > --- a/ndctl/Makefile.am
> >> >> > +++ b/ndctl/Makefile.am
> >> >> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
> >> >> > util/json-smart.c \
> >> >> > util/json-firmware.c \
> >> >> > inject-error.c \
> >> >> > -   inject-smart.c
> >> >> > +   inject-smart.c \
> >> >> > +   monitor.c
> >> >> >
> >> >> >  if ENABLE_DESTRUCTIVE
> >> >> >  ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
> >> >> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
> >> >> > e939993..f64df56 100644
> >> >> > --- a/ndctl/lib/libndctl.sym
> >> >> > +++ b/ndctl/lib/libndctl.sym
> >> >> > @@ -366,4 +366,5 @@ global:
> >> >> > ndctl_namespace_inject_error2;
> >> >> > ndctl_namespace_uninject_error2;
> >> >> > ndctl_cmd_ars_stat_get_flag_overflow;
> >> >> > +   ndctl_cmd_smart_get_event_flags;
> >> >> >  } LIBNDCTL_15;
> >> >> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
> >> >> > 0455252..90a65d0 100644
> >> >> > --- a/ndctl/lib/smart.c
> >> >> > +++ b/ndctl/lib/smart.c
> >> >> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
> >> >> > ndctl_cmd_smart_threshold_get_temperature(
> >> >> >
> >> >> >  smart_cmd_op(smart_threshold_get_supported_alarms, unsigned
> >> >> > int, 0);
> >> >> >
> >> >> > +NDCTL_EXPORT unsigned int
> >> >> > +ndctl_cmd_smart_get_event_flags(struct
> >> >> > +ndctl_cmd *cmd)
> >> >>
> >> >> My expectation for this ndctl_*_get_event_flags() api was to have it be:
> >> >>
> >> >> ndctl_dimm_get_event_flags()
> >> >>
> >> >> ...and with that in place get rid of the 'struct monitor_dimm' object.
> >> >> Push everything to be retrieved through api calls against a 'struct 
> >> >> ndctl_dimm'
> >> object.
> >> >> In other words, the usage of 'struct ndctl_cmd'
> >> >> should be hidden and all monitor operations should be done in
> >> >> terms of 'struct ndctl_dimm' helper 

Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor

2018-07-09 Thread Dan Williams
On Sun, Jul 8, 2018 at 11:22 PM, Qi, Fuli  wrote:
>> -Original Message-
>> From: Dan Williams [mailto:dan.j.willi...@intel.com]
>> Sent: Monday, July 9, 2018 3:04 PM
>> To: Qi, Fuli/斉 福利 
>> Cc: linux-nvdimm 
>> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>>
>> On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli  wrote:
>> >> -Original Message-
>> >> From: Dan Williams [mailto:dan.j.willi...@intel.com]
>> >> Sent: Sunday, July 8, 2018 5:06 AM
>> >> To: Qi, Fuli/斉 福利 
>> >> Cc: linux-nvdimm 
>> >> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>> >>
>> >> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli  wrote:
>> >> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
>> >> > When a smart event fires, monitor will output the notifications
>> >> > which include dimm health status and evnet informations to syslog
>> >> > or a logfile by setting [--logfile] option. The notifications
>> >> > follow json format and can be consumed by log collectors like Fluentd.
>> >> >
>> >> > The objects to monitor can be selected by setting [--dimm]
>> >> > [--region] [--namespace] [--bus] options and the event type can be
>> >> > filtered by setting [--dimm-event] option. These options support
>> >> > multiple space-separated arguments.
>> >> >
>> >> > Ndctl monitor can be forked as a daemon by using [--daemon] option,
>> >> > such as:
>> >> ># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
>> >> >
>> >> > Signed-off-by: QI Fuli 
>> >> > ---
>> >> >  builtin.h  |   1 +
>> >> >  ndctl/Makefile.am  |   3 +-
>> >> >  ndctl/lib/libndctl.sym |   1 +
>> >> >  ndctl/lib/smart.c  |  17 ++
>> >> >  ndctl/libndctl.h   |   6 +
>> >> >  ndctl/monitor.c| 531 +
>> >> >  ndctl/ndctl.c  |   1 +
>> >> >  util/filter.h  |   9 +
>> >> >  8 files changed, 568 insertions(+), 1 deletion(-)  create mode
>> >> > 100644 ndctl/monitor.c
>> >> >
>> >> > diff --git a/builtin.h b/builtin.h
>> >> > index d3cc723..675a6ce 100644
>> >> > --- a/builtin.h
>> >> > +++ b/builtin.h
>> >> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv,
>> >> > void *ctx);  int cmd_wait_scrub(int argc, const char **argv, void
>> >> > *ctx);  int cmd_start_scrub(int argc, const char **argv, void
>> >> > *ctx); int cmd_list(int argc, const char **argv, void *ctx);
>> >> > +int cmd_monitor(int argc, const char **argv, void *ctx);
>> >> >  #ifdef ENABLE_TEST
>> >> >  int cmd_test(int argc, const char **argv, void *ctx);  #endif diff
>> >> > --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
>> >> > d22a379..7dbf223
>> >> > 100644
>> >> > --- a/ndctl/Makefile.am
>> >> > +++ b/ndctl/Makefile.am
>> >> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
>> >> > util/json-smart.c \
>> >> > util/json-firmware.c \
>> >> > inject-error.c \
>> >> > -   inject-smart.c
>> >> > +   inject-smart.c \
>> >> > +   monitor.c
>> >> >
>> >> >  if ENABLE_DESTRUCTIVE
>> >> >  ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
>> >> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
>> >> > e939993..f64df56 100644
>> >> > --- a/ndctl/lib/libndctl.sym
>> >> > +++ b/ndctl/lib/libndctl.sym
>> >> > @@ -366,4 +366,5 @@ global:
>> >> > ndctl_namespace_inject_error2;
>> >> > ndctl_namespace_uninject_error2;
>> >> > ndctl_cmd_ars_stat_get_flag_overflow;
>> >> > +   ndctl_cmd_smart_get_event_flags;
>> >> >  } LIBNDCTL_15;
>> >> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
>> >> > 0455252..90a65d0 100644
>> >> > --- a/ndctl/lib/smart.c
>> >> > +++ b/ndctl/lib/smart.c
>> >> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
>> >> > ndctl_cmd_smart_threshold_get_temperature(
>> >> >
>> >> >  smart_cmd_op(smart_threshold_get_supported_alarms, unsigned int,
>> >> > 0);
>> >> >
>> >> > +NDCTL_EXPORT unsigned int ndctl_cmd_smart_get_event_flags(struct
>> >> > +ndctl_cmd *cmd)
>> >>
>> >> My expectation for this ndctl_*_get_event_flags() api was to have it be:
>> >>
>> >> ndctl_dimm_get_event_flags()
>> >>
>> >> ...and with that in place get rid of the 'struct monitor_dimm' object.
>> >> Push everything to be retrieved through api calls against a 'struct 
>> >> ndctl_dimm'
>> object.
>> >> In other words, the usage of 'struct ndctl_cmd'
>> >> should be hidden and all monitor operations should be done in terms
>> >> of 'struct ndctl_dimm' helper calls.
>> >>
>> > Hi Dan,
>> > Thanks for your comments.
>> >
>> > In the v9 of monitor, I use the 'struct ndctl_cmd' object in the following 
>> > places:
>> > ndctl_cmd_smart_get_flags(struct ndctl_cmd *cmd)
>> > ndctl_cmd_smart_get_health(struct ndctl_cmd *cmd)
>> > ndctl_cmd_smart_get_event_flags(struct ndctl_cmd *cmd) Is it
>> > that you want to hide all of the 'struct ndctl_cmd' objects and add the 
>> > following
>> 'struct ndctl_dimm' helper 

RE: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor

2018-07-09 Thread Qi, Fuli
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Monday, July 9, 2018 3:04 PM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
> 
> On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli  wrote:
> >> -Original Message-
> >> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> >> Sent: Sunday, July 8, 2018 5:06 AM
> >> To: Qi, Fuli/斉 福利 
> >> Cc: linux-nvdimm 
> >> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
> >>
> >> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli  wrote:
> >> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
> >> > When a smart event fires, monitor will output the notifications
> >> > which include dimm health status and evnet informations to syslog
> >> > or a logfile by setting [--logfile] option. The notifications
> >> > follow json format and can be consumed by log collectors like Fluentd.
> >> >
> >> > The objects to monitor can be selected by setting [--dimm]
> >> > [--region] [--namespace] [--bus] options and the event type can be
> >> > filtered by setting [--dimm-event] option. These options support
> >> > multiple space-separated arguments.
> >> >
> >> > Ndctl monitor can be forked as a daemon by using [--daemon] option,
> >> > such as:
> >> ># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
> >> >
> >> > Signed-off-by: QI Fuli 
> >> > ---
> >> >  builtin.h  |   1 +
> >> >  ndctl/Makefile.am  |   3 +-
> >> >  ndctl/lib/libndctl.sym |   1 +
> >> >  ndctl/lib/smart.c  |  17 ++
> >> >  ndctl/libndctl.h   |   6 +
> >> >  ndctl/monitor.c| 531 +
> >> >  ndctl/ndctl.c  |   1 +
> >> >  util/filter.h  |   9 +
> >> >  8 files changed, 568 insertions(+), 1 deletion(-)  create mode
> >> > 100644 ndctl/monitor.c
> >> >
> >> > diff --git a/builtin.h b/builtin.h
> >> > index d3cc723..675a6ce 100644
> >> > --- a/builtin.h
> >> > +++ b/builtin.h
> >> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv,
> >> > void *ctx);  int cmd_wait_scrub(int argc, const char **argv, void
> >> > *ctx);  int cmd_start_scrub(int argc, const char **argv, void
> >> > *ctx); int cmd_list(int argc, const char **argv, void *ctx);
> >> > +int cmd_monitor(int argc, const char **argv, void *ctx);
> >> >  #ifdef ENABLE_TEST
> >> >  int cmd_test(int argc, const char **argv, void *ctx);  #endif diff
> >> > --git a/ndctl/Makefile.am b/ndctl/Makefile.am index
> >> > d22a379..7dbf223
> >> > 100644
> >> > --- a/ndctl/Makefile.am
> >> > +++ b/ndctl/Makefile.am
> >> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
> >> > util/json-smart.c \
> >> > util/json-firmware.c \
> >> > inject-error.c \
> >> > -   inject-smart.c
> >> > +   inject-smart.c \
> >> > +   monitor.c
> >> >
> >> >  if ENABLE_DESTRUCTIVE
> >> >  ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
> >> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
> >> > e939993..f64df56 100644
> >> > --- a/ndctl/lib/libndctl.sym
> >> > +++ b/ndctl/lib/libndctl.sym
> >> > @@ -366,4 +366,5 @@ global:
> >> > ndctl_namespace_inject_error2;
> >> > ndctl_namespace_uninject_error2;
> >> > ndctl_cmd_ars_stat_get_flag_overflow;
> >> > +   ndctl_cmd_smart_get_event_flags;
> >> >  } LIBNDCTL_15;
> >> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
> >> > 0455252..90a65d0 100644
> >> > --- a/ndctl/lib/smart.c
> >> > +++ b/ndctl/lib/smart.c
> >> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
> >> > ndctl_cmd_smart_threshold_get_temperature(
> >> >
> >> >  smart_cmd_op(smart_threshold_get_supported_alarms, unsigned int,
> >> > 0);
> >> >
> >> > +NDCTL_EXPORT unsigned int ndctl_cmd_smart_get_event_flags(struct
> >> > +ndctl_cmd *cmd)
> >>
> >> My expectation for this ndctl_*_get_event_flags() api was to have it be:
> >>
> >> ndctl_dimm_get_event_flags()
> >>
> >> ...and with that in place get rid of the 'struct monitor_dimm' object.
> >> Push everything to be retrieved through api calls against a 'struct 
> >> ndctl_dimm'
> object.
> >> In other words, the usage of 'struct ndctl_cmd'
> >> should be hidden and all monitor operations should be done in terms
> >> of 'struct ndctl_dimm' helper calls.
> >>
> > Hi Dan,
> > Thanks for your comments.
> >
> > In the v9 of monitor, I use the 'struct ndctl_cmd' object in the following 
> > places:
> > ndctl_cmd_smart_get_flags(struct ndctl_cmd *cmd)
> > ndctl_cmd_smart_get_health(struct ndctl_cmd *cmd)
> > ndctl_cmd_smart_get_event_flags(struct ndctl_cmd *cmd) Is it
> > that you want to hide all of the 'struct ndctl_cmd' objects and add the 
> > following
> 'struct ndctl_dimm' helper calls?
> 
> I'm primarily reacting to:
> 
> +struct monitor_dimm {
> +   struct ndctl_dimm *dimm;
> +   int health_eventfd;
> +   unsigned int health;
> + 

Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor

2018-07-09 Thread Dan Williams
On Sun, Jul 8, 2018 at 9:59 PM, Qi, Fuli  wrote:
>> -Original Message-
>> From: Dan Williams [mailto:dan.j.willi...@intel.com]
>> Sent: Sunday, July 8, 2018 5:06 AM
>> To: Qi, Fuli/斉 福利 
>> Cc: linux-nvdimm 
>> Subject: Re: [PATCH v9 1/3] ndctl, monitor: add ndctl monitor
>>
>> On Thu, Jun 28, 2018 at 7:30 PM, QI Fuli  wrote:
>> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
>> > When a smart event fires, monitor will output the notifications which
>> > include dimm health status and evnet informations to syslog or a
>> > logfile by setting [--logfile] option. The notifications follow json
>> > format and can be consumed by log collectors like Fluentd.
>> >
>> > The objects to monitor can be selected by setting [--dimm] [--region]
>> > [--namespace] [--bus] options and the event type can be filtered by
>> > setting [--dimm-event] option. These options support multiple
>> > space-separated arguments.
>> >
>> > Ndctl monitor can be forked as a daemon by using [--daemon] option,
>> > such as:
>> ># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
>> >
>> > Signed-off-by: QI Fuli 
>> > ---
>> >  builtin.h  |   1 +
>> >  ndctl/Makefile.am  |   3 +-
>> >  ndctl/lib/libndctl.sym |   1 +
>> >  ndctl/lib/smart.c  |  17 ++
>> >  ndctl/libndctl.h   |   6 +
>> >  ndctl/monitor.c| 531 +
>> >  ndctl/ndctl.c  |   1 +
>> >  util/filter.h  |   9 +
>> >  8 files changed, 568 insertions(+), 1 deletion(-)  create mode 100644
>> > ndctl/monitor.c
>> >
>> > diff --git a/builtin.h b/builtin.h
>> > index d3cc723..675a6ce 100644
>> > --- a/builtin.h
>> > +++ b/builtin.h
>> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv,
>> > void *ctx);  int cmd_wait_scrub(int argc, const char **argv, void
>> > *ctx);  int cmd_start_scrub(int argc, const char **argv, void *ctx);
>> > int cmd_list(int argc, const char **argv, void *ctx);
>> > +int cmd_monitor(int argc, const char **argv, void *ctx);
>> >  #ifdef ENABLE_TEST
>> >  int cmd_test(int argc, const char **argv, void *ctx);  #endif diff
>> > --git a/ndctl/Makefile.am b/ndctl/Makefile.am index d22a379..7dbf223
>> > 100644
>> > --- a/ndctl/Makefile.am
>> > +++ b/ndctl/Makefile.am
>> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
>> > util/json-smart.c \
>> > util/json-firmware.c \
>> > inject-error.c \
>> > -   inject-smart.c
>> > +   inject-smart.c \
>> > +   monitor.c
>> >
>> >  if ENABLE_DESTRUCTIVE
>> >  ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
>> > a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym index
>> > e939993..f64df56 100644
>> > --- a/ndctl/lib/libndctl.sym
>> > +++ b/ndctl/lib/libndctl.sym
>> > @@ -366,4 +366,5 @@ global:
>> > ndctl_namespace_inject_error2;
>> > ndctl_namespace_uninject_error2;
>> > ndctl_cmd_ars_stat_get_flag_overflow;
>> > +   ndctl_cmd_smart_get_event_flags;
>> >  } LIBNDCTL_15;
>> > diff --git a/ndctl/lib/smart.c b/ndctl/lib/smart.c index
>> > 0455252..90a65d0 100644
>> > --- a/ndctl/lib/smart.c
>> > +++ b/ndctl/lib/smart.c
>> > @@ -101,6 +101,23 @@ NDCTL_EXPORT unsigned int
>> > ndctl_cmd_smart_threshold_get_temperature(
>> >
>> >  smart_cmd_op(smart_threshold_get_supported_alarms, unsigned int, 0);
>> >
>> > +NDCTL_EXPORT unsigned int ndctl_cmd_smart_get_event_flags(struct
>> > +ndctl_cmd *cmd)
>>
>> My expectation for this ndctl_*_get_event_flags() api was to have it be:
>>
>> ndctl_dimm_get_event_flags()
>>
>> ...and with that in place get rid of the 'struct monitor_dimm' object.
>> Push everything to be retrieved through api calls against a 'struct 
>> ndctl_dimm' object.
>> In other words, the usage of 'struct ndctl_cmd'
>> should be hidden and all monitor operations should be done in terms of 
>> 'struct
>> ndctl_dimm' helper calls.
>>
> Hi Dan,
> Thanks for your comments.
>
> In the v9 of monitor, I use the 'struct ndctl_cmd' object in the following 
> places:
> ndctl_cmd_smart_get_flags(struct ndctl_cmd *cmd)
> ndctl_cmd_smart_get_health(struct ndctl_cmd *cmd)
> ndctl_cmd_smart_get_event_flags(struct ndctl_cmd *cmd)
> Is it that you want to hide all of the 'struct ndctl_cmd' objects and add the 
> following 'struct ndctl_dimm' helper calls?

I'm primarily reacting to:

+struct monitor_dimm {
+   struct ndctl_dimm *dimm;
+   int health_eventfd;
+   unsigned int health;
+   unsigned int event_flags;
+   struct list_node list;
+};

Which is effectively duplicating ndctl_dimm internal data.

> ndctl_dimm_get_flags(struct ndctl_dimm *dimm)
> ndctl_dimm_get_health(struct ndctl_dimm *dimm)
> ndctl_dimm_get_event_flags(struct ndctl_dimm *dimm)

I understand why we need ndctl_dimm_get_event_flags() since that tells
you what events have fired. Why do we need the other 2? If the event
has fired then the