Re: [PATCH] mm: reject MAP_SHARED_VALIDATE without new flags

2018-06-27 Thread Eric Sandeen
On 6/27/18 9:37 PM, Linus Torvalds wrote:
> On Wed, Jun 27, 2018 at 7:17 PM Eric Sandeen  wrote:
>>
>> What broke is that mmap(MAP_SHARED|MAP_PRIVATE) now succeeds without error,
>> whereas before it rightly returned -EINVAL.
> 
> You're still confusing *behavior* with breakage.
> 
> Yes. New *behavior* is that MAP_SHARED|MAP_PRIVATE is now a valid
> thing. It means "MAP_SHARED_VALIDATE".
> 
> Behavior changed.  That's normal. Every single time we add a system
> call, behavior changes: a system call that used to return -ENOSYS now
> returns something else.
> 
> That's not breakage, that's just intentional new behavior.

*shrug* semantics aside, the new behavior is out there in a public
API, so I guess there's nothing to do at this point other than
to document the change more clearly.  It's true that my patch could
possibly break existing users.

The man page is clearly wrong at this point, both in terms of the
error code section, and the claim that MAP_SHARED and MAP_PRIVATE
behave as described in POSIX (because POSIX states that these
two flags may not be specified together.)

-Eric
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor

2018-06-27 Thread Qi, Fuli
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Thursday, June 28, 2018 12:53 PM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor
> 
> On Wed, Jun 27, 2018 at 8:02 PM, Qi, Fuli  wrote:
> > Thanks for your comments.
> >
> 
> Thanks for continuing to work through the feedback.
> 
> >> > +static struct monitor_dimm *util_dimm_event_filter(struct monitor_dimm
> *mdimm,
> >> > +   const char *__ident) {
> >> > +   char *ident, *save;
> >> > +   const char *name;
> >> > +   struct monitor_dimm *__mdimm;
> >> > +   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(mdimm->dimm);
> >> > +
> >> > +   __mdimm = calloc(1, sizeof(struct monitor_dimm));
> >> > +   if (!__mdimm) {
> >> > +   fail("\n");
> >> > +   return NULL;
> >> > +   }
> >>
> >> Why do we need a dynamic allocation? Can't this just live on the stack?
> >>
> > Actually, I just use the __dimm to compare current dimm health with the 
> > dimm health
> when the monitor started.
> > When the dimm changed, the __dimm should be initialized. I think a dynamic 
> > allocation
> is smarter than live on the stack.
> 
> Allocating memory should always be avoided when not necessary because it's so 
> error
> prone in C. None of the other util_X_filter() routines need to allocate 
> memory, in
> fact the point is to validate that the passed in object passes the filter, 
> not to
> swap the passed in object with a new one.
> 

OK, I see. 

Thank you very much.
Qi
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor

2018-06-27 Thread Dan Williams
On Wed, Jun 27, 2018 at 8:02 PM, Qi, Fuli  wrote:
> Thanks for your comments.
>

Thanks for continuing to work through the feedback.

>> > +static struct monitor_dimm *util_dimm_event_filter(struct monitor_dimm 
>> > *mdimm,
>> > +   const char *__ident)
>> > +{
>> > +   char *ident, *save;
>> > +   const char *name;
>> > +   struct monitor_dimm *__mdimm;
>> > +   struct ndctl_ctx *ctx = ndctl_dimm_get_ctx(mdimm->dimm);
>> > +
>> > +   __mdimm = calloc(1, sizeof(struct monitor_dimm));
>> > +   if (!__mdimm) {
>> > +   fail("\n");
>> > +   return NULL;
>> > +   }
>>
>> Why do we need a dynamic allocation? Can't this just live on the stack?
>>
> Actually, I just use the __dimm to compare current dimm health with the dimm 
> health when the monitor started.
> When the dimm changed, the __dimm should be initialized. I think a dynamic 
> allocation is smarter than live on the stack.

Allocating memory should always be avoided when not necessary because
it's so error prone in C. None of the other util_X_filter() routines
need to allocate memory, in fact the point is to validate that the
passed in object passes the filter, not to swap the passed in object
with a new one.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


RE: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor

2018-06-27 Thread Qi, Fuli
Thanks for your comments.

> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Thursday, June 28, 2018 12:09 AM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor
> 
> On Tue, Jun 26, 2018 at 10:24 PM, QI Fuli  wrote:
> > Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
> > When a smart event fires, monitor will output the notifications which
> > include dimm health status and evnet informations to syslog or a
> > logfile by setting [--logfile] option. The notifications follow json
> > format and can be consumed by log collectors like Fluentd.
> >
> > The objects to monitor can be selected by setting [--dimm] [--region]
> > [--namespace] [--bus] options and the event type can be filtered by
> > setting [--dimm-event] option. These options support multiple
> > space-separated arguments.
> >
> > Ndctl monitor can be forked as a daemon by using [--daemon] option,
> > such as:
> ># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
> >
> > Signed-off-by: QI Fuli 
> > ---
> >  builtin.h |   1 +
> >  ndctl/Makefile.am |   3 +-
> >  ndctl/monitor.c   | 508 ++
> >  ndctl/ndctl.c |   1 +
> >  4 files changed, 512 insertions(+), 1 deletion(-)
> >  create mode 100644 ndctl/monitor.c
> >
> > diff --git a/builtin.h b/builtin.h
> > index d3cc723..675a6ce 100644
> > --- a/builtin.h
> > +++ b/builtin.h
> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv, void 
> > *ctx);
> >  int cmd_wait_scrub(int argc, const char **argv, void *ctx);
> >  int cmd_start_scrub(int argc, const char **argv, void *ctx);
> >  int cmd_list(int argc, const char **argv, void *ctx);
> > +int cmd_monitor(int argc, const char **argv, void *ctx);
> >  #ifdef ENABLE_TEST
> >  int cmd_test(int argc, const char **argv, void *ctx);
> >  #endif
> > diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
> > index d22a379..7dbf223 100644
> > --- a/ndctl/Makefile.am
> > +++ b/ndctl/Makefile.am
> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
> > util/json-smart.c \
> > util/json-firmware.c \
> > inject-error.c \
> > -   inject-smart.c
> > +   inject-smart.c \
> > +   monitor.c
> >
> >  if ENABLE_DESTRUCTIVE
> >  ndctl_SOURCES += ../test/blk_namespaces.c \
> > diff --git a/ndctl/monitor.c b/ndctl/monitor.c
> > new file mode 100644
> > index 000..d926a81
> > --- /dev/null
> > +++ b/ndctl/monitor.c
> > @@ -0,0 +1,508 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/* Copyright(c) 2018, FUJITSU LIMITED. All rights reserved. */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#define BUF_SIZE 2048
> > +
> > +#define DIMM_SPARES_REMAINING  (1 << 0)
> > +#define DIMM_MEDIA_TEMPERATURE (1 << 1)
> > +#define DIMM_CTRL_TEMPERATURE  (1 << 2)
> > +#define DIMM_HEALTH_STATE  (1 << 3)
> > +#define DIMM_UNCLEAN_SHUTDOWN  (1 << 4)
> > +
> > +static struct monitor {
> > +   const char *logfile;
> > +   const char *dimm_event;
> > +   bool daemon;
> > +} monitor;
> > +
> > +struct monitor_dimm {
> > +   struct ndctl_dimm *dimm;
> > +   int health_eventfd;
> > +   unsigned int health;
> > +   unsigned int event_flags;
> > +   struct list_node list;
> > +};
> > +
> > +struct monitor_filter_arg {
> > +   struct list_head dimms;
> > +   int maxfd_dimm;
> > +   int num_dimm;
> > +   unsigned long flags;
> > +};
> > +
> > +struct util_filter_params param;
> > +
> > +static int did_fail;
> > +
> > +#define fail(fmt, ...) \
> > +do { \
> > +   did_fail = 1; \
> > +   err((struct ndctl_ctx *)ctx, "ndctl-%s:%s:%d: " fmt, \
> > +   VERSION, __func__, __LINE__, ##__VA_ARGS__); \
> > +} while (0)
> 
> It is odd that the monitor command emits errors in this format, but no
> other ndctl uses scheme. Also. any print that includes source code
> line numbers is debug in my opinion. Source code line numbers are
> useful for developers not typical users. Let's make this a debug level
> print and circle back later to see if we need a better common error
> print format across all commands.
> 

Ok, I will change some of them to debug level for developers, and use err() to 
print the error info for users.

> > +
> > +static bool is_dir(char *filepath)
> > +{
> > +   DIR *dir = opendir(filepath);
> > +   if (dir) {
> > +   closedir(dir);
> > +   return true;
> > +   }
> > +   return false;
> > +}
> > +
> > +static void log_syslog(struct ndctl_ctx *ctx, int priority, const char 
> > *file,
> > +   int line, const char *fn, const char *format, va_list args)
> > +{
> > +   char *buf;
> > +
> > +   buf = 

Re: [PATCH] mm: reject MAP_SHARED_VALIDATE without new flags

2018-06-27 Thread Linus Torvalds
On Wed, Jun 27, 2018 at 7:17 PM Eric Sandeen  wrote:
>
> What broke is that mmap(MAP_SHARED|MAP_PRIVATE) now succeeds without error,
> whereas before it rightly returned -EINVAL.

You're still confusing *behavior* with breakage.

Yes. New *behavior* is that MAP_SHARED|MAP_PRIVATE is now a valid
thing. It means "MAP_SHARED_VALIDATE".

Behavior changed.  That's normal. Every single time we add a system
call, behavior changes: a system call that used to return -ENOSYS now
returns something else.

That's not breakage, that's just intentional new behavior.

> What behavior should a user expect from a successful 
> mmap(MAP_SHARED|MAP_PRIVATE)?

MAP_SHARED|MAP_PRIVATE makes no sense and nobody uses it (because it
has always returned an error and never done anything interesting).

Nobody uses it, and it used to return an error is *exactly* why it was
defined to be MAP_SHARED_VALIDATE.

So you should expect MAP_SHARED_VALIDATE behavior - which is
MAP_SHARED together with "validate that all the flags are things that
we support".

Actual BREAKAGE is if some application or user workflow no longer
works. Did LibreOffice stop working? That is breakage.

And by application, I mean exactly that: a real program.  Not some
manual-page, and not some test-program that people don't actually rely
on, and that just reports on some particular behavior.

Because I can write a test program that verifies that system call #335
doesn't exist:

#define _GNU_SOURCE
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
assert(syscall(335, 0) == -1 && errno == ENOSYS);
return 0;
}

and the next system call we add will break that test program on x86-64.

And that's still not a "regression" - it's just a change in behavior.

But if firefox no longer runs, because it depended on that system call
not existing (or it depended on that MAP_SHARED_VALIDATE returning
EINVAL) then it's a regression.

See the difference?

One case is "we added new behavior".

The other case is "we have a regression".

Linus
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] mm: reject MAP_SHARED_VALIDATE without new flags

2018-06-27 Thread Eric Sandeen
On 6/27/18 9:10 PM, Linus Torvalds wrote:
> On Wed, Jun 27, 2018 at 6:45 PM Eric Sandeen  wrote:
>>
>> Thus the invalid flag combination of (MAP_SHARED|MAP_PRIVATE) now
>> passes without error, which is a regression.
> 
> It's not a regression, it's just new behavior.
> 
> "regression" doesn't mean "things changed". It means "something broke".
> 
> What broke?

My commit log perhaps was not clear enough.

What broke is that mmap(MAP_SHARED|MAP_PRIVATE) now succeeds without error,
whereas before it rightly returned -EINVAL.

What behavior should a user expect from a successful 
mmap(MAP_SHARED|MAP_PRIVATE)?

-Eric

> Because if it's some manual page breakage, just fix the manual. That's
> what "new behavior" is all about.
> 
> There is nothing that says that "MAP_SHARED_VALIDATE" can't work with
> just the legacy flags.
> 
> Because I'd be worried about your patch breaking some actual new user
> of MAP_SHARED_VALIDATE.
> 
> Because it's actual *users* of behavior we care about, not some
> test-suite or manual pages.
> 
>   Linus
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] mm: reject MAP_SHARED_VALIDATE without new flags

2018-06-27 Thread Linus Torvalds
On Wed, Jun 27, 2018 at 6:45 PM Eric Sandeen  wrote:
>
> Thus the invalid flag combination of (MAP_SHARED|MAP_PRIVATE) now
> passes without error, which is a regression.

It's not a regression, it's just new behavior.

"regression" doesn't mean "things changed". It means "something broke".

What broke?

Because if it's some manual page breakage, just fix the manual. That's
what "new behavior" is all about.

There is nothing that says that "MAP_SHARED_VALIDATE" can't work with
just the legacy flags.

Because I'd be worried about your patch breaking some actual new user
of MAP_SHARED_VALIDATE.

Because it's actual *users* of behavior we care about, not some
test-suite or manual pages.

  Linus
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH] mm: reject MAP_SHARED_VALIDATE without new flags

2018-06-27 Thread Eric Sandeen
mmap(2) says the syscall will return EINVAL if "flags contained neither
MAP_PRIVATE or MAP_SHARED, or contained both of these values."
   ^
However, commit 
1c972597 ("mm: introduce MAP_SHARED_VALIDATE ...")
introduced a new flag, MAP_SHARED_VALIDATE, with a value of 0x3,
which is indistinguishable from (MAP_SHARED|MAP_PRIVATE).

Thus the invalid flag combination of (MAP_SHARED|MAP_PRIVATE) now
passes without error, which is a regression.

I'm not sure of the best way out of this, other than to change the
API description to say that MAP_SHARED_VALIDATE is only allowed in
combination with "new" flags, and reject it if it's used only with
flags contained in LEGACY_MAP_MASK.

This will require the mmap(2) manpage to enumerate which flags don't
require validation, as well, so the user knows when to use the
VALIDATE flag.

I'm not super happy with this, because it also means that code
which explicitly asks for mmap(MAP_SHARED|MAP_PRIVATE|MAP_SYNC) will
also pass, but I'm not sure there's anything to do about that.

Reported-by: Zhibin Li 
Signed-off-by: Eric Sandeen 
---

diff --git a/mm/mmap.c b/mm/mmap.c
index d1eb87ef4b1a..b1dc84466365 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1440,6 +1440,16 @@ unsigned long do_mmap(struct file *file, unsigned long 
addr,
 
if (!file_mmap_ok(file, inode, pgoff, len))
return -EOVERFLOW;
+   /*
+* MAP_SHARED_VALIDATE is indistinguishable from
+* (MAP_SHARED|MAP_PRIVATE) which must return -EINVAL.
+* If the flags contain MAP_SHARED_VALIDATE and none of the
+* non-legacy flags, the user gets EINVAL.
+*/
+   if (((flags & MAP_SHARED_VALIDATE) == MAP_SHARED_VALIDATE) &&
+   !(flags & ~LEGACY_MAP_MASK)) {
+   return -EINVAL;
+   }
 
flags_mask = LEGACY_MAP_MASK | file->f_op->mmap_supported_flags;
 

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH] ndctl: Add CONTRIBUTING.md

2018-06-27 Thread Yasunori Goto
Hi, Vishal-san,

> We were missing a CONTRIBUTING file to help new contributors follow the
> expected guidelines. Add one that refers to the Linux Kernel for most of
> these things, such as Coding style, Submitting Patches, and the DCO.
> 
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  CONTRIBUTING.md | 31 +++
>  1 file changed, 31 insertions(+)
>  create mode 100644 CONTRIBUTING.md
> 
> diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
> new file mode 100644
> index 000..aa9e78d
> --- /dev/null
> +++ b/CONTRIBUTING.md
> @@ -0,0 +1,31 @@
> +# Contributing to ndctl
> +
> +Thanks for taking the time to contribute to ndctl.
> +
> +The following is a set of guidelines that we adhere to, and request that
> +contributors follow.
> +
> +1. The libnvdimm (kernel subsystem) and ndctl developers primarily use
> +   the [linux-nvdimm](https://lists.01.org/mailman/listinfo/linux-nvdimm)
> +   mailing list for discussions as well as posting patches.
> +1. Github [issues](https://github.com/pmem/ndctl/issues) are an acceptable
> +   way to report a problem, but if you just have a question,
> +   [email](mailto:linux-nvdimm@lists.01.org) the above list.
> +1. We follow the Linux Kernel [Coding Style Guide][cs] as applicable.
> +
> +   [cs]: https://www.kernel.org/doc/html/latest/process/coding-style.html
> +
> +1. We follow the Linux Kernel [Submitting Patches Guide][sp] as applicable.
> +
> +   [sp]: 
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html
> +
> +1. We follow the Linux Kernel [DCO][dco] (Developer Certificate of Origin).
> +   The DCO is an attestation attached to every contribution made by every
> +   developer. In the commit message of the contribution, the developer simply
> +   adds a Signed-off-by statement and thereby agrees to the DCO.
> +
> +   [dco]: https://developercertificate.org/
> +
> +1. Github Pull Requests are acceptable, but won't be merged directly, as
> +   Github doesn't allow for the kernel style flow of patches where a 
> maintainer
> +   also signs off on the patches they apply.

In my impression, newbie may not catch where is the mailing list
for sending patch from this description, and may not understand
what is difference between "[ndctl PATCH]" and
"[PATCH] pmem/libnvdimm/dax or others".

So, I think the followings descriptions are necessary.

- specify that linux-nvdimm@lists.01.org is suitable for sending patch.
- Which is better to use [ndctl PATCH] or [PATCH] : at subject.

Thanks,




___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH] ndctl: Add CONTRIBUTING.md

2018-06-27 Thread Dan Williams
On Wed, Jun 27, 2018 at 4:19 PM, Vishal Verma  wrote:
> We were missing a CONTRIBUTING file to help new contributors follow the
> expected guidelines. Add one that refers to the Linux Kernel for most of
> these things, such as Coding style, Submitting Patches, and the DCO.
>
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 

Looks good to me,

Reviewed-by: Dan Williams 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH] ndctl: Add CONTRIBUTING.md

2018-06-27 Thread Vishal Verma
We were missing a CONTRIBUTING file to help new contributors follow the
expected guidelines. Add one that refers to the Linux Kernel for most of
these things, such as Coding style, Submitting Patches, and the DCO.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 CONTRIBUTING.md | 31 +++
 1 file changed, 31 insertions(+)
 create mode 100644 CONTRIBUTING.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..aa9e78d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,31 @@
+# Contributing to ndctl
+
+Thanks for taking the time to contribute to ndctl.
+
+The following is a set of guidelines that we adhere to, and request that
+contributors follow.
+
+1. The libnvdimm (kernel subsystem) and ndctl developers primarily use
+   the [linux-nvdimm](https://lists.01.org/mailman/listinfo/linux-nvdimm)
+   mailing list for discussions as well as posting patches.
+1. Github [issues](https://github.com/pmem/ndctl/issues) are an acceptable
+   way to report a problem, but if you just have a question,
+   [email](mailto:linux-nvdimm@lists.01.org) the above list.
+1. We follow the Linux Kernel [Coding Style Guide][cs] as applicable.
+
+   [cs]: https://www.kernel.org/doc/html/latest/process/coding-style.html
+
+1. We follow the Linux Kernel [Submitting Patches Guide][sp] as applicable.
+
+   [sp]: https://www.kernel.org/doc/html/latest/process/submitting-patches.html
+
+1. We follow the Linux Kernel [DCO][dco] (Developer Certificate of Origin).
+   The DCO is an attestation attached to every contribution made by every
+   developer. In the commit message of the contribution, the developer simply
+   adds a Signed-off-by statement and thereby agrees to the DCO.
+
+   [dco]: https://developercertificate.org/
+
+1. Github Pull Requests are acceptable, but won't be merged directly, as
+   Github doesn't allow for the kernel style flow of patches where a maintainer
+   also signs off on the patches they apply.
-- 
2.14.4

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH] Documentation: add a newline in namespace Theory of Operations

2018-06-27 Thread Ross Zwisler
On Wed, Jun 27, 2018 at 04:15:43PM -0600, Vishal Verma wrote:
> The first bullet in the modes description was merged in with the
> previous paragraph in the online version of the man pages. Fix by adding
> a newline before the bulleted list.
> 
> Reported-by: Ross Zwisler 
> Signed-off-by: Vishal Verma 
> ---
>  Documentation/ndctl/namespace-description.txt | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/ndctl/namespace-description.txt 
> b/Documentation/ndctl/namespace-description.txt
> index 1dd687e..94999e5 100644
> --- a/Documentation/ndctl/namespace-description.txt
> +++ b/Documentation/ndctl/namespace-description.txt
> @@ -21,6 +21,7 @@ area.
>  A namespace can be provisioned to operate in one of 4 modes, 'fsdax',
>  'devdax', 'sector', and 'raw'. Here are the expected usage models for
>  these modes:
> +
>   - fsdax: Filesystem-DAX mode is the default mode of a namespace
> when specifying 'ndctl create-namespace' with no options. It creates
> a block device (/dev/pmemX[.Y]) that supports the DAX capabilities

Cool, thanks for fixing.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH] Documentation: add a newline in namespace Theory of Operations

2018-06-27 Thread Vishal Verma
The first bullet in the modes description was merged in with the
previous paragraph in the online version of the man pages. Fix by adding
a newline before the bulleted list.

Reported-by: Ross Zwisler 
Signed-off-by: Vishal Verma 
---
 Documentation/ndctl/namespace-description.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/ndctl/namespace-description.txt 
b/Documentation/ndctl/namespace-description.txt
index 1dd687e..94999e5 100644
--- a/Documentation/ndctl/namespace-description.txt
+++ b/Documentation/ndctl/namespace-description.txt
@@ -21,6 +21,7 @@ area.
 A namespace can be provisioned to operate in one of 4 modes, 'fsdax',
 'devdax', 'sector', and 'raw'. Here are the expected usage models for
 these modes:
+
- fsdax: Filesystem-DAX mode is the default mode of a namespace
  when specifying 'ndctl create-namespace' with no options. It creates
  a block device (/dev/pmemX[.Y]) that supports the DAX capabilities
-- 
2.14.4

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 1/2] dax: dax_layout_busy_page() warn on !exceptional

2018-06-27 Thread Ross Zwisler
Inodes using DAX should only ever have exceptional entries in their page
caches.  Make this clear by warning if the iteration in
dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
comment for the pagevec_release() call which only deals with struct page
pointers.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 641192808bb6..897b51e41d8f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -566,7 +566,8 @@ struct page *dax_layout_busy_page(struct address_space 
*mapping)
if (index >= end)
break;
 
-   if (!radix_tree_exceptional_entry(pvec_ent))
+   if (WARN_ON_ONCE(
+!radix_tree_exceptional_entry(pvec_ent)))
continue;
 
xa_lock_irq(>i_pages);
@@ -578,6 +579,13 @@ struct page *dax_layout_busy_page(struct address_space 
*mapping)
if (page)
break;
}
+
+   /*
+* We don't expect normal struct page entries to exist in our
+* tree, but we keep these pagevec calls so that this code is
+* consistent with the common pattern for handling pagevecs
+* throughout the kernel.
+*/
pagevec_remove_exceptionals();
pagevec_release();
index++;
-- 
2.14.4

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v2 2/2] ext4: handle layout changes to pinned DAX mappings

2018-06-27 Thread Ross Zwisler
Follow the lead of xfs_break_dax_layouts() and add synchronization between
operations in ext4 which remove blocks from an inode (hole punch, truncate
down, etc.) and pages which are pinned due to DAX DMA operations.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 fs/ext4/ext4.h |  1 +
 fs/ext4/extents.c  | 12 
 fs/ext4/inode.c| 46 ++
 fs/ext4/truncate.h |  4 
 4 files changed, 63 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0b127853c584..34bccd64d83d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2460,6 +2460,7 @@ extern int ext4_get_inode_loc(struct inode *, struct 
ext4_iloc *);
 extern int ext4_inode_attach_jinode(struct inode *inode);
 extern int ext4_can_truncate(struct inode *inode);
 extern int ext4_truncate(struct inode *);
+extern int ext4_break_layouts(struct inode *);
 extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length);
 extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int 
nblocks);
 extern void ext4_set_inode_flags(struct inode *);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0057fe3f248d..a6aef06f455b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4820,6 +4820,13 @@ static long ext4_zero_range(struct file *file, loff_t 
offset,
 * released from page cache.
 */
down_write(_I(inode)->i_mmap_sem);
+
+   ret = ext4_break_layouts(inode);
+   if (ret) {
+   up_write(_I(inode)->i_mmap_sem);
+   goto out_mutex;
+   }
+
ret = ext4_update_disksize_before_punch(inode, offset, len);
if (ret) {
up_write(_I(inode)->i_mmap_sem);
@@ -5493,6 +5500,11 @@ int ext4_collapse_range(struct inode *inode, loff_t 
offset, loff_t len)
 * page cache.
 */
down_write(_I(inode)->i_mmap_sem);
+
+   ret = ext4_break_layouts(inode);
+   if (ret)
+   goto out_mmap;
+
/*
 * Need to round down offset to be aligned with page size boundary
 * for page size > block size.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2ea07efbe016..fadb8ecacb1e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4193,6 +4193,39 @@ int ext4_update_disksize_before_punch(struct inode 
*inode, loff_t offset,
return 0;
 }
 
+static void ext4_wait_dax_page(struct ext4_inode_info *ei, bool *did_unlock)
+{
+   *did_unlock = true;
+   up_write(>i_mmap_sem);
+   schedule();
+   down_write(>i_mmap_sem);
+}
+
+int ext4_break_layouts(struct inode *inode)
+{
+   struct ext4_inode_info *ei = EXT4_I(inode);
+   struct page *page;
+   bool retry;
+   int error;
+
+   if (WARN_ON_ONCE(!rwsem_is_locked(>i_mmap_sem)))
+   return -EINVAL;
+
+   do {
+   retry = false;
+   page = dax_layout_busy_page(inode->i_mapping);
+   if (!page)
+   return 0;
+
+   error = ___wait_var_event(>_refcount,
+   atomic_read(>_refcount) == 1,
+   TASK_INTERRUPTIBLE, 0, 0,
+   ext4_wait_dax_page(ei, ));
+   } while (error == 0 && retry);
+
+   return error;
+}
+
 /*
  * ext4_punch_hole: punches a hole in a file by releasing the blocks
  * associated with the given offset and length
@@ -4266,6 +4299,11 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, 
loff_t length)
 * page cache.
 */
down_write(_I(inode)->i_mmap_sem);
+
+   ret = ext4_break_layouts(inode);
+   if (ret)
+   goto out_dio;
+
first_block_offset = round_up(offset, sb->s_blocksize);
last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
@@ -5554,6 +5592,14 @@ int ext4_setattr(struct dentry *dentry, struct iattr 
*attr)
ext4_wait_for_tail_page_commit(inode);
}
down_write(_I(inode)->i_mmap_sem);
+
+   rc = ext4_break_layouts(inode);
+   if (rc) {
+   up_write(_I(inode)->i_mmap_sem);
+   error = rc;
+   goto err_out;
+   }
+
/*
 * Truncate pagecache after we've waited for commit
 * in data=journal mode to make pages freeable.
diff --git a/fs/ext4/truncate.h b/fs/ext4/truncate.h
index 0cb13badf473..bcbe3668c1d4 100644
--- a/fs/ext4/truncate.h
+++ b/fs/ext4/truncate.h
@@ -11,6 +11,10 @@
  */
 static inline void ext4_truncate_failed_write(struct inode *inode)
 {
+   /*
+* We don't need to call ext4_break_layouts() because the blocks we
+* are truncating were never visible to userspace.
+*/
down_write(_I(inode)->i_mmap_sem);
truncate_inode_pages(inode->i_mapping, 

[PATCH v2 0/2] ext4: fix DAX dma vs truncate/hole-punch

2018-06-27 Thread Ross Zwisler
This series from Dan:

https://lists.01.org/pipermail/linux-nvdimm/2018-March/014913.html

added synchronization between DAX dma and truncate/hole-punch in XFS.
This short series adds analogous support to ext4.

I've added calls to ext4_break_layouts() everywhere that ext4 removes
blocks from an inode's map.

The timings in XFS are such that it's difficult to hit this race.  Dan
was able to show the race by manually introducing delays in the direct
I/O path.

For ext4, though, its trivial to hit this race, and a hit will result in
a trigger of this WARN_ON_ONCE() in dax_disassociate_entry():

WARN_ON_ONCE(trunc && page_ref_count(page) > 1);

I've made an xfstest which tests all the paths where we now call
ext4_break_layouts(). Each of the four paths easily hits this race many
times in my test setup with the xfstest.  You can find that test here:

https://lists.01.org/pipermail/linux-nvdimm/2018-June/016435.html

With these patches applied, I've still seen occasional hits of the above
WARN_ON_ONCE(), which tells me that we still have some work to do.  I'll
continue looking at these more rare hits.

--- 

Changes in v2:
 * A little cleanup to each patch as suggested by Jan.
 * Removed the ext4_break_layouts() call in ext4_truncate_failed_write()
   and added a comment instead. (Jan)
 * Added reviewed-by tags from Jan.

Ross Zwisler (2):
  dax: dax_layout_busy_page() warn on !exceptional
  ext4: handle layout changes to pinned DAX mappings

 fs/dax.c   | 10 +-
 fs/ext4/ext4.h |  1 +
 fs/ext4/extents.c  | 12 
 fs/ext4/inode.c| 46 ++
 fs/ext4/truncate.h |  4 
 5 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.14.4

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 2/2] ext4: handle layout changes to pinned DAX mappings

2018-06-27 Thread Ross Zwisler
On Fri, Jun 22, 2018 at 10:19:15AM +0200, Jan Kara wrote:
> On Wed 20-06-18 16:15:03, Ross Zwisler wrote:
> > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > down, etc.) and pages which are pinned due to DAX DMA operations.
> > 
> > Signed-off-by: Ross Zwisler 
> > ---
> >  fs/ext4/ext4.h |  1 +
> >  fs/ext4/extents.c  | 12 
> >  fs/ext4/inode.c| 48 
> >  fs/ext4/truncate.h |  1 +
> >  4 files changed, 62 insertions(+)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 0b127853c584..34bccd64d83d 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -2460,6 +2460,7 @@ extern int ext4_get_inode_loc(struct inode *, struct 
> > ext4_iloc *);
> >  extern int ext4_inode_attach_jinode(struct inode *inode);
> >  extern int ext4_can_truncate(struct inode *inode);
> >  extern int ext4_truncate(struct inode *);
> > +extern int ext4_break_layouts(struct inode *);
> >  extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t 
> > length);
> >  extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int 
> > nblocks);
> >  extern void ext4_set_inode_flags(struct inode *);
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 0057fe3f248d..a6aef06f455b 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4820,6 +4820,13 @@ static long ext4_zero_range(struct file *file, 
> > loff_t offset,
> >  * released from page cache.
> >  */
> > down_write(_I(inode)->i_mmap_sem);
> > +
> > +   ret = ext4_break_layouts(inode);
> > +   if (ret) {
> > +   up_write(_I(inode)->i_mmap_sem);
> > +   goto out_mutex;
> > +   }
> > +
> > ret = ext4_update_disksize_before_punch(inode, offset, len);
> > if (ret) {
> > up_write(_I(inode)->i_mmap_sem);
> > @@ -5493,6 +5500,11 @@ int ext4_collapse_range(struct inode *inode, loff_t 
> > offset, loff_t len)
> >  * page cache.
> >  */
> > down_write(_I(inode)->i_mmap_sem);
> > +
> > +   ret = ext4_break_layouts(inode);
> > +   if (ret)
> > +   goto out_mmap;
> > +
> > /*
> >  * Need to round down offset to be aligned with page size boundary
> >  * for page size > block size.
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 2ea07efbe016..c795e5118745 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4193,6 +4193,41 @@ int ext4_update_disksize_before_punch(struct inode 
> > *inode, loff_t offset,
> > return 0;
> >  }
> >  
> > +static void ext4_wait_dax_page(struct ext4_inode_info *ei, bool 
> > *did_unlock)
> > +{
> > +   *did_unlock = true;
> > +   up_write(>i_mmap_sem);
> > +   schedule();
> > +   down_write(>i_mmap_sem);
> > +}
> > +
> > +int ext4_break_layouts(struct inode *inode)
> > +{
> > +   struct ext4_inode_info *ei = EXT4_I(inode);
> > +   struct page *page;
> > +   bool retry;
> > +   int error;
> > +
> > +   if (unlikely(!rwsem_is_locked(>i_mmap_sem))) {
> > +   WARN_ON_ONCE(1);
> > +   return -EINVAL;
> > +   }
> 
> This could be shortened as:
> 
> if (WARN_ON_ONCE(!rwsem_is_locked(>i_mmap_sem))) {
> }
> 
> couldn't it?

Yep, that's much better.  Thank you for the review.

> Besides that the patch looks to me. You can add:
> 
> Reviewed-by: Jan Kara 
> 
> And I'm really wondering which protection are we still missing that you are
> still able to hit the warning with these patches applied.

I'm also very curious and am going to dig into that this week.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[patch,v2] dev-dax: check_vma: ratelimit dev_info-s

2018-06-27 Thread Jeff Moyer
This is easily triggered from userspace, so let's ratelimit the
messages.

Signed-off-by: Jeff Moyer 

---
v2 - fix up all of the dev_info calls in check_vma

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index de2f8297a210..108c37fca782 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -189,14 +189,16 @@ static int check_vma(struct dev_dax *dev_dax, struct 
vm_area_struct *vma,
 
/* prevent private mappings from being established */
if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
-   dev_info(dev, "%s: %s: fail, attempted private mapping\n",
+   dev_info_ratelimited(dev,
+   "%s: %s: fail, attempted private mapping\n",
current->comm, func);
return -EINVAL;
}
 
mask = dax_region->align - 1;
if (vma->vm_start & mask || vma->vm_end & mask) {
-   dev_info(dev, "%s: %s: fail, unaligned vma (%#lx - %#lx, 
%#lx)\n",
+   dev_info_ratelimited(dev,
+   "%s: %s: fail, unaligned vma (%#lx - %#lx, 
%#lx)\n",
current->comm, func, vma->vm_start, vma->vm_end,
mask);
return -EINVAL;
@@ -204,13 +206,15 @@ static int check_vma(struct dev_dax *dev_dax, struct 
vm_area_struct *vma,
 
if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) == PFN_DEV
&& (vma->vm_flags & VM_DONTCOPY) == 0) {
-   dev_info(dev, "%s: %s: fail, dax range requires 
MADV_DONTFORK\n",
+   dev_info_ratelimited(dev,
+   "%s: %s: fail, dax range requires 
MADV_DONTFORK\n",
current->comm, func);
return -EINVAL;
}
 
if (!vma_is_dax(vma)) {
-   dev_info(dev, "%s: %s: fail, vma is not DAX capable\n",
+   dev_info_ratelimited(dev,
+   "%s: %s: fail, vma is not DAX capable\n",
current->comm, func);
return -EINVAL;
}
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Managed Service Providers (MSPs)

2018-06-27 Thread arcelia . mcdonald
Hi,



I just wanted to check if you would be interested in a list of Managed
Service Providers (MSPs) and Managed Security Service Providers (MSSPs)?



• Managed Service Providers (MSP’s) – 25,000 unique companies

• Managed Security Service Providers (MSSP’s) – 7,520 unique
companies

IT Decision Makers – 6million

Business Decision Makers – 10 million

Kindly review and let me know if I can share more information on this.

I look forward to hearing from you.



Regards,

Abraham Solomon

MSP List Specialist



For Opt-Out reply with “Not Interested”.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [patch] dev-dax: ratelimit unaligned vma warning

2018-06-27 Thread Dan Williams
On Wed, Jun 27, 2018 at 8:29 AM, Jeff Moyer  wrote:
> Dan Williams  writes:
>
>> On Wed, Jun 27, 2018 at 8:14 AM, Jeff Moyer  wrote:
>>> This is easily triggered from userspace, so let's ratelimit the warning.
>>>
>>> Signed-off-by: Jeff Moyer 
>>>
>>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>>> index de2f8297a210..16ea90615aac 100644
>>> --- a/drivers/dax/device.c
>>> +++ b/drivers/dax/device.c
>>> @@ -196,7 +196,8 @@ static int check_vma(struct dev_dax *dev_dax, struct 
>>> vm_area_struct *vma,
>>>
>>> mask = dax_region->align - 1;
>>> if (vma->vm_start & mask || vma->vm_end & mask) {
>>> -   dev_info(dev, "%s: %s: fail, unaligned vma (%#lx - %#lx, 
>>> %#lx)\n",
>>> +   dev_info_ratelimited(dev,
>>> +   "%s: %s: fail, unaligned vma (%#lx - %#lx, 
>>> %#lx)\n",
>>> current->comm, func, vma->vm_start, 
>>> vma->vm_end,
>>> mask);
>>
>> Sure, but any reason to not go ahead and convert all of them?
>
> I assume you mean "all of the dev_info calls in this function."

Right.

> If so,
> then no, there's no reason--I'll submit an updated patch.  Are there any
> other paths you know of with this sort of issue?  I didn't see any on a
> quick glance.

No, I think this is the only chatty place I can think of.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [patch] dev-dax: ratelimit unaligned vma warning

2018-06-27 Thread Jeff Moyer
Dan Williams  writes:

> On Wed, Jun 27, 2018 at 8:14 AM, Jeff Moyer  wrote:
>> This is easily triggered from userspace, so let's ratelimit the warning.
>>
>> Signed-off-by: Jeff Moyer 
>>
>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index de2f8297a210..16ea90615aac 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -196,7 +196,8 @@ static int check_vma(struct dev_dax *dev_dax, struct 
>> vm_area_struct *vma,
>>
>> mask = dax_region->align - 1;
>> if (vma->vm_start & mask || vma->vm_end & mask) {
>> -   dev_info(dev, "%s: %s: fail, unaligned vma (%#lx - %#lx, 
>> %#lx)\n",
>> +   dev_info_ratelimited(dev,
>> +   "%s: %s: fail, unaligned vma (%#lx - %#lx, 
>> %#lx)\n",
>> current->comm, func, vma->vm_start, 
>> vma->vm_end,
>> mask);
>
> Sure, but any reason to not go ahead and convert all of them?

I assume you mean "all of the dev_info calls in this function."  If so,
then no, there's no reason--I'll submit an updated patch.  Are there any
other paths you know of with this sort of issue?  I didn't see any on a
quick glance.

-Jeff
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [patch] dev-dax: ratelimit unaligned vma warning

2018-06-27 Thread Dan Williams
On Wed, Jun 27, 2018 at 8:14 AM, Jeff Moyer  wrote:
> This is easily triggered from userspace, so let's ratelimit the warning.
>
> Signed-off-by: Jeff Moyer 
>
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index de2f8297a210..16ea90615aac 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -196,7 +196,8 @@ static int check_vma(struct dev_dax *dev_dax, struct 
> vm_area_struct *vma,
>
> mask = dax_region->align - 1;
> if (vma->vm_start & mask || vma->vm_end & mask) {
> -   dev_info(dev, "%s: %s: fail, unaligned vma (%#lx - %#lx, 
> %#lx)\n",
> +   dev_info_ratelimited(dev,
> +   "%s: %s: fail, unaligned vma (%#lx - %#lx, 
> %#lx)\n",
> current->comm, func, vma->vm_start, 
> vma->vm_end,
> mask);

Sure, but any reason to not go ahead and convert all of them?
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[patch] dev-dax: ratelimit unaligned vma warning

2018-06-27 Thread Jeff Moyer
This is easily triggered from userspace, so let's ratelimit the warning.

Signed-off-by: Jeff Moyer 

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index de2f8297a210..16ea90615aac 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -196,7 +196,8 @@ static int check_vma(struct dev_dax *dev_dax, struct 
vm_area_struct *vma,
 
mask = dax_region->align - 1;
if (vma->vm_start & mask || vma->vm_end & mask) {
-   dev_info(dev, "%s: %s: fail, unaligned vma (%#lx - %#lx, 
%#lx)\n",
+   dev_info_ratelimited(dev,
+   "%s: %s: fail, unaligned vma (%#lx - %#lx, 
%#lx)\n",
current->comm, func, vma->vm_start, vma->vm_end,
mask);
return -EINVAL;
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v8 1/3] ndctl, monitor: add ndctl monitor

2018-06-27 Thread Dan Williams
On Tue, Jun 26, 2018 at 10:24 PM, QI Fuli  wrote:
> Ndctl monitor is used for monitoring the smart events of nvdimm DIMMs.
> When a smart event fires, monitor will output the notifications which
> include dimm health status and evnet informations to syslog or a
> logfile by setting [--logfile] option. The notifications follow json
> format and can be consumed by log collectors like Fluentd.
>
> The objects to monitor can be selected by setting [--dimm] [--region]
> [--namespace] [--bus] options and the event type can be filtered by
> setting [--dimm-event] option. These options support multiple
> space-separated arguments.
>
> Ndctl monitor can be forked as a daemon by using [--daemon] option,
> such as:
># ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
>
> Signed-off-by: QI Fuli 
> ---
>  builtin.h |   1 +
>  ndctl/Makefile.am |   3 +-
>  ndctl/monitor.c   | 508 ++
>  ndctl/ndctl.c |   1 +
>  4 files changed, 512 insertions(+), 1 deletion(-)
>  create mode 100644 ndctl/monitor.c
>
> diff --git a/builtin.h b/builtin.h
> index d3cc723..675a6ce 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv, void 
> *ctx);
>  int cmd_wait_scrub(int argc, const char **argv, void *ctx);
>  int cmd_start_scrub(int argc, const char **argv, void *ctx);
>  int cmd_list(int argc, const char **argv, void *ctx);
> +int cmd_monitor(int argc, const char **argv, void *ctx);
>  #ifdef ENABLE_TEST
>  int cmd_test(int argc, const char **argv, void *ctx);
>  #endif
> diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
> index d22a379..7dbf223 100644
> --- a/ndctl/Makefile.am
> +++ b/ndctl/Makefile.am
> @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
> util/json-smart.c \
> util/json-firmware.c \
> inject-error.c \
> -   inject-smart.c
> +   inject-smart.c \
> +   monitor.c
>
>  if ENABLE_DESTRUCTIVE
>  ndctl_SOURCES += ../test/blk_namespaces.c \
> diff --git a/ndctl/monitor.c b/ndctl/monitor.c
> new file mode 100644
> index 000..d926a81
> --- /dev/null
> +++ b/ndctl/monitor.c
> @@ -0,0 +1,508 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2018, FUJITSU LIMITED. All rights reserved. */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#define BUF_SIZE 2048
> +
> +#define DIMM_SPARES_REMAINING  (1 << 0)
> +#define DIMM_MEDIA_TEMPERATURE (1 << 1)
> +#define DIMM_CTRL_TEMPERATURE  (1 << 2)
> +#define DIMM_HEALTH_STATE  (1 << 3)
> +#define DIMM_UNCLEAN_SHUTDOWN  (1 << 4)
> +
> +static struct monitor {
> +   const char *logfile;
> +   const char *dimm_event;
> +   bool daemon;
> +} monitor;
> +
> +struct monitor_dimm {
> +   struct ndctl_dimm *dimm;
> +   int health_eventfd;
> +   unsigned int health;
> +   unsigned int event_flags;
> +   struct list_node list;
> +};
> +
> +struct monitor_filter_arg {
> +   struct list_head dimms;
> +   int maxfd_dimm;
> +   int num_dimm;
> +   unsigned long flags;
> +};
> +
> +struct util_filter_params param;
> +
> +static int did_fail;
> +
> +#define fail(fmt, ...) \
> +do { \
> +   did_fail = 1; \
> +   err((struct ndctl_ctx *)ctx, "ndctl-%s:%s:%d: " fmt, \
> +   VERSION, __func__, __LINE__, ##__VA_ARGS__); \
> +} while (0)

It is odd that the monitor command emits errors in this format, but no
other ndctl uses scheme. Also. any print that includes source code
line numbers is debug in my opinion. Source code line numbers are
useful for developers not typical users. Let's make this a debug level
print and circle back later to see if we need a better common error
print format across all commands.

> +
> +static bool is_dir(char *filepath)
> +{
> +   DIR *dir = opendir(filepath);
> +   if (dir) {
> +   closedir(dir);
> +   return true;
> +   }
> +   return false;
> +}
> +
> +static void log_syslog(struct ndctl_ctx *ctx, int priority, const char *file,
> +   int line, const char *fn, const char *format, va_list args)
> +{
> +   char *buf;
> +
> +   buf = malloc(BUF_SIZE);
> +   if (!buf) {
> +   fail("malloc for log buffer failed\n");
> +   return;
> +   }
> +   vsnprintf(buf, BUF_SIZE, format, args);
> +   syslog(priority, "%s\n", buf);
> +
> +   free(buf);
> +   return;
> +}
> +
> +static void log_file(struct ndctl_ctx *ctx, int priority, const char *file,
> +   int line, const char *fn, const char *format, va_list args)
> +{
> +   FILE *f;
> +   char *buf, *log_dir;
> +
> +   buf = malloc(BUF_SIZE);
> +   if (!buf) {
> +   fail("malloc for log buffer failed\n");
> +   return;
> +   }
> +   

Re: [PATCH] x86: optimize memcpy_flushcache

2018-06-27 Thread Yigal Korman
On Wed, Jun 27, 2018 at 4:03 PM, Dan Williams  wrote:
> On Wed, Jun 27, 2018 at 4:23 AM, Yigal Korman  wrote:
>> Hi,
>> I'm a bit late on this but I have a question about the original patch -
>> I thought that in order for movnt (movntil, movntiq) to push the data
>> into the persistency domain (ADR),
>> one must work with length that is multiple of cacheline size,
>> otherwise the write-combine buffers remain partially
>> filled and you need to commit them with a fence (sfence) - which ruins
>> the whole performance gain you got here.
>> Am I wrong, are the write-combine buffers are part of the ADR domain
>> or something?
>
> The intent is to allow a batch of memcpy_flushcache() calls followed
> by a single sfence. Specifying a multiple of a cacheline size does not
> necessarily help as sfence is still needed to make sure that the movnt
> result has reached the ADR-safe domain.

Oh, right, I see that dm-writecache calls writecache_commit_flushed
which in turn calls wmb().
I keep confusing *_nocache (i.e. copy_user_nocache) that includes
sfence and *_flushcache (i.e. memcpy_flushcache) that doesn't.
Thanks for the clear up.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] x86: optimize memcpy_flushcache

2018-06-27 Thread Dan Williams
On Wed, Jun 27, 2018 at 4:23 AM, Yigal Korman  wrote:
> Hi,
> I'm a bit late on this but I have a question about the original patch -
> I thought that in order for movnt (movntil, movntiq) to push the data
> into the persistency domain (ADR),
> one must work with length that is multiple of cacheline size,
> otherwise the write-combine buffers remain partially
> filled and you need to commit them with a fence (sfence) - which ruins
> the whole performance gain you got here.
> Am I wrong, are the write-combine buffers are part of the ADR domain
> or something?

The intent is to allow a batch of memcpy_flushcache() calls followed
by a single sfence. Specifying a multiple of a cacheline size does not
necessarily help as sfence is still needed to make sure that the movnt
result has reached the ADR-safe domain.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH] x86: optimize memcpy_flushcache

2018-06-27 Thread Yigal Korman
Hi,
I'm a bit late on this but I have a question about the original patch -
I thought that in order for movnt (movntil, movntiq) to push the data
into the persistency domain (ADR),
one must work with length that is multiple of cacheline size,
otherwise the write-combine buffers remain partially
filled and you need to commit them with a fence (sfence) - which ruins
the whole performance gain you got here.
Am I wrong, are the write-combine buffers are part of the ADR domain
or something?

Thanks,
Yigal

On Mon, Jun 18, 2018 at 7:38 PM, Dan Williams  wrote:
> On Mon, Jun 18, 2018 at 5:50 AM, Mikulas Patocka  wrote:
>> Hi Mike
>>
>> Could you please push this patch to the kernel 4.18-rc? Dan Williams said
>> that he will submit it, but he forgot about it.
>
> ...to be clear I acked it and asked Ingo to take it. Will need a
> resubmit for 4.19.
>
> Ingo, see below for a patch to pick up into -tip when you have a chance.
>
>>
>> Without this patch, dm-writecache is suffering 2% penalty because of
>> memcpy_flushcache overhead.
>>
>> Mikulas
>>
>>
>>
>> From: Mikulas Patocka 
>>
>> I use memcpy_flushcache in my persistent memory driver for metadata
>> updates and it turns out that the overhead of memcpy_flushcache causes 2%
>> performance degradation compared to "movnti" instruction explicitly coded
>> using inline assembler.
>>
>> This patch recognizes memcpy_flushcache calls with constant short length
>> and turns them into inline assembler - so that I don't have to use inline
>> assembler in the driver.
>>
>> Signed-off-by: Mikulas Patocka 
>>
>> ---
>>  arch/x86/include/asm/string_64.h |   20 +++-
>>  arch/x86/lib/usercopy_64.c   |4 ++--
>>  2 files changed, 21 insertions(+), 3 deletions(-)
>>
>> Index: linux-2.6/arch/x86/include/asm/string_64.h
>> ===
>> --- linux-2.6.orig/arch/x86/include/asm/string_64.h
>> +++ linux-2.6/arch/x86/include/asm/string_64.h
>> @@ -149,7 +149,25 @@ memcpy_mcsafe(void *dst, const void *src
>>
>>  #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>>  #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
>> -void memcpy_flushcache(void *dst, const void *src, size_t cnt);
>> +void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
>> +static __always_inline void memcpy_flushcache(void *dst, const void *src, 
>> size_t cnt)
>> +{
>> +   if (__builtin_constant_p(cnt)) {
>> +   switch (cnt) {
>> +   case 4:
>> +   asm ("movntil %1, %0" : "=m"(*(u32 *)dst) : 
>> "r"(*(u32 *)src));
>> +   return;
>> +   case 8:
>> +   asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : 
>> "r"(*(u64 *)src));
>> +   return;
>> +   case 16:
>> +   asm ("movntiq %1, %0" : "=m"(*(u64 *)dst) : 
>> "r"(*(u64 *)src));
>> +   asm ("movntiq %1, %0" : "=m"(*(u64 *)(dst + 
>> 8)) : "r"(*(u64 *)(src + 8)));
>> +   return;
>> +   }
>> +   }
>> +   __memcpy_flushcache(dst, src, cnt);
>> +}
>>  #endif
>>
>>  #endif /* __KERNEL__ */
>> Index: linux-2.6/arch/x86/lib/usercopy_64.c
>> ===
>> --- linux-2.6.orig/arch/x86/lib/usercopy_64.c
>> +++ linux-2.6/arch/x86/lib/usercopy_64.c
>> @@ -153,7 +153,7 @@ long __copy_user_flushcache(void *dst, c
>> return rc;
>>  }
>>
>> -void memcpy_flushcache(void *_dst, const void *_src, size_t size)
>> +void __memcpy_flushcache(void *_dst, const void *_src, size_t size)
>>  {
>> unsigned long dest = (unsigned long) _dst;
>> unsigned long source = (unsigned long) _src;
>> @@ -216,7 +216,7 @@ void memcpy_flushcache(void *_dst, const
>> clean_cache_range((void *) dest, size);
>> }
>>  }
>> -EXPORT_SYMBOL_GPL(memcpy_flushcache);
>> +EXPORT_SYMBOL_GPL(__memcpy_flushcache);
>>
>>  void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
>> size_t len)
> ___
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm