[PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()

2017-04-21 Thread Dan Williams
The nvdimm_flush() mechanism helps to reduce the impact of an ADR
(asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
platform WPQ (write-pending-queue) buffers when power is removed. The
nvdimm_flush() mechanism performs that same function on-demand.

When a pmem namespace is associated with a block device, an
nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
request. However, when a namespace is in device-dax mode, or namespaces
are disabled, userspace needs another path.

The new 'flush' attribute is visible when it can be determined whether
the interleave-set has DIMMs that expose WPQ-flush addresses
("flush-hints" in ACPI NFIT terminology). It returns "1" and flushes
the DIMMs, or returns "0" when the flush operation is a platform nop.
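
A userspace consumer of this attribute could look roughly like the sketch
below. It is only an illustration: the helper name read_flush_attr() and the
example path in the comment are assumptions, not part of the patch. The only
behavior taken from the patch is that reading the attribute yields "1" (a
flush was performed) or "0" (flushing is a platform nop).

```c
#include <stdio.h>

/*
 * Read a region 'flush' attribute and report what the kernel said.
 * Returns 1 if the read triggered a flush, 0 if flushing is a platform
 * nop, and -1 if the attribute is absent or unreadable.
 *
 * The caller supplies the path; something like
 * /sys/bus/nd/devices/region0/flush is an assumed example location.
 */
int read_flush_attr(const char *path)
{
	char buf[8];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;	/* attribute hidden or region missing */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (buf[0] == '1')
		return 1;
	if (buf[0] == '0')
		return 0;
	return -1;		/* unexpected contents */
}
```

Because the read itself triggers the flush, a caller would invoke this once
per flush it wants, rather than polling the file.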

Signed-off-by: Dan Williams 
---
 drivers/nvdimm/region_devs.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8de5a04644a1..3495b4c23941 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t flush_show(struct device *dev,
+       struct device_attribute *attr, char *buf)
+{
+   struct nd_region *nd_region = to_nd_region(dev);
+
+   if (nvdimm_has_flush(nd_region)) {
+       nvdimm_flush(nd_region);
+       return sprintf(buf, "1\n");
+   }
+   return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RO(flush);
+
 static ssize_t mappings_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
 
 static struct attribute *nd_region_attributes[] = {
    &dev_attr_size.attr,
+   &dev_attr_flush.attr,
    &dev_attr_nstype.attr,
    &dev_attr_mappings.attr,
    &dev_attr_btt_seed.attr,
@@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
    if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
        return 0;
 
+   if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
+       return 0;
+
    if (a != &dev_attr_set_cookie.attr
            && a != &dev_attr_available_size.attr)
        return a->mode;

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-21 Thread Verma, Vishal L
On Thu, 2017-04-13 at 13:31 +0200, Borislav Petkov wrote:
> On Thu, Apr 13, 2017 at 12:29:25AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> > > We can futz with that and have them specify which chain (or both)
> > > that they want to be added to.
> > 
> > Well, I didn't want the atomic chain to be a notifier because we can
> > keep it simple and non-blocking. Only the process context one will
> > be.
> > 
> > So the question is, do we even have a use case for outside consumers
> > hanging on the atomic chain? Because if not, we're good to go.
> 
> Ok, new day, new patch.
> 
> Below is what we could do: we don't call the notifier at all on the
> atomic path but only print the MCEs. We do log them and if the machine
> survives, we process them accordingly. This is only a fix for upstream
> so that the current issue at hand is addressed.
> 
> For later, we'd need to split the paths in:
> 
> critical_print_mce()
> 
> or somesuch which immediately dumps the MCE to dmesg, and
> 
> mce_log()
> 
> which does the slow path of logging MCEs and calling the blocking
> notifier.
> 
> Now, I'd want to have decoding of the MCE on the critical path too so
> I have to think about how to do that nicely. Maybe move the decoding
> bits which are the same between Intel and AMD in mce.c and have some
> vendor-specific, fast calls. We'll see. Btw, this is something Ingo has
> been mentioning for a while.
> 
> Anyway, here's just the urgent fix for now.
> 
> Thanks.
> 
> ---
> From: Vishal Verma 
> Date: Tue, 11 Apr 2017 16:44:57 -0600
> Subject: [PATCH] x86/mce: Make the MCE notifier a blocking one
> 
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
> 
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
>   in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
>   [..]
>   Call Trace:
>    dump_stack
>    ___might_sleep
>    __might_sleep
>    mutex_lock_nested
>    ? __lock_acquire
>    nfit_handle_mce
>    notifier_call_chain
>    atomic_notifier_call_chain
>    ? atomic_notifier_call_chain
>    mce_gen_pool_process
> 
> Convert the notifier to a blocking one which gets to run only in process
> context.
> 
> Boris: remove the notifier call in atomic context in print_mce(). For
> now, let's print the MCE on the atomic path so that we can make sure it
> goes out. We still log it for process context later.
> 
> Reported-by: Ross Zwisler 
> Signed-off-by: Vishal Verma 
> Cc: Tony Luck 
> Cc: Dan Williams 
> Cc: linux-edac 
> Cc: x86-ml 
> Cc: 
> Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/mce-genpool.c  |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c  | 18 --
>  3 files changed, 6 insertions(+), 16 deletions(-)
> 

I noticed this patch was picked up in tip, in ras/urgent, but didn't see
a pull request for 4.11 - was this the intention? Or will it just be
added for 4.12?

-Vishal


Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce

2017-04-21 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 01:27:41PM -0700, Luck, Tony wrote:
> Boris: you coded up a "static bool memory_error(struct mce *m)"
> function inside the patches for the corrected error thingy.
> 
> Perhaps when it goes upstream it should be available for other
> users too?

I don't see why not. struct mce.cpuvendor even has the vendor in there
so memory_error() wouldn't even have to look at boot_cpu_data when making
per-vendor decisions.

I guess we should rename it to something more global namespace-y like
"mce_is_memory_error()" or so, though, before we expose it to a wider
audience...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce

2017-04-21 Thread Vishal Verma
On 04/21, Luck, Tony wrote:
> On Fri, Apr 21, 2017 at 02:35:51PM -0600, Vishal Verma wrote:
> > On 04/21, Luck, Tony wrote:
> > > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> > > 
> > >   if (!((mce->status & 0xef80) == BIT(7)))
> > 
> > Is this still right though? Anything AND'ed with 0xef80 will never equal
> > BIT(7) which is simply 0100 0000 binary (the lowest byte of the left hand
> > side is '0')
> 
> I think so ... here it is in binary
> 
> ef80 = 1110 1111 1000 0000
> BIT7 = 0000 0000 1000 0000
> 
> so the "&" will zap bits {6:0} and bit {12}  [and everything not part
> of the MCACOD field].
> 
> If mce->status had some bit above BIT(7) set, it won't be zapped, so we
> won't match the exact value BIT(7).

Ah, you're right, I was off by one, taking BIT(7) to mean 0100 0000.

> 
> -Tony


Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce

2017-04-21 Thread Luck, Tony
On Fri, Apr 21, 2017 at 02:35:51PM -0600, Vishal Verma wrote:
> On 04/21, Luck, Tony wrote:
> > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> > 
> > if (!((mce->status & 0xef80) == BIT(7)))
> 
> Is this still right though? Anything AND'ed with 0xef80 will never equal
> BIT(7) which is simply 0100 0000 binary (the lowest byte of the left hand
> side is '0')

I think so ... here it is in binary

ef80 = 1110 1111 1000 0000
BIT7 = 0000 0000 1000 0000

so the "&" will zap bits {6:0} and bit {12}  [and everything not part
of the MCACOD field].

If mce->status had some bit above BIT(7) set, it won't be zapped, so we
won't match the exact value BIT(7).
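
To make the arithmetic concrete, here is a small standalone sketch of the
corrected check next to the originally posted one. BIT() is defined locally
as a stand-in for the kernel macro, and the function names are made up for
illustration only.

```c
#include <stdint.h>

#define BIT(n) (1ULL << (n))	/* local stand-in for the kernel macro */

/* Corrected form: mask first, then compare the result with BIT(7). */
static int check_fixed(uint64_t status)
{
	return (status & 0xef80) == BIT(7);
}

/*
 * Originally posted form, "!(status & 0xef80) == BIT(7)", written out
 * the way the compiler parses it: '!' binds tighter than '==', so the
 * left-hand side is only ever 0 or 1 and can never equal 0x80.
 */
static int check_buggy(uint64_t status)
{
	uint64_t negated = !(status & 0xef80);

	return negated == BIT(7);
}
```

With these definitions, check_fixed() accepts 0x0080, 0x009f and 0x1080
(bits {6:0} and bit {12} are ignored) but rejects 0x0180, while
check_buggy() rejects every input.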

-Tony


Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce

2017-04-21 Thread Vishal Verma
On 04/21, Luck, Tony wrote:
> >> > +   if (!(mce->status & 0xef80) == BIT(7))
> >> 
> >> Can we get a define for this, or a comment explaining all the magic
> >> that's happening on that one line?
> >
> > Yes - also like lkp pointed out, the check isn't correct at all. Let me
> > figure out what really needs to be done, and I will resend with a better
> > comment. 
> 
> Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> 
>   if (!((mce->status & 0xef80) == BIT(7)))

Is this still right though? Anything AND'ed with 0xef80 will never equal
BIT(7) which is simply 0100 0000 binary (the lowest byte of the left hand
side is '0')

> 
> The magic is shown in table 15-9 of the Intel Software Developers Manual
> (but perhaps not well explained there).
> 
> mce->status in the above code is a value plucked from a machine check
> bank status register. See figure 15-6 in the SDM.  The important bits for this
> are {15:0} which are the "MCA Error code".  Table 15-9 shows how these
> are grouped into types, where the type is defined by the most significant '1'
> bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> see section 15.9.2.1).
> 
> So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.
> 
> Maybe we should have defines in mce.h for them?  It gets a bit more complicated
> as all the above only applies to Intel branded X86 CPUs ... on AMD different
> decoding rules apply.
> 
> -Tony
> 
> 




Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce

2017-04-21 Thread Luck, Tony
On Fri, Apr 21, 2017 at 01:19:16PM -0700, Dan Williams wrote:
> On Fri, Apr 21, 2017 at 1:16 PM, Luck, Tony  wrote:
> >>> > +   if (!(mce->status & 0xef80) == BIT(7))
> >>>
> >>> Can we get a define for this, or a comment explaining all the magic
> >>> that's happening on that one line?
> >>
> >> Yes - also like lkp pointed out, the check isn't correct at all. Let me
> >> figure out what really needs to be done, and I will resend with a better
> >> comment.
> >
> > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> >
> > if (!((mce->status & 0xef80) == BIT(7)))
> >
> > The magic is shown in table 15-9 of the Intel Software Developers Manual
> > (but perhaps not well explained there).
> >
> > mce->status in the above code is a value plucked from a machine check
> > bank status register. See figure 15-6 in the SDM.  The important bits for this
> > are {15:0} which are the "MCA Error code".  Table 15-9 shows how these
> > are grouped into types, where the type is defined by the most significant '1'
> > bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> > see section 15.9.2.1).
> >
> > So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> > error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.
> 
> Ah, ok.
> 
> > Maybe we should have defines in mce.h for them?  It gets a bit more complicated
> > as all the above only applies to Intel branded X86 CPUs ... on AMD different
> > decoding rules apply.
> 
> Yeah, this code is x86_64 generic so should call into helpers that do
> the right thing per cpu type.

Boris: you coded up a "static bool memory_error(struct mce *m)"
function inside the patches for the corrected error thingy.

Perhaps when it goes upstream it should be available for other
users too?

-Tony
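
The decoding rule quoted above (the type is given by the most significant
set bit of the MCA error code, ignoring bit 12) can be sketched in plain C
as below. This is an illustration under stated assumptions, not the
memory_error() helper being discussed: the enum and function names are
hypothetical, only the three types named in this thread are distinguished,
and the rule applies to Intel CPUs only.

```c
#include <stdint.h>

enum mca_type {
	MCA_NONE,		/* no error-code bits set */
	MCA_GENERIC_CACHE,	/* BIT(3) is the most significant bit */
	MCA_TLB,		/* BIT(4) is the most significant bit */
	MCA_MEMORY,		/* BIT(7) is the most significant bit */
	MCA_OTHER,		/* any type not named in this thread */
};

/* Hypothetical helper: classify an Intel bank status value. */
enum mca_type mca_classify(uint64_t status)
{
	/* Keep the MCA error code, bits {15:0}, and drop bit 12. */
	uint32_t code = (uint32_t)(status & 0xffff) & ~(1u << 12);
	int top = -1;
	int i;

	/* Find the most significant '1' bit that is left. */
	for (i = 15; i >= 0; i--) {
		if (code & (1u << i)) {
			top = i;
			break;
		}
	}

	switch (top) {
	case -1:
		return MCA_NONE;
	case 3:
		return MCA_GENERIC_CACHE;
	case 4:
		return MCA_TLB;
	case 7:
		return MCA_MEMORY;
	default:
		return MCA_OTHER;
	}
}
```

For the memory case this gives the same answer as the one-line test
(status & 0xef80) == BIT(7): the mask covers bits {15:13} and {11:7}, so
the comparison only succeeds when bit 7 is the highest bit set once bit 12
is ignored.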


Solving difficult problems in engineering circuit design

2017-04-21 Thread 富厅
Please see the attached outline for details.


Re: [PATCH 2/3] ndctl, create-namespace: read default alignment from sysfs

2017-04-21 Thread Dan Williams
On Fri, Apr 21, 2017 at 12:12 AM, Oliver O'Halloran  wrote:
> Read the default alignment from the hpage_pmd_size in sysfs. On PPC the
> PMD size depends on the MMU being used. When the traditional hash MMU is
> used (P9 and earlier) the PMD size is 16MB while the newer radix MMU
> uses a 2MB PMD size. The choice of MMU is done at runtime depending on
> what the hardware supports so we need to detect this at runtime rather
> than hardcoding it.
>
> Signed-off-by: Oliver O'Halloran 
> ---
>  ndctl/Makefile.am |  3 ++-
>  ndctl/builtin-xaction-namespace.c | 41 
> +--
>  2 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
> index c563e9411cc3..6d565c643efd 100644
> --- a/ndctl/Makefile.am
> +++ b/ndctl/Makefile.am
> @@ -10,7 +10,8 @@ ndctl_SOURCES = ndctl.c \
>  ../util/log.c \
> builtin-list.c \
> builtin-test.c \
> -   ../util/json.c
> +   ../util/json.c \
> +   ../util/sysfs.c
>
>  if ENABLE_SMART
>  ndctl_SOURCES += util/json-smart.c
> diff --git a/ndctl/builtin-xaction-namespace.c b/ndctl/builtin-xaction-namespace.c
> index d6c38dc15984..713a95987d91 100644
> --- a/ndctl/builtin-xaction-namespace.c
> +++ b/ndctl/builtin-xaction-namespace.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -54,6 +55,8 @@ static struct parameters {
> const char *align;
>  } param;
>
> +char default_align_buf[SYSFS_ATTR_SIZE];
> +
>  void builtin_xaction_namespace_reset(void)
>  {
> /*
> @@ -137,7 +140,24 @@ enum namespace_action {
> ACTION_DESTROY,
>  };
>
> -static int set_defaults(enum namespace_action mode)
> +const char *sysfs_read_default_align(struct ndctl_ctx *ctx, const char *def,
> +   const char *path)
> +{
> +   /*
> +* HACK: The command handlers aren't supposed to write into
> +*   the ndctl command context, but we want the debug
> +*   output to go somewhere sensible.
> +*/
> +   if (__sysfs_read_attr((struct log_ctx *)ctx, path, default_align_buf))
> +   return strdup(def);
> +
> +   if (!strlen(default_align_buf))
> +   return def;
> +
> +   return default_align_buf;

I chatted with Dave Hansen about this and we're thinking we should go
ahead and add a new attribute to the device-dax sysfs with the list of
supported alignments, similar to what we have in the btt case for
supported sector sizes.

The reason is that the sensitivity to page sizes is a device-dax
internal requirement. Theoretically device-dax could support any
alignment and handle it with a mix of page sizes. However, since
device-dax wants to be strict and predictable about the tlb size
backing a given device-dax mapping then it should list the possible
options.

Looking at the transparent_hugepage sysfs is a bit of a layering
violation. There is no strict guarantee that device-dax is tied to thp
in the longterm. The thp sysfs is also awkward because it does not
tell us the pud page size.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH 1/3] ndctl, create-namespace: Allow 64K and 16M alignments

2017-04-21 Thread Oliver O'Halloran
These are needed on powerpc since 64K is the default page size and 16MB
is the PMD size when using the hash MMU.
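
As a standalone illustration of the check this patch extends, the sketch
below accepts the same five alignments. The SZ_* values are defined locally
here so the block compiles on its own, and align_is_supported() is a
hypothetical name rather than a function in ndctl.

```c
#include <stdbool.h>

/* Local copies of the size constants, for illustration only. */
#define SZ_4K	0x00001000ULL
#define SZ_64K	0x00010000ULL
#define SZ_2M	0x00200000ULL
#define SZ_16M	0x01000000ULL
#define SZ_1G	0x40000000ULL

/* Return true if 'align' is one of the alignments the patch allows. */
bool align_is_supported(unsigned long long align)
{
	switch (align) {
	case SZ_4K:
	case SZ_64K:	/* default page size on powerpc */
	case SZ_2M:
	case SZ_16M:	/* PMD size with the powerpc hash MMU */
	case SZ_1G:
		return true;
	default:
		return false;
	}
}
```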

Signed-off-by: Oliver O'Halloran 
---
 ndctl/builtin-xaction-namespace.c | 2 ++
 util/size.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/ndctl/builtin-xaction-namespace.c b/ndctl/builtin-xaction-namespace.c
index 46d651e86153..d6c38dc15984 100644
--- a/ndctl/builtin-xaction-namespace.c
+++ b/ndctl/builtin-xaction-namespace.c
@@ -494,7 +494,9 @@ static int validate_namespace_options(struct ndctl_region *region,
 
switch (p->align) {
case SZ_4K:
+   case SZ_64K:
case SZ_2M:
+   case SZ_16M:
case SZ_1G:
break;
default:
diff --git a/util/size.h b/util/size.h
index 4af14eb7d150..f1bfd1a30438 100644
--- a/util/size.h
+++ b/util/size.h
@@ -3,6 +3,7 @@
 
 #define SZ_1K     0x00000400
 #define SZ_4K     0x00001000
+#define SZ_64K    0x00010000
 #define SZ_1M     0x00100000
 #define SZ_2M     0x00200000
 #define SZ_4M     0x00400000
-- 
2.9.3



[PATCH 2/3] ndctl, create-namespace: read default alignment from sysfs

2017-04-21 Thread Oliver O'Halloran
Read the default alignment from the hpage_pmd_size in sysfs. On PPC the
PMD size depends on the MMU being used. When the traditional hash MMU is
used (P9 and earlier) the PMD size is 16MB while the newer radix MMU
uses a 2MB PMD size. The choice of MMU is done at runtime depending on
what the hardware supports so we need to detect this at runtime rather
than hardcoding it.
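
The runtime detection described above can be sketched in isolation as
follows. This is not the code in the patch: read_default_align() is a
hypothetical helper, and it assumes the sysfs file holds the PMD size as a
decimal byte count. The path it would be called with is the
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size file named in the diff
below.

```c
#include <stdio.h>

/*
 * Read a default alignment, in bytes, from a sysfs-style file.
 * Falls back to 'def' if the file is missing, empty, or unparsable.
 * Assumes the file contains a decimal number such as "2097152".
 */
unsigned long long read_default_align(const char *path,
				      unsigned long long def)
{
	unsigned long long val = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return def;	/* e.g. THP not built into the kernel */
	if (fscanf(f, "%llu", &val) != 1 || val == 0)
		val = def;
	fclose(f);
	return val;
}
```

Returning a number rather than a string is a choice made for this sketch;
it avoids having to decide who owns the returned buffer.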

Signed-off-by: Oliver O'Halloran 
---
 ndctl/Makefile.am |  3 ++-
 ndctl/builtin-xaction-namespace.c | 41 +--
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index c563e9411cc3..6d565c643efd 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -10,7 +10,8 @@ ndctl_SOURCES = ndctl.c \
 ../util/log.c \
builtin-list.c \
builtin-test.c \
-   ../util/json.c
+   ../util/json.c \
+   ../util/sysfs.c
 
 if ENABLE_SMART
 ndctl_SOURCES += util/json-smart.c
diff --git a/ndctl/builtin-xaction-namespace.c b/ndctl/builtin-xaction-namespace.c
index d6c38dc15984..713a95987d91 100644
--- a/ndctl/builtin-xaction-namespace.c
+++ b/ndctl/builtin-xaction-namespace.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -54,6 +55,8 @@ static struct parameters {
const char *align;
 } param;
 
+char default_align_buf[SYSFS_ATTR_SIZE];
+
 void builtin_xaction_namespace_reset(void)
 {
/*
@@ -137,7 +140,24 @@ enum namespace_action {
ACTION_DESTROY,
 };
 
-static int set_defaults(enum namespace_action mode)
+const char *sysfs_read_default_align(struct ndctl_ctx *ctx, const char *def,
+   const char *path)
+{
+   /*
+* HACK: The command handlers aren't supposed to write into
+*   the ndctl command context, but we want the debug
+*   output to go somewhere sensible.
+*/
+   if (__sysfs_read_attr((struct log_ctx *)ctx, path, default_align_buf))
+   return strdup(def);
+
+   if (!strlen(default_align_buf))
+   return def;
+
+   return default_align_buf;
+}
+
+static int set_defaults(enum namespace_action mode, struct ndctl_ctx *ctx)
 {
int rc = 0;
 
@@ -213,7 +233,8 @@ static int set_defaults(enum namespace_action mode)
param.align);
rc = -EINVAL;
} else if (!param.align) {
-   param.align = "2M";
+   param.align = sysfs_read_default_align(ctx, "2M",
+   "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size");
param.align_default = true;
}
 
@@ -254,7 +275,7 @@ static int set_defaults(enum namespace_action mode)
  */
 static const char *parse_namespace_options(int argc, const char **argv,
enum namespace_action mode, const struct option *options,
-   char *xable_usage)
+   char *xable_usage, struct ndctl_ctx *ctx)
 {
const char * const u[] = {
xable_usage,
@@ -265,7 +286,7 @@ static const char *parse_namespace_options(int argc, const char **argv,
param.do_scan = argc == 1;
 argc = parse_options(argc, argv, options, u, 0);
 
-   rc = set_defaults(mode);
+   rc = set_defaults(mode, ctx);
 
if (argc == 0 && mode != ACTION_CREATE) {
error("specify a namespace to %s, or \"all\"\n",
@@ -397,7 +418,7 @@ static int validate_namespace_options(struct ndctl_region *region,
struct ndctl_namespace *ndns, struct parsed_parameters *p)
 {
const char *region_name = ndctl_region_get_devname(region);
-   unsigned long long size_align, units = 1;
+   unsigned long long size_align = 1, units = 1;
unsigned int ways;
int rc = 0;
 
@@ -900,7 +921,7 @@ int cmd_disable_namespace(int argc, const char **argv, void *ctx)
 {
char *xable_usage = "ndctl disable-namespace  []";
const char *namespace = parse_namespace_options(argc, argv,
-   ACTION_DISABLE, base_options, xable_usage);
+   ACTION_DISABLE, base_options, xable_usage, ctx);
int disabled = do_xaction_namespace(namespace, ACTION_DISABLE, ctx);
 
if (disabled < 0) {
@@ -921,7 +942,7 @@ int cmd_enable_namespace(int argc, const char **argv, void *ctx)
 {
char *xable_usage = "ndctl enable-namespace  []";
const char *namespace = parse_namespace_options(argc, argv,
-   ACTION_ENABLE, base_options, xable_usage);
+   ACTION_ENABLE, base_options, xable_usage, ctx);
int enabled = do_xaction_namespace(namespace, ACTION_ENABLE, ctx);
 
if (enabled < 0) {
@@ -942,7 +963,7 @@ int cmd_create_namespace(int argc, const char **argv, void *ctx)
 {
char *xable_usage = "ndctl create-namespace []";
const char *namespace =