[ANNOUNCE] ndctl v62

2018-08-13 Thread Verma, Vishal L
https://github.com/pmem/ndctl/releases/tag/v62

This release incorporates functionality up to the 4.19 kernel, and a
number of bug fixes and improvements.

Highlights include addition of the 'ndctl monitor' command to monitor
for SMART health events, use of the new max_available_extent sysfs
attribute for namespace creation, verbosity levels for ndctl-list, a
udev rule for enabling the LSS latch when supported, a bypass route
for making the unsafe shutdown count available for non-privileged users,
improvements to ndctl-inject-smart that include an 'uninject' option
for all fields, and a new unit test, a number of static analysis fixes,
and unit test improvements and fixes.

Shortlog:
Dan Williams (2):
  ndctl, test: check availability of MAP_SYNC for poison test
  ndctl: Remove dependency on linker garbage collection

Keith Busch (5):
  ndctl: Use max_available_extent for namespace
  ndctl: Create ndctl udev rules for dirty shutdown
  ndctl, intel: Fallback to smart cached shutdown_count
  ndctl: Add 'list' verbose options
  ndctl: Work around kernel memory corruption

Masayoshi Mizuma (3):
  ndctl, test: remove the firmware image file before the test end
  ndctl, documentation: Clarify the dimm id for ndctl list d option
  ndctl, test: add a new unit test for max_available_extent namespace

Maxwell William (1):
  util/strbuf.h: include sys/types.h for ssize_t definition.

QI Fuli (14):
  ndctl, monitor: add a new command - monitor
  ndctl, monitor: add main ndctl monitor configuration file
  ndctl, monitor: add the unit file of systemd for ndctl-monitor service
  ndctl, documentation: add man page for monitor
  ndctl, test: add a new unit test for monitor
  ndctl, monitor: fix the lack of detection of invalid dimm-events
  ndctl, inject-smart: add an interface to inject ctrl-temperature
  ndctl, monitor: Fix duplicate prefix in monitor.log
  ndctl, monitor: add [--verbose] option to emit extra debug messages
  ndctl, list: add alarm_enable_ to list
  ndctl, monitor: fix the lack of detection of invalid path of log file
  ndctl, monitor: set default log destination to syslog if "--daemon" is 
specified
  ndctl, monitor: add timestamp and pid to log messages in log_file()
  ndctl, monitor: add [Install] Section to systemd unit file of 
ndctl-monitor

Ross Zwisler (2):
  ndctl: simplify JSON print flag handling
  ndctl list: always output array without --human

Vishal Verma (33):
  Documentation: add a newline in namespace Theory of Operations
  ndctl: Add CONTRIBUTING.md
  ndctl, test: add start/wait scrub to injection tests
  libndctl: fix the uninject-error API actually injecting errors
  ndctl, test: Fix dax.sh return code
  ndctl, test: fix timeouts in device-dax
  contrib/do_abidiff: make the build more robust
  ndctl: add an API to check support for smart injection
  ndctl, test: fix tests for the array vs object listing fix
  ndctl, test: convert remaining tests to use test/common
  ndctl, bash-completion: add completion for ndctl-monitor
  ndctl, monitor: Add a config-file section to the man page
  ndctl, monitor: fix memory leak in read_config_file
  ndctl, monitor: Fix memory leak in monitor_event
  ndctl, monitor: improve error reporting throughout monitor.c
  Documentation, create-namespace: clarify fsdax wording
  ndctl, monitor: fix a resource leak in parse_monitor_event
  ndctl, documentation: document the label-version option for init-labels
  ndctl: deprecate undocumented short-options
  ndctl, inject-smart: Fix man page to match the current behavior
  ndctl inject-smart: add an option to uninject smart fields
  ndctl, test/monitor: fix inject-smart field in test_filter_dimmevent
  ndctl, inject-smart: continue in spite of errors for uninject-all
  ndctl, tests: add a new unit test for inject-smart
  ndctl, inject: fix a resource leak in ndctl_namespace_get_clear_unit
  ndctl: fix a resource leak in submit_get_firmware_info
  libndctl: fix a resource leak in ndctl_dimm_get_{{event_}flags, health}
  ndctl, test: fix a potential null pointer dereference in 'ndctl test'
  ndctl, test: fix a resource leak in check_smart_threshold
  ndctl, prepare-release.sh: fix revision update checks
  ndctl: fix potential null dereference in the smart error handler
  ndctl, udev: fix a resource leak in save_unsafe_shutdown_count
  ndctl: release v62
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH] ndctl: Work around kernel memory corruption

2018-08-13 Thread Dave Jiang



On 08/13/2018 01:31 PM, Keith Busch wrote:
> Kernel commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden
> ars_status output length handling") contained an incorrect ars status
> output size calculation and may overrun the buffer provided by 4
> bytes. This patch adds 4 bytes to the buffer the user space allocates
> so that the kernel's overrun doesn't corrupt the application's heap.
> 
> See kernel patch for more details:
> 
>   https://patchwork.kernel.org/patch/10563103/
> 
> Signed-off-by: Keith Busch 

Reviewed-by: Dave Jiang 

> ---
>  ndctl/lib/ars.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/ndctl/lib/ars.c b/ndctl/lib/ars.c
> index c78e3bf..bd75131 100644
> --- a/ndctl/lib/ars.c
> +++ b/ndctl/lib/ars.c
> @@ -133,7 +133,16 @@ NDCTL_EXPORT struct ndctl_cmd 
> *ndctl_bus_cmd_new_ars_status(struct ndctl_cmd *ar
>   }
>  
>   size = sizeof(*cmd) + ars_cap_cmd->max_ars_out;
> - cmd = calloc(1, size);
> +
> + /*
> +  * Older kernels have a bug that miscalculates the output length of the
> +  * ars status and will overrun the provided buffer by 4 bytes,
> +  * corrupting the memory. Add an additional 4 bytes in the allocation
> +  * size to prevent that corruption. See kernel patch for more details:
> +  *
> +  *   https://patchwork.kernel.org/patch/10563103/
> +  */
> + cmd = calloc(1, size + 4);
>   if (!cmd)
>   return NULL;
>  
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[ndctl PATCH] ndctl: Work around kernel memory corruption

2018-08-13 Thread Keith Busch
Kernel commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden
ars_status output length handling") contained an incorrect ars status
output size calculation and may overrun the buffer provided by 4
bytes. This patch adds 4 bytes to the buffer the user space allocates
so that the kernel's overrun doesn't corrupt the application's heap.

See kernel patch for more details:

  https://patchwork.kernel.org/patch/10563103/

Signed-off-by: Keith Busch 
---
 ndctl/lib/ars.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/ndctl/lib/ars.c b/ndctl/lib/ars.c
index c78e3bf..bd75131 100644
--- a/ndctl/lib/ars.c
+++ b/ndctl/lib/ars.c
@@ -133,7 +133,16 @@ NDCTL_EXPORT struct ndctl_cmd 
*ndctl_bus_cmd_new_ars_status(struct ndctl_cmd *ar
}
 
size = sizeof(*cmd) + ars_cap_cmd->max_ars_out;
-   cmd = calloc(1, size);
+
+   /*
+* Older kernels have a bug that miscalculates the output length of the
+* ars status and will overrun the provided buffer by 4 bytes,
+* corrupting the memory. Add an additional 4 bytes in the allocation
+* size to prevent that corruption. See kernel patch for more details:
+*
+*   https://patchwork.kernel.org/patch/10563103/
+*/
+   cmd = calloc(1, size + 4);
if (!cmd)
return NULL;
 
-- 
2.14.4

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 3/3] powerpc/mce: Handle memcpy_mcsafe

2018-08-13 Thread Reza Arbab

On Thu, Apr 05, 2018 at 05:15:00PM +1000, Balbir Singh wrote:

Add a blocking notifier callback to be called in real-mode
on machine check exceptions for UE (ld/st) errors only.


It's been a while, but is this patchset still being pursued?

This patch in particular (callbacks for MCE handling) has other device 
memory use cases and I'd like to move it along.



The patch registers a callback on boot to be notified
of machine check exceptions and returns a NOTIFY_STOP when
a page of interest is seen as the source of the machine
check exception. This page of interest is a ZONE_DEVICE
page and hence for now, for memcpy_mcsafe to work, the page
needs to belong to ZONE_DEVICE and memcpy_mcsafe should be
used to access the memory.

The patch also modifies the NIP of the exception context
to go back to the fixup handler (in memcpy_mcsafe) and does
not print any error message as the error is treated as
returned via a return value and handled.

Signed-off-by: Balbir Singh 
---
arch/powerpc/include/asm/mce.h |  3 +-
arch/powerpc/kernel/mce.c  | 77 --
2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 3a1226e9b465..a76638e3e47e 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -125,7 +125,8 @@ struct machine_check_event {
enum MCE_UeErrorType ue_error_type:8;
uint8_t effective_address_provided;
uint8_t physical_address_provided;
-   uint8_t reserved_1[5];
+   uint8_t error_return;
+   uint8_t reserved_1[4];
uint64_teffective_address;
uint64_tphysical_address;
uint8_t reserved_2[8];
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..b9e4881fa8c5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -28,7 +28,9 @@
#include 
#include 
#include 
+#include 

+#include 
#include 
#include 

@@ -54,6 +56,52 @@ static struct irq_work mce_event_process_work = {

DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);

+static BLOCKING_NOTIFIER_HEAD(mce_notifier_list);
+
+int register_mce_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_register(_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_mce_notifier);
+
+int unregister_mce_notifier(struct notifier_block *nb)
+{
+   return blocking_notifier_chain_unregister(_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_mce_notifier);
+
+
+static int check_memcpy_mcsafe(struct notifier_block *nb,
+   unsigned long val, void *data)
+{
+   /*
+* val contains the physical_address of the bad address
+*/
+   unsigned long pfn = val >> PAGE_SHIFT;
+   struct page *page = realmode_pfn_to_page(pfn);
+   int rc = NOTIFY_DONE;
+
+   if (!page)
+   goto out;
+
+   if (is_zone_device_page(page))  /* for HMM and PMEM */
+   rc = NOTIFY_STOP;
+out:
+   return rc;
+}
+
+struct notifier_block memcpy_mcsafe_nb = {
+   .priority = 0,
+   .notifier_call = check_memcpy_mcsafe,
+};
+
+int  mce_mcsafe_register(void)
+{
+   register_mce_notifier(_mcsafe_nb);
+   return 0;
+}
+arch_initcall(mce_mcsafe_register);
+
static void mce_set_error_info(struct machine_check_event *mce,
   struct mce_error_info *mce_err)
{
@@ -151,9 +199,31 @@ void save_mce_event(struct pt_regs *regs, long handled,
mce->u.ue_error.effective_address_provided = true;
mce->u.ue_error.effective_address = addr;
if (phys_addr != ULONG_MAX) {
+   int rc;
+   const struct exception_table_entry *entry;
+
+   /*
+* Once we have the physical address, we check to
+* see if the current nip has a fixup entry.
+* Having a fixup entry plus the notifier stating
+* that it can handle the exception is an indication
+* that we should return to the fixup entry and
+* return an error from there
+*/
mce->u.ue_error.physical_address_provided = true;
mce->u.ue_error.physical_address = phys_addr;
-   machine_check_ue_event(mce);
+
+   rc = blocking_notifier_call_chain(_notifier_list,
+   phys_addr, NULL);


Could we pass mce entirely here instead of just phys_addr? It would 
allow the callback itself to set error_return if needed.



+   if (rc & NOTIFY_STOP_MASK) {
+

Re: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory

2018-08-13 Thread Jerome Glisse
On Tue, Aug 14, 2018 at 01:41:40AM +0800, Zhang,Yi wrote:
> 
> 
> On 2018年08月09日 17:23, Pankaj Gupta wrote:
> >> DAX driver hotplug the device memory and move it to memory zone, these
> >> pages will be marked reserved flag, however, some other kernel componet
> >> will misconceive these pages are reserved mmio (ex: we map these dev_dax
> >> or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
> >> MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
> >> is DAX device memory or not.
> >>
> >> Signed-off-by: Zhang Yi 
> >> Signed-off-by: Zhang Yu 
> >> ---
> >>  include/linux/mm.h | 12 
> >>  1 file changed, 12 insertions(+)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 68a5121..de5cbc3 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct
> >> page *page)
> >>page->pgmap->type == MEMORY_DEVICE_PUBLIC;
> >>  }
> >>  
> >> +static inline bool is_dax_page(const struct page *page)
> >> +{
> >> +  return is_zone_device_page(page) &&
> >> +  (page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
> >> +  page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
> >> +}
> > I think question from Dan for KVM VM with 'MEMORY_DEVICE_PUBLIC' still 
> > holds?
> > I am also interested to know if there is any use-case.
> >
> > Thanks,
> > Pankaj
> Yes, it is, thanks for your remind, Pankaj.
> Adding Jerome for Dan's questions on V1:
> [Dan]:
> 
> Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
> memory to a guest vm?

Yes and no, i am not sure how we are going to do it. But being able to
share GPU among multiple VM is on TODO list and those GPU will have
MEMORY_DEVICE_PUBLIC|PRIVATE depending on the platform. So either we
pass down the real underlying resource to the guest, or we will pass
down a fake one and have guest and host driver talk to each other so
that the host driver can do overall resource management accross multiple
guests.

So i would say that for now you can ignore MEMORY_DEVICE_PUBLIC and when
we get to the KVM guest sharing of those and decide how we want to do
it then we can update kvm to properly interpret those.

Cheers,
Jérôme
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [ndctl PATCH 1/2] ndctl: fix potential null dereference in the smart error handler

2018-08-13 Thread Keith Busch
On Fri, Aug 10, 2018 at 06:40:52PM -0600, Vishal Verma wrote:
> Static analysis reports that can potentially dereference a NULL pointer
> in the smart cmd error handler. This can particular instance won't ever
> be hit in practice as the handler is only registered for smart commands,
> and smart commands are currently only DIMM commands, and will always
> have a dimm object. However for completeness, and to avoid future
> errors, we should perform a NULL check in the handler anyway.

Hmm, I purposefully didn't have the NULL check because the dimm is never
not set in this path. Looks like a false positive, but the NULL check is
harmless.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional

2018-08-13 Thread Theodore Y. Ts'o
On Mon, Aug 13, 2018 at 12:12:52PM +0200, Jan Kara wrote:
> > The generic/081 regression appears to be a device-mapper issue...
> 
> I'll see if this reproduces for me. Doesn't seem to be related to the DAX
> patches you caary though.

It does seem to be a DAX-specific failure though.

> > The generic/344 failure seems to be caused by a WARNING triggered in
> > the nvdimm code:
> 
> OK, apparently this is nothing new for you as generic/344 fails for you
> even with 3.17. But it should not :). I'll try to see if I can reproduce
> this in my test setup during more test runs (I don't remember seeing it
> during occasional runs I do) and debug it further.

Thanks!

In case it wasn't clear, I wasn't planning on letting these failures
prevent the patches from going upstream.  As you say, the generic/081
failure looks unrelated to ext4, and the generic/344 isn't a
regression.

- Ted
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional

2018-08-13 Thread Jan Kara
On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> > I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> > CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> > concerning.  So assuming nothing surprising pops up, yes it should be
> > merged at the next merge window.
> 
> ... and here are the results.  The first is 4.17, and the second is
> the ext4 git tree:
> 
> ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
>   Failures: ext4/033 generic/344 generic/491 generic/503
> 
> ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
>   Failures: generic/081 generic/344 generic/388
> 
> The generic/388 failure is a known flake (shutdown stress test).
> 
> The generic/081 regression appears to be a device-mapper issue:
> 
> generic/081   [22:06:33][   15.079661] run fstests generic/081 at 
> 2018-08-10 22:06:33
> [   15.795745] device-mapper: ioctl: can't change device type (old=4 vs 
> new=1) after initial table load.
> [failed, exit status 1] [22:06:36]- output mismatch (see 
> /results/ext4/results-dax/generic/081.out.bad)
> --- tests/generic/081.out 2018-08-09 18:00:42.0 -0400
> +++ /results/ext4/results-dax/generic/081.out.bad 2018-08-10 
> 22:06:36.440005460 -0400
> @@ -1,2 +1,4 @@
>  QA output created by 081
>  Silence is golden
> +Failed to create snapshot
> +(see /results/ext4/results-dax/generic/081.full for details)
> ...
> (Run 'diff -u tests/generic/081.out 
> /results/ext4/results-dax/generic/081.out.bad'  to see the entire diff)

I'll see if this reproduces for me. Doesn't seem to be related to the DAX
patches you caary though.

> The generic/344 failure seems to be caused by a WARNING triggered in
> the nvdimm code:

OK, apparently this is nothing new for you as generic/344 fails for you
even with 3.17. But it should not :). I'll try to see if I can reproduce
this in my test setup during more test runs (I don't remember seeing it
during occasional runs I do) and debug it further.

Honza

> generic/344   [22:06:36][   18.126280] run fstests generic/344 at 
> 2018-08-10 22:06:36
> [   18.303113] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at 
> your own risk
> [   18.456988] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at 
> your own risk
> [   97.375912] WARNING: CPU: 2 PID: 1712 at 
> /usr/projects/linux/ext4/mm/memory.c:1801 insert_pfn+0x15a/0x170
> [   97.377261] CPU: 2 PID: 1712 Comm: holetest Not tainted 
> 4.18.0-rc4-xfstests-00039-g863c37fcb14f #497
> [   97.378486] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.11.1-1 04/01/2014
> [   97.379516] RIP: 0010:insert_pfn+0x15a/0x170
> [   97.380064] Code: 19 1b 01 eb dd 48 85 d2 74 07 48 23 1d 3f 19 1b 01 48 09 
> df 48 89 f8 0f 1f 40 00 48 b9 00 02 00 00 00 00 00 04 48 09 c1 eb c8 <0f> 0b 
> e9 5e ff ff ff bb f4 ff ff ff e9 5e ff ff ff e8 80 7b ec ff 
> [   97.382469] RSP: :acfd0457fc60 EFLAGS: 00010206
> [   97.383123] RAX: 00179e3b RBX: fff0 RCX: 
> 0002
> [   97.384062] RDX: 000f RSI: 8f5c28f5c28f5c29 RDI: 
> 800179e3b225
> [   97.385134] RBP: 923761654558 R08: 0001 R09: 
> 0001
> [   97.386213] R10: 92376f415cc0 R11: 0001 R12: 
> 92377e880cc0
> [   97.387264] R13: 7fab98aab000 R14: 003e860d R15: 
> 92376156
> [   97.388327] FS:  7fab9049c700() GS:92377f20() 
> knlGS:
> [   97.389514] CS:  0010 DS:  ES:  CR0: 80050033
> [   97.390367] CR2: 7fab98aabc00 CR3: 0006a165a004 CR4: 
> 00360ee0
> [   97.391432] Call Trace:
> [   97.391798]  __vm_insert_mixed+0x7e/0xc0
> [   97.392376]  vmf_insert_mixed_mkwrite+0xf/0x30
> [   97.393048]  dax_iomap_pte_fault+0xb8b/0xe40
> [   97.393691]  ext4_dax_huge_fault+0x145/0x200
> [   97.394268]  do_wp_page+0x175/0x5b0
> [   97.394710]  __handle_mm_fault+0x587/0xbb0
> [   97.395228]  __do_page_fault+0x20c/0x490
> [   97.395729]  ? async_page_fault+0x8/0x30
> [   97.396251]  async_page_fault+0x1e/0x30
> [   97.396719] RIP: 0033:0x5598144275ea
> [   97.397187] Code: 0f 85 8a 00 00 00 31 d2 48 85 db 4b 8d 04 34 7e 1f 0f 1f 
> 80 00 00 00 00 48 89 d1 48 83 c2 01 48 0f af 0d 71 1b 20 00 48 39 d3 <48> 89 
> 2c 08 75 e8 8b 0d 36 1b 20 00 31 c0 85 c9 74 0a 8b 15 2e 1b 
> [   97.399752] RSP: 002b:7fab9049bf20 EFLAGS: 00010212
> [   97.400541] RAX: 7fab90c9ec00 RBX: 0001 RCX: 
> 07e0d000
> [   97.401603] RDX: 7e0e RSI:  RDI: 
> 0001
> [   97.402673] RBP: 7fab9049c700 R08: 7fab9049c700 R09: 
> 7fab9049c700
> [   97.403755] R10: 7fab9049c9d0 R11: 0202 R12: 
> 7fab90c9e000
> [   97.404851] R13: 7ffc4608c9e0 R14: 0c00 R15: 
> 55981608e250
> 

Re: [PATCH V3 3/4] mm: add a function to differentiate the pages is from DAX device memory

2018-08-13 Thread Zhang,Yi


On 2018年08月09日 17:23, Pankaj Gupta wrote:
>> DAX driver hotplug the device memory and move it to memory zone, these
>> pages will be marked reserved flag, however, some other kernel componet
>> will misconceive these pages are reserved mmio (ex: we map these dev_dax
>> or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
>> MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
>> is DAX device memory or not.
>>
>> Signed-off-by: Zhang Yi 
>> Signed-off-by: Zhang Yu 
>> ---
>>  include/linux/mm.h | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 68a5121..de5cbc3 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct
>> page *page)
>>  page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>>  }
>>  
>> +static inline bool is_dax_page(const struct page *page)
>> +{
>> +return is_zone_device_page(page) &&
>> +(page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
>> +page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
>> +}
> I think question from Dan for KVM VM with 'MEMORY_DEVICE_PUBLIC' still holds?
> I am also interested to know if there is any use-case.
>
> Thanks,
> Pankaj
Yes, it is, thanks for your remind, Pankaj.
Adding Jerome for Dan's questions on V1:
[Dan]:

Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
memory to a guest vm?

>
>> +
>>  #else /* CONFIG_DEV_PAGEMAP_OPS */
>>  static inline void dev_pagemap_get_ops(void)
>>  {
>> @@ -912,6 +919,11 @@ static inline bool is_device_public_page(const struct
>> page *page)
>>  {
>>  return false;
>>  }
>> +
>> +static inline bool is_dax_page(const struct page *page)
>> +{
>> +return false;
>> +}
>>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
>>  
>>  static inline void get_page(struct page *page)
>> --
>> 2.7.4
>>
>>

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio

2018-08-13 Thread Zhang,Yi


On 2018年08月09日 17:02, Jan Kara wrote:
> On Thu 09-08-18 18:52:48, Zhang Yi wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
>> V1:
>> https://lkml.org/lkml/2018/7/4/91
>>
>> V2:
>> https://lkml.org/lkml/2018/7/10/135
>>
>> V3:
>> [PATCH V3 1/4] Needs Comments.
>> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
>> [PATCH V3 3/4] Acked-by: Jan in V2
> Hum, but it is not the the patch...
>
>   Honza
Sorry, I missed that, will add in the next version, thanks for your review

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH V3 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio

2018-08-13 Thread Zhang,Yi


On 2018年08月10日 21:27, David Hildenbrand wrote:
> On 09.08.2018 12:52, Zhang Yi wrote:
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved. the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to indentify these pages are from NVDIMM pmem. and let kvm treat these
>> as normal pages.
>>
>> Without this patch, Many operations will be missed due to this
>> mistreatment to pmem pages. For example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
> I am right now looking into (and trying to better document) PG_reserved
> - and having a hard time :) .
>
> One of the main points about reserved pages is that the struct pages are
> not to be touched. See [1] (I know that statement is fairly old, but it
> resembles what PG_reserved is actually used for nowadays - with some
> exceptions unfortunately.).
>
> Struct pages part of user space tables that are PG_reserved can indicate
> (as of now according to my research)
> - MMIO pages
> - Selected MMAPed pages - e.g. vDSO
> - Zero page
> - PMEM pages as you correctly state
>
> So I wonder, if it is really the right approach to silently go ahead and
> treat reserved pages just like they would not be reserved. Maybe the
> right approach would rather be to do something about pmem pages being
> reserved. Yes, they are never to be given to the page allocator, but I
> wonder if PG_reserved is strictly needed for that.
>
> [1] https://lists.linuxcoding.com/kernel/2005-q3/msg10350.html

Thanks David list the long history of Page reserved, By now, I think we treat 
nvdimm as a device not a DRAM, also has it's device driver which manager its 
own device memory. From this perspective, it is reasonable to set these pages 
as zone device memory and mark reserved flag.
@Dan @Dave, how do you think about this?

>
>> V1:
>> https://lkml.org/lkml/2018/7/4/91
>>
>> V2:
>> https://lkml.org/lkml/2018/7/10/135
>>
>> V3:
>> [PATCH V3 1/4] Needs Comments.
>> [PATCH V3 2/4] Update the description of MEMORY_DEVICE_DEV_DAX: Jan
>> [PATCH V3 3/4] Acked-by: Jan in V2
>> [PATCH V3 4/4] Needs Comments.
>>
>> Zhang Yi (4):
>>   kvm: remove redundant reserved page check
>>   mm: introduce memory type MEMORY_DEVICE_DEV_DAX
>>   mm: add a function to differentiate the pages is from DAX device
>> memory
>>   kvm: add a check if pfn is from NVDIMM pmem.
>>
>>  drivers/dax/pmem.c   |  1 +
>>  include/linux/memremap.h |  8 
>>  include/linux/mm.h   | 12 
>>  virt/kvm/kvm_main.c  | 16 
>>  4 files changed, 29 insertions(+), 8 deletions(-)
>>
>

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm