Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()

2021-03-06 Thread Dan Williams
On Mon, Feb 8, 2021 at 2:55 AM Shiyang Ruan  wrote:
>
> When a memory failure occurs, we call this function, which is implemented
> by each kind of device.  For the fsdax case, the pmem device driver
> implements it.  The pmem device driver finds the block device where the
> error page is located and tries to get the filesystem on that block
> device.  Finally, it calls the filesystem handler to deal with the error.
> The filesystem will try to recover the corrupted data if possible.
>
> Signed-off-by: Shiyang Ruan 
> ---
>  include/linux/memremap.h | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 79c49e7f5c30..0bcf2b1e20bd 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -87,6 +87,14 @@ struct dev_pagemap_ops {
>  * the page back to a CPU accessible page.
>  */
> vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
> +
> +   /*
> +* Handle a memory failure that occurs on one page.  Notify the
> +* processes who are using this page, and try to recover the data on
> +* this page if necessary.
> +*/
> +   int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
> + int flags);
>  };
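
For context, here is a minimal sketch of how a pgmap owner such as the pmem
driver might wire up the proposed hook, assuming the callback signature in
the hunk above and the pgmap/phys_addr fields of the mainline struct
pmem_device; pmem_pagemap_memory_failure() and pmem_notify_filesystem() are
illustrative names, not code from this series:

static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
				       unsigned long pfn, int flags)
{
	/*
	 * The pmem driver embeds its dev_pagemap in struct pmem_device,
	 * so the owner can be recovered with container_of().
	 */
	struct pmem_device *pmem =
		container_of(pgmap, struct pmem_device, pgmap);

	/*
	 * Translate the pfn to a device-relative offset and hand the
	 * poisoned range to whatever sits on top of the block device
	 * (hypothetical helper).
	 */
	return pmem_notify_filesystem(pmem, PFN_PHYS(pfn) - pmem->phys_addr,
				      flags);
}

static const struct dev_pagemap_ops pmem_pagemap_ops = {
	.memory_failure = pmem_pagemap_memory_failure,
};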

After the conversation with Dave I don't see the point of this. If
there is a memory failure on a page, why not just call
memory_failure()? That already knows how to find the inode, and the
filesystem can be notified from there.

Although memory_failure() is inefficient for large range failures, I'm
not seeing a better option, so I'm going to test calling
memory_failure() over a large range whenever an in-use dax-device is
hot-removed.
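
A rough sketch of that experiment, assuming the exported
memory_failure(pfn, flags) entry point; dax_device_report_failure() and the
range parameters are placeholders, since the real code would derive the
span from the device's dev_pagemap:

static void dax_device_report_failure(unsigned long start_pfn,
				      unsigned long nr_pages)
{
	unsigned long pfn;

	/*
	 * Report every pfn backed by the departing device; this is the
	 * "inefficient for large range failures" path mentioned above.
	 */
	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
		memory_failure(pfn, 0);
}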


Re: [block] 52f019d43c: ndctl.test-libndctl.fail

2021-03-06 Thread Williams, Dan J
On Fri, 2021-03-05 at 08:42 +0100, Christoph Hellwig wrote:
> Dan,
> 
> can you make any sense of this report?
[..]
> > check_set_config_data: dimm: 0 read2 data miscompare: 0
> > check_set_config_data: dimm: 0x1 read2 data miscompare: 0
> > check_set_config_data: dimm: 0x100 read2 data miscompare: 0
> > check_set_config_data: dimm: 0x101 read2 data miscompare: 0
> > check_dax_autodetect: dax_ndns: 0x558a74d92f00 ndns: 0x558a74d92f00
> > check_dax_autodetect: dax_ndns: 0x558a74d91f40 ndns: 0x558a74d91f40
> > check_pfn_autodetect: pfn_ndns: 0x558a74d91f40 ndns: 0x558a74d91f40
> > check_pfn_autodetect: pfn_ndns: 0x558a74d8c5e0 ndns: 0x558a74d8c5e0
> > check_btt_autodetect: btt_ndns: 0x558a74d8c5e0 ndns: 0x558a74d8c5e0
> > check_btt_autodetect: btt_ndns: 0x558a74da1390 ndns: 0x558a74da1390
> > check_btt_autodetect: btt_ndns: 0x558a74d8c5e0 ndns: 0x558a74d8c5e0
> > check_btt_autodetect: btt_ndns: 0x558a74d91f40 ndns: 0x558a74d91f40
> > namespace7.0: failed to write /dev/pmem7
> > check_namespaces: namespace7.0 validate_bdev failed
> > ndctl-test1 failed: -6
> > libkmod: ERROR ../libkmod/libkmod-module.c:793 kmod_module_remove_module: could not remove 'nfit_test': Resource temporarily unavailable
> > test-libndctl: FAIL

Yes, it looks like my unit test checks for exactly the behavior you
changed. It was convenient to test that the device could be switched
back to rw via BLKROSET, but I don't require that. The new behavior of
letting disk->ro take precedence makes more sense to me, so I'll update
the test accordingly.

I.e. I don't think regressing a unit test counts as a userspace
regression.
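
For reference, a hedged illustration of the kind of check involved, using
the standard BLKROSET/BLKROGET ioctls; /dev/pmem7 and the assertions are
placeholders, not the actual ndctl test code:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>

int main(void)
{
	int ro = 0, state = -1;
	int fd = open("/dev/pmem7", O_RDONLY);

	if (fd < 0)
		return 1;

	/* Ask the kernel to mark the device read-write ... */
	if (ioctl(fd, BLKROSET, &ro) < 0)
		perror("BLKROSET");

	/*
	 * ... then read the state back.  With disk->ro taking precedence,
	 * the readback, not the BLKROSET round-trip, is what the updated
	 * test should assert on.
	 */
	if (ioctl(fd, BLKROGET, &state) == 0)
		printf("read-only: %d\n", state);

	close(fd);
	return 0;
}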