Re: [driver-core PATCH v6 3/9] device core: Consolidate locking and unlocking of parent and device

2018-11-08 Thread jane . chu
Hi, Alex, On 11/08/2018 10:06 AM, Alexander Duyck wrote: +/* + * __device_driver_lock - release locks needed to manipulate dev->drv You meant to say __device_driver_unlock, right? + * @dev: Device we will update driver info for + * @parent: Parent device. Needed if the bus requires parent l

[PATCH 2/3] dax: introduce dax clear poison to page aligned dax pwrite operation

2021-09-14 Thread Jane Chu
Currently, when pwrite(2) is issued to a dax range that contains poison, the pwrite(2) fails with EIO. Well, if the hardware backend of the dax device is capable of clearing poison, try that and resume the write. Signed-off-by: Jane Chu --- fs/dax.c | 9 + 1 file changed, 9 insertions

[PATCH 1/3] dax: introduce dax_operation dax_clear_poison

2021-09-14 Thread Jane Chu
locks. Signed-off-by: Jane Chu --- drivers/dax/super.c | 13 + include/linux/dax.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 44736cbd446e..935d496fa7db 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -373,6 +3

[PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation

2021-09-14 Thread Jane Chu
When pwrite(2) encounters poison in a dax range, it fails with EIO. But if the backend hardware of the dax device is capable of clearing poison, try that and resume the write. Signed-off-by: Jane Chu --- fs/dax.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/dax.c b/fs/dax.c

[PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-14 Thread Jane Chu
le to, first, speed up repairing by means of it; second, maintain backend continuity instead of fragmenting it in search for clean blocks. Jane Chu (3): dax: introduce dax_operation dax_clear_poison dax: introduce dax_clear_poison to dax pwrite operation libnvdimm/pmem: Provide pmem_dax_clear_p

[PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation

2021-09-14 Thread Jane Chu
Provide pmem_dax_clear_poison() to struct dax_operations.clear_poison. Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 1e0615b8565e..307a53aa3432 100644 --- a/drivers

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-15 Thread Jane Chu
Hi, Dan, On 9/14/2021 9:44 PM, Dan Williams wrote: On Tue, Sep 14, 2021 at 4:32 PM Jane Chu wrote: If pwrite(2) encounters poison in a pmem range, it fails with EIO. This is unnecessary if hardware is capable of clearing the poison. Though not all dax backend hardware has the capability of

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-23 Thread Jane Chu
On 9/15/2021 1:27 PM, Dan Williams wrote: I'm also thinking about the MOVDIR64B instruction and how it might be used to clear poison on the fly with a single 'store'. Of course, that means we need to figure out how to narrow down the error blast radius first. It turns out the MOVDIR64B error cl

Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-23 Thread Jane Chu
On 9/15/2021 9:15 AM, Darrick J. Wong wrote: On Wed, Sep 15, 2021 at 12:22:05AM -0700, Jane Chu wrote: Hi, Dan, On 9/14/2021 9:44 PM, Dan Williams wrote: On Tue, Sep 14, 2021 at 4:32 PM Jane Chu wrote: If pwrite(2) encounters poison in a pmem range, it fails with EIO. This is unnecessary if

Re: [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation

2021-11-04 Thread Jane Chu
On 11/4/2021 10:55 AM, Christoph Hellwig wrote: > On Tue, Sep 14, 2021 at 05:31:32PM -0600, Jane Chu wrote: >> +static int pmem_dax_clear_poison(struct dax_device *dax_dev, pgoff_t pgoff, >> +size_t nr_pages) >> +{ >> +unsigned

Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On baremetal Intel platform with DCPMEM installed and configured to provision daxfs, say a poison was consumed by a load from a user thread, and then daxfs takes action and clears the poison, confirmed by "ndctl -NM". Now, depends on the luck, after some time (from a few seconds to 5+ hours) the

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On 1/21/2022 4:31 PM, Jane Chu wrote: > On baremetal Intel platform with DCPMEM installed and configured to > provision daxfs, say a poison was consumed by a load from a user thread, > and then daxfs takes action and clears the poison, confirmed by "ndctl > -NM". > >

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
On 1/21/2022 5:27 PM, Luck, Tony wrote: > On Sat, Jan 22, 2022 at 12:40:18AM +0000, Jane Chu wrote: >> On 1/21/2022 4:31 PM, Jane Chu wrote: >>> On baremetal Intel platform with DCPMEM installed and configured to >>> provision daxfs, say a poison was consumed by a load

Re: Phantom PMEM poison issue

2022-01-21 Thread Jane Chu
ger.kernel.org > Subject: Re: Phantom PMEM poison issue > > On Sat, Jan 22, 2022 at 12:40:18AM +0000, Jane Chu wrote: >> On 1/21/2022 4:31 PM, Jane Chu wrote: >>> On baremetal Intel platform with DCPMEM installed and configured to >>> provision daxfs, say a pois

Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-04-05 Thread Jane Chu
On 3/30/2022 9:18 AM, Darrick J. Wong wrote: > On Wed, Mar 30, 2022 at 08:49:29AM -0700, Christoph Hellwig wrote: >> On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote: >>> As the code I pasted before, pmem driver will subtract its ->data_offset, >>> which is byte-based. And the filesyste

[PATCH] pmem: fix a name collision

2022-06-30 Thread Jane Chu
set) 49 { 50 return pmem->phys_addr + offset; 51 } 52 Fixes: 9409c9b6709e (pmem: refactor pmem_clear_poison()) Reported-by: kernel test robot Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/dr

Re: [PATCH] pmem: fix a name collision

2022-06-30 Thread Jane Chu
On 6/30/2022 11:04 AM, Christoph Hellwig wrote: > On Thu, Jun 30, 2022 at 11:51:55AM -0600, Jane Chu wrote: >> -static phys_addr_t to_phys(struct pmem_device *pmem, phys_addr_t offset) >> +static phys_addr_t _to_phys(struct pmem_device *pmem, phys_addr_t offset) > >

[PATCH v2] pmem: fix a name collision

2022-06-30 Thread Jane Chu
set) 49 { 50 return pmem->phys_addr + offset; 51 } 52 Fixes: 9409c9b6709e (pmem: refactor pmem_clear_poison()) Reported-by: kernel test robot Signed-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --g

Re: [PATCH v2] pmem: fix a name collision

2022-06-30 Thread Jane Chu
On 6/30/2022 11:29 AM, Christoph Hellwig wrote: > Looks good: > > Reviewed-by: Christoph Hellwig Thank you! -jane

[PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-11 Thread Jane Chu
a. it happens to be the badblock granularity, b. ndctl inject-error cannot inject more than one poison to a 512-byte block, c. architecture agnostic Fixes: 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison granularity") Signed-off-by: Jane Chu --- drivers/acpi/nfit/mce.c

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-13 Thread Jane Chu
On 7/12/2022 5:48 PM, Dan Williams wrote: > Jane Chu wrote: >> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison >> granularity") changed nfit_handle_mce() callback to report badrange for >> each poison at an alignment indicated by 1

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-14 Thread Jane Chu
On 7/13/2022 5:24 PM, Dan Williams wrote: > Jane Chu wrote: >> On 7/12/2022 5:48 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison >>>> granularity") changed nfit_handle_mc

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/14/2022 6:19 PM, Dan Williams wrote: > Jane Chu wrote: >> I meant to say there would be 8 calls to the nfit_handle_mce() callback, >> one call for each poison with accurate address. >> >> Also, short ARS would find 2 poisons. >> >> I attached the console

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/14/2022 5:58 PM, Dan Williams wrote: [..] >>> > However, the ARS engine likely can return the precise error ranges so I > think the fix is to just use the address range indicated by 1UL << > MCI_MISC_ADDR_LSB(mce->misc) to filter the results from a short ARS > scrub request to
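
The granularity arithmetic this thread keeps returning to, 1UL << MCI_MISC_ADDR_LSB(mce->misc), can be sketched in userspace as below. This is only an illustration, not the nfit code: the struct, the helper name poison_range() and the MISC_ADDR_LSB macro are made up for the example, and the macro assumes the LSB field is the low 6 bits of the MISC value.

```c
#include <stdint.h>

/* Assumption: the address-LSB field occupies the low 6 bits of MCi_MISC. */
#define MISC_ADDR_LSB(misc) ((unsigned)((misc) & 0x3f))

/* Hypothetical container for a reported bad range. */
struct bad_range {
    uint64_t start; /* poison address aligned down to the granule */
    uint64_t len;   /* granule size in bytes */
};

/* Hypothetical helper: derive the range covered by one poison report. */
static struct bad_range poison_range(uint64_t addr, uint64_t misc)
{
    uint64_t granule = 1ULL << MISC_ADDR_LSB(misc);
    struct bad_range r = { addr & ~(granule - 1), granule };
    return r;
}
```

With an LSB of 12 the helper reports a whole 4096-byte range, which spans eight 512-byte blocks even if only one of them is poisoned; with an LSB of 6 it reports 64 bytes.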

Re: [PATCH] acpi/nfit: badrange report spill over to clean range

2022-07-15 Thread Jane Chu
On 7/15/2022 12:17 PM, Dan Williams wrote: > [ add Tony ] > > Jane Chu wrote: >> On 7/14/2022 6:19 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> I meant to say there would be 8 calls to the nfit_handle_mce() callback, >>>> one call for each poi

[PATCH v2] x86/mce: retrieve poison range from hardware whenever supported

2022-07-15 Thread Jane Chu
n hardware whenever support is available. v1: https://lkml.org/lkml/2022/7/15/1040 Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 717192915f28..a4d5

Re: [PATCH v2] x86/mce: retrieve poison range from hardware whenever supported

2022-07-16 Thread Jane Chu
On 7/15/2022 9:50 PM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(

[PATCH v3] x86/mce: retrieve poison range from hardware

2022-07-17 Thread Jane Chu
n hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/ap

Re: [PATCH v3] x86/mce: retrieve poison range from hardware

2022-07-18 Thread Jane Chu
On 7/18/2022 12:22 PM, Luck, Tony wrote: >> It appears the kernel is trusting that ->physical_addr_mask is non-zero >> in other paths. So this is at least equally broken in the presence of a >> broken BIOS. The impact is potentially larger though with this change, >> so it might be a good follow-on

[PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
n hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/ke

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 11:56 AM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 12:24 PM, Jane Chu wrote: > On 7/27/2022 11:56 AM, Dan Williams wrote: >> Jane Chu wrote: >>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>> poison granularity") that changed nfit_handle_mce() callback

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-27 Thread Jane Chu
On 7/27/2022 12:30 PM, Jane Chu wrote: > On 7/27/2022 12:24 PM, Jane Chu wrote: >> On 7/27/2022 11:56 AM, Dan Williams wrote: >>> Jane Chu wrote: >>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>>> poison granularity"

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-28 Thread Jane Chu
On 7/27/2022 1:01 PM, Dan Williams wrote: > Jane Chu wrote: >> On 7/27/2022 12:30 PM, Jane Chu wrote: >>> On 7/27/2022 12:24 PM, Jane Chu wrote: >>>> On 7/27/2022 11:56 AM, Dan Williams wrote: >>>>> Jane Chu wrote: >>>>>> With

Re: [PATCH v4] x86/mce: retrieve poison range from hardware

2022-07-28 Thread Jane Chu
On 7/28/2022 11:46 AM, Dan Williams wrote: > Jane Chu wrote: >> On 7/27/2022 1:01 PM, Dan Williams wrote: >>> Jane Chu wrote: >>>> On 7/27/2022 12:30 PM, Jane Chu wrote: >>>>> On 7/27/2022 12:24 PM, Jane Chu wrote: >>>>>> On

[PATCH v5] x86/mce: retrieve poison range from hardware

2022-07-29 Thread Jane Chu
n hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/ke

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 8:58 AM, Luck, Tony wrote: >> struct mce m; >> +int lsb = PAGE_SHIFT; > > Some maintainers like to order local declaration lines from longest to > shortest > >> + /* >> + * Even if the ->validation_bits are set for address mask, >> + * to be extra safe, check
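
The review above is cut off, so the following is a userspace sketch of the general idea only and not the patch itself: start from a page-sized default and narrow it when firmware supplies a usable physical address mask. The function name, the mask_valid flag and PAGE_SHIFT_ASSUMED (12) are assumptions for illustration; __builtin_ctzll is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT_ASSUMED 12

/*
 * Hypothetical helper: pick the poison granularity (as a bit position).
 * Falls back to the page-sized default when the mask is not marked valid
 * or is zero, for example because of broken firmware.
 */
static int poison_lsb(uint64_t physical_addr_mask, int mask_valid)
{
    int lsb = PAGE_SHIFT_ASSUMED;

    if (mask_valid && physical_addr_mask != 0) {
        /* Position of the lowest set bit in the mask. */
        int first = __builtin_ctzll(physical_addr_mask);

        if (first < PAGE_SHIFT_ASSUMED)
            lsb = first;
    }
    return lsb;
}
```

Checking the mask for zero in addition to the validity flag reflects the "to be extra safe" comment quoted in the review.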

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 9:44 AM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(

Re: [PATCH v5] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
On 8/1/2022 2:20 PM, Dan Williams wrote: > Jane Chu wrote: >> On 8/1/2022 9:44 AM, Dan Williams wrote: >>> Jane Chu wrote: >>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>>> poison granularity") that changed nfit_

[PATCH v6] x86/mce: retrieve poison range from hardware

2022-08-01 Thread Jane Chu
n hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/ke

Re: [PATCH v6] x86/mce: retrieve poison range from hardware

2022-08-02 Thread Jane Chu
On 8/2/2022 3:59 AM, Ingo Molnar wrote: > > * Jane Chu wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL <

[PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-02 Thread Jane Chu
n hardware whenever support is available. Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Reviewed-by: Ingo Molnar Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-)

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-08 Thread Jane Chu
On 8/3/2022 1:53 AM, Ingo Molnar wrote: > > * Jane Chu wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > > s/Commit/commit Maintainers, Would you prefer a v8, or take care the comment upon accepting the patch? > >

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-23 Thread Jane Chu
> APEI error granularities are managed. So I think it is appropriate for > this to go through the x86 tree via the typical path for mce related > topics. + Huang, Ying. x86 maintainers, Please let me know if you need another revision. thanks, -jane On 8/8/2022 4:30 PM, Dan Willia

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-25 Thread Jane Chu
On 8/23/2022 9:51 AM, Borislav Petkov wrote: > On Tue, Aug 02, 2022 at 01:50:53PM -0600, Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >

Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-26 Thread Jane Chu
On 8/26/2022 11:09 AM, Borislav Petkov wrote: > On Fri, Aug 26, 2022 at 10:54:31AM -0700, Dan Williams wrote: >> How about: >> >> --- >> >> When memory poison consumption machine checks fire, >> mce-notifier-handlers like nfit_handle_mce() record the impacted >> physical address range. > > ... whi

[PATCH v8] x86/mce: retrieve poison range from hardware

2022-08-26 Thread Jane Chu
e.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com Reviewed-by: Dan Williams Reviewed-by: Ingo Molnar Signed-off-by: Jane Chu --- arch/x86/kernel/cpu/mce/apei.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x8

[PATCH] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
igned-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 2 +- fs/dax.c | 14 -- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index ceea55f621cc..46e094e56159 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/p

Re: [PATCH] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
On 4/6/2023 12:32 PM, Matthew Wilcox wrote: On Thu, Apr 06, 2023 at 11:55:56AM -0600, Jane Chu wrote: static vm_fault_t dax_fault_return(int error) { if (error == 0) return VM_FAULT_NOPAGE; - return vmf_error(error); + else if (error == -ENOMEM
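
The quoted diff is truncated, so here is a hedged userspace sketch of the mapping the series is after: a poison error from the backend becomes a distinct fault result instead of a generic SIGBUS result. The enum values are stand-ins and not the kernel's VM_FAULT_* definitions, and fault_from_errno() is a hypothetical name.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value, defined here only if the libc lacks it */
#endif

/* Stand-in result codes for illustration only. */
enum fault_code {
    FAULT_NOPAGE,   /* fault handled, nothing more to do */
    FAULT_OOM,      /* out of memory */
    FAULT_SIGBUS,   /* generic failure */
    FAULT_HWPOISON  /* hardware poison hit */
};

/* Hypothetical helper: translate a negative errno into a fault result. */
static enum fault_code fault_from_errno(int error)
{
    if (error == 0)
        return FAULT_NOPAGE;
    if (error == -ENOMEM)
        return FAULT_OOM;
    if (error == -EHWPOISON)
        return FAULT_HWPOISON;
    return FAULT_SIGBUS;
}
```

Keeping poison as its own result is what would let the signal delivered to userspace carry poison-specific information rather than a plain address error.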

[PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-06 Thread Jane Chu
igned-off-by: Jane Chu --- drivers/nvdimm/pmem.c | 2 +- fs/dax.c | 2 +- include/linux/mm.h| 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index ceea55f621cc..46e094e56159 100644 --- a/drivers/nvdimm/pmem.c

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-18 Thread Jane Chu
Ping, any comment? thanks, -jane On 4/6/2023 4:01 PM, Jane Chu wrote: When dax fault handler fails to provision the fault page due to hwpoison, it returns VM_FAULT_SIGBUS which leads to a SIGBUS delivered to userspace with .si_code BUS_ADRERR. Channel dax backend driver's detection on hwp

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-27 Thread Jane Chu
Hi, Dan, On 4/27/2023 2:36 PM, Dan Williams wrote: Jane Chu wrote: When dax fault handler fails to provision the fault page due to hwpoison, it returns VM_FAULT_SIGBUS which leads to a SIGBUS delivered to userspace with .si_code BUS_ADRERR. Channel dax backend driver's detection on hwpois

Re: [PATCH v2] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-04-27 Thread Jane Chu
On 4/27/2023 4:48 PM, Matthew Wilcox wrote: On Thu, Apr 27, 2023 at 04:36:58PM -0700, Jane Chu wrote: This change results in EHWPOISON leaking to userspace in the case of read(2), that's not a return code that block I/O applications have ever had to contend with before. Just as badblocks

[PATCH v3] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-04 Thread Jane Chu
s poison detection to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. Change from v2: Convert -EHWPOISON to -EIO to prevent EHWPOISON errno from leaking out to block read(2) - suggested by Matthew. Signed-off-by: Jane Chu --- drivers/nvdim

Re: [PATCH v3] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
On 5/4/2023 7:32 PM, Dan Williams wrote: Jane Chu wrote: When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for the poison scope. Soon after,

[PATCH v4 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
Change from v3: Prevent leaking EHWPOISON to user level block IO calls such as zero_range_range, and truncate. Suggested by Dan. Change from v2: Convert EHWPOISON to EIO to prevent EHWPOISON errno from leaking out to block read(2). Suggested by Matthew. Jane Chu (1): dax: enable dax fault

[PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-08 Thread Jane Chu
s poison detection to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency. Signed-off-by: Jane Chu --- drivers/dax/super.c

Re: [PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-05-30 Thread Jane Chu
Ping... Is there any further concern? -jane On 5/8/2023 10:47 PM, Jane Chu wrote: When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for

Re: [PATCH v4 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-14 Thread Jane Chu
On 6/8/2023 8:16 PM, Dan Williams wrote: [..] +static inline int dax_mem2blk_err(int err) +{ + return (err == -EHWPOISON) ? -EIO : err; +} I think it is worth a comment on this function to indicate where this helper is *not* used. I.e. it's easy to grep for where the error code is conv
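
The helper quoted above is small enough to try out in userspace. The copy below keeps the body from the quoted diff; the EHWPOISON fallback definition and the comment wording are additions for the sake of a self-contained example.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value, defined here only if the libc lacks it */
#endif

/*
 * Convert the memory-style poison error into the error that block I/O
 * callers have always seen. Every other value passes through unchanged.
 */
static inline int dax_mem2blk_err(int err)
{
    return (err == -EHWPOISON) ? -EIO : err;
}
```

Because only -EHWPOISON is rewritten, success (0) and unrelated errors such as -ENOMEM reach the caller untouched.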

[PATCH v5 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-15 Thread Jane Chu
from leaking out to block read(2). Suggested by Matthew. Jane Chu (1): dax: enable dax fault handler to report VM_FAULT_HWPOISON drivers/dax/super.c | 5 - drivers/nvdimm/pmem.c| 2 +- drivers/s390/block/dcssblk.c | 3 ++- fs/dax.c | 11

[PATCH v5 1/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-15 Thread Jane Chu
s poison detection to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON. If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency. Signed-off-by: Jane Chu --- drivers/dax/super.c

Re: [PATCH v5 0/1] dax: enable dax fault handler to report VM_FAULT_HWPOISON

2023-06-26 Thread Jane Chu
On 6/24/2023 11:25 PM, Markus Elfring wrote: Change from v4: … I suggest to omit the cover letter for a single patch. Will any patch series evolve for your proposed changes? No. The thought was to put descriptions unsuitable for commit header in the cover letter. thanks, jane Regards,

Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-25 Thread Jane Chu
On 1/25/2019 10:20 AM, Verma, Vishal L wrote: On Fri, 2019-01-25 at 09:18 -0800, Dan Williams wrote: On Fri, Jan 25, 2019 at 12:20 AM Du, Fan wrote: Dan Thanks for the insights! Can I say, the UCE is delivered from h/w to OS in a single way in case of machine check, only PMEM/DAX stuff fi

Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-25 Thread Jane Chu
On 1/25/2019 11:15 AM, Dan Williams wrote: On Fri, Jan 25, 2019 at 11:10 AM Jane Chu wrote: On 1/25/2019 10:20 AM, Verma, Vishal L wrote: On Fri, 2019-01-25 at 09:18 -0800, Dan Williams wrote: On Fri, Jan 25, 2019 at 12:20 AM Du, Fan wrote: Dan Thanks for the insights! Can I say

Re: [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink

2021-01-08 Thread Jane Chu
Hi, Shiyang, On 12/18/2020 1:13 AM, Ruan Shiyang wrote: So I tried the patchset with pmem error injection, the SIGBUS payload does not look right - ** SIGBUS(7): ** ** si_addr(0x(nil)), si_lsb(0xC), si_code(0x4, BUS_MCEERR_AR) ** I expect the payload looks like ** si_addr(0x7f3672e0), si

[PATCH] mm/memory-failure: unecessary amount of unmapping

2021-04-19 Thread Jane Chu
It appears that unmap_mapping_range() actually takes a 'size' as its third argument rather than a location, the current calling fashion causes unnecessary amount of unmapping to occur. Fixes: 6100e34b2526e ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
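
The size-versus-location mix-up described above is easy to quantify with a toy model. The function below is a hypothetical stand-in for any (start, length) style interface and is not the kernel function; it simply returns where the affected range ends.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for an interface that takes a start offset and a
 * length. Returns the first byte past the affected range.
 */
static uint64_t unmap_end(uint64_t start, uint64_t len)
{
    return start + len;
}
```

Calling unmap_end(start, size) ends the range at start + size as intended, while passing an end location as the length, unmap_end(start, start + size), ends it at 2 * start + size. The excess is exactly start bytes, so the further into the mapping the poison sits, the more gets unmapped needlessly.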

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-15 Thread Jane Chu
On 12/15/2020 3:58 AM, Ruan Shiyang wrote: Hi Jane On 2020/12/15 上午4:58, Jane Chu wrote: Hi, Shiyang, On 11/22/2020 4:41 PM, Shiyang Ruan wrote: This patchset is a try to resolve the problem of tracking shared page for fsdax. Change from v1:    - Introduce ->block_lost() for block dev

Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range()

2020-12-15 Thread Jane Chu
On 12/15/2020 4:14 AM, Shiyang Ruan wrote: #ifdef CONFIG_SYSFS +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, + size_t len, void *data); int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); void bd_unlink_disk

Re: [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink

2020-12-16 Thread Jane Chu
Hi, Shiyang, On 12/15/2020 4:14 AM, Shiyang Ruan wrote: The call trace is like this: memory_failure() pgmap->ops->memory_failure() => pmem_pgmap_memory_failure() gendisk->fops->corrupted_range() => - pmem_corrupted_range() - md_blk_corrupted_range

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-14 Thread Jane Chu
Hi, Shiyang, On 11/22/2020 4:41 PM, Shiyang Ruan wrote: This patchset is a try to resolve the problem of tracking shared page for fsdax. Change from v1: - Introduce ->block_lost() for block device - Support mapped device - Add 'not available' warning for realtime device in XFS - Reb

kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-20 Thread jane . chu
Hi, Apology if there is a better channel reporting the issue, if so, please let me know. I just saw below regression in 5.3-rc5 kernel, but not in 5.2-rc7 or earlier kernels. [ 3533.659787] mce: Uncorrected hardware memory error in user-access at 383e202000 [ 3533.659903] Memory failure: 0

Re: kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-21 Thread jane . chu
Hi, Dan, On 8/20/19 8:48 PM, Dan Williams wrote: On Tue, Aug 20, 2019 at 6:39 PM wrote: Hi, Apology if there is a better channel reporting the issue, if so, please let me know. I just saw below regression in 5.3-rc5 kernel, but not in 5.2-rc7 or earlier kernels. Is the error stable enough

Re: kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-21 Thread jane . chu
Hi, Bruce, Dan, This patch took care the panic issue. thanks, -jane On 8/21/19 7:12 AM, J. Bruce Fields wrote: Probably just needs the following. I've been slow to get some bugfixes upstream, sorry--I'll go send a pull request now --b. commit 78e70e780b28 Author: He Zhe Date: Tue Aug

Re: [PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-10-08 Thread Jane Chu
Hi, Naoya, What is the status of the patches? Is there anything I need to do from my end ? Regards, -jane On 8/6/2019 10:25 AM, Jane Chu wrote: Change in v4: - remove trailing white space Changes in v3: - move **tk cleanup to its own patch Changes in v2: - move 'tk' a

Re: [PATCH] mm: hwpoison: use do_send_sig_info() instead of force_sig() (Re: PMEM error-handling forces SIGKILL causes kernel panic)

2019-01-16 Thread Jane Chu
On 1/16/2019 3:32 PM, Naoya Horiguchi wrote: Hi Jane, On Wed, Jan 16, 2019 at 09:56:02AM -0800, Jane Chu wrote: Hi, Naoya, On 1/16/2019 1:30 AM, Naoya Horiguchi wrote: diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7c72f2a95785..831be5ff5f4d 100644 --- a/mm

Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-24 Thread Jane Chu
Hi, Dave, While chatting with my colleague Erwin about the patchset, it occurred that we're not clear about the error handling part. Specifically, 1. If an uncorrectable error is detected during a 'load' in the hot plugged pmem region, how will the error be handled? will it be handled like PM

Re: [PATCH] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
Thanks Vishal and Naoya! -jane On 5/20/2019 3:21 AM, Naoya Horiguchi wrote: On Fri, May 17, 2019 at 10:18:02AM +0530, Anshuman Khandual wrote: On 05/17/2019 09:38 AM, Jane Chu wrote: Some user who install SIGBUS handler that does longjmp out What the longjmp about ? Are you referring to

[PATCH v2] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
y. Signed-off-by: Jane Chu --- mm/memory-failure.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index fc8b517..c4f4bcd 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -216,7 +216,7 @@ static int kill_proc(struct t

Re: [PATCH] mm, memory-failure: clarify error message

2019-05-20 Thread Jane Chu
On 5/16/2019 9:48 PM, Anshuman Khandual wrote: On 05/17/2019 09:38 AM, Jane Chu wrote: Some user who install SIGBUS handler that does longjmp out What the longjmp about ? Are you referring to the mechanism of catching the signal which was registered ? Yes. thanks, -jane

Re: [PATCH v2 0/6] mm/devm_memremap_pages: Fix page release race

2019-05-14 Thread Jane Chu
On 5/14/2019 12:04 PM, Dan Williams wrote: On Tue, May 14, 2019 at 11:53 AM Jane Chu wrote: On 5/13/2019 12:22 PM, Logan Gunthorpe wrote: On 2019-05-08 11:05 a.m., Logan Gunthorpe wrote: On 2019-05-07 5:55 p.m., Dan Williams wrote: Changes since v1 [1]: - Fix a NULL-pointer deref crash in

Re: [PATCH] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread jane . chu
Thank you all for your comments! I've incorporated them, tested, and have a v2 ready for review. Thanks! -jane On 7/23/19 11:48 PM, Naoya Horiguchi wrote: Hi Jane, Dan, On Tue, Jul 23, 2019 at 06:34:35PM -0700, Dan Williams wrote: On Tue, Jul 23, 2019 at 4:49 PM Jane Chu wrote: Mmap

[PATCH v2 1/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread Jane Chu
21 Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Ho

[PATCH v2 0/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-24 Thread Jane Chu
oaya's suggestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - added Suggested-by: Naoya Horiguchi Jane Chu (1): mm/memory-failure: Poison read receives SIGKI

Re: [PATCH v2 1/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-24 Thread Jane Chu
On 7/24/2019 4:43 PM, Naoya Horiguchi wrote: On Wed, Jul 24, 2019 at 04:33:23PM -0600, Jane Chu wrote: Mmap /dev/dax more than once, then read the poison location using address from one of the mappings. The other mappings due to not having the page mapped in will cause SIGKILLs delivered to

Re: [PATCH v2 0/1] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-24 Thread Jane Chu
On 7/24/2019 3:52 PM, Dan Williams wrote: On Wed, Jul 24, 2019 at 3:35 PM Jane Chu wrote: Changes in v2: - move 'tk' allocations internal to add_to_kill(), suggested by Dan; Oh, sorry if it wasn't clear, this should move to its own patch that only does the cleanup, and the

[PATCH v3 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-07-25 Thread Jane Chu
add_to_kill() expects the first 'tk' to be pre-allocated, it makes subsequent allocations on need basis, this makes the code a bit difficult to read. Move all the allocation internal to add_to_kill() and drop the **tk argument. Signed-off-by: Jane Chu --- mm/memory-fail
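
The shape of this cleanup, moving allocation inside the function that links the entry, can be shown with a userspace analogue. Everything below is hypothetical (the struct layout, the _sketch names and the demo helper) and only mirrors the calling convention described above, not the kernel code.

```c
#include <stdlib.h>

/* Hypothetical list node standing in for the kernel's per-task entry. */
struct to_kill {
    struct to_kill *next;
    int pid;
};

/*
 * Allocate and link the entry internally, so callers no longer hand in a
 * pre-allocated node. Returns 0 on success, -1 on allocation failure.
 */
static int add_to_kill_sketch(struct to_kill **head, int pid)
{
    struct to_kill *tk = malloc(sizeof(*tk));

    if (!tk)
        return -1;
    tk->pid = pid;
    tk->next = *head;
    *head = tk;
    return 0;
}

/* Free the whole list and return how many entries it held. */
static int free_kill_list(struct to_kill **head)
{
    int n = 0;

    while (*head) {
        struct to_kill *tk = *head;

        *head = tk->next;
        free(tk);
        n++;
    }
    return n;
}

/* Build a list of n entries, then report how many were freed. */
static int demo_kill_list(int n)
{
    struct to_kill *head = NULL;

    for (int i = 0; i < n; i++)
        if (add_to_kill_sketch(&head, 1000 + i) != 0)
            break;
    return free_kill_list(&head);
}
```

With the allocation kept in one place, the caller has a single failure path to handle and no spare node to track or free.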

[PATCH v3 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-07-25 Thread Jane Chu
T", since the code returns early. Incorporated Naoya's suggestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - added Suggested-by: Naoya Horiguchi Jane

[PATCH v3 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-25 Thread Jane Chu
21 Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Ho

[PATCH v4 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-08-06 Thread Jane Chu
add_to_kill() expects the first 'tk' to be pre-allocated, it makes subsequent allocations on need basis, this makes the code a bit difficult to read. Move all the allocation internal to add_to_kill() and drop the **tk argument. Signed-off-by: Jane Chu --- mm/memory-fail

[PATCH v4 0/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS issue

2019-08-06 Thread Jane Chu
he SIGKILL if "tk->addr == -EFAULT", since the code returns early. Incorporated Naoya's suggestion, also, skip VMAs where "tk->size_shift == 0" for zone device page, and deliver SIGBUS when "tk->size_shift != 0" so the payload is helpful; - ad

Re: [PATCH v3 1/2] mm/memory-failure.c clean up around tk pre-allocation

2019-08-06 Thread Jane Chu
Hi, Naoya, Thanks a lot! v4 on the way. :) -jane On 8/1/2019 2:06 AM, Naoya Horiguchi wrote: On Thu, Jul 25, 2019 at 04:01:40PM -0600, Jane Chu wrote: add_to_kill() expects the first 'tk' to be pre-allocated, it makes subsequent allocations on need basis, this makes the code a bit

[PATCH v4 2/2] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-08-06 Thread Jane Chu
21 Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu Suggested-by: Naoya Ho

[PATCH] mm/memory-failure: Poison read receives SIGKILL instead of SIGBUS if mmaped more than once

2019-07-23 Thread Jane Chu
21 Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Signed-off-by: Jane Chu --- mm/memory-failure.c |

Re: [PATCH 0/6] libnvdimm: Fix async operations and locking

2019-06-18 Thread Jane Chu
.h| 71 ++ drivers/nvdimm/pfn_devs.c | 24 +++--- drivers/nvdimm/pmem.c |4 + drivers/nvdimm/region.c | 24 +++--- drivers/nvdimm/region_devs.c| 12 ++- include/linux/device.h |6 ++ 14 files changed, 308 insertions(+),

Re: [PATCH 1/2] libnvdimm/security: 'security' attr never show 'overwrite' state

2020-08-03 Thread Jane Chu
Hi, Dave, On 8/3/2020 1:41 PM, Dave Jiang wrote: On 7/24/2020 9:09 AM, Jane Chu wrote: Since commit d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute"), when issue # ndctl sanitize-dimm nmem0 --overwrite then immediately check the 'security' at

[PATCH v2 1/3] libnvdimm/security: fix a typo

2020-08-03 Thread Jane Chu
2e82 ("libnvdimm/security: Introduce a 'frozen' attribute") Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/security.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c index 4cef69b..8

[PATCH v2 3/3] libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr

2020-08-03 Thread Jane Chu
097c546 ("acpi/nfit, libnvdimm/security: Add security DSM overwrite support") Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/security.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/securi

[PATCH v2 2/3] libnvdimm/security: the 'security' attr never show 'overwrite' state

2020-08-03 Thread Jane Chu
bit. Hence security_show() should check the 'overwrite' bit first, in order to indicate the actual state when multiple bits are set in the flags. Signed-off-by: Jane Chu Reviewed-by: Dave Jiang --- drivers/nvdimm/dimm_devs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff

[PATCH 1/2] libnvdimm/security: 'security' attr never show 'overwrite' state

2020-07-23 Thread Jane Chu
it also has a typo: in one occasion, 'nvdimm->sec.ext_state' assignment is replaced with 'nvdimm->sec.flags' assignment for the NVDIMM_MASTER type. Cc: Dan Williams Fixes: d78c620a2e82 ("libnvdimm/security: Introduce a 'frozen' attribute") Signed-off-by
