[PATCH v4 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline

2022-10-31 Thread Tony Luck
Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.

Also provide a stub version for CONFIG_MEMORY_FAILURE=n

Reviewed-by: Miaohe Lin 
Tested-by: Shuai Xue 
Signed-off-by: Tony Luck 
Message-Id: <20221021200120.175753-3-tony.l...@intel.com>
Signed-off-by: Tony Luck 
---
 include/linux/mm.h | 5 ++++-
 mm/memory.c        | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..03ced659eb58 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3268,7 +3268,6 @@ enum mf_flags {
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
-extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
@@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
+extern void memory_failure_queue(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 #else
+static inline void memory_failure_queue(unsigned long pfn, int flags)
+{
+}
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
 	return 0;
diff --git a/mm/memory.c b/mm/memory.c
index b6056eef2f72..eae242351726 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma))
+		if (copy_mc_user_highpage(dst, src, addr, vma)) {
+			memory_failure_queue(page_to_pfn(src), 0);
 			return -EHWPOISON;
+		}
 		return 0;
 	}
 
-- 
2.37.3
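
The deferral pattern this patch relies on (the fault path only records the
failing page frame number, and a later context that holds no locks does the
real work) can be sketched in user space as below. This is an illustration
under stated assumptions, not the kernel implementation: pfn_queue,
queue_pfn(), drain_pfns() and record_pfn() are hypothetical names, and the
handler callback merely stands in for memory_failure().

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 16	/* hypothetical capacity; must be a power of two */

struct pfn_entry {
	unsigned long pfn;
	int flags;
};

struct pfn_queue {
	struct pfn_entry slots[QUEUE_SIZE];
	size_t head;	/* count of entries ever queued */
	size_t tail;	/* count of entries ever drained */
};

static unsigned long last_handled_pfn;

/* Example handler: just remembers the last pfn it was given. */
static void record_pfn(unsigned long pfn, int flags)
{
	(void)flags;
	last_handled_pfn = pfn;
}

/* "Fault path" side: only records the request, never blocks. */
static int queue_pfn(struct pfn_queue *q, unsigned long pfn, int flags)
{
	if (q->head - q->tail == QUEUE_SIZE)
		return -1;	/* queue full, request dropped */
	q->slots[q->head % QUEUE_SIZE] = (struct pfn_entry){ pfn, flags };
	q->head++;
	return 0;
}

/*
 * Worker side: runs later, with no locks held. handle() stands in for
 * memory_failure(). Returns the number of entries processed.
 */
static size_t drain_pfns(struct pfn_queue *q,
			 void (*handle)(unsigned long pfn, int flags))
{
	size_t n = 0;

	assert(handle != NULL);
	while (q->tail != q->head) {
		struct pfn_entry e = q->slots[q->tail % QUEUE_SIZE];

		q->tail++;
		handle(e.pfn, e.flags);
		n++;
	}
	return n;
}
```

Nothing happens to the page at queue time; the handler only runs when
drain_pfns() is called, which mirrors the "important, but not urgent"
wording above.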



[PATCH v4 1/2] mm, hwpoison: Try to recover from copy-on write faults

2022-10-31 Thread Tony Luck
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_mc_user_highpage() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to the caller of wp_page_copy(). This propagates up the call stack. Both
x86 and powerpc have code in their fault handlers to deal with this code
by sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because mmap_lock is held there. Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.
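
The return-value convention described above (the machine-check-safe copy
reports how many bytes it could not copy, and the caller turns any non-zero
result into -EHWPOISON) can be sketched in user space as follows. This is
only a simulation: sim_copy_mc(), sim_wp_page_copy(), SIM_PAGE_SIZE and
NO_POISON are hypothetical names, and the "poison" is an offset passed in
by the caller rather than a real hardware error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* value used by Linux; fallback for other libcs */
#endif

#define SIM_PAGE_SIZE 4096	/* hypothetical page size for the sketch */
#define NO_POISON ((size_t)-1)

/*
 * Stand-in for copy_mc_to_kernel(): copies up to len bytes and returns
 * the number of bytes NOT copied. poison_off simulates the offset of an
 * uncorrectable error in src; NO_POISON means the source is clean.
 */
static size_t sim_copy_mc(void *dst, const void *src, size_t len,
			  size_t poison_off)
{
	size_t good = (poison_off < len) ? poison_off : len;

	memcpy(dst, src, good);
	return len - good;
}

/* Mirrors the shape of the likely(src) branch in __wp_page_copy_user(). */
static int sim_wp_page_copy(void *dst, const void *src, size_t poison_off)
{
	assert(dst != NULL && src != NULL);
	if (sim_copy_mc(dst, src, SIM_PAGE_SIZE, poison_off))
		return -EHWPOISON;
	return 0;
}
```

A clean source gives 0; a simulated error at any offset inside the page
gives -EHWPOISON, which the fault path would then translate into
VM_FAULT_HWPOISON.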

Reviewed-by: Dan Williams 
Reviewed-by: Miaohe Lin 
Reviewed-by: Naoya Horiguchi 
Tested-by: Shuai Xue 
Signed-off-by: Tony Luck 
Message-Id: <20221021200120.175753-2-tony.l...@intel.com>
Signed-off-by: Tony Luck 
---
 include/linux/highmem.h | 26 ++++++++++++++++++++++++++
 mm/memory.c             | 30 ++++++++++++++++++++----------
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index e9912da5441b..44242268f53b 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -319,6 +319,32 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+#ifdef copy_mc_to_kernel
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	unsigned long ret;
+	char *vfrom, *vto;
+
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
+	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+	if (!ret)
+		kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
+
+	return ret;
+}
+#else
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	copy_user_highpage(to, from, vaddr, vma);
+	return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_COPY_HIGHPAGE
 
 static inline void copy_highpage(struct page *to, struct page *from)
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..b6056eef2f72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,10 +2848,16 @@ static inline int pte_unmap_same(struct vm_fault *vmf)
 	return same;
 }
 
-static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
-				       struct vm_fault *vmf)
+/*
+ * Return:
+ *	0:		copy succeeded
+ *	-EHWPOISON:	copy failed due to hwpoison in source page
+ *	-EAGAIN:	copy failed (some other reason)
+ */
+static inline int __wp_page_copy_user(struct page *dst, struct page *src,
+				      struct vm_fault *vmf)
 {
-	bool ret;
+	int ret;
 	void *kaddr;
 	void __user *uaddr;
 	bool locked = false;
@@ -2860,8 +2866,9 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		copy_user_highpage(dst, src, addr, vma);
-		return true;
+		if (copy_mc_user_highpage(dst, src, addr, vma))
+			return -EHWPOISON;
+		return 0;
 	}
 
 	/*
@@ -2888,7 +2895,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 			 * and update local tlb only
 			 */
 			update_mmu_tlb(vma, addr, vmf->pte);
-			ret = false;
+			ret = -EAGAIN;
 			goto pte_unlock;
 		}
 
@@ -2913,7 +2920,7 @@ static inline bool __wp_page_copy_user(struct page *dst, 
struct page *src,
if (!likely(pte_same(*vmf->pte

[PATCH v4 0/2] Copy-on-write poison recovery

2022-10-31 Thread Tony Luck
Recover from poison consumption while copying pages
in the kernel for a copy-on-write fault.

Changes since v3:

1) Miaohe Lin pointed out that a recent change
by Alexander Potapenko to copy_user_highpage()
added a call to kmsan_unpoison_memory().  The same is needed in my cloned
copy_mc_user_highpage() ... at least in the successful case where the
page was copied with no machine checks.

2) Picked up some additional Reviewed-by and Tested-by tags.

Tony Luck (2):
  mm, hwpoison: Try to recover from copy-on write faults
  mm, hwpoison: When copy-on-write hits poison, take page offline

 include/linux/highmem.h | 26 ++++++++++++++++++++++++++
 include/linux/mm.h      |  5 ++++-
 mm/memory.c             | 32 ++++++++++++++++++++++----------
 3 files changed, 52 insertions(+), 11 deletions(-)


base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
-- 
2.37.3



[PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline

2022-10-21 Thread Tony Luck
Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.

Also provide a stub version for CONFIG_MEMORY_FAILURE=n

Signed-off-by: Tony Luck 
---
 include/linux/mm.h | 5 ++++-
 mm/memory.c        | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..03ced659eb58 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3268,7 +3268,6 @@ enum mf_flags {
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
-extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
@@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
+extern void memory_failure_queue(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 #else
+static inline void memory_failure_queue(unsigned long pfn, int flags)
+{
+}
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
 	return 0;
diff --git a/mm/memory.c b/mm/memory.c
index b6056eef2f72..eae242351726 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma))
+		if (copy_mc_user_highpage(dst, src, addr, vma)) {
+			memory_failure_queue(page_to_pfn(src), 0);
 			return -EHWPOISON;
+		}
 		return 0;
 	}
 
-- 
2.37.3



[PATCH v3 1/2] mm, hwpoison: Try to recover from copy-on write faults

2022-10-21 Thread Tony Luck
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_mc_user_highpage() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to the caller of wp_page_copy(). This propagates up the call stack. Both
x86 and powerpc have code in their fault handlers to deal with this code
by sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because mmap_lock is held there. Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.

Reviewed-by: Dan Williams 
Signed-off-by: Tony Luck 

---
Changes in V3:
Dan Williams
Rename copy_user_highpage_mc() to copy_mc_user_highpage() for
consistency with Linus' discussion on names of functions that
check for machine check.
Write complete functions for the have/have-not copy_mc_to_kernel
cases (so grep shows there are two versions)
Change __wp_page_copy_user() to return 0 for success, negative for fail
[I picked -EAGAIN for both non-EHWPOISON cases]

Changes in V2:
   Naoya Horiguchi:
1) Use -EHWPOISON error code instead of minus one.
2) Poison path needs also to deal with old_page
   Tony Luck:
Rewrote commit message
Added some powerpc folks to Cc: list
---
 include/linux/highmem.h | 24 ++++++++++++++++++++++++
 mm/memory.c             | 30 ++++++++++++++++++++----------
 2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index e9912da5441b..a32c64681f03 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -319,6 +319,30 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+#ifdef copy_mc_to_kernel
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	unsigned long ret;
+	char *vfrom, *vto;
+
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
+	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
+
+	return ret;
+}
+#else
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	copy_user_highpage(to, from, vaddr, vma);
+	return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_COPY_HIGHPAGE
 
 static inline void copy_highpage(struct page *to, struct page *from)
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..b6056eef2f72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,10 +2848,16 @@ static inline int pte_unmap_same(struct vm_fault *vmf)
 	return same;
 }
 
-static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
-				       struct vm_fault *vmf)
+/*
+ * Return:
+ *	0:		copy succeeded
+ *	-EHWPOISON:	copy failed due to hwpoison in source page
+ *	-EAGAIN:	copy failed (some other reason)
+ */
+static inline int __wp_page_copy_user(struct page *dst, struct page *src,
+				      struct vm_fault *vmf)
 {
-	bool ret;
+	int ret;
 	void *kaddr;
 	void __user *uaddr;
 	bool locked = false;
@@ -2860,8 +2866,9 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		copy_user_highpage(dst, src, addr, vma);
-		return true;
+		if (copy_mc_user_highpage(dst, src, addr, vma))
+			return -EHWPOISON;
+		return 0;
 	}
 
 	/*
@@ -2888,7 +2895,7 @@ static inline bool __wp_page_copy_user(struct page *dst, 
struct page *

[PATCH v3 0/2] Copy-on-write poison recovery

2022-10-21 Thread Tony Luck
Part 1 deals with the process that triggered the copy on write
fault with a store to a shared read-only page. That process is
sent a SIGBUS with the usual machine check decoration to specify
the virtual address of the lost page, together with the scope.

Part 2 sets up to asynchronously take the page with the uncorrected
error offline to prevent additional machine check faults. H/t to
Miaohe Lin and Shuai Xue
for pointing me to the existing function to queue a call to
memory_failure().

On x86 there is some duplicate reporting (because the error is
signalled by the memory controller as well as by the core
that triggered the machine check). Console logs look like this:

[ 1647.723403] mce: [Hardware Error]: Machine check events logged
	Machine check from kernel copy routine

[ 1647.723414] MCE: Killing einj_mem_uc:3600 due to hardware memory corruption fault at 7f3309503400
	x86 fault handler sends SIGBUS to child process

[ 1647.735183] Memory failure: 0x905b92d: recovery action for dirty LRU page: Recovered
	Async call to memory_failure() from copy on write path

[ 1647.748397] Memory failure: 0x905b92d: already hardware poisoned
	uc_decode_notifier() processes memory controller report

[ 1647.761313] MCE: Killing einj_mem_uc:3599 due to hardware memory corruption fault at 7f3309503400
	Parent process tries to read poisoned page. Page has been unmapped, so
	#PF handler sends SIGBUS


Tony Luck (2):
  mm, hwpoison: Try to recover from copy-on write faults
  mm, hwpoison: When copy-on-write hits poison, take page offline

 include/linux/highmem.h | 24 ++++++++++++++++++++++++
 include/linux/mm.h      |  5 ++++-
 mm/memory.c             | 32 ++++++++++++++++++++++----------
 3 files changed, 50 insertions(+), 11 deletions(-)

-- 
2.37.3



Re: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

2022-10-20 Thread Tony Luck
On Fri, Oct 21, 2022 at 09:52:01AM +0800, Shuai Xue wrote:
> 
> 
> On 2022/10/21 at 4:05 AM, Tony Luck wrote:
> > On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
> >>
> >>
> >> On 2022/10/20 at 1:08 AM, Tony Luck wrote:

> > I'm experimenting with using schedule_work() to handle the call to
> > memory_failure() (echoing what the machine check handler does using
> > task_work_add() to avoid the same problem of not being able to directly
> > call memory_failure()).
> 
> Work queues permit work to be deferred outside of the interrupt context
> into the kernel process context. If we return to user-space before the
> queued memory_failure() work is processed, we will take the fault again,
> as we discussed recently.
> 
> commit 7f17b4a121d0d ACPI: APEI: Kick the memory_failure() queue for synchronous errors
> commit 415fed694fe11 ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
> 
> So, in my opinion, we should add memory failure as a task work, like
> do_machine_check does, e.g.
> 
> queue_task_work(, msg, kill_me_maybe);

Maybe ... but this case isn't pending back to a user instruction
that is trying to READ the poison memory address. The task is just
trying to WRITE to any address within the page.

So this is much more like a patrol scrub error found asynchronously
by the memory controller (in this case found asynchronously by the
Linux page copy function).  So I don't feel that it's really the
responsibility of the current task.

When we do return to user mode the task is going to be busy servicing
a SIGBUS ... so shouldn't try to touch the poison page before the
memory_failure() called by the worker thread cleans things up.

> > +   INIT_WORK(&p->work, do_sched_memory_failure);
> > +   p->pfn = pfn;
> > +   schedule_work(&p->work);
> > +}
> 
> I think there is already a function to do such work in mm/memory-failure.c.
> 
>   void memory_failure_queue(unsigned long pfn, int flags)

Also pointed out by Miaohe Lin ... this does
exactly what I want, and is working well in tests so far. So perhaps
it is a cleaner solution than making the kill_me_maybe() function globally
visible.

-Tony


Re: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

2022-10-20 Thread Tony Luck
On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
> 
> 
> On 2022/10/20 at 1:08 AM, Tony Luck wrote:
> > If the kernel is copying a page as the result of a copy-on-write
> > fault and runs into an uncorrectable error, Linux will crash because
> > it does not have recovery code for this case where poison is consumed
> > by the kernel.
> > 
> > It is easy to set up a test case. Just inject an error into a private
> > page, fork(2), and have the child process write to the page.
> > 
> > I wrapped that neatly into a test at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
> > 
> > just enable ACPI error injection and run:
> > 
> >   # ./einj_mem-uc -f copy-on-write
> > 
> > Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
> > on architectures where that is available (currently x86 and powerpc).
> > When an error is detected during the page copy, return VM_FAULT_HWPOISON
> > to caller of wp_page_copy(). This propagates up the call stack. Both x86
> > and powerpc have code in their fault handler to deal with this code by
> > sending a SIGBUS to the application.
> 
> Does it send SIGBUS to only child process or both parent and child process?

This only sends a SIGBUS to the process that wrote the page (typically
the child, but also possible that the parent is the one that does the
write that causes the COW).

> > 
> > Note that this patch avoids a system crash and signals the process that
> > triggered the copy-on-write action. It does not take any action for the
> > memory error that is still in the shared page. To handle that a call to
> > memory_failure() is needed. 
> 
> If the error page is not poisoned, should the return value of wp_page_copy
> be VM_FAULT_HWPOISON or VM_FAULT_SIGBUS? When is_hwpoison_entry(entry) or
> PageHWPoison(page) is true, do_swap_page return VM_FAULT_HWPOISON to caller.
> And when is_swapin_error_entry is true, do_swap_page return VM_FAULT_SIGBUS.

The page has uncorrected data in it, but this patch doesn't mark it
as poisoned.  Returning VM_FAULT_SIGBUS would send an "ordinary" SIGBUS
that doesn't include the BUS_MCEERR_AR and "lsb" information. It would
also skip the:

"MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n"

console message. So it might result in confusion and attempts to debug a
s/w problem with the application instead of blaming the death on a bad
DIMM.
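
To make the difference visible from the application side, here is a
user-space sketch of a SIGBUS handler that separates an "ordinary" SIGBUS
from one carrying the machine check decoration. BUS_MCEERR_AR,
BUS_MCEERR_AO and si_addr_lsb are the real siginfo names on Linux; the
enum, classify_sigbus(), poison_extent() and install_sigbus_handler() are
names made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

enum sigbus_kind {
	SIGBUS_ORDINARY,		/* no machine check decoration */
	SIGBUS_MCE_ACTION_REQUIRED,	/* si_code == BUS_MCEERR_AR */
	SIGBUS_MCE_ACTION_OPTIONAL,	/* si_code == BUS_MCEERR_AO */
};

/* Classify a SIGBUS by its si_code. */
static enum sigbus_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return SIGBUS_MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return SIGBUS_MCE_ACTION_OPTIONAL;
	default:
		return SIGBUS_ORDINARY;
	}
}

/* Size in bytes of the region described by si_addr_lsb (4096 for 12). */
static unsigned long long poison_extent(int lsb)
{
	assert(lsb >= 0 && lsb < 64);
	return 1ULL << lsb;
}

static volatile sig_atomic_t last_kind;
static volatile sig_atomic_t last_lsb;
static void *volatile last_addr;

/* Records what the kernel reported; decoding is left to normal context. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_kind = classify_sigbus(info->si_code);
	last_addr = info->si_addr;
	/* si_addr_lsb is only meaningful for the two MCE codes. */
	if (last_kind != SIGBUS_ORDINARY)
		last_lsb = info->si_addr_lsb;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

With VM_FAULT_SIGBUS the handler would see the ordinary case and have
neither the "action required" code nor the lsb to work with, which is the
loss of information described above.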

> > But this cannot be done from wp_page_copy()
> > because it holds mmap_lock(). Perhaps the architecture fault handlers
> > can deal with this loose end in a subsequent patch?

I started looking at this for x86 ... but I have changed my mind
about this being a good place for a fix. When control returns back
to the architecture fault handler it no longer has easy access to
the physical page frame number. It has the virtual address, so it
could descend back into some new mm/memory.c function to get the
physical address ... but that seems silly.

I'm experimenting with using schedule_work() to handle the call to
memory_failure() (echoing what the machine check handler does using
task_work_add() to avoid the same problem of not being able to directly
call memory_failure()).

So far it seems to be working. Patch below (goes on top of original
patch ... well on top of the internal version with mods based on
feedback from Dan Williams ... but should show the general idea)

With this patch applied the page does get unmapped from all users.
Other tasks that shared the page will get a SIGBUS if they attempt
to access it later (from the page fault handler because of
is_hwpoison_entry(), as you mention above).

-Tony

From d3879e83bf91cd6c61e12d32d3e15eb6ef069204 Mon Sep 17 00:00:00 2001
From: Tony Luck 
Date: Thu, 20 Oct 2022 09:57:28 -0700
Subject: [PATCH] mm, hwpoison: Call memory_failure() for source page of COW
 failure

Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use schedule_work() to queue a request to call memory_failure() for the
page with the error.

Signed-off-by: Tony Luck 
---
 mm/memory.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index b6056eef2f72..4a1304cf1f4e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,37 @@ static inline int pte_unmap_same(struct vm_fault *vmf)
 	return same;
 }
 
+#ifdef CONFIG_MEMORY_FAILURE
+struct pfn_work {
+	struct work_struct work;
+	unsigned long pfn;
+};
+
+static void do_sched_memory_failure(struct work_struct *w)
+{
+	struct pfn_work *p = container_of(w, struct pf

[PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

2022-10-19 Thread Tony Luck
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because mmap_lock is held there. Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.

Signed-off-by: Tony Luck 

---
Changes in V2:
   Naoya Horiguchi:
1) Use -EHWPOISON error code instead of minus one.
2) Poison path needs also to deal with old_page
   Tony Luck:
Rewrote commit message
Added some powerpc folks to Cc: list
---
 include/linux/highmem.h | 19 +++++++++++++++++++
 mm/memory.c             | 28 +++++++++++++++++++---------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index e9912da5441b..5967541fbf0e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -319,6 +319,25 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+static inline int copy_user_highpage_mc(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	unsigned long ret = 0;
+#ifdef copy_mc_to_kernel
+	char *vfrom, *vto;
+
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
+	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
+#else
+	copy_user_highpage(to, from, vaddr, vma);
+#endif
+
+	return ret;
+}
+
 #ifndef __HAVE_ARCH_COPY_HIGHPAGE
 
 static inline void copy_highpage(struct page *to, struct page *from)
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..a32556c9b689 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,8 +2848,14 @@ static inline int pte_unmap_same(struct vm_fault *vmf)
 	return same;
 }
 
-static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
-				       struct vm_fault *vmf)
+/*
+ * Return:
+ *	-EHWPOISON:	copy failed due to hwpoison in source page
+ *	0:		copy failed (some other reason)
+ *	1:		copy succeeded
+ */
+static inline int __wp_page_copy_user(struct page *dst, struct page *src,
+				      struct vm_fault *vmf)
 {
 	bool ret;
 	void *kaddr;
@@ -2860,8 +2866,9 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		copy_user_highpage(dst, src, addr, vma);
-		return true;
+		if (copy_user_highpage_mc(dst, src, addr, vma))
+			return -EHWPOISON;
+		return 1;
 	}
 
 	/*
@@ -2888,7 +2895,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 			 * and update local tlb only
 			 */
 			update_mmu_tlb(vma, addr, vmf->pte);
-			ret = false;
+			ret = 0;
 			goto pte_unlock;
 		}
 
@@ -2913,7 +2920,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 			/* The PTE changed under us, update local tlb */
 			update_mmu_tlb(vma, addr, vmf->pte);
-			ret = false;
+			ret = 0;
 			goto pte_unlock;
 		}
 
@@ -2932,7 +2939,7 @@ static inline bool __wp_page_c

Re: [PATCH v3 2/7] uaccess: Tell user_access_begin() if it's for a write or not

2020-01-24 Thread Tony Luck
On Thu, Jan 23, 2020 at 10:03 AM Linus Torvalds
 wrote:
> We used to have a read/write argument to the old "verify_area()" and
> "access_ok()" model, and it was a mistake. It was due to odd i386 user
> access issues. We got rid of it. I'm not convinced this is any better
> - it looks very similar and for odd ppc access issues.

If the mode (read or write) were made visible to the trap handler, I'd
find that useful for machine check recovery.  If I'm in the middle of a
copy_from_user() and I get a machine check reading poison from a
user address ... then I could try to recover in the same way as for the
user accessing the poison (offline the page, SIGBUS the task). But if
the poison is in kernel memory and we are doing a copy_to_user(), then
we are hosed (or would need some more complex recovery plan).

[Note that we only get recoverable machine checks on loads... writes
are posted, so if something goes wrong it isn't synchronous with the store
instruction that initiated it]
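
The distinction drawn above can be written down as a tiny decision
function. This is only a sketch of the reasoning in this mail, not code
from any kernel tree; copy_dir, mc_action and mc_copy_action() are
hypothetical names.

```c
#include <assert.h>

enum copy_dir { COPY_FROM_USER, COPY_TO_USER };
enum mc_action { MC_RECOVER_SIGBUS, MC_FATAL };

/*
 * Decide what a machine check handler could do for poison consumed
 * during a user copy. Recoverable machine checks only happen on loads,
 * so the poisoned address is always the source of the copy.
 */
static enum mc_action mc_copy_action(enum copy_dir dir)
{
	assert(dir == COPY_FROM_USER || dir == COPY_TO_USER);
	/* copy_from_user() loads from user memory: offline page, SIGBUS. */
	if (dir == COPY_FROM_USER)
		return MC_RECOVER_SIGBUS;
	/* copy_to_user() loads from kernel memory: no simple recovery. */
	return MC_FATAL;
}
```

The handler can only make this choice if the direction of the copy is
visible to it, which is why exposing the mode would be useful.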

-Tony


Re: [PATCH v4 2/5] ia64: reuse append_elf_note() and final_note() functions

2017-01-31 Thread Tony Luck
On Wed, Jan 25, 2017 at 11:15 AM, Hari Bathini
 wrote:
> I haven't gotten a success/failure build report from zero-day. Not sure what
> to make of it.

zero-day is generally silent unless it sees a problem. So no news is good news.

> But I did try cross-compiling and it was successful. Should that do?

I guess so. What tree do these apply to?  I tried 4.10-rc5 and "git am"
protested ... but I didn't look closely at why.

-Tony


Re: [PATCH v4 2/5] ia64: reuse append_elf_note() and final_note() functions

2017-01-24 Thread Tony Luck
On Tue, Jan 24, 2017 at 10:11 AM, Hari Bathini
<hbath...@linux.vnet.ibm.com> wrote:

> Hello IA64 folks,
>
> Could you please review this patch..?

It looks OK in principle.  My lab is in partial disarray at the
moment (just got back from a sabbatical) so I can't test
build and boot. Have you cross-compiled it (or gotten a successful
build report from zero-day)?

If you have ... then add an Acked-by: Tony Luck <tony.l...@intel.com>

-Tony


Re: [PATCH v5 2/2] [BUGFIX] kprobes: Fix Failed to find blacklist error on ia64 and ppc64

2014-07-14 Thread Tony Luck
On Tue, Jul 8, 2014 at 5:07 AM, Masami Hiramatsu
masami.hiramatsu...@hitachi.com wrote:
> Ping?
>
> This patch can be applied without 1/2, and will fix ia64/ppc64 problem.

Is somebody going to push this upstream? Another week has gone by,
we are at -rc5, and I'm still seeing the

  Failed to find blacklist a0010133b150

messages on ia64.

-Tony
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFT PATCH -next v3] [BUGFIX] kprobes: Fix Failed to find blacklist error on ia64 and ppc64

2014-06-17 Thread Tony Luck
On Thu, Jun 5, 2014 at 11:38 PM, Masami Hiramatsu
masami.hiramatsu...@hitachi.com wrote:
> Ping?
>
> I guess this should go to 3.16 branch, shouldn't it?
>
> (2014/05/30 12:18), Masami Hiramatsu wrote:
>> On ia64 and ppc64, the function pointer does not point to the
>> entry address of the function, but to the address of the function
>> descriptor (which contains the entry address and misc
>> data). Since kprobes passes the function pointer stored
>> by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
>> initializing its blacklist, it fails and reports many errors
>> as below.
>>
>>   Failed to find blacklist 000101316830

Yes please ... just found this problem on ia64 in mainline
and was happy to see this fix for it.

Tested-by: Tony Luck tony.l...@intel.com

-Tony

Re: [RFC PATCH v2 00/11] Add (de)compression support to pstore

2013-08-19 Thread Tony Luck
On Sat, Aug 17, 2013 at 11:32 AM, Kees Cook keesc...@chromium.org wrote:
> Yeah, this is great. While I haven't tested it myself yet, the code
> seems to be in good shape. I acked the ram piece separately, but
> consider the entire series:
>
> Reviewed-by: Kees Cook keesc...@chromium.org

Applied.  This should show up in linux-next tomorrow.

Anyone using efivars as the pstore backend?  Testing reports (positive
or negative) appreciated.

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-07 Thread Tony Luck
Oh - one more thing - and my apologies for not spotting this before:

dst = allocate_buf_for_compression(big_buf_sz);

No - you may not call kmalloc() in oops/panic context.  Please pre-allocate
everything you need in some initialization code to make sure that we don't
fail in the panic path because we can't get the memory we need.
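
A minimal userspace sketch of the pre-allocation pattern asked for above. All names here (panic_buf_init, panic_buf_get) are hypothetical, and malloc() merely stands in for a kernel allocation done at initialization time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; a userspace model, not pstore code. */
static char *panic_buf;		/* reserved once, at init time */
static size_t panic_buf_sz;

/* Called from normal context during initialization, where an
 * allocation is allowed to sleep or fail gracefully. */
static int panic_buf_init(size_t size)
{
	panic_buf = malloc(size);	/* stands in for a kernel allocation */
	if (!panic_buf)
		return -1;
	panic_buf_sz = size;
	return 0;
}

/* Called in the oops/panic path: never allocates, only hands out the
 * buffer reserved earlier (or NULL if it is missing or too small). */
static char *panic_buf_get(size_t needed)
{
	if (!panic_buf || needed > panic_buf_sz)
		return NULL;
	return panic_buf;
}
```

The point of the split is that the only step that can fail for lack of memory runs long before any panic.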

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-07 Thread Tony Luck
On Tue, Aug 6, 2013 at 10:35 PM, Tony Luck tony.l...@gmail.com wrote:
 ERST is at the whim of the BIOS writer (the ACPI standard doesn't provide any
 suggestions on record sizes).  My systems support ~6K record size.

Off by a little - 7896 bytes on my current machine.

 efivars has, IIRC, a 1k limit coded in the Linux back end.

My memory was correct for this one.

Adding a little tracing to pstore_getrecords() I see this:

pstore: inflated 3880 bytes compressed to 17459 bytes
pstore: inflated 2567 bytes compressed to 17531 bytes
pstore: inflated 4018 bytes compressed to 17488 bytes

Which isn't at all what I expected.  The ERST backend
advertised a bufsize of 7896, and I have the default
kmsg_bytes of 10240.  So on my forced panic the code
decided to create a three part pstore dump.  The sum of
the pieces is close to, but a little over the target of 10K.
But I don't understand why the compressed sizes are so
much smaller than the ERST backend block size.

The uncompressed sizes appear to be close to constant.
The compression ratios vary from 14% to 23%
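
The figures above can be checked with integer arithmetic. A small sketch, with helper names made up for illustration; the inputs are the three pairs from the trace output.

```c
/* Compressed size relative to the inflated size, in tenths of a
 * percent (integer math only). */
static unsigned long ratio_permille(unsigned long compressed,
				    unsigned long inflated)
{
	return compressed * 1000UL / inflated;
}

/* Sum of n sizes, e.g. the compressed parts of one dump. */
static unsigned long total_bytes(const unsigned long *sizes, int n)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += sizes[i];
	return sum;
}
```

With the traced values the ratios come out at 222, 146 and 229 per mille, and the compressed parts add up to 10465 bytes, a little over the 10240-byte kmsg_bytes target.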

Why do we get three small parts instead of two bigger
ones close to the 7896 ERST bufsize?

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-07 Thread Tony Luck
On Wed, Aug 7, 2013 at 9:29 PM, Aruna Balakrishnaiah
ar...@linux.vnet.ibm.com wrote:
 When we preallocate, we can use the same big_buf for compression as well as
 decompression.
 Also the workspace will be one for both, by allocating the max of the inflate
 workspace size and the deflate workspace size. We can save memory here.

Well decompression isn't a problem. We are doing that in the non-panicking
context of the freshly booted kernel so we can allocate memory without any
worries for this.  It's only the compression during panic where we must
pre-allocate.  But if the sizes are close to the same, then we might as well
use the same buffers for both (and simplify the code because we don't have
to worry about the kmalloc/kfree bits).

 If pre-allocating close to 50k of buffer is not an issue, we can go ahead
 with this approach.

I never care about allocations measured in *kilo*bytes[1] - the smallest systems
I use have 32GB - so 50K is so far down in the noise of other allocations.
But other types of systems might be more concerned.  ERST is generally
only implemented on servers ... so the better question might be:
What are the sizes for the EFI backend (where the buffer size is 1024)? It
sounds like it should scale linearly ... so below 8K???  That should not
scare many people. Even phones measure memory in hundreds of MBytes.

-Tony

[1] unless they are per-cpu or per something else that there are a lot of
on a big server - but this is a one-per-system allocation.


Re: [PATCH 00/11] Add compression support to pstore

2013-08-06 Thread Tony Luck
On Mon, Aug 5, 2013 at 2:20 PM, Tony Luck tony.l...@gmail.com wrote:
 Still have problems booting if there are any compressed images in ERST
 to be inflated.

So I took another look at this part of the code ... and saw a couple of issues:

	while ((size = psi->read(&id, &type, &count, &time, &buf, &compressed,
				psi)) > 0) {
		if (compressed && (type == PSTORE_TYPE_DMESG)) {
			big_buf_sz = (psinfo->bufsize * 100) / 45;
			big_buf = allocate_buf_for_decompression(big_buf_sz);

			if (big_buf || stream.workspace)
 Did you mean && here rather than ||?
				unzipped_len = pstore_decompress(buf, big_buf,
							size, big_buf_sz);
 Need an else here to set unzipped_len to -1 (or set it to -1 down
 at the bottom of the loop ready for next time around).

			if (unzipped_len > 0) {
				buf = big_buf;
 This sets us up for problems.  First, you just overwrote the address
 of the buffer that psi->read allocated - so we have a memory leak. But
 worse than that we now double free the same buffer below when we
 kfree(buf) and then kfree(big_buf).
				size = unzipped_len;
				compressed = false;
			} else {
				pr_err("pstore: decompression failed; returned %d\n",
				       unzipped_len);
				compressed = true;
			}
		}
		rc = pstore_mkfile(type, psi->name, id, count, buf,
				  compressed, (size_t)size, time, psi);
		kfree(buf);
		kfree(stream.workspace);
		kfree(big_buf);
		buf = NULL;
		stream.workspace = NULL;
		big_buf = NULL;
		if (rc && (rc != -EEXIST || !quiet))
			failed++;
	}


See attached patch that fixes these - but the code still looks like it
could be cleaned up a bit more.
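
One way to avoid both the leak and the double free is to leave the two owning pointers alone and select the published data through a third pointer. The following is a userspace sketch of that idea, not the attached patch: fake_decompress() and process_record() are hypothetical, and malloc()/free() stand in for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the decompressor: copies src into dst and
 * returns the output length, or -1 if dst is missing or too small. */
static int fake_decompress(const char *src, char *dst, int len, int dst_sz)
{
	if (!dst || len > dst_sz)
		return -1;
	memcpy(dst, src, (size_t)len);
	return len;
}

/* Process one record. 'buf' came from the read step and 'big_buf' is
 * our own scratch buffer. A separate 'data' pointer selects which one
 * is published, so neither owning pointer is overwritten and each
 * buffer is freed exactly once. Returns the published length. */
static int process_record(char *buf, int size, int compressed, int big_buf_sz)
{
	char *big_buf = NULL;
	char *data = buf;
	int unzipped_len = -1;		/* reset for every record */

	if (compressed) {
		big_buf = malloc((size_t)big_buf_sz);
		if (big_buf)
			unzipped_len = fake_decompress(buf, big_buf,
						       size, big_buf_sz);
		if (unzipped_len > 0) {
			data = big_buf;	/* publish the inflated copy */
			size = unzipped_len;
		}
	}

	/* ... hand (data, size) to the consumer here ... */
	(void)data;

	free(buf);			/* freed once */
	free(big_buf);			/* freed once; free(NULL) is a no-op */
	return size;
}
```

Initializing unzipped_len inside the per-record scope also covers the "set it to -1 ready for next time around" remark.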

-Tony


pstore.patch
Description: Binary data

Re: [PATCH 00/11] Add compression support to pstore

2013-08-06 Thread Tony Luck
On Tue, Aug 6, 2013 at 6:58 PM, Aruna Balakrishnaiah
ar...@linux.vnet.ibm.com wrote:
 The patch looks right. I will clean it up. Does the issue still persist
 after this?

Things seem to be working - but testing has hardly been extensive (just
a couple of forced panics).

I do have one other question. In this code:

  if (compressed && (type == PSTORE_TYPE_DMESG)) {
          big_buf_sz = (psinfo->bufsize * 100) / 45;

Where does the magic multiply by 100/45 (roughly 2.2) come from?  Is that always enough
for the decompression of dmesg type data to succeed?
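
For reference, the expression evaluates as follows. This is only a sketch of the arithmetic (big_buf_size() is a made-up helper); note that 100/45 is a factor of about 2.22. The two inputs used below are the 7896-byte ERST bufsize and the 1024-byte efivars limit mentioned elsewhere in this thread.

```c
#include <stddef.h>

/* Same expression as in the quoted snippet: scales the backend
 * record size by 100/45 using integer division. */
static size_t big_buf_size(size_t bufsize)
{
	return (bufsize * 100) / 45;
}
```

For bufsize 7896 this gives 17546 bytes, and for 1024 it gives 2275 bytes. Whether that headroom is always sufficient for dmesg data is exactly the open question here.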

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-06 Thread Tony Luck
On Tue, Aug 6, 2013 at 10:13 PM, Aruna Balakrishnaiah
ar...@linux.vnet.ibm.com wrote:
 How is it with erst and efivars?

ERST is at the whim of the BIOS writer (the ACPI standard doesn't provide any
suggestions on record sizes).  My systems support ~6K record size.

efivars has, IIRC, a 1k limit coded in the Linux back end.

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-05 Thread Tony Luck
One more experiment - removed previous hack that disabled compression.
Added a new hack to skip decompression.

System died cleanly when I forced a panic.
On reboot I found 3 files in pstore:
-r--r--r--   1 root root 3972 Aug  5 09:24 dmesg-erst-5908671953186586625
-r--r--r--   1 root root 2565 Aug  5 09:24 dmesg-erst-5908671953186586626
-r--r--r--   1 root root 4067 Aug  5 09:24 dmesg-erst-5908671953186586627

Decompressing with openssl zlib -d leaves some garbage at the end of
the decompressed file, and some text that should be there is
missing.  E.g. the tail of decompressed version of *625 ends with:

4Call Trace:
4 [815f85f4] dump_stack+0x45/0x56
4 [815f41ca] panic+0xc2/0x1cb
4 [815f4327] ? printk+0x54/0x56
4 [811cfe45] aegl+0x25/0x30
4 [811c719d] proc_reg_write+0x3d/0x80
4 [81165945] vfs_write+0xc5/0x1e0
4 [81165e32] SyS_write+0x52/0xa0
4 [81606882] system_call_fastpath+0x16/0x1b
 )c10^@^@^@^@^@^@^@^@^@

But my serial console logged this:

Call Trace:
 [815f85f4] dump_stack+0x45/0x56
 [815f41ca] panic+0xc2/0x1cb
 [815f4327] ? printk+0x54/0x56
 [811cfe45] aegl+0x25/0x30
 [811c719d] proc_reg_write+0x3d/0x80
 [81165945] vfs_write+0xc5/0x1e0
 [81165e32] SyS_write+0x52/0xa0
 [81606882] system_call_fastpath+0x16/0x1b
[ cut here ]
WARNING: CPU: 18 PID: 381 at arch/x86/kernel/smp.c:124
native_smp_send_reschedule+0x5b/0x60()
Modules linked in:
CPU: 18 PID: 381 Comm: kworker/18:1 Not tainted 3.11.0-rc3-11-ge41db9e #6

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-05 Thread Tony Luck
See attachment for what I actually applied - I think I got what you
suggested (I added a declaration for total_len).

Forcing a panic worked: some things were logged to pstore.

But on reboot with your patches applied I'm still seeing a GP fault
when pstore is mounted and we find compressed records and inflate them
and install them into the pstore filesystem.  Here's the oops:

general protection fault:  [#1] SMP
Modules linked in:
CPU: 29 PID: 10252 Comm: mount Not tainted 3.11.0-rc3-12-g73bec18 #2
Hardware name: Intel Corporation LH Pass ../SVRBD-ROW_T, BIOS
SE5C600.86B.99.99.x059.091020121352 09/10/2012
task: 88082e934040 ti: 88082e2ec000 task.ti: 88082e2ec000
RIP: 0010:[8126d314]  [8126d314] pstore_mkfile+0x84/0x410
RSP: 0018:88082e2edc70  EFLAGS: 00010007
RAX: 0246 RBX: 81ca7b20 RCX: 625f6963703e373c
RDX: 00040004 RSI: 0004 RDI: 820aa7e8
RBP: 88082e2edd10 R08: 881026a48000 R09: 
R10: 88102d21efb8 R11:  R12: 881026a48000
R13: 51ffe3560003 R14:  R15: 4450
FS:  7fbd37a2d7e0() GS:88103fca() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fbd37a47000 CR3: 00103dc78000 CR4: 000407e0
Stack:
 881026a4c450 5227 81a3703d 881026a48000
 2e2edd70 88103db34140 0001abaf 36383039
 003a0fb8 881026a48000 88102d21e000 448a
Call Trace:
 [8126dd7d] pstore_get_records+0xed/0x2c0
 [8126cfa0] ? pstore_get_inode+0x50/0x50
 [8126d042] pstore_fill_super+0xa2/0xc0
 [811691f2] mount_single+0xa2/0xd0
 [8126ce28] pstore_mount+0x18/0x20
 [811693e3] mount_fs+0x43/0x1b0
 [8112dc40] ? __alloc_percpu+0x10/0x20
 [8118256f] vfs_kern_mount+0x6f/0x100
 [81184a79] do_mount+0x259/0xa10
 [81128bcb] ? strndup_user+0x5b/0x80
 [811852be] SyS_mount+0x8e/0xe0
 [81606802] system_call_fastpath+0x16/0x1b
Code: 88 e8 f1 0f 39 00 48 8b 0d 0a 3a a2 00 48 81 f9 00 0d c9 81 75
15 eb 67 0f 1f 80 00 00 00 00 48 8b 09 48 81 f9 00 0d c9 81 74 54 44
39 71 18 75 ee 4c 39 69 20 75 e8 48 39 59 10 75 e2 48 89 c6
RIP  [8126d314] pstore_mkfile+0x84/0x410
 RSP 88082e2edc70
---[ end trace 0e1dd8e3ccfa3dcc ]---
/etc/init.d/functions: line 530: 10252 Segmentation fault  $@

Here's the start of my pstore_mkfile() code where the GP fault occurred:

8126d290 <pstore_mkfile>:
8126d290:   e8 2b 91 39 00          callq  816063c0 <__fentry__>
8126d295:   55                      push   %rbp
8126d296:   48 89 e5                mov    %rsp,%rbp
8126d299:   41 57                   push   %r15
8126d29b:   41 56                   push   %r14
8126d29d:   41 89 fe                mov    %edi,%r14d
8126d2a0:   48 c7 c7 e8 a7 0a 82    mov    $0x820aa7e8,%rdi
8126d2a7:   41 55                   push   %r13
8126d2a9:   49 89 d5                mov    %rdx,%r13
8126d2ac:   41 54                   push   %r12
8126d2ae:   53                      push   %rbx
8126d2af:   48 83 ec 78             sub    $0x78,%rsp
8126d2b3:   89 4d 84                mov    %ecx,-0x7c(%rbp)
8126d2b6:   48 89 b5 70 ff ff ff    mov    %rsi,-0x90(%rbp)
8126d2bd:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
8126d2c4:   00 00
8126d2c6:   48 89 45 d0             mov    %rax,-0x30(%rbp)
8126d2ca:   31 c0                   xor    %eax,%eax
8126d2cc:   48 8b 05 0d d5 e3 00    mov    0xe3d50d(%rip),%rax        # 820aa7e0 <pstore_sb>
8126d2d3:   4c 89 85 78 ff ff ff    mov    %r8,-0x88(%rbp)
8126d2da:   44 89 4d 80             mov    %r9d,-0x80(%rbp)
8126d2de:   48 8b 5d 28             mov    0x28(%rbp),%rbx
8126d2e2:   48 8b 40 60             mov    0x60(%rax),%rax
8126d2e6:   48 89 45 88             mov    %rax,-0x78(%rbp)
8126d2ea:   e8 f1 0f 39 00          callq  815fe2e0 <_raw_spin_lock_irqsave>
8126d2ef:   48 8b 0d 0a 3a a2 00    mov    0xa23a0a(%rip),%rcx        # 81c90d00 <allpstore>
8126d2f6:   48 81 f9 00 0d c9 81    cmp    $0x81c90d00,%rcx
8126d2fd:   75 15                   jne    8126d314 <pstore_mkfile+0x84>
8126d2ff:   eb 67                   jmp    8126d368 <pstore_mkfile+0xd8>
8126d301:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
8126d308:   48 8b 09                mov    (%rcx),%rcx
8126d30b:   48 81 f9 00 0d c9 81    cmp    $0x81c90d00,%rcx
8126d312:   74 54                   je     8126d368 <pstore_mkfile+0xd8>
8126d314:   44 

Re: [PATCH 00/11] Add compression support to pstore

2013-08-05 Thread Tony Luck
This patch seems to fix the garbage at the end problem.  Booting an
old kernel and using openssl decodes them OK.

Still have problems booting if there are any compressed images in ERST
to be inflated.

-Tony


Re: [PATCH 00/11] Add compression support to pstore

2013-08-02 Thread Tony Luck
On Thu, Aug 1, 2013 at 4:42 PM, Luck, Tony tony.l...@intel.com wrote:
 when I rebuilt a plain 3.11-rc3 it didn't log anything via pstore either :-(

Well this turned out to be operator error on my part. 3.11-rc3 does in fact
log errors to pstore and allows them to be retrieved and cleared.

So then I start testing with your 11 patches in place.

First boot was fine - ERST had no records, and pstore mounted OK
(and showed no files).

Then I panic'd the machine and rebooted.  The boot hung when some
rc script printed

Mounting other filesystems:

I guess something went wrong when pstore found a non-empty ERST.

I added some debug traces and booted again.  This time the boot succeeded
but I saw a GP fault reported from pstore_mkfile(). Possibly in this code:

	spin_lock_irqsave(&allpstore_lock, flags);
	list_for_each_entry(pos, &allpstore, list) {
		if (pos->type == type &&
		    pos->id == id &&
		    pos->psi == psi) {
			rc = -EEXIST;
			break;
		}
	}
	spin_unlock_irqrestore(&allpstore_lock, flags);
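
The duplicate check in that loop can be modelled outside the kernel as a plain list walk. The sketch below uses a hypothetical singly linked struct record instead of the kernel's struct list_head and omits the locking.

```c
#include <stddef.h>

/* Hypothetical, simplified record node; the kernel code uses
 * struct list_head and a spinlock around the walk. */
struct record {
	int type;
	unsigned long long id;
	const void *psi;	/* backend that produced the record */
	struct record *next;
};

/* Returns 1 if a record with the same (type, id, psi) is already
 * listed. Every 'next' pointer is dereferenced, so a single corrupted
 * node is enough to fault during the walk. */
static int record_exists(const struct record *head, int type,
			 unsigned long long id, const void *psi)
{
	const struct record *pos;

	for (pos = head; pos; pos = pos->next)
		if (pos->type == type && pos->id == id && pos->psi == psi)
			return 1;
	return 0;
}
```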



My other tracing showed that we'd already found two compressed entries in
ERST and were working on a third when this error happened (implying that
my hang had been a panic that failed to print anything to console).

I've attached one of the compressed files that v3.11-rc3 shows in pstore
now.  The openssl zlib -d trick you mentioned back in June mostly works
to decode ... but it seems to dump some trailing garbage at the end of
the file.

-Tony


unknown-erst-5907623178007478273
Description: Binary data

Re: [PATCH 00/11] Add compression support to pstore

2013-08-02 Thread Tony Luck
A quick experiment to use your patchset - but with compression
disabled by tweaking this line in pstore_dump():

zipped_len = -1; //zip_data(dst, hsize + len);

turned out well. This kernel dumps uncompressed dmesg blobs into pstore
and gets them back out again.  So it seems likely that the problems are
someplace in the compression/decompression code.

-Tony


Re: [RFC PATCH v5 05/19] memory-hotplug: check whether memory is present or not

2012-07-27 Thread Tony Luck
On Fri, Jul 27, 2012 at 3:28 AM, Wen Congyang we...@cn.fujitsu.com wrote:
 +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
 +{
 +   int i;
 +   for (i = 0; i < nr_pages; i++) {
 +   if (pfn_present(pfn + 1))

Typo? I think you meant pfn + i

 +   continue;
 +   else
 +   return -EINVAL;
 +   }
 +   return 0;
 +}
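
With the suggested pfn + i the helper checks every page in the range. A standalone sketch follows; the pfn_present() here is a hypothetical stub that treats PFNs below 100 as present, standing in for the real kernel helper.

```c
#include <errno.h>

/* Hypothetical stub: pretend PFNs below 100 are present. */
static int pfn_present(unsigned long pfn)
{
	return pfn < 100;
}

/* Returns 0 if all nr_pages pages starting at pfn are present,
 * -EINVAL as soon as one is missing. */
static int pfns_present(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (!pfn_present(pfn + i))	/* pfn + i, not pfn + 1 */
			return -EINVAL;
	}
	return 0;
}
```

With pfn + 1 the loop would test the same single page on every iteration, so a range such as 95..104 would wrongly pass against this stub.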

-Tony


Re: [PATCH -v2] Audit: push audit success and retcode into arch ptrace.h

2011-06-03 Thread Tony Luck
On Fri, Jun 3, 2011 at 3:04 PM, Eric Paris epa...@redhat.com wrote:
 The other major change is that on some arches, like ia64, we change
 regs_return_value() to give us the negative value on syscall failure.  The
 only other user of this macro, kretprobe_example.c, won't notice and it makes
 the value signed consistently for the audit functions across all archs.
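
A rough model of the sign convention described in the quoted text. It assumes the usual ia64 syscall convention, where r10 is -1 on failure and r8 then holds a positive error number; struct fake_regs and both helpers are hypothetical and are not the kernel's definitions.

```c
/* Hypothetical two-field model of the ia64 return registers. */
struct fake_regs {
	long r8;	/* return value, or positive errno on failure */
	long r10;	/* -1 signals that the syscall failed */
};

static int syscall_succeeded(const struct fake_regs *regs)
{
	return regs->r10 != -1;
}

/* Report failures as negative values so callers see the same signed
 * convention as on other architectures. */
static long return_value(const struct fake_regs *regs)
{
	return syscall_succeeded(regs) ? regs->r8 : -regs->r8;
}
```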

v2 builds and boots on ia64 now
Acked-by: Tony Luck tony.l...@intel.com


 Signed-off-by: Eric Paris epa...@redhat.com
 Acked-by: Acked-by: H. Peter Anvin h...@zytor.com [for x86 portion]

    :-)

-Tony


Re: [PATCH] Audit: push audit success and retcode into arch ptrace.h

2011-06-02 Thread Tony Luck
 But there seems to be another problem.
 Why is pt_regs of type void *?

 gcc complains:
 In file included from include/linux/fsnotify.h:15:0,
                 from include/linux/security.h:26,
                 from init/main.c:32:
 include/linux/audit.h: In function ‘audit_syscall_exit’:
 include/linux/audit.h:440:17: warning: dereferencing ‘void *’ pointer
 include/linux/audit.h:440:3: error: invalid use of void expression
 include/linux/audit.h:441:21: warning: dereferencing ‘void *’ pointer
 include/linux/audit.h:441:21: error: void value not ignored as it ought to be

Perhaps same issue on ia64 - but symptoms are different:

  CC  crypto/cipher.o
In file included from include/linux/fsnotify.h:15,
 from include/linux/security.h:26,
 from init/do_mounts.c:8:
include/linux/audit.h: In function ‘audit_syscall_exit’:
include/linux/audit.h:440: warning: dereferencing ‘void *’ pointer
include/linux/audit.h:440: error: request for member ‘r10’ in
something not a structure or union
include/linux/audit.h:441: error: request for member ‘r10’ in
something not a structure or union
include/linux/audit.h:441: error: request for member ‘r8’ in something
not a structure or union
include/linux/audit.h:441: error: request for member ‘r8’ in something
not a structure or union
include/linux/audit.h:441: error: expected ‘;’ before ‘}’ token
include/linux/audit.h:441: error: void value not ignored as it ought to be

-Tony