Re: [RFC] Kdump and memory error handling

2011-05-17 Thread K.Prasad
On Thu, May 12, 2011 at 03:22:44PM -0700, Eric W. Biederman wrote:
 K.Prasad pra...@linux.vnet.ibm.com writes:
 
  Hi All,
  We've been trying to study and improve the kdump behaviour when
  a panic is triggered due to an unrecoverable memory error causing a
  machine check exception (MCE) followed by a kernel panic.
 
  In this context we foresee a few issues in capturing kdump and would
  like to receive comments about the ways to handle them.
 
  Probable Issues when capturing coredump through kdump following a memory
  error
  ---
  - First, a coredump of the memory from the crashing kernel isn't really
    helpful in debugging a crash that was caused by faulty memory.
    Collecting it has some of the problems illustrated below. It should
    therefore suffice to let the user know the reason for the crash
    rather than provide a complete dump of the memory.
 
    For this, a 'slim' yet crash-tool readable coredump containing:
    - message about the cause (such as crash due to unrecoverable memory
      error) in the coredump's elf-note section.
    - and no data from the memory of the 'crashing' kernel (their elf
      sections can be reduced to zero length).
    may be suitable.
 
  - Alternatively, if the kdump kernel decides to capture the coredump,
    its attempts to read the faulty memory location may lead to subsequent
    faults in the context of kdump kernel with fatal consequences. This
    may either be avoided by:
 
    a) Pass the address of the corrupt memory location to the kdump kernel
    and skip reading that location while creating the vmcore. This needs
    an instance of 'struct mce' (from the 'crashing' kernel), which
    already contains the faulty memory address (in the physical address
    form, which should be confirmed using the IA32_MCi_MISC[8:6] bits stored
    in 'misc' field of 'struct mce') to be populated inside the elf
    (-notes?) section.
 
    b) Use modified copy applications (such as a modified 'cp' command)
    that can map the /dev/oldmem into user-space and then initiate the
    creation of vmcore. In this method, the user-space process performing
    the copy will receive a SIGBUS while consuming the faulty memory (through
    INT18 - do_machine_check) but it must be modified to be resilient to the
    signal, while intelligently skipping to the subsequent memory location
    for further copying. Meanwhile the data for the faulty memory location
    can be represented using 'zero-ed' data and the vmcore enhanced to
    indicate the cause of the crash as one resulting from a fatal MCE.
 
  Any thoughts/suggestions?
 
 In practice this all works for me.
 
 I have received several crash dumps where there was an mce error.
 
 I admit I have my userspace configured to just grab the dmesg from the
 kernel log and not do a full crash dump.  So in that sense I am already
 doing a slim crash dump.
 
 But in practice with real hardware errors it is working today without
 kernel changes.


The problem with the existing kernel code is that it allows the old
kernel's memory regions to be read (through the read_vmcore() function),
although a careful userspace setup (like the one you mentioned) can
avoid doing so.

Given that the system can experience recursive MCE faults while reading
the corrupt memory region, a 'slim' vmcore presented by the kernel
to user-space would be a safe option. We could also use such a dump to
include more relevant information such as address of corrupt memory,
type of memory error, etc.
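
To make the "address of corrupt memory" part concrete, here is a minimal
sketch of how the crashing kernel's MCE data could be turned into a small
note payload. The bit positions follow the Intel SDM description of
IA32_MCi_STATUS and IA32_MCi_MISC; 'struct mce_record', 'struct
slim_mce_note' and mce_to_slim_note() are hypothetical names made up for
illustration and are not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as described for IA32_MCi_STATUS / IA32_MCi_MISC. */
#define MCI_STATUS_MISCV   (1ULL << 59)   /* MISC register is valid */
#define MCI_STATUS_ADDRV   (1ULL << 58)   /* ADDR register is valid */
#define MISC_ADDR_LSB(m)   ((unsigned)((m) & 0x3f))        /* bits 5:0 */
#define MISC_ADDR_MODE(m)  ((unsigned)(((m) >> 6) & 0x7))  /* bits 8:6 */
#define MISC_ADDR_PHYS     2u             /* mode 010b: physical address */

/* Hypothetical, trimmed-down stand-in for the fields of 'struct mce'. */
struct mce_record {
    uint64_t status;
    uint64_t addr;
    uint64_t misc;
};

/* Hypothetical payload for an ELF note in a slim dump. */
struct slim_mce_note {
    uint64_t bad_start;   /* first physical byte of the bad region */
    uint64_t bad_len;     /* size of the bad region in bytes */
};

/*
 * Fill *out if the record carries a valid physical address.
 * Returns false when the address is missing or not physical, in which
 * case the capture kernel cannot know which range to skip.
 */
bool mce_to_slim_note(const struct mce_record *m, struct slim_mce_note *out)
{
    assert(m && out);

    if (!(m->status & MCI_STATUS_ADDRV) || !(m->status & MCI_STATUS_MISCV))
        return false;
    if (MISC_ADDR_MODE(m->misc) != MISC_ADDR_PHYS)
        return false;

    /* Address bits below the reported LSB are not meaningful. */
    uint64_t len = 1ULL << MISC_ADDR_LSB(m->misc);

    out->bad_start = m->addr & ~(len - 1);
    out->bad_len = len;
    return true;
}
```

With an LSB of 12 the reported region is one 4 KiB page, which is the
granularity at which a capture tool would most likely skip memory anyway.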

Thanks,
K.Prasad


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] Kdump and memory error handling

2011-05-12 Thread Eric W. Biederman
K.Prasad pra...@linux.vnet.ibm.com writes:

 Hi All,
 We've been trying to study and improve the kdump behaviour when
 a panic is triggered due to an unrecoverable memory error causing a
 machine check exception (MCE) followed by a kernel panic.

 In this context we foresee a few issues in capturing kdump and would
 like to receive comments about the ways to handle them.

 Probable Issues when capturing coredump through kdump following a memory
 error
 ---
 - First, a coredump of the memory from the crashing kernel isn't really
   helpful in debugging a crash that was caused by faulty memory.
   Collecting it has some of the problems illustrated below. It should
   therefore suffice to let the user know the reason for the crash
   rather than provide a complete dump of the memory.

   For this, a 'slim' yet crash-tool readable coredump containing:
   - message about the cause (such as crash due to unrecoverable memory error)
     in the coredump's elf-note section.
   - and no data from the memory of the 'crashing' kernel (their elf
     sections can be reduced to zero length).
   may be suitable.

 - Alternatively, if the kdump kernel decides to capture the coredump,
   its attempts to read the faulty memory location may lead to subsequent 
   faults in the context of kdump kernel with fatal consequences. This
   may either be avoided by:

   a) Pass the address of the corrupt memory location to the kdump kernel
   and skip reading that location while creating the vmcore. This needs
   an instance of 'struct mce' (from the 'crashing' kernel), which
   already contains the faulty memory address (in the physical address
   form, which should be confirmed using the IA32_MCi_MISC[8:6] bits stored
   in 'misc' field of 'struct mce') to be populated inside the elf
   (-notes?) section.

   b) Use modified copy applications (such as a modified 'cp' command)
   that can map the /dev/oldmem into user-space and then initiate the
   creation of vmcore. In this method, the user-space process performing
   the copy will receive a SIGBUS while consuming the faulty memory (through
   INT18 - do_machine_check) but it must be modified to be resilient to the
   signal, while intelligently skipping to the subsequent memory location
   for further copying. Meanwhile the data for the faulty memory location
   can be represented using 'zero-ed' data and the vmcore enhanced to
   indicate the cause of the crash as one resulting from a fatal MCE.
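
The copy loop described in (b) could look roughly like the sketch below.
It is only an illustration under stated assumptions: the function names
are hypothetical, the chunk size is left to the caller, and the demo
provokes SIGBUS by reading a file mapping past end-of-file rather than by
touching poisoned memory, since the latter cannot be reproduced on
healthy hardware. Whether a real MCE in the kdump kernel is always
delivered as a recoverable SIGBUS is exactly the open question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf copy_fault_env;

/* SIGBUS handler: abandon the faulting copy and return to the copy loop. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(copy_fault_env, 1);
}

/*
 * Copy len bytes in chunk-sized pieces. A chunk whose read raises SIGBUS
 * is zero-filled in dst instead. Returns the number of chunks skipped.
 */
size_t copy_skipping_faults(void *dst, const void *src, size_t len, size_t chunk)
{
    struct sigaction sa, old;
    volatile size_t off = 0, skipped = 0, n;

    assert(chunk > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    while (off < len) {
        n = (len - off < chunk) ? len - off : chunk;
        /* savemask=1 so SIGBUS is unblocked again after the jump. */
        if (sigsetjmp(copy_fault_env, 1) == 0) {
            memcpy((char *)dst + off, (const char *)src + off, n);
        } else {
            memset((char *)dst + off, 0, n);   /* represent as zeroes */
            skipped++;
        }
        off += n;
    }
    sigaction(SIGBUS, &old, NULL);
    return skipped;
}

/* Demo: the second page of a mapping that extends past EOF raises SIGBUS. */
size_t demo_truncated_mapping(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    char *dst = malloc(2 * page);
    size_t skipped = (size_t)-1;

    if (f && dst && ftruncate(fileno(f), (off_t)page) == 0) {
        void *src = mmap(NULL, 2 * page, PROT_READ, MAP_PRIVATE, fileno(f), 0);
        if (src != MAP_FAILED) {
            skipped = copy_skipping_faults(dst, src, 2 * page, page);
            munmap(src, 2 * page);
        }
    }
    free(dst);
    if (f)
        fclose(f);
    return skipped;
}
```

A real tool would also record which offsets were zero-filled, so that the
vmcore can say that those pages are missing rather than genuinely zero.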

 Any thoughts/suggestions?

In practice this all works for me.

I have received several crash dumps where there was an mce error.

I admit I have my userspace configured to just grab the dmesg from the
kernel log and not do a full crash dump.  So in that sense I am already
doing a slim crash dump.

But in practice with real hardware errors it is working today without
kernel changes.

Eric



Re: [RFC] Kdump and memory error handling

2011-05-09 Thread K.Prasad
On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
  Any thoughts/suggestions?
 
 My old attempts to solve this are
 
 Don't dump on MCE:
 
 http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
 

The problem we see in avoiding a panic-crash_kexec-[coredump capture] is
that the user may have no means of knowing the reason for the crash, unless
a serial console is connected to capture and store the panic string.

Alternatively, a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but would inform the user about the cause of the crash. I
intend to post some patches with a quick implementation of it soon.

 Handle dumps of corrupted memory regions:
 
 http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
 

 IMHO these patches are still the right solutions for this.
 

As Vatsa pointed out, the processor's behaviour upon reading (or performing
any I/O operation on) the faulty memory location isn't clearly defined (as
far as I could tell from the System Programming Guide, Part 1, Volume 3A,
Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which
can potentially read the faulty memory) makes things hazy.

Thanks,
K.Prasad




Re: [RFC] Kdump and memory error handling

2011-05-05 Thread Srivatsa Vaddagiri
On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
 Handle dumps of corrupted memory regions:
 
 http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump

What happens when MCE is disabled and the capture kernel reads corrupted
memory? Does that result in the dump containing erroneous data?

- vatsa



RE: [RFC] Kdump and memory error handling

2011-05-04 Thread Luck, Tony
Your first suggestion of a slim dump makes the most sense. The
purpose of a crash dump is to serve as a research resource for finding
out why the system crashed - but in the case of a machine check, we already
have the reasons for the crash captured by the machine check handler.

Perhaps you could include __log_buf[] in the slim crash dump? Assuming
that the machine check is not a result of an uncorrectable error
in this memory range.
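
The condition above amounts to a range-overlap test between the log
buffer and the region reported bad by the machine check. A minimal
sketch, assuming both are known as physical ranges; 'struct phys_range'
and log_buf_safe_to_dump() are hypothetical names, not existing kernel
symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical range [start, start + len). */
struct phys_range {
    uint64_t start;
    uint64_t len;
};

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
    /* This sketch assumes neither range wraps around the address space. */
    assert(a.start + a.len >= a.start && b.start + b.len >= b.start);

    return a.len && b.len &&
           a.start < b.start + b.len &&
           b.start < a.start + a.len;
}

/* The log buffer is only worth copying if the bad region misses it. */
bool log_buf_safe_to_dump(struct phys_range log_buf, struct phys_range bad)
{
    return !ranges_overlap(log_buf, bad);
}
```

If the check fails, the slim dump would have to fall back to the cause
message alone.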

-Tony
