Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-04 Thread Eric W. Biederman

[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Hirokazu Takahashi <[EMAIL PROTECTED]> writes:
> > > Most of this just results in easier management between the pieces.
> > > Which is a good thing.  However at the moment I don't think it
> > > simplifies any of the core problems.  I still need to reserve
> > > a large hunk of physical address space early on before any
> > > DMA transactions are setup to hold the new kernel.
> > 
> > I agree that my idea is not essential at the moment.
> > 
> > > So while I am happy to see patches that improve this I don't
> > > actually care right now.
> > 
> > ok.

Thinking about this some more, this does have a significant impact
on the design.  For architectures that support this, on the
primary kernel the command line option becomes:
crashkernel=size instead of [EMAIL PROTECTED]
Which means the kernel needs to call alloc_bootmem instead
of reserve_bootmem.  So it results in a primary kernel implementation
difference.
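
As a rough illustration of the size-only form, the sketch below parses a
"crashkernel=<size>" string in user space.  The helper name
sketch_parse_crashkernel and the K/M/G suffix handling are assumptions made
for this example; it is not the kernel's own option parser, and the second
form of the option (obscured by the archive above) is deliberately rejected
rather than guessed at.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "crashkernel=<size>", where <size> may carry
 * a K, M or G suffix.  Returns 0 and stores the byte count in *size on
 * success, or -1 if the option is malformed or has trailing text. */
static int sketch_parse_crashkernel(const char *opt, uint64_t *size)
{
    static const char prefix[] = "crashkernel=";
    char *end;
    unsigned long long val;

    assert(opt != NULL && size != NULL);

    if (strncmp(opt, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    opt += sizeof(prefix) - 1;

    val = strtoull(opt, &end, 0);
    if (end == opt)
        return -1;                      /* no digits at all */

    switch (*end) {
    case 'G': case 'g': val <<= 10;     /* fall through */
    case 'M': case 'm': val <<= 10;     /* fall through */
    case 'K': case 'k': val <<= 10; end++; break;
    default: break;
    }
    if (*end != '\0')
        return -1;                      /* anything after the size is not handled */

    *size = val;
    return 0;
}
```

With only a size to work from, the caller is free to pick the location
itself, which is the allocate-instead-of-reserve difference described above.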

In addition if we really can push all of the dump specific
functionality into user space as it appears we can, this allows a
generic kernel to be used for the crash dump process.  It will
probably still be a special hardened build where reliability is
more important than performance.  So that any micro hit we take in
performance by modifying __pa() and __va() will be irrelevant.
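
To make the "micro hit" concrete, here is a minimal sketch of address
translation with a load offset that is only known at run time.  The names
sketch_pa, sketch_va, sketch_phys_base and the SKETCH_PAGE_OFFSET value are
all invented for this example; real kernels define __pa() and __va() per
architecture and may do this quite differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only. */
#define SKETCH_PAGE_OFFSET UINT64_C(0xC0000000)

/* Physical address the kernel was actually loaded at; a fixed-address
 * kernel would have this folded into a compile-time constant. */
static uint64_t sketch_phys_base;

/* Virtual -> physical: one extra load and add compared with a constant. */
static inline uint64_t sketch_pa(uint64_t vaddr)
{
    assert(vaddr >= SKETCH_PAGE_OFFSET);
    return vaddr - SKETCH_PAGE_OFFSET + sketch_phys_base;
}

/* Physical -> virtual: the inverse of sketch_pa(). */
static inline uint64_t sketch_va(uint64_t paddr)
{
    assert(paddr >= sketch_phys_base);
    return paddr - sketch_phys_base + SKETCH_PAGE_OFFSET;
}
```

The extra cost is the memory read of the offset on each translation, which
matters little for a capture kernel tuned for reliability.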

I like it.

I have already demonstrated that there is a general technique that
any architecture can use to build a kernel that runs at a non-default
address.  So for the architectures that cannot build a PIC kernel
there is still a proven solution available, it simply will not
be as nice to manage.

x86_64 should be pretty straightforward.  i386 will be a little more
difficult but doable.

Patches are still welcome.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-04 Thread Eric W. Biederman

Hirokazu Takahashi <[EMAIL PROTECTED]> writes:

> Hi,
> 
> > > Hi Eric,
> > > 

> I see you have.
> And MIPS CPUs doesn't allow kernel pages to be remapped either.

I guess I should add that being relocatable in the general case most
likely requires running a PIC dynamic linker at kernel startup.
If none of the rest of the kernel is built PIC and the relocation
table is not too big we might be able to convince people to implement
it generally.

At least that is one technique for generating a PIC kernel that I
have not explored fully.
 
> > You don't need anything fancy except to build the page tables
> > during bootup.  However there are a few potential gotchas
> > with respect to using large pages, that can give 4MiB or
> > greater alignment restrictions on the kernel.  Code wise
> > the gotcha is moving the kernel's .text section into what
> > is essentially the vmalloc portion of the address space.
> > For x86_64 the kernels virtual address is already decoupled from the
> > physical addresses, so it is probably easier.
> 
> I know we can place the kernel in any address though there
> exist some exceptions.
> 
> I know mapping kernel pages to the same virtual address only helps
> to avoid caring about physical addresses or vmalloc'ed addresses
> when linking the kernel. I think it wouldn't be bad idea in many
> architectures. I prefer it rather than linking the kernel for each
> system.

Agreed.  Although I suspect most architectures will have a region
that will work for most users.
 
> > Most of this just results in easier management between the pieces.
> > Which is a good thing.  However at the moment I don't think it
> > simplifies any of the core problems.  I still need to reserve
> > a large hunk of physical address space early on before any
> > DMA transactions are setup to hold the new kernel.
> 
> I agree that my idea is not essential at the moment.
> 
> > So while I am happy to see patches that improve this I don't
> > actually care right now.
> 
> ok.

The one part I do request is that if you build such a kernel,
you figure out a way to make its ELF header type ET_DYN, so it
does not require a magic loader to load it.

I have recently patched both etherboot and /sbin/kexec to accept
that kind of binary :)
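
For readers unfamiliar with the distinction, a loader can tell the two kinds
of image apart from the e_type field of the ELF header.  The check below is
only a sketch using the standard <elf.h> definitions; the function name is
made up, it handles 64-bit images in host byte order only, and it is not the
code that went into etherboot or /sbin/kexec.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with a 64-bit ELF header of type ET_DYN
 * (loadable at any address), 0 otherwise.  Assumes the image uses the
 * host's byte order. */
static int sketch_is_et_dyn64(const void *buf, size_t len)
{
    Elf64_Ehdr ehdr;

    assert(buf != NULL);
    if (len < sizeof(ehdr))
        return 0;
    memcpy(&ehdr, buf, sizeof(ehdr));   /* copy to avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                       /* not an ELF image */
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;                       /* sketch handles 64-bit only */

    return ehdr.e_type == ET_DYN;
}
```

An ET_EXEC image carries fixed load addresses, which is why it would need a
special-purpose loader to run anywhere else.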

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-04 Thread Hirokazu Takahashi

Hi,

> > Hi Eric,
> > 
> > > > Hi Vivek and Eric,
> > > > 
> > > > IMHO, why don't we swap not only the contents of the top 640K
> > > > but also kernel working memory for kdump kernel?
> > > > 
> > > > I guess this approach has some good points.
> > > > 
> > > >  1.Preallocating reserved area is not mandatory at boot time.
> > > >And the reserved area can be distributed in small pieces
> > > >like original kexec does.
> > > > 
> > > >  2.Special linking is not required for kdump kernel.
> > > >Each kdump kernel can be linked in the same way,
> > > >where the original kernel exists.
> > > > 
> > > > Am I missing something?
> > > 
> > > Preallocating the reserved area is largely to keep it from
> > > being the target of DMA accesses.  Since we are not able
> > > to shutdown any of the drivers in the primary kernel running
> > > in a normal swath of memory sounds like a good way to get
> > > yourself stomped at the worst possible time.
> > 
> > So what do you think my another idea?
> 
> I have proposed it.  I think ia64 already does that.
> It has been pointed that the PowerPC kernel occasionally runs
> with the mmu turned off. So it is not a technique the is 100%
> portable.

I see you have.
And MIPS CPUs don't allow kernel pages to be remapped either.

> > I think we can always make a kdump kernel mapped to the same virtual
> > address. So we will be free from caring about the physical address
> > where the kdump kernel is loaded.
> > 
> > I believe the memsection functionality which LHMS project is working
> > on would help this.
> 
> You don't need anything fancy except to build the page tables
> during bootup.  However there are a few potential gotchas
> with respect to using large pages, that can give 4MiB or
> greater alignment restrictions on the kernel.  Code wise
> the gotcha is moving the kernel's .text section into what
> is essentially the vmalloc portion of the address space.
> For x86_64 the kernels virtual address is already decoupled from the
> physical addresses, so it is probably easier.

I know we can place the kernel at any address, though there
are some exceptions.

I know mapping kernel pages to the same virtual address only helps
to avoid caring about physical addresses or vmalloc'ed addresses
when linking the kernel. I think it wouldn't be a bad idea on many
architectures. I prefer it to linking the kernel for each
system.

> Most of this just results in easier management between the pieces.
> Which is a good thing.  However at the moment I don't think it
> simplifies any of the core problems.  I still need to reserve
> a large hunk of physical address space early on before any
> DMA transactions are setup to hold the new kernel.

I agree that my idea is not essential at the moment.

> So while I am happy to see patches that improve this I don't
> actually care right now.

ok.

> Eric
> 

Thanks,
Hirokazu Takahashi.

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On Fri, 04 Feb 2005 08:18:56 +0900
Itsuro Oda <[EMAIL PROTECTED]> wrote:

> 
> > > 5) dump kernel: export all valid physical memory (and saved register
> > >information) to the user. (as /dev/oldmem /proc/vmcore ?)
> > 
> > Or in user space, by just mmaping /dev/mem. That is part of the
> > current conversation.   The only real point for putting that code in
> > the kernel (besides momentum) is it is a cheap way to get the exact
> > data structures of the kernel you are using.  But since:
> > (a) it does not look like any primary kernel data structures need to
> > be examined.
> > (b) even simple compile options like SMP/NOSMP are enough to change
> > the layout of the data structures.
> > I think there is a pretty good case for moving all of the work to
> > user space.  But you still need a kernel that loads and
> > runs in the reserved area.
> > 
> I don't make sense. what do you mean ?
> 

"I don't make sense." should be "It does not make sense."
sorry. I'm not familiar with English.

-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
> 
> On 02 Feb 2005 07:45:11 -0700
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
> 
> > 
> > And the feedback begins :)
> > 
> > Itsuro Oda <[EMAIL PROTECTED]> writes:
> > 
> > > Hi,
> > > 
> > > I don't like calling crash_kexec() directly in (ex.) panic().
> > > It should be call_dump_hook() (or something like this).
> > > 
> > > I think the necessary modifications of the kernel is only:
> > > - insert the hooks that calls a dump function when crash occur
> > crash_kexec()
> > > - binding interface that binds a dump function to the hook
> > >   (like register_dump_hook())
> > sys_kexec_load(...);
> 
> For example there are pepole who want to execute a built in kernel
> debugger when the system is crashed. or there are pepole who
> believe the diskdump is the best dump tool :-)
> 
> So I think a sort of hook is better than calling crash_kexec 
> directly. (May I make a patch ?)

The prevalent feeling I have heard from kernel developers,
and my personal feeling as well, is that after a kernel has called
panic you can't trust it.  Which means anything running in the kernel
itself is suspect.

The crash_kexec() hook enables everything that does not get linked into
the kernel.   So I don't feel a hook in the panic path is necessary
nor do I feel that it is wise, especially with no in-kernel users.

Plus the worst part about a hook in the panic path is that it is
inherently racy.  Keeping the crash_kexec() code from blocking or
being racy has been a challenge.  And I still think that entire code
path needs a review and some more code tweaks to remove races.

If someone else wants a hook in the panic path they can add their own
hook, and make their own case for why it is needed.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
> 
> On 03 Feb 2005 02:00:51 -0700
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
> 
> > > 5) dump kernel: export all valid physical memory (and saved register
> > >information) to the user. (as /dev/oldmem /proc/vmcore ?)
> > 
> > Or in user space, by just mmaping /dev/mem. That is part of the
> > current conversation.   The only real point for putting that code in
> > the kernel (besides momentum) is it is a cheap way to get the exact
> > data structures of the kernel you are using.  But since:
> > (a) it does not look like any primary kernel data structures need to
> > be examined.
> > (b) even simple compile options like SMP/NOSMP are enough to change
> > the layout of the data structures.
> > I think there is a pretty good case for moving all of the work to
> > user space.  But you still need a kernel that loads and
> > runs in the reserved area.
> > 
> I don't make sense. what do you mean ?
> 
> What we want to do when the system is crashed is storing the whole
> physical memory (and saved register information for x86 arch) to
> some place (ex. a disk partition) for later analysis. 
> So the basic requirments to the dump kernel is that:
>  * supply a method to access whole (valid) physical memory.
>  * supply a method to access the saved register information.
> 
> Does the kdump meet this requirment ? 

Yes, the discussion in this area is what is the best way to implement
this requirement.   How much should be in the kernel and how much
should be in user space.

At the moment things are broken but should be fixed shortly.
What has been implemented so far is /dev/oldmem, which provides access
to the old memory, and /proc/vmcore, which provides both the old
memory and the register information.

> (I am not interesting to /proc/vmcore. Constructing the vmcore
>  image is area of analysis tools. not kernel's task.)

There is a fine line there, as a simple ELF core dump has just enough
information to describe discontiguous memory, and to have an out of
band channel for register information.  Adding anything extra like
virtual addresses that match the kernel should be left for the
crash dump analysis tools.

In code that is currently in the mainstream kernel /dev/mem can
mmap any area of memory that is not used by the kernel as ram.
So what I believe we will end up with is that /sbin/kexec (user space)
will prepare an ELF header (data) that describes the memory regions
and details where to find the kernel's register information.  The
address of that ELF header will be passed to the crash dump
capture kernel and user space combination.  Then something
(probably a user space program reading /dev/mem) will look
at the ELF header and save the already prepared ELF core
dump to disk.  Possibly doing little things like merging
the MAX_NR_CPUS note segments into one so it actually conforms
to the ELF spec.
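
The header that user space prepares could look roughly like the sketch
below: one Elf64_Ehdr, then one PT_NOTE program header per per-cpu note
area and one PT_LOAD program header per contiguous memory region.  The
struct sketch_region type, the function name, and the choice of
EM_X86_64 and little-endian encoding are assumptions for this example;
this is not the actual kexec-tools code.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one physical range. */
struct sketch_region {
    uint64_t start;   /* physical start address */
    uint64_t size;    /* length in bytes */
};

/* Writes an ELF core header into buf: Elf64_Ehdr, then n_notes PT_NOTE
 * headers, then n_mem PT_LOAD headers.  Returns the number of bytes
 * written, or 0 if buf is too small or there are too many headers. */
static size_t sketch_build_core_header(void *buf, size_t buflen,
                                       const struct sketch_region *notes,
                                       size_t n_notes,
                                       const struct sketch_region *mem,
                                       size_t n_mem)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr phdr;
    unsigned char *p = buf;
    size_t count = n_notes + n_mem;
    size_t total = sizeof(ehdr) + count * sizeof(phdr);
    size_t i;

    assert(buf != NULL);
    if (count >= 0xffff || buflen < total)
        return 0;

    memset(&ehdr, 0, sizeof(ehdr));
    memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
    ehdr.e_ident[EI_CLASS] = ELFCLASS64;
    ehdr.e_ident[EI_DATA] = ELFDATA2LSB;    /* assumption: little-endian */
    ehdr.e_ident[EI_VERSION] = EV_CURRENT;
    ehdr.e_type = ET_CORE;
    ehdr.e_machine = EM_X86_64;             /* assumption: x86_64 */
    ehdr.e_version = EV_CURRENT;
    ehdr.e_phoff = sizeof(ehdr);
    ehdr.e_ehsize = sizeof(ehdr);
    ehdr.e_phentsize = sizeof(phdr);
    ehdr.e_phnum = (Elf64_Half)count;
    memcpy(p, &ehdr, sizeof(ehdr));
    p += sizeof(ehdr);

    for (i = 0; i < count; i++) {
        int is_note = i < n_notes;
        const struct sketch_region *r = is_note ? &notes[i]
                                                : &mem[i - n_notes];

        memset(&phdr, 0, sizeof(phdr));
        phdr.p_type = is_note ? PT_NOTE : PT_LOAD;
        phdr.p_offset = r->start;           /* physical address as offset */
        phdr.p_paddr = is_note ? 0 : r->start;
        phdr.p_filesz = r->size;
        phdr.p_memsz = r->size;
        /* p_vaddr stays 0: virtual addresses are left to analysis tools */
        memcpy(p, &phdr, sizeof(phdr));
        p += sizeof(phdr);
    }
    return total;
}
```

Emitting several PT_NOTE headers mirrors the per-cpu layout discussed in
this thread; merging them into a single note segment is left to whatever
saves the dump.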

This thread started as the design discussion before finishing
that part of the implementation.   The proof of concept
implementations have happened.  We have all seen this kind
of functionality implemented.  Now is the time to come up
with a good solid design that can be maintained and merged
into the mainline kernel and distros.

So thank you for asking questions, it means we have a better chance
of getting a solid design and a design that those people who
care about this functionality can use.  And with a little luck
we can all wind up agreeing on the general principles.  You came in
a little late to this conversation so a lot of details have been
settled, but if you have a good argument for doing something another
way we can certainly look at that. 

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On 02 Feb 2005 07:45:11 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> 
> And the feedback begins :)
> 
> Itsuro Oda <[EMAIL PROTECTED]> writes:
> 
> > Hi,
> > 
> > I don't like calling crash_kexec() directly in (ex.) panic().
> > It should be call_dump_hook() (or something like this).
> > 
> > I think the necessary modifications of the kernel is only:
> > - insert the hooks that calls a dump function when crash occur
> crash_kexec()
> > - binding interface that binds a dump function to the hook
> >   (like register_dump_hook())
> sys_kexec_load(...);

For example there are people who want to execute a built-in kernel
debugger when the system has crashed, or there are people who
believe diskdump is the best dump tool :-)

So I think a sort of hook is better than calling crash_kexec 
directly. (May I make a patch ?)

Thanks.
-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On 03 Feb 2005 02:00:51 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> A better description is probably make a list of memory regions
> using an ELF header data structure in user space.  
> Use sys_kexec_load to put that list the dump kernel and a little
> big of glue code in the reserved area.  The glue code includes
> a hash of all of everything so it can all be validated before
> use.

I see. The data structure is put on a part of loading kernel's data. 

> Record the register information as ELF notes in a per cpu data
> area.  The per cpu data areas are known and enumerated in
> the list of memory regions.  The kernel knows nothing about
> the ELF header etc.
> 

I see.

> > 5) dump kernel: export all valid physical memory (and saved register
> >information) to the user. (as /dev/oldmem /proc/vmcore ?)
> 
> Or in user space, by just mmaping /dev/mem. That is part of the
> current conversation.   The only real point for putting that code in
> the kernel (besides momentum) is it is a cheap way to get the exact
> data structures of the kernel you are using.  But since:
> (a) it does not look like any primary kernel data structures need to
> be examined.
> (b) even simple compile options like SMP/NOSMP are enough to change
> the layout of the data structures.
> I think there is a pretty good case for moving all of the work to
> user space.  But you still need a kernel that loads and
> runs in the reserved area.
> 
I don't make sense. what do you mean ?

What we want to do when the system has crashed is store the whole
physical memory (and saved register information for x86 arch) to
some place (ex. a disk partition) for later analysis.
So the basic requirements for the dump kernel are:
 * supply a method to access whole (valid) physical memory.
 * supply a method to access the saved register information.

Does kdump meet these requirements?

(I am not interested in /proc/vmcore. Constructing the vmcore
 image is the job of analysis tools, not the kernel's task.)

Thanks.
-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Vivek Goyal

On Wed, 2005-02-02 at 21:12, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
> > > Vivek Goyal <[EMAIL PROTECTED]> writes:
> > 
> > "elfcorehdr=" also looks good.
> 
> Then let's go with that for now.  It is not perfect but it seems
> a little more self explanatory at first glance.
> > > A clarification on terminology we are talking about struct Elf64_Phdr
> > > here.  There is only one Elf header.  That seems to be clear farther
> > > down.
> > > 
> > 
> > 
> > Exactly. There shall be one Elf header for whole of the image. In
> > addition there will be one struct Elf64_Phdr, per contiguous physical
> > memory area. One Elf64_Phdr of PT_NOTE type for notes section and one
> > Elf64_Phdr for backup region.
> 
> Actually if we are just pointing a kernel data structures we will
> need multiple Elf64_Phdr of PT_NOTE.  Each cpu has it's own
> notes section and until the smoke clears we can't be confident
> about what is going to wind up there or how densely those will
> be packed.  So collapsing everything into a single notes segment
> needs to happen after we have switched to the crash capture kernel.


Sounds good. So there shall be a PT_NOTE type program header per cpu.
And these headers can be collapsed into one PT_NOTE type header later.
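
The collapsing step might be sketched as below: walk the notes in each
per-cpu buffer and copy them back to back into one output buffer, which a
single PT_NOTE header can then describe.  The function name is invented, and
treating a note with n_namesz == 0 as the end of a cpu's notes is an
assumption of this sketch, not something settled in the thread.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* ELF note name and descriptor fields are padded to 4-byte boundaries. */
#define SKETCH_ALIGN4(x) (((x) + 3u) & ~(size_t)3u)

/* Appends every note found in one per-cpu buffer to out, starting at
 * out_used.  Returns the new used length, or (size_t)-1 if a note is
 * truncated or out is too small. */
static size_t sketch_append_notes(unsigned char *out, size_t out_used,
                                  size_t out_cap,
                                  const unsigned char *in, size_t in_len)
{
    size_t pos = 0;

    assert(out != NULL && in != NULL && out_used <= out_cap);

    while (in_len - pos >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        size_t note_len;

        memcpy(&nh, in + pos, sizeof(nh));
        if (nh.n_namesz == 0)
            break;              /* assumed end marker: empty note */

        note_len = sizeof(nh) + SKETCH_ALIGN4((size_t)nh.n_namesz)
                              + SKETCH_ALIGN4((size_t)nh.n_descsz);
        if (note_len > in_len - pos)
            return (size_t)-1;  /* note runs past the input buffer */
        if (note_len > out_cap - out_used)
            return (size_t)-1;  /* output buffer too small */

        memcpy(out + out_used, in + pos, note_len);
        out_used += note_len;
        pos += note_len;
    }
    return out_used;
}
```

Calling this once per cpu, feeding each call the length returned by the
previous one, yields a densely packed note segment regardless of how
sparsely the per-cpu areas were laid out.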


> 
> > > I have serious concerns about the kernel generating the ELF headers
> > > and only delivering them after the kernel has crashed.  Because
> > > then we run into questions of what information can be trusted.  If we
> > > avoid that issue I am not too concerned.
> > 
> > 
> > I hope, all elf headers once prepared by kexec-tools need not to change
> > later (Cannot think of any piece of information which shall change
> > later). These shall be put in separate segment. And SHA-256 shall take
> > care of authenticity of information after crash.
> 
> That should work fine.  We need to consider through throwing in an
> extra note section with information like kernel version that
> we can capture while the system is running.
> 
> > For notes section program header, virtual = physical = 0 and "offset"
> > shall point to crash_notes[], so that notes can directly be read by the
> > capture kernel (or user space).
> 
> I agree.  But see my caveat.  I think we should have one PT_NOTE
> segment point at each element of the crash_notes[] array.  I know
> it is technically a violation of the ELF spec.  But in this case
> it makes sense.   Since we can't guarantee that crash_notes will
> be packed properly I don't know that we could reliably see more
> than one cpu if we pointed a PT_NOTE header at the whole thing.
> 
> If it turns out that we can reliably point a single PT_NOTE header
> at crash_notes so much the better but things are likely to be
> more robust if we don't start with that assumption.  That
> at least allows us the freedom to capture some notes (like NT_UTSNAME)
> before the kernel crashes.
> 
> Eric
> 


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Hirokazu Takahashi <[EMAIL PROTECTED]> writes:

> Hi Eric,
> 
> > > Hi Vivek and Eric,
> > > 
> > > IMHO, why don't we swap not only the contents of the top 640K
> > > but also kernel working memory for kdump kernel?
> > > 
> > > I guess this approach has some good points.
> > > 
> > >  1.Preallocating reserved area is not mandatory at boot time.
> > >And the reserved area can be distributed in small pieces
> > >like original kexec does.
> > > 
> > >  2.Special linking is not required for kdump kernel.
> > >Each kdump kernel can be linked in the same way,
> > >where the original kernel exists.
> > > 
> > > Am I missing something?
> > 
> > Preallocating the reserved area is largely to keep it from
> > being the target of DMA accesses.  Since we are not able
> > to shutdown any of the drivers in the primary kernel running
> > in a normal swath of memory sounds like a good way to get
> > yourself stomped at the worst possible time.
> 
> So what do you think my another idea?

I have proposed it.  I think ia64 already does that.
It has been pointed out that the PowerPC kernel occasionally runs
with the mmu turned off. So it is not a technique that is 100%
portable.

> I think we can always make a kdump kernel mapped to the same virtual
> address. So we will be free from caring about the physical address
> where the kdump kernel is loaded.
> 
> I believe the memsection functionality which LHMS project is working
> on would help this.

You don't need anything fancy except to build the page tables
during bootup.  However there are a few potential gotchas
with respect to using large pages, which can impose 4MiB or
greater alignment restrictions on the kernel.  Code-wise,
the gotcha is moving the kernel's .text section into what
is essentially the vmalloc portion of the address space.
For x86_64 the kernel's virtual address is already decoupled from the
physical addresses, so it is probably easier.
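
To make the large-page alignment gotcha concrete, here is a minimal sketch of the arithmetic, assuming 4 MiB large pages. The helper names `align_up` and `usable_span` and the example addresses are hypothetical, chosen only for illustration.

```python
LARGE_PAGE = 4 * 1024 * 1024  # 4 MiB large page (assumed page size)


def align_up(addr, alignment=LARGE_PAGE):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def usable_span(start, size, alignment=LARGE_PAGE):
    """Return (aligned_start, usable_size) of a region once it is aligned."""
    aligned = align_up(start, alignment)
    lost = aligned - start  # bytes skipped to reach the boundary
    return aligned, max(size - lost, 0)


# Hypothetical reserved region: starts 4 KiB past 16 MiB, 32 MiB long.
start = 16 * 1024 * 1024 + 4096
aligned_start, usable = usable_span(start, 32 * 1024 * 1024)
print(hex(aligned_start), usable)
```

A region that does not start on a large-page boundary loses almost a whole large page to alignment, which is one reason the placement of the reserved area matters.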

Most of this just results in easier management between the pieces,
which is a good thing.  However at the moment I don't think it
simplifies any of the core problems.  I still need to reserve
a large hunk of physical address space early on, before any
DMA transactions are set up, to hold the new kernel.

So while I am happy to see patches that improve this I don't
actually care right now.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Hirokazu Takahashi

Hi Eric,

> > Hi Vivek and Eric,
> > 
> > IMHO, why don't we swap not only the contents of the top 640K
> > but also kernel working memory for kdump kernel?
> > 
> > I guess this approach has some good points.
> > 
> >  1. Preallocating reserved area is not mandatory at boot time.
> >     And the reserved area can be distributed in small pieces
> >     like original kexec does.
> > 
> >  2. Special linking is not required for kdump kernel.
> >     Each kdump kernel can be linked in the same way,
> >     where the original kernel exists.
> > 
> > Am I missing something?
> 
> Preallocating the reserved area is largely to keep it from
> being the target of DMA accesses.  Since we are not able
> to shutdown any of the drivers in the primary kernel running
> in a normal swath of memory sounds like a good way to get
> yourself stomped at the worst possible time.

So what do you think of my other idea?

I think we can always make a kdump kernel mapped to the same virtual
address. So we will be free from caring about the physical address
where the kdump kernel is loaded.

I believe the memsection functionality which LHMS project is working
on would help this.

                              +
                              |
                              |
                        (user space)
                              |
                              |
  physical                    | virtual
  memory                      | space
         +                    +
         |                    |
         |                    |
         |                    |
         +                   .+
original |                .   | map kdump kernel here
kernel   |             .      |
         |          .         |
         |       .           .+
         +    .           .   |
         | .           .      |
         +          .         |
  kdump  |       .            |
  kernel |    .               |
         | .                  |
         +                    |
         |                    |
         |                    |
         |                    |



Thanks,
Hirokazu Takahashi.

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Hirokazu Takahashi <[EMAIL PROTECTED]> writes:

> Hi Vivek, 
> 
> > > Hi Vivek and Eric,
> > > 
> > > IMHO, why don't we swap not only the contents of the top 640K
> > > but also kernel working memory for kdump kernel?
> > 
> > 
> > Initial patches of kdump had adopted the same approach but given the
> > fact devices are not stopped during transition to new kernel after a
> > panic, it carried inherent risk of some DMA going on and corrupting the
> > new kernel/data structures. Hence the idea of running the kernel from a
> > reserved location came up. This should be DMA safe as long as DMA is not
> > misdirected.
> 
> I see, that makes sense.
> But I'm not sure yet that it's safe to access the top of 640MB.
640K?

> I wonder how kmalloc(GFP_DMA) works in a kdump kernel.

All that happens there is a one line change to vmlinux.lds.S  that
causes the kernel to live at a different physical and virtual
address.  So everything works as normal.
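
As a rough illustration of why moving the link address leaves things working: with a fixed-offset direct mapping, a kernel loaded at a different physical address simply appears at a correspondingly different virtual address. The sketch below assumes the common i386 `PAGE_OFFSET` of 0xC0000000 and a hypothetical 16 MiB load address for the capture kernel; `va` and `pa` are stand-ins for `__va()` and `__pa()`.

```python
PAGE_OFFSET = 0xC0000000   # common i386 3G/1G split (an assumption here)

DEFAULT_LOAD = 0x00100000  # 1 MiB, the conventional i386 load address
CAPTURE_LOAD = 0x01000000  # 16 MiB, hypothetical start of the reserved area


def va(phys):
    """Physical to kernel-virtual, for the direct-mapped region."""
    return phys + PAGE_OFFSET


def pa(virt):
    """Kernel-virtual to physical, for the direct-mapped region."""
    return virt - PAGE_OFFSET


for name, load in (("primary", DEFAULT_LOAD), ("capture", CAPTURE_LOAD)):
    print(name, hex(load), "->", hex(va(load)))
```

Both kernels use the same offset; only the link address differs, so allocations such as kmalloc(GFP_DMA) behave as they would in any kernel.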

I do agree that it is risky to use the first 640K for normal work.
But on the list of things to fix it is a minor wart, and even if we
back up that region of memory we don't need to use it.

There still remain a lot of code reviews to do to ensure the code is
generally safe.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Hirokazu Takahashi

Hi Vivek, 

> > Hi Vivek and Eric,
> > 
> > IMHO, why don't we swap not only the contents of the top 640K
> > but also kernel working memory for kdump kernel?
> 
> 
> Initial patches of kdump had adopted the same approach but given the
> fact devices are not stopped during transition to new kernel after a
> panic, it carried inherent risk of some DMA going on and corrupting the
> new kernel/data structures. Hence the idea of running the kernel from a
> reserved location came up. This should be DMA safe as long as DMA is not
> misdirected.

I see, that makes sense.
But I'm not sure yet that it's safe to access the top of 640MB.
I wonder how kmalloc(GFP_DMA) works in a kdump kernel.

Thanks,
Hirokazu Takahashi.

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Hirokazu Takahashi <[EMAIL PROTECTED]> writes:

> Hi Vivek and Eric,
> 
> IMHO, why don't we swap not only the contents of the top 640K
> but also kernel working memory for kdump kernel?
> 
> I guess this approach has some good points.
> 
>  1. Preallocating reserved area is not mandatory at boot time.
>     And the reserved area can be distributed in small pieces
>     like original kexec does.
> 
>  2. Special linking is not required for kdump kernel.
>     Each kdump kernel can be linked in the same way,
>     where the original kernel exists.
> 
> Am I missing something?

Preallocating the reserved area is largely to keep it from
being the target of DMA accesses.  Since we are not able
to shut down any of the drivers in the primary kernel, running
in a normal swath of memory sounds like a good way to get
yourself stomped at the worst possible time.

In addition we get to avoid running a lot of code in the
panic path if we are jumping to a contiguous region of memory
with everything already set up.

To some extent this is a contest over who has the better imagination
for things that can go wrong: real life on dying hardware and
kernels, or the programmers writing the diagnostic code.

But if it is a gamble you are willing to take, it is quite
feasible to use the reserved region for what you are
proposing, and you could run a standard kernel.

The other reason for running out of the reserved region is that
it actually requires less memory reserved.  Every byte you back up
needs to have a reserved area of memory to hold it.  And if you are
also going to fill that with meaningful content you need another
byte to hold the data.  So using a stock kernel probably requires
2/3 more memory.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
> 
> On 02 Feb 2005 08:24:03 -0700
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
> > 
> > So the kernel+initrd that captures a crash dump will live and execute
> > in a reserved area of memory.  It needs to know which memory regions
> > are valid, and it needs to know small things like the final register
> > state of each cpu. 
> 
> Exactly.
> 
> Please let me clarify what you are going to.
> 1) standard kernel: reserve a small contiguous area for a dump kernel
>    (this is not changed as the current code)
> 2) standard kernel: export the information of valid physical memory
>    regions. (/proc/iomem or /proc/cpumem etc.)
> 3) kexec (system call?): store the information of valid physical memory
>    regions as ELF program header to the reserved area (mentioned 1)).

A better description is probably: make a list of memory regions
using an ELF header data structure in user space.
Use sys_kexec_load to put that list, the dump kernel, and a little
bit of glue code in the reserved area.  The glue code includes
a hash of everything so it can all be validated before
use.
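
The user-space half of this can be sketched as follows: read a /proc/iomem-style listing, keep the top-level "System RAM" ranges, and pack one ELF header plus one PT_LOAD program header per range. This is only an illustration under stated assumptions: the function names are hypothetical, the sample listing is made up, the machine type is set to x86_64 for the example, and storing the physical address in both p_offset and p_paddr is an assumption rather than something the thread settles.

```python
import re
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr layout, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr layout, 56 bytes
ET_CORE, EM_X86_64, PT_LOAD = 4, 62, 1
PF_R = 4


def system_ram_regions(iomem_text):
    """Return [(start, size)] for top-level 'System RAM' lines."""
    regions = []
    for line in iomem_text.splitlines():
        # Nested entries are indented, so they do not match here.
        m = re.match(r"([0-9a-f]+)-([0-9a-f]+) : System RAM$", line)
        if m:
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            regions.append((start, end - start + 1))
    return regions


def build_headers(regions):
    """Pack one ELF header followed by one PT_LOAD entry per region."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian
    ehdr = EHDR.pack(ident, ET_CORE, EM_X86_64, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, len(regions), 0, 0, 0)
    # Assumption: p_offset and p_paddr both carry the physical address.
    phdrs = [PHDR.pack(PT_LOAD, PF_R, start, 0, start, size, size, 0)
             for start, size in regions]
    return ehdr + b"".join(phdrs)


SAMPLE_IOMEM = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-1ffeffff : System RAM
  00100000-002fffff : Kernel code
"""

regions = system_ram_regions(SAMPLE_IOMEM)
blob = build_headers(regions)
print(len(regions), len(blob))
```

The resulting blob is the kind of data that would be handed to sys_kexec_load alongside the dump kernel; nothing in it needs to change after the crash.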

> 4) standard kernel: when a panic occur, append (ex.) the register
>    information as ELF note after the memory information (if necessary).
>    and jump new kernel

Record the register information as ELF notes in a per cpu data
area.  The per cpu data areas are known and enumerated in
the list of memory regions.  The kernel knows nothing about
the ELF header etc.
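
For reference, an ELF note is a small self-describing record: three 32-bit words (name size, descriptor size, type) followed by the name and the descriptor, each padded to a 4-byte boundary. A minimal sketch of packing one, assuming little-endian layout, the conventional "CORE" name and NT_PRSTATUS type, and a placeholder payload in place of real register contents:

```python
import struct

NT_PRSTATUS = 1  # note type conventionally used for register state


def pad4(data):
    """Pad data with zero bytes up to a multiple of 4."""
    return data + bytes(-len(data) % 4)


def pack_note(name, note_type, desc):
    """Pack one ELF note: header words, padded name, padded descriptor."""
    name_bytes = name.encode("ascii") + b"\0"  # namesz counts the NUL
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + pad4(name_bytes) + pad4(desc)


# Placeholder payload standing in for one CPU's saved registers.
note = pack_note("CORE", NT_PRSTATUS, bytes(10))
print(len(note), struct.unpack_from("<III", note))
```

Because each note carries its own sizes, a reader can walk a per-cpu area without knowing anything else about the kernel that wrote it.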

> 5) dump kernel: export all valid physical memory (and saved register
>    information) to the user. (as /dev/oldmem /proc/vmcore ?)

Or in user space, by just mmaping /dev/mem. That is part of the
current conversation.   The only real point for putting that code in
the kernel (besides momentum) is it is a cheap way to get the exact
data structures of the kernel you are using.  But since:
(a) it does not look like any primary kernel data structures need to
be examined.
(b) even simple compile options like SMP/NOSMP are enough to change
the layout of the data structures.
I think there is a pretty good case for moving all of the work to
user space.  But you still need a kernel that loads and
runs in the reserved area.
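
A minimal sketch of the user-space approach, assuming a hypothetical helper `read_range` that maps a window of a memory device and copies out one range. The demonstration runs against a temporary file rather than /dev/mem, since reading the real device needs privileges and a valid list of regions.

```python
import mmap
import os
import tempfile


def read_range(path, start, length):
    """Copy length bytes at offset start out of path using mmap."""
    gran = mmap.ALLOCATIONGRANULARITY
    base = start - (start % gran)  # mmap offsets must be aligned
    delta = start - base
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + length, access=mmap.ACCESS_READ,
                       offset=base) as window:
            return window[delta:delta + length]
    finally:
        os.close(fd)


# Stand-in for physical memory: three allocation granules of known bytes.
gran = mmap.ALLOCATIONGRANULARITY
data = bytes(i % 251 for i in range(3 * gran))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    sample_path = tmp.name

start = gran + 5
chunk = read_range(sample_path, start, 16)
os.unlink(sample_path)
print(chunk == data[start:start + 16])
```

In a real capture environment the path would be the memory device and the (start, length) pairs would come from the ELF program headers prepared before the crash.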

> Is this correct ?  one question: how the dump kernel know the saved
> area of ELF headers ?

A command line parameter will be passed.  Probably
elfcorehdr=xxx 
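
How the capture side might pick that parameter out of its command line can be sketched as below. The value format is not fixed in this thread; the sketch assumes a plain address with an optional 0x prefix and an optional K/M/G suffix, and both function names are hypothetical.

```python
def parse_size(text):
    """Parse a number with optional 0x prefix and K/M/G suffix."""
    shifts = {"K": 10, "M": 20, "G": 30}
    shift = 0
    if text and text[-1].upper() in shifts:
        shift = shifts[text[-1].upper()]
        text = text[:-1]
    return int(text, 0) << shift


def find_elfcorehdr(cmdline):
    """Return the elfcorehdr= address from a command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "elfcorehdr" and sep:
            return parse_size(value)
    return None


print(hex(find_elfcorehdr("root=/dev/sda1 elfcorehdr=0x2000000 ro")))
```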

> one more question: I don't understand what the 640K backup area is. 
> Please let me know why it is necessary.

In practice I think we can kill it on x86.  It is necessary (at least
a subset of it is) if we want to boot an SMP kernel, as a CPU must
start running code in the first 1M of the address space.  In addition,
some architectures have exception vectors and/or other data
structures at fixed locations in memory, so in the general case a
backup area is required.  So building the infrastructure to handle
backup areas is needed, even if we later stop using it on
x86.

The other reason for the 640K backup area is the IBM guys were having
problems without it.   The fact that you don't need it is a good
indication that it is unnecessary.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Vivek Goyal

Hi,

On Thu, 2005-02-03 at 12:32, Hirokazu Takahashi wrote:
> Hi Vivek and Eric,
> 
> IMHO, why don't we swap not only the contents of the top 640K
> but also kernel working memory for kdump kernel?


Initial patches of kdump had adopted the same approach but given the
fact devices are not stopped during transition to new kernel after a
panic, it carried inherent risk of some DMA going on and corrupting the
new kernel/data structures. Hence the idea of running the kernel from a
reserved location came up. This should be DMA safe as long as DMA is not
misdirected.

Thanks 
Vivek

> 
> I guess this approach has some good points.
> 
>  1. Preallocating reserved area is not mandatory at boot time.
>     And the reserved area can be distributed in small pieces
>     like original kexec does.
> 
>  2. Special linking is not required for kdump kernel.
>     Each kdump kernel can be linked in the same way,
>     where the original kernel exists.
> 
> Am I missing something?
>  physical memory
>    +--------+
>    | 640K   +------+
>    |........|      |
>    |        |    copy
>    +--------+      |
>    |        |      |
>    |original|<-+   |
>    |kernel  |  |   |
>    |        |  |   |
>    |........|  |   |
>    |        |  |   |
>    |        |  |   |
>    |        | swap |
>    |        |  |   |
>    +--------+  |   |
>    |reserved|<-----+
>    |area    |  |
>    |        |  |
>    |kdump   |<-+
>    |kernel  |
>    +--------+
>    |        |
>    |        |
>    |        |
>    +--------+
> 
> 



Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Vivek Goyal

On Wed, 2005-02-02 at 21:12, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
>
> > On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
> > > Vivek Goyal <[EMAIL PROTECTED]> writes:
> >
> > elfcorehdr= also looks good.
>
> Then let's go with that for now.  It is not perfect but it seems
> a little more self explanatory at first glance.
> > > A clarification on terminology we are talking about struct Elf64_Phdr
> > > here.  There is only one Elf header.  That seems to be clear farther
> > > down.
> > >
> >
> >
> > Exactly. There shall be one Elf header for whole of the image. In
> > addition there will be one struct Elf64_Phdr, per contiguous physical
> > memory area. One Elf64_Phdr of PT_NOTE type for notes section and one
> > Elf64_Phdr for backup region.
>
> Actually if we are just pointing a kernel data structures we will
> need multiple Elf64_Phdr of PT_NOTE.  Each cpu has it's own
> notes section and until the smoke clears we can't be confident
> about what is going to wind up there or how densely those will
> be packed.  So collapsing everything into a single notes segment
> needs to happen after we have switched to the crash capture kernel.


Sounds good. So there shall be a PT_NOTE type program header per cpu.
And these headers can be collapsed into one PT_NOTE type header later.
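
The later collapsing step can be sketched like this: walk each per-cpu buffer note by note, stop at the first all-zero header (taken here to mean "nothing more was written", which is an assumption), and concatenate what was found into one segment. The helper names and the 64-byte buffer size are hypothetical.

```python
import struct


def align4(n):
    """Round n up to a multiple of 4."""
    return (n + 3) & ~3


def make_note(name, note_type, desc):
    """Build one ELF note (helper for the example data below)."""
    name_bytes = name.encode("ascii") + b"\0"
    return (struct.pack("<III", len(name_bytes), len(desc), note_type)
            + name_bytes.ljust(align4(len(name_bytes)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))


def iter_notes(buf):
    """Yield the raw bytes of each complete note in buf."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, note_type = struct.unpack_from("<III", buf, off)
        if namesz == 0 and descsz == 0 and note_type == 0:
            break  # empty header: end of the notes in this buffer
        end = off + 12 + align4(namesz) + align4(descsz)
        if end > len(buf):
            break  # truncated note, do not trust it
        yield buf[off:end]
        off = end


def merge_note_buffers(per_cpu_buffers):
    """Concatenate all valid notes from all per-cpu buffers."""
    return b"".join(n for buf in per_cpu_buffers for n in iter_notes(buf))


note_a = make_note("CORE", 1, b"A" * 8)
note_b = make_note("CORE", 1, b"B" * 4)
buffers = [note_a.ljust(64, b"\0"),  # cpu 0 wrote one note
           note_b.ljust(64, b"\0"),  # cpu 1 wrote one note
           bytes(64)]                # cpu 2 never got to write anything
merged = merge_note_buffers(buffers)
print(len(merged))
```

Doing this in the capture environment means a CPU that never saved its state simply contributes nothing, instead of corrupting a shared notes segment.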


 
> > > I have serious concerns about the kernel generating the ELF headers
> > > and only delivering them after the kernel has crashed.  Because
> > > then we run into questions of what information can be trusted.  If we
> > > avoid that issue I am not too concerned.
> >
> >
> > I hope, all elf headers once prepared by kexec-tools need not to change
> > later (Cannot think of any piece of information which shall change
> > later). These shall be put in separate segment. And SHA-256 shall take
> > care of authenticity of information after crash.
>
> That should work fine.  We need to consider through throwing in an
> extra note section with information like kernel version that
> we can capture while the system is running.
>
> > For notes section program header, virtual = physical = 0 and offset
> > shall point to crash_notes[], so that notes can directly be read by the
> > capture kernel (or user space).
>
> I agree.  But see my caveat.  I think we should have one PT_NOTE
> segment point at each element of the crash_notes[] array.  I know
> it is technically a violation of the ELF spec.  But in this case
> it makes sense.   Since we can't guarantee that crash_notes will
> be packed properly I don't know that we could reliably see more
> than one cpu if we pointed a PT_NOTE header at the whole thing.
>
> If it turns out that we can reliably point a single PT_NOTE header
> at crash_notes so much the better but things are likely to be
> more robust if we don't start with that assumption.  That
> at least allows us the freedom to capture some notes (like NT_UTSNAME)
> before the kernel crashes.
>
> Eric
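
The SHA-256 check discussed in this message amounts to hashing the prepared segments at load time and re-hashing them before use after the crash. A minimal sketch, assuming the segments are available as byte strings; the function names and the length-prefix framing are illustrative choices, not the actual kexec-tools scheme.

```python
import hashlib


def digest_segments(segments):
    """Return a SHA-256 hex digest covering every segment in order."""
    h = hashlib.sha256()
    for data in segments:
        # Prefix each segment with its length so boundaries are covered too.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


def segments_intact(segments, expected_digest):
    """Recompute the digest and compare it with the one stored at load."""
    return digest_segments(segments) == expected_digest


# At load time: record the digest of the prepared segments.
segments = [b"elf headers", b"capture kernel image", b"initrd"]
stored = digest_segments(segments)

# After the crash: an untouched copy verifies, a corrupted one does not.
corrupted = [segments[0], b"capture kernel imagE", segments[2]]
print(segments_intact(segments, stored), segments_intact(corrupted, stored))
```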
 


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On 03 Feb 2005 02:00:51 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> A better description is probably: make a list of memory regions
> using an ELF header data structure in user space.
> Use sys_kexec_load to put that list, the dump kernel, and a little
> bit of glue code in the reserved area.  The glue code includes
> a hash of everything so it can all be validated before
> use.

I see. The data structure is put in a part of the loaded kernel's data.

> Record the register information as ELF notes in a per cpu data
> area.  The per cpu data areas are known and enumerated in
> the list of memory regions.  The kernel knows nothing about
> the ELF header etc.
>

I see.

> > 5) dump kernel: export all valid physical memory (and saved register
> >    information) to the user. (as /dev/oldmem /proc/vmcore ?)
>
> Or in user space, by just mmaping /dev/mem. That is part of the
> current conversation.   The only real point for putting that code in
> the kernel (besides momentum) is it is a cheap way to get the exact
> data structures of the kernel you are using.  But since:
> (a) it does not look like any primary kernel data structures need to
> be examined.
> (b) even simple compile options like SMP/NOSMP are enough to change
> the layout of the data structures.
> I think there is a pretty good case for moving all of the work to
> user space.  But you still need a kernel that loads and
> runs in the reserved area.
>
This doesn't make sense to me. What do you mean?

What we want to do when the system has crashed is store the whole
physical memory (and the saved register information for the x86 arch) to
some place (e.g. a disk partition) for later analysis.
So the basic requirements for the dump kernel are:
 * supply a method to access the whole (valid) physical memory.
 * supply a method to access the saved register information.

Does kdump meet these requirements?

(I am not interested in /proc/vmcore. Constructing the vmcore
 image is the job of analysis tools, not the kernel's task.)

Thanks.
-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On 02 Feb 2005 07:45:11 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

 
> And the feedback begins :)
>
> Itsuro Oda <[EMAIL PROTECTED]> writes:
>
> > Hi,
> >
> > I don't like calling crash_kexec() directly in (ex.) panic().
> > It should be call_dump_hook() (or something like this).
> >
> > I think the necessary modifications of the kernel is only:
> > - insert the hooks that calls a dump function when crash occur
> crash_kexec()
> > - binding interface that binds a dump function to the hook
> >   (like register_dump_hook())
> sys_kexec_load(...);

For example there are people who want to execute a built-in kernel
debugger when the system has crashed, and there are people who
believe diskdump is the best dump tool :-)

So I think some sort of hook is better than calling crash_kexec
directly. (May I make a patch?)

Thanks.
-- 
Itsuro ODA [EMAIL PROTECTED]


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
>
> On 03 Feb 2005 02:00:51 -0700
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
> > > 5) dump kernel: export all valid physical memory (and saved register
> > >    information) to the user. (as /dev/oldmem /proc/vmcore ?)
> >
> > Or in user space, by just mmaping /dev/mem. That is part of the
> > current conversation.   The only real point for putting that code in
> > the kernel (besides momentum) is it is a cheap way to get the exact
> > data structures of the kernel you are using.  But since:
> > (a) it does not look like any primary kernel data structures need to
> > be examined.
> > (b) even simple compile options like SMP/NOSMP are enough to change
> > the layout of the data structures.
> > I think there is a pretty good case for moving all of the work to
> > user space.  But you still need a kernel that loads and
> > runs in the reserved area.
> >
> I don't make sense. what do you mean ?
>
> What we want to do when the system has crashed is store the whole
> physical memory (and the saved register information for the x86 arch)
> in some place (ex. a disk partition) for later analysis.
> So the basic requirements for the dump kernel are:
>  * supply a method to access the whole (valid) physical memory.
>  * supply a method to access the saved register information.
>
> Does kdump meet these requirements?

Yes.  The discussion in this area is about the best way to implement
this requirement: how much should be in the kernel and how much
should be in user space.

At the moment things are broken but should be fixed shortly.
What has been implemented so far is /dev/oldmem, which provides access
to the old memory, and /proc/vmcore, which provides both the old
memory and the register information.

> (I am not interested in /proc/vmcore. Constructing the vmcore
>  image is the job of analysis tools, not the kernel's task.)

There is a fine line there, as a simple ELF core dump has just enough
information to describe discontiguous memory, and to have an out of
band channel for register information.  Adding anything extra like
virtual addresses that match the kernel should be left for the
crash dump analysis tools.

In code that is currently in the mainstream kernel, /dev/mem can
mmap any area of memory that is not used by the kernel as RAM.
So what I believe we will end up with is that /sbin/kexec (user space)
will prepare an ELF header (data) that describes the memory regions
and details where to find the kernel's register information.  The
address of that ELF header will be passed to the crash dump
capture kernel and user space combination.  Then something
(probably a user space program reading /dev/mem) will look
at the ELF header and save the already prepared ELF core
dump to disk, possibly doing little things like merging
the MAX_NR_CPUS note segments into one so it actually conforms
to the ELF spec.
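
As a rough illustration of the user space side described above, the
following Python sketch walks an ELF header found at a known address and
copies out each segment it describes.  It is only a sketch under
assumptions: the BytesIO object stands in for /dev/mem, every address and
byte of content is invented, the program headers are assumed to sit at
hdr_addr + e_phoff, and treating p_offset as the physical address to read
follows the convention proposed in this thread, not a finished
implementation.

```python
import io
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, 56 bytes
PT_LOAD, PT_NOTE = 1, 4

def read_segments(mem, hdr_addr):
    """Yield (p_type, p_paddr, data) for each program header found at hdr_addr."""
    mem.seek(hdr_addr)
    ehdr = EHDR.unpack(mem.read(EHDR.size))
    if ehdr[0][:4] != b"\x7fELF":
        raise ValueError("no ELF header at 0x%x" % hdr_addr)
    e_phoff, e_phnum = ehdr[5], ehdr[10]
    for i in range(e_phnum):
        mem.seek(hdr_addr + e_phoff + i * PHDR.size)
        p_type, _flags, p_offset, _vaddr, p_paddr, p_filesz, _memsz, _align = \
            PHDR.unpack(mem.read(PHDR.size))
        mem.seek(p_offset)      # assumption: p_offset is where the bytes live now
        yield p_type, p_paddr, mem.read(p_filesz)

# Fake 8 KiB of "physical memory" standing in for /dev/mem (all values invented).
ram = bytearray(8192)
ram[0x100:0x108] = b"OLDKDATA"              # pretend old-kernel memory
ram[0x200:0x204] = b"REGS"                  # pretend per-cpu note area
phdrs = (PHDR.pack(PT_LOAD, 0, 0x100, 0x100, 0x100, 8, 8, 0) +
         PHDR.pack(PT_NOTE, 0, 0x200, 0, 0, 4, 4, 0))
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
ehdr = EHDR.pack(ident, 4, 62, 1, 0, EHDR.size, 0, 0,
                 EHDR.size, PHDR.size, 2, 0, 0, 0)
ram[0x1000:0x1000 + len(ehdr) + len(phdrs)] = ehdr + phdrs

segments = list(read_segments(io.BytesIO(bytes(ram)), 0x1000))
for p_type, p_paddr, data in segments:
    print(p_type, hex(p_paddr), data)
```

A real tool would stream each segment to disk instead of holding it in
memory, and would need the header address handed to it by some other
means, such as a command line option.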

This thread started as the design discussion before finishing
that part of the implementation.   The proof of concept
implementations have happened.  We have all seen this kind
of functionality implemented.  Now is the time to come up
with a good solid design that can be maintained and merged
into the mainline kernel and distros.

So thank you for asking questions; it means we have a better chance
of getting a solid design, and a design that those people who
care about this functionality can use.  And with a little luck
we can all wind up agreeing on the general principles.  You came in
a little late to this conversation so a lot of details have been
settled, but if you have a good argument for doing something another
way we can certainly look at that.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
>
> On 02 Feb 2005 07:45:11 -0700
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
> >
> > And the feedback begins :)
> >
> > Itsuro Oda <[EMAIL PROTECTED]> writes:
> >
> > > Hi,
> > >
> > > I don't like calling crash_kexec() directly in (ex.) panic().
> > > It should be call_dump_hook() (or something like this).
> > >
> > > I think the necessary modifications of the kernel is only:
> > > - insert the hooks that calls a dump function when crash occur
> > crash_kexec()
> > > - binding interface that binds a dump function to the hook
> > >   (like register_dump_hook())
> > sys_kexec_load(...);
>
> For example there are people who want to execute a built-in kernel
> debugger when the system has crashed, and there are people who
> believe diskdump is the best dump tool :-)
>
> So I think some sort of hook is better than calling crash_kexec
> directly. (May I make a patch?)

The prevalent feeling I have heard from kernel developers, and
my personal feeling as well, is that after a kernel has called
panic you can't trust it.  Which means anything running in the kernel
itself is suspect.

The crash_kexec() hook enables everything that does not get linked into
the kernel.   So I don't feel a hook in the panic path is necessary
nor do I feel that it is wise, especially with no in-kernel users.

Plus the worst part about a hook in the panic path is that it is
inherently racy.  Keeping the crash_kexec() code from blocking or
being racy has been a challenge.  And I still think that entire code
path needs a review and some more code tweaks to remove races.

If someone else wants a hook in the panic path they can add their own
hook, and make their own case for why it is needed.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-03 Thread Itsuro Oda

Hi,

On Fri, 04 Feb 2005 08:18:56 +0900
Itsuro Oda [EMAIL PROTECTED] wrote:

 
> > > 5) dump kernel: export all valid physical memory (and saved register
> > >    information) to the user. (as /dev/oldmem /proc/vmcore ?)
> >
> > Or in user space, by just mmaping /dev/mem. That is part of the
> > current conversation.   The only real point for putting that code in
> > the kernel (besides momentum) is it is a cheap way to get the exact
> > data structures of the kernel you are using.  But since:
> > (a) it does not look like any primary kernel data structures need to
> > be examined.
> > (b) even simple compile options like SMP/NOSMP are enough to change
> > the layout of the data structures.
> > I think there is a pretty good case for moving all of the work to
> > user space.  But you still need a kernel that loads and
> > runs in the reserved area.
> >
> I don't make sense. what do you mean ?
>

"I don't make sense." should be "It does not make sense."
Sorry, I'm not familiar with English.

-- 
Itsuro ODA [EMAIL PROTECTED]


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Itsuro Oda

Hi,

On 02 Feb 2005 08:24:03 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:
> 
> So the kernel+initrd that captures a crash dump will live and execute
> in a reserved area of memory.  It needs to know which memory regions
> are valid, and it needs to know small things like the final register
> state of each cpu. 

Exactly.

Please let me clarify what you are going to do.
1) standard kernel: reserve a small contiguous area for a dump kernel
   (this is unchanged from the current code)
2) standard kernel: export the information of valid physical memory
   regions. (/proc/iomem or /proc/cpumem etc.)
3) kexec (system call?): store the information of valid physical memory
   regions as ELF program headers in the reserved area (mentioned in 1)).
4) standard kernel: when a panic occurs, append (ex.) the register
   information as an ELF note after the memory information (if necessary)
   and jump to the new kernel.
5) dump kernel: export all valid physical memory (and saved register
   information) to the user. (as /dev/oldmem /proc/vmcore ?)

Is this correct?  One question: how does the dump kernel know where the
ELF headers are saved?

One more question: I don't understand what the 640K backup area is.
Please let me know why it is necessary.
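
For step 2, a user space tool could pick the valid RAM ranges out of
/proc/iomem text while the system is still healthy.  The Python sketch
below shows one way this might look.  The sample listing is invented for
illustration (it is not output from a real machine), and matching the
label "System RAM" on unindented lines is an assumption about the file
layout.

```python
import re

# Invented sample in the general style of /proc/iomem.
SAMPLE_IOMEM = """\
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
00100000-1ffeffff : System RAM
  00100000-002d5e23 : Kernel code
  01000000-01ffffff : Crash kernel
1fff0000-1fffffff : ACPI Tables
"""

LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")

def ram_regions(text):
    """Return [(start, end_inclusive)] for top-level 'System RAM' entries."""
    regions = []
    for line in text.splitlines():
        m = LINE.match(line)
        # Indented lines are sub-resources of the entry above them; skip those.
        if m and m.group(1) == "" and m.group(4) == "System RAM":
            regions.append((int(m.group(2), 16), int(m.group(3), 16)))
    return regions

for start, end in ram_regions(SAMPLE_IOMEM):
    print("%08x-%08x (%d KiB)" % (start, end, (end - start + 1) // 1024))
```

Each returned range would then become one entry in whatever structure is
stored in the reserved area in step 3.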

Thanks.
-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Hirokazu Takahashi

Hi Vivek and Eric,

IMHO, why don't we swap not only the contents of the top 640K
but also the kernel working memory for the kdump kernel?

I guess this approach has some good points.

 1. Preallocating the reserved area is not mandatory at boot time,
    and the reserved area can be distributed in small pieces
    like the original kexec does.

 2. Special linking is not required for the kdump kernel.
    Each kdump kernel can be linked in the same way,
    to run where the original kernel exists.

Am I missing something?


 physical memory
   +-------+
   | 640K  +-------+
   |.......|       |
   |       |      copy
   +-------+       |
   |       |       |
   |original<-+    |
   |kernel |  |    |
   |       |  |    |
   |.......|  |    |
   |       |  |    |
   |       |  |    |
   |       | swap  |
   |       |  |    |
   +-------+  |    |
   |reserved<------+
   |area   |  |
   |       |  |
   |kdump  |<-+
   |kernel |
   +-------+
   |       |
   |       |
   |       |
   +-------+



> Hi Eric,
> 
> It looks like we are looking at things a little differently. I
> see a portion of the picture in your mind, but obviously not 
> entirely.
> 
> Perhaps, we need to step back and iron out in specific terms what 
> the interface between the two kernels should be in the crash dump
> case, and the distribution of responsibility between kernel, user space
> and the user. 
> 
> [BTW, the patch was intended as a step in development up for
> comment early enough to be able to get agreement on the interface
> and think issues through to more completeness before going 
> too far. Sorry, if that wasn't apparent.]
> 
> When you say "evil intermingling", I'm guessing you mean the
> "crashbackup=" boot parameter ? If so, then yes, I agree it'd
> be nice to find a way around it that doesn't push hardcoding
> elsewhere.
> 
> Let me explain the interface/approach I was looking at.
> 
> 1.First kernel reserves some area of memory for crash/capture kernel as
> specified by [EMAIL PROTECTED] boot time parameter.
> 
> 2.First kernel marks the top 640K of this area as backup area. (If
> architecture needs it.) This is sort of a hardcoding and probably this
> space reservation can be managed from user space as well as mentioned by
> you in this mail below.
> 
> 3. Location of backup region is exported through /proc/iomem which can
> be read by user space utility to pass this information to purgatory code
> to determine where to copy the first 640K.
> 
> Note that we do not make any additional reservation for the 
> backup region. We carve this out from the top of the already 
> reserved region and export it through /proc/iomem so that 
> the user space code and the capture kernel code need not 
> make any assumptions about where this region is located.
> 
> 4. Once the capture kernel boots, it needs to know the location of
> backup region for two purposes.
> 
> a. It should not overwrite the backup region.
> 
> b. There needs to be a way for the capture tool to access the original
>contents of the backed up region
> 
> Boot time parameter [EMAIL PROTECTED] has been provided to pass this
> information to capture kernel. This parameter is valid only for capture
> kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.
> 
> 
> > What is wrong with user space doing all of the extra space
> > reservation?
> 
> Just for clarity, are you suggesting kexec-tools creating an additional
> segment for the backup region and pass the information to kernel.
> 
> There is no problem in doing reservation from user space except
> one. How does the user and in-turn capture kernel come to know the
> location of backup region, assuming that the user is going to provide
> the exactmap for capture kernel to boot into.
> 
> Just a thought, is it  a good idea for kexec-tools to be creating and
> passing memmap parameters doing appropriate adjustment for backup
> region.
> 
> I had another question. How is the starting location of elf headers 
> communicated to capture tool? Is parameter segment a good idea? or 
> some hardcoding? 
> 
> Another approach can be that backup area information is encoded in elf
> headers and capture kernel is booted with modified memmap (User gets
> backup region information from /proc/iomem) and capture tool can
> extract backup area information from elf headers as stored by first
> kernel.
> 
> Could you please elaborate a little more on what aspect of your view
> differs from the above.
> 
> Thanks
> Vivek

Thanks,
Hirokazu Takahashi.


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

> On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
> > Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> "elfcorehdr=" also looks good.

Then let's go with that for now.  It is not perfect but it seems
a little more self explanatory at first glance.

> > A clarification on terminology we are talking about struct Elf64_Phdr
> > here.  There is only one Elf header.  That seems to be clear farther
> > down.
> > 
> 
> 
> Exactly. There shall be one Elf header for whole of the image. In
> addition there will be one struct Elf64_Phdr, per contiguous physical
> memory area. One Elf64_Phdr of PT_NOTE type for notes section and one
> Elf64_Phdr for backup region.

Actually if we are just pointing at kernel data structures we will
need multiple Elf64_Phdr of PT_NOTE.  Each cpu has its own
notes section and until the smoke clears we can't be confident
about what is going to wind up there or how densely those will
be packed.  So collapsing everything into a single notes segment
needs to happen after we have switched to the crash capture kernel.

> > I have serious concerns about the kernel generating the ELF headers
> > and only delivering them after the kernel has crashed.  Because
> > then we run into questions of what information can be trusted.  If we
> > avoid that issue I am not too concerned.
> 
> 
> I hope, all elf headers once prepared by kexec-tools need not to change
> later (Cannot think of any piece of information which shall change
> later). These shall be put in separate segment. And SHA-256 shall take
> care of authenticity of information after crash.

That should work fine.  We need to consider throwing in an
extra note section with information like kernel version that
we can capture while the system is running.
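
The checksum idea quoted above can be sketched in a few lines of Python:
hash the prepared header segment while the primary kernel is still sane,
then re-hash it after the crash before trusting it.  The function names
and the 64-byte stand-in segment are illustrative only, and where the
recorded digest is kept is left open here.

```python
import hashlib

def digest(segment: bytes) -> bytes:
    """SHA-256 of a segment, computed while the primary kernel is still sane."""
    return hashlib.sha256(segment).digest()

def still_intact(segment: bytes, recorded: bytes) -> bool:
    """Re-hash after the crash and compare with the recorded digest."""
    return hashlib.sha256(segment).digest() == recorded

headers = b"\x7fELF" + bytes(60)    # stand-in for the prepared header segment
recorded = digest(headers)

damaged = bytearray(headers)
damaged[10] ^= 0x01                 # simulate a stray write after the crash

print(still_intact(headers, recorded))          # True
print(still_intact(bytes(damaged), recorded))   # False
```

This only works for data that does not change after it is prepared, which
is why the per-cpu notes written at crash time cannot be covered the same
way.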

> For notes section program header, virtual = physical = 0 and "offset"
> shall point to crash_notes[], so that notes can directly be read by the
> capture kernel (or user space).

I agree.  But see my caveat.  I think we should have one PT_NOTE
segment point at each element of the crash_notes[] array.  I know
it is technically a violation of the ELF spec.  But in this case
it makes sense.   Since we can't guarantee that crash_notes will
be packed properly I don't know that we could reliably see more
than one cpu if we pointed a PT_NOTE header at the whole thing.

If it turns out that we can reliably point a single PT_NOTE header
at crash_notes so much the better but things are likely to be
more robust if we don't start with that assumption.  That
at least allows us the freedom to capture some notes (like NT_UTSNAME)
before the kernel crashes.
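
To make the per-cpu idea concrete, here is a Python sketch of how a
capture-side tool might later collapse several per-cpu note slots into
one notes segment.  It assumes each cpu gets a fixed-size, zero-filled
slot and that an all-zero note header marks the end of the valid notes
in a slot; both are assumptions for illustration, as are the slot size
and the register bytes.

```python
import struct

NHDR = struct.Struct("<III")    # ELF note header: n_namesz, n_descsz, n_type

def pad4(n):
    """Round n up to the 4-byte alignment used inside note segments."""
    return (n + 3) & ~3

def make_note(name: bytes, ntype: int, desc: bytes) -> bytes:
    name += b"\0"
    return (NHDR.pack(len(name), len(desc), ntype)
            + name.ljust(pad4(len(name)), b"\0")
            + desc.ljust(pad4(len(desc)), b"\0"))

def used_bytes(slot: bytes) -> int:
    """Length of the valid notes at the start of a zero-padded per-cpu slot."""
    pos = 0
    while pos + NHDR.size <= len(slot):
        namesz, descsz, ntype = NHDR.unpack_from(slot, pos)
        if namesz == 0 and descsz == 0 and ntype == 0:
            break                           # empty header: nothing more here
        pos += NHDR.size + pad4(namesz) + pad4(descsz)
    return min(pos, len(slot))

def merge_slots(slots):
    """Concatenate the valid part of every slot into one notes segment."""
    return b"".join(slot[:used_bytes(slot)] for slot in slots)

SLOT = 64                                   # hypothetical slot size
NT_PRSTATUS = 1
cpu0 = make_note(b"CORE", NT_PRSTATUS, b"\x11" * 8).ljust(SLOT, b"\0")
cpu1 = bytes(SLOT)                          # this cpu never saved anything
cpu2 = make_note(b"CORE", NT_PRSTATUS, b"\x22" * 8).ljust(SLOT, b"\0")

merged = merge_slots([cpu0, cpu1, cpu2])
print(len(merged))                          # 56: two 28-byte notes
```

Pointing a single PT_NOTE header at all three slots would instead leave
a gap of zero padding after the first note, which is the packing problem
described above.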

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Koichi Suzuki <[EMAIL PROTECTED]> writes:

> Itsuro Oda wrote:
> > Hi,
> > I can't understand why ELF format is necessary.
> > I think the only necessary information is "what physical address regions are
> > valid to read". This information is necessary for any
> > sort of dump tools. (and must get it while the system is normal.)
> > The Eric's /proc/cpumem idea sounds nice to me.
> 
> I agree.  Format conversion should be done in healthy system separately and we
> should restrict what to do while taking the dump as few as possible.
> Conversion from just memory image to crash/lcrash format will be very useful
> to use existing tools and experiences.   I already have such tool and (if my
> administration allows) I can make such tool open. Let me do some paperwork.

The big part of the conversation that is happening right now is how
do we uncouple dependencies between the various parts as much as
possible.  There is nothing here about format conversions except
as to convert weird kernel formats into a stable interface.

There are 3 pieces of code interacting.
1) The primary kernel that will call panic.
2) The kernel+initrd that takes over.
3) The user space that sets it all up (/sbin/kexec) while the primary
   kernel is still in a sane state.

The goal is to make those 3 pieces as independent of each other as
reasonably possible.

So the kernel+initrd that captures a crash dump will live and execute
in a reserved area of memory.  It needs to know which memory regions
are valid, and it needs to know small things like the final register
state of each cpu.  For the set of valid memory regions it is the
intention to encode that as an array of ELF program headers.  The
information of what the final register contents were will be encoded
as ELF notes.  There will be one PT_NOTE segment per cpu that holds
the notes needed to encode a given cpu's final state.  It really
does not matter to the implementation that captures each cpu's final
register state which format we record the data in, so using a format
designed not to change is not a problem.  So all that needs
to be communicated to the kernel+initrd that captures a crash
dump is the location of an ELF header and it can figure out all of
the rest.
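
A minimal sketch of what such a header could look like, written in
Python with the struct module: one ELF header, one PT_LOAD program
header per valid memory region, and one PT_NOTE program header per cpu.
The field layouts are the standard Elf64_Ehdr and Elf64_Phdr ones; the
x86_64 machine type, the zero flags, the example regions, and the note
area addresses are assumptions made up for the illustration.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, 56 bytes
ET_CORE, EM_X86_64, EV_CURRENT = 4, 62, 1
PT_LOAD, PT_NOTE = 1, 4

def build_core_header(ram_regions, note_areas):
    """ram_regions: [(phys_start, size)]; note_areas: [(phys_addr, size)], one per cpu."""
    phdrs = []
    for start, size in ram_regions:
        # offset == vaddr == paddr: the bytes are read from where they already are
        phdrs.append(PHDR.pack(PT_LOAD, 0, start, start, start, size, size, 0))
    for addr, size in note_areas:
        # offset points at the cpu's note area; vaddr and paddr stay 0
        phdrs.append(PHDR.pack(PT_NOTE, 0, addr, 0, 0, size, size, 0))
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)    # 64-bit, little endian
    ehdr = EHDR.pack(ident, ET_CORE, EM_X86_64, EV_CURRENT, 0,
                     EHDR.size, 0, 0, EHDR.size, PHDR.size, len(phdrs), 0, 0, 0)
    return ehdr + b"".join(phdrs)

# Two made-up RAM regions and two made-up per-cpu note areas.
hdr = build_core_header(
    [(0x0, 0xA0000), (0x100000, 0x3FF00000)],
    [(0x1F000000, 1024), (0x1F000400, 1024)],
)
print(len(hdr))     # 64 + 4 * 56 = 288
```

Because nothing in this blob depends on the state of the kernel at crash
time, it can be written once by user space and left alone.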

For the primary kernel, except for remembering its final cpu
register state as it dies, it does nothing except jump to the
crash recovery kernel.  All of the interesting information will
be exported to user space.

/sbin/kexec is the glue that fills in the cracks.  While
the primary kernel is in a sane state it sets everything up including
finding out which memory areas need to be looked at.  And it stashes
it all in a reserved area of memory, that has never been the target
of DMA transfers.

The goal is to reduce the dependencies as much as possible.  So
an old stable kernel can take a crash dump of a new buggy kernel.
And so that you don't have to be running the latest and greatest
user space simply to set everything up.  Although it is still
better to require a user-space upgrade to cope with new
kernels than to require the crash capture kernel+initrd to
be upgraded.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

And the feedback begins :)

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
> 
> I don't like calling crash_kexec() directly in (ex.) panic().
> It should be call_dump_hook() (or something like this).
> 
> I think the necessary modifications of the kernel is only:
> - insert the hooks that calls a dump function when crash occur
crash_kexec()
> - binding interface that binds a dump function to the hook
>   (like register_dump_hook())
sys_kexec_load(...);
> - supply the information of valid physical address regions
/proc/iomem or possibly /proc/cpumem.  At least until someone
actually implements hot plug memory support.

> (- maybe some existent functions and variables need to be exported ?)
> 
> I think this makes any sort of dump functions can be implemented
> as a kernel module. I don't think it is best way that the "kexec based 
> crashdump" is built in the kernel.

For people developing code outside of the kernel I can see where
this is a problem.  Given the insane auditing requirements necessary
to get a reliable code path I don't see how not putting the implementation
in the kernel is sane.  Anything that needs to be touched at that point
is core kernel functionality GPL_ONLY if it is exported at all.
Touching anything from a module at that point is not sane.

Basically the code path setup with crash_kexec is little more
than a jump instruction.  And it should be audited and reduced
as much as possible.  I don't see how you get simpler or what
piece of functionality could possibly improve by having multiple
implementations in kernel modules.

Eric


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Koichi Suzuki <[EMAIL PROTECTED]> writes:

> I meant with kexec and dump hook, there could be many more things can be
> done in addition to full core dump.  Initiating failover to other node
> will be one example.   Starting with this hook, there must be many good
> ideas.   So my idea is to make this hook general purpose, not for specific
> core dump tool.

Again that is what has been implemented.  A fully stand alone executable
that lives at an independent and reserved address in memory is jumped
to.

The goal in the generic kernel is to keep the code path to do that
as small and as simple as possible to reduce the chances of it being
mis-implemented, or the chances of attempting to use corrupted kernel
functionality.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Itsuro Oda <[EMAIL PROTECTED]> writes:

> Hi,
> 
> I can't understand why ELF format is necessary.

The ELF format is not.  However, essentially the information an ELF
header provides is.  So using an ELF header to convey that information
is a sane choice of data structure.
 
> I think the only necessary information is "what physical address 
> regions are valid to read". This information is necessary for any
> sort of dump tools. (and must get it while the system is normal.)
> The Eric's /proc/cpumem idea sounds nice to me. 

Patches welcome.

Eric


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Vivek Goyal

On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > Well, trying to put the already discussed ideas together.  I was
> > planning to work on following design. Please comment.
> > 
> > Crashed Kernel <--> Capture Kernel (or User Space) Interface:
> > -------------------------------------------------------------
> > 
> > The whole idea is that Crash image is represented in ELF Core format.
> > These ELF Headers are prepared by kexec-tools user space and put in one
> > segment. Address of start of image is passed to the capture kernel(or
> > user space) using one command line (eg. crashimage=). Now either kernel
> > space or user space can parse the elf headers and extract required
> > information and export final kernel elf core image.
> 
> Sounds sane.  We need to make certain there is a checksum of that
> region but putting it in a separate segment should ensure that.
> 
> I also think we need to look at the name crashimage= and see if we
> can find something more descriptive.  But that is minor.  Possibly
> elfcorehdr=  We have a little while to think about that one before we
> are stuck.

"elfcorehdr=" also looks good.

> 
> > > [EMAIL PROTECTED] wrote: 
> > > If we were using an ELF header I would include one PT_NOTE program 
> > > header per cpu (Giving each cpu it's own area to mess around in).
> > > And I would use one PT_LOAD segment per possible memory zone.
> > > So in the worst case (current sgi altix) (MAX_NUMNODES=256,
> > > MAX_NR_ZONES=3, MAX_NR_CPUS=1024) 256*3+1024 = 1792 program
> > > headers.   At 56 bytes per 64bit program header that is 100352 bytes
> > > or 98KiB.  A little bit expensive.  A tuned data structure with
> > > 64bit base and size would only consume 1792*16 = 28672 or 28KiB.
> > 
> > If I prepare One elf header for each physical contiguous memory area (as
> > obtained from /proc/iomem) instead of per zone, then number of elf
> > headers will come down significantly. 
> 
> A clarification on terminology we are talking about struct Elf64_Phdr
> here.  There is only one Elf header.  That seems to be clear farther
> down.
> 


Exactly. There shall be one ELF header for the whole image. In
addition there will be one struct Elf64_Phdr per contiguous physical
memory area, one Elf64_Phdr of PT_NOTE type for the notes section, and one
Elf64_Phdr for the backup region.
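
The size estimates traded in this exchange are easy to check.  The
Python sketch below recomputes them from the layout of Elf64_Phdr; the
helper name is made up, and the counts are the worst-case figures
quoted in this message.

```python
import struct

# Elf64_Phdr: p_type and p_flags (32-bit each), then six 64-bit fields.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

def phdr_table_bytes(n_load, n_note, entry_size=ELF64_PHDR.size):
    """Bytes needed for n_load PT_LOAD plus n_note PT_NOTE entries."""
    return (n_load + n_note) * entry_size

MAX_NUMNODES, MAX_NR_ZONES, MAX_NR_CPUS = 256, 3, 1024

per_zone = phdr_table_bytes(MAX_NUMNODES * MAX_NR_ZONES, MAX_NR_CPUS)
per_node = phdr_table_bytes(MAX_NUMNODES, MAX_NR_CPUS)
tuned = phdr_table_bytes(MAX_NUMNODES * MAX_NR_ZONES, MAX_NR_CPUS, entry_size=16)

for label, size in (("one PT_LOAD per zone", per_zone),
                    ("one PT_LOAD per node", per_node),
                    ("16-byte base+size pairs", tuned)):
    print("%s: %d bytes = %d KiB" % (label, size, size // 1024))
```

This gives 100352 bytes (98 KiB), 71680 bytes (70 KiB) and 28672 bytes
(28 KiB) respectively, matching the numbers in the discussion.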


> > I don't have any idea on number of
> > actual physically contiguous regions present per machine, but roughly
> > assuming it to be 1 per node, it will lead to 256 + 1024 = 1280 program
> > headers. At 56 bytes per 64 bit program header this will amount to 70KB.
> > 
> > This is worst case estimate and on lower end machines this will require
> > much less a space. On machines as big as 1024 cpus, this should not be a
> > concern, as big machines come with big RAMs.
> 
> Agreed.  Size is not the primary issue.  There is some clear waste
> but that is a secondary concern.  Not performing a 1-1 mapping
> to the kernel data structures also seems to be a win, as the concepts
> are noticeably different.
>  
> > Eric, do you still think that ELF headers are inappropriate to be passed
> > across interface boundary.
> 
> I have serious concerns about the kernel generating the ELF headers
> and only delivering them after the kernel has crashed.  Because
> then we run into questions of what information can be trusted.  If we
> avoid that issue I am not too concerned.


I hope all ELF headers, once prepared by kexec-tools, need not change
later (I cannot think of any piece of information which would change
later). They shall be put in a separate segment, and SHA-256 shall take
care of the authenticity of the information after a crash.


>  
> > Regarding Backup Region
> > -----------------------
> > 
> > - Kexec user space does the reservation for backup region segment.
> > - Purgatory copies the backup data to backup region. (Already
> > implemented)
> > - A separate elf header is prepared to represent backed up memory
> > region. And "offset" field of this program header can contain the actual
> > physical address where backup contents are stored. 
> 
> I like that.  I was thinking a virtual versus physical address
> separation.  But using the offset field is much more appropriate,
> and it leaves us the potential of doing something nice like specifying
> the kernels virtual address later on.  Looking exclusively at the
> offset field to know which memory addresses to dump sounds good.
> For now we should have virtual==physical==offset except for the
> backup region.


For notes section program header, virtual = physical = 0 and "offset"
shall point to crash_notes[], so that notes can directly be read by the
capture kernel (or user space).
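
The role of the "offset" field can be shown with a small Python sketch:
given the program headers, it maps an address in the crashed kernel's
physical address space to the place where those bytes can be read now.
For ordinary memory the two are equal; for the backed up 640K they
differ.  The Phdr tuple, the function name, and all addresses are
hypothetical values chosen for the example.

```python
from collections import namedtuple

Phdr = namedtuple("Phdr", "p_offset p_paddr p_filesz")

def where_to_read(phdrs, old_phys):
    """Map an old-kernel physical address to the address holding its bytes now."""
    for ph in phdrs:
        if ph.p_paddr <= old_phys < ph.p_paddr + ph.p_filesz:
            return ph.p_offset + (old_phys - ph.p_paddr)
    raise LookupError("0x%x is not covered by any segment" % old_phys)

KIB = 1024
phdrs = [
    # Low 640K was copied into the backup region, so offset != paddr.
    Phdr(p_offset=0x01F60000, p_paddr=0x00000000, p_filesz=640 * KIB),
    # Memory left untouched: offset == paddr.
    Phdr(p_offset=0x00100000, p_paddr=0x00100000, p_filesz=0x00F00000),
]

print(hex(where_to_read(phdrs, 0x1000)))      # 0x1f61000
print(hex(where_to_read(phdrs, 0x200000)))    # 0x200000
```

A dump tool that only ever looks at the offset field never needs to know
that a backup region exists at all.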

Thanks
Vivek


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Vivek Goyal

On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
 Vivek Goyal [EMAIL PROTECTED] writes:
 
  Well, trying to put the already discussed ideas together.  I was
  planning to work on following design. Please comment.
  
  Crashed Kernel --Capture Kernel(or User Space) Interface:
  --
  
  The whole idea is that Crash image is represented in ELF Core format.
  These ELF Headers are prepared by kexec-tools user space and put in one
  segment. Address of start of image is passed to the capture kernel(or
  user space) using one command line (eg. crashimage=). Now either kernel
  space or user space can parse the elf headers and extract required
  information and export final kernel elf core image.
 
 Sounds sane.  We need to make certain there is a checksum of that
 region but putting it in a separate segment should ensure that.
 
 I also think we need to look at the name crashimage= and see if we
 can find something more descriptive.  But that is minor.  Possibly
 elfcorehdr=  We have a little while to think about that one before we
 are stuck.

elfcorehdr= also looks good.

 
> > > [EMAIL PROTECTED] wrote:
> > > If we were using an ELF header I would include one PT_NOTE program
> > > header per cpu (Giving each cpu it's own area to mess around in).
> > > And I would use one PT_LOAD segment per possible memory zone.
> > > So in the worst case (current sgi altix) (MAX_NUMNODES=256,
> > > MAX_NR_ZONES=3, MAX_NR_CPUS=1024) 256*3+1024 = 1792 program
> > > headers.   At 56 bytes per 64bit program header that is 100352 bytes
> > > or 98KiB.  A little bit expensive.  A tuned data structure with
> > > 64bit base and size would only consume 1792*16 = 28672 or 28KiB.
> >
> > If I prepare One elf header for each physical contiguous memory area (as
> > obtained from /proc/iomem) instead of per zone, then number of elf
> > headers will come down significantly.
>
> A clarification on terminology we are talking about struct Elf64_Phdr
> here.  There is only one Elf header.  That seems to be clear farther
> down.
 


Exactly. There shall be one Elf header for whole of the image. In
addition there will be one struct Elf64_Phdr, per contiguous physical
memory area. One Elf64_Phdr of PT_NOTE type for notes section and one
Elf64_Phdr for backup region.
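
As a rough illustration of "one Elf64_Phdr per contiguous physical memory area", the sketch below turns /proc/iomem style text into packed PT_LOAD entries. It is not the kexec-tools implementation. The helper names and the sample text are made up for this example, and the struct format string is a transcription of the standard Elf64_Phdr field order.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

def system_ram_ranges(iomem_text):
    """Return (start, end) for top-level 'System RAM' lines; end is inclusive."""
    ranges = []
    for line in iomem_text.splitlines():
        if line[:1].isspace() or " : " not in line:
            continue  # skip nested (indented) resources and malformed lines
        span, name = line.split(" : ", 1)
        if name.strip() != "System RAM":
            continue
        start, end = (int(x, 16) for x in span.split("-"))
        ranges.append((start, end))
    return ranges

def load_phdr(start, end):
    """Pack one PT_LOAD entry with offset == vaddr == paddr == start."""
    size = end - start + 1
    return ELF64_PHDR.pack(PT_LOAD, PF_R | PF_W | PF_X,
                           start, start, start, size, size, 0)

# Hypothetical sample, shaped like /proc/iomem output.
SAMPLE = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-1ffeffff : System RAM
  00100000-002fffff : Kernel code
1fff0000-1fffffff : ACPI Tables
"""

table = b"".join(load_phdr(s, e) for s, e in system_ram_ranges(SAMPLE))
print(len(table) // ELF64_PHDR.size, "PT_LOAD headers")
```

On a live system the text would come from reading /proc/iomem while the primary kernel is still healthy.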


> > I don't have any idea on number of
> > actual physically contiguous regions present per machine, but roughly
> > assuming it to be 1 per node, it will lead to 256 + 1024 = 1280 program
> > headers. At 56 bytes per 64 bit program header this will amount to 70KB.
> >
> > This is worst case estimate and on lower end machines this will require
> > much less a space. On machines as big as 1024 cpus, this should not be a
> > concern, as big machines come with big RAMs.
>
> Agreed.  Size is not the primary issue.  There is some clear waste
> but that is a secondary concern.  Not performing a 1-1 mapping
> to the kernel data structures also seems to be a win, as the concepts
> are noticeably different.
>
> > Eric, do you still think that ELF headers are inappropriate to be passed
> > across interface boundary.
>
> I have serious concerns about the kernel generating the ELF headers
> and only delivering them after the kernel has crashed.  Because
> then we run into questions of what information can be trusted.  If we
> avoid that issue I am not too concerned.


I hope that the ELF headers, once prepared by kexec-tools, need not change
later (I cannot think of any piece of information that would change
later). They shall be put in a separate segment, and SHA-256 shall take
care of the authenticity of the information after a crash.
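
A minimal sketch of the checksum idea, assuming the segments are available as (load address, bytes) pairs. The thread does not describe how purgatory lays out what it hashes, so the function name and the choice to hash address and length along with the data are illustrative only.

```python
import hashlib

def digest_segments(segments):
    """Hex SHA-256 over (load_address, data) pairs, in address order."""
    h = hashlib.sha256()
    for addr, data in sorted(segments):
        h.update(addr.to_bytes(8, "little"))       # where the segment lives
        h.update(len(data).to_bytes(8, "little"))  # how long it is
        h.update(data)                             # its contents
    return h.hexdigest()

# At load time, while the primary kernel is still sane (hypothetical data):
segments = [(0x01000000, b"capture kernel image"),
            (0x01400000, b"elf core headers")]
recorded = digest_segments(segments)

# After the crash: recompute and compare before trusting anything.
damaged = [(0x01000000, b"capture kernel image"),
           (0x01400000, b"elf core hEaders")]
print(digest_segments(segments) == recorded)  # unchanged segments match
print(digest_segments(damaged) == recorded)   # a flipped byte does not
```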


  
> > Regarding Backup Region
> > ---
> >
> > - Kexec user space does the reservation for backup region segment.
> > - Purgatory copies the backup data to backup region. (Already
> > implemented)
> > - A separate elf header is prepared to represent backed up memory
> > region. And "offset" field of this program header can contain the actual
> > physical address where backup contents are stored.
>
> I like that.  I was thinking a virtual versus physical address
> separation.  But using the offset field is much more appropriate,
> and it leaves us the potential of doing something nice like specifying
> the kernels virtual address later on.  Looking exclusively at the
> offset field to know which memory addresses to dump sounds good.
> For now we should have virtual==physical==offset except for the
> backup region.


For notes section program header, virtual = physical = 0 and offset
shall point to crash_notes[], so that notes can directly be read by the
capture kernel (or user space).
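
The offset convention discussed above (virtual == physical == offset everywhere except the backup region) can be sketched as a lookup. The Phdr tuple, the function name, and the addresses are hypothetical. Only the rule "read at the offset field" is taken from the discussion.

```python
from collections import namedtuple

Phdr = namedtuple("Phdr", "p_type p_offset p_paddr p_filesz")
PT_LOAD = 1
BACKUP_SIZE = 640 * 1024

def where_to_read(phdrs, old_paddr):
    """Map a physical address of the crashed kernel to where its bytes are now."""
    for ph in phdrs:
        if ph.p_type == PT_LOAD and ph.p_paddr <= old_paddr < ph.p_paddr + ph.p_filesz:
            return ph.p_offset + (old_paddr - ph.p_paddr)
    raise LookupError("address not covered by any PT_LOAD header")

backup_at = 0x01000000  # hypothetical location of the backup region
phdrs = [
    # First 640K: described at physical 0, but the contents were copied away.
    Phdr(PT_LOAD, backup_at, 0x0, BACKUP_SIZE),
    # Ordinary memory: offset == physical.
    Phdr(PT_LOAD, 0x00100000, 0x00100000, 0x00F00000),
]

print(hex(where_to_read(phdrs, 0x1000)))      # inside the backed-up 640K
print(hex(where_to_read(phdrs, 0x00200000)))  # ordinary memory, unchanged
```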

Thanks
Vivek


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Itsuro Oda [EMAIL PROTECTED] writes:

> Hi,
>
> I can't understand why ELF format is necessary.

The ELF format itself is not.  However, essentially the information an
ELF header provides is.  So using an ELF header to convey that information
is a sane choice of data structure.

> I think the only necessary information is "what physical address
> regions are valid to read". This information is necessary for any
> sort of dump tools. (and must get it while the system is normal.)
> The Eric's /proc/cpumem idea sounds nice to me.

Patches welcome.

Eric


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Koichi Suzuki [EMAIL PROTECTED] writes:

> I meant with kexec and dump hook, there could be many more things can be
> done in addition to full core dump.  Initiating failover to other node
> will be one example.   Starting with this hook, there must be many good
> ideas.   So my idea is to make this hook general purpose, not for
> specific core dump tool.

Again, that is what has been implemented.  A fully stand alone executable
that lives in an independent and reserved address in memory is jumped
to.

The goal in the generic kernel is to keep the code path to do that
as small and as simple as possible to reduce the chances of it being
mis-implemented, or the chances of attempting to use corrupted kernel
functionality.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman


And the feedback begins :)

Itsuro Oda [EMAIL PROTECTED] writes:

> Hi,
>
> I don't like calling crash_kexec() directly in (ex.) panic().
> It should be call_dump_hook() (or something like this).
>
> I think the necessary modifications of the kernel is only:
> - insert the hooks that calls a dump function when crash occur

crash_kexec()

> - binding interface that binds a dump function to the hook
>   (like register_dump_hook())

sys_kexec_load(...);

> - supply the information of valid physical address regions

/proc/iomem or possibly /proc/cpumem.  At least until someone
actually implements hot plug memory support.

> (- maybe some existent functions and variables need to be exported ?)
>
> I think this makes any sort of dump functions can be implemented
> as a kernel module. I don't think it is best way that the "kexec based
> crashdump" is built in the kernel.

For people developing code outside of the kernel I can see where
this is a problem.  Given the insane auditing requirements necessary
to get a reliable code path I don't see how not putting the implementation
in the kernel is sane.  Anything that needs to be touched at that point
is core kernel functionality GPL_ONLY if it is exported at all.
Touching anything from a module at that point is not sane.

Basically the code path setup with crash_kexec is little more
than a jump instruction.  And it should be audited and reduced
as much as possible.  I don't see how you get simpler or what
piece of functionality could possibly improve by having multiple
implementations in kernel modules.

Eric



Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Koichi Suzuki [EMAIL PROTECTED] writes:

> Itsuro Oda wrote:
> > Hi,
> > I can't understand why ELF format is necessary.
> > I think the only necessary information is "what physical address
> > regions are valid to read". This information is necessary for any
> > sort of dump tools. (and must get it while the system is normal.)
> > The Eric's /proc/cpumem idea sounds nice to me.
>
> I agree.  Format conversion should be done in healthy system separately
> and we should restrict what to do while taking the dump as few as
> possible.  Conversion from just memory image to crash/lcrash format will
> be very useful to use existing tools and experiences.   I already have
> such tool and (if my administration allows) I can make such tool open.
> Let me do some paperwork.

The big part of the conversation that is happening right now is how
do we uncouple dependencies between the various parts as much as
possible.  There is nothing here about format conversions except
as to convert weird kernel formats into a stable interface.

There are 3 pieces of code interacting.
1) The primary kernel that will call panic.
2) The kernel+initrd that takes over.
3) The user space that sets it all up (/sbin/kexec) while the primary
   kernel is still in a sane state.

The goal is to make those 3 pieces as independent of each other as
reasonably possible.

So the kernel+initrd that captures a crash dump will live and execute
in a reserved area of memory.  It needs to know which memory regions
are valid, and it needs to know small things like the final register
state of each cpu.  For the set of valid memory regions it is the
intention to encode that as an array of ELF program headers.  The
information of what the final register contents were will be encoded
as ELF notes.  There will be one PT_NOTE segment per cpu that holds
the notes needed to encode a given cpu's final state.  It really
does not matter to the implementation that captures each cpu's final
register state which format we record the data in, so using a format
designed not to change is not a problem.  So all that needs
to be communicated to the kernel+initrd that captures a crash
dump is the location of an ELF header and it can figure out all of
the rest.
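
To make "the location of an ELF header and it can figure out all of the rest" concrete, here is a sketch of walking from an ELF64 header to its program headers. The struct formats follow the standard Elf64_Ehdr and Elf64_Phdr layouts. The function names and the tiny test blob are invented for illustration, and reading the blob out of reserved memory is left out.

```python
import struct

ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
ELF64_PHDR = struct.Struct("<IIQQQQQQ")          # 56 bytes
PT_LOAD, PT_NOTE = 1, 4
ET_CORE = 4

def parse_core_headers(blob):
    """Return ([(paddr, offset, size)] memory regions, [(offset, size)] notes)."""
    (ident, e_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum,
     _shstrndx) = ELF64_EHDR.unpack_from(blob, 0)
    if ident[:4] != b"\x7fELF" or e_type != ET_CORE:
        raise ValueError("not an ELF core header")
    regions, notes = [], []
    for i in range(e_phnum):
        (p_type, _fl, p_offset, _vaddr, p_paddr, p_filesz, _memsz,
         _align) = ELF64_PHDR.unpack_from(blob, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            regions.append((p_paddr, p_offset, p_filesz))
        elif p_type == PT_NOTE:
            notes.append((p_offset, p_filesz))
    return regions, notes

def build_blob(phdrs):
    """Pack an ELF64 core header followed directly by the given phdr tuples."""
    ehdr = ELF64_EHDR.pack(b"\x7fELF\x02\x01\x01", ET_CORE, 0, 1, 0,
                           ELF64_EHDR.size, 0, 0, ELF64_EHDR.size,
                           ELF64_PHDR.size, len(phdrs), 0, 0, 0)
    return ehdr + b"".join(ELF64_PHDR.pack(*p) for p in phdrs)

blob = build_blob([
    (PT_NOTE, 0, 0x5000, 0, 0, 1024, 1024, 0),                  # one cpu's notes
    (PT_LOAD, 7, 0x100000, 0x100000, 0x100000, 0x1000, 0x1000, 0),
])
regions, notes = parse_core_headers(blob)
print(regions, notes)
```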

The primary kernel, apart from recording its final cpu register
state as it dies, does nothing except jump to the crash recovery
kernel.  All of the interesting information will be exported to
user space.

/sbin/kexec is the glue that fills in the cracks.  While
the primary kernel is in a sane state it sets everything up including
finding out which memory areas need to be looked at.  And it stashes
it all in a reserved area of memory, that has never been the target
of DMA transfers.

The goal is to reduce the dependencies as much as possible.  So
an old stable kernel can take a crash dump of a new buggy kernel.
And so that you don't have to be running the latest and greatest
user space simply to set everything up.  Although it is still
better to require a user-space upgrade to cope with new
kernels than to require the crash capture kernel+initrd to
be upgraded.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Eric W. Biederman

Vivek Goyal [EMAIL PROTECTED] writes:

> On Tue, 2005-02-01 at 20:56, Eric W. Biederman wrote:
> > Vivek Goyal <[EMAIL PROTECTED]> writes:
>
> elfcorehdr= also looks good.

Then let's go with that for now.  It is not perfect but it seems
a little more self explanatory at first glance.
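
For illustration only, picking the agreed parameter out of a command line could look like the sketch below. The thread settles on the name elfcorehdr= but not on the value syntax, so a plain numeric address is assumed here and the function name is hypothetical.

```python
def elfcorehdr_address(cmdline):
    """Return the address given by elfcorehdr= on a command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "elfcorehdr" and sep:
            return int(value, 0)  # base 0 accepts 0x... as well as decimal
    return None

addr = elfcorehdr_address("root=/dev/sda1 elfcorehdr=0x1f000000 maxcpus=1")
print(hex(addr))
```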

> > A clarification on terminology we are talking about struct Elf64_Phdr
> > here.  There is only one Elf header.  That seems to be clear farther
> > down.
>
> Exactly. There shall be one Elf header for whole of the image. In
> addition there will be one struct Elf64_Phdr, per contiguous physical
> memory area. One Elf64_Phdr of PT_NOTE type for notes section and one
> Elf64_Phdr for backup region.

Actually if we are just pointing at kernel data structures we will
need multiple Elf64_Phdr of PT_NOTE.  Each cpu has its own
notes section and until the smoke clears we can't be confident
about what is going to wind up there or how densely those will
be packed.  So collapsing everything into a single notes segment
needs to happen after we have switched to the crash capture kernel.
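
A sketch of that later "collapse" step, as the capture side might do it: walk each cpu's fixed-size area, keep only the well-formed notes at its start, and concatenate the results. The note layout (namesz, descsz, type, then name and descriptor each padded to 4 bytes) is the standard ELF one. Treating a zero namesz as the end of an area is an assumption of this sketch, and the descriptors are placeholder bytes, not real register dumps.

```python
import struct

NHDR = struct.Struct("<III")  # n_namesz, n_descsz, n_type
NT_PRSTATUS = 1

def pad4(n):
    """Round n up to a multiple of 4."""
    return (n + 3) & ~3

def make_note(name, n_type, desc):
    """Encode one ELF note with a NUL-terminated name."""
    name_b = name.encode() + b"\0"
    return (NHDR.pack(len(name_b), len(desc), n_type)
            + name_b.ljust(pad4(len(name_b)), b"\0")
            + desc.ljust(pad4(len(desc)), b"\0"))

def used_notes(area):
    """Return the leading bytes of a per-cpu area that hold complete notes."""
    pos = 0
    while pos + NHDR.size <= len(area):
        namesz, descsz, _ntype = NHDR.unpack_from(area, pos)
        if namesz == 0:      # empty header: treated here as the terminator
            break
        end = pos + NHDR.size + pad4(namesz) + pad4(descsz)
        if end > len(area):  # truncated or corrupt note: keep what we have
            break
        pos = end
    return area[:pos]

def merge_notes(per_cpu_areas):
    """Concatenate the used part of every per-cpu area into one segment."""
    return b"".join(used_notes(a) for a in per_cpu_areas)

AREA = 1024  # per-cpu size; hard-coded for this sketch
cpu0 = make_note("CORE", NT_PRSTATUS, b"\x11" * 8).ljust(AREA, b"\0")
cpu1 = bytes(AREA)  # a cpu that never wrote anything
cpu2 = make_note("CORE", NT_PRSTATUS, b"\x22" * 8).ljust(AREA, b"\0")
merged = merge_notes([cpu0, cpu1, cpu2])
print(len(merged), "bytes of notes from", 3 * AREA, "bytes of per-cpu space")
```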

> > I have serious concerns about the kernel generating the ELF headers
> > and only delivering them after the kernel has crashed.  Because
> > then we run into questions of what information can be trusted.  If we
> > avoid that issue I am not too concerned.
>
> I hope, all elf headers once prepared by kexec-tools need not to change
> later (Cannot think of any piece of information which shall change
> later). These shall be put in separate segment. And SHA-256 shall take
> care of authenticity of information after crash.

That should work fine.  We need to consider throwing in an
extra note section with information like kernel version that
we can capture while the system is running.

> For notes section program header, virtual = physical = 0 and offset
> shall point to crash_notes[], so that notes can directly be read by the
> capture kernel (or user space).

I agree.  But see my caveat.  I think we should have one PT_NOTE
segment point at each element of the crash_notes[] array.  I know
it is technically a violation of the ELF spec.  But in this case
it makes sense.   Since we can't guarantee that crash_notes will
be packed properly I don't know that we could reliably see more
than one cpu if we pointed a PT_NOTE header at the whole thing.

If it turns out that we can reliably point a single PT_NOTE header
at crash_notes so much the better but things are likely to be
more robust if we don't start with that assumption.  That
at least allows us the freedom to capture some notes (like NT_UTSNAME)
before the kernel crashes.
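
The "one PT_NOTE segment per crash_notes[] element" layout could be generated along these lines. This is a sketch, not kexec-tools code: the base address is hypothetical, and the 1K element size is the hard-coded value mentioned elsewhere in this thread rather than something exported by the kernel.

```python
import struct

PT_NOTE = 4
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
NOTE_BYTES_PER_CPU = 1024  # hard-coded 1K per cpu, as discussed in this thread

def per_cpu_note_phdrs(crash_notes_addr, ncpus, per_cpu=NOTE_BYTES_PER_CPU):
    """One PT_NOTE header per element: vaddr = paddr = 0, offset = element address."""
    return [
        ELF64_PHDR.pack(PT_NOTE, 0, crash_notes_addr + cpu * per_cpu,
                        0, 0, per_cpu, per_cpu, 0)
        for cpu in range(ncpus)
    ]

hdrs = per_cpu_note_phdrs(0x00400000, 4)  # hypothetical address, 4 cpus
for h in hdrs:
    p_type, _flags, p_offset, p_vaddr, p_paddr, p_filesz, _m, _a = ELF64_PHDR.unpack(h)
    print(p_type, hex(p_offset), p_vaddr, p_paddr, p_filesz)
```

On a running system the cpu count could come from os.sysconf("SC_NPROCESSORS_CONF"), matching the sysconf suggestion made in the thread.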

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Hirokazu Takahashi

Hi Vivek and Eric,

IMHO, why don't we swap not only the contents of the top 640K
but also kernel working memory for kdump kernel?

I guess this approach has some good points.

 1.Preallocating reserved area is not mandatory at boot time.
   And the reserved area can be distributed in small pieces
   like original kexec does.

 2.Special linking is not required for kdump kernel.
   Each kdump kernel can be linked in the same way,
   where the original kernel exists.

Am I missing something?


 physical memory
   +-----------+
   |   640K    |------+
   |...........|      |
   |           |     copy
   +-----------+      |
   |           |      |
   | original  |<--+  |
   | kernel    |   |  |
   |           |   |  |
   |...........|   |  |
   |           |   |  |
   |           |  swap|
   |           |   |  |
   +-----------+   |  |
   | reserved  |<--|--+
   | area      |   |
   |           |   |
   | kdump     |<--+
   | kernel    |
   +-----------+
   |           |
   |           |
   +-----------+


> Hi Eric,
>
> It looks like we are looking at things a little differently. I
> see a portion of the picture in your mind, but obviously not
> entirely.
>
> Perhaps, we need to step back and iron out in specific terms what
> the interface between the two kernels should be in the crash dump
> case, and the distribution of responsibility between kernel, user space
> and the user.
>
> [BTW, the patch was intended as a step in development up for
> comment early enough to be able to get agreement on the interface
> and think issues through to more completeness before going
> too far. Sorry, if that wasn't apparent.]
>
> When you say evil intermingling, I'm guessing you mean the
> crashbackup= boot parameter ? If so, then yes, I agree it'd
> be nice to find a way around it that doesn't push hardcoding
> elsewhere.
>
> Let me explain the interface/approach I was looking at.
>
> 1. First kernel reserves some area of memory for crash/capture kernel as
> specified by [EMAIL PROTECTED] boot time parameter.
>
> 2. First kernel marks the top 640K of this area as backup area. (If
> architecture needs it.) This is sort of a hardcoding and probably this
> space reservation can be managed from user space as well as mentioned by
> you in this mail below.
>
> 3. Location of backup region is exported through /proc/iomem which can
> be read by user space utility to pass this information to purgatory code
> to determine where to copy the first 640K.
>
> Note that we do not make any additional reservation for the
> backup region. We carve this out from the top of the already
> reserved region and export it through /proc/iomem so that
> the user space code and the capture kernel code need not
> make any assumptions about where this region is located.
>
> 4. Once the capture kernel boots, it needs to know the location of
> backup region for two purposes.
>
> a. It should not overwrite the backup region.
>
> b. There needs to be a way for the capture tool to access the original
>    contents of the backed up region
>
> Boot time parameter [EMAIL PROTECTED] has been provided to pass this
> information to capture kernel. This parameter is valid only for capture
> kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.
 
 
> > What is wrong with user space doing all of the extra space
> > reservation?
>
> Just for clarity, are you suggesting kexec-tools creating an additional
> segment for the backup region and pass the information to kernel.
>
> There is no problem in doing reservation from user space except
> one. How does the user and in-turn capture kernel come to know the
> location of backup region, assuming that the user is going to provide
> the exactmap for capture kernel to boot into.
>
> Just a thought, is it  a good idea for kexec-tools to be creating and
> passing memmap parameters doing appropriate adjustment for backup
> region.
>
> I had another question. How is the starting location of elf headers
> communicated to capture tool? Is parameter segment a good idea? or
> some hardcoding?
>
> Another approach can be that backup area information is encoded in elf
> headers and capture kernel is booted with modified memmap (User gets
> backup region information from /proc/iomem) and capture tool can
> extract backup area information from elf headers as stored by first
> kernel.
>
> Could you please elaborate a little more on what aspect of your view
> differs from the above.
>
> Thanks
> Vivek

Thanks,
Hirokazu Takahashi.


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-02 Thread Itsuro Oda

Hi,

On 02 Feb 2005 08:24:03 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:
 
> So the kernel+initrd that captures a crash dump will live and execute
> in a reserved area of memory.  It needs to know which memory regions
> are valid, and it needs to know small things like the final register
> state of each cpu.

Exactly.

Please let me confirm what you are planning to do.
1) standard kernel: reserve a small contiguous area for a dump kernel
   (this is unchanged from the current code)
2) standard kernel: export the information of valid physical memory
   regions. (/proc/iomem or /proc/cpumem etc.)
3) kexec (system call?): store the information of valid physical memory
   regions as ELF program headers in the reserved area (mentioned in 1)).
4) standard kernel: when a panic occurs, append (ex.) the register
   information as ELF notes after the memory information (if necessary),
   and jump to the new kernel
5) dump kernel: export all valid physical memory (and saved register
   information) to the user. (as /dev/oldmem /proc/vmcore ?)

Is this correct?  One question: how does the dump kernel know where
the ELF headers are saved?

One more question: I don't understand what the 640K backup area is.
Please let me know why it is necessary.

Thanks.
-- 
Itsuro ODA [EMAIL PROTECTED]


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Koichi Suzuki

Itsuro Oda wrote:
> Hi,
> I can't understand why ELF format is necessary.
> I think the only necessary information is "what physical address
> regions are valid to read". This information is necessary for any
> sort of dump tools. (and must get it while the system is normal.)
> The Eric's /proc/cpumem idea sounds nice to me.

I agree.  Format conversion should be done in healthy system separately 
and we should restrict what to do while taking the dump as few as 
possible.  Conversion from just memory image to crash/lcrash format will 
be very useful to use existing tools and experiences.   I already have 
such tool and (if my administration allows) I can make such tool open. 
Let me do some paperwork.

Koichi Suzuki
NTT DATA Intellilink

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Itsuro Oda

Hi,

I don't like calling crash_kexec() directly in (ex.) panic().
It should be call_dump_hook() (or something like this).

I think the necessary modifications of the kernel is only:
- insert the hooks that calls a dump function when crash occur
- binding interface that binds a dump function to the hook
  (like register_dump_hook())
- supply the information of valid physical address regions
(- maybe some existent functions and variables need to be exported ?)

I think this makes any sort of dump functions can be implemented
as a kernel module. I don't think it is best way that the "kexec based 
crashdump" is built in the kernel.

Thanks.

On 01 Feb 2005 02:06:42 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Koichi Suzuki <[EMAIL PROTECTED]> writes:
> 
> > Hook in panic code is very good idea and is useful in various scenes. It could
> > be used to kick RAM dump code, obviously, and also kick the code to initiate
> > failover, etc.   Various use could be possible so I believe that this hook
> > should be prepared for wider use.
> 
> It is.  Basically it is the normal kexec interface that allows you to
> boot another kernel.  With a few restrictions that should keep it as
> reliable as possible when the kernel has not shut itself down cleanly.
> 
> The hardest case is to do a useful system core dump.  As that requires
> looking at what has gone before.  For the rest if you can do it
> with a kernel and a initramfs you are in good shape.
> 
> There seems to be a significant amount of interest in the full
> system core dump case so that is what the work is concentrating
> on.
> 
> Eric

-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Koichi Suzuki

[EMAIL PROTECTED] wrote:
> Koichi Suzuki <[EMAIL PROTECTED]> writes:
>
> > Hook in panic code is very good idea and is useful in various scenes. It could
> > be used to kick RAM dump code, obviously, and also kick the code to initiate
> > failover, etc.   Various use could be possible so I believe that this hook
> > should be prepared for wider use.
>
> It is.  Basically it is the normal kexec interface that allows you to
> boot another kernel.  With a few restrictions that should keep it as
> reliable as possible when the kernel has not shut itself down cleanly.
>
> The hardest case is to do a useful system core dump.  As that requires
> looking at what has gone before.  For the rest if you can do it
> with a kernel and a initramfs you are in good shape.
>
> There seems to be a significant amount of interest in the full
> system core dump case so that is what the work is concentrating
> on.
>
> Eric

I meant with kexec and dump hook, there could be many more things can be 
done in addition to full core dump.  Initiating failover to other node 
will be one example.   Starting with this hook, there must be many good 
ideas.   So my idea is to make this hook general purpose, not for 
specific core dump tool.

Koichi Suzuki

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Itsuro Oda

Hi,

I can't understand why ELF format is necessary.

I think the only necessary information is "what physical address 
regions are valid to read". This information is necessary for any
sort of dump tools. (and must get it while the system is normal.)
The Eric's /proc/cpumem idea sounds nice to me. 

-- 
Itsuro ODA <[EMAIL PROTECTED]>


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

> Well, trying to put the already discussed ideas together.  I was
> planning to work on following design. Please comment.
> 
> Crashed Kernel <-->Capture Kernel(or User Space) Interface:
> --
> 
> The whole idea is that Crash image is represented in ELF Core format.
> These ELF Headers are prepared by kexec-tools user space and put in one
> segment. Address of start of image is passed to the capture kernel(or
> user space) using one command line (eg. crashimage=). Now either kernel
> space or user space can parse the elf headers and extract required
> information and export final kernel elf core image.

Sounds sane.  We need to make certain there is a checksum of that
region but putting it in a separate segment should ensure that.

I also think we need to look at the name crashimage= and see if we
can find something more descriptive.  But that is minor.  Possibly
elfcorehdr=  We have a little while to think about that one before we
are stuck.

> > [EMAIL PROTECTED] wrote: 
> > If we were using an ELF header I would include one PT_NOTE program 
> > header per cpu (Giving each cpu it's own area to mess around in).
> > And I would use one PT_LOAD segment per possible memory zone.
> > So in the worst case (current sgi altix) (MAX_NUMNODES=256,
> > MAX_NR_ZONES=3, MAX_NR_CPUS=1024) 256*3+1024 = 1792 program
> > headers.   At 56 bytes per 64bit program header that is 100352 bytes
> > or 98KiB.  A little bit expensive.  A tuned data structure with
> > 64bit base and size would only consume 1792*16 = 28672 or 28KiB.
> 
> If I prepare One elf header for each physical contiguous memory area (as
> obtained from /proc/iomem) instead of per zone, then number of elf
> headers will come down significantly. 

A clarification on terminology we are talking about struct Elf64_Phdr
here.  There is only one Elf header.  That seems to be clear farther
down.

> I don't have any idea on number of
> actual physically contiguous regions present per machine, but roughly
> assuming it to be 1 per node, it will lead to 256 + 1024 = 1280 program
> headers. At 56 bytes per 64 bit program header this will amount to 70KB.
> 
> This is worst case estimate and on lower end machines this will require
> much less a space. On machines as big as 1024 cpus, this should not be a
> concern, as big machines come with big RAMs.

Agreed.  Size is not the primary issue.  There is some clear waste
but that is a secondary concern.  Not performing a 1-1 mapping
to the kernel data structures also seems to be a win, as the concepts
are noticeably different.

> Eric, do you still think that ELF headers are inappropriate to be passed
> across interface boundary.

I have serious concerns about the kernel generating the ELF headers
and only delivering them after the kernel has crashed.  Because
then we run into questions of what information can be trusted.  If we
avoid that issue I am not too concerned.

> ELF headers can be prepared by kexec-tools in advance and put into one
> of the data segments. This requires following information to be
> available to user space.

For a first round doing it in user space sounds sane.  Obtaining
the information at the time of load is much more robust.

> - Starting address of space reserved by kernel for notes section
> (crash_notes[]). Probably can be obtained from /proc/kallsyms?

At least for a start.

> - NR_CPUS. May be sysconf(_SC_NPROCESSORS_CONF) should be
> sufficient.

Either that or /proc/cpuinfo.  But the sysconf approach looks more
robust at this point.

> - Size of memory reserved per cpu. No clue how to get that? Any
> suggestions? 
>   May be hard-coding like 1K area per cpu should be to address the
>   future needs ?

The nice thing about doing this in user space is that we can hack
something together and get each side of the interface sorted
out independently.  i.e. We can hard code it for now.  Sort out
the users and then come back and make certain we have the information
exported cleanly. 1K per cpu currently matches the kernel code so
it is a good place to start :)

It does look like getting the size of each array element is a problem,
so the current kernel code certainly needs to be revisited.  And
there are quite a few other pieces of how we are obtaining
the information that can be fixed as well.

> Regarding Backup Region
> ---
> 
> - Kexec user space does the reservation for backup region segment.
> - Purgatory copies the backup data to backup region. (Already
> implemented)
> - A separate elf header is prepared to represent backed up memory
> region. And "offset" field of this program header can contain the actual
> physical address where backup contents are stored. 

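I like that.  I was thinking a virtual versus physical address
separation.  But using the offset field is much more appropriate,
and it leaves us the potential of doing something nice like specifying
the kernels virtual address later on.  Looking exclusively at the
offset field to know which memory addresses to dump sounds good.
For now we should have virtual==physical==offset except for the
backup region.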

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Vivek Goyal

Well, trying to put the already discussed ideas together.  I was
planning to work on following design. Please comment.

Crashed Kernel <-->Capture Kernel(or User Space) Interface:
--

The whole idea is that Crash image is represented in ELF Core format.
These ELF Headers are prepared by kexec-tools user space and put in one
segment. Address of start of image is passed to the capture kernel(or
user space) using one command line (eg. crashimage=). Now either kernel
space or user space can parse the elf headers and extract required
information and export final kernel elf core image.

> [EMAIL PROTECTED] wrote: 
> If we were using an ELF header I would include one PT_NOTE program 
> header per cpu (Giving each cpu it's own area to mess around in).
> And I would use one PT_LOAD segment per possible memory zone.
> So in the worst case (current sgi altix) (MAX_NUMNODES=256,
> MAX_NR_ZONES=3, MAX_NR_CPUS=1024) 256*3+1024 = 1792 program
> headers.   At 56 bytes per 64bit program header that is 100352 bytes
> or 98KiB.  A little bit expensive.  A tuned data structure with
> 64bit base and size would only consume 1792*16 = 28672 or 28KiB.

If I prepare One elf header for each physical contiguous memory area (as
obtained from /proc/iomem) instead of per zone, then number of elf
headers will come down significantly. I don't have any idea on number of
actual physically contiguous regions present per machine, but roughly
assuming it to be 1 per node, it will lead to 256 + 1024 = 1280 program
headers. At 56 bytes per 64 bit program header this will amount to 70KB.

This is worst case estimate and on lower end machines this will require
much less a space. On machines as big as 1024 cpus, this should not be a
concern, as big machines come with big RAMs.
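
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the struct format string is a transcription of the standard Elf64_Phdr layout, the constants are the worst-case figures quoted in this thread, and the helper name is made up.

```python
import struct

# Elf64_Phdr: p_type, p_flags (32 bits each), then p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align (64 bits each).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

MAX_NUMNODES = 256
MAX_NR_ZONES = 3
MAX_NR_CPUS = 1024

def table_bytes(n_load, n_note, entry_size=ELF64_PHDR.size):
    """Bytes needed for n_load PT_LOAD plus n_note PT_NOTE entries."""
    return (n_load + n_note) * entry_size

per_zone = table_bytes(MAX_NUMNODES * MAX_NR_ZONES, MAX_NR_CPUS)
per_node = table_bytes(MAX_NUMNODES, MAX_NR_CPUS)
tuned = table_bytes(MAX_NUMNODES * MAX_NR_ZONES, MAX_NR_CPUS, entry_size=16)

for label, size in (("per zone", per_zone), ("per node", per_node),
                    ("tuned base+size", tuned)):
    print(f"{label}: {size} bytes = {size // 1024} KiB")
```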

Eric, do you still think that ELF headers are inappropriate to be passed
across the interface boundary?

ELF headers can be prepared by kexec-tools in advance and put into one
of the data segments. This requires the following information to be
available to user space.

- Starting address of space reserved by kernel for notes section
(crash_notes[]). Probably can be obtained from /proc/kallsyms?

- NR_CPUS. Maybe sysconf(_SC_NPROCESSORS_CONF) should be sufficient.

- Size of memory reserved per cpu. No clue how to get that? Any
suggestions?
Maybe hard-coding something like a 1K area per cpu should be enough to
address future needs?
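
A minimal sketch of how user space might collect the first two items,
written in Python rather than the C that kexec-tools uses. The helper
names and the sample addresses are made up for illustration; only
/proc/kallsyms, the crash_notes symbol and _SC_NPROCESSORS_CONF come
from the list above.

```python
import os


def find_symbol(kallsyms_text, name):
    """Return the address of `name` in /proc/kallsyms-style text, or None."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line reads "<hex address> <type> <symbol> [module]".
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


def configured_cpus():
    """Number of configured processors, as sysconf(_SC_NPROCESSORS_CONF) reports it."""
    return os.sysconf("SC_NPROCESSORS_CONF")


# Hypothetical excerpt; a real run would read the text from /proc/kallsyms.
SAMPLE = """\
c0100000 T _text
c03a4e80 b crash_notes
c03a5000 B some_other_symbol
"""

print(hex(find_symbol(SAMPLE, "crash_notes")))
print(configured_cpus())
```

The per-cpu size has no obvious source in this sketch, which is the open
question raised in the third item.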

Regarding Backup Region
---

- Kexec user space does the reservation for backup region segment.
- Purgatory copies the backup data to backup region. (Already
implemented)
- A separate elf header is prepared to represent backed up memory
region. And "offset" field of this program header can contain the actual
physical address where backup contents are stored. 
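
To make the third point concrete, a minimal Python sketch of such a
program header follows. It assumes the standard Elf64_Phdr layout; the
function name backup_phdr and the backup address used in the example are
hypothetical.

```python
import struct

PT_LOAD = 1  # loadable segment
PF_R = 4     # readable
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def backup_phdr(orig_paddr, backup_paddr, size):
    """Describe memory that purgatory copied elsewhere before the capture kernel ran.

    p_paddr keeps the address the data had in the crashed kernel, while
    p_offset says where the saved copy can be read from now.
    """
    return ELF64_PHDR.pack(
        PT_LOAD,       # p_type
        PF_R,          # p_flags
        backup_paddr,  # p_offset: location of the backup copy
        0,             # p_vaddr: left unset in this sketch
        orig_paddr,    # p_paddr: original physical address
        size,          # p_filesz
        size,          # p_memsz
        0,             # p_align
    )


# First 640K of memory, saved at a made-up address inside the reserved region.
hdr = backup_phdr(0x0, 0x1F60000, 640 * 1024)
print(len(hdr), ELF64_PHDR.unpack(hdr))
```

A dump tool reading this header would fetch the bytes from p_offset but
present them at p_paddr, so the rest of the image needs no special case.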

Thanks
Vivek

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Eric W. Biederman

Koichi Suzuki <[EMAIL PROTECTED]> writes:

> Hook in panic code is very good idea and is useful in various scenes. It could
> be used to kick RAM dump code, obviously, and also kick the code to initiate
> failover, etc.   Various use could be possible so I believe that this hook
> should be prepared for wider use.

It is.  Basically it is the normal kexec interface that allows you to
boot another kernel.  With a few restrictions that should keep it as
reliable as possible when the kernel has not shut itself down cleanly.

The hardest case is to do a useful system core dump.  As that requires
looking at what has gone before.  For the rest if you can do it
with a kernel and a initramfs you are in good shape.

There seems to be a significant amount of interest in the full
system core dump case so that is what the work is concentrating
on.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Koichi Suzuki

A hook in the panic code is a very good idea and is useful in various
situations.  It could be used to kick RAM dump code, obviously, and also
to kick the code that initiates failover, etc.   Various uses are possible,
so I believe that this hook should be prepared for wider use.

--
Koichi Suzuki
NTT DATA Intellilink Corp.
[EMAIL PROTECTED] wrote:
For the guys on ppc, and other architectures that have all of their
cpu memory behind an iommu.  I propose we create a /proc/cpumem
which is the subset of /proc/iomem that deals with RAM.  In any event
as something like that is straight forward to implement I will
assume the existence of the functionality and we can attack the
details when we merge the first of those architectures
into the kernel.
Vivek Goyal <[EMAIL PROTECTED]> writes:

Hi Eric,
It looks like we are looking at things a little differently. I
see a portion of the picture in your mind, but obviously not 
entirely.

Perhaps, we need to step back and iron out in specific terms what 
the interface between the two kernels should be in the crash dump
case, and the distribution of responsibility between kernel, user space
and the user. 

[BTW, the patch was intended as a step in development up for
comment early enough to be able to get agreement on the interface
and think issues through to more completeness before going 
too far. Sorry, if that wasn't apparent.]

It wasn't quite, and the fact that Andrew picked it up added
to the confusion.

When you say "evil intermingling", I'm guessing you mean the
"crashbackup=" boot parameter ? If so, then yes, I agree it'd
be nice to find a way around it that doesn't push hardcoding
elsewhere.

I believe there are some alternatives to crashbackup= in the 
crashdump capture kernel.  But as long as that code is running
in the kernel we can't do a lot better.

However for the primary kernel it has no need to know that we
even have a backup region, nor does it need to know about the
size of the backup region.  That can all be handled with the single
reservation, we have now.  

/sbin/kexec which makes the backup needs to know about it and it needs
to pass that information on.  But the primary kernel does not. 

The largest reason I am sensitive to this issue is that if you are not
booting an SMP kernel I don't believe we need a backup region on x86
at all.  If we can remove that dependency I want the freedom to do
that without having to modify the primary kernel.  Or if we discover
we need to preserve other things like the ACPI, mp and pirq tables
I don't want to require patching the kernel just so I can copy those
and preserve them.
 

Let me explain the interface/approach I was looking at.
1.First kernel reserves some area of memory for crash/capture kernel as
specified by [EMAIL PROTECTED] boot time parameter.
2.First kernel marks the top 640K of this area as backup area. (If
architecture needs it.) This is sort of a hardcoding and probably this
space reservation can be managed from user space as well as mentioned by
you in this mail below.
3. Location of backup region is exported through /proc/iomem which can
be read by user space utility to pass this information to purgatory code
to determine where to copy the first 640K.

And 1-3 can be done in /sbin/kexec.  And if it is done there we can
increase our freedom of implementation in the crashdump capture process
quite a bit.

Note that we do not make any additional reservation for the 
backup region. We carve this out from the top of the already 
reserved region and export it through /proc/iomem so that 
the user space code and the capture kernel code need not 
make any assumptions about where this region is located.

4. Once the capture kernel boots, it needs to know the location of
backup region for two purposes.
   
a. It should not overwrite the backup region.

b. There needs to be a way for the capture tool to access the original
  contents of the backed up region
Boot time parameter [EMAIL PROTECTED] has been provided to pass this
information to capture kernel. This parameter is valid only for capture
kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.

But that is not what you implemented.  crashbackup= was an alternative
to the carving out of 640K in parts 1-3.

What is wrong with user space doing all of the extra space
reservation?
Just for clarity, are you suggesting kexec-tools creating an additional
segment for the backup region and pass the information to kernel.

Yes, having kexec create a bss segment for the backup region would
be a good idea.  It will keep us from stomping on the kernel trampoline
(think the identity mapped x86_64 page tables here) by accident.
 

There is no problem in doing reservation from user space except
one. How does the user and in-turn capture kernel come to know the
location of backup region, assuming that the user is going to provide
the exactmap for capture kernel to boot into.
Just a thought, is it  a good idea for kexec-tools to be creating and
passing

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Eric W. Biederman

Vivek Goyal [EMAIL PROTECTED] writes:

> Well, trying to put the already discussed ideas together.  I was
> planning to work on following design. Please comment.
> 
> Crashed Kernel <-->Capture Kernel(or User Space) Interface:
> --
> 
> The whole idea is that Crash image is represented in ELF Core format.
> These ELF Headers are prepared by kexec-tools user space and put in one
> segment. Address of start of image is passed to the capture kernel(or
> user space) using one command line (eg. crashimage=). Now either kernel
> space or user space can parse the elf headers and extract required
> information and export final kernel elf core image.

Sounds sane.  We need to make certain there is a checksum of that
region but putting it in a separate segment should ensure that.
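
A minimal sketch of the kind of check meant here, in Python. The thread
does not say which checksum is used, so SHA-256 is an assumption, and
the function names are illustrative only.

```python
import hashlib


def segment_digest(data):
    """Digest recorded at load time, while the primary kernel is still healthy."""
    return hashlib.sha256(data).digest()


def verify_segment(data, expected):
    """Re-check the segment after the crash, before trusting its contents."""
    return hashlib.sha256(data).digest() == expected


# Stand-in for the segment that holds the prepared ELF headers.
segment = bytes(range(256)) * 4
recorded = segment_digest(segment)

# Simulate the crashing kernel scribbling over one byte of the segment.
corrupted = bytearray(segment)
corrupted[10] ^= 0xFF

print(verify_segment(segment, recorded))
print(verify_segment(bytes(corrupted), recorded))
```

Keeping the headers in their own segment means a failed check can be
attributed to that data alone rather than to the whole loaded image.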

I also think we need to look at the name crashimage= and see if we
can find something more descriptive.  But that is minor.  Possibly
elfcorehdr=  We have a little while to think about that one before we
are stuck.

> > [EMAIL PROTECTED] wrote: 
> > If we were using an ELF header I would include one PT_NOTE program 
> > header per cpu (Giving each cpu it's own area to mess around in).
> > And I would use one PT_LOAD segment per possible memory zone.
> > So in the worst case (current sgi altix) (MAX_NUMNODES=256,
> > MAX_NR_ZONES=3, MAX_NR_CPUS=1024) 256*3+1024 = 1792 program
> > headers.   At 56 bytes per 64bit program header that is 100352 bytes
> > or 98KiB.  A little bit expensive.  A tuned data structure with
> > 64bit base and size would only consume 1792*16 = 28672 or 28KiB.
> 
> If I prepare one elf header for each physically contiguous memory area (as
> obtained from /proc/iomem) instead of per zone, then the number of elf
> headers will come down significantly. 

A clarification on terminology: we are talking about struct Elf64_Phdr
here.  There is only one ELF header.  That seems to be clear farther
down.

> I don't have any idea of the number of
> actual physically contiguous regions present per machine, but roughly
> assuming it to be 1 per node, it will lead to 256 + 1024 = 1280 program
> headers. At 56 bytes per 64-bit program header this will amount to 70KB.
> 
> This is a worst-case estimate, and on lower end machines this will require
> much less space. On machines as big as 1024 cpus, this should not be a
> concern, as big machines come with big RAMs.

Agreed.  Size is not the primary issue.  There is some clear waste
but that is a secondary concern.  Not performing a 1-1 mapping
to the kernel data structures also seems to be a win, as the concepts
are noticeably different.
 
> Eric, do you still think that ELF headers are inappropriate to be passed
> across the interface boundary?

I have serious concerns about the kernel generating the ELF headers
and only delivering them after the kernel has crashed.  Because
then we run into questions of what information can be trusted.  If we
avoid that issue I am not too concerned.
 
> ELF headers can be prepared by kexec-tools in advance and put into one
> of the data segments. This requires the following information to be
> available to user space.

For a first round doing it in user space sounds sane.  Obtaining
the information at the time of load is much more robust.

> - Starting address of space reserved by kernel for notes section
> (crash_notes[]). Probably can be obtained from /proc/kallsyms?

At least for a start.
 
> - NR_CPUS. Maybe sysconf(_SC_NPROCESSORS_CONF) should be
> sufficient.

Either that or /proc/cpuinfo.  But the sysconf approach looks more
robust at this point.

> - Size of memory reserved per cpu. No clue how to get that? Any
> suggestions?
>   Maybe hard-coding something like a 1K area per cpu should be enough to
>   address future needs?

The nice thing about doing this in user space is that we can hack
something together and get each side of the interface sorted
out independently.  i.e. We can hard code it for now.  Sort out
the users and then come back and make certain we have the information
exported cleanly. 1K per cpu currently matches the kernel code so
it is a good place to start :)

It does look like getting the size of each array element is a problem,
so the current kernel code certainly needs to be revisited.  And
there are quite a few other pieces of how we are obtaining
the information that can be fixed as well.

> Regarding Backup Region
> ---
> 
> - Kexec user space does the reservation for backup region segment.
> - Purgatory copies the backup data to backup region. (Already
> implemented)
> - A separate elf header is prepared to represent backed up memory
> region. And "offset" field of this program header can contain the actual
> physical address where backup contents are stored. 

I like that.  I was thinking a virtual versus physical address
separation.  But using the offset field is much more appropriate,
and it leaves us the potential of doing something nice like specifying
the kernels virtual address later on.  Looking exclusively at the

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Itsuro Oda

Hi,

I can't understand why the ELF format is necessary.

I think the only necessary information is which physical address 
regions are valid to read. This information is necessary for any
sort of dump tool (and it must be obtained while the system is still healthy).
Eric's /proc/cpumem idea sounds nice to me. 

-- 
Itsuro ODA [EMAIL PROTECTED]


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Koichi Suzuki

[EMAIL PROTECTED] wrote:
> Koichi Suzuki [EMAIL PROTECTED] writes:
> 
> > Hook in panic code is very good idea and is useful in various scenes. It could
> > be used to kick RAM dump code, obviously, and also kick the code to initiate
> > failover, etc.   Various use could be possible so I believe that this hook
> > should be prepared for wider use.
> 
> It is.  Basically it is the normal kexec interface that allows you to
> boot another kernel.  With a few restrictions that should keep it as
> reliable as possible when the kernel has not shut itself down cleanly.
> 
> The hardest case is to do a useful system core dump.  As that requires
> looking at what has gone before.  For the rest if you can do it
> with a kernel and a initramfs you are in good shape.
> 
> There seems to be a significant amount of interest in the full
> system core dump case so that is what the work is concentrating
> on.
> 
> Eric

I meant that with kexec and the dump hook, many more things can be 
done in addition to a full core dump.  Initiating failover to another node 
is one example.   Starting with this hook, there must be many good 
ideas.   So my idea is to make this hook general purpose, not tied to a 
specific core dump tool.

Koichi Suzuki

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Itsuro Oda

Hi,

I don't like calling crash_kexec() directly in (e.g.) panic().
It should call call_dump_hook() (or something like this).

I think the only necessary modifications to the kernel are:
- insert the hooks that call a dump function when a crash occurs
- a binding interface that binds a dump function to the hook
  (like register_dump_hook())
- supply the information about valid physical address regions
(- maybe some existing functions and variables need to be exported ?)

I think this allows any sort of dump function to be implemented
as a kernel module. I don't think building the kexec based 
crashdump into the kernel is the best way.
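
A minimal model of the proposed binding interface, sketched in Python
only to show the shape of the calls; real kernel code would be C. The
single-slot policy and the unregister function are assumptions, since
the proposal above names only register_dump_hook() and call_dump_hook().

```python
_dump_hook = None  # at most one dump function is bound in this model


def register_dump_hook(func):
    """Bind a dump function to the hook; refuse if one is already bound."""
    global _dump_hook
    if _dump_hook is not None:
        return False
    _dump_hook = func
    return True


def unregister_dump_hook():
    """Remove the bound dump function, e.g. when its module is unloaded."""
    global _dump_hook
    _dump_hook = None


def call_dump_hook(reason):
    """Called from the crash path; returns False when nothing is bound."""
    if _dump_hook is None:
        return False
    _dump_hook(reason)
    return True


calls = []
print(register_dump_hook(lambda reason: calls.append(reason)))
print(call_dump_hook("panic"))
print(calls)
```

In this model panic() would only ever call call_dump_hook(), and a
kexec based dumper would be one of several possible functions to bind.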

Thanks.

On 01 Feb 2005 02:06:42 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Koichi Suzuki [EMAIL PROTECTED] writes:
> 
> > Hook in panic code is very good idea and is useful in various scenes. It could
> > be used to kick RAM dump code, obviously, and also kick the code to initiate
> > failover, etc.   Various use could be possible so I believe that this hook
> > should be prepared for wider use.
> 
> It is.  Basically it is the normal kexec interface that allows you to
> boot another kernel.  With a few restrictions that should keep it as
> reliable as possible when the kernel has not shut itself down cleanly.
> 
> The hardest case is to do a useful system core dump.  As that requires
> looking at what has gone before.  For the rest if you can do it
> with a kernel and a initramfs you are in good shape.
> 
> There seems to be a significant amount of interest in the full
> system core dump case so that is what the work is concentrating
> on.
> 
> Eric

-- 
Itsuro ODA [EMAIL PROTECTED]


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-02-01 Thread Koichi Suzuki

Itsuro Oda wrote:
> Hi,
> 
> I can't understand why the ELF format is necessary.
> 
> I think the only necessary information is which physical address 
> regions are valid to read. This information is necessary for any
> sort of dump tool (and it must be obtained while the system is still healthy).
> Eric's /proc/cpumem idea sounds nice to me. 

I agree.  Format conversion should be done separately on a healthy system,
and we should restrict what is done while taking the dump to as little as
possible.  Conversion from a plain memory image to crash/lcrash format will
be very useful for reusing existing tools and experience.   I already have
such a tool and (if my administration allows) I can make it open.
Let me do some paperwork.

Koichi Suzuki
NTT DATA Intellilink

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-28 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

> Hi Eric,
> 
> 
> > However for the primary kernel it has no need to know that we
> > even have a backup region, nor does it need to know about the
> > size of the backup region.  That can all be handled with the single
> > reservation, we have now.  
> > 
> > /sbin/kexec which makes the backup needs to know about it and it needs
> > to pass that information on.  But the primary kernel does not. 
> 
> 
> Agreed. Primary kernel need not be aware of backup region and 
> reservation of this region can be better managed from user space.

Good.  It sounds like we are pretty much back on the same page then.

> > > Boot time parameter [EMAIL PROTECTED] has been provided to pass this
> > > information to capture kernel. This parameter is valid only for capture
> > > kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.
> > 
> > But that is not what you implemented.  crashbackup= was an alternative
> > to the carving out of 640K in parts 1-3.
> 
> 
> Not really. crashbackup= is not being used for carving out backup
> region. It is just used for passing the address of this region to second
> kernel. That's why it has been put under CONFIG_CRASH_DUMP.

Ok I missed a piece in your patch.  You have crashdumpk_res, and
crashbackup_start, crashbackup_end.   And I missed the fact that
they were different variables as they dealt with the same concept.

So that patch actually should have been three patches.  The
one line bug fix.  The crashdumpk_res bit (which I strongly
object to) and the crashbackup_start/_end bit.  The fact
that all three were in the same patch is a reviewing and maintenance
pain.

Please in the future do not include code that runs in the primary
kernel and crashdump specific code that runs in the capture kernel in
the same patch.

> This looks good. So memory regions are parsed from /proc/iomem and this
> information is put in one data segment and stored in reserved region
> during panic kernel load time.
> 
> But I am unable to correlate how the capture tool (even if it's all
> in user space) gets to know the address of this segment (or for that
> matter, the bss segment created for backup region). Am I missing
> something obvious?

There are a lot of choices at that point.
- Place the data on the kernel command line, and pick
  it up from /proc/cmdline.
- Place the data in a file on the initramfs.
- Place the data in a user space data segment.
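
The first choice could look roughly like this from the capture side, as
a minimal Python sketch. The parameter name elfcorehdr= and its
single-address value format are placeholders for illustration; nothing
here is a settled interface.

```python
def cmdline_param(cmdline, name):
    """Return the value of `name=value` from a kernel command line, or None."""
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if sep and key == name:
            return value
    return None


# Hypothetical command line; a real tool would read /proc/cmdline.
SAMPLE = "root=/dev/sda1 ro console=ttyS0,115200 elfcorehdr=0x1f00000"

value = cmdline_param(SAMPLE, "elfcorehdr")
address = int(value, 16)
print(value, hex(address))
```

A parameter that is absent yields None, which lets the tool fall back
to one of the other two choices.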

> > However as long as we gracefully handle the interface
> > between the primary kernel and the capture kernel we can
> > switch mechanisms for actually taking the crash dump,
> > kernel based or user space as seems most sane.
> 
> 
> This seems to be a good direction.

Cool.

One of the ideas worth exploring is to see about stabilizing the
other side of this interface as well.   That is we should explore
providing a fixed interface coming out of purgatory.ro to the new
kernel and its user space (i.e. the ELF header like thing).  I think
we are quite close to that point already.  And this goes back to your
question of how do we let the capture kernel/user space know where to
look.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-28 Thread Vivek Goyal

Hi Eric,


> However for the primary kernel it has no need to know that we
> even have a backup region, nor does it need to know about the
> size of the backup region.  That can all be handled with the single
> reservation, we have now.  
> 
> /sbin/kexec which makes the backup needs to know about it and it needs
> to pass that information on.  But the primary kernel does not. 


Agreed. Primary kernel need not be aware of backup region and 
reservation of this region can be better managed from user space.


> > Boot time parameter [EMAIL PROTECTED] has been provided to pass this
> > information to capture kernel. This parameter is valid only for capture
> > kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.
> 
> But that is not what you implemented.  crashbackup= was an alternative
> to the carving out of 640K in parts 1-3.


Not really. crashbackup= is not being used for carving out backup
region. It is just used for passing the address of this region to second
kernel. That's why it has been put under CONFIG_CRASH_DUMP.


> > > What is wrong with user space doing all of the extra space
> > > reservation?
> > 
> > Just for clarity, are you suggesting kexec-tools creating an additional
> > segment for the backup region and pass the information to kernel.
> 
> Yes, having kexec create a bss segment for the backup region would
> be a good idea.  It will keep us from stomping on the kernel trampoline
> (think the identity mapped x86_64 page tables here) by accident.
>  
> > There is no problem in doing reservation from user space except
> > one. How does the user and in-turn capture kernel come to know the
> > location of backup region, assuming that the user is going to provide
> > the exactmap for capture kernel to boot into.
> >
> > Just a thought, is it  a good idea for kexec-tools to be creating and
> > passing memmap parameters doing appropriate adjustment for backup
> > region.
> 
> Exactly.  Having /sbin/kexec do this instead of the user doing this
> manually is a much simpler solution than we have now.
> 
> > I had another question. How is the starting location of elf headers 
> > communicated to capture tool? Is parameter segment a good idea? or 
> > some hardcoding? 
> 
> I recognize the need for that information.  But I do not recognize
> the need for it to be an ELF header (we do need something
> conceptually close).  If we don't have regions of the memory map
> appearing and disappearing dynamically we can get this information
> from /proc/iomem, before the crash and store it in one of the data
> segments that we checksum.
>  


This looks good. So memory regions are parsed from /proc/iomem and this
information is put in one data segment and stored in reserved region
during panic kernel load time.
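
A minimal sketch of the parsing step, assuming the usual
"start-end : name" line layout of /proc/iomem and assuming RAM ranges
are labelled "System RAM". The struct and function names are
hypothetical; this is not code from kexec-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct mem_range {
	unsigned long long start;
	unsigned long long end;		/* inclusive, as in /proc/iomem */
};

/* Returns 1 and fills *r if the line describes a "System RAM" range,
 * 0 otherwise. *r may be scribbled on even when 0 is returned. */
int parse_iomem_ram(const char *line, struct mem_range *r)
{
	char name[64];

	assert(line && r);
	if (sscanf(line, " %llx-%llx : %63[^\n]",
		   &r->start, &r->end, name) != 3)
		return 0;
	return strcmp(name, "System RAM") == 0;
}

/* Collect up to max RAM ranges from a stream in /proc/iomem format. */
size_t collect_ram_ranges(FILE *f, struct mem_range *out, size_t max)
{
	char line[256];
	size_t n = 0;

	assert(f && out);
	while (n < max && fgets(line, sizeof line, f))
		if (parse_iomem_ram(line, &out[n]))
			n++;
	return n;
}
```

The resulting array is the kind of data that could be written into one
of the checksummed data segments at load time.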

But I am unable to work out how the capture tool (even if it's all
in user space) learns the address of this segment (or, for that
matter, of the bss segment created for the backup region). Am I missing
something obvious?


> The direction I would take if I were implementing the crashdump
> functionality is something different.
> 
> Instead of patching crashdump functionality into the kernel,
> I would create a subdirectory in kexec-tools called crashdump
> and put in the source for a user-space program that could run as
> init.  In addition I would put in the code to generate an
> initramfs cpio.gz archive of that program.  And I would build
> the program against uclibc, klibc or one of the other libc variants
> that actually allows building static binaries.  Unless something
> has changed recently glibc does not allow for truly static binaries
> as it dynamically opens /lib/libnss*
> 
> Given the pain of building against an external library that is not
> widely distributed I would probably take a snapshot of the code and
> place it in crashdump/libc in the kexec-tools source.  Taking a
> snapshot of frequently used libraries is commonly done with the
> gnu toolchain and is wonderfully effective in resolving painful
> dependencies.
> 
> The crashdump /init would just mmap /dev/mem to read the raw memory.
> From there it would generate the core file.
> 
> When kexec'ing a panic kernel I would simply have /sbin/kexec
> unconditionally load that cpio.gz as the initrd and things
> would work.
> 
> The large advantage of doing all of this in user space
> is that it moves all of the crashdump policy into user space
> and into one source tree, for simplified maintenance.
> 
> However as long as we gracefully handle the interface
> between the primary kernel and the capture kernel we can
> switch mechanisms for actually taking the crash dump,
> kernel based or user space as seems most sane.


This seems to be a good direction.
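
The user-space approach quoted above, where /init maps /dev/mem to read
raw memory, could look roughly like the sketch below. copy_phys() is a
hypothetical helper, not part of kexec-tools; the device path is a
parameter so the same code can be tried against an ordinary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy len bytes starting at offset phys of a memory device (normally
 * "/dev/mem") into buf. Returns 0 on success, -1 on error. */
int copy_phys(const char *dev, off_t phys, void *buf, size_t len)
{
	long page;
	off_t base;
	size_t skew;
	void *map;
	int fd;

	assert(dev && buf && len > 0);
	page = sysconf(_SC_PAGESIZE);
	base = phys & ~((off_t)page - 1);	/* mmap offsets must be page aligned */
	skew = (size_t)(phys - base);

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;
	map = mmap(NULL, len + skew, PROT_READ, MAP_SHARED, fd, base);
	close(fd);				/* the mapping outlives the descriptor */
	if (map == MAP_FAILED)
		return -1;

	memcpy(buf, (char *)map + skew, len);
	munmap(map, len + skew);
	return 0;
}
```

A real capture tool would walk the saved memory ranges and call
something like this per range while writing out the core file.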


Thanks 
Vivek



Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-27 Thread Eric W. Biederman

For the guys on ppc, and other architectures that have all of their
cpu memory behind an iommu, I propose we create a /proc/cpumem
which is the subset of /proc/iomem that deals with RAM.  In any event,
as something like that is straightforward to implement, I will
assume the existence of the functionality and we can attack the
details when we merge the first of those architectures
into the kernel.

Vivek Goyal <[EMAIL PROTECTED]> writes:

> Hi Eric,
> 
> It looks like we are looking at things a little differently. I
> see a portion of the picture in your mind, but obviously not 
> entirely.
> 
> Perhaps, we need to step back and iron out in specific terms what 
> the interface between the two kernels should be in the crash dump
> case, and the distribution of responsibility between kernel, user space
> and the user. 
> 
> [BTW, the patch was intended as a step in development up for
> comment early enough to be able to get agreement on the interface
> and think issues through to more completeness before going 
> too far. Sorry, if that wasn't apparent.]

It wasn't quite, and the fact that Andrew picked it up added
to the confusion.

> When you say "evil intermingling", I'm guessing you mean the
> "crashbackup=" boot parameter ? If so, then yes, I agree it'd
> be nice to find a way around it that doesn't push hardcoding
> elsewhere.

I believe there are some alternatives to crashbackup= in the 
crashdump capture kernel.  But as long as that code is running
in the kernel we can't do a lot better.

However for the primary kernel it has no need to know that we
even have a backup region, nor does it need to know about the
size of the backup region.  That can all be handled with the single
reservation, we have now.  

/sbin/kexec which makes the backup needs to know about it and it needs
to pass that information on.  But the primary kernel does not. 

The largest reason I am sensitive to this issue is that if you are not
booting an SMP kernel I don't believe we need a backup region on x86
at all.  If we can remove that dependency I want the freedom to do
that without having to modify the primary kernel.  Or if we discover
we need to preserve other things like the ACPI, mp and pirq tables
I don't want to require patching the kernel just so I can copy those
and preserve them.

> Let me explain the interface/approach I was looking at.
> 
> 1. First kernel reserves some area of memory for crash/capture kernel as
> specified by [EMAIL PROTECTED] boot time parameter.
> 
> 2. First kernel marks the top 640K of this area as backup area. (If
> architecture needs it.) This is sort of a hardcoding and probably this
> space reservation can be managed from user space as well as mentioned by
> you in this mail below.
> 
> 3. Location of backup region is exported through /proc/iomem which can
> be read by user space utility to pass this information to purgatory code
> to determine where to copy the first 640K.

And 1-3 can be done in /sbin/kexec.  And if it is done there we can
increase our freedom of implementation in the crashdump capture process
quite a bit.

> Note that we do not make any additional reservation for the 
> backup region. We carve this out from the top of the already 
> reserved region and export it through /proc/iomem so that 
> the user space code and the capture kernel code need not 
> make any assumptions about where this region is located.
> 
> 4. Once the capture kernel boots, it needs to know the location of
> backup region for two purposes.
> 
> a. It should not overwrite the backup region.
> 
> b. There needs to be a way for the capture tool to access the original
>    contents of the backed up region
> 
> Boot time parameter [EMAIL PROTECTED] has been provided to pass this
> information to capture kernel. This parameter is valid only for capture
> kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.

But that is not what you implemented.  crashbackup= was an alternative
to the carving out of 640K in parts 1-3.

> > What is wrong with user space doing all of the extra space
> > reservation?
> 
> Just for clarity, are you suggesting kexec-tools creating an additional
> segment for the backup region and pass the information to kernel.

Yes, having kexec create a bss segment for the backup region would
be a good idea.  It will keep us from stomping on the kernel trampoline
(think the identity mapped x86_64 page tables here) by accident.

> There is no problem in doing reservation from user space except
> one. How does the user and in-turn capture kernel come to know the
> location of backup region, assuming that the user is going to provide
> the exactmap for capture kernel to boot into.
>
> Just a thought, is it  a good idea for kexec-tools to be creating and
> passing memmap parameters doing appropriate adjustment for backup
> region.

Exactly.  Having /sbin/kexec do this instead of the user doing this
manually is a much simpler solution than we have now.
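
A sketch of what that could look like inside /sbin/kexec, assuming the
capture kernel is restricted with memmap=exactmap plus one
memmap=size@start entry, and assuming the backup area sits at the top
of the reserved region. append_memmap() is a hypothetical helper and
the values are assumed to be KiB-aligned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append memmap options describing the reserved region minus the
 * backup area at its top. Returns 0 on success, -1 if the command line
 * buffer is too small (the buffer is then left unchanged). */
int append_memmap(char *cmdline, size_t cap,
		  unsigned long long res_start, unsigned long long res_size,
		  unsigned long long backup_size)
{
	size_t used = strlen(cmdline);
	unsigned long long usable;
	int n;

	assert(used < cap && backup_size < res_size);
	usable = res_size - backup_size;	/* capture kernel must not touch the backup */

	n = snprintf(cmdline + used, cap - used,
		     "%smemmap=exactmap memmap=%lluK@%lluK",
		     used ? " " : "", usable >> 10, res_start >> 10);
	if (n < 0 || (size_t)n >= cap - used) {
		cmdline[used] = '\0';		/* undo a truncated append */
		return -1;
	}
	return 0;
}
```

With this the user only supplies the reserved region; the adjustment
for the backup area never has to be typed by hand.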

> I had

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-27 Thread Vivek Goyal

Hi Eric,

It looks like we are looking at things a little differently. I
see a portion of the picture in your mind, but obviously not 
entirely.

Perhaps, we need to step back and iron out in specific terms what 
the interface between the two kernels should be in the crash dump
case, and the distribution of responsibility between kernel, user space
and the user. 

[BTW, the patch was intended as a step in development up for
comment early enough to be able to get agreement on the interface
and think issues through to more completeness before going 
too far. Sorry, if that wasn't apparent.]

When you say "evil intermingling", I'm guessing you mean the
"crashbackup=" boot parameter ? If so, then yes, I agree it'd
be nice to find a way around it that doesn't push hardcoding
elsewhere.

Let me explain the interface/approach I was looking at.

1. First kernel reserves some area of memory for crash/capture kernel as
specified by [EMAIL PROTECTED] boot time parameter.

2. First kernel marks the top 640K of this area as backup area. (If
architecture needs it.) This is sort of a hardcoding and probably this
space reservation can be managed from user space as well as mentioned by
you in this mail below.

3. Location of backup region is exported through /proc/iomem which can
be read by user space utility to pass this information to purgatory code
to determine where to copy the first 640K.

Note that we do not make any additional reservation for the 
backup region. We carve this out from the top of the already 
reserved region and export it through /proc/iomem so that 
the user space code and the capture kernel code need not 
make any assumptions about where this region is located.

4. Once the capture kernel boots, it needs to know the location of
backup region for two purposes.

a. It should not overwrite the backup region.

b. There needs to be a way for the capture tool to access the original
   contents of the backed up region

Boot time parameter [EMAIL PROTECTED] has been provided to pass this
information to capture kernel. This parameter is valid only for capture
kernel and becomes effective only if CONFIG_CRASH_DUMP is enabled.
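
To make steps 2 and 4a concrete, here is a small sketch of the
arithmetic: carving 640K from the top of the reserved region and
checking that another range does not overlap it. The struct and
function names are hypothetical and the ranges are inclusive.

```c
#include <assert.h>

#define BACKUP_SIZE (640ULL * 1024)

struct region {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
};

/* Carve the backup area from the top of the reserved region (step 2). */
struct region carve_backup(struct region reserved)
{
	struct region backup;

	assert(reserved.end >= reserved.start);
	assert(reserved.end - reserved.start + 1 >= BACKUP_SIZE);
	backup.start = reserved.end - BACKUP_SIZE + 1;
	backup.end = reserved.end;
	return backup;
}

/* Nonzero if the two ranges share at least one byte (check for 4a). */
int regions_overlap(struct region a, struct region b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

The capture kernel's usable memory would then be the reserved region up
to backup.start - 1, which by construction does not overlap the backup.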


> What is wrong with user space doing all of the extra space
> reservation?

Just for clarity, are you suggesting kexec-tools creating an additional
segment for the backup region and pass the information to kernel.

There is no problem in doing reservation from user space except
one. How does the user and in-turn capture kernel come to know the
location of backup region, assuming that the user is going to provide
the exactmap for capture kernel to boot into.

Just a thought, is it  a good idea for kexec-tools to be creating and
passing memmap parameters doing appropriate adjustment for backup
region.

I had another question. How is the starting location of elf headers 
communicated to capture tool? Is parameter segment a good idea? or 
some hardcoding? 

Another approach can be that backup area information is encoded in elf
headers and capture kernel is booted with modified memmap (User gets
backup region information from /proc/iomem) and capture tool can
extract backup area information from elf headers as stored by first
kernel.

Could you please elaborate a little more on what aspect of your view
differs from the above.

Thanks
Vivek


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-26 Thread Eric W. Biederman

Right now I am very frustrated with reviewing any of the crashdump
patches.  When I make comments usually things change just enough that
what I said is addressed but things are addressed very much at
a surface level.  Which means that if I think any kind of substantial
change is needed the only way I seem to be able to communicate
that is by actually implementing it myself.

Code that works today is great; it manages the job of requirements
capture.  But just throwing code together when you are dealing
with fundamental interface boundaries is not a good way to build
a sustainable design.  And with the crashdump code I want an
interface that is at least as simple and as stable as the syscall
interface.

At the very least if a patch is just a snapshot of your development
process up for comment and you are going to continue on making
headway please say as much.  If I know the code is quite possibly
going to change in some pretty fundamental ways I can stop worrying
about it.  This patch is certainly nothing I would want for more
than a couple-of-days hack in my personal development tree.

I will try once again...

There is evil intermingling and false dependency sharing between
the dying kernel and the crash capture kernel in this patch, and
virtually all of the code is unnecessary.  I have already addressed
why.

Vivek Goyal <[EMAIL PROTECTED]> writes:

> On Fri, 2005-01-21 at 16:43, Eric W. Biederman wrote:
> > On deeper review your patch as it stands is incomplete.  In particular
> > you don't provide a way to either hardcode or dynamically set
> > the area you are attempting to reserve to hold the backup region.
> 
> Well. Here is the new patch. This one steals the 640k from top of memory
> region reserved for crash kernel. 
> 
> A new command line parameter (crashbackup=) has been introduced for
> crash dump kernels. This parameter specifies the location of backup
> region from where to retrieve the backup data.

What is wrong with user space doing all of the extra space
reservation?

Could you send this fairly obvious kexec fix, as a separate patch? 

> diff -puN include/linux/kexec.h~crashdump-x86-reserve-640k-memory include/linux/kexec.h
> --- linux-2.6.11-rc1/include/linux/kexec.h~crashdump-x86-reserve-640k-memory 2005-01-22 14:16:27.0 +0530
> +++ linux-2.6.11-rc1-root/include/linux/kexec.h 2005-01-22 14:16:27.0 +0530
> 
> @@ -79,7 +79,7 @@ struct kimage {
>   unsigned long control_page;
>  
>   /* Flags to indicate special processing */
> - int type : 1;
> + unsigned int type : 1;
>  #define KEXEC_TYPE_DEFAULT 0
>  #define KEXEC_TYPE_CRASH   1
>  };
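
For anyone wondering why the one-line change matters: a signed 1-bit
bitfield can only represent 0 and -1, so a comparison against
KEXEC_TYPE_CRASH (1) can never be true. Whether a plain "int" bitfield
is signed is implementation-defined; the sketch below uses explicit
"signed int" to show the failing case. The struct and function names
are made up for illustration.

```c
#include <assert.h>

#define KEXEC_TYPE_DEFAULT 0
#define KEXEC_TYPE_CRASH   1

struct kimage_signed   { signed int type : 1; };	/* holds 0 or -1 */
struct kimage_unsigned { unsigned int type : 1; };	/* holds 0 or 1  */

/* Store type in a signed 1-bit field and test for the crash type. */
int is_crash_signed(int type)
{
	struct kimage_signed img;

	assert(type == KEXEC_TYPE_DEFAULT || type == KEXEC_TYPE_CRASH);
	img.type = type;		/* 1 does not fit; typically reads back as -1 */
	return img.type == KEXEC_TYPE_CRASH;
}

/* Same test with the unsigned field from the fix. */
int is_crash_unsigned(int type)
{
	struct kimage_unsigned img;

	assert(type == KEXEC_TYPE_DEFAULT || type == KEXEC_TYPE_CRASH);
	img.type = (unsigned int)type;
	return img.type == KEXEC_TYPE_CRASH;
}
```

With the signed field a crash image would never be recognised as one,
which is why this fix stands on its own as a separate patch.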

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-26 Thread Andrew Morton

[EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
> There is evil intermingling and false dependency sharing between
>  the dying kernel and the crash capture kernel in this patch,

yikes!  I'll drop it from -mm while we have a rethink.

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-23 Thread Vivek Goyal

On Fri, 2005-01-21 at 16:43, Eric W. Biederman wrote:
> On deeper review your patch as it stands is incomplete.  In particular
> you don't provide a way to either hardcode or dynamically set
> the area you are attempting to reserve to hold the backup region.

Well, here is the new patch. This one steals the 640k from the top of the
memory region reserved for the crash kernel.

A new command line parameter (crashbackup=) has been introduced for
crash dump kernels. This parameter specifies the location of the backup
region from which to retrieve the backup data.
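
The parsing that the patch does with the kernel's memparse() can be
approximated in user space as follows. This is only a minimal sketch under
stated assumptions: parse_mem() and parse_crashbackup() are hypothetical
names, parse_mem() handles just the K/M/G suffixes, and the size@base form
is inferred from the parser in the patch. Unlike the patch, which silently
ignores an option without '@', the sketch reports it as an error.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space approximation of the kernel's memparse(): a number with an
 * optional K, M or G suffix.  *end is set to the first unparsed character.
 */
static unsigned long long parse_mem(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/*
 * Hypothetical helper: parse "crashbackup=size@base" into an inclusive
 * [start, end] range.  Returns 0 on success, -1 on malformed input.
 */
int parse_crashbackup(const char *arg, unsigned long long *start,
		      unsigned long long *end)
{
	static const char key[] = "crashbackup=";
	unsigned long long size, base;
	const char *p;

	if (strncmp(arg, key, sizeof(key) - 1) != 0)
		return -1;
	size = parse_mem(arg + sizeof(key) - 1, &p);
	if (size == 0 || *p != '@')
		return -1;
	base = parse_mem(p + 1, &p);
	*start = base;
	*end = base + size - 1;
	return 0;
}
```

For example, "crashbackup=640K@48512K" describes the top 640k of a
reservation that ends just below 48M.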

Thanks
Vivek



This patch adds support for reserving 640k memory as backup region as required
by crashdump kernel for x86. 
---

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 linux-2.6.11-rc1-root/Documentation/kernel-parameters.txt |    5 ++
 linux-2.6.11-rc1-root/arch/i386/kernel/setup.c            |   26 --
 linux-2.6.11-rc1-root/arch/i386/mm/discontig.c            |    8 
 linux-2.6.11-rc1-root/include/linux/crash_dump.h          |    1 
 linux-2.6.11-rc1-root/include/linux/kexec.h               |    6 ++-
 linux-2.6.11-rc1-root/kernel/crash_dump.c                 |    3 +
 linux-2.6.11-rc1-root/kernel/kexec.c                      |    8 
 7 files changed, 54 insertions(+), 3 deletions(-)

diff -puN arch/i386/kernel/setup.c~crashdump-x86-reserve-640k-memory arch/i386/kernel/setup.c
--- linux-2.6.11-rc1/arch/i386/kernel/setup.c~crashdump-x86-reserve-640k-memory 2005-01-22 14:16:27.0 +0530
+++ linux-2.6.11-rc1-root/arch/i386/kernel/setup.c  2005-01-22 14:22:41.0 +0530
@@ -41,6 +41,7 @@
 #include <linux/init.h>
 #include <linux/edd.h>
 #include <linux/kexec.h>
+#include <linux/crash_dump.h>
 #include <video/edid.h>
 #include <asm/apic.h>
 #include <asm/e820.h>
@@ -51,7 +52,6 @@
 #include <asm/io_apic.h>
 #include <asm/ist.h>
 #include <asm/io.h>
-#include <asm/crash_dump.h>
 #include "setup_arch_pre.h"
 #include <bios_ebda.h>
 
@@ -852,7 +852,20 @@ static void __init parse_cmdline_early (
 			}
 		}
 #endif
-
+#ifdef CONFIG_CRASH_DUMP
+		/* [EMAIL PROTECTED] specifies the location of backup
+		 * region where, crashed kernel has stored some backup data.
+		 */
+		else if (!memcmp(from, "crashbackup=", 12)) {
+			unsigned long size, base;
+			size = memparse(from+12, &from);
+			if (*from == '@') {
+				base = memparse(from+1, &from);
+				crashbackup_start = base;
+				crashbackup_end  = base + size - 1;
+			}
+		}
+#endif
 		/*
 		 * highmem=size forces highmem to be exactly 'size' bytes.
 		 * This works even on boxes that have no highmem otherwise.
@@ -1159,6 +1172,14 @@ static unsigned long __init setup_memory
 #ifdef CONFIG_KEXEC
 	if (crashk_res.start != crashk_res.end) {
 		reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1);
+
+#define CRASHDUMP_BACKUP_SZ 0xa0000
+		/* Steal 640K from top of reserved region for crash kernel */
+		if ((crashk_res.end - crashk_res.start) > CRASHDUMP_BACKUP_SZ) {
+			crashdumpk_res.end = crashk_res.end;
+			crashk_res.end = crashk_res.end - CRASHDUMP_BACKUP_SZ;
+			crashdumpk_res.start = crashk_res.end + 1;
+		}
 	}
 #endif
 	return max_low_pfn;
@@ -1202,6 +1223,7 @@ legacy_init_iomem_resources(struct resou
 			request_resource(res, &data_resource);
 #ifdef CONFIG_KEXEC
 			request_resource(res, &crashk_res);
+			request_resource(res, &crashdumpk_res);
 #endif
 		}
 	}
diff -puN arch/i386/mm/discontig.c~crashdump-x86-reserve-640k-memory arch/i386/mm/discontig.c
--- linux-2.6.11-rc1/arch/i386/mm/discontig.c~crashdump-x86-reserve-640k-memory 2005-01-22 14:16:27.0 +0530
+++ linux-2.6.11-rc1-root/arch/i386/mm/discontig.c  2005-01-22 14:40:30.0 +0530
@@ -29,6 +29,7 @@
 #include <linux/highmem.h>
 #include <linux/initrd.h>
 #include <linux/nodemask.h>
+#include <linux/ioport.h>
 #include <linux/kexec.h>
 #include <asm/e820.h>
 #include <asm/setup.h>
@@ -371,6 +372,13 @@ unsigned long __init setup_memory(void)
 #ifdef CONFIG_KEXEC
 	if (crashk_res.start != crashk_res.end) {
 		reserve_bootmem(crashk_res.start, crashk_res.end - crashk_res.start + 1);
+#define CRASHDUMP_BACKUP_SZ 0xa0000
+		/* Steal 640K from top of reserved region for crash kernel */
+		if ((crashk_res.end - crashk_res.start) > CRASHDUMP_BACKUP_SZ) {
+			crashdumpk_res.end = crashk_res.end;
+			crashk_res.end = crashk_res.end - CRASHDUMP_BACKUP_SZ;
+			crashdumpk_res.start = crashk_res.end + 1;
+		}
 	}
 #endif
 	return system_max_low_pfn;
diff -puN Documentation/kernel-parameters.txt~crashdump-x86-reserve-640k-memory Documentation/kernel-parameters.txt
--- linux-2.6.11-rc1/Documentation/kernel-parameters.txt~crashdump-x86-reserve-640k-memory
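
The "steal 640K from the top" step in setup_memory() can be checked in
isolation with a small user-space sketch. It mirrors the arithmetic of the
patch on inclusive [start, end] ranges; struct span and
steal_backup_region() are hypothetical names used only for illustration,
standing in for the crashk_res and crashdumpk_res resources.

```c
#define CRASHDUMP_BACKUP_SZ 0xa0000UL	/* 640K */

/* Inclusive address range, in the style of struct resource. */
struct span {
	unsigned long start;
	unsigned long end;
};

/*
 * Carve the top CRASHDUMP_BACKUP_SZ bytes out of the crash kernel
 * reservation and hand them to the backup region.  Returns 0 on success,
 * -1 if the reservation is not larger than the backup region.
 */
int steal_backup_region(struct span *crashk, struct span *backup)
{
	if (crashk->end - crashk->start <= CRASHDUMP_BACKUP_SZ)
		return -1;

	backup->end = crashk->end;
	crashk->end = crashk->end - CRASHDUMP_BACKUP_SZ;
	backup->start = crashk->end + 1;
	return 0;
}
```

With a 32M reservation starting at 16M, the backup region becomes the
last 640K below 48M and the crash kernel region ends immediately before it.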

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-21 Thread Eric W. Biederman

On deeper review your patch as it stands is incomplete.  In particular
you don't provide a way to either hardcode or dynamically set
the area you are attempting to reserve to hold the backup region.

Vivek Goyal <[EMAIL PROTECTED]> writes:

> On Fri, 2005-01-21 at 13:24, Eric W. Biederman wrote:
> > Why do we need a separate region for this?
> > 
> > It should be simple enough to take 640k out of the area kexec reserves
> > for the crash dump kernel.  That is what the previous code implemented.
> 
> The previous code also reserved the backup memory region after the crash
> kernel region. It is just a matter of interpretation. My understanding is
> that the crash kernel reserved region is where one can load the panic
> kernel directly, and that the new kernel can use this memory region for
> memory allocation.

Yes the reservation is a hunk of memory reserved for use by the crashdump
process, or whatever happens after panic.  It is up to the loaded code
to define how that memory is used.  purgatory.ro is a legitimate part
of that loaded code.

> I don't want to steal the backup region from the crash kernel region;
> otherwise I would have to boot the crash kernel with some strange
> values like memmap=(32M-640k)@16M (symbolically) to prevent the crash
> kernel from overwriting the backup region. Why make the user aware of the
> location of the backup region?

Making the user aware of the region makes it one more thing for the user
to be aware of and to manually manage.  Based on what was passed as
crashkernel=... we should be able to automate all of the rest of it,
so a weird memmap= line should not be hard.
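
As a rough illustration of how little is involved, the following sketch
derives the memmap= option from the crashkernel= reservation. It is a
sketch under stated assumptions, not kexec-tools code:
format_capture_memmap() is a hypothetical helper, the base and size are
assumed to be multiples of 1K, and the backup region is assumed to be the
top 640k of the reservation.

```c
#include <stdio.h>

#define BACKUP_REGION_SZ 0xa0000UL	/* 640k, as discussed in this thread */

/*
 * Hypothetical helper: given the crashkernel= reservation (base and size
 * in bytes), format a memmap= option that covers everything except the
 * top 640k.  Returns 0 on success, -1 if the reservation is too small or
 * the buffer is too short.
 */
int format_capture_memmap(char *buf, size_t len,
			  unsigned long base, unsigned long size)
{
	unsigned long usable;
	int n;

	if (size <= BACKUP_REGION_SZ)
		return -1;

	usable = size - BACKUP_REGION_SZ;
	n = snprintf(buf, len, "memmap=%luK@%luK", usable >> 10, base >> 10);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a 32M reservation at 16M this produces memmap=32128K@16384K, which is
the concrete form of the symbolic memmap=(32M-640k)@16M quoted above.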

I will have to wait and see but it would not surprise me if we settled
on a fixed address per architecture for the reservation to make it
easier for various users.

On that note we probably want to move the magic that we are doing
for crashdumps into the linux loader (i.e. x86-linux-setup.c ) in
kexec-tools, as most of these pieces are specific to taking a
crashdump with linux.  Not that I expect we will be doing it with
anything else but...

> Alternatively, this can be managed by reserving the backup region again
> in the crash kernel to avoid any stomping, perhaps passing the backup
> region location to the new kernel through the parameter segment or the
> command line, but I don't see a strong reason for doing that.

Probably the biggest reason for doing it in one reservation is that
it happens to be an implementation detail of the crashdump capture
kernel.  If that kernel is not SMP I believe you can safely leave the
first 640k alone.  I know at least one other effort has had success in
that area.

In general it is not good to make unnecessary implementation details
between two pieces of software be part of their interface.

Eric

Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-21 Thread Vivek Goyal

On Fri, 2005-01-21 at 13:24, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > Hi Andrew,
> > 
> > Following patch is against 2.6.11-rc1-mm2. 
> > 
> > As mentioned by following note from Eric, crashdump code is currently
> > broken.
> > > 
> > > The crashdump code is currently slightly broken.  I have attempted to
> > > minimize the breakage so things can quick be made to work again.
> > 
> > We have started doing changes to make crashdump up and running again.
> > Following are few identified items to be done.
> > 
> > 1. Reserve the backup region (640k) during kernel bootup. 
> 
> Why do we need a separate region for this?
> 
> It should be simple enough to take 640k out of the area kexec reserves
> for the crash dump kernel.  That is what the previous code implemented.

The previous code also reserved the backup memory region after the crash
kernel region. It is just a matter of interpretation. My understanding is
that the crash kernel reserved region is where one can load the panic
kernel directly, and that the new kernel can use this memory region for
memory allocation.

I don't want to steal the backup region from the crash kernel region;
otherwise I would have to boot the crash kernel with some strange
values like memmap=(32M-640k)@16M (symbolically) to prevent the crash
kernel from overwriting the backup region. Why make the user aware of the
location of the backup region?

Alternatively, this can be managed by reserving the backup region again
in the crash kernel to avoid any stomping, perhaps passing the backup
region location to the new kernel through the parameter segment or the
command line, but I don't see a strong reason for doing that.

Thanks
Vivek


Re: [Fastboot] [PATCH] Reserving backup region for kexec based crashdumps.

2005-01-21 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

> Hi Andrew,
> 
> Following patch is against 2.6.11-rc1-mm2. 
> 
> As mentioned by following note from Eric, crashdump code is currently
> broken.
> > 
> > The crashdump code is currently slightly broken.  I have attempted to
> > minimize the breakage so things can quick be made to work again.
> 
> We have started doing changes to make crashdump up and running again.
> Following are few identified items to be done.
> 
> 1. Reserve the backup region (640k) during kernel bootup. 

Why do we need a separate region for this?

It should be simple enough to take 640k out of the area kexec reserves
for the crash dump kernel.  That is what the previous code implemented.

> 2. Copy the data to backup region during crash.(moved to kexec user
> space code, patch posted in separate mail)

Thanks.  By and large it looks sane; it won't work yet, but it is
moving in the right direction.

> +++ linux-2.6.11-rc1-mm2-kexec-eric-root/include/linux/kexec.h 2005-01-20 13:55:33.0 +0530
> 
> @@ -79,7 +79,7 @@ struct kimage {
>   unsigned long control_page;
>  
>   /* Flags to indicate special processing */
> - int type : 1;
> + unsigned int type : 1;

That looks like a sane bug fix.  Having values of 0 and -1 is not quite
what I was expecting...
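
A minimal user-space sketch of the problem, assuming GCC- or Clang-style
handling of bit-fields (in C, both the signedness of a plain int bit-field
and the result of storing an out-of-range value in a signed one are
implementation-defined). The struct and function names are made up for
illustration; only the one-bit type field and the KEXEC_TYPE_* values come
from the patch. The signed variant is spelled "signed int" here so that the
behaviour does not depend on how the compiler treats a plain int bit-field.

```c
#define KEXEC_TYPE_DEFAULT 0
#define KEXEC_TYPE_CRASH   1

/* Hypothetical stand-ins for the flag before and after the fix. */
struct flag_before { signed int type : 1; };	/* holds 0 and -1 */
struct flag_after  { unsigned int type : 1; };	/* holds 0 and 1 */

/* Store v in the one-bit signed field and read it back. */
int roundtrip_before(int v)
{
	struct flag_before f;

	f.type = v;
	return f.type;
}

/* Store v in the one-bit unsigned field and read it back. */
int roundtrip_after(int v)
{
	struct flag_after f;

	f.type = v;
	return f.type;
}
```

With GCC, roundtrip_before(KEXEC_TYPE_CRASH) reads back as -1, so a
comparison of the stored flag against KEXEC_TYPE_CRASH fails, while
roundtrip_after(KEXEC_TYPE_CRASH) reads back as 1.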

Eric



