Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-07 Thread H. Peter Anvin
Vivek Goyal wrote:
> 
> One would not know the highest used address until the ELF headers have
> been parsed. Maybe it is a two-step movement: first decompress ELF.gz,
> with the ELF parser placed at the end of the decompressed data. Then it
> can parse the ELF headers, move itself out of the ELF header destination
> memory, and load the ELF segments at the appropriate place.
> 
> One will have to be a little careful while moving the ELF parser, or while
> decompressing the file to a temporary buffer, so that we don't stomp over
> any other data loaded by the boot loader (like kexec does) and don't go
> beyond the memory bounds which might have been created when using kdump.
> 

The easiest is probably to decode the ELF headers (which can be done in
O(1) space), relocate, reset the decompressor and restart.
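
A minimal sketch of the header scan, assuming the program headers are
already available as an array: it keeps a single running maximum, so the
extra space is constant. The struct mirrors the standard Elf32_Phdr field
layout; the names phdr32 and highest_used_addr are hypothetical and do not
come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef PT_LOAD
#define PT_LOAD 1  /* loadable segment type, as defined by the ELF spec */
#endif

/* Same field layout as Elf32_Phdr; declared locally to stay self-contained. */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/*
 * Walk the program headers once and return the end address of the
 * highest PT_LOAD segment (physical address plus in-memory size).
 * Only one running maximum is kept, so the extra space is O(1).
 */
uint32_t highest_used_addr(const struct phdr32 *ph, size_t n)
{
    uint32_t top = 0;

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;  /* notes and other non-loadable entries use no memory */
        uint32_t end = ph[i].p_paddr + ph[i].p_memsz;
        if (end > top)
            top = end;
    }
    return top;
}
```

A decompressor could relocate itself to an address at or above the
returned value before restarting, which is the sequence suggested above.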

Relocation is currently done in the decompressor, but it could also be
done at the kernel entrypoint, as long as the kernel entrypoint code is
all PIC.
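
For illustration only, relocation in either place amounts to adding the
load delta to every 32-bit word named in a relocation list. The sketch
below assumes a flat list of byte offsets into the image; the real
vmlinux.relocs format is not shown in this thread, and apply_relocs is a
hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add `delta` to each 32-bit word of `image` whose byte offset appears
 * in `relocs`. Returns 0 on success, or -1 as soon as an offset does not
 * leave room for a full word (earlier entries stay applied).
 */
int apply_relocs(uint8_t *image, size_t image_len,
                 const uint32_t *relocs, size_t n, uint32_t delta)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t word;

        if (relocs[i] > image_len || image_len - relocs[i] < sizeof word)
            return -1;
        /* memcpy avoids unaligned access and aliasing problems */
        memcpy(&word, image + relocs[i], sizeof word);
        word += delta;
        memcpy(image + relocs[i], &word, sizeof word);
    }
    return 0;
}
```

Code that runs this before it has been relocated itself must not depend
on any absolute address, which is why the entry point would need to be PIC.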

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-07 Thread Vivek Goyal
On Wed, Jun 06, 2007 at 05:42:35PM -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > Certainly, but much harder to implement.  The ELF parser needs to be
> > prepared to move itself around to get out of the way of the ELF file. 
> > It's a fairly large change from how it works now.
> > 
> 
> It doesn't if we simply declare that a certain chunk of memory is
> available to it, for the case where it runs in the native configuration.
> Since it doesn't have to support *any* ELF file, just the kernel one,
> that's an option.
> 
> On the other hand, I guess with the decompressor/ELF parser being PIC,
> one would simply look for the highest used address, and relocate itself
> above that point.  It's not really all that different from what the
> decompressor does today, except that it knows the address a priori.
> 

One would not know the highest used address until the ELF headers have
been parsed. Maybe it is a two-step movement: first decompress ELF.gz,
with the ELF parser placed at the end of the decompressed data. Then it
can parse the ELF headers, move itself out of the ELF header destination
memory, and load the ELF segments at the appropriate place.

One will have to be a little careful while moving the ELF parser, or while
decompressing the file to a temporary buffer, so that we don't stomp over
any other data loaded by the boot loader (like kexec does) and don't go
beyond the memory bounds which might have been created when using kdump.
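
The check described here can be sketched as a plain range test, assuming
the loader can enumerate the regions it must not touch and knows an upper
memory limit. The struct range type, buffer_is_safe, and the way the
reserved list is obtained are all hypothetical; nothing like this appears
in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint32_t start, end; };  /* half-open interval [start, end) */

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Return true if `buf` is non-empty, lies entirely below `mem_limit`
 * (for example a kdump-style memory bound), and touches none of the
 * `reserved` regions (for example data placed by the boot loader).
 */
bool buffer_is_safe(struct range buf, uint32_t mem_limit,
                    const struct range *reserved, size_t n)
{
    if (buf.start >= buf.end || buf.end > mem_limit)
        return false;
    for (size_t i = 0; i < n; i++)
        if (overlaps(buf, reserved[i]))
            return false;
    return true;
}
```

The same test would apply both to the temporary decompression buffer and
to the destination chosen for the relocated ELF parser.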

Thanks
Vivek


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-07 Thread H. Peter Anvin
Rob Landley wrote:
> 
> Er, make that objcopy, not objdump.
> 
> Sane, maybe not.  Something people want to do (and under the mistaken 
> assumption I know more about initramfs than they do, have asked me how), yes.
> It always boils down to "do you have a vmlinux image lying around?  Doing 
> this with a bzImage _is_ brain surgery", and has yet to get beyond that 
> question.  I had about half of a script worked out for this, once...
> 

If it can be done today on a vmlinux then it can be done the same way
with the mechanism I have proposed.  Period, full stop.

> You can also supply an external initramfs image through the initrd mechanism, 
> but this is unpleasant to do with some bootloaders (or lack of bootloaders).  
> Plus it doesn't remove the old one, and wasting space makes embedded 
> developers itch.

In theory one could create an extended bzImage format which could handle
a concatenated, and easily replaceable, initrd, but if it's done on
vmlinux today it would make a *lot* more sense to have it be done on the
vmlinux and nothing else.

-hpa


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-07 Thread Rob Landley
On Wednesday 06 June 2007 9:54 pm, H. Peter Anvin wrote:
> Rob Landley wrote:
> > On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
> >> This makes vmlinux (normally stripped) recoverable from the bzImage file
> >> and so anything that is currently booting vmlinux would be serviced by
> >> this scheme.
> > 
> > > Would this make it sane to strip the initramfs image out of vmlinux with
> > > objdump and replace it with another one, or are there offsets resolved
> > > during the build that stop that for vmlinux?
> > 
> 
> There probably are offsets resolved during the build.  However, that
> wouldn't be all that hard to fix.  Still, one can argue whether or not
> it is sane under any definition to do this kind of unpacking-repacking
> of ELF files.

Er, make that objcopy, not objdump.

Sane, maybe not.  Something people want to do (and under the mistaken 
assumption I know more about initramfs than they do, have asked me how), yes.  
It always boils down to "do you have a vmlinux image lying around?  Doing 
this with a bzImage _is_ brain surgery", and has yet to get beyond that 
question.  I had about half of a script worked out for this, once...

You can also supply an external initramfs image through the initrd mechanism, 
but this is unpleasant to do with some bootloaders (or lack of bootloaders).  
Plus it doesn't remove the old one, and wasting space makes embedded 
developers itch.

Rob
-- 
The Google cluster became self-aware at 2:14am EDT August 29, 2007...


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Rob Landley wrote:
> On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
>> This makes vmlinux (normally stripped) recoverable from the bzImage file
>> and so anything that is currently booting vmlinux would be serviced by
>> this scheme.
> 
> Would this make it sane to strip the initramfs image out of vmlinux with 
> objdump and replace it with another one, or are there offsets resolved during 
> the build that stop that for vmlinux?
> 

There probably are offsets resolved during the build.  However, that
wouldn't be all that hard to fix.  Still, one can argue whether or not
it is sane under any definition to do this kind of unpacking-repacking
of ELF files.

-hpa



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Rob Landley
On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.

Would this make it sane to strip the initramfs image out of vmlinux with 
objdump and replace it with another one, or are there offsets resolved during 
the build that stop that for vmlinux?

Rob
-- 
The Google cluster became self-aware at 2:14am EDT August 29, 2007...


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> It doesn't if we simply declare that a certain chunk of memory is
> available to it, for the case where it runs in the native configuration.
> Since it doesn't have to support *any* ELF file, just the kernel one,
> that's an option.
>   

I suppose.  But given that it's always built at the same time as - and
linked to - the kernel itself, it can have private knowledge about the
kernel.

> On the other hand, I guess with the decompressor/ELF parser being PIC,
> one would simply look for the highest used address, and relocate itself
> above that point.  It's not really all that different from what the
> decompressor does today, except that it knows the address a priori.
>   

Yes, it would have to decompress the ELF file into a temp buffer, and
then rearrange itself and the decompressed ELF file to make space for
the ELF file's final location.  Seems a bit more complex because it has
to be done in the middle of execution rather than at start of day.  But
perhaps that doesn't matter very much.

>> I was thinking of making the ELF file entirely descriptive, since it's
>> just a set of ELF headers inserted into the existing bzImage structure,
>> and it still relies on the bzImage being built properly in the first place.
>> 
>
> Again, it's an option.  The downside is that you don't get the automatic
> test coverage of having it be exercised as often as possible.

I don't follow your argument at all.

I'm proposing the kernel take the same code path regardless of how it's
booted, with the only two variations:

   1. boot all the way up from 16-bit mode, or
   2. start directly in 32-bit mode

which is essentially the current situation (setup vs code32_start).  All
I'm adding is a bit more metadata for the domain builder to work with. 
The code will get exercised on every boot in every environment, and the
metadata will be tested by whichever environment cares about it.

You're proposing that we add a third booting variation, where the
bootloader takes on the responsibility for decompressing and loading the
kernel's ELF image.  In addition, you're proposing changing the existing
32-bit portion of the boot to perform the same job as the third method,
but in a way which is not reusable by a paravirtual domain builder. 
This means that the boot path is unique for each boot environment, and
so will overall get less coverage.

Given that one axis of the test matrix - "number of subarchitectures" -
is the same in both cases, and the other axis - "number of ways of
booting" - is larger in your proposal, it seems to me that yours has
the higher testing burden.

Anyway, I added an extra pointer in the boot_params so that you can
implement it that way if you really want (no real reason you can't have
ELF within ELF within bzImage, but it starts to look a bit
engineering-by-compromise at that point).  It isn't, however, the
approach I want to take with Xen.

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> Certainly, but much harder to implement.  The ELF parser needs to be
> prepared to move itself around to get out of the way of the ELF file. 
> It's a fairly large change from how it works now.
> 

It doesn't if we simply declare that a certain chunk of memory is
available to it, for the case where it runs in the native configuration.
Since it doesn't have to support *any* ELF file, just the kernel one,
that's an option.

On the other hand, I guess with the decompressor/ELF parser being PIC,
one would simply look for the highest used address, and relocate itself
above that point.  It's not really all that different from what the
decompressor does today, except that it knows the address a priori.

> I was thinking of making the ELF file entirely descriptive, since it's
> just a set of ELF headers inserted into the existing bzImage structure,
> and it still relies on the bzImage being built properly in the first place.

Again, it's an option.  The downside is that you don't get the automatic
test coverage of having it be exercised as often as possible.

-hpa



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I was thinking prescriptive, having the decompressor read the output
> stream and interpret it as ELF.  I guess a descriptive approach could be
> made to work, too (I haven't really thought about that avenue of
> approach), but the prescriptive model seems more powerful, at least to me.

Certainly, but much harder to implement.  The ELF parser needs to be
prepared to move itself around to get out of the way of the ELF file. 
It's a fairly large change from how it works now.

I was thinking of making the ELF file entirely descriptive, since it's
just a set of ELF headers inserted into the existing bzImage structure,
and it still relies on the bzImage being built properly in the first place.

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> I'm not sure I fully understand the mechanism you're proposing.  You
> have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
> the decompressor has extracted the actual ELF file, are you proposing
> that it properly parse the ELF file and follow its instructions to put
> the segments in the appropriate places, or are you assuming that the
> decompressor can just skip that part and plonk the ELF file where it wants?
> 
> In other words, do you see the Phdrs as being descriptive or prescriptive?
> 

I was thinking prescriptive, having the decompressor read the output
stream and interpret it as ELF.  I guess a descriptive approach could be
made to work, too (I haven't really thought about that avenue of
approach), but the prescriptive model seems more powerful, at least to me.
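
To make the prescriptive reading concrete, here is a sketch of a loader
that does what the program headers say: copy p_filesz bytes of each
PT_LOAD segment to its physical address and zero the rest up to p_memsz
(which is where the bss ends up). It assumes the whole decompressed ELF
image is in a buffer and models the destination as an array starting at
mem_base. The names phdr32 and load_segments are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef PT_LOAD
#define PT_LOAD 1  /* loadable segment type, as defined by the ELF spec */
#endif

/* Same field layout as Elf32_Phdr; declared locally to stay self-contained. */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/*
 * Place every PT_LOAD segment of `elf` into `mem`, which stands for
 * physical memory starting at address `mem_base`. Returns 0 on success
 * and -1 if any header points outside the file or the destination.
 */
int load_segments(const uint8_t *elf, size_t elf_len,
                  const struct phdr32 *ph, size_t n,
                  uint8_t *mem, size_t mem_len, uint32_t mem_base)
{
    for (size_t i = 0; i < n; i++) {
        const struct phdr32 *p = &ph[i];

        if (p->p_type != PT_LOAD)
            continue;
        if (p->p_filesz > p->p_memsz)
            return -1;
        if (p->p_offset > elf_len || p->p_filesz > elf_len - p->p_offset)
            return -1;
        if (p->p_paddr < mem_base)
            return -1;
        size_t off = p->p_paddr - mem_base;
        if (off > mem_len || p->p_memsz > mem_len - off)
            return -1;

        memcpy(mem + off, elf + p->p_offset, p->p_filesz);
        /* the part of the segment with no file backing is zero-filled */
        memset(mem + off + p->p_filesz, 0, p->p_memsz - p->p_filesz);
    }
    return 0;
}
```

Under the descriptive reading, no such loop would run; the headers would
only tell an outside loader which memory the existing code will use.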

-hpa


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I still believe that we should provide, in effect, vmlinux as a
> (compressed) ELF file rather than provide the intermediate stage.  It
> would reduce the complexity of testing (all information provided about a
> stage has to be both guaranteed to even make sense in the future as
> well as be tested to conform to such information

I'm not sure I follow you.  Sure, you're right that the Phdr info
contained within the bzImage needs to be tested for correctness.  This
wouldn't normally happen when booting native, but when booting under the
most constrained environment - Xen - it will be tested (and I intend
making the Xen loader as strict as possible).  Of course, it won't help
if the Phdrs overmap too much, but I don't think that matters much, so
long as the mappings are not excessively large.

I'm not sure what you mean about "make sense in the future".  If you're
booting the kernel in a new paravirtualized environment, you've
presumably modified the kernel to understand that environment, and
perhaps had to update the boot image format a bit to deal with its
requirements.  I agree that updating the bzImage format may require
retesting in all the other environments, but I think that's probably
true for your scheme as well.

After all, you're assuming that the vmlinux itself provides all
necessary information to be loaded in any environment, which is not
necessarily true (it may need extra ELF notes, for example).  But if
there are any major structural changes needed in the vmlinux, then that
will be equally problematic for both directly using vmlinux and using
ELF-in-bzImage.  So I don't think your argument convincingly sways in
any particular direction.

> ) as well as cover a
> larger number of environments -- any environment where injecting data
> into memory is cheaper than execution is quite unhappy about the current
> system.  Such environments include heterogeneous embedded systems (think
> a slow CPU on a PCI card where the host CPU has direct access to the
> memory on the card) as well as simulators/emulators.
>   

Well, nothing in this scheme precludes the ELF file from being a plain
uncompressed kernel image.  If that's what these environments want, it's
easy to provide with a small update to the Makefiles.

> For environments where doing so is appropriate it would even be possible to
> run the setup, invoke the code32_start hook to do the decompression (and
> relocation, if appropriate) in host space.
>   

Well, that's what we currently have, and we can't break backwards
compatibility.

> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.
>   

I'm not sure I fully understand the mechanism you're proposing.  You
have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
the decompressor has extracted the actual ELF file, are you proposing
that it properly parse the ELF file and follow its instructions to put
the segments in the appropriate places, or are you assuming that the
decompressor can just skip that part and plonk the ELF file where it wants?

In other words, do you see the Phdrs as being descriptive or prescriptive?

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> This patch makes the payload of the bzImage file an ELF file.  In
> other words, the bzImage is structured as follows:
>  - boot sector
>  - 16bit setup code
>  - ELF header
>   - decompressor
>   - compressed kernel
> 
> A bootloader may find the start of the ELF file by looking at the
> setup_size entry in the boot params, and using that to find the offset
> of the ELF header.  The ELF Phdrs contain all the mapped memory
> required to decompress and start booting the kernel.
> 
> One slightly complex part of this is that the bzImage boot_params need
> to know about the internal structure of the ELF file, at least to the
> extent of being able to point the code32_start entry at the ELF file's
> entrypoint, so that loaders which use this field will still work.
> 
> Similarly, the ELF header needs to know how big the kernel vmlinux's
> bss segment is, in order to make sure it is mapped properly.
> 
> To handle these two cases, we generate abstracted versions of the
> object files which only contain the symbols we care about (generated
> with objcopy --strip-all --keep-symbol=X), and then include those
> symbol tables with ld -R.

I still believe that we should provide, in effect, vmlinux as a
(compressed) ELF file rather than provide the intermediate stage.  It
would reduce the complexity of testing (all information provided about a
stage has to be both guaranteed to even make sense in the future as
well as be tested to conform to such information) as well as cover a
larger number of environments -- any environment where injecting data
into memory is cheaper than execution is quite unhappy about the current
system.  Such environments include heterogeneous embedded systems (think
a slow CPU on a PCI card where the host CPU has direct access to the
memory on the card) as well as simulators/emulators.

For environments where doing so is appropriate it would even be possible to
run the setup, invoke the code32_start hook to do the decompression (and
relocation, if appropriate) in host space.

This makes vmlinux (normally stripped) recoverable from the bzImage file
and so anything that is currently booting vmlinux would be serviced by
this scheme.

-hpa


[PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
This patch makes the payload of the bzImage file an ELF file.  In
other words, the bzImage is structured as follows:
 - boot sector
 - 16bit setup code
 - ELF header
  - decompressor
  - compressed kernel

A bootloader may find the start of the ELF file by looking at the
setup_size entry in the boot params, and using that to find the offset
of the ELF header.  The ELF Phdrs contain all the mapped memory
required to decompress and start booting the kernel.
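
As a rough sketch of that lookup from the bootloader side: in the standard
x86 boot protocol the setup size is stored as a sector count in the
one-byte setup_sects field at offset 0x1f1 (0 means 4), and the payload
follows the boot sector plus that many 512-byte sectors. The code assumes
that the "setup_size entry" mentioned above corresponds to this field, and
find_elf_payload is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SETUP_SECTS_OFFSET 0x1f1  /* setup_sects field in the boot header */
#define SECTOR_SIZE        512

/*
 * Return the byte offset of the ELF header inside a bzImage held in
 * `img`, or -1 if the image is too short or no ELF magic is found at
 * the expected position.
 */
long find_elf_payload(const uint8_t *img, size_t len)
{
    static const uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len <= SETUP_SECTS_OFFSET)
        return -1;

    unsigned sects = img[SETUP_SECTS_OFFSET];
    if (sects == 0)
        sects = 4;  /* boot protocol: a value of 0 means 4 sectors */

    /* one boot sector, then `sects` sectors of 16-bit setup code */
    size_t off = ((size_t)sects + 1) * SECTOR_SIZE;
    if (off > len || len - off < sizeof magic)
        return -1;
    if (memcmp(img + off, magic, sizeof magic) != 0)
        return -1;
    return (long)off;
}
```

A loader that gets -1 back could fall back to treating the file as an
ordinary bzImage and jumping to code32_start as before.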

One slightly complex part of this is that the bzImage boot_params need
to know about the internal structure of the ELF file, at least to the
extent of being able to point the code32_start entry at the ELF file's
entrypoint, so that loaders which use this field will still work.

Similarly, the ELF header needs to know how big the kernel vmlinux's
bss segment is, in order to make sure it is mapped properly.

To handle these two cases, we generate abstracted versions of the
object files which only contain the symbols we care about (generated
with objcopy --strip-all --keep-symbol=X), and then include those
symbol tables with ld -R.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/boot/Makefile   |   11 --
 arch/i386/boot/compressed/Makefile|   29 +--
 arch/i386/boot/compressed/elfhdr.S|   60 +
 arch/i386/boot/compressed/head.S  |9 ++--
 arch/i386/boot/compressed/notes.S |7 +++
 arch/i386/boot/compressed/vmlinux.lds |   24 ++---
 arch/i386/boot/header.S   |7 ---
 arch/i386/boot/setup.ld   |5 ++
 arch/i386/kernel/head.S   |1 
 arch/i386/kernel/vmlinux.lds.S|1 
 10 files changed, 131 insertions(+), 23 deletions(-)

===
--- a/arch/i386/boot/Makefile
+++ b/arch/i386/boot/Makefile
@@ -72,14 +72,19 @@ AFLAGS  := $(CFLAGS) -D__ASSEMBLY__
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-LDFLAGS_setup.elf  := -T
-$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
+$(obj)/zImage $(obj)/bzImage:  \
+   LDFLAGS :=  \
+   -R $(obj)/compressed/blob-syms  \
+   --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
+
+$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS)\
+   $(obj)/compressed/blob-syms FORCE
 	$(call if_changed,ld)
 
 $(obj)/payload.o:  EXTRA_AFLAGS := -Wa,-I$(obj)
 $(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin
 
-$(obj)/compressed/blob: FORCE
+$(obj)/compressed/blob $(obj)/compressed/blob-syms: FORCE
 	$(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@
 
 # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel
===
--- a/arch/i386/boot/compressed/Makefile
+++ b/arch/i386/boot/compressed/Makefile
@@ -4,21 +4,42 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets:= blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
+targets:= blob vmlinux.bin vmlinux.bin.gz \
+   elfhdr.o head.o misc.o notes.o piggy.o \
 	vmlinux.bin.all vmlinux.relocs
 
-LDFLAGS_blob   := -T
 hostprogs-y:= relocs
 
 CFLAGS  := -m32 -D__KERNEL__ $(LINUX_INCLUDE) -O2 \
   -fno-strict-aliasing -fPIC \
   $(call cc-option,-ffreestanding) \
   $(call cc-option,-fno-stack-protector)
-LDFLAGS := -m elf_i386
+LDFLAGS := -R $(obj)/vmlinux-syms --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
 
-$(obj)/blob: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+OBJS=$(addprefix $(obj)/,elfhdr.o head.o misc.o notes.o piggy.o)
+
+$(obj)/blob: $(src)/vmlinux.lds $(obj)/vmlinux-syms $(OBJS) FORCE
 	$(call if_changed,ld)
 	@:
+
+# Generate a stripped-down object including only the symbols needed
+# so that we can get them with ld -R. Direct stderr to /dev/null to
+# shut useless warning up.
+quiet_cmd_symextract = SYMEXT $@
+  cmd_symextract = objcopy -S \
+   $(addprefix -j,$(EXTRACTSECTS)) \
+   $(addprefix -K,$(EXTRACTSYMS)) \
+   $< $@ 2>/dev/null
+
+$(obj)/blob-syms: EXTRACTSYMS := blob_entry blob_payload
+$(obj)/blob-syms: EXTRACTSECTS := .text.head .data.compressed
+$(obj)/blob-syms: $(obj)/blob FORCE
+   $(call if_changed,symextract)
+
+$(obj)/vmlinux-syms: EXTRACTSYMS := __reserved_end
+$(obj)/vmlinux-syms: EXTRACTSECTS := .bss
+$(obj)/vmlinux-syms: vmlinux FORCE
+   $(call if_changed,symextract)
 
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)
===

[PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
This patch makes the payload of the bzImage file an ELF file.  In
other words, the bzImage is structured as follows:
 - boot sector
 - 16bit setup code
 - ELF header
  - decompressor
  - compressed kernel

A bootloader may find the start of the ELF file by looking at the
setup_size entry in the boot params, and using that to find the offset
of the ELF header.  The ELF Phdrs contain all the mapped memory
required to decompress and start booting the kernel.
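
As a rough illustration of the lookup described above, a loader might
locate the payload as follows.  This is only a sketch: it assumes the
size of the setup code is available as the one-byte sector count at
offset 0x1f1 of the image (the existing boot protocol's setup_sects
field, where 0 means 4), and that the ELF header starts immediately after
the boot sector and setup sectors.  Whether the setup_size entry mentioned
above corresponds exactly to that field is an assumption, and the function
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SETUP_SECTS_OFFSET 0x1f1u  /* assumed location of the setup sector count */
#define SECTOR_SIZE        512u

/*
 * Return the byte offset of the payload within a bzImage held in memory,
 * or 0 if the image is too short to contain one.
 */
static size_t bzimage_payload_offset(const uint8_t *image, size_t len)
{
	unsigned int sects;
	size_t off;

	if (len <= SETUP_SECTS_OFFSET)
		return 0;

	sects = image[SETUP_SECTS_OFFSET];
	if (sects == 0)
		sects = 4;	/* boot protocol convention: 0 means 4 */

	/* One boot sector, followed by the setup sectors. */
	off = ((size_t)sects + 1) * SECTOR_SIZE;
	return off < len ? off : 0;
}

/* Return 1 if the payload begins with the ELF magic number, else 0. */
static int payload_is_elf(const uint8_t *image, size_t len)
{
	static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
	size_t off = bzimage_payload_offset(image, len);

	return off != 0 && len - off >= sizeof(magic) &&
	       memcmp(image + off, magic, sizeof(magic)) == 0;
}
```

A loader that does not find the ELF magic at that offset would presumably
fall back to treating the image as an ordinary bzImage.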

One slightly complex part of this is that the bzImage boot_params need
to know about the internal structure of the ELF file, at least to the
extent of being able to point the code32_start entry at the ELF file's
entrypoint, so that loaders which use this field will still work.

Similarly, the ELF header needs to know how big the kernel vmlinux's
bss segment is, in order to make sure it is mapped properly.

To handle these two cases, we generate abstracted versions of the
object files which only contain the symbols we care about (generated
with objcopy --strip-all --keep-symbol=X), and then include those
symbol tables with ld -R.

Signed-off-by: Jeremy Fitzhardinge [EMAIL PROTECTED]
Cc: Eric W. Biederman [EMAIL PROTECTED]
Cc: H. Peter Anvin [EMAIL PROTECTED]
Cc: Vivek Goyal [EMAIL PROTECTED]
Cc: Rusty Russell [EMAIL PROTECTED]

---
 arch/i386/boot/Makefile   |   11 --
 arch/i386/boot/compressed/Makefile|   29 +--
 arch/i386/boot/compressed/elfhdr.S|   60 +
 arch/i386/boot/compressed/head.S  |9 ++--
 arch/i386/boot/compressed/notes.S |7 +++
 arch/i386/boot/compressed/vmlinux.lds |   24 ++---
 arch/i386/boot/header.S   |7 ---
 arch/i386/boot/setup.ld   |5 ++
 arch/i386/kernel/head.S   |1 
 arch/i386/kernel/vmlinux.lds.S|1 
 10 files changed, 131 insertions(+), 23 deletions(-)

===
--- a/arch/i386/boot/Makefile
+++ b/arch/i386/boot/Makefile
@@ -72,14 +72,19 @@ AFLAGS  := $(CFLAGS) -D__ASSEMBLY__
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-LDFLAGS_setup.elf  := -T
-$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
+$(obj)/zImage $(obj)/bzImage:  \
+   LDFLAGS :=  \
+   -R $(obj)/compressed/blob-syms  \
+   --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
+
+$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS)\
+   $(obj)/compressed/blob-syms FORCE
$(call if_changed,ld)
 
 $(obj)/payload.o:  EXTRA_AFLAGS := -Wa,-I$(obj)
 $(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin
 
-$(obj)/compressed/blob: FORCE
+$(obj)/compressed/blob $(obj)/compressed/blob-syms: FORCE
$(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@
 
 # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel
===
--- a/arch/i386/boot/compressed/Makefile
+++ b/arch/i386/boot/compressed/Makefile
@@ -4,21 +4,42 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets:= blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
+targets:= blob vmlinux.bin vmlinux.bin.gz \
+   elfhdr.o head.o misc.o notes.o piggy.o \
vmlinux.bin.all vmlinux.relocs
 
-LDFLAGS_blob   := -T
 hostprogs-y:= relocs
 
 CFLAGS  := -m32 -D__KERNEL__ $(LINUX_INCLUDE) -O2 \
   -fno-strict-aliasing -fPIC \
   $(call cc-option,-ffreestanding) \
   $(call cc-option,-fno-stack-protector)
-LDFLAGS := -m elf_i386
+LDFLAGS := -R $(obj)/vmlinux-syms --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
 
-$(obj)/blob: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+OBJS=$(addprefix $(obj)/,elfhdr.o head.o misc.o notes.o piggy.o)
+
+$(obj)/blob: $(src)/vmlinux.lds $(obj)/vmlinux-syms $(OBJS) FORCE
$(call if_changed,ld)
@:
+
+# Generate a stripped-down object including only the symbols needed
+# so that we can get them with ld -R. Direct stderr to /dev/null to
+# shut useless warning up.
+quiet_cmd_symextract = SYMEXT $@
+  cmd_symextract = objcopy -S \
+   $(addprefix -j,$(EXTRACTSECTS)) \
+   $(addprefix -K,$(EXTRACTSYMS)) \
+   $< $@ 2>/dev/null
+
+$(obj)/blob-syms: EXTRACTSYMS := blob_entry blob_payload
+$(obj)/blob-syms: EXTRACTSECTS := .text.head .data.compressed
+$(obj)/blob-syms: $(obj)/blob FORCE
+   $(call if_changed,symextract)
+
+$(obj)/vmlinux-syms: EXTRACTSYMS := __reserved_end
+$(obj)/vmlinux-syms: EXTRACTSECTS := .bss
+$(obj)/vmlinux-syms: vmlinux FORCE
+   $(call if_changed,symextract)
 
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
===
--- /dev/null

Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> This patch makes the payload of the bzImage file an ELF file.  In
> other words, the bzImage is structured as follows:
>  - boot sector
>  - 16bit setup code
>  - ELF header
>   - decompressor
>   - compressed kernel
>
> A bootloader may find the start of the ELF file by looking at the
> setup_size entry in the boot params, and using that to find the offset
> of the ELF header.  The ELF Phdrs contain all the mapped memory
> required to decompress and start booting the kernel.
>
> One slightly complex part of this is that the bzImage boot_params need
> to know about the internal structure of the ELF file, at least to the
> extent of being able to point the code32_start entry at the ELF file's
> entrypoint, so that loaders which use this field will still work.
>
> Similarly, the ELF header needs to know how big the kernel vmlinux's
> bss segment is, in order to make sure it is mapped properly.
>
> To handle these two cases, we generate abstracted versions of the
> object files which only contain the symbols we care about (generated
> with objcopy --strip-all --keep-symbol=X), and then include those
> symbol tables with ld -R.

I still believe that we should provide, in effect, vmlinux as a
(compressed) ELF file rather than provide the intermediate stage.  It
would reduce the complexity of testing (all information provided about a
stage has to be both guaranteed to even make sense in the future as
well as be tested to conform to such information) as well as cover a
larger number of environments -- any environment where injecting data
into memory is cheaper than execution is quite unhappy about the current
system.  Such environments include heterogeneous embedded systems (think
a slow CPU on a PCI card where the host CPU has direct access to the
memory on the card) as well as simulators/emulators.

For environments where doing so is appropriate, it would even be possible
to run the setup and invoke the code32_start hook to do the decompression
(and relocation, if appropriate) in host space.

This makes vmlinux (normally stripped) recoverable from the bzImage file
and so anything that is currently booting vmlinux would be serviced by
this scheme.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I still believe that we should provide, in effect, vmlinux as a
> (compressed) ELF file rather than provide the intermediate stage.  It
> would reduce the complexity of testing (all information provided about a
> stage has to be both guaranteed to even make sense in the future as
> well as be tested to conform to such information

I'm not sure I follow you.  Sure, you're right that the Phdr info
contained within the bzImage needs to be tested for correctness.  This
wouldn't normally happen when booting native, but when booting under the
most constrained environment - Xen - it will be tested (and I intend
making the Xen loader as strict as possible).  Of course, it won't help
if the Phdrs are overmap too much, but I don't think that matters too
much, so long as the mappings are not excessively large.

I'm not sure what you mean about "make sense in the future".  If you're
booting the kernel in a new paravirtualized environment, you've
presumably modified the kernel to understand that environment, and
perhaps had to update the boot image format a bit to deal with its
requirements.  I agree that updating the bzImage format may require
retesting in all the other environments, but I think that's probably
true for your scheme as well.

After all, you're assuming that the vmlinux itself provides all
necessary information to be loaded in any environment, which is not
necessarily true (it may need extra ELF notes, for example).  But if
there are any major structural changes needed in the vmlinux, then that
will be equally problematic for both directly using vmlinux and using
ELF-in-bzImage.  So I don't think your argument convincingly sways in
any particular direction.

> ) as well as cover a
> larger number of environments -- any environment where injecting data
> into memory is cheaper than execution is quite unhappy about the current
> system.  Such environments include heterogeneous embedded systems (think
> a slow CPU on a PCI card where the host CPU has direct access to the
> memory on the card) as well as simulators/emulators.
>

Well, nothing in this scheme precludes the ELF file from being a plain
uncompressed kernel image.  If that's what these environments want, it's
easy to provide with a small update to the Makefiles.

> For environments where doing so is appropriate, it would even be possible
> to run the setup and invoke the code32_start hook to do the decompression
> (and relocation, if appropriate) in host space.
>

Well, that's what we currently have, and we can't break backwards
compatibility.

> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.
>

I'm not sure I fully understand the mechanism you're proposing.  You
have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
the decompressor has extracted the actual ELF file, are you proposing
that it properly parse the ELF file and follow its instructions to put
the segments in the appropriate places, or are you assuming that the
decompressor can just skip that part and plonk the ELF file where it wants?

In other words, do you see the Phdrs as being descriptive or prescriptive?

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
 
> I'm not sure I fully understand the mechanism you're proposing.  You
> have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
> the decompressor has extracted the actual ELF file, are you proposing
> that it properly parse the ELF file and follow its instructions to put
> the segments in the appropriate places, or are you assuming that the
> decompressor can just skip that part and plonk the ELF file where it wants?
>
> In other words, do you see the Phdrs as being descriptive or prescriptive?
>

I was thinking prescriptive, having the decompressor read the output
stream and interpret it as ELF.  I guess a descriptive approach could be
made to work, too (I haven't really thought about that avenue of
approach), but the prescriptive model seems more powerful, at least to me.
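
To make the prescriptive model concrete, here is a minimal sketch of what
"interpreting the output as ELF" could look like once the headers are in
hand: each loadable segment is copied to the address its Phdr names and
the remainder (bss) is zeroed.  The struct mirrors the standard 32-bit ELF
program header layout; the function name, the flat buffer standing in for
physical memory, and the assumption that source and destination do not
overlap are all illustrative, not part of any posted patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD_TYPE 1u	/* value of PT_LOAD in the ELF specification */

/* Same field order as a 32-bit ELF program header. */
struct phdr32 {
	uint32_t p_type, p_offset, p_vaddr, p_paddr;
	uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/*
 * Place every PT_LOAD segment where its Phdr says it belongs.
 * "mem" stands in for physical memory starting at address mem_base.
 * Returns 0 on success, -1 if a Phdr points outside the file or memory.
 */
static int load_segments(const uint8_t *elf, size_t elf_len,
			 const struct phdr32 *ph, size_t phnum,
			 uint8_t *mem, uint32_t mem_base, size_t mem_len)
{
	for (size_t i = 0; i < phnum; i++) {
		size_t dst;

		if (ph[i].p_type != PT_LOAD_TYPE)
			continue;
		if (ph[i].p_filesz > ph[i].p_memsz)
			return -1;
		if (ph[i].p_offset > elf_len ||
		    ph[i].p_filesz > elf_len - ph[i].p_offset)
			return -1;
		if (ph[i].p_paddr < mem_base)
			return -1;
		dst = ph[i].p_paddr - mem_base;
		if (dst > mem_len || ph[i].p_memsz > mem_len - dst)
			return -1;

		/* File-backed part first, then zero the rest (bss). */
		memcpy(mem + dst, elf + ph[i].p_offset, ph[i].p_filesz);
		memset(mem + dst + ph[i].p_filesz, 0,
		       ph[i].p_memsz - ph[i].p_filesz);
	}
	return 0;
}
```

The sketch sidesteps the hard part: in a real decompressor the loader's own
code and the decompressed file may sit in the very memory the Phdrs ask
for.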

-hpa


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I was thinking prescriptive, having the decompressor read the output
> stream and interpret it as ELF.  I guess a descriptive approach could be
> made to work, too (I haven't really thought about that avenue of
> approach), but the prescriptive model seems more powerful, at least to me.

Certainly, but much harder to implement.  The ELF parser needs to be
prepared to move itself around to get out of the way of the ELF file. 
It's a fairly large change from how it works now.

I was thinking of making the ELF file entirely descriptive, since it's
just a set of ELF headers inserted into the existing bzImage structure,
and it still relies on the bzImage being built properly in the first place.

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
 
> Certainly, but much harder to implement.  The ELF parser needs to be
> prepared to move itself around to get out of the way of the ELF file.
> It's a fairly large change from how it works now.
>

It doesn't if we simply declare that a certain chunk of memory is
available to it, for the case where it runs in the native configuration.
Since it doesn't have to support *any* ELF file, just the kernel one,
that's an option.

On the other hand, I guess with the decompressor/ELF parser being PIC,
one would simply look for the highest used address, and relocate itself
above that point.  It's not really all that different from what the
decompressor does today, except that it knows the address a priori.
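
A sketch of the "highest used address" computation, for illustration only:
scan the PT_LOAD Phdrs, take the largest p_paddr + p_memsz, and round up to
an alignment to get a candidate address to relocate to.  The struct mirrors
the standard 32-bit ELF program header; the function names and the
power-of-two alignment parameter are hypothetical, and the result is
assumed to stay well below 4 GiB.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD_TYPE 1u	/* value of PT_LOAD in the ELF specification */

/* Same field order as a 32-bit ELF program header. */
struct phdr32 {
	uint32_t p_type, p_offset, p_vaddr, p_paddr;
	uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* First address past the end of the highest loadable segment. */
static uint32_t highest_used_address(const struct phdr32 *ph, size_t phnum)
{
	uint64_t top = 0;

	for (size_t i = 0; i < phnum; i++) {
		uint64_t end;

		if (ph[i].p_type != PT_LOAD_TYPE)
			continue;
		/* p_memsz, not p_filesz: the bss must be counted too. */
		end = (uint64_t)ph[i].p_paddr + ph[i].p_memsz;
		if (end > top)
			top = end;
	}
	return top > UINT32_MAX ? UINT32_MAX : (uint32_t)top;
}

/*
 * Candidate address for the relocated decompressor/ELF parser.
 * "align" must be a power of two; rounding is assumed not to wrap.
 */
static uint32_t relocation_target(const struct phdr32 *ph, size_t phnum,
				  uint32_t align)
{
	uint32_t top = highest_used_address(ph, phnum);

	return (top + (align - 1)) & ~(align - 1);
}
```

This only needs the program headers, so it can run before any segment data
has been decompressed.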

> I was thinking of making the ELF file entirely descriptive, since it's
> just a set of ELF headers inserted into the existing bzImage structure,
> and it still relies on the bzImage being built properly in the first place.

Again, it's an option.  The downside is that you don't get the automatic
test coverage of having it be exercised as often as possible.

-hpa



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> It doesn't if we simply declare that a certain chunk of memory is
> available to it, for the case where it runs in the native configuration.
> Since it doesn't have to support *any* ELF file, just the kernel one,
> that's an option.
>

I suppose.  But given that it's always built at the same time as - and
linked to - the kernel itself, it can have private knowledge about the
kernel.

> On the other hand, I guess with the decompressor/ELF parser being PIC,
> one would simply look for the highest used address, and relocate itself
> above that point.  It's not really all that different from what the
> decompressor does today, except that it knows the address a priori.
>

Yes, it would have to decompress the ELF file into a temp buffer, and
then rearrange itself and the decompressed ELF file to make space for
the ELF file's final location.  Seems a bit more complex because it has
to be done in the middle of execution rather than at start of day.  But
perhaps that doesn't matter very much.

>> I was thinking of making the ELF file entirely descriptive, since it's
>> just a set of ELF headers inserted into the existing bzImage structure,
>> and it still relies on the bzImage being built properly in the first place.
>
> Again, it's an option.  The downside is that you don't get the automatic
> test coverage of having it be exercised as often as possible.

I don't follow your argument at all.

I'm proposing the kernel take the same code path regardless of how it's
booted, with the only two variations:

   1. boot all the way up from 16-bit mode, or
   2. start directly in 32-bit mode

which is essentially the current situation (setup vs code32_start).  All
I'm adding is a bit more metadata for the domain builder to work with. 
The code will get exercised on every boot in every environment, and the
metadata will be tested by whichever environment cares about it.

You're proposing that we add a third booting variation, where the
bootloader takes on the responsibility for decompressing and loading the
kernel's ELF image.  In addition, you're proposing changing the existing
32-bit portion of the boot to perform the same job as the third method,
but in a way which is not reusable by a paravirtual domain builder. 
This means that the boot path is unique for each boot environment, and
so will overall get less coverage.

Given that one axis of the test matrix - number of subarchitectures -
is the same in both cases, and the other axis - number of ways of
booting - is larger in your proposal, it seems to me that yours has
the higher testing burden.

Anyway, I added an extra pointer in the boot_params so that you can
implement it that way if you really want (no real reason you can't have
ELF within ELF within bzImage, but it starts to look a bit
engineering-by-compromise at that point).  It isn't, however, the
approach I want to take with Xen.

J


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Rob Landley
On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.

Would this make it sane to strip the initramfs image out of vmlinux with 
objdump and replace it with another one, or are there offsets resolved during 
the build that stop that for vmlinux?

Rob
-- 
The Google cluster became self-aware at 2:14am EDT August 29, 2007...


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Rob Landley wrote:
> On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
>> This makes vmlinux (normally stripped) recoverable from the bzImage file
>> and so anything that is currently booting vmlinux would be serviced by
>> this scheme.
>
> Would this make it sane to strip the initramfs image out of vmlinux with
> objdump and replace it with another one, or are there offsets resolved during
> the build that stop that for vmlinux?
>

There probably are offsets resolved during the build.  However, that
wouldn't be all that hard to fix.  Still, one can argue whether or not
it is sane under any definition to do this kind of unpacking-repacking
of ELF files.

-hpa
