Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-30 Thread James Bottomley
On Wed, 2015-09-30 at 09:43 -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel
>  wrote:
> > On 29 September 2015 at 23:58, Laszlo Ersek  wrote:
> >> On 09/28/15 08:41, Matthew Garrett wrote:
> >>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
> >>>
>  So the question is, what does Windows do?
> >>>
> >>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
> >>> arguments to the qemu debug port. Unfortunately I'm about to drop
> >>> mostly  offline for a week, otherwise I'd give it a go...
> > [...]
> >> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
> >> guests, with the properties table feature enabled vs. disabled in the
> >> firmware. (All three Windows guests were updated first though.)
> >>
> >> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
> >> the feature is enabled in the firmware. However, Windows 8.1 crashes
> >> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
> >> 2012 R2 and Windows 10 boot fine.
> >>
> >
> > Looking at the log, it seems the VA mapping strategy is actually the
> > same (i.e., bottom-up for Win10), and the difference can be explained
> > by the differences in the memory map provided by the firmware to the
> > OS. And indeed, the Win8.1 log shows the following:
> >
> >  # MemType Phys 0x  Virt 0x  Size 0x Attributes
> > -- ---   --- ---
> >  0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  6 RtData  7EC6 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 21 RtData  7FFD FF9E4000 002 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 22 RtData  FFE0 FF7E4000 020 [UC|  |  |  |  |XP|  |  |  |RT]
> >
> > I.e., the physical addresses increase while the virtual addresses
> > decrease, and since each consecutive RuntimeCode/RuntimeData pair
> > constitutes a PE/COFF image (.text and .data, respectively), the
> > PE/COFF images appear corrupted in the virtual space.
> 
> All of this garbage makes me want to ask a rhetorical question:
> 
> Why on Earth did anyone think it's a good idea to invoke EFI functions
> at CPL0 once the OS is booted?

I'm afraid the originators of EFI (Intel) look on it as a DOS
replacement ... with the same OS support.

> And a more practical question:
> 
> Do we actually have to invoke EFI functions at CPL0?
> 
> I really mean it.  Sure, for things like reboot where we give up
> control and don't get it back, we need to do that.  But for things
> like variable access, the EFI code should really only need access to
> EFI memor (with a known PA -> VA map) and the ability to trigger an
> SMI.  Doing it at CPL3 could require more fixups than would really
> make sense, but could we virtualize it instead?
> 
> Actually, CPL3 + IOPL3 just might work.
> 
> Heck, on mixed-mode, we're already invoke EFI functions in compat
> mode, and that seems okay, so those functions can't be poking at any
> CPU state that varies between long and 32-bit modes.

It's hard.  The EFI functions expect to interact directly with kernel
memory, which they can't at CPL3.  We could vector all that through a
CPL3 readable buffer but anything within EFI that uses privileged
instructions will fault and we'll have to handle it ... this really
sounds like a can of worms.  Especially as windows will be no help
testing all of this because it will call in at CPL0.

James

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-30 Thread Andy Lutomirski
On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel
 wrote:
> On 29 September 2015 at 23:58, Laszlo Ersek  wrote:
>> On 09/28/15 08:41, Matthew Garrett wrote:
>>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>>>
 So the question is, what does Windows do?
>>>
>>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
>>> arguments to the qemu debug port. Unfortunately I'm about to drop
>>> mostly  offline for a week, otherwise I'd give it a go...
> [...]
>> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
>> guests, with the properties table feature enabled vs. disabled in the
>> firmware. (All three Windows guests were updated first though.)
>>
>> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
>> the feature is enabled in the firmware. However, Windows 8.1 crashes
>> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
>> 2012 R2 and Windows 10 boot fine.
>>
>
> Looking at the log, it seems the VA mapping strategy is actually the
> same (i.e., bottom-up for Win10), and the difference can be explained
> by the differences in the memory map provided by the firmware to the
> OS. And indeed, the Win8.1 log shows the following:
>
>  # MemType Phys 0x  Virt 0x  Size 0x Attributes
> -- ---   --- ---
>  0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  6 RtData  7EC6 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 21 RtData  7FFD FF9E4000 002 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 22 RtData  FFE0 FF7E4000 020 [UC|  |  |  |  |XP|  |  |  |RT]
>
> I.e., the physical addresses increase while the virtual addresses
> decrease, and since each consecutive RuntimeCode/RuntimeData pair
> constitutes a PE/COFF image (.text and .data, respectively), the
> PE/COFF images appear corrupted in the virtual space.

All of this garbage makes me want to ask a rhetorical question:

Why on Earth did anyone think it's a good idea to invoke EFI functions
at CPL0 once the OS is booted?

And a more practical question:

Do we actually have to invoke EFI functions at CPL0?

I really mean it.  Sure, for things like reboot where we give up
control and don't get it back, we need to do that.  But for things
like variable access, the EFI code should really only need access to
EFI memor (with a known PA -> VA map) and the ability to trigger an
SMI.  Doing it at CPL3 could require more fixups than would really
make sense, but could we virtualize it instead?

Actually, CPL3 + IOPL3 just might work.

Heck, on mixed-mode, we're already invoke EFI functions in compat
mode, and that seems okay, so those functions can't be poking at any
CPU state that varies between long and 32-bit modes.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-30 Thread Ard Biesheuvel
On 29 September 2015 at 23:58, Laszlo Ersek  wrote:
> On 09/28/15 08:41, Matthew Garrett wrote:
>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>>
>>> So the question is, what does Windows do?
>>
>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
>> arguments to the qemu debug port. Unfortunately I'm about to drop
>> mostly  offline for a week, otherwise I'd give it a go...
[...]
> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
> guests, with the properties table feature enabled vs. disabled in the
> firmware. (All three Windows guests were updated first though.)
>
> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
> the feature is enabled in the firmware. However, Windows 8.1 crashes
> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
> 2012 R2 and Windows 10 boot fine.
>

Looking at the log, it seems the VA mapping strategy is actually the
same (i.e., bottom-up for Win10), and the difference can be explained
by the differences in the memory map provided by the firmware to the
OS. And indeed, the Win8.1 log shows the following:

 # MemType Phys 0x  Virt 0x  Size 0x Attributes
-- ---   --- ---
 0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 6 RtData  7EC6 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
21 RtData  7FFD FF9E4000 002 [UC|WC|WT|WB|  |XP|  |  |  |RT]
22 RtData  FFE0 FF7E4000 020 [UC|  |  |  |  |XP|  |  |  |RT]

I.e., the physical addresses increase while the virtual addresses
decrease, and since each consecutive RuntimeCode/RuntimeData pair
constitutes a PE/COFF image (.text and .data, respectively), the
PE/COFF images appear corrupted in the virtual space.

> I uploaded the verbose OVMF log files from all six guest boots to [5].
> The tables you might be interested in are dumped at the ends of the log
> files.
>
> All three guests had 2GB of RAM. They had different VM configurations,
> but between disabling and enabling the properties table feature, no
> other knob was tweaked. Therefore the two log files of the same guest
> should be comparable against each other, for each guest. For example:
>
> $ colordiff -u ovmf.win10.prop.{disabled,enabled}.log
>
> Because stuff hosted on the web privately tends to go away, I'll quote
> that diff here, for posterity:
>
>> --- ovmf.win10.prop.disabled.log  2015-09-29 22:01:45.252126086 +0200
>> +++ ovmf.win10.prop.enabled.log   2015-09-29 21:50:54.579475078 +0200
[...]

Thanks a lot for taking the time, this is very useful info.

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread H. Peter Anvin
On 09/27/2015 11:16 PM, Ingo Molnar wrote:
> 
> So the question is, what does Windows do?
> 
> PC firmware is a hostile environment for Linux, to be compatible the best we 
> can 
> do is to mimic the environment that the firmware is tested under - i.e. try 
> to use 
> the firmware in the way Windows uses it.
> 

Windows apparently went through the same exercise of breakage followed
by a fix.  It is unknown if Windows will preserve gaps since those are
not manifest on existing firmware.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread Laszlo Ersek
On 09/28/15 08:41, Matthew Garrett wrote:
> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>
>> So the question is, what does Windows do?
>
> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
> arguments to the qemu debug port. Unfortunately I'm about to drop
> mostly  offline for a week, otherwise I'd give it a go...

In order to enable the properties table feature in OVMF, two conditions
have to be satisfied:

(a) set gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable to TRUE
(b) build DXE_RUNTIME_DRIVER modules with 4KB section alignment

(a) used to be possible only statically (ie. at build time), but since
edk2 commit ab081a50e5 [1] it can be done dynamically too, on the qemu
command line. See that edk2 commit for details.

This OVMF feature depends on QEMU commit 81b2b81062 [2]. Another patch
from Gabriel is under review that enables such simple settings without
host-side files [3].

(b) is satisfied by the edk2 patch that Ard posted today [4].

Because I know how much people like building OVMF, I uploaded a fresh
OVMF binary (which includes a number of other patches from my personal
tree, but those are irrelevent here), with a matching varstore template,
to [5]. The relevant edk2 patches (ie. the one from Ard [4], and a debug
patch like you mention above) can also be found under [5].

(

I recommend to start QEMU like this:

  # Create a new (empty) varstore for the virtual machine, from the
  # varstore template.

  cp OVMF_VARS.fd my-vars.fd

  # Start the VM, with direct kernel boot.

  qemu-system-x86_64 \
  -m 2048 \
  -M pc,accel=kvm \
  \
  -device qxl-vga \
  \
  -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on \
  -drive if=pflash,format=raw,unit=1,file=my-vars.fd \
  \
  -debugcon file:debug.log \
  -global isa-debugcon.iobase=0x402 \
  \
  -chardev stdio,signal=off,mux=on,id=char0 \
  -mon chardev=char0,mode=readline,default \
  -serial chardev:char0 \
  \
  -kernel vmlinuz... \
  -initrd initramfs... \
  -append "root=..." \
  \
  ...

The guest will have 2GB of RAM, it will use KVM, the virtual video card
will be QXL; the OVMF debug log will be written to "debug.log". The key
combination [Control-A C] will switch between the QEMU monitor prompt
and the serial port of the guest. The above should be suitable for rapid
testing of kernels built on the host.

)

Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
guests, with the properties table feature enabled vs. disabled in the
firmware. (All three Windows guests were updated first though.)

All three Windows OSes adapt their SetVirtualAddressMap() calls, when
the feature is enabled in the firmware. However, Windows 8.1 crashes
nonetheless (BSOD, I forget the fault details, sorry). Windows Server
2012 R2 and Windows 10 boot fine.

I uploaded the verbose OVMF log files from all six guest boots to [5].
The tables you might be interested in are dumped at the ends of the log
files.

All three guests had 2GB of RAM. They had different VM configurations,
but between disabling and enabling the properties table feature, no
other knob was tweaked. Therefore the two log files of the same guest
should be comparable against each other, for each guest. For example:

$ colordiff -u ovmf.win10.prop.{disabled,enabled}.log

Because stuff hosted on the web privately tends to go away, I'll quote
that diff here, for posterity:

> --- ovmf.win10.prop.disabled.log  2015-09-29 22:01:45.252126086 +0200
> +++ ovmf.win10.prop.enabled.log   2015-09-29 21:50:54.579475078 +0200
> @@ -24,7 +24,7 @@
>  QemuFwCfg interface is supported.
>  Platform PEIM Loaded
>  CMOS:
> -00: 38 00 01 00 22 00 03 29 09 15 26 02 00 80 00 00
> +00: 48 00 50 00 21 00 03 29 09 15 26 02 00 80 00 00
>  10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00
>  20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  30: FF FF 20 00 00 7F 00 20 30 00 00 00 00 12 00 00
> @@ -1110,6 +1110,17 @@
>  InstallProtocolInterface: 4CF5B200-68B8-4CA5-9EEC-B23E3F50029A 7F271568
>  [Variable]END_OF_DXE is signaled
>  Initialize variable error flag (FF)
> +MemoryProtectionAttribute - 0x0001
> +Total Image Count - 0x8
> +Dump ImageRecord:
> +  Image[0]: 0x7EC26000 - 0xA000
> +  Image[1]: 0x7EC35000 - 0x8000
> +  Image[2]: 0x7EC65000 - 0xC000
> +  Image[3]: 0x7ED76000 - 0x000A3000
> +  Image[4]: 0x7FE9E000 - 0x9000
> +  Image[5]: 0x7FEA7000 - 0xA000
> +  Image[6]: 0x7FEB1000 - 0xC000
> +  Image[7]: 0x7FEBD000 - 0xC000
>  S3Ready!
>  SaveLockBox: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D Buffer=7FEEE000 
> Length=0xA
>  SetLockBoxAttributes: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D 
> Attributes=0x1
> @@ -1822,18 +1833,29 @@
>  InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7F0EBA00
>  Loading driver at 0x0001000 EntryPoint=0x00010014790 bootmgfw.efi
> 

Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread Matt Fleming
On Tue, 29 Sep, at 12:41:23PM, Ard Biesheuvel wrote:
> 
> OK, fair enough. I agree that setting the flag for 32-bit would be
> semantically correct. I will leave it to Matt to comment whether it is
> reasonable in terms of changes to other parts of the code.

It should be pretty minimal. Let me take a swing at the patch.
 
-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread Matt Fleming
On Tue, 29 Sep, at 11:12:30AM, Ingo Molnar wrote:
> 
> * Ard Biesheuvel  wrote:
> 
> > > except that I don't think
> > > the condition on 64-bit makes any sense:
> > >
> > > +   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> > >
> > > I can see us being nervous wrt. backported patches, but is there any 
> > > strong reason
> > > to not follow this up with a third (non-backported) patch that changes 
> > > this to:
> > >
> > > +   if (!efi_enabled(EFI_OLD_MEMMAP)) {
> > >
> > > for v4.4?
> > >
> > 
> > The 32-bit side essentially implements the old memmap only, which is the 
> > the 
> > bottom-up version. So old memmap will be implied by 32-bit but not set in 
> > the 
> > EFI flags, resulting in the reverse enumeration being used with the 
> > bottom-up 
> > mapping logic. The net result of that is that we create the same problem 
> > for 
> > 32-bit that we are trying to solve for 64-bit, i.e., the regions will end 
> > up in 
> > reverse order in the VA mapping.
> > 
> > To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP 
> > unconditionally on 32-bit x86. Or we could reshuffle variables and 
> > conditionals 
> > in various other way.
> 
> Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.
 
Right, I think that's a very good suggestion, because like Ard
mentioned, since EFI_OLD_MEMMAP is implied for 32-bit (there's no
other way to map stuff currently), so it makes sense to force set the
bit.

> > [...] I am not convinced that the overall end result will be any better 
> > though.
> 
> That's not true, we change an obscure, implicit dependency on 32-bit detail 
> to an 
> explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a 
> clear 
> improvement.

Agreed.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread Ard Biesheuvel
On 29 September 2015 at 11:12, Ingo Molnar  wrote:
>
> * Ard Biesheuvel  wrote:
>
>> > except that I don't think
>> > the condition on 64-bit makes any sense:
>> >
>> > +   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> >
>> > I can see us being nervous wrt. backported patches, but is there any 
>> > strong reason
>> > to not follow this up with a third (non-backported) patch that changes 
>> > this to:
>> >
>> > +   if (!efi_enabled(EFI_OLD_MEMMAP)) {
>> >
>> > for v4.4?
>> >
>>
>> The 32-bit side essentially implements the old memmap only, which is the the
>> bottom-up version. So old memmap will be implied by 32-bit but not set in the
>> EFI flags, resulting in the reverse enumeration being used with the bottom-up
>> mapping logic. The net result of that is that we create the same problem for
>> 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up 
>> in
>> reverse order in the VA mapping.
>>
>> To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP
>> unconditionally on 32-bit x86. Or we could reshuffle variables and 
>> conditionals
>> in various other way.
>
> Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.
>
>> [...] I am not convinced that the overall end result will be any better 
>> though.
>
> That's not true, we change an obscure, implicit dependency on 32-bit detail 
> to an
> explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a 
> clear
> improvement.
>

OK, fair enough. I agree that setting the flag for 32-bit would be
semantically correct. I will leave it to Matt to comment whether it is
reasonable in terms of changes to other parts of the code.

Thanks,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-29 Thread Ingo Molnar

* Ard Biesheuvel  wrote:

> > except that I don't think
> > the condition on 64-bit makes any sense:
> >
> > +   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> >
> > I can see us being nervous wrt. backported patches, but is there any strong 
> > reason
> > to not follow this up with a third (non-backported) patch that changes this 
> > to:
> >
> > +   if (!efi_enabled(EFI_OLD_MEMMAP)) {
> >
> > for v4.4?
> >
> 
> The 32-bit side essentially implements the old memmap only, which is the the 
> bottom-up version. So old memmap will be implied by 32-bit but not set in the 
> EFI flags, resulting in the reverse enumeration being used with the bottom-up 
> mapping logic. The net result of that is that we create the same problem for 
> 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up 
> in 
> reverse order in the VA mapping.
> 
> To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP 
> unconditionally on 32-bit x86. Or we could reshuffle variables and 
> conditionals 
> in various other way.

Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.

> [...] I am not convinced that the overall end result will be any better 
> though.

That's not true, we change an obscure, implicit dependency on 32-bit detail to 
an 
explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a 
clear 
improvement.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-28 Thread Ard Biesheuvel
On 28 September 2015 at 09:22, Ingo Molnar  wrote:
>
> * Ard Biesheuvel  wrote:
>
>> On 27 September 2015 at 08:03, Ingo Molnar  wrote:
>> >
>> > * Matt Fleming  wrote:
>> >
>> [...]
>> >> [...] The actual virtual addresses we pick are exactly the same with the 
>> >> two
>> >> patches.
>> >
>> > So I'm NAK-ing this for now:
>> >
>> >  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
>> >allocator, because all the sections have already been determined by the
>> >firmware, and, as we just learned the hard way, we do not want to 
>> > deviate from
>> >that! There's nothing to 'allocate'!
>> >
>> >What these patches seem to implement is an elaborate 'allocator' that 
>> > ends up
>> >doing nothing on 'new 64-bit' ...
>> >
>> >  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
>> >
>> > if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> >
>> >seem fragile and nonsensical. The question is: is it possible for the 
>> > whole EFI
>> >image to be larger than a couple of megabytes? If not then 32-bit 
>> > should just
>> >mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
>> >differently from this _obvious_ 1:1 mapping of the EFI memory offsets 
>> > then it's
>> >not worth keeping as a legacy, because there's just nothing better than
>> >mirroring the firmware layout.
>> >
>> > My suggestion would be to just 1:1 map what the EFI tables describe, 
>> > modulo the
>> > single absolute offset by which we shift the whole thing to a single base.
>> >
>> > Is there any technical reason why we'd want to deviate from that? 
>> > Gigabytes of
>> > tables or gigabytes of holes that 32-bit cannot handle? Firmware that 
>> > wants an OS
>> > layout that differs from the firmware layout?
>> >
>>
>> The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 
>> addressable PA
>> space. They usually don't but it is a possibility, which means 32-bit will 
>> not
>> generally be able to support this approach. [...]
>
> Ok, that's a good argument which invalidates my NAK.
>
>> [...] For 64-bit ARM, there are some minor complications when the base of 
>> RAM is
>> up very high in physical memory, but we already fixed that for the boot time 
>> ID
>> map and for KVM.
>>
>> > Also, nobody seems to be asking the obvious hardware compatibility question
>> > when trying to implement a standard influenced in great part by an entity 
>> > that
>> > is partly ignorant of and partly hostile to Linux: how does Windows map the
>> > EFI sections, under what OSs are these firmware versions tested? I suspect 
>> > no
>> > firmware is released that crashes on bootup on all OSs that can run on that
>> > hardware, right?
>>
>> Interestingly, it was the other way around this time. The engineers that
>> implemented this feature for EDK2 could not boot Windows 8 anymore, because 
>> it
>> supposedly maps the regions in reverse order as well (and MS too will need to
>> backport a fix that inverts the mapping order). The engineers also tested
>> Linux/x86, by means of a SUSE installer image, which booted fine, most likely
>> due to the fact that it is an older version which still uses the old memmap
>> layout.
>
> That's nice to hear!
>
>> My concern with all of this is that this security feature will become an 
>> obscure
>> opt-in feature rather than something UEFIv2.5 firmware implementations can
>> enable by default.
>
> Ok, so I think the patches are mostly fine after all,

That is good to hear.

> except that I don't think
> the condition on 64-bit makes any sense:
>
> +   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>
> I can see us being nervous wrt. backported patches, but is there any strong 
> reason
> to not follow this up with a third (non-backported) patch that changes this 
> to:
>
> +   if (!efi_enabled(EFI_OLD_MEMMAP)) {
>
> for v4.4?
>

The 32-bit side essentially implements the old memmap only, which is
the the bottom-up version. So old memmap will be implied by 32-bit but
not set in the EFI flags, resulting in the reverse enumeration being
used with the bottom-up mapping logic. The net result of that is that
we create the same problem for 32-bit that we are trying to solve for
64-bit, i.e., the regions will end up in reverse order in the VA
mapping.

To deobfuscate this particular conditional, we could set
EFI_OLD_MEMMAP unconditionally on 32-bit x86. Or we could reshuffle
variables and conditionals in various other way. I am not convinced
that the overall end result will be any better though.

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-28 Thread Ingo Molnar

* Ard Biesheuvel  wrote:

> On 27 September 2015 at 08:03, Ingo Molnar  wrote:
> >
> > * Matt Fleming  wrote:
> >
> [...]
> >> [...] The actual virtual addresses we pick are exactly the same with the 
> >> two
> >> patches.
> >
> > So I'm NAK-ing this for now:
> >
> >  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
> >allocator, because all the sections have already been determined by the
> >firmware, and, as we just learned the hard way, we do not want to 
> > deviate from
> >that! There's nothing to 'allocate'!
> >
> >What these patches seem to implement is an elaborate 'allocator' that 
> > ends up
> >doing nothing on 'new 64-bit' ...
> >
> >  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
> >
> > if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> >
> >seem fragile and nonsensical. The question is: is it possible for the 
> > whole EFI
> >image to be larger than a couple of megabytes? If not then 32-bit should 
> > just
> >mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
> >differently from this _obvious_ 1:1 mapping of the EFI memory offsets 
> > then it's
> >not worth keeping as a legacy, because there's just nothing better than
> >mirroring the firmware layout.
> >
> > My suggestion would be to just 1:1 map what the EFI tables describe, modulo 
> > the
> > single absolute offset by which we shift the whole thing to a single base.
> >
> > Is there any technical reason why we'd want to deviate from that? Gigabytes 
> > of
> > tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants 
> > an OS
> > layout that differs from the firmware layout?
> >
> 
> The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 addressable 
> PA 
> space. They usually don't but it is a possibility, which means 32-bit will 
> not 
> generally be able to support this approach. [...]

Ok, that's a good argument which invalidates my NAK.

> [...] For 64-bit ARM, there are some minor complications when the base of RAM 
> is 
> up very high in physical memory, but we already fixed that for the boot time 
> ID 
> map and for KVM.
> 
> > Also, nobody seems to be asking the obvious hardware compatibility question 
> > when trying to implement a standard influenced in great part by an entity 
> > that 
> > is partly ignorant of and partly hostile to Linux: how does Windows map the 
> > EFI sections, under what OSs are these firmware versions tested? I suspect 
> > no 
> > firmware is released that crashes on bootup on all OSs that can run on that 
> > hardware, right?
> 
> Interestingly, it was the other way around this time. The engineers that 
> implemented this feature for EDK2 could not boot Windows 8 anymore, because 
> it 
> supposedly maps the regions in reverse order as well (and MS too will need to 
> backport a fix that inverts the mapping order). The engineers also tested 
> Linux/x86, by means of a SUSE installer image, which booted fine, most likely 
> due to the fact that it is an older version which still uses the old memmap 
> layout.

That's nice to hear!

> My concern with all of this is that this security feature will become an 
> obscure 
> opt-in feature rather than something UEFIv2.5 firmware implementations can 
> enable by default.

Ok, so I think the patches are mostly fine after all, except that I don't think 
the condition on 64-bit makes any sense:

+   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

I can see us being nervous wrt. backported patches, but is there any strong 
reason 
to not follow this up with a third (non-backported) patch that changes this to:

+   if (!efi_enabled(EFI_OLD_MEMMAP)) {

for v4.4?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-28 Thread Matthew Garrett
On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:

> So the question is, what does Windows do?

It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() 
arguments to the qemu debug port. Unfortunately I'm about to drop mostly 
offline for a week, otherwise I'd give it a go...

-- 
Matthew Garrett | mj...@srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-27 Thread Ard Biesheuvel
On 27 September 2015 at 08:03, Ingo Molnar  wrote:
>
> * Matt Fleming  wrote:
>
[...]
>> [...] The actual virtual addresses we pick are exactly the same with the two
>> patches.
>
> So I'm NAK-ing this for now:
>
>  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
>allocator, because all the sections have already been determined by the
>firmware, and, as we just learned the hard way, we do not want to deviate 
> from
>that! There's nothing to 'allocate'!
>
>What these patches seem to implement is an elaborate 'allocator' that ends 
> up
>doing nothing on 'new 64-bit' ...
>
>  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
>
> if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>
>seem fragile and nonsensical. The question is: is it possible for the 
> whole EFI
>image to be larger than a couple of megabytes? If not then 32-bit should 
> just
>mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
>differently from this _obvious_ 1:1 mapping of the EFI memory offsets then 
> it's
>not worth keeping as a legacy, because there's just nothing better than
>mirroring the firmware layout.
>
> My suggestion would be to just 1:1 map what the EFI tables describe, modulo 
> the
> single absolute offset by which we shift the whole thing to a single base.
>
> Is there any technical reason why we'd want to deviate from that? Gigabytes of
> tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants 
> an OS
> layout that differs from the firmware layout?
>

The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1
addressable PA space. They usually don't but it is a possibility,
which means 32-bit will not generally be able to support this
approach. For 64-bit ARM, there are some minor complications when the
base of RAM is up very high in physical memory, but we already fixed
that for the boot time ID map and for KVM.

> Also, nobody seems to be asking the obvious hardware compatibility question 
> when
> trying to implement a standard influenced in great part by an entity that is
> partly ignorant of and partly hostile to Linux: how does Windows map the EFI
> sections, under what OSs are these firmware versions tested? I suspect no 
> firmware
> is released that crashes on bootup on all OSs that can run on that hardware,
> right?
>

Interestingly, it was the other way around this time. The engineers
that implemented this feature for EDK2 could not boot Windows 8
anymore, because it supposedly maps the regions in reverse order as
well (and MS too will need to backport a fix that inverts the mapping
order). The engineers also tested Linux/x86, by means of a SUSE
installer image, which booted fine, most likely due to the fact that
it is an older version which still uses the old memmap layout.

My concern with all of this is that this security feature will become
an obscure opt-in feature rather than something UEFIv2.5 firmware
implementations can enable by default.
-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-27 Thread Ingo Molnar

* Matthew Garrett  wrote:

> On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote:
> > On Sep 26, 2015 1:19 PM, "H. Peter Anvin"  wrote:
> > >
> > > Sadly a lot of firmware is known to fail in that configuration :(  That 
> > > was very much our guest choice.
> > >
> > 
> > Why can't we map everything completely 1:1 (VA = PA) and call the
> > setVA thing but pass it literally the identity.
> 
> Last time I tried this I found that some firmware makes assumptions 
> about having high addresses.

So the question is, what does Windows do?

PC firmware is a hostile environment for Linux, to be compatible the best we 
can 
do is to mimic the environment that the firmware is tested under - i.e. try to 
use 
the firmware in the way Windows uses it.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-27 Thread Matthew Garrett
On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote:
> On Sep 26, 2015 1:19 PM, "H. Peter Anvin"  wrote:
> >
> > Sadly a lot of firmware is known to fail in that configuration :(  That was 
> > very much our guest choice.
> >
> 
> Why can't we map everything completely 1:1 (VA = PA) and call the
> setVA thing but pass it literally the identity.

Last time I tried this I found that some firmware makes assumptions 
about having high addresses.

-- 
Matthew Garrett | mj...@srcf.ucam.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-27 Thread Andy Lutomirski
On Sep 26, 2015 1:19 PM, "H. Peter Anvin"  wrote:
>
> Sadly a lot of firmware is known to fail in that configuration :(  That was 
> very much our guest choice.
>

Why can't we map everything completely 1:1 (VA = PA) and call the
setVA thing but pass it literally the identity.

Firmwares that need it to be called will work, and firmwares that fail
to update all offsets will be fine because there's nothing to update.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-27 Thread Ingo Molnar

* Matt Fleming  wrote:

> > So could we make the whole code obviously bottom-up? Such as first 
> > calculating 
> > the size of virtual memory needed, then allocating a _single_, obviously 
> > continuous mapping, and then doing a very clear in-order mapping within 
> > that 
> > window? That would remove any bitness and legacy dependencies.
>  
> So, we could, and in fact the first version of this patch did just that. You 
> can 
> find it here,
> 
>   
> https://lkml.kernel.org/r/1441372447-23439-1-git-send-email-m...@codeblueprint.co.uk
> 
> But Ard suggested re-using the existing code and simply changing the order we 
> map the memmap entries in.

Such implementational arguments (which are an internal cost) never trump 
compatibility and robustness concerns (which are an external constraint).

> [...] And given the constraint for a small patch for backporting, I think 
> it's a 
> better solution. [...]

Ugh, backporting size is _even less_ of a valid argument when it comes to 
firmware 
support correctness!

We can (perhaps) use these already existing patches as a simpler backport, but 
there's absolutely no reason to keep that code as a solution:

> [...] The actual virtual addresses we pick are exactly the same with the two 
> patches.

So I'm NAK-ing this for now:

 - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
   allocator, because all the sections have already been determined by the
   firmware, and, as we just learned the hard way, we do not want to deviate 
from 
   that! There's nothing to 'allocate'!

   What these patches seem to implement is an elaborate 'allocator' that ends up
   doing nothing on 'new 64-bit' ...

 - The 32-bit and 64-bit and 'old_mmap' asymmetries:

if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

   seem fragile and nonsensical. The question is: is it possible for the whole 
EFI
   image to be larger than a couple of megabytes? If not then 32-bit should just
   mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
   differently from this _obvious_ 1:1 mapping of the EFI memory offsets then 
it's
   not worth keeping as a legacy, because there's just nothing better than
   mirroring the firmware layout.

My suggestion would be to just 1:1 map what the EFI tables describe, modulo the 
single absolute offset by which we shift the whole thing to a single base.

Is there any technical reason why we'd want to deviate from that? Gigabytes of 
tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an 
OS 
layout that differs from the firmware layout?

Also, nobody seems to be asking the obvious hardware compatibility question 
when 
trying to implement a standard influenced in great part by an entity that is 
partly ignorant of and partly hostile to Linux: how does Windows map the EFI 
sections, under what OSs are these firmware versions tested? I suspect no 
firmware 
is released that crashes on bootup on all OSs that can run on that hardware, 
right?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar  wrote:
> >
> > So this commit worries me.
> >
> > This bug is a good find, and the fix is obviously needed and urgent, but 
> > I'm not
> > sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> > gents.)
> >
> > * Matt Fleming  wrote:
> >> + /*
> >> +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> >> +  * config table feature requires us to map all entries
> >> +  * in the same order as they appear in the EFI memory
> >> +  * map. That is to say, entry N must have a lower
> >> +  * virtual address than entry N+1. This is because the
> >> +  * firmware toolchain leaves relative references in
> >> +  * the code/data sections, which are split and become
> >> +  * separate EFI memory regions. Mapping things
> >> +  * out-of-order leads to the firmware accessing
> >> +  * unmapped addresses.
> >> +  *
> 
> I'm clearly missing something.  What is EFI doing that it doesn't care how 
> big 
> the gap between sections is but it still requires them to be in order?  It's 
> not 
> as though x86_64 has an addressing mode that allows only non-negative offsets.

It appears the problem is that what we think to be 'different sections' are in 
reality smaller parts of the same section.

Any relative address calculation will be broken if we don't preserve the 
relative 
positions of these sections/sub-sections. Any CPU that supports addition is 
affected, it doesn't need any special addressing modes.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread H. Peter Anvin
Sadly a lot of firmware is known to fail in that configuration :(  That was 
very much our guest choice.

I don't actually think it is all that infeasible to keep relative offsets 
consistent for the regions we have to map. PMD_SIZE is not a very large chunk 
so it could be a problem.

On September 26, 2015 1:09:17 PM PDT, Ard Biesheuvel 
 wrote:
>
>> On 26 sep. 2015, at 12:57, Matt Fleming 
>wrote:
>> 
>>> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>>> 
>>> It is still a hack unless all relative offsets are preserved.  That
>>> is actually simpler, even: no sorting necessary.
>> 
>> Unless I'm missing something, preserving relative offsets is exactly
>> what we do today, modulo PMD_SIZE gaps.
>> 
>
>I think what Peter means is preserving the relative offsets inside the
>entire 1:1 space.
>
>This is not at all what we do currently, and i don't think it is
>generally feasible on 32-bit (since the physical range may conflict
>with the virtual kernel mappings)
>
>However, on 64 bit (both arm and x86), this boils down to not calling
>setVA() in the first place, which i'm all in favor of.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Ard Biesheuvel

> On 26 sep. 2015, at 12:57, Matt Fleming  wrote:
> 
>> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>> 
>> It is still a hack unless all relative offsets are preserved.  That
>> is actually simpler, even: no sorting necessary.
> 
> Unless I'm missing something, preserving relative offsets is exactly
> what we do today, modulo PMD_SIZE gaps.
> 

I think what Peter means is preserving the relative offsets inside the entire 
1:1 space.

This is not at all what we do currently, and i don't think it is generally 
feasible on 32-bit (since the physical range may conflict with the virtual 
kernel mappings)

However, on 64 bit (both arm and x86), this boils down to not calling setVA() 
in the first place, which i'm all in favor of.
-- 
Ard.

> -- 
> Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Matt Fleming
On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>
> It is still a hack unless all relative offsets are preserved.  That
> is actually simpler, even: no sorting necessary.

Unless I'm missing something, preserving relative offsets is exactly
what we do today, modulo PMD_SIZE gaps.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Matt Fleming
On Sat, 26 Sep, at 10:20:22AM, H. Peter Anvin wrote:
>
> I think it "works" because the affected BIOSes don't put spaces
> between the chunks.  I have discussed this with Matt.

Right, that's very true. Though note that the current mapping
algorithm will handle a gap <= PMD_SIZE, it's just anything larger
than PMD_SIZE we "squash" to the next multiple of PMD_SIZE.

It's unclear whether the firmware toolchains would even support
references between sections with a PMD_SIZE gap between them, and I
think the firmware engineers would have to go out of their way to
actually insert such a gap.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread H. Peter Anvin
It is still a hack unless all relative offsets are preserved.  That is actually 
simpler, even: no sorting necessary.

On September 26, 2015 11:15:57 AM PDT, Ard Biesheuvel 
 wrote:
>On 26 September 2015 at 10:20, H. Peter Anvin  wrote:
>> I think it "works" because the affected BIOSes don't put spaces
>between the chunks.  I have discussed this with Matt.
>>
>
>Forgive the ASCII art but perhaps an illustration might help:
>
>before the 2.5 feature, PE/COFF runtime images were remapped as
>illustrated here:
>
>PAVA
>+---+ +---+
>|   | |   |
>| PE/COFF .text | |EFI|
>|   | |Runtime|
>+- - - - - - - -+=>   |Services   |+
>|   | |Code   ||:   :
>| PE/COFF .data | |   ||:   :
>|   | |   ||+---+
>+---+ +---+||   |
>|   | |   |||EFI|
>:   : :   :||Runtime|
>:   : :   :+--->|Services   |
>|   | |   | |Code   |
>+---+ +---+ |   |
>|   | |   | |   |
>| PE/COFF .text | |EFI| +---+
>|   | |Runtime| :  gap  :
>+- - - - - - - -+=>   |Services   |---+ +---+
>|   | |Code   |   | |   |
>| PE/COFF .data | |   |   | |EFI|
>|   | |   |   | |Runtime|
>+---+ +---+   +>|Services   |
>|   | |   | |Code   |
>:   : :   : |   |
>:   : :   : |   |
>:   : :   : +---+
>:   : :   : :   :
>
>Since the affected symbol references only exist between PE/COFF .text
>and PE/COFF .data, there is never a problem since each is PE/COFF
>image is mapped as a single region.
>However, with the new feature enabled, this no longer holds:
>PAVA
>+---+ +---+
>|   | |   |
>| PE/COFF .text | |RtServices |+
>|   | |Code   ||
>+- - - - - - - -+=>   +---+|+---+
>|   | |RtServices |+--->|RtServices |
>| PE/COFF .data | |Data   | |Code   |
>|   | |   |++---+
>+---+ +---+|: gap   :
>|   | |   ||+---+
>:   : :   :+--->|RtServices |
>:   : :   : |Data   |
>|   | |   | +---+
>+---+ +---+ : gap   :
>|   | |   | +---+
>| PE/COFF .text | |RtServices |>|RtServices |
>|   | |Code   | |Code   |
>+- - - - - - - -+=>   +---+ +---+
>|   | |RtServices | : gap   :
>| PE/COFF .data | |Data   |---+ +---+
>|   | |   |   | |RtServices |
>+---+ +---+   +>|Data   |
>|   | |   | |   |
>:   : :   : +---+
>:   : :   : :   :
>:   : :   : :   :
>
>The illustration uses gaps, but obviously, this applies equally to
>inverting the mapping order, since the PE/COFF .text and .data
>sections will end up out of order.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Ard Biesheuvel
On 26 September 2015 at 10:20, H. Peter Anvin  wrote:
> I think it "works" because the affected BIOSes don't put spaces between the 
> chunks.  I have discussed this with Matt.
>

Forgive the ASCII art but perhaps an illustration might help:

before the 2.5 feature, PE/COFF runtime images were remapped as
illustrated here:

PAVA
+---+ +---+
|   | |   |
| PE/COFF .text | |EFI|
|   | |Runtime|
+- - - - - - - -+=>   |Services   |+
|   | |Code   ||:   :
| PE/COFF .data | |   ||:   :
|   | |   ||+---+
+---+ +---+||   |
|   | |   |||EFI|
:   : :   :||Runtime|
:   : :   :+--->|Services   |
|   | |   | |Code   |
+---+ +---+ |   |
|   | |   | |   |
| PE/COFF .text | |EFI| +---+
|   | |Runtime| :  gap  :
+- - - - - - - -+=>   |Services   |---+ +---+
|   | |Code   |   | |   |
| PE/COFF .data | |   |   | |EFI|
|   | |   |   | |Runtime|
+---+ +---+   +>|Services   |
|   | |   | |Code   |
:   : :   : |   |
:   : :   : |   |
:   : :   : +---+
:   : :   : :   :

Since the affected symbol references only exist between PE/COFF .text
and PE/COFF .data, there is never a problem since each is PE/COFF
image is mapped as a single region.
However, with the new feature enabled, this no longer holds:
PAVA
+---+ +---+
|   | |   |
| PE/COFF .text | |RtServices |+
|   | |Code   ||
+- - - - - - - -+=>   +---+|+---+
|   | |RtServices |+--->|RtServices |
| PE/COFF .data | |Data   | |Code   |
|   | |   |++---+
+---+ +---+|: gap   :
|   | |   ||+---+
:   : :   :+--->|RtServices |
:   : :   : |Data   |
|   | |   | +---+
+---+ +---+ : gap   :
|   | |   | +---+
| PE/COFF .text | |RtServices |>|RtServices |
|   | |Code   | |Code   |
+- - - - - - - -+=>   +---+ +---+
|   | |RtServices | : gap   :
| PE/COFF .data | |Data   |---+ +---+
|   | |   |   | |RtServices |
+---+ +---+   +>|Data   |
|   | |   | |   |
:   : :   : +---+
:   : :   : :   :
:   : :   : :   :

The illustration uses gaps, but obviously, this applies equally to
inverting the mapping order, since the PE/COFF .text and .data
sections will end up out of order.

-- 
Ard.


> On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski  
> wrote:
>>On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar  wrote:
>>>
>>> So this commit worries me.
>>>
>>> This bug is a good find, and the fix is obviously needed and urgent,
>>but I'm not
>>> sure about the implementation at all. (I've Cc:-ed a few more x86 low
>>level
>>> gents.)
>>>
>>> * Matt Fleming  wrote:
 + /*
 +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
 +  * config table feature requires us to map all entries
 +  * in the same order as they appear in the EFI memory
 +  * map. That is 

Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread H. Peter Anvin
I think it "works" because the affected BIOSes don't put spaces between the 
chunks.  I have discussed this with Matt.

On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski  
wrote:
>On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar  wrote:
>>
>> So this commit worries me.
>>
>> This bug is a good find, and the fix is obviously needed and urgent,
>but I'm not
>> sure about the implementation at all. (I've Cc:-ed a few more x86 low
>level
>> gents.)
>>
>> * Matt Fleming  wrote:
>>> + /*
>>> +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>>> +  * config table feature requires us to map all entries
>>> +  * in the same order as they appear in the EFI memory
>>> +  * map. That is to say, entry N must have a lower
>>> +  * virtual address than entry N+1. This is because the
>>> +  * firmware toolchain leaves relative references in
>>> +  * the code/data sections, which are split and become
>>> +  * separate EFI memory regions. Mapping things
>>> +  * out-of-order leads to the firmware accessing
>>> +  * unmapped addresses.
>>> +  *
>
>I'm clearly missing something.  What is EFI doing that it doesn't care
>how big the gap between sections is but it still requires them to be
>in order?  It's not as though x86_64 has an addressing mode that
>allows only non-negative offsets.
>
>--Andy

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Andy Lutomirski
On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar  wrote:
>
> So this commit worries me.
>
> This bug is a good find, and the fix is obviously needed and urgent, but I'm 
> not
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> gents.)
>
> * Matt Fleming  wrote:
>> + /*
>> +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>> +  * config table feature requires us to map all entries
>> +  * in the same order as they appear in the EFI memory
>> +  * map. That is to say, entry N must have a lower
>> +  * virtual address than entry N+1. This is because the
>> +  * firmware toolchain leaves relative references in
>> +  * the code/data sections, which are split and become
>> +  * separate EFI memory regions. Mapping things
>> +  * out-of-order leads to the firmware accessing
>> +  * unmapped addresses.
>> +  *

I'm clearly missing something.  What is EFI doing that it doesn't care
how big the gap between sections is but it still requires them to be
in order?  It's not as though x86_64 has an addressing mode that
allows only non-negative offsets.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-26 Thread Matt Fleming
On Sat, 26 Sep, at 07:56:43AM, Ingo Molnar wrote:
> 
> So this commit worries me.
> 
> This bug is a good find, and the fix is obviously needed and urgent, but I'm 
> not 
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level 
> gents.)
 
Thanks, the more the merrier.

> * Matt Fleming  wrote:
> 
> >  /*
> > + * Iterate the EFI memory map in reverse order because the regions
> > + * will be mapped top-down. The end result is the same as if we had
> > + * mapped things forward, but doesn't require us to change the
> > + * existing implementation of efi_map_region().
> > + */
> > +static inline void *efi_map_next_entry_reverse(void *entry)
> > +{
> > +   /* Initial call */
> > +   if (!entry)
> > +   return memmap.map_end - memmap.desc_size;
> > +
> > +   entry -= memmap.desc_size;
> > +   if (entry < memmap.map)
> > +   return NULL;
> > +
> > +   return entry;
> > +}
> > +
> > +/*
> > + * efi_map_next_entry - Return the next EFI memory map descriptor
> > + * @entry: Previous EFI memory map descriptor
> > + *
> > + * This is a helper function to iterate over the EFI memory map, which
> > + * we do in different orders depending on the current configuration.
> > + *
> > + * To begin traversing the memory map @entry must be %NULL.
> > + *
> > + * Returns %NULL when we reach the end of the memory map.
> > + */
> > +static void *efi_map_next_entry(void *entry)
> > +{
> > +   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> > +   /*
> > +* Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> > +* config table feature requires us to map all entries
> > +* in the same order as they appear in the EFI memory
> > +* map. That is to say, entry N must have a lower
> > +* virtual address than entry N+1. This is because the
> > +* firmware toolchain leaves relative references in
> > +* the code/data sections, which are split and become
> > +* separate EFI memory regions. Mapping things
> > +* out-of-order leads to the firmware accessing
> > +* unmapped addresses.
> > +*
> > +* Since we need to map things this way whether or not
> > +* the kernel actually makes use of
> > +* EFI_PROPERTIES_TABLE, let's just switch to this
> > +* scheme by default for 64-bit.
> 
> The thing is, if relative accesses between these 'sections' do happen then 
> the 
> requirement is much stronger than just 'ordered by addresses' - then we must 
> map 
> them continuously and as a single block!
> 
> So at minimum the comment should say that. But I think we want more:
 
Well, the firmware doesn't place arbitrary gaps between these runtime
image sections in memory, they still get loaded into memory by the
firmware the old-fashioned way (as one contiguous blob). It's just
that we've now got two memory map entries for the single PE/COFF
image. Also, it's only sections from the same PE/COFF image that
reference each other (but like Ard said, we can't tell from the memmap
which entries are for a single PE/COFF image).

Because EFI memory map entries that are part of the same PE/COFF image
will be at consecutive addresses, we'll also map them contiguously in
the kernel virtual address space, because that's how the current code
works - it's just that the current code maps them backwards.

Yeah, I'm completely in favour of improving any of the comments. 
 
> > +*/
> > +   return efi_map_next_entry_reverse(entry);
> > +   }
> > +
> > +   /* Initial call */
> > +   if (!entry)
> > +   return memmap.map;
> > +
> > +   entry += memmap.desc_size;
> > +   if (entry >= memmap.map_end)
> > +   return NULL;
> > +
> > +   return entry;
> > +}
> > +
> > +/*
> >   * Map the efi memory ranges of the runtime services and update new_mmap 
> > with
> >   * virtual addresses.
> >   */
> > @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int 
> > *pg_shift)
> > unsigned long left = 0;
> > efi_memory_desc_t *md;
> >  
> > -   for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> > +   p = NULL;
> > +   while ((p = efi_map_next_entry(p))) {
> > md = p;
> > if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> 
> So why is this 64-bit only? Is 32-bit not affected because there we allocate 
> virtual addresses bottom-up?
 
32-bit would potentially be effected if 32-bit firmware ever enabled
the EFI_PROPERTIES_TABLE feature. Whether or not it worked would
depend on the fragemntation of the virtual memory space when we map
the EFI regions.

But I'm not aware of any 32-bit firmware shipping with Properties
Table support, and fixing it in the kernel like we do for x86-64 would
require changing the EFI region mapping code. That's something I
wanted to avoid for this patch since it needs to be applied to stable,
and especially when we haven't seen the fea

Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-25 Thread Ard Biesheuvel
On 25 September 2015 at 22:56, Ingo Molnar  wrote:
>
> So this commit worries me.
>
> This bug is a good find, and the fix is obviously needed and urgent, but I'm 
> not
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> gents.)
>
> * Matt Fleming  wrote:
>
>>  /*
>> + * Iterate the EFI memory map in reverse order because the regions
>> + * will be mapped top-down. The end result is the same as if we had
>> + * mapped things forward, but doesn't require us to change the
>> + * existing implementation of efi_map_region().
>> + */
>> +static inline void *efi_map_next_entry_reverse(void *entry)
>> +{
>> + /* Initial call */
>> + if (!entry)
>> + return memmap.map_end - memmap.desc_size;
>> +
>> + entry -= memmap.desc_size;
>> + if (entry < memmap.map)
>> + return NULL;
>> +
>> + return entry;
>> +}
>> +
>> +/*
>> + * efi_map_next_entry - Return the next EFI memory map descriptor
>> + * @entry: Previous EFI memory map descriptor
>> + *
>> + * This is a helper function to iterate over the EFI memory map, which
>> + * we do in different orders depending on the current configuration.
>> + *
>> + * To begin traversing the memory map @entry must be %NULL.
>> + *
>> + * Returns %NULL when we reach the end of the memory map.
>> + */
>> +static void *efi_map_next_entry(void *entry)
>> +{
>> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> + /*
>> +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>> +  * config table feature requires us to map all entries
>> +  * in the same order as they appear in the EFI memory
>> +  * map. That is to say, entry N must have a lower
>> +  * virtual address than entry N+1. This is because the
>> +  * firmware toolchain leaves relative references in
>> +  * the code/data sections, which are split and become
>> +  * separate EFI memory regions. Mapping things
>> +  * out-of-order leads to the firmware accessing
>> +  * unmapped addresses.
>> +  *
>> +  * Since we need to map things this way whether or not
>> +  * the kernel actually makes use of
>> +  * EFI_PROPERTIES_TABLE, let's just switch to this
>> +  * scheme by default for 64-bit.
>
> The thing is, if relative accesses between these 'sections' do happen then the
> requirement is much stronger than just 'ordered by addresses' - then we must 
> map
> them continuously and as a single block!
>

The primary difference between pre-2.5 and 2.5 with this feature
enabled is that formerly, each PE/COFF image in memory would be
covered by at most a single EfiRuntimeServicesCode region, and now, a
single PE/COFF image may be split into different regions. It is only
relative references *inside* such a PE/COFF image that we are
concerned about, since no symbol references exist between separate
ones.

Also, it is not only relative references inside the PE/COFF image that
cause the problem. Another aspect is that the offset that is applied
to all absolute references at relocation time is derived from the
offset of the base of the PE/COFF image. If part of the PE/COFF image
(the .data section) is moved relatively to the code section, these
absolute references are fixed up incorrectly. This is actually a
problem that we could solve at the firmware side, but since PE/COFF
does not really tolerate being split up like that, the correct fix is
to keep all regions belonging to a single PE/COFF image adjacent.
Since we can't tell which those regions are, the next best approach is
to keep all adjacent regions with the EFI_MEMORY_RUNTIME attribute
adjacent in the VA mapping.

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-25 Thread Ingo Molnar

So this commit worries me.

This bug is a good find, and the fix is obviously needed and urgent, but I'm 
not 
sure about the implementation at all. (I've Cc:-ed a few more x86 low level 
gents.)

* Matt Fleming  wrote:

>  /*
> + * Iterate the EFI memory map in reverse order because the regions
> + * will be mapped top-down. The end result is the same as if we had
> + * mapped things forward, but doesn't require us to change the
> + * existing implementation of efi_map_region().
> + */
> +static inline void *efi_map_next_entry_reverse(void *entry)
> +{
> + /* Initial call */
> + if (!entry)
> + return memmap.map_end - memmap.desc_size;
> +
> + entry -= memmap.desc_size;
> + if (entry < memmap.map)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
> + * efi_map_next_entry - Return the next EFI memory map descriptor
> + * @entry: Previous EFI memory map descriptor
> + *
> + * This is a helper function to iterate over the EFI memory map, which
> + * we do in different orders depending on the current configuration.
> + *
> + * To begin traversing the memory map @entry must be %NULL.
> + *
> + * Returns %NULL when we reach the end of the memory map.
> + */
> +static void *efi_map_next_entry(void *entry)
> +{
> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> + /*
> +  * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> +  * config table feature requires us to map all entries
> +  * in the same order as they appear in the EFI memory
> +  * map. That is to say, entry N must have a lower
> +  * virtual address than entry N+1. This is because the
> +  * firmware toolchain leaves relative references in
> +  * the code/data sections, which are split and become
> +  * separate EFI memory regions. Mapping things
> +  * out-of-order leads to the firmware accessing
> +  * unmapped addresses.
> +  *
> +  * Since we need to map things this way whether or not
> +  * the kernel actually makes use of
> +  * EFI_PROPERTIES_TABLE, let's just switch to this
> +  * scheme by default for 64-bit.

The thing is, if relative accesses between these 'sections' do happen then the 
requirement is much stronger than just 'ordered by addresses' - then we must 
map 
them continuously and as a single block!

So at minimum the comment should say that. But I think we want more:


> +  */
> + return efi_map_next_entry_reverse(entry);
> + }
> +
> + /* Initial call */
> + if (!entry)
> + return memmap.map;
> +
> + entry += memmap.desc_size;
> + if (entry >= memmap.map_end)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
>   * Map the efi memory ranges of the runtime services and update new_mmap with
>   * virtual addresses.
>   */
> @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int 
> *pg_shift)
>   unsigned long left = 0;
>   efi_memory_desc_t *md;
>  
> - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + p = NULL;
> + while ((p = efi_map_next_entry(p))) {
>   md = p;
>   if (!(md->attribute & EFI_MEMORY_RUNTIME)) {

So why is this 64-bit only? Is 32-bit not affected because there we allocate 
virtual addresses bottom-up?

This would be a lot clearer if we just mapped the entries in order, no 
questions 
asked. Conditions like this:

> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

... just invite confusion and possible corner cases where we end up mapping 
them 
wrong.

So could we make the whole code obviously bottom-up? Such as first calculating 
the 
size of virtual memory needed, then allocating a _single_, obviously continuous 
mapping, and then doing a very clear in-order mapping within that window? That 
would remove any bitness and legacy dependencies.

Good virtual memory layout is so critical for any third party code that this 
should be the central property of the whole approach, not just some side 
condition 
somewhere in an iteration loop.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

2015-09-25 Thread Matt Fleming
From: Matt Fleming 

Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
signals that the firmware PE/COFF loader supports splitting code and
data sections of PE/COFF images into separate EFI memory map entries.
This allows the kernel to map those regions with strict memory
protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.

Unfortunately, an unwritten requirement of this new feature is that
the regions need to be mapped with the same offsets relative to each
other as observed in the EFI memory map. If this is not done crashes
like this may occur,

 [0.006391] BUG: unable to handle kernel paging request at fffefe6086dd
 [0.006923] IP: [] 0xfffefe6086dd
 [0.007000] Call Trace:
 [0.007000]  [] efi_call+0x7e/0x100
 [0.007000]  [] ? virt_efi_set_variable+0x61/0x90
 [0.007000]  [] efi_delete_dummy_variable+0x63/0x70
 [0.007000]  [] efi_enter_virtual_mode+0x383/0x392
 [0.007000]  [] start_kernel+0x38a/0x417
 [0.007000]  [] x86_64_start_reservations+0x2a/0x2c
 [0.007000]  [] x86_64_start_kernel+0xeb/0xef

Here 0xfffefe6086dd refers to an address the firmware expects to
be mapped but which the OS never claimed was mapped. The issue is that
included in these regions are relative addresses to other regions
which were emitted by the firmware toolchain before the "splitting" of
sections occurred at runtime.

Needless to say, we don't satisfy this unwritten requirement on x86_64
and instead map the EFI memory map entries in reverse order. The above
crash is almost certainly triggerable with any kernel newer than v3.13
because that's when we rewrote the EFI runtime region mapping code, in
commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
kernel versions before v3.13 things may work by pure luck depending on
the fragmentation of the kernel virtual address space at the time we
map the EFI regions.

Instead of mapping the EFI memory map entries in reverse order, where
entry N has a higher virtual address than entry N+1, map them in the
same order as they appear in the EFI memory map to preserve this
relative offset between regions.

This patch has been kept as small as possible with the intention that
it should be applied aggressively to stable and distribution kernels.
It is very much a bugfix rather than support for a new feature, since
when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
above to even boot - we have no way of asking the firmware not to
split the code/data regions.

In fact, this patch doesn't even make use of the more strict memory
protections available in UEFI v2.5. That will come later.

Reported-by: Ard Biesheuvel 
Suggested-by: Ard Biesheuvel 
Cc: "Lee, Chun-Yi" 
Cc: Borislav Petkov 
Cc: Leif Lindholm 
Cc: Peter Jones 
Cc: James Bottomley 
Cc: Matthew Garrett 
Cc: H. Peter Anvin 
Cc: Dave Young 
Cc: 
Signed-off-by: Matt Fleming 
---
 arch/x86/platform/efi/efi.c | 67 -
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 1db84c0758b7..6a28ded74211 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -705,6 +705,70 @@ out:
 }
 
 /*
+ * Iterate the EFI memory map in reverse order because the regions
+ * will be mapped top-down. The end result is the same as if we had
+ * mapped things forward, but doesn't require us to change the
+ * existing implementation of efi_map_region().
+ */
+static inline void *efi_map_next_entry_reverse(void *entry)
+{
+   /* Initial call */
+   if (!entry)
+   return memmap.map_end - memmap.desc_size;
+
+   entry -= memmap.desc_size;
+   if (entry < memmap.map)
+   return NULL;
+
+   return entry;
+}
+
+/*
+ * efi_map_next_entry - Return the next EFI memory map descriptor
+ * @entry: Previous EFI memory map descriptor
+ *
+ * This is a helper function to iterate over the EFI memory map, which
+ * we do in different orders depending on the current configuration.
+ *
+ * To begin traversing the memory map @entry must be %NULL.
+ *
+ * Returns %NULL when we reach the end of the memory map.
+ */
+static void *efi_map_next_entry(void *entry)
+{
+   if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
+   /*
+* Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
+* config table feature requires us to map all entries
+* in the same order as they appear in the EFI memory
+* map. That is to say, entry N must have a lower
+* virtual address than entry N+1. This is because the
+* firmware toolchain leaves relative references in
+* the code/data sections, which are split and become
+* separate EFI memory regions. Mapping things
+* out-of-order leads to the firmware accessing
+* unmapped addresses.
+