Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Wed, 2015-09-30 at 09:43 -0700, Andy Lutomirski wrote: > On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel > wrote: > > On 29 September 2015 at 23:58, Laszlo Ersek wrote: > >> On 09/28/15 08:41, Matthew Garrett wrote: > >>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote: > >>> > So the question is, what does Windows do? > >>> > >>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() > >>> arguments to the qemu debug port. Unfortunately I'm about to drop > >>> mostly offline for a week, otherwise I'd give it a go... > > [...] > >> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10 > >> guests, with the properties table feature enabled vs. disabled in the > >> firmware. (All three Windows guests were updated first though.) > >> > >> All three Windows OSes adapt their SetVirtualAddressMap() calls, when > >> the feature is enabled in the firmware. However, Windows 8.1 crashes > >> nonetheless (BSOD, I forget the fault details, sorry). Windows Server > >> 2012 R2 and Windows 10 boot fine. > >> > > > > Looking at the log, it seems the VA mapping strategy is actually the > > same (i.e., bottom-up for Win10), and the difference can be explained > > by the differences in the memory map provided by the firmware to the > > OS. And indeed, the Win8.1 log shows the following: > > > > # MemType Phys 0x Virt 0x Size 0x Attributes > > -- --- --- --- > > 0 RtData 7EC21000 FFBFA000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > > 1 RtCode 7EC27000 FFBF3000 0007000 [UC|WC|WT|WB| | |RO| | |RT] > > 2 RtData 7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB| |XP| | | |RT] > > 3 RtData 7EC35000 FFBEB000 0001000 [UC|WC|WT|WB| |XP| | | |RT] > > 4 RtCode 7EC36000 FFBE6000 0005000 [UC|WC|WT|WB| | |RO| | |RT] > > 5 RtData 7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB| |XP| | | |RT] > > 6 RtData 7EC6 FFBDE000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > > 7 RtCode 7EC66000 FFBD5000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > > 8 RtData 7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB| |XP| | | |RT] > > 9 RtData 7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB| |XP| | | |RT] > > 10 RtCode 7ED77000 FFA63000 0097000 [UC|WC|WT|WB| | |RO| | |RT] > > 11 RtData 7EE0E000 FFA58000 000B000 [UC|WC|WT|WB| |XP| | | |RT] > > 12 RtData 7FE99000 FFA52000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > > 13 RtCode 7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB| | |RO| | |RT] > > 14 RtData 7FEA5000 FFA49000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > > 15 RtCode 7FEA8000 FFA42000 0007000 [UC|WC|WT|WB| | |RO| | |RT] > > 16 RtData 7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > > 17 RtCode 7FEB2000 FFA36000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > > 18 RtData 7FEBB000 FFA33000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > > 19 RtCode 7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > > 20 RtData 7FEC7000 FFA04000 0026000 [UC|WC|WT|WB| |XP| | | |RT] > > 21 RtData 7FFD FF9E4000 002 [UC|WC|WT|WB| |XP| | | |RT] > > 22 RtData FFE0 FF7E4000 020 [UC| | | | |XP| | | |RT] > > > > I.e., the physical addresses increase while the virtual addresses > > decrease, and since each consecutive RuntimeCode/RuntimeData pair > > constitutes a PE/COFF image (.text and .data, respectively), the > > PE/COFF images appear corrupted in the virtual space. > > All of this garbage makes me want to ask a rhetorical question: > > Why on Earth did anyone think it's a good idea to invoke EFI functions > at CPL0 once the OS is booted? I'm afraid the originators of EFI (Intel) look on it as a DOS replacement ... with the same OS support. > And a more practical question: > > Do we actually have to invoke EFI functions at CPL0? > > I really mean it. Sure, for things like reboot where we give up > control and don't get it back, we need to do that. But for things > like variable access, the EFI code should really only need access to > EFI memor (with a known PA -> VA map) and the ability to trigger an > SMI. Doing it at CPL3 could require more fixups than would really > make sense, but could we virtualize it instead? > > Actually, CPL3 + IOPL3 just might work. > > Heck, on mixed-mode, we're already invoke EFI functions in compat > mode, and that seems okay, so those functions can't be poking at any > CPU state that varies between long and 32-bit modes. It's hard. The EFI functions expect to interact directly with kernel memory, which they can't at CPL3. We could vector all that through a CPL3 readable buffer but anything within EFI that uses privileged instructions will fault and we'll have to handle it ... this really sounds like a can of worms. Especially as windows will be no help testing all of this because it will call in at CPL0. James N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel wrote: > On 29 September 2015 at 23:58, Laszlo Ersek wrote: >> On 09/28/15 08:41, Matthew Garrett wrote: >>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote: >>> So the question is, what does Windows do? >>> >>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() >>> arguments to the qemu debug port. Unfortunately I'm about to drop >>> mostly offline for a week, otherwise I'd give it a go... > [...] >> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10 >> guests, with the properties table feature enabled vs. disabled in the >> firmware. (All three Windows guests were updated first though.) >> >> All three Windows OSes adapt their SetVirtualAddressMap() calls, when >> the feature is enabled in the firmware. However, Windows 8.1 crashes >> nonetheless (BSOD, I forget the fault details, sorry). Windows Server >> 2012 R2 and Windows 10 boot fine. >> > > Looking at the log, it seems the VA mapping strategy is actually the > same (i.e., bottom-up for Win10), and the difference can be explained > by the differences in the memory map provided by the firmware to the > OS. And indeed, the Win8.1 log shows the following: > > # MemType Phys 0x Virt 0x Size 0x Attributes > -- --- --- --- > 0 RtData 7EC21000 FFBFA000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > 1 RtCode 7EC27000 FFBF3000 0007000 [UC|WC|WT|WB| | |RO| | |RT] > 2 RtData 7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB| |XP| | | |RT] > 3 RtData 7EC35000 FFBEB000 0001000 [UC|WC|WT|WB| |XP| | | |RT] > 4 RtCode 7EC36000 FFBE6000 0005000 [UC|WC|WT|WB| | |RO| | |RT] > 5 RtData 7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB| |XP| | | |RT] > 6 RtData 7EC6 FFBDE000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > 7 RtCode 7EC66000 FFBD5000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > 8 RtData 7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB| |XP| | | |RT] > 9 RtData 7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB| |XP| | | |RT] > 10 RtCode 7ED77000 FFA63000 0097000 [UC|WC|WT|WB| | |RO| | |RT] > 11 RtData 7EE0E000 FFA58000 000B000 [UC|WC|WT|WB| |XP| | | |RT] > 12 RtData 7FE99000 FFA52000 0006000 [UC|WC|WT|WB| |XP| | | |RT] > 13 RtCode 7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB| | |RO| | |RT] > 14 RtData 7FEA5000 FFA49000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > 15 RtCode 7FEA8000 FFA42000 0007000 [UC|WC|WT|WB| | |RO| | |RT] > 16 RtData 7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > 17 RtCode 7FEB2000 FFA36000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > 18 RtData 7FEBB000 FFA33000 0003000 [UC|WC|WT|WB| |XP| | | |RT] > 19 RtCode 7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB| | |RO| | |RT] > 20 RtData 7FEC7000 FFA04000 0026000 [UC|WC|WT|WB| |XP| | | |RT] > 21 RtData 7FFD FF9E4000 002 [UC|WC|WT|WB| |XP| | | |RT] > 22 RtData FFE0 FF7E4000 020 [UC| | | | |XP| | | |RT] > > I.e., the physical addresses increase while the virtual addresses > decrease, and since each consecutive RuntimeCode/RuntimeData pair > constitutes a PE/COFF image (.text and .data, respectively), the > PE/COFF images appear corrupted in the virtual space. All of this garbage makes me want to ask a rhetorical question: Why on Earth did anyone think it's a good idea to invoke EFI functions at CPL0 once the OS is booted? And a more practical question: Do we actually have to invoke EFI functions at CPL0? I really mean it. Sure, for things like reboot where we give up control and don't get it back, we need to do that. But for things like variable access, the EFI code should really only need access to EFI memor (with a known PA -> VA map) and the ability to trigger an SMI. Doing it at CPL3 could require more fixups than would really make sense, but could we virtualize it instead? Actually, CPL3 + IOPL3 just might work. Heck, on mixed-mode, we're already invoke EFI functions in compat mode, and that seems okay, so those functions can't be poking at any CPU state that varies between long and 32-bit modes. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 29 September 2015 at 23:58, Laszlo Ersek wrote: > On 09/28/15 08:41, Matthew Garrett wrote: >> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote: >> >>> So the question is, what does Windows do? >> >> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() >> arguments to the qemu debug port. Unfortunately I'm about to drop >> mostly offline for a week, otherwise I'd give it a go... [...] > Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10 > guests, with the properties table feature enabled vs. disabled in the > firmware. (All three Windows guests were updated first though.) > > All three Windows OSes adapt their SetVirtualAddressMap() calls, when > the feature is enabled in the firmware. However, Windows 8.1 crashes > nonetheless (BSOD, I forget the fault details, sorry). Windows Server > 2012 R2 and Windows 10 boot fine. > Looking at the log, it seems the VA mapping strategy is actually the same (i.e., bottom-up for Win10), and the difference can be explained by the differences in the memory map provided by the firmware to the OS. And indeed, the Win8.1 log shows the following: # MemType Phys 0x Virt 0x Size 0x Attributes -- --- --- --- 0 RtData 7EC21000 FFBFA000 0006000 [UC|WC|WT|WB| |XP| | | |RT] 1 RtCode 7EC27000 FFBF3000 0007000 [UC|WC|WT|WB| | |RO| | |RT] 2 RtData 7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB| |XP| | | |RT] 3 RtData 7EC35000 FFBEB000 0001000 [UC|WC|WT|WB| |XP| | | |RT] 4 RtCode 7EC36000 FFBE6000 0005000 [UC|WC|WT|WB| | |RO| | |RT] 5 RtData 7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB| |XP| | | |RT] 6 RtData 7EC6 FFBDE000 0006000 [UC|WC|WT|WB| |XP| | | |RT] 7 RtCode 7EC66000 FFBD5000 0009000 [UC|WC|WT|WB| | |RO| | |RT] 8 RtData 7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB| |XP| | | |RT] 9 RtData 7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB| |XP| | | |RT] 10 RtCode 7ED77000 FFA63000 0097000 [UC|WC|WT|WB| | |RO| | |RT] 11 RtData 7EE0E000 FFA58000 000B000 [UC|WC|WT|WB| |XP| | | |RT] 12 RtData 7FE99000 FFA52000 0006000 [UC|WC|WT|WB| |XP| | | |RT] 13 RtCode 7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB| | |RO| | |RT] 14 RtData 7FEA5000 FFA49000 0003000 [UC|WC|WT|WB| |XP| | | |RT] 15 RtCode 7FEA8000 FFA42000 0007000 [UC|WC|WT|WB| | |RO| | |RT] 16 RtData 7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB| |XP| | | |RT] 17 RtCode 7FEB2000 FFA36000 0009000 [UC|WC|WT|WB| | |RO| | |RT] 18 RtData 7FEBB000 FFA33000 0003000 [UC|WC|WT|WB| |XP| | | |RT] 19 RtCode 7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB| | |RO| | |RT] 20 RtData 7FEC7000 FFA04000 0026000 [UC|WC|WT|WB| |XP| | | |RT] 21 RtData 7FFD FF9E4000 002 [UC|WC|WT|WB| |XP| | | |RT] 22 RtData FFE0 FF7E4000 020 [UC| | | | |XP| | | |RT] I.e., the physical addresses increase while the virtual addresses decrease, and since each consecutive RuntimeCode/RuntimeData pair constitutes a PE/COFF image (.text and .data, respectively), the PE/COFF images appear corrupted in the virtual space. > I uploaded the verbose OVMF log files from all six guest boots to [5]. > The tables you might be interested in are dumped at the ends of the log > files. > > All three guests had 2GB of RAM. They had different VM configurations, > but between disabling and enabling the properties table feature, no > other knob was tweaked. Therefore the two log files of the same guest > should be comparable against each other, for each guest. For example: > > $ colordiff -u ovmf.win10.prop.{disabled,enabled}.log > > Because stuff hosted on the web privately tends to go away, I'll quote > that diff here, for posterity: > >> --- ovmf.win10.prop.disabled.log 2015-09-29 22:01:45.252126086 +0200 >> +++ ovmf.win10.prop.enabled.log 2015-09-29 21:50:54.579475078 +0200 [...] Thanks a lot for taking the time, this is very useful info. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 09/27/2015 11:16 PM, Ingo Molnar wrote: > > So the question is, what does Windows do? > > PC firmware is a hostile environment for Linux, to be compatible the best we > can > do is to mimic the environment that the firmware is tested under - i.e. try > to use > the firmware in the way Windows uses it. > Windows apparently went through the same exercise of breakage followed by a fix. It is unknown if Windows will preserve gaps since those are not manifest on existing firmware. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 09/28/15 08:41, Matthew Garrett wrote: > On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote: > >> So the question is, what does Windows do? > > It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() > arguments to the qemu debug port. Unfortunately I'm about to drop > mostly offline for a week, otherwise I'd give it a go... In order to enable the properties table feature in OVMF, two conditions have to be satisfied: (a) set gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable to TRUE (b) build DXE_RUNTIME_DRIVER modules with 4KB section alignment (a) used to be possible only statically (ie. at build time), but since edk2 commit ab081a50e5 [1] it can be done dynamically too, on the qemu command line. See that edk2 commit for details. This OVMF feature depends on QEMU commit 81b2b81062 [2]. Another patch from Gabriel is under review that enables such simple settings without host-side files [3]. (b) is satisfied by the edk2 patch that Ard posted today [4]. Because I know how much people like building OVMF, I uploaded a fresh OVMF binary (which includes a number of other patches from my personal tree, but those are irrelevent here), with a matching varstore template, to [5]. The relevant edk2 patches (ie. the one from Ard [4], and a debug patch like you mention above) can also be found under [5]. ( I recommend to start QEMU like this: # Create a new (empty) varstore for the virtual machine, from the # varstore template. cp OVMF_VARS.fd my-vars.fd # Start the VM, with direct kernel boot. qemu-system-x86_64 \ -m 2048 \ -M pc,accel=kvm \ \ -device qxl-vga \ \ -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on \ -drive if=pflash,format=raw,unit=1,file=my-vars.fd \ \ -debugcon file:debug.log \ -global isa-debugcon.iobase=0x402 \ \ -chardev stdio,signal=off,mux=on,id=char0 \ -mon chardev=char0,mode=readline,default \ -serial chardev:char0 \ \ -kernel vmlinuz... \ -initrd initramfs... \ -append "root=..." \ \ ... The guest will have 2GB of RAM, it will use KVM, the virtual video card will be QXL; the OVMF debug log will be written to "debug.log". The key combination [Control-A C] will switch between the QEMU monitor prompt and the serial port of the guest. The above should be suitable for rapid testing of kernels built on the host. ) Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10 guests, with the properties table feature enabled vs. disabled in the firmware. (All three Windows guests were updated first though.) All three Windows OSes adapt their SetVirtualAddressMap() calls, when the feature is enabled in the firmware. However, Windows 8.1 crashes nonetheless (BSOD, I forget the fault details, sorry). Windows Server 2012 R2 and Windows 10 boot fine. I uploaded the verbose OVMF log files from all six guest boots to [5]. The tables you might be interested in are dumped at the ends of the log files. All three guests had 2GB of RAM. They had different VM configurations, but between disabling and enabling the properties table feature, no other knob was tweaked. Therefore the two log files of the same guest should be comparable against each other, for each guest. For example: $ colordiff -u ovmf.win10.prop.{disabled,enabled}.log Because stuff hosted on the web privately tends to go away, I'll quote that diff here, for posterity: > --- ovmf.win10.prop.disabled.log 2015-09-29 22:01:45.252126086 +0200 > +++ ovmf.win10.prop.enabled.log 2015-09-29 21:50:54.579475078 +0200 > @@ -24,7 +24,7 @@ > QemuFwCfg interface is supported. > Platform PEIM Loaded > CMOS: > -00: 38 00 01 00 22 00 03 29 09 15 26 02 00 80 00 00 > +00: 48 00 50 00 21 00 03 29 09 15 26 02 00 80 00 00 > 10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00 > 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 30: FF FF 20 00 00 7F 00 20 30 00 00 00 00 12 00 00 > @@ -1110,6 +1110,17 @@ > InstallProtocolInterface: 4CF5B200-68B8-4CA5-9EEC-B23E3F50029A 7F271568 > [Variable]END_OF_DXE is signaled > Initialize variable error flag (FF) > +MemoryProtectionAttribute - 0x0001 > +Total Image Count - 0x8 > +Dump ImageRecord: > + Image[0]: 0x7EC26000 - 0xA000 > + Image[1]: 0x7EC35000 - 0x8000 > + Image[2]: 0x7EC65000 - 0xC000 > + Image[3]: 0x7ED76000 - 0x000A3000 > + Image[4]: 0x7FE9E000 - 0x9000 > + Image[5]: 0x7FEA7000 - 0xA000 > + Image[6]: 0x7FEB1000 - 0xC000 > + Image[7]: 0x7FEBD000 - 0xC000 > S3Ready! > SaveLockBox: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D Buffer=7FEEE000 > Length=0xA > SetLockBoxAttributes: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D > Attributes=0x1 > @@ -1822,18 +1833,29 @@ > InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7F0EBA00 > Loading driver at 0x0001000 EntryPoint=0x00010014790 bootmgfw.efi >
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Tue, 29 Sep, at 12:41:23PM, Ard Biesheuvel wrote: > > OK, fair enough. I agree that setting the flag for 32-bit would be > semantically correct. I will leave it to Matt to comment whether it is > reasonable in terms of changes to other parts of the code. It should be pretty minimal. Let me take a swing at the patch. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Tue, 29 Sep, at 11:12:30AM, Ingo Molnar wrote: > > * Ard Biesheuvel wrote: > > > > except that I don't think > > > the condition on 64-bit makes any sense: > > > > > > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > > > > > > I can see us being nervous wrt. backported patches, but is there any > > > strong reason > > > to not follow this up with a third (non-backported) patch that changes > > > this to: > > > > > > + if (!efi_enabled(EFI_OLD_MEMMAP)) { > > > > > > for v4.4? > > > > > > > The 32-bit side essentially implements the old memmap only, which is the > > the > > bottom-up version. So old memmap will be implied by 32-bit but not set in > > the > > EFI flags, resulting in the reverse enumeration being used with the > > bottom-up > > mapping logic. The net result of that is that we create the same problem > > for > > 32-bit that we are trying to solve for 64-bit, i.e., the regions will end > > up in > > reverse order in the VA mapping. > > > > To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP > > unconditionally on 32-bit x86. Or we could reshuffle variables and > > conditionals > > in various other way. > > Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects. Right, I think that's a very good suggestion, because like Ard mentioned, since EFI_OLD_MEMMAP is implied for 32-bit (there's no other way to map stuff currently), so it makes sense to force set the bit. > > [...] I am not convinced that the overall end result will be any better > > though. > > That's not true, we change an obscure, implicit dependency on 32-bit detail > to an > explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a > clear > improvement. Agreed. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 29 September 2015 at 11:12, Ingo Molnar wrote: > > * Ard Biesheuvel wrote: > >> > except that I don't think >> > the condition on 64-bit makes any sense: >> > >> > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { >> > >> > I can see us being nervous wrt. backported patches, but is there any >> > strong reason >> > to not follow this up with a third (non-backported) patch that changes >> > this to: >> > >> > + if (!efi_enabled(EFI_OLD_MEMMAP)) { >> > >> > for v4.4? >> > >> >> The 32-bit side essentially implements the old memmap only, which is the the >> bottom-up version. So old memmap will be implied by 32-bit but not set in the >> EFI flags, resulting in the reverse enumeration being used with the bottom-up >> mapping logic. The net result of that is that we create the same problem for >> 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up >> in >> reverse order in the VA mapping. >> >> To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP >> unconditionally on 32-bit x86. Or we could reshuffle variables and >> conditionals >> in various other way. > > Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects. > >> [...] I am not convinced that the overall end result will be any better >> though. > > That's not true, we change an obscure, implicit dependency on 32-bit detail > to an > explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a > clear > improvement. > OK, fair enough. I agree that setting the flag for 32-bit would be semantically correct. I will leave it to Matt to comment whether it is reasonable in terms of changes to other parts of the code. Thanks, Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
* Ard Biesheuvel wrote: > > except that I don't think > > the condition on 64-bit makes any sense: > > > > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > > > > I can see us being nervous wrt. backported patches, but is there any strong > > reason > > to not follow this up with a third (non-backported) patch that changes this > > to: > > > > + if (!efi_enabled(EFI_OLD_MEMMAP)) { > > > > for v4.4? > > > > The 32-bit side essentially implements the old memmap only, which is the the > bottom-up version. So old memmap will be implied by 32-bit but not set in the > EFI flags, resulting in the reverse enumeration being used with the bottom-up > mapping logic. The net result of that is that we create the same problem for > 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up > in > reverse order in the VA mapping. > > To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP > unconditionally on 32-bit x86. Or we could reshuffle variables and > conditionals > in various other way. Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects. > [...] I am not convinced that the overall end result will be any better > though. That's not true, we change an obscure, implicit dependency on 32-bit detail to an explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a clear improvement. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 28 September 2015 at 09:22, Ingo Molnar wrote: > > * Ard Biesheuvel wrote: > >> On 27 September 2015 at 08:03, Ingo Molnar wrote: >> > >> > * Matt Fleming wrote: >> > >> [...] >> >> [...] The actual virtual addresses we pick are exactly the same with the >> >> two >> >> patches. >> > >> > So I'm NAK-ing this for now: >> > >> > - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an >> >allocator, because all the sections have already been determined by the >> >firmware, and, as we just learned the hard way, we do not want to >> > deviate from >> >that! There's nothing to 'allocate'! >> > >> >What these patches seem to implement is an elaborate 'allocator' that >> > ends up >> >doing nothing on 'new 64-bit' ... >> > >> > - The 32-bit and 64-bit and 'old_mmap' asymmetries: >> > >> > if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { >> > >> >seem fragile and nonsensical. The question is: is it possible for the >> > whole EFI >> >image to be larger than a couple of megabytes? If not then 32-bit >> > should just >> >mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything >> >differently from this _obvious_ 1:1 mapping of the EFI memory offsets >> > then it's >> >not worth keeping as a legacy, because there's just nothing better than >> >mirroring the firmware layout. >> > >> > My suggestion would be to just 1:1 map what the EFI tables describe, >> > modulo the >> > single absolute offset by which we shift the whole thing to a single base. >> > >> > Is there any technical reason why we'd want to deviate from that? >> > Gigabytes of >> > tables or gigabytes of holes that 32-bit cannot handle? Firmware that >> > wants an OS >> > layout that differs from the firmware layout? >> > >> >> The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 >> addressable PA >> space. They usually don't but it is a possibility, which means 32-bit will >> not >> generally be able to support this approach. [...] > > Ok, that's a good argument which invalidates my NAK. > >> [...] For 64-bit ARM, there are some minor complications when the base of >> RAM is >> up very high in physical memory, but we already fixed that for the boot time >> ID >> map and for KVM. >> >> > Also, nobody seems to be asking the obvious hardware compatibility question >> > when trying to implement a standard influenced in great part by an entity >> > that >> > is partly ignorant of and partly hostile to Linux: how does Windows map the >> > EFI sections, under what OSs are these firmware versions tested? I suspect >> > no >> > firmware is released that crashes on bootup on all OSs that can run on that >> > hardware, right? >> >> Interestingly, it was the other way around this time. The engineers that >> implemented this feature for EDK2 could not boot Windows 8 anymore, because >> it >> supposedly maps the regions in reverse order as well (and MS too will need to >> backport a fix that inverts the mapping order). The engineers also tested >> Linux/x86, by means of a SUSE installer image, which booted fine, most likely >> due to the fact that it is an older version which still uses the old memmap >> layout. > > That's nice to hear! > >> My concern with all of this is that this security feature will become an >> obscure >> opt-in feature rather than something UEFIv2.5 firmware implementations can >> enable by default. > > Ok, so I think the patches are mostly fine after all, That is good to hear. > except that I don't think > the condition on 64-bit makes any sense: > > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > > I can see us being nervous wrt. backported patches, but is there any strong > reason > to not follow this up with a third (non-backported) patch that changes this > to: > > + if (!efi_enabled(EFI_OLD_MEMMAP)) { > > for v4.4? > The 32-bit side essentially implements the old memmap only, which is the the bottom-up version. So old memmap will be implied by 32-bit but not set in the EFI flags, resulting in the reverse enumeration being used with the bottom-up mapping logic. The net result of that is that we create the same problem for 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up in reverse order in the VA mapping. To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP unconditionally on 32-bit x86. Or we could reshuffle variables and conditionals in various other way. I am not convinced that the overall end result will be any better though. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
* Ard Biesheuvel wrote: > On 27 September 2015 at 08:03, Ingo Molnar wrote: > > > > * Matt Fleming wrote: > > > [...] > >> [...] The actual virtual addresses we pick are exactly the same with the > >> two > >> patches. > > > > So I'm NAK-ing this for now: > > > > - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an > >allocator, because all the sections have already been determined by the > >firmware, and, as we just learned the hard way, we do not want to > > deviate from > >that! There's nothing to 'allocate'! > > > >What these patches seem to implement is an elaborate 'allocator' that > > ends up > >doing nothing on 'new 64-bit' ... > > > > - The 32-bit and 64-bit and 'old_mmap' asymmetries: > > > > if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > > > >seem fragile and nonsensical. The question is: is it possible for the > > whole EFI > >image to be larger than a couple of megabytes? If not then 32-bit should > > just > >mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything > >differently from this _obvious_ 1:1 mapping of the EFI memory offsets > > then it's > >not worth keeping as a legacy, because there's just nothing better than > >mirroring the firmware layout. > > > > My suggestion would be to just 1:1 map what the EFI tables describe, modulo > > the > > single absolute offset by which we shift the whole thing to a single base. > > > > Is there any technical reason why we'd want to deviate from that? Gigabytes > > of > > tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants > > an OS > > layout that differs from the firmware layout? > > > > The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 addressable > PA > space. They usually don't but it is a possibility, which means 32-bit will > not > generally be able to support this approach. [...] Ok, that's a good argument which invalidates my NAK. > [...] For 64-bit ARM, there are some minor complications when the base of RAM > is > up very high in physical memory, but we already fixed that for the boot time > ID > map and for KVM. > > > Also, nobody seems to be asking the obvious hardware compatibility question > > when trying to implement a standard influenced in great part by an entity > > that > > is partly ignorant of and partly hostile to Linux: how does Windows map the > > EFI sections, under what OSs are these firmware versions tested? I suspect > > no > > firmware is released that crashes on bootup on all OSs that can run on that > > hardware, right? > > Interestingly, it was the other way around this time. The engineers that > implemented this feature for EDK2 could not boot Windows 8 anymore, because > it > supposedly maps the regions in reverse order as well (and MS too will need to > backport a fix that inverts the mapping order). The engineers also tested > Linux/x86, by means of a SUSE installer image, which booted fine, most likely > due to the fact that it is an older version which still uses the old memmap > layout. That's nice to hear! > My concern with all of this is that this security feature will become an > obscure > opt-in feature rather than something UEFIv2.5 firmware implementations can > enable by default. Ok, so I think the patches are mostly fine after all, except that I don't think the condition on 64-bit makes any sense: + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { I can see us being nervous wrt. backported patches, but is there any strong reason to not follow this up with a third (non-backported) patch that changes this to: + if (!efi_enabled(EFI_OLD_MEMMAP)) { for v4.4? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote: > So the question is, what does Windows do? It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() arguments to the qemu debug port. Unfortunately I'm about to drop mostly offline for a week, otherwise I'd give it a go... -- Matthew Garrett | mj...@srcf.ucam.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 27 September 2015 at 08:03, Ingo Molnar wrote: > > * Matt Fleming wrote: > [...] >> [...] The actual virtual addresses we pick are exactly the same with the two >> patches. > > So I'm NAK-ing this for now: > > - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an >allocator, because all the sections have already been determined by the >firmware, and, as we just learned the hard way, we do not want to deviate > from >that! There's nothing to 'allocate'! > >What these patches seem to implement is an elaborate 'allocator' that ends > up >doing nothing on 'new 64-bit' ... > > - The 32-bit and 64-bit and 'old_mmap' asymmetries: > > if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > >seem fragile and nonsensical. The question is: is it possible for the > whole EFI >image to be larger than a couple of megabytes? If not then 32-bit should > just >mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything >differently from this _obvious_ 1:1 mapping of the EFI memory offsets then > it's >not worth keeping as a legacy, because there's just nothing better than >mirroring the firmware layout. > > My suggestion would be to just 1:1 map what the EFI tables describe, modulo > the > single absolute offset by which we shift the whole thing to a single base. > > Is there any technical reason why we'd want to deviate from that? Gigabytes of > tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants > an OS > layout that differs from the firmware layout? > The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 addressable PA space. They usually don't but it is a possibility, which means 32-bit will not generally be able to support this approach. For 64-bit ARM, there are some minor complications when the base of RAM is up very high in physical memory, but we already fixed that for the boot time ID map and for KVM. > Also, nobody seems to be asking the obvious hardware compatibility question > when > trying to implement a standard influenced in great part by an entity that is > partly ignorant of and partly hostile to Linux: how does Windows map the EFI > sections, under what OSs are these firmware versions tested? I suspect no > firmware > is released that crashes on bootup on all OSs that can run on that hardware, > right? > Interestingly, it was the other way around this time. The engineers that implemented this feature for EDK2 could not boot Windows 8 anymore, because it supposedly maps the regions in reverse order as well (and MS too will need to backport a fix that inverts the mapping order). The engineers also tested Linux/x86, by means of a SUSE installer image, which booted fine, most likely due to the fact that it is an older version which still uses the old memmap layout. My concern with all of this is that this security feature will become an obscure opt-in feature rather than something UEFIv2.5 firmware implementations can enable by default. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
* Matthew Garrett wrote: > On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote: > > On Sep 26, 2015 1:19 PM, "H. Peter Anvin" wrote: > > > > > > Sadly a lot of firmware is known to fail in that configuration :( That > > > was very much our guest choice. > > > > > > > Why can't we map everything completely 1:1 (VA = PA) and call the > > setVA thing but pass it literally the identity. > > Last time I tried this I found that some firmware makes assumptions > about having high addresses. So the question is, what does Windows do? PC firmware is a hostile environment for Linux, to be compatible the best we can do is to mimic the environment that the firmware is tested under - i.e. try to use the firmware in the way Windows uses it. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote: > On Sep 26, 2015 1:19 PM, "H. Peter Anvin" wrote: > > > > Sadly a lot of firmware is known to fail in that configuration :( That was > > very much our guest choice. > > > > Why can't we map everything completely 1:1 (VA = PA) and call the > setVA thing but pass it literally the identity. Last time I tried this I found that some firmware makes assumptions about having high addresses. -- Matthew Garrett | mj...@srcf.ucam.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Sep 26, 2015 1:19 PM, "H. Peter Anvin" wrote: > > Sadly a lot of firmware is known to fail in that configuration :( That was > very much our guest choice. > Why can't we map everything completely 1:1 (VA = PA) and call the setVA thing but pass it literally the identity. Firmwares that need it to be called will work, and firmwares that fail to update all offsets will be fine because there's nothing to update. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
* Matt Fleming wrote: > > So could we make the whole code obviously bottom-up? Such as first > > calculating > > the size of virtual memory needed, then allocating a _single_, obviously > > continuous mapping, and then doing a very clear in-order mapping within > > that > > window? That would remove any bitness and legacy dependencies. > > So, we could, and in fact the first version of this patch did just that. You > can > find it here, > > > https://lkml.kernel.org/r/1441372447-23439-1-git-send-email-m...@codeblueprint.co.uk > > But Ard suggested re-using the existing code and simply changing the order we > map the memmap entries in. Such implementational arguments (which are an internal cost) never trump compatibility and robustness concerns (which are an external constraint). > [...] And given the constraint for a small patch for backporting, I think > it's a > better solution. [...] Ugh, backporting size is _even less_ of a valid argument when it comes to firmware support correctness! We can (perhaps) use these already existing patches as a simpler backport, but there's absolutely no reason to keep that code as a solution: > [...] The actual virtual addresses we pick are exactly the same with the two > patches. So I'm NAK-ing this for now: - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an allocator, because all the sections have already been determined by the firmware, and, as we just learned the hard way, we do not want to deviate from that! There's nothing to 'allocate'! What these patches seem to implement is an elaborate 'allocator' that ends up doing nothing on 'new 64-bit' ... - The 32-bit and 64-bit and 'old_mmap' asymmetries: if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { seem fragile and nonsensical. The question is: is it possible for the whole EFI image to be larger than a couple of megabytes? If not then 32-bit should just mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything differently from this _obvious_ 1:1 mapping of the EFI memory offsets then it's not worth keeping as a legacy, because there's just nothing better than mirroring the firmware layout. My suggestion would be to just 1:1 map what the EFI tables describe, modulo the single absolute offset by which we shift the whole thing to a single base. Is there any technical reason why we'd want to deviate from that? Gigabytes of tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an OS layout that differs from the firmware layout? Also, nobody seems to be asking the obvious hardware compatibility question when trying to implement a standard influenced in great part by an entity that is partly ignorant of and partly hostile to Linux: how does Windows map the EFI sections, under what OSs are these firmware versions tested? I suspect no firmware is released that crashes on bootup on all OSs that can run on that hardware, right? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
* Andy Lutomirski wrote: > On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar wrote: > > > > So this commit worries me. > > > > This bug is a good find, and the fix is obviously needed and urgent, but > > I'm not > > sure about the implementation at all. (I've Cc:-ed a few more x86 low level > > gents.) > > > > * Matt Fleming wrote: > >> + /* > >> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE > >> + * config table feature requires us to map all entries > >> + * in the same order as they appear in the EFI memory > >> + * map. That is to say, entry N must have a lower > >> + * virtual address than entry N+1. This is because the > >> + * firmware toolchain leaves relative references in > >> + * the code/data sections, which are split and become > >> + * separate EFI memory regions. Mapping things > >> + * out-of-order leads to the firmware accessing > >> + * unmapped addresses. > >> + * > > I'm clearly missing something. What is EFI doing that it doesn't care how > big > the gap between sections is but it still requires them to be in order? It's > not > as though x86_64 has an addressing mode that allows only non-negative offsets. It appears the problem is that what we think to be 'different sections' are in reality smaller parts of the same section. Any relative address calculation will be broken if we don't preserve the relative positions of these sections/sub-sections. Any CPU that supports addition is affected, it doesn't need any special addressing modes. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
Sadly a lot of firmware is known to fail in that configuration :( That was very much our guest choice. I don't actually think it is all that infeasible to keep relative offsets consistent for the regions we have to map. PMD_SIZE is not a very large chunk so it could be a problem. On September 26, 2015 1:09:17 PM PDT, Ard Biesheuvel wrote: > >> On 26 sep. 2015, at 12:57, Matt Fleming >wrote: >> >>> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote: >>> >>> It is still a hack unless all relative offsets are preserved. That >>> is actually simpler, even: no sorting necessary. >> >> Unless I'm missing something, preserving relative offsets is exactly >> what we do today, modulo PMD_SIZE gaps. >> > >I think what Peter means is preserving the relative offsets inside the >entire 1:1 space. > >This is not at all what we do currently, and i don't think it is >generally feasible on 32-bit (since the physical range may conflict >with the virtual kernel mappings) > >However, on 64 bit (both arm and x86), this boils down to not calling >setVA() in the first place, which i'm all in favor of. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
> On 26 sep. 2015, at 12:57, Matt Fleming wrote: > >> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote: >> >> It is still a hack unless all relative offsets are preserved. That >> is actually simpler, even: no sorting necessary. > > Unless I'm missing something, preserving relative offsets is exactly > what we do today, modulo PMD_SIZE gaps. > I think what Peter means is preserving the relative offsets inside the entire 1:1 space. This is not at all what we do currently, and i don't think it is generally feasible on 32-bit (since the physical range may conflict with the virtual kernel mappings) However, on 64 bit (both arm and x86), this boils down to not calling setVA() in the first place, which i'm all in favor of. -- Ard. > -- > Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote: > > It is still a hack unless all relative offsets are preserved. That > is actually simpler, even: no sorting necessary. Unless I'm missing something, preserving relative offsets is exactly what we do today, modulo PMD_SIZE gaps. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Sat, 26 Sep, at 10:20:22AM, H. Peter Anvin wrote: > > I think it "works" because the affected BIOSes don't put spaces > between the chunks. I have discussed this with Matt. Right, that's very true. Though note that the current mapping algorithm will handle a gap <= PMD_SIZE, it's just anything larger than PMD_SIZE we "squash" to the next multiple of PMD_SIZE. It's unclear whether the firmware toolchains would even support references between sections with a PMD_SIZE gap between them, and I think the firmware engineers would have to go out of their way to actually insert such a gap. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
It is still a hack unless all relative offsets are preserved. That is actually simpler, even: no sorting necessary. On September 26, 2015 11:15:57 AM PDT, Ard Biesheuvel wrote: >On 26 September 2015 at 10:20, H. Peter Anvin wrote: >> I think it "works" because the affected BIOSes don't put spaces >between the chunks. I have discussed this with Matt. >> > >Forgive the ASCII art but perhaps an illustration might help: > >before the 2.5 feature, PE/COFF runtime images were remapped as >illustrated here: > >PAVA >+---+ +---+ >| | | | >| PE/COFF .text | |EFI| >| | |Runtime| >+- - - - - - - -+=> |Services |+ >| | |Code ||: : >| PE/COFF .data | | ||: : >| | | ||+---+ >+---+ +---+|| | >| | | |||EFI| >: : : :||Runtime| >: : : :+--->|Services | >| | | | |Code | >+---+ +---+ | | >| | | | | | >| PE/COFF .text | |EFI| +---+ >| | |Runtime| : gap : >+- - - - - - - -+=> |Services |---+ +---+ >| | |Code | | | | >| PE/COFF .data | | | | |EFI| >| | | | | |Runtime| >+---+ +---+ +>|Services | >| | | | |Code | >: : : : | | >: : : : | | >: : : : +---+ >: : : : : : > >Since the affected symbol references only exist between PE/COFF .text >and PE/COFF .data, there is never a problem since each is PE/COFF >image is mapped as a single region. >However, with the new feature enabled, this no longer holds: >PAVA >+---+ +---+ >| | | | >| PE/COFF .text | |RtServices |+ >| | |Code || >+- - - - - - - -+=> +---+|+---+ >| | |RtServices |+--->|RtServices | >| PE/COFF .data | |Data | |Code | >| | | |++---+ >+---+ +---+|: gap : >| | | ||+---+ >: : : :+--->|RtServices | >: : : : |Data | >| | | | +---+ >+---+ +---+ : gap : >| | | | +---+ >| PE/COFF .text | |RtServices |>|RtServices | >| | |Code | |Code | >+- - - - - - - -+=> +---+ +---+ >| | |RtServices | : gap : >| PE/COFF .data | |Data |---+ +---+ >| | | | | |RtServices | >+---+ +---+ +>|Data | >| | | | | | >: : : : +---+ >: : : : : : >: : : : : : > >The illustration uses gaps, but obviously, this applies equally to >inverting the mapping order, since the PE/COFF .text and .data >sections will end up out of order. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 26 September 2015 at 10:20, H. Peter Anvin wrote: > I think it "works" because the affected BIOSes don't put spaces between the > chunks. I have discussed this with Matt. > Forgive the ASCII art but perhaps an illustration might help: before the 2.5 feature, PE/COFF runtime images were remapped as illustrated here: PAVA +---+ +---+ | | | | | PE/COFF .text | |EFI| | | |Runtime| +- - - - - - - -+=> |Services |+ | | |Code ||: : | PE/COFF .data | | ||: : | | | ||+---+ +---+ +---+|| | | | | |||EFI| : : : :||Runtime| : : : :+--->|Services | | | | | |Code | +---+ +---+ | | | | | | | | | PE/COFF .text | |EFI| +---+ | | |Runtime| : gap : +- - - - - - - -+=> |Services |---+ +---+ | | |Code | | | | | PE/COFF .data | | | | |EFI| | | | | | |Runtime| +---+ +---+ +>|Services | | | | | |Code | : : : : | | : : : : | | : : : : +---+ : : : : : : Since the affected symbol references only exist between PE/COFF .text and PE/COFF .data, there is never a problem since each is PE/COFF image is mapped as a single region. However, with the new feature enabled, this no longer holds: PAVA +---+ +---+ | | | | | PE/COFF .text | |RtServices |+ | | |Code || +- - - - - - - -+=> +---+|+---+ | | |RtServices |+--->|RtServices | | PE/COFF .data | |Data | |Code | | | | |++---+ +---+ +---+|: gap : | | | ||+---+ : : : :+--->|RtServices | : : : : |Data | | | | | +---+ +---+ +---+ : gap : | | | | +---+ | PE/COFF .text | |RtServices |>|RtServices | | | |Code | |Code | +- - - - - - - -+=> +---+ +---+ | | |RtServices | : gap : | PE/COFF .data | |Data |---+ +---+ | | | | | |RtServices | +---+ +---+ +>|Data | | | | | | | : : : : +---+ : : : : : : : : : : : : The illustration uses gaps, but obviously, this applies equally to inverting the mapping order, since the PE/COFF .text and .data sections will end up out of order. -- Ard. > On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski > wrote: >>On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar wrote: >>> >>> So this commit worries me. >>> >>> This bug is a good find, and the fix is obviously needed and urgent, >>but I'm not >>> sure about the implementation at all. (I've Cc:-ed a few more x86 low >>level >>> gents.) >>> >>> * Matt Fleming wrote: + /* + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE + * config table feature requires us to map all entries + * in the same order as they appear in the EFI memory + * map. That is
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
I think it "works" because the affected BIOSes don't put spaces between the chunks. I have discussed this with Matt. On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski wrote: >On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar wrote: >> >> So this commit worries me. >> >> This bug is a good find, and the fix is obviously needed and urgent, >but I'm not >> sure about the implementation at all. (I've Cc:-ed a few more x86 low >level >> gents.) >> >> * Matt Fleming wrote: >>> + /* >>> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE >>> + * config table feature requires us to map all entries >>> + * in the same order as they appear in the EFI memory >>> + * map. That is to say, entry N must have a lower >>> + * virtual address than entry N+1. This is because the >>> + * firmware toolchain leaves relative references in >>> + * the code/data sections, which are split and become >>> + * separate EFI memory regions. Mapping things >>> + * out-of-order leads to the firmware accessing >>> + * unmapped addresses. >>> + * > >I'm clearly missing something. What is EFI doing that it doesn't care >how big the gap between sections is but it still requires them to be >in order? It's not as though x86_64 has an addressing mode that >allows only non-negative offsets. > >--Andy -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar wrote: > > So this commit worries me. > > This bug is a good find, and the fix is obviously needed and urgent, but I'm > not > sure about the implementation at all. (I've Cc:-ed a few more x86 low level > gents.) > > * Matt Fleming wrote: >> + /* >> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE >> + * config table feature requires us to map all entries >> + * in the same order as they appear in the EFI memory >> + * map. That is to say, entry N must have a lower >> + * virtual address than entry N+1. This is because the >> + * firmware toolchain leaves relative references in >> + * the code/data sections, which are split and become >> + * separate EFI memory regions. Mapping things >> + * out-of-order leads to the firmware accessing >> + * unmapped addresses. >> + * I'm clearly missing something. What is EFI doing that it doesn't care how big the gap between sections is but it still requires them to be in order? It's not as though x86_64 has an addressing mode that allows only non-negative offsets. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On Sat, 26 Sep, at 07:56:43AM, Ingo Molnar wrote: > > So this commit worries me. > > This bug is a good find, and the fix is obviously needed and urgent, but I'm > not > sure about the implementation at all. (I've Cc:-ed a few more x86 low level > gents.) Thanks, the more the merrier. > * Matt Fleming wrote: > > > /* > > + * Iterate the EFI memory map in reverse order because the regions > > + * will be mapped top-down. The end result is the same as if we had > > + * mapped things forward, but doesn't require us to change the > > + * existing implementation of efi_map_region(). > > + */ > > +static inline void *efi_map_next_entry_reverse(void *entry) > > +{ > > + /* Initial call */ > > + if (!entry) > > + return memmap.map_end - memmap.desc_size; > > + > > + entry -= memmap.desc_size; > > + if (entry < memmap.map) > > + return NULL; > > + > > + return entry; > > +} > > + > > +/* > > + * efi_map_next_entry - Return the next EFI memory map descriptor > > + * @entry: Previous EFI memory map descriptor > > + * > > + * This is a helper function to iterate over the EFI memory map, which > > + * we do in different orders depending on the current configuration. > > + * > > + * To begin traversing the memory map @entry must be %NULL. > > + * > > + * Returns %NULL when we reach the end of the memory map. > > + */ > > +static void *efi_map_next_entry(void *entry) > > +{ > > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > > + /* > > +* Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE > > +* config table feature requires us to map all entries > > +* in the same order as they appear in the EFI memory > > +* map. That is to say, entry N must have a lower > > +* virtual address than entry N+1. This is because the > > +* firmware toolchain leaves relative references in > > +* the code/data sections, which are split and become > > +* separate EFI memory regions. Mapping things > > +* out-of-order leads to the firmware accessing > > +* unmapped addresses. > > +* > > +* Since we need to map things this way whether or not > > +* the kernel actually makes use of > > +* EFI_PROPERTIES_TABLE, let's just switch to this > > +* scheme by default for 64-bit. > > The thing is, if relative accesses between these 'sections' do happen then > the > requirement is much stronger than just 'ordered by addresses' - then we must > map > them continuously and as a single block! > > So at minimum the comment should say that. But I think we want more: Well, the firmware doesn't place arbitrary gaps between these runtime image sections in memory, they still get loaded into memory by the firmware the old-fashioned way (as one contiguous blob). It's just that we've now got two memory map entries for the single PE/COFF image. Also, it's only sections from the same PE/COFF image that reference each other (but like Ard said, we can't tell from the memmap which entries are for a single PE/COFF image). Because EFI memory map entries that are part of the same PE/COFF image will be at consecutive addresses, we'll also map them contiguously in the kernel virtual address space, because that's how the current code works - it's just that the current code maps them backwards. Yeah, I'm completely in favour of improving any of the comments. > > +*/ > > + return efi_map_next_entry_reverse(entry); > > + } > > + > > + /* Initial call */ > > + if (!entry) > > + return memmap.map; > > + > > + entry += memmap.desc_size; > > + if (entry >= memmap.map_end) > > + return NULL; > > + > > + return entry; > > +} > > + > > +/* > > * Map the efi memory ranges of the runtime services and update new_mmap > > with > > * virtual addresses. > > */ > > @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int > > *pg_shift) > > unsigned long left = 0; > > efi_memory_desc_t *md; > > > > - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { > > + p = NULL; > > + while ((p = efi_map_next_entry(p))) { > > md = p; > > if (!(md->attribute & EFI_MEMORY_RUNTIME)) { > > So why is this 64-bit only? Is 32-bit not affected because there we allocate > virtual addresses bottom-up? 32-bit would potentially be effected if 32-bit firmware ever enabled the EFI_PROPERTIES_TABLE feature. Whether or not it worked would depend on the fragemntation of the virtual memory space when we map the EFI regions. But I'm not aware of any 32-bit firmware shipping with Properties Table support, and fixing it in the kernel like we do for x86-64 would require changing the EFI region mapping code. That's something I wanted to avoid for this patch since it needs to be applied to stable, and especially when we haven't seen the fea
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
On 25 September 2015 at 22:56, Ingo Molnar wrote: > > So this commit worries me. > > This bug is a good find, and the fix is obviously needed and urgent, but I'm > not > sure about the implementation at all. (I've Cc:-ed a few more x86 low level > gents.) > > * Matt Fleming wrote: > >> /* >> + * Iterate the EFI memory map in reverse order because the regions >> + * will be mapped top-down. The end result is the same as if we had >> + * mapped things forward, but doesn't require us to change the >> + * existing implementation of efi_map_region(). >> + */ >> +static inline void *efi_map_next_entry_reverse(void *entry) >> +{ >> + /* Initial call */ >> + if (!entry) >> + return memmap.map_end - memmap.desc_size; >> + >> + entry -= memmap.desc_size; >> + if (entry < memmap.map) >> + return NULL; >> + >> + return entry; >> +} >> + >> +/* >> + * efi_map_next_entry - Return the next EFI memory map descriptor >> + * @entry: Previous EFI memory map descriptor >> + * >> + * This is a helper function to iterate over the EFI memory map, which >> + * we do in different orders depending on the current configuration. >> + * >> + * To begin traversing the memory map @entry must be %NULL. >> + * >> + * Returns %NULL when we reach the end of the memory map. >> + */ >> +static void *efi_map_next_entry(void *entry) >> +{ >> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { >> + /* >> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE >> + * config table feature requires us to map all entries >> + * in the same order as they appear in the EFI memory >> + * map. That is to say, entry N must have a lower >> + * virtual address than entry N+1. This is because the >> + * firmware toolchain leaves relative references in >> + * the code/data sections, which are split and become >> + * separate EFI memory regions. Mapping things >> + * out-of-order leads to the firmware accessing >> + * unmapped addresses. >> + * >> + * Since we need to map things this way whether or not >> + * the kernel actually makes use of >> + * EFI_PROPERTIES_TABLE, let's just switch to this >> + * scheme by default for 64-bit. > > The thing is, if relative accesses between these 'sections' do happen then the > requirement is much stronger than just 'ordered by addresses' - then we must > map > them continuously and as a single block! > The primary difference between pre-2.5 and 2.5 with this feature enabled is that formerly, each PE/COFF image in memory would be covered by at most a single EfiRuntimeServicesCode region, and now, a single PE/COFF image may be split into different regions. It is only relative references *inside* such a PE/COFF image that we are concerned about, since no symbol references exist between separate ones. Also, it is not only relative references inside the PE/COFF image that cause the problem. Another aspect is that the offset that is applied to all absolute references at relocation time is derived from the offset of the base of the PE/COFF image. If part of the PE/COFF image (the .data section) is moved relatively to the code section, these absolute references are fixed up incorrectly. This is actually a problem that we could solve at the firmware side, but since PE/COFF does not really tolerate being split up like that, the correct fix is to keep all regions belonging to a single PE/COFF image adjacent. Since we can't tell which those regions are, the next best approach is to keep all adjacent regions with the EFI_MEMORY_RUNTIME attribute adjacent in the VA mapping. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
So this commit worries me. This bug is a good find, and the fix is obviously needed and urgent, but I'm not sure about the implementation at all. (I've Cc:-ed a few more x86 low level gents.) * Matt Fleming wrote: > /* > + * Iterate the EFI memory map in reverse order because the regions > + * will be mapped top-down. The end result is the same as if we had > + * mapped things forward, but doesn't require us to change the > + * existing implementation of efi_map_region(). > + */ > +static inline void *efi_map_next_entry_reverse(void *entry) > +{ > + /* Initial call */ > + if (!entry) > + return memmap.map_end - memmap.desc_size; > + > + entry -= memmap.desc_size; > + if (entry < memmap.map) > + return NULL; > + > + return entry; > +} > + > +/* > + * efi_map_next_entry - Return the next EFI memory map descriptor > + * @entry: Previous EFI memory map descriptor > + * > + * This is a helper function to iterate over the EFI memory map, which > + * we do in different orders depending on the current configuration. > + * > + * To begin traversing the memory map @entry must be %NULL. > + * > + * Returns %NULL when we reach the end of the memory map. > + */ > +static void *efi_map_next_entry(void *entry) > +{ > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { > + /* > + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE > + * config table feature requires us to map all entries > + * in the same order as they appear in the EFI memory > + * map. That is to say, entry N must have a lower > + * virtual address than entry N+1. This is because the > + * firmware toolchain leaves relative references in > + * the code/data sections, which are split and become > + * separate EFI memory regions. Mapping things > + * out-of-order leads to the firmware accessing > + * unmapped addresses. > + * > + * Since we need to map things this way whether or not > + * the kernel actually makes use of > + * EFI_PROPERTIES_TABLE, let's just switch to this > + * scheme by default for 64-bit. The thing is, if relative accesses between these 'sections' do happen then the requirement is much stronger than just 'ordered by addresses' - then we must map them continuously and as a single block! So at minimum the comment should say that. But I think we want more: > + */ > + return efi_map_next_entry_reverse(entry); > + } > + > + /* Initial call */ > + if (!entry) > + return memmap.map; > + > + entry += memmap.desc_size; > + if (entry >= memmap.map_end) > + return NULL; > + > + return entry; > +} > + > +/* > * Map the efi memory ranges of the runtime services and update new_mmap with > * virtual addresses. > */ > @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int > *pg_shift) > unsigned long left = 0; > efi_memory_desc_t *md; > > - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { > + p = NULL; > + while ((p = efi_map_next_entry(p))) { > md = p; > if (!(md->attribute & EFI_MEMORY_RUNTIME)) { So why is this 64-bit only? Is 32-bit not affected because there we allocate virtual addresses bottom-up? This would be a lot clearer if we just mapped the entries in order, no questions asked. Conditions like this: > + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { ... just invite confusion and possible corner cases where we end up mapping them wrong. So could we make the whole code obviously bottom-up? Such as first calculating the size of virtual memory needed, then allocating a _single_, obviously continuous mapping, and then doing a very clear in-order mapping within that window? That would remove any bitness and legacy dependencies. Good virtual memory layout is so critical for any third party code that this should be the central property of the whole approach, not just some side condition somewhere in an iteration loop. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
From: Matt Fleming Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, [0.006391] BUG: unable to handle kernel paging request at fffefe6086dd [0.006923] IP: [] 0xfffefe6086dd [0.007000] Call Trace: [0.007000] [] efi_call+0x7e/0x100 [0.007000] [] ? virt_efi_set_variable+0x61/0x90 [0.007000] [] efi_delete_dummy_variable+0x63/0x70 [0.007000] [] efi_enter_virtual_mode+0x383/0x392 [0.007000] [] start_kernel+0x38a/0x417 [0.007000] [] x86_64_start_reservations+0x2a/0x2c [0.007000] [] x86_64_start_kernel+0xeb/0xef Here 0xfffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Reported-by: Ard Biesheuvel Suggested-by: Ard Biesheuvel Cc: "Lee, Chun-Yi" Cc: Borislav Petkov Cc: Leif Lindholm Cc: Peter Jones Cc: James Bottomley Cc: Matthew Garrett Cc: H. Peter Anvin Cc: Dave Young Cc: Signed-off-by: Matt Fleming --- arch/x86/platform/efi/efi.c | 67 - 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 1db84c0758b7..6a28ded74211 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -705,6 +705,70 @@ out: } /* + * Iterate the EFI memory map in reverse order because the regions + * will be mapped top-down. The end result is the same as if we had + * mapped things forward, but doesn't require us to change the + * existing implementation of efi_map_region(). + */ +static inline void *efi_map_next_entry_reverse(void *entry) +{ + /* Initial call */ + if (!entry) + return memmap.map_end - memmap.desc_size; + + entry -= memmap.desc_size; + if (entry < memmap.map) + return NULL; + + return entry; +} + +/* + * efi_map_next_entry - Return the next EFI memory map descriptor + * @entry: Previous EFI memory map descriptor + * + * This is a helper function to iterate over the EFI memory map, which + * we do in different orders depending on the current configuration. + * + * To begin traversing the memory map @entry must be %NULL. + * + * Returns %NULL when we reach the end of the memory map. + */ +static void *efi_map_next_entry(void *entry) +{ + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) { + /* +* Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE +* config table feature requires us to map all entries +* in the same order as they appear in the EFI memory +* map. That is to say, entry N must have a lower +* virtual address than entry N+1. This is because the +* firmware toolchain leaves relative references in +* the code/data sections, which are split and become +* separate EFI memory regions. Mapping things +* out-of-order leads to the firmware accessing +* unmapped addresses. +