Re: [edk2] Corrupted EFI region
On Wed, 18 Sep, at 01:24:14PM, jerry.hoem...@hp.com wrote:
> Matt,
>
> I conducted the following experiments on a 3.11 kernel:

Jerry, could you paste your memory map from the kernel log?

--
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [edk2] Corrupted EFI region
On Mon, Sep 16, 2013 at 11:59:20AM +0100, Matt Fleming wrote:
> On Fri, 13 Sep, at 02:38:12PM, jerry.hoem...@hp.com wrote:
>> Matt,
>>
>> We have hit an issue on our new platform in development related to the call of efi_reserve_boot_services() from setup_arch(). The reservation can interfere with allocation of the crash kernel.
>
> Jerry, thanks for bringing this up.
>
>> In pre 3.9(?) kernels, the crash kernel is required to be allocated from physically contiguous memory below 896 MB.
>>
>> Our new platforms are large in both the amount of memory and the amount of IO. This requires large crash kernels for kdump to work. This is even after the work done for makedumpfile v1.5 to allow it to work with a smaller footprint.
>>
>> One of the problems is that drivers will allocate memory as boot code and/or data in the region below 896 MB, which effectively fragments this memory. With the reservation, we can't reuse the memory when needed for the crash kernels. If we remove the reservation and allow the kernel to reuse the memory, the reservation of the crash kernel succeeds.
>>
>> This is definitely a problem for distros that are pre 3.9. Probably less so for top of tree, but I haven't been focused there.
>>
>> So we are definitely interested in finding a mechanism to not do this reservation on platforms that don't have the issues described earlier in this thread.
>
> OK, in an ideal world we'd move the crash kernel reservation after efi_free_boot_services(), because at that point the boot regions are available again. But it seems that we reserve the boot regions really early during startup and release them relatively late.
>
> The reason is that the Boot Graphics Resource Table (BGRT) data, if present, is located in the Boot Services Data regions, but we can't extract the address of the region from the ACPI tables until we've set up the ACPI subsystem, which happens quite late.
>
> I wonder whether performing the reservation of the crash kernel memory first, before efi_reserve_boot_services(), would help. That way we'd only need to reserve the remaining regions in efi_reserve_boot_services(). This scheme would rely on nothing writing into the crash kernel area before we've extracted the BGRT data, however.
>
> --
> Matt Fleming, Intel Open Source Technology Center

Matt,

I conducted the following experiments on a 3.11 kernel:

1) Moved the call of reserve_crashkernel() to after efi_free_boot_services(). Booted with crashkernel=512M.
   a) When memory below 896M was *not* fragmented by BootCode segments, reserve_crashkernel() succeeded.
   b) When memory below 896M *was* fragmented by BootCode segments, reserve_crashkernel() failed.

2) Moved the call of reserve_crashkernel() to before the call of efi_reserve_boot_services(). Booted with crashkernel=512M.
   reserve_crashkernel() succeeded irrespective of whether the memory below 896M was fragmented by BootCode segments.

I haven't determined why reserve_crashkernel() failed in 1b) above. I don't see the memory reserved for the crash kernel being accessed before the call to efi_free_boot_services().

CC'ing the kexec list for their input, as I may have missed something.

Jerry

--
Jerry Hoemann, Software Engineer, Hewlett-Packard/MODL
3404 E Harmony Rd. MS 57, Ft. Collins, CO 80528
phone: (970) 898-1022   FAX: (970) 898-
email: jerry.hoem...@hp.com
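Jerry's point about fragmentation below 896 MB can be shown with a toy model. The sketch below is not kernel code. `struct region`, `largest_free_run()` and the layout in the usage note are hypothetical. It measures the largest contiguous run below a limit in two cases: with boot-services regions treated as reserved (as they are while efi_reserve_boot_services() holds them) and with them treated as released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory-map entry covering [start, end). */
struct region {
    uint64_t start;
    uint64_t end;        /* exclusive */
    bool boot_services;  /* true for EfiBootServicesCode/Data */
};

/* Size of the largest contiguous run below `limit` that a crash kernel could
 * use. Entries must be sorted and non-overlapping; a gap between entries or
 * an unusable entry ends the current run. With `bs_reserved` true,
 * boot-services regions count as occupied (still held by
 * efi_reserve_boot_services()). */
uint64_t largest_free_run(const struct region *map, size_t n,
                          uint64_t limit, bool bs_reserved)
{
    uint64_t best = 0, run_start = 0, run_end = 0;
    bool in_run = false;

    for (size_t i = 0; i < n && map[i].start < limit; i++) {
        assert(map[i].start <= map[i].end);
        uint64_t s = map[i].start;
        uint64_t e = map[i].end < limit ? map[i].end : limit;
        bool usable = !(map[i].boot_services && bs_reserved);

        if (!usable) {
            in_run = false;            /* a reserved region splits the run */
            continue;
        }
        if (!in_run || s != run_end) {
            run_start = s;             /* start a new run */
            in_run = true;
        }
        run_end = e;                   /* extend the current run */
        if (run_end - run_start > best)
            best = run_end - run_start;
    }
    return best;
}
```

Take a hypothetical map with free memory at [0, 400 MB), a 4 MB boot-services region at [400 MB, 404 MB), and free memory up to 896 MB. While the boot-services region is reserved, the largest run is 492 MB, which is too small for crashkernel=512M. Once the region is released, the largest run is 896 MB. This models only the fragmentation effect. It does not explain the failure Jerry saw in case 1b.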
Re: [edk2] Corrupted EFI region
On Fri, 13 Sep, at 02:38:12PM, jerry.hoem...@hp.com wrote:
> Matt,
>
> We have hit an issue on our new platform in development related to the call of efi_reserve_boot_services() from setup_arch(). The reservation can interfere with allocation of the crash kernel.

Jerry, thanks for bringing this up.

> In pre 3.9(?) kernels, the crash kernel is required to be allocated from physically contiguous memory below 896 MB.
>
> Our new platforms are large in both the amount of memory and the amount of IO. This requires large crash kernels for kdump to work. This is even after the work done for makedumpfile v1.5 to allow it to work with a smaller footprint.
>
> One of the problems is that drivers will allocate memory as boot code and/or data in the region below 896 MB, which effectively fragments this memory. With the reservation, we can't reuse the memory when needed for the crash kernels. If we remove the reservation and allow the kernel to reuse the memory, the reservation of the crash kernel succeeds.
>
> This is definitely a problem for distros that are pre 3.9. Probably less so for top of tree, but I haven't been focused there.
>
> So we are definitely interested in finding a mechanism to not do this reservation on platforms that don't have the issues described earlier in this thread.

OK, in an ideal world we'd move the crash kernel reservation after efi_free_boot_services(), because at that point the boot regions are available again. But it seems that we reserve the boot regions really early during startup and release them relatively late.

The reason is that the Boot Graphics Resource Table (BGRT) data, if present, is located in the Boot Services Data regions, but we can't extract the address of the region from the ACPI tables until we've set up the ACPI subsystem, which happens quite late.

I wonder whether performing the reservation of the crash kernel memory first, before efi_reserve_boot_services(), would help. That way we'd only need to reserve the remaining regions in efi_reserve_boot_services(). This scheme would rely on nothing writing into the crash kernel area before we've extracted the BGRT data, however.

--
Matt Fleming, Intel Open Source Technology Center
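Matt's suggested ordering is to reserve the crash kernel first and then reserve only what is left of each boot-services region. That step reduces to subtracting one range from another. The sketch below assumes half-open [start, end) ranges. `struct range` and `subtract_range()` are illustrative names, not kernel APIs. The sketch does not model Matt's caveat that nothing may write into the crash kernel area before the BGRT data has been extracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start;
    uint64_t end;  /* exclusive */
};

/* Subtract `hole` (the crash kernel area, reserved first) from `r` (a
 * boot-services region). Writes at most two leftover pieces to `out` and
 * returns how many were written; those pieces are what would still need
 * reserving afterwards. */
size_t subtract_range(struct range r, struct range hole, struct range out[2])
{
    assert(r.start <= r.end && hole.start <= hole.end);
    size_t n = 0;

    if (hole.end <= r.start || hole.start >= r.end) {
        out[n++] = r;                                 /* no overlap */
        return n;
    }
    if (r.start < hole.start)
        out[n++] = (struct range){ r.start, hole.start };  /* before hole */
    if (hole.end < r.end)
        out[n++] = (struct range){ hole.end, r.end };      /* after hole */
    return n;
}
```

For example, take a boot-services region [0x1000, 0x9000) with a crash kernel hole at [0x3000, 0x5000). Two pieces are left to reserve: [0x1000, 0x3000) and [0x5000, 0x9000).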
Re: [edk2] Corrupted EFI region
On 09/16/13 12:59, Matt Fleming wrote:
> On Fri, 13 Sep, at 02:38:12PM, jerry.hoem...@hp.com wrote:
>> Matt,
>>
>> We have hit an issue on our new platform in development related to the call of efi_reserve_boot_services() from setup_arch(). The reservation can interfere with allocation of the crash kernel.
>
> Jerry, thanks for bringing this up.
>
>> In pre 3.9(?) kernels, the crash kernel is required to be allocated from physically contiguous memory below 896 MB.
>>
>> Our new platforms are large in both the amount of memory and the amount of IO. This requires large crash kernels for kdump to work. This is even after the work done for makedumpfile v1.5 to allow it to work with a smaller footprint.
>>
>> One of the problems is that drivers will allocate memory as boot code and/or data in the region below 896 MB, which effectively fragments this memory. With the reservation, we can't reuse the memory when needed for the crash kernels. If we remove the reservation and allow the kernel to reuse the memory, the reservation of the crash kernel succeeds.
>>
>> This is definitely a problem for distros that are pre 3.9. Probably less so for top of tree, but I haven't been focused there.
>>
>> So we are definitely interested in finding a mechanism to not do this reservation on platforms that don't have the issues described earlier in this thread.
>
> OK, in an ideal world we'd move the crash kernel reservation after efi_free_boot_services(), because at that point the boot regions are available again. But it seems that we reserve the boot regions really early during startup and release them relatively late.
>
> The reason is that the Boot Graphics Resource Table (BGRT) data, if present, is located in the Boot Services Data regions, but we can't extract the address of the region from the ACPI tables until we've set up the ACPI subsystem, which happens quite late.

Why is BGRT allocated as Boot Services Data?

In file MdeModulePkg/Universal/Acpi/BootGraphicsResourceTableDxe/BootGraphicsResourceTableDxe.c:

  InstallBootGraphicsResourceTable()
    BgrtAllocateBsDataMemoryBelow4G()
      gBS->AllocatePages(... EfiBootServicesData ...)

From "Table 25. Memory Type Usage before ExitBootServices()":

  EfiBootServicesData -- The data portions of a loaded Boot Services Driver, and the default data allocation type used by a Boot Services Driver to allocate pool memory.

  EfiACPIReclaimMemory -- Memory that holds the ACPI tables.

From "Table 26. Memory Type Usage after ExitBootServices()":

  EfiBootServicesData -- Memory available for general use.

  EfiACPIReclaimMemory -- This memory is to be preserved by the loader and OS until ACPI is enabled. Once ACPI is enabled, the memory in this range is available for general use.

I thought that anything referenced by a pointer in any ACPI table was EfiACPIReclaimMemory or stricter. Specifically, the RSDT or XSDT points to BGRT, so BGRT is EfiACPIReclaimMemory. BGRT points to the image data (with its Image Address field), hence the image data should be EfiACPIReclaimMemory too. Otherwise, the pointer (BGRT.ImageAddress) can outlive the pointed-to storage (the image data). The image data sounds to me like a textbook example for EfiACPIReclaimMemory.

This way the kernel could free Boot Services Data early, perform the crash kernel reservation right after, and safely access BGRT whenever the ACPI subsystem is brought up later.

The edk2 commit that flipped the memory type underneath the image data from EfiReservedMemoryType to EfiBootServicesData is:

https://github.com/tianocore/edk2/commit/4c58575e

I think this commit is wrong. It's fine for OSPM to release the image data at some point, but not right after ExitBootServices(), because referencing pointers in ACPI tables survive strictly longer.

...

Actually, the commit does follow the ACPI spec 5.0:

  5.2.22.4 Image Address

  The Image Address contains the location in memory where an in-memory copy of the boot image can be found. The image should be stored in EfiBootServicesData, allowing the system to reclaim the memory when the image is no longer needed.

The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I take the current wording ("should be stored") as a recommendation only.)

If that's in fact a recommendation (and not a hard requirement), then it should be easy to change BgrtAllocateBsDataMemoryBelow4G() again.

Thanks,
Laszlo
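Laszlo's argument is that a pointer must not outlive the storage it points to. The sketch below encodes that rule using coarse lifetimes taken from the quoted Tables 25 and 26. It assumes three lifetimes: boot-services and loader memory belong to the OS after ExitBootServices(), EfiACPIReclaimMemory belongs to the OS once ACPI is enabled, and every other type is treated as never released for this purpose. The enum values match the EFI_MEMORY_TYPE listing quoted elsewhere in this thread. `region_lifetime()` and `pointer_is_safe()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in the EFI_MEMORY_TYPE enum quoted elsewhere in this thread. */
enum efi_memory_type {
    EfiReservedMemoryType  = 0,
    EfiLoaderCode          = 1,
    EfiLoaderData          = 2,
    EfiBootServicesCode    = 3,
    EfiBootServicesData    = 4,
    EfiRuntimeServicesCode = 5,
    EfiRuntimeServicesData = 6,
    EfiConventionalMemory  = 7,
    EfiUnusableMemory      = 8,
    EfiACPIReclaimMemory   = 9,
    EfiACPIMemoryNVS       = 10,
};

/* Coarse lifetime after ExitBootServices(), shortest first (an assumption
 * based on Tables 25/26 as quoted above). */
enum lifetime {
    OS_OWNED_AT_EXIT_BOOT_SERVICES, /* "available for general use" */
    OS_OWNED_AFTER_ACPI_ENABLED,    /* EfiACPIReclaimMemory */
    NEVER_RELEASED,                 /* reserved, runtime, NVS, ... */
};

static_assert(OS_OWNED_AT_EXIT_BOOT_SERVICES < OS_OWNED_AFTER_ACPI_ENABLED &&
              OS_OWNED_AFTER_ACPI_ENABLED < NEVER_RELEASED,
              "lifetimes must be ordered for the comparison below");

enum lifetime region_lifetime(enum efi_memory_type t)
{
    switch (t) {
    case EfiLoaderCode:
    case EfiLoaderData:
    case EfiBootServicesCode:
    case EfiBootServicesData:
    case EfiConventionalMemory:
        return OS_OWNED_AT_EXIT_BOOT_SERVICES;
    case EfiACPIReclaimMemory:
        return OS_OWNED_AFTER_ACPI_ENABLED;
    default:
        return NEVER_RELEASED;
    }
}

/* A pointer stored in memory of type `holder` is safe only if the storage it
 * points to (of type `target`) lives at least as long. */
bool pointer_is_safe(enum efi_memory_type holder, enum efi_memory_type target)
{
    return region_lifetime(target) >= region_lifetime(holder);
}
```

Under this model, a BGRT in EfiACPIReclaimMemory pointing at an image in EfiBootServicesData fails the check. An image in EfiACPIReclaimMemory or EfiReservedMemoryType passes.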
Re: [edk2] Corrupted EFI region
On Mon, Sep 16, 2013 at 01:50:46PM +0200, Laszlo Ersek wrote:
> On 09/16/13 12:59, Matt Fleming wrote:
>> On Fri, 13 Sep, at 02:38:12PM, jerry.hoem...@hp.com wrote:
>>> Matt,
>>>
>>> We have hit an issue on our new platform in development related to the call of efi_reserve_boot_services() from setup_arch(). The reservation can interfere with allocation of the crash kernel.
>>
>> Jerry, thanks for bringing this up.
>>
>>> In pre 3.9(?) kernels, the crash kernel is required to be allocated from physically contiguous memory below 896 MB.
>>>
>>> Our new platforms are large in both the amount of memory and the amount of IO. This requires large crash kernels for kdump to work. This is even after the work done for makedumpfile v1.5 to allow it to work with a smaller footprint.
>>>
>>> One of the problems is that drivers will allocate memory as boot code and/or data in the region below 896 MB, which effectively fragments this memory. With the reservation, we can't reuse the memory when needed for the crash kernels. If we remove the reservation and allow the kernel to reuse the memory, the reservation of the crash kernel succeeds.
>>>
>>> This is definitely a problem for distros that are pre 3.9. Probably less so for top of tree, but I haven't been focused there.
>>>
>>> So we are definitely interested in finding a mechanism to not do this reservation on platforms that don't have the issues described earlier in this thread.
>>
>> OK, in an ideal world we'd move the crash kernel reservation after efi_free_boot_services(), because at that point the boot regions are available again. But it seems that we reserve the boot regions really early during startup and release them relatively late.
>>
>> The reason is that the Boot Graphics Resource Table (BGRT) data, if present, is located in the Boot Services Data regions, but we can't extract the address of the region from the ACPI tables until we've set up the ACPI subsystem, which happens quite late.
>
> Why is BGRT allocated as Boot Services Data?
>
> In file MdeModulePkg/Universal/Acpi/BootGraphicsResourceTableDxe/BootGraphicsResourceTableDxe.c:
>
>   InstallBootGraphicsResourceTable()
>     BgrtAllocateBsDataMemoryBelow4G()
>       gBS->AllocatePages(... EfiBootServicesData ...)
>
> From "Table 25. Memory Type Usage before ExitBootServices()":
>
>   EfiBootServicesData -- The data portions of a loaded Boot Services Driver, and the default data allocation type used by a Boot Services Driver to allocate pool memory.
>
>   EfiACPIReclaimMemory -- Memory that holds the ACPI tables.
>
> From "Table 26. Memory Type Usage after ExitBootServices()":
>
>   EfiBootServicesData -- Memory available for general use.
>
>   EfiACPIReclaimMemory -- This memory is to be preserved by the loader and OS until ACPI is enabled. Once ACPI is enabled, the memory in this range is available for general use.
>
> I thought that anything referenced by a pointer in any ACPI table was EfiACPIReclaimMemory or stricter. Specifically, the RSDT or XSDT points to BGRT, so BGRT is EfiACPIReclaimMemory. BGRT points to the image data (with its Image Address field), hence the image data should be EfiACPIReclaimMemory too. Otherwise, the pointer (BGRT.ImageAddress) can outlive the pointed-to storage (the image data). The image data sounds to me like a textbook example for EfiACPIReclaimMemory.
>
> This way the kernel could free Boot Services Data early, perform the crash kernel reservation right after, and safely access BGRT whenever the ACPI subsystem is brought up later.
>
> The edk2 commit that flipped the memory type underneath the image data from EfiReservedMemoryType to EfiBootServicesData is:
>
> https://github.com/tianocore/edk2/commit/4c58575e
>
> I think this commit is wrong. It's fine for OSPM to release the image data at some point, but not right after ExitBootServices(), because referencing pointers in ACPI tables survive strictly longer.
>
> ...
>
> Actually, the commit does follow the ACPI spec 5.0:
>
>   5.2.22.4 Image Address
>
>   The Image Address contains the location in memory where an in-memory copy of the boot image can be found. The image should be stored in EfiBootServicesData, allowing the system to reclaim the memory when the image is no longer needed.
>
> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I take the current wording ("should be stored") as a recommendation only.)

I agree that UEFI *should* store the BGRT in EfiACPIReclaimMemory, but in practice the UEFI firmware I've seen with a BGRT does follow that recommendation and store it in EfiBootServicesData. So, even if the recommendation in the spec changed, the kernel would still have to accommodate both possibilities.

- Josh Triplett
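As Josh notes, the kernel has to cope with both placements of the image. One way to express that decision is to find the memory descriptor that contains the BGRT Image Address and hold back only boot-services memory. In this minimal sketch, `struct md` is a simplified, hypothetical stand-in for an EFI memory descriptor (4 KiB pages), and `must_hold_until_bgrt_parsed()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified EFI memory descriptor. */
struct md {
    uint32_t type;        /* EFI_MEMORY_TYPE value */
    uint64_t phys_start;
    uint64_t num_pages;   /* 4 KiB pages */
};

enum { EFI_RESERVED = 0, EFI_BS_CODE = 3, EFI_BS_DATA = 4, EFI_ACPI_RECLAIM = 9 };

/* Return the descriptor whose range contains `addr`, or NULL. */
const struct md *find_md(const struct md *map, size_t n, uint64_t addr)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].phys_start + map[i].num_pages * 4096;
        if (addr >= map[i].phys_start && addr < end)
            return &map[i];
    }
    return NULL;
}

/* True if the image sits in boot-services memory (the ACPI 5.0 recommendation
 * that firmware follows in practice), so that region must stay reserved until
 * the BGRT image has been read. Reserved or ACPI-reclaim memory survives
 * efi_free_boot_services() on its own. */
bool must_hold_until_bgrt_parsed(const struct md *map, size_t n, uint64_t image)
{
    const struct md *d = find_md(map, n, image);
    if (d == NULL)
        return false;     /* address not described by the map */
    return d->type == EFI_BS_CODE || d->type == EFI_BS_DATA;
}
```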
Re: [edk2] Corrupted EFI region
On 09/16/13 17:57, Josh Triplett wrote:
>> The edk2 commit that flipped the memory type underneath the image data from EfiReservedMemoryType to EfiBootServicesData is:
>>
>> https://github.com/tianocore/edk2/commit/4c58575e
>>
>> I think this commit is wrong. It's fine for OSPM to release the image data at some point, but not right after ExitBootServices(), because referencing pointers in ACPI tables survive strictly longer.
>>
>> ...
>>
>> Actually, the commit does follow the ACPI spec 5.0:
>>
>>   5.2.22.4 Image Address
>>
>>   The Image Address contains the location in memory where an in-memory copy of the boot image can be found. The image should be stored in EfiBootServicesData, allowing the system to reclaim the memory when the image is no longer needed.
>>
>> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I take the current wording ("should be stored") as a recommendation only.)
>
> I agree that UEFI *should* store the BGRT in EfiACPIReclaimMemory, but in practice the UEFI firmware I've seen with a BGRT does follow that recommendation and store it in EfiBootServicesData. So, even if the recommendation in the spec changed, the kernel would still have to accommodate both possibilities.

Just for the theoretical debate:

The edk2 commit linked above is 5 days old. All UEFI firmware in the wild (on released hardware) should be using EfiReservedMemoryType (the pre-patch memory type), which is even stricter. EfiReservedMemoryType can never be released or repurposed, so it should make no difference for crash kernel allocation, shouldn't it?

- call efi_free_boot_services() -- doesn't touch the image data (which is in RAM of EfiReservedMemoryType),
- reserve crash kernel,
- access BGRT via ACPI.

BGRT had appeared in edk2 with

https://github.com/tianocore/edk2/commit/0284e90c

and EfiReservedMemoryType used to be the allocation type until commit 4c58575e.

Or are you alluding to UEFI firmware that's not based on TianoCore?

Laszlo
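The last step in Laszlo's sequence, accessing the BGRT via ACPI, means reading the table's Image Address field. Below is a sketch of the table layout, assuming the ACPI 5.0 BGRT definition: the 36-byte standard header, then Version, Status, Image Type, Image Address and Image Offset X/Y. The struct and function names are illustrative, and `__attribute__((packed))` is a GCC/Clang extension used to rule out padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 36-byte ACPI table header. */
struct acpi_sdt_header {
    char     signature[4];       /* "BGRT" for this table */
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
} __attribute__((packed));

/* Boot Graphics Resource Table. */
struct acpi_bgrt {
    struct acpi_sdt_header header;
    uint16_t version;
    uint8_t  status;
    uint8_t  image_type;         /* 0 = bitmap */
    uint64_t image_address;      /* physical address of the in-memory image */
    uint32_t image_offset_x;
    uint32_t image_offset_y;
} __attribute__((packed));

static_assert(sizeof(struct acpi_sdt_header) == 36, "ACPI header is 36 bytes");
static_assert(offsetof(struct acpi_bgrt, image_address) == 40,
              "Image Address follows Version, Status and Image Type");
static_assert(sizeof(struct acpi_bgrt) == 56, "BGRT is 56 bytes");

/* The table only carries the address; the image itself lives in whatever
 * memory type the firmware chose, which is what this thread is about. */
uint64_t bgrt_image_address(const struct acpi_bgrt *bgrt)
{
    assert(bgrt != NULL);
    return bgrt->image_address;
}
```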
Re: [edk2] Corrupted EFI region
On Mon, Sep 16, 2013 at 06:25:22PM +0200, Laszlo Ersek wrote:
> Or are you alluding to UEFI firmware that's not based on TianoCore?

Most BGRT implementations are IBV-specific rather than coming from Tiano. The ACPI spec says that the image should be stored in EfiBootServicesData, and most implementations follow that.

--
Matthew Garrett | mj...@srcf.ucam.org
Re: [edk2] Corrupted EFI region
On Mon, Sep 16, 2013 at 06:25:22PM +0200, Laszlo Ersek wrote:
> On 09/16/13 17:57, Josh Triplett wrote:
>>> The edk2 commit that flipped the memory type underneath the image data from EfiReservedMemoryType to EfiBootServicesData is:
>>>
>>> https://github.com/tianocore/edk2/commit/4c58575e
>>>
>>> I think this commit is wrong. It's fine for OSPM to release the image data at some point, but not right after ExitBootServices(), because referencing pointers in ACPI tables survive strictly longer.
>>>
>>> ...
>>>
>>> Actually, the commit does follow the ACPI spec 5.0:
>>>
>>>   5.2.22.4 Image Address
>>>
>>>   The Image Address contains the location in memory where an in-memory copy of the boot image can be found. The image should be stored in EfiBootServicesData, allowing the system to reclaim the memory when the image is no longer needed.
>>>
>>> The ACPI spec 5.0 should recommend EfiACPIReclaimMemory here IMO. (I take the current wording ("should be stored") as a recommendation only.)
>>
>> I agree that UEFI *should* store the BGRT in EfiACPIReclaimMemory, but in practice the UEFI firmware I've seen with a BGRT does follow that recommendation and store it in EfiBootServicesData. So, even if the recommendation in the spec changed, the kernel would still have to accommodate both possibilities.
>
> Just for the theoretical debate:
>
> The edk2 commit linked above is 5 days old. All UEFI firmware in the wild (on released hardware) should be using EfiReservedMemoryType (the pre-patch memory type), which is even stricter. EfiReservedMemoryType can never be released or repurposed, so it should make no difference for crash kernel allocation, shouldn't it?
>
> - call efi_free_boot_services() -- doesn't touch the image data (which is in RAM of EfiReservedMemoryType),
> - reserve crash kernel,
> - access BGRT via ACPI.
>
> BGRT had appeared in edk2 with
>
> https://github.com/tianocore/edk2/commit/0284e90c
>
> and EfiReservedMemoryType used to be the allocation type until commit 4c58575e.
>
> Or are you alluding to UEFI firmware that's not based on TianoCore?

I'm saying, in practice, that the systems I tested BGRT support on and submitted patches for stored the BGRT's image in EfiBootServicesData.

- Josh Triplett
Re: [edk2] Corrupted EFI region
On Thu, 08 Aug, at 06:46:02AM, Andrew Fish wrote:
> On Aug 8, 2013, at 3:17 AM, Matt Fleming <m...@console-pimps.org> wrote:
>
>> On Wed, 07 Aug, at 02:10:28PM, Andrew Fish wrote:
>>> Well the issue I see is I don't think OS X or Windows are doing this. So I'm guessing there is some unique thing being done on the Linux side and we don't have good tests to catch bugs in the EFI implementations. If the Linux loader hides the bugs and we don't hit them with other operating systems they are never going to get fixed.
>>>
>>> It would be good if we could track down some of these issues and make a request for some tests that can help catch these issues. The tests would be part of UEFI.org, but since some of us play in both worlds we can forward the known issues to the UEFI test work group.
>>
>> I'm all for helping to develop tests that catch these kind of bugs. What's the next step?
>
> I'll bring this up with UEFI.org.

For those attending the UEFI plugfest in New Orleans this would be a good topic for discussion - figuring out a collaboration process to get new tests in place.

--
Matt Fleming, Intel Open Source Technology Center
Re: [edk2] Corrupted EFI region
0001-OvmfPkg-allocate-the-EFI-memory-map-for-Linux-as-Loa.patch was applied in r14555. Thanks for the contribution.

And thanks for the bug report & testing, Boris.

On Wed, Aug 7, 2013 at 10:49 AM, Laszlo Ersek <ler...@redhat.com> wrote:
> On 08/07/13 17:19, Borislav Petkov wrote:
>> On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
>>> Can you capture the OVMF debug output? Do you see "ConvertPages: Incompatible memory types" there?
>>>
>>> Can you set the following bits too in the debug mask?
>>>
>>> #define DEBUG_POOL 0x00000010 // Alloc & Free's
>>> #define DEBUG_PAGE 0x00000020 // Alloc & Free's
>>
>> Ok, I got debug output; I have to be careful now of not missing anything. Ok, so here we go:
>>
>> First of all, I changed debugging mask to:
>>
>> gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F
>>
>> (I just set all three bits you requested).
>>
>> Using the new OVMF.fd changed the addresses, of course, so we're looking at 0x7dc59XXX ones now.
>>
>> [0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75
>>
>> So, I've attached an archive of the debug logs. The initial observations I could do is that the region still gets squashed to:
>>
>> [0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
>>
>> from
>>
>> [0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)
>>
>> And the interesting stuff in the OVMF output is right at the end:
>>
>> ConvertRange: 7DC59000-7DC5AFFF to 4
>> AddRange: 7DC59000-7DC5AFFF to 4
>> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
>> Jumping to kernel
>>
>> We get that same output no matter if I boot it with -enable-kvm or not.
>>
>> If the order of the debug messages is the same as the calls actually happen, we AllocatePoolI to address 7DC59018 which we already have added as a range.
>>
>> But I'm not going to pretend I even know the code so I'll let you comment instead :).
>
> I think this allows us to solve the bug :)
>
> First, forget everything I said :) I was completely lost.
>
> Remember this?
>
> 01 efi_main()
> 02   exit_boot()
> 03     low_alloc()
> 04     GetMemoryMap()
> 05     ExitBootServices()
> 06
> 07 start_kernel()
> 08   setup_arch()
> 09     efi_memblock_x86_reserve_range()
> 10     efi_reserve_boot_services()
> 11   efi_enter_virtual_mode()
> 12     SetVirtualAddressMap()
>
> Now, lines 01 to 05 *do not happen*. More precisely, they don't happen in the kernel. They happen in the firmware. Specifically, OvmfPkg/Library/LoadLinuxLib/Linux.c.
>
> You're booting the kernel from the qemu command line. The kernel you run is also an "old" kernel without EFI handover protocol. So what happens is, OVMF downloads the kernel image from qemu over fw_cfg, figures it's an old kernel...
>
> PlatformBdsPolicyBehavior() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c]
>   // Process QEMU's -kernel command line option:
>   TryRunningQemuKernel() [OvmfPkg/Library/PlatformBdsLib/QemuKernel.c]
>     LoadLinux() [OvmfPkg/Library/LoadLinuxLib/Linux.c]
>       // Old kernels without EFI handover protocol
>       SetupLinuxBootParams()
>       SetupLinuxMemmap()
>       AllocatePool() <-- !!!
>       gBS->GetMemoryMap()
>       gBS->ExitBootServices()
>       prints "Jumping to kernel"
>       JumpToKernel()
>
> Now pull up efi_memblock_x86_reserve_range(). It reserves boot_params.efi_info->efi_memmap. I assumed this field would come from the exit_boot() kernel function. It doesn't. It comes from SetupLinuxMemmap().
>
> The former allocates the backing store as EFI_LOADER_DATA. The latter, alas, marked with "!!!" above, as boot services data. :)
>
> So, what you're seeing in the OVMF debug log:
>
> ConvertRange: 7DC59000-7DC5AFFF to 4
> AddRange: 7DC59000-7DC5AFFF to 4
> AllocatePoolI: Type 4, Addr 7DC59018 (len 16F0) 26,735,072
>
> This is self-consistent. It just documents that the AllocatePool() call marked with "!!!" needs to grab two full pages first (first two lines), carve them up into pool chunks, and then serve the request from them (third line).
>
> The address displayed here shows up in the Linux dmesg later on because the storage for the memory map itself is allocated, and populated, by OVMF, not the EFI stub in the kernel.
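The page arithmetic behind the three OVMF log lines can be checked directly. This sketch assumes 4 KiB pages. It treats the 0x18-byte offset of the returned address (7DC59018 within the page at 7DC59000) as pool bookkeeping in front of the caller's buffer. `pages_spanned()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 0x1000u

/* Number of 4 KiB pages touched by a `len`-byte allocation that starts
 * `offset` bytes into its first page. */
uint32_t pages_spanned(uint32_t offset, uint32_t len)
{
    assert(offset < EFI_PAGE_SIZE && len > 0);
    uint32_t last_byte = offset + len - 1;   /* offset of the final byte */
    return last_byte / EFI_PAGE_SIZE + 1;
}
```

pages_spanned(0x18, 0x16F0) returns 2 because the last byte lands at offset 0x1707, in the second page. That matches "ConvertRange: 7DC59000-7DC5AFFF", which is exactly two pages.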
>
> In one sentence, efi_memblock_x86_reserve_range() expects that boot_params.efi_info->efi_memmap has been allocated as loader data (by whomever), but SetupLinuxMemmap() violates this by allocating the storage as boot services data. This leads to double reservation attempts between efi_memblock_x86_reserve_range() and efi_reserve_boot_services().
>
> The attached edk2 patch should fix it. Please confirm.
>
> Thanks,
> Laszlo
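The double reservation Laszlo describes is an overlap between two ranges. One is the memory-map buffer reserved by efi_memblock_x86_reserve_range(). The other is a boot-services data region reserved by efi_reserve_boot_services(). The sketch below is a minimal overlap test on half-open ranges. `ranges_overlap()` is a hypothetical helper. The values in the usage note come from Borislav's log, and treating both printed ranges as half-open is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Do the half-open ranges [a_start, a_end) and [b_start, b_end) intersect? */
bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                    uint64_t b_start, uint64_t b_end)
{
    assert(a_start <= a_end && b_start <= b_end);
    return a_start < b_end && b_start < a_end;
}
```

The memory-map buffer at [0x7dc59018, 0x7dc59618) lies inside mem11, [0x7dc59000, 0x7e146000), so ranges_overlap() returns true. If the buffer were allocated as loader data, which is what efi_memblock_x86_reserve_range() expects, it would sit in a loader-data region instead, and efi_reserve_boot_services() would not reserve it a second time.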
Re: [edk2] Corrupted EFI region
On Aug 7, 2013, at 8:19 AM, Borislav Petkov <b...@alien8.de> wrote:

> On Tue, Aug 06, 2013 at 05:31:29PM +0200, Laszlo Ersek wrote:
>> Can you capture the OVMF debug output? Do you see "ConvertPages: Incompatible memory types" there?
>>
>> Can you set the following bits too in the debug mask?
>>
>> #define DEBUG_POOL 0x00000010 // Alloc & Free's
>> #define DEBUG_PAGE 0x00000020 // Alloc & Free's
>
> Ok, I got debug output; I have to be careful now of not missing anything. Ok, so here we go:
>
> First of all, I changed debugging mask to:
>
> gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel|0x8010007F
>
> (I just set all three bits you requested).
>
> Using the new OVMF.fd changed the addresses, of course, so we're looking at 0x7dc59XXX ones now.
>
> [0.000000] memblock_reserve: [0x0000007dc59018-0x0000007dc59618] efi_memblock_x86_reserve_range+0x70/0x75
>
> So, I've attached an archive of the debug logs. The initial observations I could do is that the region still gets squashed to:
>
> [0.014041] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
>
> from
>
> [0.000000] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007e146000) (4MB)

OK so I think I need some Cliff Notes here to help me understand what is going on...

Type 4 is EfiBootServicesData, and attr 0x0f is cache attributes with no request for a runtime mapping. This is not runtime memory, so to the OS loader it is just memory EFI has used that will get freed back to the OS after ExitBootServices(), along with EfiBootServicesCode, EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData also get freed back to the OS, and they just exist for the convenience of the OS loader.

So I can't figure out why this matters?
Given:

typedef enum {
  // Boot Services Memory
  EfiLoaderCode = 1,
  EfiLoaderData = 2,
  EfiBootServicesCode = 3,
  EfiBootServicesData = 4,
  EfiConventionalMemory = 7,
  // EFI Runtime Drivers
  EfiRuntimeServicesCode = 5,
  EfiRuntimeServicesData = 6,
  // Stuff that may get mapped into Runtime
  EfiReservedMemoryType = 0,
  EfiACPIReclaimMemory = 9,
  EfiACPIMemoryNVS = 10,
  EfiMemoryMappedIO = 11,
  EfiMemoryMappedIOPortSpace = 12,
  EfiPalCode = 13,
  EfiUnusableMemory = 8,
  EfiMaxMemoryType = 14
} EFI_MEMORY_TYPE;

[0.005012] efi: efi_enter_virtual_mode
**[0.006004] efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
*[0.007004] efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
**[0.008004] efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000800000) (7MB)
*[0.009004] efi: mem03: type=4, attr=0xf, range=[0x0000000000800000-0x0000000001000000) (8MB)
**[0.010004] efi: mem04: type=7, attr=0xf, range=[0x0000000001000000-0x0000000002000000) (16MB)
*[0.011004] efi: mem05: type=2, attr=0xf, range=[0x0000000002000000-0x00000000036e5000) (22MB)
**[0.012004] efi: mem06: type=7, attr=0xf, range=[0x00000000036e5000-0x000000003fffc000) (969MB)
*[0.013004] efi: mem07: type=2, attr=0xf, range=[0x000000003fffc000-0x0000000040000000) (0MB)
**[0.014004] efi: mem08: type=7, attr=0xf, range=[0x0000000040000000-0x000000007c000000) (960MB)
*[0.015004] efi: mem09: type=4, attr=0xf, range=[0x000000007c000000-0x000000007c020000) (0MB)
**[0.016004] efi: mem10: type=7, attr=0xf, range=[0x000000007c020000-0x000000007dc59000) (28MB)
*[0.017004] efi: mem11: type=4, attr=0xf, range=[0x000000007dc59000-0x000000007dc59000) (0MB)
*[0.018004] efi: mem12: type=3, attr=0xf, range=[0x000000007e146000-0x000000007e1c2000) (0MB)
*[0.019004] efi: mem13: type=4, attr=0xf, range=[0x000000007e1c2000-0x000000007e1ca000) (0MB)
*[0.020004] efi: mem14: type=3, attr=0xf, range=[0x000000007e1ca000-0x000000007e1d4000) (0MB)
*[0.021004] efi: mem15: type=4, attr=0xf, range=[0x000000007e1d4000-0x000000007e1d6000) (0MB)
*[0.022004] efi: mem16: type=3, attr=0xf, range=[0x000000007e1d6000-0x000000007e368000) (1MB)
[0.023004] efi: mem17: type=6, attr=0x800000000000000f, range=[0x000000007e368000-0x000000007e37d000) (0MB)
*[0.024004] efi: mem18: type=4, attr=0xf, range=[0x000000007e37d000-0x000000007e8c8000) (5MB)
[0.025004] efi: mem19: type=5, attr=0x800000000000000f, range=[0x000000007e8c8000-0x000000007e8cf000) (0MB)
*[0.026004] efi: mem20: type=4, attr=0xf, range=[0x000000007e8cf000-0x000000007e923000) (0MB)
[0.028010] efi: mem21: type=6, attr=0x800000000000000f, range=[0x000000007e923000-0x000000007e925000) (0MB)
[0.029004] efi: mem22: type=5, attr=0x800000000000000f, range=[0x000000007e925000-0x000000007e934000) (0MB)
*[0.031004] efi: mem23: type=4, attr=0xf, range=[0x000000007e934000-0x000000007f881000) (15MB)
*[0.032004] efi: mem24: type=3, attr=0xf,
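Andrew's reading of the attribute field can be made concrete by decoding the bits. This sketch uses the UEFI memory attribute bit values: EFI_MEMORY_UC, WC, WT and WB in the low bits, and EFI_MEMORY_RUNTIME at bit 63. `needs_runtime_mapping()` and `cache_attributes()` are hypothetical helpers. An attribute of 0xf advertises all four caching types and no runtime mapping. The type 5 and type 6 (runtime services) entries carry bit 63 on top of 0xf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFI_MEMORY_UC      0x0000000000000001ULL  /* uncacheable */
#define EFI_MEMORY_WC      0x0000000000000002ULL  /* write-combining */
#define EFI_MEMORY_WT      0x0000000000000004ULL  /* write-through */
#define EFI_MEMORY_WB      0x0000000000000008ULL  /* write-back */
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL  /* needs a runtime mapping */

#define EFI_CACHE_MASK \
    (EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT | EFI_MEMORY_WB)

/* attr=0xf is exactly the four caching capabilities and nothing else. */
static_assert(EFI_CACHE_MASK == 0xf, "0xf means UC|WC|WT|WB");

bool needs_runtime_mapping(uint64_t attr)
{
    return (attr & EFI_MEMORY_RUNTIME) != 0;
}

uint64_t cache_attributes(uint64_t attr)
{
    return attr & EFI_CACHE_MASK;
}
```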
Re: [edk2] Corrupted EFI region
[ Re-adding Matthew Garrett to the Cc list, seeing as we both got removed for some unknown reason ]

On Wed, 07 Aug, at 10:23:56AM, Andrew Fish wrote:
> OK so I think I need some Cliff Notes here to help me understand what is going on...
>
> Type 4 is EfiBootServicesData, and attr 0x0f is cache attributes with no request for a runtime mapping. This is not runtime memory, so to the OS loader it is just memory EFI has used that will get freed back to the OS after ExitBootServices(), along with EfiBootServicesCode, EfiLoaderCode, and EfiLoaderData. The EfiLoaderCode and EfiLoaderData also get freed back to the OS, and they just exist for the convenience of the OS loader.
>
> So I can't figure out why this matters?
>
> Given:

We've seen a bunch of systems that make calls into EfiBootServicesCode after ExitBootServices(). There were some Apple machines in that list; I don't have the details, but Matthew should.

So we map these regions unconditionally and in their original state; otherwise the firmware will generate fatal page faults when trying to access those memory regions.

--
Matt Fleming, Intel Open Source Technology Center
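Matt's description corresponds to a selection rule along the following lines. This is a minimal sketch, not the actual kernel code. A descriptor is mapped if it has the runtime attribute. Boot-services code and data are mapped too, as a workaround for firmware that calls into them after ExitBootServices(). `map_for_runtime()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL
static_assert(EFI_MEMORY_RUNTIME == (1ULL << 63), "runtime attribute is bit 63");

enum { EfiBootServicesCode = 3, EfiBootServicesData = 4 };

/* Should this descriptor be mapped when entering virtual mode? */
bool map_for_runtime(uint32_t type, uint64_t attr)
{
    if (attr & EFI_MEMORY_RUNTIME)
        return true;   /* genuine runtime services region */
    /* Workaround: firmware that still touches boot-services memory after
     * ExitBootServices() would fault if these were left unmapped. */
    return type == EfiBootServicesCode || type == EfiBootServicesData;
}
```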
Re: [edk2] Corrupted EFI region
On 08/01/13 18:49, Borislav Petkov wrote: On Wed, Jul 31, 2013 at 10:55:27PM +0100, David Woodhouse wrote: On Wed, 2013-07-31 at 22:54 +0200, Borislav Petkov wrote: so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: Perhaps the edk2-de...@lists.sourceforge.net list should be in Cc? Good idea and message repeated below. One more thing: I'm using a self-built OVMF with top commit from March: r14165 | sfu5 | 2013-03-06 02:42:04 +0100 (Wed, 06 Mar 2013) | 4 lines Fix a bug that IsSignatureFoundInDatabase() incorrectly computes CertCount. --- Hi guys, so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: --- before 2013-07-31 22:20:52.316039492 +0200 +++ after 2013-07-31 22:21:30.960731706 +0200 @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0 efi: mem08: type=7, attr=0xf, range=[0x4000-0x7c00) (960MB) efi: mem09: type=4, attr=0xf, range=[0x7c00-0x7c02) (0MB) efi: mem10: type=7, attr=0xf, range=[0x7c02-0x7e0ad000) (32MB) -efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) +efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) (type 4 is EfiBootServicesData) efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB) That second boundary of region mem11 suddenly changes *before* we merge the regions. edk2 bug? I take it you mean this change (ie. appearance of the zero-sized range) occurs when you enable KVM acceleration in qemu? 
If so, please locate gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value: # DEBUG_GCD 0x00100000 Global Coherency Database changes Then please rebuild OVMF, and capture the debug port output of qemu (-debugcon file:debug.log -global isa-debugcon.iobase=0x402) both with and without KVM. DEBUG_GCD should produce messages related to CoreAllocateSpace(), and might help us find the spot the difference is introduced. BTW does this have anything to do with the NX bit report of yours, or have you noticed this independently? (I'm not subscribed to lkml so apologies if this email doesn't end up in those archives / doesn't reach everyone.) Thanks Laszlo
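The bit Laszlo asks for is OR-ed into whatever value the PCD already has. A minimal sketch, assuming edk2's DebugLib.h value DEBUG_GCD = 0x00100000; the starting level 0x8000004F and the helper name are hypothetical placeholders, not OVMF's actual default:

```python
# Value from edk2's MdePkg/Include/Library/DebugLib.h.
DEBUG_GCD = 0x00100000  # Global Coherency Database changes


def enable_gcd_debug(level: int) -> int:
    """Return a PcdDebugPrintErrorLevel value with the DEBUG_GCD bit set."""
    return level | DEBUG_GCD


current = 0x8000004F  # hypothetical existing value in OvmfPkgX64.dsc
print(hex(enable_gcd_debug(current)))  # 0x8010004f
```

Setting the bit is idempotent, so applying it to a level that already includes DEBUG_GCD leaves the value unchanged.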
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 01:27:16PM +0200, Laszlo Ersek wrote: --- before 2013-07-31 22:20:52.316039492 +0200 +++ after 2013-07-31 22:21:30.960731706 +0200 @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0 efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) -efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) +efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) (type 4 is EfiBootServicesData) Yes. efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB) That second boundary of region mem11 suddenly changes *before* we merge the regions. edk2 bug? I take it you mean this change (ie. appearance of the zero-sized range) occurs when you enable KVM acceleration in qemu? Right. And I'm booting with qemu -enable-kvm so KVM acceleration is enabled?? Or do you mean something else? If so, please locate gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value: # DEBUG_GCD 0x00100000 Global Coherency Database changes Then please rebuild OVMF, and capture the debug port output of qemu (-debugcon file:debug.log -global isa-debugcon.iobase=0x402) both with and without KVM. DEBUG_GCD should produce messages related to CoreAllocateSpace(), and might help us find the spot the difference is introduced. Ok, I'll try to get this thing done before my vacation. If not, we'll deal with it afterwards but I won't forget, I promise! :-) BTW does this have anything to do with the NX bit report of yours, or have you noticed this independently? Independently, while testing my runtime services mapping patchset.
I was getting an empty region and was wondering whether to discard it from the mapping or not and then I looked at why I get it in the first place. Basically, I get this empty region which appears at some point. It is there when we enter efi_enter_virtual_mode in the kernel to setup the runtime mappings: [0.005012] efi: efi_enter_virtual_mode: enter [0.006004] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.007004] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.008003] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.009004] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.010004] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.011004] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.012004] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.013003] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.014004] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.015004] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.016004] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.017004] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) ^^ [0.018003] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) When we dump the EFI regions initially, it is ok. [0.000000] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) [0.000000] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) So what basically happens is the end boundary of the region becomes the start, practically turning it into a 0-size one. Thanks for looking into it. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
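One quick way to spot this kind of change is to parse both dumps and compare them entry by entry. The Python sketch below assumes the `efi: memNN: type=..., attr=0x..., range=[0x...-0x...)` line format quoted in this thread; `parse_map` and `compare_maps` are hypothetical helpers. Note that the "(0MB)" label hides the difference: the original mem11 spans 0x7e0cc000 - 0x7e0ad000 = 0x1f000 bytes (124 KiB), which rounds down to 0MB just like the empty range does.

```python
import re

# Matches dump lines such as
# "efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB)"
LINE_RE = re.compile(
    r"mem(\d+): type=(\d+), attr=0x([0-9a-fA-F]+), "
    r"range=\[0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\)"
)


def parse_map(lines):
    """Return {index: (type, attr, start, end)} for every dump line found."""
    regions = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            idx, typ, attr, start, end = m.groups()
            regions[int(idx)] = (int(typ), int(attr, 16), int(start, 16), int(end, 16))
    return regions


def compare_maps(before, after):
    """List (index, problem) pairs: zero-sized regions and changed entries."""
    problems = []
    for idx, (typ, attr, start, end) in sorted(after.items()):
        if end <= start:
            problems.append((idx, "zero-sized"))
        if before.get(idx) != (typ, attr, start, end):
            problems.append((idx, "changed since first dump"))
    return problems


before = parse_map(["efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB)"])
after = parse_map(["efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB)"])
print(compare_maps(before, after))
# [(11, 'zero-sized'), (11, 'changed since first dump')]
```

Because the regex uses `search`, dmesg timestamp prefixes such as `[0.017004]` are skipped automatically.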
Re: [edk2] Corrupted EFI region
On 08/05/13 15:02, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 01:27:16PM +0200, Laszlo Ersek wrote: --- before 2013-07-31 22:20:52.316039492 +0200 +++ after 2013-07-31 22:21:30.960731706 +0200 @@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0 efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) -efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) +efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) (type 4 is EfiBootServicesData) Yes. efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB) That second boundary of region mem11 suddenly changes *before* we merge the regions. edk2 bug? I take it you mean this change (ie. appearance of the zero-sized range) occurs when you enable KVM acceleration in qemu? Right. And I'm booting with qemu -enable-kvm so KVM acceleration is enabled?? Or do you mean something else? My question was: is my understanding correct that you only see this problem with -enable-kvm? Because, On 08/01/13 18:49, Borislav Petkov wrote: so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: You said "on kvm", and provided a diff. I think (hope) I understand the environment you've denoted with "after", but what's your "before"? The absence of -enable-kvm, or something else? If so, please locate gEfiMdePkgTokenSpaceGuid.PcdDebugPrintErrorLevel in OvmfPkg/OvmfPkgX64.dsc, and set the following bit in its value: # DEBUG_GCD 0x00100000 Global Coherency Database changes Then please rebuild OVMF, and capture the debug port output of qemu (-debugcon file:debug.log -global isa-debugcon.iobase=0x402) both with and without KVM.
DEBUG_GCD should produce messages related to CoreAllocateSpace(), and might help us find the spot the difference is introduced. Ok, I'll try to get this thing done before my vacation. If not, we'll deal with it afterwards but I won't forget, I promise! :-) BTW does this have anything to do with the NX bit report of yours, or have you noticed this independently? Independently, while testing my runtime services mapping patchset. What's the purpose of that series? Can you please provide a link (if you posted versions of it already)? I was getting an empty region and was wondering whether to discard it from the mapping or not and then I looked at why I get it in the first place. Basically, I get this empty region which appears at some point. It is there when we enter efi_enter_virtual_mode in the kernel to setup the runtime mappings: [0.005012] efi: efi_enter_virtual_mode: enter [0.006004] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.007004] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.008003] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.009004] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.010004] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.011004] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.012004] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.013003] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.014004] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.015004] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.016004] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.017004] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) ^^ [0.018003] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) When we dump the EFI regions initially, it is ok.
[0.000000] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) [0.000000] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) So what basically happens is the end boundary of the region becomes the start, practically turning it into a 0-size one. ... and you
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 03:39:31PM +0200, Laszlo Ersek wrote: My question was: is my understanding correct that you only see this problem with -enable-kvm? Because, On 08/01/13 18:49, Borislav Petkov wrote: so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: You said "on kvm", and provided a diff. I think (hope) I understand the environment you've denoted with "after", but what's your "before"? The absence of -enable-kvm, or something else? Ah, I see. So 'before' is the initial dump of the EFI regions, very early during boot: [0.000000] efi: EFI v2.31 by EDK II [0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014 [0.000000] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.000000] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.000000] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.000000] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.000000] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.000000] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.000000] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.000000] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.000000] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.000000] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.000000] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) [0.000000] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) [0.000000] efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) [0.000000] efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB) [0.000000] efi: mem15: type=4, attr=0xf, range=[0x7e59c000-0x7e5a0000) (0MB) [0.000000] efi: mem16: type=3, attr=0xf, range=[0x7e5a0000-0x7e668000) (0MB) [0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x7e668000-0x7e67d000) (0MB)
[0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x7e67d000-0x7e692000) (0MB) [0.000000] efi: mem19: type=4, attr=0xf, range=[0x7e692000-0x7f992000) (19MB) [0.000000] efi: mem20: type=7, attr=0xf, range=[0x7f992000-0x7f994000) (0MB) [0.000000] efi: mem21: type=3, attr=0xf, range=[0x7f994000-0x7fb12000) (1MB) [0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x7fb12000-0x7fb42000) (0MB) [0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x7fb42000-0x7fb66000) (0MB) [0.000000] efi: mem24: type=0, attr=0xf, range=[0x7fb66000-0x7fb6a000) (0MB) [0.000000] efi: mem25: type=9, attr=0xf, range=[0x7fb6a000-0x7fb72000) (0MB) [0.000000] efi: mem26: type=10, attr=0xf, range=[0x7fb72000-0x7fb76000) (0MB) [0.000000] efi: mem27: type=4, attr=0xf, range=[0x7fb76000-0x7ffe0000) (4MB) [0.000000] efi: mem28: type=6, attr=0x800000000000000f, range=[0x7ffe0000-0x80000000) (0MB) and with 'after' I've denoted the dump of the EFI regions a second time, a bit later, when we enter efi_enter_virtual_mode(): [0.005012] efi: efi_enter_virtual_mode: enter [0.006004] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.007004] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.008003] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.009004] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.010004] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.011004] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.012004] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.013003] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.014004] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.015004] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.016004] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.017004] efi: mem11: type=4, attr=0xf,
Re: [edk2] Corrupted EFI region
On 08/05/13 16:03, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 03:39:31PM +0200, Laszlo Ersek wrote: My question was: is my understanding correct that you only see this problem with -enable-kvm? Because, On 08/01/13 18:49, Borislav Petkov wrote: so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: You said "on kvm", and provided a diff. I think (hope) I understand the environment you've denoted with "after", but what's your "before"? The absence of -enable-kvm, or something else? Ah, I see. So 'before' is the initial dump of the EFI regions, very early during boot: [snip] and with 'after' I've denoted the dump of the EFI regions a second time, a bit later, when we enter efi_enter_virtual_mode(): [snip] during the *same* boot. So, it is one boot but two dumps of the EFI regions. And yes, I'm booting with the 'kvm' executable which has '-enable-kvm'. Okay. Thanks for clarifying it. What's the purpose of that series? Can you please provide a link (if you posted versions of it already)? Not yet posted but working on it. The idea is to map the runtime regions at stable addresses so that when we kexec a kernel, it can use runtime services too. And we have to do that because of the braindead design of SetVirtualAddressMap() being callable only once per boot. I wouldn't call the design of SetVirtualAddressMap() braindead. I'd rather call kexec unique and somewhat unexpected :) So what basically happens is the end boundary of the region becomes the start, practically turning it into a 0-size one. ... and you guys suspect that some firmware code is responsible, code that runs between the initial memory map dump, and efi_enter_virtual_mode(): https://lkml.org/lkml/2013/7/31/550 I wouldn't wonder if we f*cked it up again like the last time. I'll give it a long hard look.
Ah sorry, by "and you guys suspect" I didn't mean to imply anything between the lines, I was simply trying to ascertain your working idea :) Laszlo
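The idea Boris outlines (stable runtime mappings that a kexec'd kernel can reproduce without calling SetVirtualAddressMap() a second time) can be sketched as a deterministic top-down allocation. This is not his patchset, which had not been posted at this point; the window top EFI_VA_TOP and the helper are hypothetical, and real code would also have to handle alignment and large-page constraints.

```python
# Hypothetical top of the EFI virtual window, chosen only for illustration.
EFI_VA_TOP = 0xFFFF_FFFF_0000_0000
PAGE_SIZE = 4096
EFI_MEMORY_RUNTIME = 1 << 63


def assign_stable_vas(regions):
    """Map runtime regions top-down from a fixed address.

    `regions` is a list of (phys_start, num_pages, attr) tuples in
    memory-map order. The result depends only on the map contents, so
    a kexec'd kernel that sees the same map recomputes the same virtual
    addresses instead of needing a second SetVirtualAddressMap() call.
    """
    va = EFI_VA_TOP
    mapping = {}
    for phys, pages, attr in regions:
        if not attr & EFI_MEMORY_RUNTIME:
            continue  # only runtime regions need a virtual mapping here
        va -= pages * PAGE_SIZE
        mapping[phys] = va
    return mapping


rt = EFI_MEMORY_RUNTIME | 0xF
# mem17, mem18 (21 pages each) and mem12 (conventional) from this thread.
regions = [(0x7E668000, 21, rt), (0x7E67D000, 21, rt), (0x7E0CC000, 1, 0xF)]
first = assign_stable_vas(regions)
second = assign_stable_vas(regions)  # e.g. recomputed by a kexec'd kernel
print(first == second)  # True
```

The key design choice is that nothing in the computation depends on boot-time state other than the memory map itself.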
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote: I wouldn't call the design of SetVirtualAddressMap() braindead. Ok, I've always wondered and you could probably shed some light on the matter: why is SetVirtualAddressMap() a call-once only? Why can't I simply call it again and update the mappings? I'd rather call kexec unique and somewhat unexpected :) In all fairness, it was there before UEFI, AFAICT. I wouldn't wonder if we f*cked it up again like the last time. I'll give it a long hard look. Ah sorry, by "and you guys suspect" I didn't mean to imply anything between the lines, I was simply trying to ascertain your working idea :) As long as we get to the bottom of this, we're all fine. And I'd pretty much expect everyone who is dealing with EFI to have grown a sufficiently thick skin before starting to do so, so don't worry. :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: [edk2] Corrupted EFI region
On 08/05/13 16:40, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote: I wouldn't call the design of SetVirtualAddressMap() braindead. Ok, I've always wondered and you could probably shed some light on the matter: why is SetVirtualAddressMap() a call-once only? Why can't I simply call it again and update the mappings? The current implementation (how pointers are converted) probably doesn't accommodate a second call. Of course you want to know why SetVirtualAddressMap() was designed like that... I didn't participate in the design so I don't know :) But, as I said, a kernel directly executing another kernel is an unexpected idea. IMHO the second kernel in question doesn't fit the UEFI phases at all. The OS booted like that (ie. the OS whose kernel is the 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS. SetVirtualAddressMap() is a firmware interface, but the kexec OS (including its private boot loader and kernel) is not loaded by firmware. I'd rather call kexec unique and somewhat unexpected :) In all fairness, it was there before UEFI, AFAICT. That doesn't matter as long as the UEFI designers aren't aware of it :) (Who should have made whom aware, ie. Linux people approaching UEFI people, or UEFI people exploring Linux, is a separate topic. As always I'm apolitical about UEFI; I'm not arguing for it or against it. My feeble efforts for improving OVMF and interfacing code are motivated by my employer, not my world view, but as a side-effect of working with the code I can't help but notice some nice things in edk2 and appreciate them :)) I wouldn't wonder if we f*cked it up again like the last time. I'll give it a long hard look. Ah sorry, by "and you guys suspect" I didn't mean to imply anything between the lines, I was simply trying to ascertain your working idea :) As long as we get to the bottom of this, we're all fine.
And I'd pretty much expect everyone who is dealing with EFI to have grown a sufficiently thick skin before starting to do so, so don't worry. :-) This is a unique opportunity for me to point out the following. (Unique because it wasn't me bringing up the thick skin thing :)) My skin is *very thin*. It's not even there, you could say. So, if I mess up, please don't insult me. (As explained before, my own language above wasn't even tongue-in-cheek.) Insult my code or my analysis pls. BTW there's another point I'd like to ask about -- you're saying you see the region corruption during the same boot, from the first (early) memmap dump to the second one (when just about to enter virtual mode). But, is this one boot the very first boot, or the kexec one? Thanks! Laszlo
Re: [edk2] Corrupted EFI region
On Mon, 2013-08-05 at 17:15 +0200, Laszlo Ersek wrote: On 08/05/13 16:40, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote: I wouldn't call the design of SetVirtualAddressMap() braindead. Ok, I've always wondered and you could probably shed some light on the matter: why is SetVirtualAddressMap() a call-once only? Why can't I simply call it again and update the mappings? The current implementation (how pointers are converted) probably doesn't accommodate a second call. Having actually looked at the code (trying to find why we were getting an unconverted pointer), I second that. However, the ugliness of the massive pointer chase should have been an indication that something was not quite right architecturally (or implementation wise) with SetVirtualAddressMap(). Of course you want to know why SetVirtualAddressMap() was designed like that... I didn't participate in the design so I don't know :) But, as I said, a kernel directly executing another kernel is an unexpected idea. IMHO the second kernel in question doesn't fit the UEFI phases at all. The OS booted like that (ie. the OS whose kernel is the 2nd (=kexec) kernel) never goes through SEC, PEI, DXE, BDS. That thinking is a bit last century (not that I'm blaming you for it, it seems to be ingrained in the way UEFI sometimes goes about things) ... in the old days, DOS was bootstrapped by the 512 byte jump code in a well known sector. In the current century, almost every OS is bootstrapped by a sophisticated loader, which is effectively another OS (if you don't believe this, try looking at the grub source code one day); it's a short step from this to one OS booting another, and that's really what kexec is. The utility of kexec has proven itself over the past couple of decades or so by allowing us to dump (kexec to a dump kernel), short circuit the boot process (simply re-kexec the kernel on crash) and now do rebootless upgrades (checkpoint the userspace and kexec to the new kernel). 
It's not even unique to Linux: Solaris used a hidden kexec system call to do live upgrades as well and I believe several other UNIXs have this feature. James
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote: I didn't realize the timestamps survive kexec. (As far as I remember the kernels I played with kexec on didn't have the automatic timestamps yet in dmesg, but I might have messed up just as well...) No, no, no, kexec is not involved at all. Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered efi_enter_virtual_mode, the region has changed from [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) to [0.023004] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) And yes, I still need to audit whether the kernel actually does that change. I'm still looking... early console in decompress_kernel Decompressing Linux... Parsing ELF... done. Booting the kernel. [0.000000] Initializing cgroup subsys cpu [0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013 [0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 [0.000000] e820: BIOS-provided physical RAM map: [0.000000] BIOS-e820: [mem 0x00000000-0x0009ffff] usable [0.000000] BIOS-e820: [mem 0x00100000-0x7e667fff] usable [0.000000] BIOS-e820: [mem 0x7e668000-0x7e691fff] reserved [0.000000] BIOS-e820: [mem 0x7e692000-0x7fb11fff] usable [0.000000] BIOS-e820: [mem 0x7fb12000-0x7fb69fff] reserved [0.000000] BIOS-e820: [mem 0x7fb6a000-0x7fb71fff] ACPI data [0.000000] BIOS-e820: [mem 0x7fb72000-0x7fb75fff] ACPI NVS [0.000000] BIOS-e820: [mem 0x7fb76000-0x7ffdffff] usable [0.000000] BIOS-e820: [mem 0x7ffe0000-0x7fffffff] reserved [0.000000] debug: ignoring loglevel setting.
[0.000000] bootconsole [earlyser0] enabled [0.000000] NX (Execute Disable) protection: active [0.000000] efi: EFI v2.31 by EDK II [0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014 [0.000000] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.000000] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.000000] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.000000] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.000000] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.000000] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.000000] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.000000] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.000000] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.000000] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.000000] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) [0.000000] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) [0.000000] efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) [0.000000] efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB) [0.000000] efi: mem15: type=4, attr=0xf, range=[0x7e59c000-0x7e5a0000) (0MB) [0.000000] efi: mem16: type=3, attr=0xf, range=[0x7e5a0000-0x7e668000) (0MB) [0.000000] efi: mem17: type=5, attr=0x800000000000000f, range=[0x7e668000-0x7e67d000) (0MB) [0.000000] efi: mem18: type=6, attr=0x800000000000000f, range=[0x7e67d000-0x7e692000) (0MB) [0.000000] efi: mem19: type=4, attr=0xf, range=[0x7e692000-0x7f992000) (19MB) [0.000000] efi: mem20: type=7, attr=0xf, range=[0x7f992000-0x7f994000) (0MB) [0.000000] efi: mem21: type=3, attr=0xf, range=[0x7f994000-0x7fb12000) (1MB) [0.000000] efi: mem22: type=5, attr=0x800000000000000f, range=[0x7fb12000-0x7fb42000) (0MB) [0.000000] efi: mem23: type=6, attr=0x800000000000000f, range=[0x7fb42000-0x7fb66000) (0MB) [0.000000] efi: mem24: type=0, attr=0xf, range=[0x7fb66000-0x7fb6a000) (0MB) [0.000000] efi: mem25: type=9, attr=0xf,
Re: [edk2] Corrupted EFI region
On Aug 5, 2013, at 7:40 AM, Borislav Petkov b...@alien8.de wrote: On Mon, Aug 05, 2013 at 04:27:44PM +0200, Laszlo Ersek wrote: I wouldn't call the design of SetVirtualAddressMap() braindead. Ok, I've always wondered and you could probably shed some light on the matter: why is SetVirtualAddressMap() a call-once only? Why can't I simply call it again and update the mappings? I'd rather call kexec unique and somewhat unexpected :) In all fairness, it was there before UEFI, AFAICT. AFAICT EFI pre-dates kexec merge into mainline by a number of years as SetVirtualAddressMap() was part of EFI 1.0 (previous millennium) The EFI to UEFI conversion was placing EFI 1.10 into an industry standard, UEFI 2.0. UEFI is an industry standard so someone just needs to make a proposal to update the spec. The edk2 open source project is not part of the standards body so complaining on this mailing list is not going to get anything changed. The conversion of C code to run from address A to address B is a non-trivial operation, and a single conversion is bad enough. The infrastructure code required to do the conversion from physical to virtual addressing currently only runs from physical mode, so a call to change virtual address mappings from virtual mode is more complex than the current scheme. In general you don't want complexity in the locked NOR FLASH of the platform that can only be updated by the platform vendor. Even if the platform firmware is easy to update you want to have complexity in the OS as it is easier to change and easier to get right. Thanks, Andrew Fish I wouldn't wonder if we f*cked it up again like the last time. I'll give it a long hard look. Ah sorry, by "and you guys suspect" I didn't mean to imply anything between the lines, I was simply trying to ascertain your working idea :) As long as we get to the bottom of this, we're all fine.
And I'd pretty much expect everyone who is dealing with EFI to have grown a sufficiently thick skin before starting to do so, so don't worry. :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. edk2-devel mailing list edk2-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/edk2-devel
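Laszlo's and Andrew's point about pointer conversion can be illustrated with a toy model. The sketch below is a simplification, not edk2's implementation: it locates each pointer by its physical address, so once the pointers have been rewritten to virtual addresses, a second pass can no longer find them. The class, its method names and the virtual address used are hypothetical.

```python
PAGE_SIZE = 4096


class OneShotFirmware:
    """Toy model of pointer conversion in SetVirtualAddressMap().

    vmap entries are (phys_start, num_pages, virt_start) tuples. Pointers
    are located by their *physical* address, which only works while
    they still hold physical addresses.
    """

    def __init__(self):
        self.virtual_mode = False

    @staticmethod
    def convert_pointer(ptr, vmap):
        for phys, pages, virt in vmap:
            if phys <= ptr < phys + pages * PAGE_SIZE:
                return virt + (ptr - phys)
        raise ValueError(f"0x{ptr:x} is not inside any physical range")

    def set_virtual_address_map(self, pointers, vmap):
        if self.virtual_mode:
            raise RuntimeError("already in virtual address mode")
        converted = [self.convert_pointer(p, vmap) for p in pointers]
        self.virtual_mode = True
        return converted


# One runtime region (21 pages at 0x7e668000) with a hypothetical virtual base.
vmap = [(0x7E668000, 21, 0xFFFFFFFEFFFEB000)]
fw = OneShotFirmware()
ptrs = fw.set_virtual_address_map([0x7E668010], vmap)
print(hex(ptrs[0]))  # 0xfffffffefffeb010

# Even without the guard, a second pass cannot find the already-virtual pointer.
try:
    OneShotFirmware.convert_pointer(ptrs[0], vmap)
except ValueError as err:
    print(err)
```

Supporting a second call would mean tracking every converted pointer's old virtual address as well, which is the extra firmware complexity Andrew argues against.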
RE: [edk2] Corrupted EFI region
Boris, A memory map entry with zero size does not look right to me. The memory map passed into SetVirtualAddressMap() must contain the exact same set of memory map entries that existed when ExitBootServices() was called with a return result of EFI_SUCCESS. When you are showing comparisons of memory maps, are you showing the ExitBootServices() one and the SetVirtualAddressMap() one? If the memory maps are not identical, then somehow the memory map is being modified, and we need to figure that out. If the ExitBootServices() memory map has the zero-sized entry, then we need to see how GetMemoryMap() is returning a zero-sized entry. It is not clear that a zero-sized entry would actually break anything, but it is a good idea to root-cause that issue and make sure those types of memory map entries are not passed from the FW to the OS. Thanks, Mike -----Original Message----- From: Borislav Petkov [mailto:b...@alien8.de] Sent: Monday, August 05, 2013 9:48 AM To: Laszlo Ersek Cc: linux-efi@vger.kernel.org; Gleb Natapov; edk2-de...@lists.sourceforge.net; lkml; David Woodhouse Subject: Re: [edk2] Corrupted EFI region On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote: I didn't realize the timestamps survive kexec. (As far as I remember the kernels I played with kexec on didn't have the automatic timestamps yet in dmesg, but I might have messed up just as well...) No, no, no, kexec is not involved at all. Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered efi_enter_virtual_mode, the region has changed from [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) to [0.023004] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB) And yes, I still need to audit whether the kernel actually does that change. I'm still looking...
early console in decompress_kernel Decompressing Linux... Parsing ELF... done. Booting the kernel. [0.000000] Initializing cgroup subsys cpu [0.000000] Linux version 3.10.0-rc7+ (boris@nazgul) (gcc version 4.7.3 (Debian 4.7.3-4) ) #9 SMP PREEMPT Mon Aug 5 16:27:00 CEST 2013 [0.000000] Command line: root=/dev/sda1 debug ignore_loglevel log_buf_len=10M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 [0.000000] e820: BIOS-provided physical RAM map: [0.000000] BIOS-e820: [mem 0x00000000-0x0009ffff] usable [0.000000] BIOS-e820: [mem 0x00100000-0x7e667fff] usable [0.000000] BIOS-e820: [mem 0x7e668000-0x7e691fff] reserved [0.000000] BIOS-e820: [mem 0x7e692000-0x7fb11fff] usable [0.000000] BIOS-e820: [mem 0x7fb12000-0x7fb69fff] reserved [0.000000] BIOS-e820: [mem 0x7fb6a000-0x7fb71fff] ACPI data [0.000000] BIOS-e820: [mem 0x7fb72000-0x7fb75fff] ACPI NVS [0.000000] BIOS-e820: [mem 0x7fb76000-0x7ffdffff] usable [0.000000] BIOS-e820: [mem 0x7ffe0000-0x7fffffff] reserved [0.000000] debug: ignoring loglevel setting.
[0.000000] bootconsole [earlyser0] enabled [0.000000] NX (Execute Disable) protection: active [0.000000] efi: EFI v2.31 by EDK II [0.000000] efi: ACPI=0x7fb71000 ACPI 2.0=0x7fb71014 [0.000000] efi: mem00: type=7, attr=0xf, range=[0x00000000-0x0009f000) (0MB) [0.000000] efi: mem01: type=2, attr=0xf, range=[0x0009f000-0x000a0000) (0MB) [0.000000] efi: mem02: type=7, attr=0xf, range=[0x00100000-0x00800000) (7MB) [0.000000] efi: mem03: type=4, attr=0xf, range=[0x00800000-0x01000000) (8MB) [0.000000] efi: mem04: type=7, attr=0xf, range=[0x01000000-0x02000000) (16MB) [0.000000] efi: mem05: type=2, attr=0xf, range=[0x02000000-0x036e3000) (22MB) [0.000000] efi: mem06: type=7, attr=0xf, range=[0x036e3000-0x3fffb000) (969MB) [0.000000] efi: mem07: type=2, attr=0xf, range=[0x3fffb000-0x40000000) (0MB) [0.000000] efi: mem08: type=7, attr=0xf, range=[0x40000000-0x7c000000) (960MB) [0.000000] efi: mem09: type=4, attr=0xf, range=[0x7c000000-0x7c020000) (0MB) [0.000000] efi: mem10: type=7, attr=0xf, range=[0x7c020000-0x7e0ad000) (32MB) [0.000000] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB) [0.000000] efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB) [0.000000] efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB) [0.000000] efi: mem14: type=3, attr=0xf, range
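Mike's check can be written down as a comparison of the two maps. A minimal sketch, assuming a simplified descriptor (type, physical start, page count, attribute) with VirtualStart left out because the OS fills it in; MemDesc and check_virtual_map are hypothetical names, and the rule is modelled on Mike's description rather than quoted from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemDesc:
    """Simplified EFI memory descriptor; VirtualStart is left out."""
    mem_type: int
    phys_start: int
    num_pages: int
    attr: int


def check_virtual_map(exit_bs_map, virt_map):
    """List problems with the map handed to SetVirtualAddressMap().

    exit_bs_map is the map returned when ExitBootServices() succeeded;
    virt_map is the one the OS later passes in. Zero-page entries are
    reported per map so it is clear whether firmware or OS produced them.
    """
    problems = []
    if len(exit_bs_map) != len(virt_map):
        problems.append("entry count differs")
    for i, (old, new) in enumerate(zip(exit_bs_map, virt_map)):
        if old != new:
            problems.append(f"entry {i} modified")
    for label, mmap in (("ExitBootServices()", exit_bs_map),
                        ("SetVirtualAddressMap()", virt_map)):
        for i, desc in enumerate(mmap):
            if desc.num_pages == 0:
                problems.append(f"entry {i} zero-sized in {label} map")
    return problems


# mem11 from this thread: 0x1f pages at 0x7e0ad000 that later shrink to zero.
exit_map = [MemDesc(4, 0x7E0AD000, 0x1F, 0xF)]
virt_map = [MemDesc(4, 0x7E0AD000, 0, 0xF)]
for problem in check_virtual_map(exit_map, virt_map):
    print(problem)
```

Here the zero-sized entry shows up only in the second map, which points at something modifying the map after ExitBootServices() rather than at GetMemoryMap() itself.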
Re: [edk2] Corrupted EFI region
On 08/05/13 18:47, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 06:41:20PM +0200, Laszlo Ersek wrote: I didn't realize the timestamps survive kexec. (As far as I remember the kernels I played with kexec on didn't have the automatic timestamps yet in dmesg, but I might have messed up just as well...) No, no, no, kexec is not involved at all. I understand. I just explained why I could not derive that fact from the timestamps. You said, No, kexec is not even involved yet. If you look at the timestamps, there's 0.005 seconds between the two dumps during the *same* kernel booting on the machine, baremetal, straight from grub. There are four memmap dumps: (1) first boot, initial dump, (2) first boot, dump when entering virtual mode, (3) kexec boot, initial dump, (4) kexec boot, dump when entering virtual mode. I was aware that we were discussing a problem either between (1) and (2), *or* between (3) and (4); I just didn't know inside which pair. I misunderstood your reply and thought that you were implying the (1)+(2) pair by the low absolute timestamps. I assumed that (3)+(4) would print low timestamps as well (due to the time offset starting from zero in the kexec kernel too) and took your message as a correction to that idea. But, you didn't say anything about the magnitude of the timestamps, only about the differences between them. Sorry for the noise, it's clear now that we're looking at (1)-(2). Thanks Laszlo
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 08:50:17AM -0700, Andrew Fish wrote: AFAICT EFI pre-dates kexec merge into mainline by a number of years as SetVirtualAddressMap() was part of EFI 1.0 (previous millennium) Ok, fair enough. The EFI to UEFI conversion was placing EFI 1.10 into an industry standard, UEFI 2.0. UEFI is an industry standard so someone just needs to make a proposal to update the spec. The edk2 open source project is not part of the standards body so complaining on this mailing list is not going to get anything changed. Right, I don't think that even changing the spec would help - it would actually make things worse because then we'd have to differentiate between UEFI versions: those which can do SetVirtualAddressMap() more than once and the older ones. So let's drop the discussion here - it is what it is, it is too late to change anything. At least we talked about it. :-) Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: [edk2] Corrupted EFI region
On 08/05/13 18:47, Borislav Petkov wrote: Here's the whole dmesg up until efi_enter_virtual_mode. When we have entered efi_enter_virtual_mode, the region has changed from
[0.00] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB)
to
[0.023004] efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB)
And yes, I still need to audit whether the kernel actually does that change. I'm still looking...

The following is a long shot, but I have no better idea for now. Normally the following sequence of calls is made to UEFI services:
(a) GetMemoryMap() -- returns the memory map and the map key,
(b) ExitBootServices() -- takes the map key,
(c) SetVirtualAddressMap() -- takes the memory map (completed with virtual addresses).
((a)+(b) can be repeated if (b) fails, and Linux seems to retry once.)

Now see Linux commit http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=916f676f by Matthew. If I understand correctly, it introduces the function efi_reserve_boot_services(). Normally, immediately after a successful (b) -- ExitBootServices() -- one should be allowed to free boot services code and data. However, (c) itself -- SetVirtualAddressMap() -- seems to depend on boot services code and data in some firmware implementations (probably violating the spec). Therefore this commit keeps boot services code and data around long enough for SetVirtualAddressMap(), and releases them afterwards. I *think* efi_reserve_boot_services() runs between (b) and (c), that is, after the initial EFI memmap dump and before efi_enter_virtual_mode() does its thing (i.e. before your debug memmap dump is executed there):

efi_main() [arch/x86/boot/compressed/eboot.c]
  exit_boot() -- covers (a) and (b)
start_kernel() [init/main.c]
  setup_arch() [arch/x86/kernel/setup.c]
    efi_memblock_x86_reserve_range() [arch/x86/platform/efi/efi.c]
    efi_reserve_boot_services() [arch/x86/platform/efi/efi.c]
  efi_enter_virtual_mode() [arch/x86/platform/efi/efi.c] -- covers (c)

That is, efi_reserve_boot_services() is called in a place where it can potentially alter the EFI memmap between the two dumps. (I only display efi_memblock_x86_reserve_range() in the call stack above for completeness; I'll refer back to it further down.)

Now look at Linux commit http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7d68dc3f. This commit changes efi_reserve_boot_services() so that it reserves the boot services code and data only under some circumstances. If those don't hold, then:

md->num_pages = 0;

I think this is exactly why the region is truncated to zero size. (memmap.phys_map is set to the EFI memory map in efi_memblock_x86_reserve_range(), see the partial call stack above, and memmap.map is pointed at memmap.phys_map in efi_memmap_init(). efi_reserve_boot_services() iterates over memmap.map, so we can say it modifies the EFI memory map.)

Granted, memblock_dbg() is also called when num_pages is reset, and the message it prints is not included in your dmesg. However, I think that could be explained by memblock_debug==0 [include/linux/memblock.h]. What happens if you pass memblock=debug on the kernel command line (see early_memblock() in mm/memblock.c)? (I just tried it in my Fedora 19 guest, and it did produce the message
[0.00] efi: Could not reserve boot range [0x80-0xff]
)

BTW, regarding Michael's answer, I think this is just one of several ways in which Linux manipulates the EFI memmap between (b) and (c). For example, it seems to merge ranges in the map.
Thanks, Laszlo
Re: [edk2] Corrupted EFI region
On 08/05/2013 11:12 AM, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 08:50:17AM -0700, Andrew Fish wrote: AFAICT EFI pre-dates kexec merge into mainline by a number of years as SetVirtualAddressMap() was part of EFI 1.0 (previous millennium) Ok, fair enough. The EFI to UEFI conversion was placing EFI 1.10 into an industry standard, UEFI 2.0. UEFI is an industry standard so someone just needs to make a proposal to update the spec. The edk2 open source project is not part of the standards body so complaining on this mailing list is not going to get anything changed. Right, I don't think that even changing the spec would help - it would actually make things worse because then we'd have to differentiate between UEFI versions: those which can do SetVirtualAddressMap() more than once and the older ones. So let's drop the discussion here - it is what it is, it is too late to change anything. At least we talked about it. :-) All of this would be a non-problem if there weren't buggy implementations which can't run *without* SetVirtualAddressMap(). -=hpa
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote: All of this would be a non-problem if there weren't buggy implementations which can't run *without* SetVirtualAddressMap(). Oh, you mean, if we were to call the runtime services through their physical addresses? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: [edk2] Corrupted EFI region
On 08/05/2013 02:41 PM, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote: All of this would be a non-problem if there weren't buggy implementations which can't run *without* SetVirtualAddressMap(). Oh, you mean, if we were to call the runtime services through their physical addresses? Yes. It is supposed to work, but at least on some Apple machines it triggers bugs. -hpa
Re: [edk2] Corrupted EFI region
On 08/05/13 23:41, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote: All of this would be a non-problem if there weren't buggy implementations which can't run *without* SetVirtualAddressMap(). Oh, you mean, if we were to call the runtime services through their physical addresses? I heard that there was a (U)EFI firmware implementation that didn't even implement SetVirtualAddressMap(). It was okay because the main OS for that platform didn't want to call it; it thunked to physical mode for each runtime service call. (This is not hearsay; I'm omitting the specifics because I'm not sure if I'm allowed to give any. I've heard about this stuff from a direct colleague who used to work on these systems.) Laszlo
Re: [edk2] Corrupted EFI region
On Mon, Aug 05, 2013 at 11:26:46PM +0200, Laszlo Ersek wrote: What happens if you pass memblock=debug on the kernel command line (see early_memblock() in mm/memblock.c)? (I just tried it in my Fedora 19 guest, and it in fact produced the message
[0.00] efi: Could not reserve boot range [0x80-0xff]

Note to self: Always look for bugs in Linux' UEFI code first, before going anywhere else! Yes, very good analysis and good job Laszlo! I'll write what I see now but will double-check it tomorrow because I'm almost half asleep.

[0.00] efi: efi_reserve_boot_services: - start: 0x7e0ad000, size: 0x1f000
[0.00] efi: Could not reserve boot range [0x007e0ad000-0x007e0cbfff]

And yes, this fails because memblock_is_region_reserved(start, size) returns true. And why is that:

[0.00] memblock_reserve: [0x00036be000-0x00036c3000] setup_arch+0x60e/0xa63
[0.00] MEMBLOCK configuration:
[0.00]  memory size = 0x7fef1000 reserved size = 0x1724570
[0.00]  memory.cnt = 0x4
[0.00]  memory[0x0] [0x001000-0x09], 0x9f000 bytes
[0.00]  memory[0x1] [0x10-0x007e667fff], 0x7e568000 bytes
[0.00]  memory[0x2] [0x007e692000-0x007fb11fff], 0x148 bytes
[0.00]  memory[0x3] [0x007fb76000-0x007ffd], 0x46a000 bytes
[0.00]  reserved.cnt = 0x3
[0.00]  reserved[0x0] [0x09f000-0x0f], 0x61000 bytes
[0.00]  reserved[0x1] [0x000200-0x00036c2fff], 0x16c3000 bytes
[0.00]  reserved[0x2] [0x007e0ad018-0x007e0ad587], 0x570 bytes
                      ^

There are 0x570 bytes right in this region which are memblock-reserved, and so we truncate it in efi_reserve_boot_services(). This makes me say words which will offend this list so I'll instead go out on the balcony and wake up the neighbors. :-) Ok, thanks again for finding it, I'll go and try to figure out the whole mess tomorrow. Good night!

BTW, regarding Michael's answer, I think this is just one of several ways in which Linux manipulates the EFI memmap between (b) and (c). For example it seems to merge ranges in the map. Yes, it does so in efi_enter_virtual_mode(). That was my initial suspicion; that's why I dumped the regions before the merging. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: [edk2] Corrupted EFI region
On Mon, 2013-08-05 at 23:55 +0200, Laszlo Ersek wrote: On 08/05/13 23:41, Borislav Petkov wrote: On Mon, Aug 05, 2013 at 02:37:08PM -0700, H. Peter Anvin wrote: All of this would be a non-problem if there weren't buggy implementations which can't run *without* SetVirtualAddressMap(). Oh, you mean, if we were to call the runtime services through their physical addresses? I heard that there was a (U)EFI firmware implementation that didn't even implement SetVirtualAddressMap(). It was okay because the main OS for that platform didn't want to call it; it thunked to physical mode for each runtime service call. (This is not hearsay; I'm omitting the specifics because I'm not sure if I'm allowed to give any. I've heard about this stuff from a direct colleague who used to work on these systems.) That's actually the way all non-x86 unix systems operate. If you look in the firmware mechanisms for almost every non-x86 system in the Linux kernel architecture directories, they do this if they have to access firmware from Linux (we do it a lot on parisc to get the IODC to give us the device inventory, for instance). I strongly suspect the origin of this weirdness is that once upon a time Windows didn't run with a separated address space and so needed a way of accessing firmware in the same address space, hence the pointer relocation trick, but even Windows hasn't needed this for a while. James
Corrupted EFI region
Hi guys, so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff:

--- before 2013-07-31 22:20:52.316039492 +0200
+++ after 2013-07-31 22:21:30.960731706 +0200
@@ -9,7 +9,7 @@ efi: mem07: type=2, attr=0xf, range=[0x0
 efi: mem08: type=7, attr=0xf, range=[0x4000-0x7c00) (960MB)
 efi: mem09: type=4, attr=0xf, range=[0x7c00-0x7c02) (0MB)
 efi: mem10: type=7, attr=0xf, range=[0x7c02-0x7e0ad000) (32MB)
-efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB)
+efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB)
 efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB)
 efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB)
 efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB)

That second boundary of region mem11 suddenly changes *before* we merge the regions. edk2 bug? Whole dmesg attached. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: Corrupted EFI region
On Wed, Jul 31, 2013 at 10:54:31PM +0200, Borislav Petkov wrote:
 efi: mem08: type=7, attr=0xf, range=[0x4000-0x7c00) (960MB)
 efi: mem09: type=4, attr=0xf, range=[0x7c00-0x7c02) (0MB)
 efi: mem10: type=7, attr=0xf, range=[0x7c02-0x7e0ad000) (32MB)
-efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0cc000) (0MB)
+efi: mem11: type=4, attr=0xf, range=[0x7e0ad000-0x7e0ad000) (0MB)
 efi: mem12: type=7, attr=0xf, range=[0x7e0cc000-0x7e0cd000) (0MB)
 efi: mem13: type=4, attr=0xf, range=[0x7e0cd000-0x7e55d000) (4MB)
 efi: mem14: type=3, attr=0xf, range=[0x7e55d000-0x7e59c000) (0MB)
Are we making any EFI calls in between? I certainly wouldn't expect the memory map to change after ExitBootServices, but up until that point the firmware's free to mess with it. -- Matthew Garrett | mj...@srcf.ucam.org
Re: Corrupted EFI region
On Wed, Jul 31, 2013 at 11:51:30PM +0200, Borislav Petkov wrote: But the problem is, something messes up the upper boundary of the region and it is an EFI_BOOT_SERVICES_DATA region which we need for the runtime services mapping and if we can't map it properly, we're probably going to miss functionality or not have runtime at all. Easiest way around this would probably be to stash the address map after ExitBootServices() and compare it at SetVirtualAddressMap() time, then take the widest boundaries and trim the e820 map to match. This is obviously dependent upon the system not allocating anything further after that, but it seems safest. The worst case is finding the firmware writing over bits of the kernel. -- Matthew Garrett | mj...@srcf.ucam.org
Re: Corrupted EFI region
On Wed, 2013-07-31 at 22:54 +0200, Borislav Petkov wrote: so I'm seeing this funny thing where an EFI region changes when we enter efi_enter_virtual_mode when booting with edk2 on kvm. Here's the diff: Perhaps the edk2-de...@lists.sourceforge.net list should be in Cc? -- dwmw2