[PATCH] x86/setup: Extend low identity map to cover whole kernel range
From: Paolo Bonzini On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 00ff IDT= fffbb000 07ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=0690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x0163 0x1163 0x2163 0x3163 (gdb) x/4wx 0x3399000 0x3399000: 0x0043 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini Reported-by: Laszlo Ersek Cc: Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Andy Lutomirski Signed-off-by: Matt Fleming [ Updated changelog. ] --- arch/x86/kernel/setup.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index fdb7f2a2d328..a3cccbfc5f77 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1173,6 +1173,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* +* sync back low identity map too. It is used for example +* in the 32-bit EFI stub. +*/ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); #endif tboot_probe(); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On October 14, 2015 2:39:58 PM PDT, Andy Lutomirski wrote: >On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming > wrote: >> On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: >>> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming > wrote: >>> > (Pulling in luto for low-level x86 fu) >>> > >>> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >>> >> On 32-bit systems, the initial_page_table is reused by >>> >> efi_call_phys_prolog as an identity map to call >>> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >>> >> converting the current CPU's GDT to a physical address too. >>> >> >>> >> For PAE kernels the identity mapping is achieved by aliasing the >>> >> first PDPE for the kernel memory mapping into the first PDPE >>> >> of initial_page_table. This makes the EFI stub's trick "just >work". >>> >> >>> >> However, for non-PAE kernels there is no guarantee that the >identity >>> >> mapping in the initial_page_table extends as far as the GDT; in >this >>> >> case, accesses to the GDT will cause a page fault (which quickly >becomes >>> >> a triple fault). Fix this by copying the kernel mappings from >>> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET >and at >>> >> identity mapping. >>> > >>> > Oops, good catch guys. This is clearly a bug, but... >>> > >>> >> For some reason, this is only reproducible with QEMU's dynamic >translation >>> >> mode, and not for example with KVM. However, even under KVM one >can clearly >>> >> see that the page table is bogus: >>> >>> I haven't looked at the code, but it wouldn't surprise me if this is >>> some kind of TLB issue. With the hardware TLB (which is in use on >>> KVM), it seems quite likely that the GDT is pretty much always in >the >>> TLB and, if nothing flushes global mappings, then it'll probably >stick >>> around. >> >> From some quick experiments it appears that you can skate past this >> issue if you don't receive any interrupts while the bogus GDT pointer >> is loaded, or if you avoid reloading the segment registers in >general. >> Which is interesting because I assumed that writing to GDTR took >> immediate effect. > >Trivia for your amusement: > >AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to >point to unmapped memory. Any attempt to use them (segment loads, >interrupts, IRET, etc) will try to access that memory as if the access >came from CPL 0 and, if the access fails, will generate a valid page >fault with CR2 pointing into the GDT or LDT. > >Xen is nuts^Wclever and actually uses this. > >Of course, if your #PF vector references a GDT or LDT descriptor and >trying to load that descriptor results in a page fault, you get a >double fault. > >I learned this while trying to puzzle out why v1 of my LDT >synchronization patch caused random faults on Xen. > >--Andy There is no "if"... you can't get to an interrupt vector without going through the GDT or LDT. That being said, the GDT or LDT can be partially mapped. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On Wed, 14 Oct, at 02:39:58PM, Andy Lutomirski wrote: > > Trivia for your amusement: > > AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to > point to unmapped memory. Any attempt to use them (segment loads, > interrupts, IRET, etc) will try to access that memory as if the access > came from CPL 0 and, if the access fails, will generate a valid page > fault with CR2 pointing into the GDT or LDT. > > Xen is nuts^Wclever and actually uses this. > > Of course, if your #PF vector references a GDT or LDT descriptor and > trying to load that descriptor results in a page fault, you get a > double fault. > > I learned this while trying to puzzle out why v1 of my LDT > synchronization patch caused random faults on Xen. Ha, interesting! Thanks Andy, it's good to get confirmation. OK, I think we understand this issue well enough to call this fixed. I've queued the following patch up in my urgent tree and I'll send a pull request to the tip folks tomorrow. --- >From 72ac9f242fad7c3bf3e06461e34c9a546d8cb411 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 14 Oct 2015 13:30:45 +0200 Subject: [PATCH] x86/setup: Extend low identity map to cover whole kernel range On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 00ff IDT= fffbb000 07ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=0690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x0163 0x1163 0x2163 0x3163 (gdb) x/4wx 0x3399000 0x3399000: 0x0043 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini Reported-by: Laszlo Ersek Cc: Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Andy Lutomirski Signed-off-by: Matt Fleming [ Updated changelog. ] --- arch/x86/kernel/setup.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index fdb7f2a2d328..a3cccbfc5f77 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1173,6 +1173,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* +* sync back low identity map too. It is used for example +* in the 32-bit EFI stub. +*/ + clone_pgd_range(initial_page_table, +
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming wrote: > On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: >> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming >> wrote: >> > (Pulling in luto for low-level x86 fu) >> > >> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >> >> On 32-bit systems, the initial_page_table is reused by >> >> efi_call_phys_prolog as an identity map to call >> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >> >> converting the current CPU's GDT to a physical address too. >> >> >> >> For PAE kernels the identity mapping is achieved by aliasing the >> >> first PDPE for the kernel memory mapping into the first PDPE >> >> of initial_page_table. This makes the EFI stub's trick "just work". >> >> >> >> However, for non-PAE kernels there is no guarantee that the identity >> >> mapping in the initial_page_table extends as far as the GDT; in this >> >> case, accesses to the GDT will cause a page fault (which quickly becomes >> >> a triple fault). Fix this by copying the kernel mappings from >> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> >> identity mapping. >> > >> > Oops, good catch guys. This is clearly a bug, but... >> > >> >> For some reason, this is only reproducible with QEMU's dynamic translation >> >> mode, and not for example with KVM. However, even under KVM one can >> >> clearly >> >> see that the page table is bogus: >> >> I haven't looked at the code, but it wouldn't surprise me if this is >> some kind of TLB issue. With the hardware TLB (which is in use on >> KVM), it seems quite likely that the GDT is pretty much always in the >> TLB and, if nothing flushes global mappings, then it'll probably stick >> around. > > From some quick experiments it appears that you can skate past this > issue if you don't receive any interrupts while the bogus GDT pointer > is loaded, or if you avoid reloading the segment registers in general. > Which is interesting because I assumed that writing to GDTR took > immediate effect. Trivia for your amusement: AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT. Xen is nuts^Wclever and actually uses this. Of course, if your #PF vector references a GDT or LDT descriptor and trying to load that descriptor results in a page fault, you get a double fault. I learned this while trying to puzzle out why v1 of my LDT synchronization patch caused random faults on Xen. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On Wed, 14 Oct, at 04:29:33PM, Paolo Bonzini wrote: > > On 14/10/2015 15:52, Matt Fleming wrote: > >> > However, for non-PAE kernels there is no guarantee that the identity > >> > mapping in the initial_page_table extends as far as the GDT; in this > >> > case, accesses to the GDT will cause a page fault (which quickly becomes > >> > a triple fault). Fix this by copying the kernel mappings from > >> > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > >> > identity mapping. > > > > Oops, good catch guys. This is clearly a bug, but... > > > > ... I'm a little surprised you managed to trigger this at all, because > > the GDT we load in efi_call_phys_prolog() is part of the per-cpu data > > section and therefore part of the kernel image. > > Only until setup_percpu, which is earlier than SetVirtualAddressMap. > For example, I get: > > setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:1 nr_node_ids:1 > PERCPU: Embedded 18 pages/cpu @c728e000 s41800 r0 d31928 u73728 > ^^^ > but the kernel image ends at 0x037f. > > The GDT is 0xc728e000 in this run, so the GDT is at the beginning of the > relocated percpu area. Ah, good point. I completely missed that the percpu sections get relocated. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote: > On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming > wrote: > > (Pulling in luto for low-level x86 fu) > > > > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: > >> On 32-bit systems, the initial_page_table is reused by > >> efi_call_phys_prolog as an identity map to call > >> SetVirtualAddressMap. efi_call_phys_prolog takes care of > >> converting the current CPU's GDT to a physical address too. > >> > >> For PAE kernels the identity mapping is achieved by aliasing the > >> first PDPE for the kernel memory mapping into the first PDPE > >> of initial_page_table. This makes the EFI stub's trick "just work". > >> > >> However, for non-PAE kernels there is no guarantee that the identity > >> mapping in the initial_page_table extends as far as the GDT; in this > >> case, accesses to the GDT will cause a page fault (which quickly becomes > >> a triple fault). Fix this by copying the kernel mappings from > >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > >> identity mapping. > > > > Oops, good catch guys. This is clearly a bug, but... > > > >> For some reason, this is only reproducible with QEMU's dynamic translation > >> mode, and not for example with KVM. However, even under KVM one can > >> clearly > >> see that the page table is bogus: > > I haven't looked at the code, but it wouldn't surprise me if this is > some kind of TLB issue. With the hardware TLB (which is in use on > KVM), it seems quite likely that the GDT is pretty much always in the > TLB and, if nothing flushes global mappings, then it'll probably stick > around. >From some quick experiments it appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Which is interesting because I assumed that writing to GDTR took immediate effect. Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. That does explain why no one has complained about this issue before. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming wrote: > (Pulling in luto for low-level x86 fu) > > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: >> On 32-bit systems, the initial_page_table is reused by >> efi_call_phys_prolog as an identity map to call >> SetVirtualAddressMap. efi_call_phys_prolog takes care of >> converting the current CPU's GDT to a physical address too. >> >> For PAE kernels the identity mapping is achieved by aliasing the >> first PDPE for the kernel memory mapping into the first PDPE >> of initial_page_table. This makes the EFI stub's trick "just work". >> >> However, for non-PAE kernels there is no guarantee that the identity >> mapping in the initial_page_table extends as far as the GDT; in this >> case, accesses to the GDT will cause a page fault (which quickly becomes >> a triple fault). Fix this by copying the kernel mappings from >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> identity mapping. > > Oops, good catch guys. This is clearly a bug, but... > >> For some reason, this is only reproducible with QEMU's dynamic translation >> mode, and not for example with KVM. However, even under KVM one can clearly >> see that the page table is bogus: I haven't looked at the code, but it wouldn't surprise me if this is some kind of TLB issue. With the hardware TLB (which is in use on KVM), it seems quite likely that the GDT is pretty much always in the TLB and, if nothing flushes global mappings, then it'll probably stick around. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
On 14/10/2015 15:52, Matt Fleming wrote: >> > However, for non-PAE kernels there is no guarantee that the identity >> > mapping in the initial_page_table extends as far as the GDT; in this >> > case, accesses to the GDT will cause a page fault (which quickly becomes >> > a triple fault). Fix this by copying the kernel mappings from >> > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at >> > identity mapping. > > Oops, good catch guys. This is clearly a bug, but... > > ... I'm a little surprised you managed to trigger this at all, because > the GDT we load in efi_call_phys_prolog() is part of the per-cpu data > section and therefore part of the kernel image. Only until setup_percpu, which is earlier than SetVirtualAddressMap. For example, I get: setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:1 nr_node_ids:1 PERCPU: Embedded 18 pages/cpu @c728e000 s41800 r0 d31928 u73728 ^^^ but the kernel image ends at 0x037f. The GDT is 0xc728e000 in this run, so the GDT is at the beginning of the relocated percpu area. In the above run, the FS base that switch_to_new_gdt loads is 0x551C000. You have 0x728E000 - 0x551C000 = 0x1D72000, and from tracing I see that one of the GDT values that is loaded very early is exactly 0xC1D72000. That _is_ inside the kernel image of course when you remove PAGE_OFFSET. Paolo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range
(Pulling in luto for low-level x86 fu) On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote: > On 32-bit systems, the initial_page_table is reused by > efi_call_phys_prolog as an identity map to call > SetVirtualAddressMap. efi_call_phys_prolog takes care of > converting the current CPU's GDT to a physical address too. > > For PAE kernels the identity mapping is achieved by aliasing the > first PDPE for the kernel memory mapping into the first PDPE > of initial_page_table. This makes the EFI stub's trick "just work". > > However, for non-PAE kernels there is no guarantee that the identity > mapping in the initial_page_table extends as far as the GDT; in this > case, accesses to the GDT will cause a page fault (which quickly becomes > a triple fault). Fix this by copying the kernel mappings from > swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at > identity mapping. Oops, good catch guys. This is clearly a bug, but... > For some reason, this is only reproducible with QEMU's dynamic translation > mode, and not for example with KVM. However, even under KVM one can clearly > see that the page table is bogus: > > $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize > $ gdb > (gdb) target remote localhost:1234 > (gdb) hb *0x02858f6f > Hardware assisted breakpoint 1 at 0x2858f6f > (gdb) c > Continuing. > > Breakpoint 1, 0x02858f6f in ?? () > (gdb) monitor info registers > ... > GDT= 0724e000 00ff > IDT= fffbb000 07ff > CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=0690 > ... > > The page directory is sane: > > (gdb) x/4wx 0x32b7000 > 0x32b7000:0x03398063 0x03399063 0x0339a063 > 0x0339b063 > (gdb) x/4wx 0x3398000 > 0x3398000:0x0163 0x1163 0x2163 > 0x3163 > (gdb) x/4wx 0x3399000 > 0x3399000:0x0043 0x00401003 0x00402003 > 0x00403003 > > but our particular page directory entry is empty: > > (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 > 0x32b7070:0x ... I'm a little surprised you managed to trigger this at all, because the GDT we load in efi_call_phys_prolog() is part of the per-cpu data section and therefore part of the kernel image. The kernel image *is* mapped in the identity range, even for non-PAE kernels. So yes, you're right this is a bug but I'm not sure how you're actually triggering it. > Reported-by: Laszlo Ersek > Cc: sta...@vger.kernel.org > Signed-off-by: Paolo Bonzini > --- > arch/x86/kernel/setup.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 80f874bf999e..24154bd12307 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p) > clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, > swapper_pg_dir + KERNEL_PGD_BOUNDARY, > KERNEL_PGD_PTRS); > + > + /* > + * sync back low identity map too. It is used for example > + * in the 32-bit EFI stub. > + */ > + clone_pgd_range(initial_page_table, > + swapper_pg_dir + KERNEL_PGD_BOUNDARY, > + KERNEL_PGD_PTRS); > #endif > > tboot_probe(); > -- > 2.5.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-efi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: setup: extend low identity map to cover whole kernel range
On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 00ff IDT= fffbb000 07ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=0690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x0163 0x1163 0x2163 0x3163 (gdb) x/4wx 0x3399000 0x3399000: 0x0043 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x Reported-by: Laszlo Ersek Cc: sta...@vger.kernel.org Signed-off-by: Paolo Bonzini --- arch/x86/kernel/setup.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 80f874bf999e..24154bd12307 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p) clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); + + /* +* sync back low identity map too. It is used for example +* in the 32-bit EFI stub. +*/ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); #endif tboot_probe(); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/