Re: [GIT PULL] x86/mm changes for v4.4
On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvel wrote: > On 9 November 2015 at 22:08, Kees Cook wrote: >> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel >> wrote: >>> On 8 November 2015 at 07:58, Kees Cook wrote: On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel wrote: > On 7 November 2015 at 08:09, Ingo Molnar wrote: >> >> * Matt Fleming wrote: >> >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>> > >>> > 3) We should fix the EFI permission problem without relying on the >>> > firmware: it >>> > appears we could just mark everything R-X optimistically, and if >>> > a write fault >>> > happens (it's pretty rare in fact, only triggers when we write to >>> > an EFI >>> > variable and so), we can mark the faulting page RW- on the fly, >>> > because it >>> > appears that writable EFI sections, while not enumerated very >>> > well in 'old' >>> > firmware, are still supposed to be page granular. (Even 'new' >>> > firmware I >>> > wouldn't automatically trust to get the enumeration right...) >>> >>> Sorry, this isn't true. I misled you with one of my earlier posts on >>> this topic. Let me try and clear things up... >>> >>> Writing to EFI regions has to do with every invocation of the EFI >>> runtime services - it's not limited to when you read/write/delete EFI >>> variables. In fact, EFI variables really have nothing to do with this >>> discussion, they're a completely opaque concept to the OS, we have no >>> idea how the firmware implements them. Everything is done via the EFI >>> boot/runtime services. >>> >>> The firmware itself will attempt to write to EFI regions when we >>> invoke the EFI services because that's where the PE/COFF ".data" and >>> ".bss" sections live along with the heap. There's even some relocation >>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>> ".text" too. >>> >>> Now, the above PE/COFF sections are usually (always?) contained within >>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>> because the firmware folks have told us so, and because stopping that >>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>> V2.5. >>> >>> The data sections within the region are also *not* guaranteed to be >>> page granular because work was required in Tianocore for emitting >>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>> support. >>> >>> Ultimately, what this means is that if you were to attempt to >>> dynamically fixup those regions that required write permission, you'd >>> have to modify the mappings for the majority of the EFI regions >>> anyway. And if you're blindly allowing write permission as a fixup, >>> there's not much security to be had. >> >> I think you misunderstood my suggestion: the 'fixup' would be changing >> it from R-X >> to RW-, i.e. it would add 'write' permission but remove 'execute' >> permission. >> >> Note that there would be no 'RWX' permission at any given moment - which >> is the >> dangerous combination. >> > > The problem with that is that /any/ page in the UEFI runtime region > may intersect with both .text and .data of any of the PE/COFF images > that make up the runtime firmware (since the PE/COFF sections are not > necessarily page aligned). Such pages require RWX permissions. The > UEFI memory map does not provide the information to identify those > pages a priori (the entire region containing several PE/COFF images > could be covered by a single entry) so it is hard to guess which pages > should be allowed these RWX permissions. I'm sad that UEFI was designed without even the most basic of memory protections in mind. UEFI _itself_ should be setting up protective page mappings. :( >>> >>> Well, the 4 KB alignment of sections was considered prohibitive at the >>> time from code size pov. But this was a long time ago, obviously. >> >> Heh, yeah, I'd expect max 4K padding to get code/data correctly >> aligned on a 2MB binary to not be an issue. :) >> > > This is not about section sizes on ARM. The PE/COFF format does not > use segments, like ELF, so the payload (the sections) needs to be > completely disjoint from the header. This means, when using 4 KB > alignment, that every PE/COFF image wastes ~4 KB in the header and 4 > KB on average in the section padding (assuming a .text/.data/.reloc > layout, as is common with PE/COFF) > > Considering that a typical UEFI firmware image consists of numerous > (around 50 on average, I think) PE/COFF images, and some of them Oooh, that's no fun. So the linker can't produce merged .text and .data sections? > execute from NOR flash, the Tianocore tooling (which is the reference >
Re: [GIT PULL] x86/mm changes for v4.4
On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvelwrote: > On 9 November 2015 at 22:08, Kees Cook wrote: >> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel >> wrote: >>> On 8 November 2015 at 07:58, Kees Cook wrote: On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel wrote: > On 7 November 2015 at 08:09, Ingo Molnar wrote: >> >> * Matt Fleming wrote: >> >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>> > >>> > 3) We should fix the EFI permission problem without relying on the >>> > firmware: it >>> > appears we could just mark everything R-X optimistically, and if >>> > a write fault >>> > happens (it's pretty rare in fact, only triggers when we write to >>> > an EFI >>> > variable and so), we can mark the faulting page RW- on the fly, >>> > because it >>> > appears that writable EFI sections, while not enumerated very >>> > well in 'old' >>> > firmware, are still supposed to be page granular. (Even 'new' >>> > firmware I >>> > wouldn't automatically trust to get the enumeration right...) >>> >>> Sorry, this isn't true. I misled you with one of my earlier posts on >>> this topic. Let me try and clear things up... >>> >>> Writing to EFI regions has to do with every invocation of the EFI >>> runtime services - it's not limited to when you read/write/delete EFI >>> variables. In fact, EFI variables really have nothing to do with this >>> discussion, they're a completely opaque concept to the OS, we have no >>> idea how the firmware implements them. Everything is done via the EFI >>> boot/runtime services. >>> >>> The firmware itself will attempt to write to EFI regions when we >>> invoke the EFI services because that's where the PE/COFF ".data" and >>> ".bss" sections live along with the heap. There's even some relocation >>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>> ".text" too. >>> >>> Now, the above PE/COFF sections are usually (always?) contained within >>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>> because the firmware folks have told us so, and because stopping that >>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>> V2.5. >>> >>> The data sections within the region are also *not* guaranteed to be >>> page granular because work was required in Tianocore for emitting >>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>> support. >>> >>> Ultimately, what this means is that if you were to attempt to >>> dynamically fixup those regions that required write permission, you'd >>> have to modify the mappings for the majority of the EFI regions >>> anyway. And if you're blindly allowing write permission as a fixup, >>> there's not much security to be had. >> >> I think you misunderstood my suggestion: the 'fixup' would be changing >> it from R-X >> to RW-, i.e. it would add 'write' permission but remove 'execute' >> permission. >> >> Note that there would be no 'RWX' permission at any given moment - which >> is the >> dangerous combination. >> > > The problem with that is that /any/ page in the UEFI runtime region > may intersect with both .text and .data of any of the PE/COFF images > that make up the runtime firmware (since the PE/COFF sections are not > necessarily page aligned). Such pages require RWX permissions. The > UEFI memory map does not provide the information to identify those > pages a priori (the entire region containing several PE/COFF images > could be covered by a single entry) so it is hard to guess which pages > should be allowed these RWX permissions. I'm sad that UEFI was designed without even the most basic of memory protections in mind. UEFI _itself_ should be setting up protective page mappings. :( >>> >>> Well, the 4 KB alignment of sections was considered prohibitive at the >>> time from code size pov. But this was a long time ago, obviously. >> >> Heh, yeah, I'd expect max 4K padding to get code/data correctly >> aligned on a 2MB binary to not be an issue. :) >> > > This is not about section sizes on ARM. The PE/COFF format does not > use segments, like ELF, so the payload (the sections) needs to be > completely disjoint from the header. This means, when using 4 KB > alignment, that every PE/COFF image wastes ~4 KB in the header and 4 > KB on average in the section padding (assuming a .text/.data/.reloc > layout, as is common with PE/COFF) > > Considering that a typical UEFI firmware image consists of numerous > (around 50 on average, I think) PE/COFF images, and
Re: [GIT PULL] x86/mm changes for v4.4
On 9 November 2015 at 22:08, Kees Cook wrote: > On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel > wrote: >> On 8 November 2015 at 07:58, Kees Cook wrote: >>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >>> wrote: On 7 November 2015 at 08:09, Ingo Molnar wrote: > > * Matt Fleming wrote: > >> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >> > >> > 3) We should fix the EFI permission problem without relying on the >> > firmware: it >> > appears we could just mark everything R-X optimistically, and if a >> > write fault >> > happens (it's pretty rare in fact, only triggers when we write to >> > an EFI >> > variable and so), we can mark the faulting page RW- on the fly, >> > because it >> > appears that writable EFI sections, while not enumerated very well >> > in 'old' >> > firmware, are still supposed to be page granular. (Even 'new' >> > firmware I >> > wouldn't automatically trust to get the enumeration right...) >> >> Sorry, this isn't true. I misled you with one of my earlier posts on >> this topic. Let me try and clear things up... >> >> Writing to EFI regions has to do with every invocation of the EFI >> runtime services - it's not limited to when you read/write/delete EFI >> variables. In fact, EFI variables really have nothing to do with this >> discussion, they're a completely opaque concept to the OS, we have no >> idea how the firmware implements them. Everything is done via the EFI >> boot/runtime services. >> >> The firmware itself will attempt to write to EFI regions when we >> invoke the EFI services because that's where the PE/COFF ".data" and >> ".bss" sections live along with the heap. There's even some relocation >> fixups that occur as SetVirtualAddressMap() time so it'll write to >> ".text" too. >> >> Now, the above PE/COFF sections are usually (always?) contained within >> EFI regions of type EfiRuntimeServicesCode. We know this is true >> because the firmware folks have told us so, and because stopping that >> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >> V2.5. >> >> The data sections within the region are also *not* guaranteed to be >> page granular because work was required in Tianocore for emitting >> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >> support. >> >> Ultimately, what this means is that if you were to attempt to >> dynamically fixup those regions that required write permission, you'd >> have to modify the mappings for the majority of the EFI regions >> anyway. And if you're blindly allowing write permission as a fixup, >> there's not much security to be had. > > I think you misunderstood my suggestion: the 'fixup' would be changing it > from R-X > to RW-, i.e. it would add 'write' permission but remove 'execute' > permission. > > Note that there would be no 'RWX' permission at any given moment - which > is the > dangerous combination. > The problem with that is that /any/ page in the UEFI runtime region may intersect with both .text and .data of any of the PE/COFF images that make up the runtime firmware (since the PE/COFF sections are not necessarily page aligned). Such pages require RWX permissions. The UEFI memory map does not provide the information to identify those pages a priori (the entire region containing several PE/COFF images could be covered by a single entry) so it is hard to guess which pages should be allowed these RWX permissions. >>> >>> I'm sad that UEFI was designed without even the most basic of memory >>> protections in mind. UEFI _itself_ should be setting up protective >>> page mappings. :( >>> >> >> Well, the 4 KB alignment of sections was considered prohibitive at the >> time from code size pov. But this was a long time ago, obviously. > > Heh, yeah, I'd expect max 4K padding to get code/data correctly > aligned on a 2MB binary to not be an issue. :) > This is not about section sizes on ARM. The PE/COFF format does not use segments, like ELF, so the payload (the sections) needs to be completely disjoint from the header. This means, when using 4 KB alignment, that every PE/COFF image wastes ~4 KB in the header and 4 KB on average in the section padding (assuming a .text/.data/.reloc layout, as is common with PE/COFF) Considering that a typical UEFI firmware image consists of numerous (around 50 on average, I think) PE/COFF images, and some of them execute from NOR flash, the Tianocore tooling (which is the reference implementation) has always been geared towards keeping the alignment as small as possible, typically 32 bytes unless data objects need more. Since the UEFI runtime services are typically implemented by several of these PE/COFF images, and since the
Re: [GIT PULL] x86/mm changes for v4.4
On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel wrote: > On 8 November 2015 at 07:58, Kees Cook wrote: >> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >> wrote: >>> On 7 November 2015 at 08:09, Ingo Molnar wrote: * Matt Fleming wrote: > On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to > > an EFI > > variable and so), we can mark the faulting page RW- on the fly, > > because it > > appears that writable EFI sections, while not enumerated very well > > in 'old' > > firmware, are still supposed to be page granular. (Even 'new' > > firmware I > > wouldn't automatically trust to get the enumeration right...) > > Sorry, this isn't true. I misled you with one of my earlier posts on > this topic. Let me try and clear things up... > > Writing to EFI regions has to do with every invocation of the EFI > runtime services - it's not limited to when you read/write/delete EFI > variables. In fact, EFI variables really have nothing to do with this > discussion, they're a completely opaque concept to the OS, we have no > idea how the firmware implements them. Everything is done via the EFI > boot/runtime services. > > The firmware itself will attempt to write to EFI regions when we > invoke the EFI services because that's where the PE/COFF ".data" and > ".bss" sections live along with the heap. There's even some relocation > fixups that occur as SetVirtualAddressMap() time so it'll write to > ".text" too. > > Now, the above PE/COFF sections are usually (always?) contained within > EFI regions of type EfiRuntimeServicesCode. We know this is true > because the firmware folks have told us so, and because stopping that > is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI > V2.5. > > The data sections within the region are also *not* guaranteed to be > page granular because work was required in Tianocore for emitting > sections with 4k alignment as part of the EFI_PROPERTIES_TABLE > support. > > Ultimately, what this means is that if you were to attempt to > dynamically fixup those regions that required write permission, you'd > have to modify the mappings for the majority of the EFI regions > anyway. And if you're blindly allowing write permission as a fixup, > there's not much security to be had. I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X to RW-, i.e. it would add 'write' permission but remove 'execute' permission. Note that there would be no 'RWX' permission at any given moment - which is the dangerous combination. >>> >>> The problem with that is that /any/ page in the UEFI runtime region >>> may intersect with both .text and .data of any of the PE/COFF images >>> that make up the runtime firmware (since the PE/COFF sections are not >>> necessarily page aligned). Such pages require RWX permissions. The >>> UEFI memory map does not provide the information to identify those >>> pages a priori (the entire region containing several PE/COFF images >>> could be covered by a single entry) so it is hard to guess which pages >>> should be allowed these RWX permissions. >> >> I'm sad that UEFI was designed without even the most basic of memory >> protections in mind. UEFI _itself_ should be setting up protective >> page mappings. :( >> > > Well, the 4 KB alignment of sections was considered prohibitive at the > time from code size pov. But this was a long time ago, obviously. Heh, yeah, I'd expect max 4K padding to get code/data correctly aligned on a 2MB binary to not be an issue. :) > >> For a boot firmware, it seems to me that safe page table layout would >> be a top priority bug. The "reporting issues" page for TianoCore >> doesn't actually seem to link to the "Project Tracker": >> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues >> >> Does anyone know how to get this correctly reported so future UEFI >> releases don't suffer from this? >> > > Ugh. Don't get me started on that topic. I have been working with the > UEFI forum since July to get a fundamentally broken implementation of > memory protections fixed. UEFI v2.5 defines a memory protection scheme > that is based on splitting PE/COFF images into separate memory regions > so that R-X and RW- permissions can be applied. Unfortunately, that > broke every OS in existence (including Windows 8), since the OS is > allowed to reorder memory regions when it lays out the virtual > remapping of the UEFI regions, resulting in PE/COFF .data and .text > potentially
Re: [GIT PULL] x86/mm changes for v4.4
On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvelwrote: > On 8 November 2015 at 07:58, Kees Cook wrote: >> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >> wrote: >>> On 7 November 2015 at 08:09, Ingo Molnar wrote: * Matt Fleming wrote: > On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to > > an EFI > > variable and so), we can mark the faulting page RW- on the fly, > > because it > > appears that writable EFI sections, while not enumerated very well > > in 'old' > > firmware, are still supposed to be page granular. (Even 'new' > > firmware I > > wouldn't automatically trust to get the enumeration right...) > > Sorry, this isn't true. I misled you with one of my earlier posts on > this topic. Let me try and clear things up... > > Writing to EFI regions has to do with every invocation of the EFI > runtime services - it's not limited to when you read/write/delete EFI > variables. In fact, EFI variables really have nothing to do with this > discussion, they're a completely opaque concept to the OS, we have no > idea how the firmware implements them. Everything is done via the EFI > boot/runtime services. > > The firmware itself will attempt to write to EFI regions when we > invoke the EFI services because that's where the PE/COFF ".data" and > ".bss" sections live along with the heap. There's even some relocation > fixups that occur as SetVirtualAddressMap() time so it'll write to > ".text" too. > > Now, the above PE/COFF sections are usually (always?) contained within > EFI regions of type EfiRuntimeServicesCode. We know this is true > because the firmware folks have told us so, and because stopping that > is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI > V2.5. > > The data sections within the region are also *not* guaranteed to be > page granular because work was required in Tianocore for emitting > sections with 4k alignment as part of the EFI_PROPERTIES_TABLE > support. > > Ultimately, what this means is that if you were to attempt to > dynamically fixup those regions that required write permission, you'd > have to modify the mappings for the majority of the EFI regions > anyway. And if you're blindly allowing write permission as a fixup, > there's not much security to be had. I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X to RW-, i.e. it would add 'write' permission but remove 'execute' permission. Note that there would be no 'RWX' permission at any given moment - which is the dangerous combination. >>> >>> The problem with that is that /any/ page in the UEFI runtime region >>> may intersect with both .text and .data of any of the PE/COFF images >>> that make up the runtime firmware (since the PE/COFF sections are not >>> necessarily page aligned). Such pages require RWX permissions. The >>> UEFI memory map does not provide the information to identify those >>> pages a priori (the entire region containing several PE/COFF images >>> could be covered by a single entry) so it is hard to guess which pages >>> should be allowed these RWX permissions. >> >> I'm sad that UEFI was designed without even the most basic of memory >> protections in mind. UEFI _itself_ should be setting up protective >> page mappings. :( >> > > Well, the 4 KB alignment of sections was considered prohibitive at the > time from code size pov. But this was a long time ago, obviously. Heh, yeah, I'd expect max 4K padding to get code/data correctly aligned on a 2MB binary to not be an issue. :) > >> For a boot firmware, it seems to me that safe page table layout would >> be a top priority bug. The "reporting issues" page for TianoCore >> doesn't actually seem to link to the "Project Tracker": >> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues >> >> Does anyone know how to get this correctly reported so future UEFI >> releases don't suffer from this? >> > > Ugh. Don't get me started on that topic. I have been working with the > UEFI forum since July to get a fundamentally broken implementation of > memory protections fixed. UEFI v2.5 defines a memory protection scheme > that is based on splitting PE/COFF images into separate memory regions > so that R-X and RW- permissions can be applied. Unfortunately, that > broke every OS in existence (including Windows 8), since the OS is > allowed to reorder memory
Re: [GIT PULL] x86/mm changes for v4.4
On 9 November 2015 at 22:08, Kees Cookwrote: > On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel > wrote: >> On 8 November 2015 at 07:58, Kees Cook wrote: >>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >>> wrote: On 7 November 2015 at 08:09, Ingo Molnar wrote: > > * Matt Fleming wrote: > >> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >> > >> > 3) We should fix the EFI permission problem without relying on the >> > firmware: it >> > appears we could just mark everything R-X optimistically, and if a >> > write fault >> > happens (it's pretty rare in fact, only triggers when we write to >> > an EFI >> > variable and so), we can mark the faulting page RW- on the fly, >> > because it >> > appears that writable EFI sections, while not enumerated very well >> > in 'old' >> > firmware, are still supposed to be page granular. (Even 'new' >> > firmware I >> > wouldn't automatically trust to get the enumeration right...) >> >> Sorry, this isn't true. I misled you with one of my earlier posts on >> this topic. Let me try and clear things up... >> >> Writing to EFI regions has to do with every invocation of the EFI >> runtime services - it's not limited to when you read/write/delete EFI >> variables. In fact, EFI variables really have nothing to do with this >> discussion, they're a completely opaque concept to the OS, we have no >> idea how the firmware implements them. Everything is done via the EFI >> boot/runtime services. >> >> The firmware itself will attempt to write to EFI regions when we >> invoke the EFI services because that's where the PE/COFF ".data" and >> ".bss" sections live along with the heap. There's even some relocation >> fixups that occur as SetVirtualAddressMap() time so it'll write to >> ".text" too. >> >> Now, the above PE/COFF sections are usually (always?) contained within >> EFI regions of type EfiRuntimeServicesCode. We know this is true >> because the firmware folks have told us so, and because stopping that >> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >> V2.5. >> >> The data sections within the region are also *not* guaranteed to be >> page granular because work was required in Tianocore for emitting >> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >> support. >> >> Ultimately, what this means is that if you were to attempt to >> dynamically fixup those regions that required write permission, you'd >> have to modify the mappings for the majority of the EFI regions >> anyway. And if you're blindly allowing write permission as a fixup, >> there's not much security to be had. > > I think you misunderstood my suggestion: the 'fixup' would be changing it > from R-X > to RW-, i.e. it would add 'write' permission but remove 'execute' > permission. > > Note that there would be no 'RWX' permission at any given moment - which > is the > dangerous combination. > The problem with that is that /any/ page in the UEFI runtime region may intersect with both .text and .data of any of the PE/COFF images that make up the runtime firmware (since the PE/COFF sections are not necessarily page aligned). Such pages require RWX permissions. The UEFI memory map does not provide the information to identify those pages a priori (the entire region containing several PE/COFF images could be covered by a single entry) so it is hard to guess which pages should be allowed these RWX permissions. >>> >>> I'm sad that UEFI was designed without even the most basic of memory >>> protections in mind. UEFI _itself_ should be setting up protective >>> page mappings. :( >>> >> >> Well, the 4 KB alignment of sections was considered prohibitive at the >> time from code size pov. But this was a long time ago, obviously. > > Heh, yeah, I'd expect max 4K padding to get code/data correctly > aligned on a 2MB binary to not be an issue. :) > This is not about section sizes on ARM. The PE/COFF format does not use segments, like ELF, so the payload (the sections) needs to be completely disjoint from the header. This means, when using 4 KB alignment, that every PE/COFF image wastes ~4 KB in the header and 4 KB on average in the section padding (assuming a .text/.data/.reloc layout, as is common with PE/COFF) Considering that a typical UEFI firmware image consists of numerous (around 50 on average, I think) PE/COFF images, and some of them execute from NOR flash, the Tianocore tooling (which is the reference implementation) has always been geared towards keeping the alignment as small as possible, typically 32 bytes
Re: [GIT PULL] x86/mm changes for v4.4
On 8 November 2015 at 07:58, Kees Cook wrote: > On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel > wrote: >> On 7 November 2015 at 08:09, Ingo Molnar wrote: >>> >>> * Matt Fleming wrote: >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > 3) We should fix the EFI permission problem without relying on the > firmware: it > appears we could just mark everything R-X optimistically, and if a > write fault > happens (it's pretty rare in fact, only triggers when we write to an > EFI > variable and so), we can mark the faulting page RW- on the fly, > because it > appears that writable EFI sections, while not enumerated very well > in 'old' > firmware, are still supposed to be page granular. (Even 'new' > firmware I > wouldn't automatically trust to get the enumeration right...) Sorry, this isn't true. I misled you with one of my earlier posts on this topic. Let me try and clear things up... Writing to EFI regions has to do with every invocation of the EFI runtime services - it's not limited to when you read/write/delete EFI variables. In fact, EFI variables really have nothing to do with this discussion, they're a completely opaque concept to the OS, we have no idea how the firmware implements them. Everything is done via the EFI boot/runtime services. The firmware itself will attempt to write to EFI regions when we invoke the EFI services because that's where the PE/COFF ".data" and ".bss" sections live along with the heap. There's even some relocation fixups that occur as SetVirtualAddressMap() time so it'll write to ".text" too. Now, the above PE/COFF sections are usually (always?) contained within EFI regions of type EfiRuntimeServicesCode. We know this is true because the firmware folks have told us so, and because stopping that is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI V2.5. The data sections within the region are also *not* guaranteed to be page granular because work was required in Tianocore for emitting sections with 4k alignment as part of the EFI_PROPERTIES_TABLE support. Ultimately, what this means is that if you were to attempt to dynamically fixup those regions that required write permission, you'd have to modify the mappings for the majority of the EFI regions anyway. And if you're blindly allowing write permission as a fixup, there's not much security to be had. >>> >>> I think you misunderstood my suggestion: the 'fixup' would be changing it >>> from R-X >>> to RW-, i.e. it would add 'write' permission but remove 'execute' >>> permission. >>> >>> Note that there would be no 'RWX' permission at any given moment - which is >>> the >>> dangerous combination. >>> >> >> The problem with that is that /any/ page in the UEFI runtime region >> may intersect with both .text and .data of any of the PE/COFF images >> that make up the runtime firmware (since the PE/COFF sections are not >> necessarily page aligned). Such pages require RWX permissions. The >> UEFI memory map does not provide the information to identify those >> pages a priori (the entire region containing several PE/COFF images >> could be covered by a single entry) so it is hard to guess which pages >> should be allowed these RWX permissions. > > I'm sad that UEFI was designed without even the most basic of memory > protections in mind. UEFI _itself_ should be setting up protective > page mappings. :( > Well, the 4 KB alignment of sections was considered prohibitive at the time from code size pov. But this was a long time ago, obviously. > For a boot firmware, it seems to me that safe page table layout would > be a top priority bug. The "reporting issues" page for TianoCore > doesn't actually seem to link to the "Project Tracker": > https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues > > Does anyone know how to get this correctly reported so future UEFI > releases don't suffer from this? > Ugh. Don't get me started on that topic. I have been working with the UEFI forum since July to get a fundamentally broken implementation of memory protections fixed. UEFI v2.5 defines a memory protection scheme that is based on splitting PE/COFF images into separate memory regions so that R-X and RW- permissions can be applied. Unfortunately, that broke every OS in existence (including Windows 8), since the OS is allowed to reorder memory regions when it lays out the virtual remapping of the UEFI regions, resulting in PE/COFF .data and .text potentially appearing out of order. The good news is that we fixed it for the upcoming release (v2.6). I can't disclose any specifics, though :-( -- Ard. > If that 'supposed to be' turns out to be 'not true' (not unheard of > in > firmware land), then plan B would
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel wrote: > On 7 November 2015 at 08:09, Ingo Molnar wrote: >> >> * Matt Fleming wrote: >> >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>> > >>> > 3) We should fix the EFI permission problem without relying on the >>> > firmware: it >>> > appears we could just mark everything R-X optimistically, and if a >>> > write fault >>> > happens (it's pretty rare in fact, only triggers when we write to an >>> > EFI >>> > variable and so), we can mark the faulting page RW- on the fly, >>> > because it >>> > appears that writable EFI sections, while not enumerated very well in >>> > 'old' >>> > firmware, are still supposed to be page granular. (Even 'new' >>> > firmware I >>> > wouldn't automatically trust to get the enumeration right...) >>> >>> Sorry, this isn't true. I misled you with one of my earlier posts on >>> this topic. Let me try and clear things up... >>> >>> Writing to EFI regions has to do with every invocation of the EFI >>> runtime services - it's not limited to when you read/write/delete EFI >>> variables. In fact, EFI variables really have nothing to do with this >>> discussion, they're a completely opaque concept to the OS, we have no >>> idea how the firmware implements them. Everything is done via the EFI >>> boot/runtime services. >>> >>> The firmware itself will attempt to write to EFI regions when we >>> invoke the EFI services because that's where the PE/COFF ".data" and >>> ".bss" sections live along with the heap. There's even some relocation >>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>> ".text" too. >>> >>> Now, the above PE/COFF sections are usually (always?) contained within >>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>> because the firmware folks have told us so, and because stopping that >>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>> V2.5. >>> >>> The data sections within the region are also *not* guaranteed to be >>> page granular because work was required in Tianocore for emitting >>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>> support. >>> >>> Ultimately, what this means is that if you were to attempt to >>> dynamically fixup those regions that required write permission, you'd >>> have to modify the mappings for the majority of the EFI regions >>> anyway. And if you're blindly allowing write permission as a fixup, >>> there's not much security to be had. >> >> I think you misunderstood my suggestion: the 'fixup' would be changing it >> from R-X >> to RW-, i.e. it would add 'write' permission but remove 'execute' permission. >> >> Note that there would be no 'RWX' permission at any given moment - which is >> the >> dangerous combination. >> > > The problem with that is that /any/ page in the UEFI runtime region > may intersect with both .text and .data of any of the PE/COFF images > that make up the runtime firmware (since the PE/COFF sections are not > necessarily page aligned). Such pages require RWX permissions. The > UEFI memory map does not provide the information to identify those > pages a priori (the entire region containing several PE/COFF images > could be covered by a single entry) so it is hard to guess which pages > should be allowed these RWX permissions. I'm sad that UEFI was designed without even the most basic of memory protections in mind. UEFI _itself_ should be setting up protective page mappings. :( For a boot firmware, it seems to me that safe page table layout would be a top priority bug. The "reporting issues" page for TianoCore doesn't actually seem to link to the "Project Tracker": https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues Does anyone know how to get this correctly reported so future UEFI releases don't suffer from this? -Kees > >>> > If that 'supposed to be' turns out to be 'not true' (not unheard of in >>> > firmware land), then plan B would be to mark pages that generate >>> > write faults >>> > RWX as well, to not break functionality. (This 'mark it RWX' is not >>> > something >>> > that exploits would have easy access to, and we could also generate a >>> > warning >>> > [after the EFI call has finished] if it ever triggers.) >>> > >>> > Admittedly this approach might not be without its own complications, >>> > but it >>> > looks reasonably simple (I don't think we need per EFI call page >>> > tables, >>> > etc.), and does not assume much about the firmware being able to >>> > enumerate its >>> > permissions properly. Were we to merge EFI support today I'd have >>> > insisted on >>> > trying such an approach from day 1 on. >>> >>> We already have separate EFI page tables, though with the caveat that >>> we share some of swapper_pg_dir's PGD entries. The best solution would >>> be to stop sharing entries and isolate the EFI mappings from every >>> other page table structure, so
Re: [GIT PULL] x86/mm changes for v4.4
On Sat, 07 Nov, at 08:05:54AM, Ingo Molnar wrote: > > * Matt Fleming wrote: > > > On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > > > > > And if this turns out to be due to EFI wanting those permissions, what > > > should > > > we do? People have talked about running the EFI callbacks in their own > > > private > > > page table setup, which sounds like the right idea, but until that > > > actually > > > *happens* > > > > We have separate page tables today, for a few reasons, but mainly it's > > so that we can have an identity mapping of memory present in the > > region usually used by user processes - broken firmware still uses > > those identity mappings even after the kernel tells it they're > > invalid. > > > > Note that when I say "separate" I'm talking about trampoline_pgd[] > > which is also used by the x86 suspend/resume code. > > > > However, turns out that the issue with the current scheme is the fact > > that trampoline_pgd[] actually shares a couple of PGD entries with > > swapper_pg_dir as can be seen in setup_real_mode(), > > > > > > trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); > > trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; > > trampoline_pgd[511] = init_level4_pgt[511].pgd; > > > > > > So when we map the EFI regions in efi_map_regions() we're inserting > > them into swapper_pg_dir also, which is why you're seeing the > > warnings. > > > > If I remember correctly the rationale for using trampoline_pgd[] was > > that it already did what we wanted (provided the identity mapping) and > > would save us the overhead of maintaining more page tables for no good > > reason. Obviously this entire thread is a good reason. > > > > I suggest we stop using trampoline_pgd[] (since it has a good reason > > for sharing the kernel mapping PGD entries) and create our own so that > > we can isolate EFI completely. > > Ok. Could you please make this fix a priority for upcoming EFI changes? Yep, I'll get on it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Sat, 07 Nov, at 08:05:54AM, Ingo Molnar wrote: > > * Matt Flemingwrote: > > > On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > > > > > And if this turns out to be due to EFI wanting those permissions, what > > > should > > > we do? People have talked about running the EFI callbacks in their own > > > private > > > page table setup, which sounds like the right idea, but until that > > > actually > > > *happens* > > > > We have separate page tables today, for a few reasons, but mainly it's > > so that we can have an identity mapping of memory present in the > > region usually used by user processes - broken firmware still uses > > those identity mappings even after the kernel tells it they're > > invalid. > > > > Note that when I say "separate" I'm talking about trampoline_pgd[] > > which is also used by the x86 suspend/resume code. > > > > However, turns out that the issue with the current scheme is the fact > > that trampoline_pgd[] actually shares a couple of PGD entries with > > swapper_pg_dir as can be seen in setup_real_mode(), > > > > > > trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); > > trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; > > trampoline_pgd[511] = init_level4_pgt[511].pgd; > > > > > > So when we map the EFI regions in efi_map_regions() we're inserting > > them into swapper_pg_dir also, which is why you're seeing the > > warnings. > > > > If I remember correctly the rationale for using trampoline_pgd[] was > > that it already did what we wanted (provided the identity mapping) and > > would save us the overhead of maintaining more page tables for no good > > reason. Obviously this entire thread is a good reason. > > > > I suggest we stop using trampoline_pgd[] (since it has a good reason > > for sharing the kernel mapping PGD entries) and create our own so that > > we can isolate EFI completely. > > Ok. Could you please make this fix a priority for upcoming EFI changes? Yep, I'll get on it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvelwrote: > On 7 November 2015 at 08:09, Ingo Molnar wrote: >> >> * Matt Fleming wrote: >> >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>> > >>> > 3) We should fix the EFI permission problem without relying on the >>> > firmware: it >>> > appears we could just mark everything R-X optimistically, and if a >>> > write fault >>> > happens (it's pretty rare in fact, only triggers when we write to an >>> > EFI >>> > variable and so), we can mark the faulting page RW- on the fly, >>> > because it >>> > appears that writable EFI sections, while not enumerated very well in >>> > 'old' >>> > firmware, are still supposed to be page granular. (Even 'new' >>> > firmware I >>> > wouldn't automatically trust to get the enumeration right...) >>> >>> Sorry, this isn't true. I misled you with one of my earlier posts on >>> this topic. Let me try and clear things up... >>> >>> Writing to EFI regions has to do with every invocation of the EFI >>> runtime services - it's not limited to when you read/write/delete EFI >>> variables. In fact, EFI variables really have nothing to do with this >>> discussion, they're a completely opaque concept to the OS, we have no >>> idea how the firmware implements them. Everything is done via the EFI >>> boot/runtime services. >>> >>> The firmware itself will attempt to write to EFI regions when we >>> invoke the EFI services because that's where the PE/COFF ".data" and >>> ".bss" sections live along with the heap. There's even some relocation >>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>> ".text" too. >>> >>> Now, the above PE/COFF sections are usually (always?) contained within >>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>> because the firmware folks have told us so, and because stopping that >>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>> V2.5. >>> >>> The data sections within the region are also *not* guaranteed to be >>> page granular because work was required in Tianocore for emitting >>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>> support. >>> >>> Ultimately, what this means is that if you were to attempt to >>> dynamically fixup those regions that required write permission, you'd >>> have to modify the mappings for the majority of the EFI regions >>> anyway. And if you're blindly allowing write permission as a fixup, >>> there's not much security to be had. >> >> I think you misunderstood my suggestion: the 'fixup' would be changing it >> from R-X >> to RW-, i.e. it would add 'write' permission but remove 'execute' permission. >> >> Note that there would be no 'RWX' permission at any given moment - which is >> the >> dangerous combination. >> > > The problem with that is that /any/ page in the UEFI runtime region > may intersect with both .text and .data of any of the PE/COFF images > that make up the runtime firmware (since the PE/COFF sections are not > necessarily page aligned). Such pages require RWX permissions. The > UEFI memory map does not provide the information to identify those > pages a priori (the entire region containing several PE/COFF images > could be covered by a single entry) so it is hard to guess which pages > should be allowed these RWX permissions. I'm sad that UEFI was designed without even the most basic of memory protections in mind. UEFI _itself_ should be setting up protective page mappings. :( For a boot firmware, it seems to me that safe page table layout would be a top priority bug. The "reporting issues" page for TianoCore doesn't actually seem to link to the "Project Tracker": https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues Does anyone know how to get this correctly reported so future UEFI releases don't suffer from this? -Kees > >>> > If that 'supposed to be' turns out to be 'not true' (not unheard of in >>> > firmware land), then plan B would be to mark pages that generate >>> > write faults >>> > RWX as well, to not break functionality. (This 'mark it RWX' is not >>> > something >>> > that exploits would have easy access to, and we could also generate a >>> > warning >>> > [after the EFI call has finished] if it ever triggers.) >>> > >>> > Admittedly this approach might not be without its own complications, >>> > but it >>> > looks reasonably simple (I don't think we need per EFI call page >>> > tables, >>> > etc.), and does not assume much about the firmware being able to >>> > enumerate its >>> > permissions properly. Were we to merge EFI support today I'd have >>> > insisted on >>> > trying such an approach from day 1 on. >>> >>> We already have separate EFI page tables, though with the caveat that >>> we share some of swapper_pg_dir's PGD entries. The best solution would >>> be to stop sharing entries and
Re: [GIT PULL] x86/mm changes for v4.4
On 8 November 2015 at 07:58, Kees Cookwrote: > On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel > wrote: >> On 7 November 2015 at 08:09, Ingo Molnar wrote: >>> >>> * Matt Fleming wrote: >>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > 3) We should fix the EFI permission problem without relying on the > firmware: it > appears we could just mark everything R-X optimistically, and if a > write fault > happens (it's pretty rare in fact, only triggers when we write to an > EFI > variable and so), we can mark the faulting page RW- on the fly, > because it > appears that writable EFI sections, while not enumerated very well > in 'old' > firmware, are still supposed to be page granular. (Even 'new' > firmware I > wouldn't automatically trust to get the enumeration right...) Sorry, this isn't true. I misled you with one of my earlier posts on this topic. Let me try and clear things up... Writing to EFI regions has to do with every invocation of the EFI runtime services - it's not limited to when you read/write/delete EFI variables. In fact, EFI variables really have nothing to do with this discussion, they're a completely opaque concept to the OS, we have no idea how the firmware implements them. Everything is done via the EFI boot/runtime services. The firmware itself will attempt to write to EFI regions when we invoke the EFI services because that's where the PE/COFF ".data" and ".bss" sections live along with the heap. There's even some relocation fixups that occur as SetVirtualAddressMap() time so it'll write to ".text" too. Now, the above PE/COFF sections are usually (always?) contained within EFI regions of type EfiRuntimeServicesCode. We know this is true because the firmware folks have told us so, and because stopping that is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI V2.5. The data sections within the region are also *not* guaranteed to be page granular because work was required in Tianocore for emitting sections with 4k alignment as part of the EFI_PROPERTIES_TABLE support. Ultimately, what this means is that if you were to attempt to dynamically fixup those regions that required write permission, you'd have to modify the mappings for the majority of the EFI regions anyway. And if you're blindly allowing write permission as a fixup, there's not much security to be had. >>> >>> I think you misunderstood my suggestion: the 'fixup' would be changing it >>> from R-X >>> to RW-, i.e. it would add 'write' permission but remove 'execute' >>> permission. >>> >>> Note that there would be no 'RWX' permission at any given moment - which is >>> the >>> dangerous combination. >>> >> >> The problem with that is that /any/ page in the UEFI runtime region >> may intersect with both .text and .data of any of the PE/COFF images >> that make up the runtime firmware (since the PE/COFF sections are not >> necessarily page aligned). Such pages require RWX permissions. The >> UEFI memory map does not provide the information to identify those >> pages a priori (the entire region containing several PE/COFF images >> could be covered by a single entry) so it is hard to guess which pages >> should be allowed these RWX permissions. > > I'm sad that UEFI was designed without even the most basic of memory > protections in mind. UEFI _itself_ should be setting up protective > page mappings. :( > Well, the 4 KB alignment of sections was considered prohibitive at the time from code size pov. But this was a long time ago, obviously. > For a boot firmware, it seems to me that safe page table layout would > be a top priority bug. The "reporting issues" page for TianoCore > doesn't actually seem to link to the "Project Tracker": > https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues > > Does anyone know how to get this correctly reported so future UEFI > releases don't suffer from this? > Ugh. Don't get me started on that topic. I have been working with the UEFI forum since July to get a fundamentally broken implementation of memory protections fixed. UEFI v2.5 defines a memory protection scheme that is based on splitting PE/COFF images into separate memory regions so that R-X and RW- permissions can be applied. Unfortunately, that broke every OS in existence (including Windows 8), since the OS is allowed to reorder memory regions when it lays out the virtual remapping of the UEFI regions, resulting in PE/COFF .data and .text potentially appearing out of order. The good news is that we fixed it for the upcoming release (v2.6). I can't disclose any specifics, though :-( -- Ard. > If that 'supposed to be' turns
Re: [GIT PULL] x86/mm changes for v4.4
On 7 November 2015 at 08:09, Ingo Molnar wrote: > > * Matt Fleming wrote: > >> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >> > >> > 3) We should fix the EFI permission problem without relying on the >> > firmware: it >> > appears we could just mark everything R-X optimistically, and if a >> > write fault >> > happens (it's pretty rare in fact, only triggers when we write to an >> > EFI >> > variable and so), we can mark the faulting page RW- on the fly, >> > because it >> > appears that writable EFI sections, while not enumerated very well in >> > 'old' >> > firmware, are still supposed to be page granular. (Even 'new' firmware >> > I >> > wouldn't automatically trust to get the enumeration right...) >> >> Sorry, this isn't true. I misled you with one of my earlier posts on >> this topic. Let me try and clear things up... >> >> Writing to EFI regions has to do with every invocation of the EFI >> runtime services - it's not limited to when you read/write/delete EFI >> variables. In fact, EFI variables really have nothing to do with this >> discussion, they're a completely opaque concept to the OS, we have no >> idea how the firmware implements them. Everything is done via the EFI >> boot/runtime services. >> >> The firmware itself will attempt to write to EFI regions when we >> invoke the EFI services because that's where the PE/COFF ".data" and >> ".bss" sections live along with the heap. There's even some relocation >> fixups that occur as SetVirtualAddressMap() time so it'll write to >> ".text" too. >> >> Now, the above PE/COFF sections are usually (always?) contained within >> EFI regions of type EfiRuntimeServicesCode. We know this is true >> because the firmware folks have told us so, and because stopping that >> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >> V2.5. >> >> The data sections within the region are also *not* guaranteed to be >> page granular because work was required in Tianocore for emitting >> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >> support. >> >> Ultimately, what this means is that if you were to attempt to >> dynamically fixup those regions that required write permission, you'd >> have to modify the mappings for the majority of the EFI regions >> anyway. And if you're blindly allowing write permission as a fixup, >> there's not much security to be had. > > I think you misunderstood my suggestion: the 'fixup' would be changing it > from R-X > to RW-, i.e. it would add 'write' permission but remove 'execute' permission. > > Note that there would be no 'RWX' permission at any given moment - which is > the > dangerous combination. > The problem with that is that /any/ page in the UEFI runtime region may intersect with both .text and .data of any of the PE/COFF images that make up the runtime firmware (since the PE/COFF sections are not necessarily page aligned). Such pages require RWX permissions. The UEFI memory map does not provide the information to identify those pages a priori (the entire region containing several PE/COFF images could be covered by a single entry) so it is hard to guess which pages should be allowed these RWX permissions. >> > If that 'supposed to be' turns out to be 'not true' (not unheard of in >> > firmware land), then plan B would be to mark pages that generate write >> > faults >> > RWX as well, to not break functionality. (This 'mark it RWX' is not >> > something >> > that exploits would have easy access to, and we could also generate a >> > warning >> > [after the EFI call has finished] if it ever triggers.) >> > >> > Admittedly this approach might not be without its own complications, >> > but it >> > looks reasonably simple (I don't think we need per EFI call page >> > tables, >> > etc.), and does not assume much about the firmware being able to >> > enumerate its >> > permissions properly. Were we to merge EFI support today I'd have >> > insisted on >> > trying such an approach from day 1 on. >> >> We already have separate EFI page tables, though with the caveat that >> we share some of swapper_pg_dir's PGD entries. The best solution would >> be to stop sharing entries and isolate the EFI mappings from every >> other page table structure, so that they're only used during the EFI >> service calls. > > Absolutely. Can you try to fix this for v4.3? > > Thanks, > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Matt Fleming wrote: > On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to an > > EFI > > variable and so), we can mark the faulting page RW- on the fly, because > > it > > appears that writable EFI sections, while not enumerated very well in > > 'old' > > firmware, are still supposed to be page granular. (Even 'new' firmware > > I > > wouldn't automatically trust to get the enumeration right...) > > Sorry, this isn't true. I misled you with one of my earlier posts on > this topic. Let me try and clear things up... > > Writing to EFI regions has to do with every invocation of the EFI > runtime services - it's not limited to when you read/write/delete EFI > variables. In fact, EFI variables really have nothing to do with this > discussion, they're a completely opaque concept to the OS, we have no > idea how the firmware implements them. Everything is done via the EFI > boot/runtime services. > > The firmware itself will attempt to write to EFI regions when we > invoke the EFI services because that's where the PE/COFF ".data" and > ".bss" sections live along with the heap. There's even some relocation > fixups that occur as SetVirtualAddressMap() time so it'll write to > ".text" too. > > Now, the above PE/COFF sections are usually (always?) contained within > EFI regions of type EfiRuntimeServicesCode. We know this is true > because the firmware folks have told us so, and because stopping that > is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI > V2.5. > > The data sections within the region are also *not* guaranteed to be > page granular because work was required in Tianocore for emitting > sections with 4k alignment as part of the EFI_PROPERTIES_TABLE > support. > > Ultimately, what this means is that if you were to attempt to > dynamically fixup those regions that required write permission, you'd > have to modify the mappings for the majority of the EFI regions > anyway. And if you're blindly allowing write permission as a fixup, > there's not much security to be had. I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X to RW-, i.e. it would add 'write' permission but remove 'execute' permission. Note that there would be no 'RWX' permission at any given moment - which is the dangerous combination. > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > > firmware land), then plan B would be to mark pages that generate write > > faults > > RWX as well, to not break functionality. (This 'mark it RWX' is not > > something > > that exploits would have easy access to, and we could also generate a > > warning > > [after the EFI call has finished] if it ever triggers.) > > > > Admittedly this approach might not be without its own complications, > > but it > > looks reasonably simple (I don't think we need per EFI call page > > tables, > > etc.), and does not assume much about the firmware being able to > > enumerate its > > permissions properly. Were we to merge EFI support today I'd have > > insisted on > > trying such an approach from day 1 on. > > We already have separate EFI page tables, though with the caveat that > we share some of swapper_pg_dir's PGD entries. The best solution would > be to stop sharing entries and isolate the EFI mappings from every > other page table structure, so that they're only used during the EFI > service calls. Absolutely. Can you try to fix this for v4.3? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Matt Fleming wrote: > On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > > > And if this turns out to be due to EFI wanting those permissions, what > > should > > we do? People have talked about running the EFI callbacks in their own > > private > > page table setup, which sounds like the right idea, but until that actually > > *happens* > > We have separate page tables today, for a few reasons, but mainly it's > so that we can have an identity mapping of memory present in the > region usually used by user processes - broken firmware still uses > those identity mappings even after the kernel tells it they're > invalid. > > Note that when I say "separate" I'm talking about trampoline_pgd[] > which is also used by the x86 suspend/resume code. > > However, turns out that the issue with the current scheme is the fact > that trampoline_pgd[] actually shares a couple of PGD entries with > swapper_pg_dir as can be seen in setup_real_mode(), > > > trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); > trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; > trampoline_pgd[511] = init_level4_pgt[511].pgd; > > > So when we map the EFI regions in efi_map_regions() we're inserting > them into swapper_pg_dir also, which is why you're seeing the > warnings. > > If I remember correctly the rationale for using trampoline_pgd[] was > that it already did what we wanted (provided the identity mapping) and > would save us the overhead of maintaining more page tables for no good > reason. Obviously this entire thread is a good reason. > > I suggest we stop using trampoline_pgd[] (since it has a good reason > for sharing the kernel mapping PGD entries) and create our own so that > we can isolate EFI completely. Ok. Could you please make this fix a priority for upcoming EFI changes? > For the immediate problem of the warnings spewing forth on all UEFI > machines, at the very least the config options needs to be disabled by > default, if not the patch reverted. We'll certainly flip around the default, but reverting would be shooting the messenger: the EFI code is endangering everyone else today, and for no good reason as it appears... so the warning very much served its purpose in pointing out a valid problem. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Andy Lutomirski wrote: > On Thu, Nov 5, 2015 at 10:55 PM, Ingo Molnar wrote: > > > > * Linus Torvalds wrote: > > > >> On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > >> > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > >> > > > >> > > I don't have that later debug output at all. Presumably some config > >> > difference. > >> > > >> > CONFIG_X86_PTDUMP_CORE iirc. > >> > >> No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > >> > >> Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > >> not default to 'y' unless it is made more useful if it actually > >> triggers. Ingo? > > > > Yeah, agreed absolutely. > > > > So this is a bit sad because RWX pages are a real problem in practice, > > especially > > since the EFI addresses are well predictable, but generating a warning > > without > > being able to fix it quickly is counterproductive as well, as it only annoys > > people and makes them turn off the option. (Which we could do as well to > > begin > > with, without the annoyance factor...) > > > > So the plan would be: > > > > 1) Make it default-n. > > > > 2) We should try to further improve the messages to make it easier to > > determine > > what's wrong. We _do_ try to output symbolic information in the > > warning, to > > make it easier to find buggy mappings, but these are not standard kernel > > mappings. So I think we need an e820 mappings based semi-symbolic > > printout of > > bad addresses - maybe even correlate it with the MMIO resource tree. > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to an EFI > > variable and so), we can mark the faulting page RW- on the fly, because > > it > > appears that writable EFI sections, while not enumerated very well in > > 'old' > > firmware, are still supposed to be page granular. (Even 'new' firmware I > > wouldn't automatically trust to get the enumeration right...) > > I think it was Borislav who pointed out that this idea, which might > have been mine, is a bit silly. Why not just skip mapping the EFI > stuff in the init_pgd entirely and only map it in the EFI pgd? > > We'll have RWX stuff in the EFI pgd, but so what? If we're exposing > anything that runs with the EFI pgd loaded to untrusted input, I think > we've already lost. That's certainly true, I was simply confused about the life time of these mappings: I assumed they have to stay around. If they are meant to be and are partly temporary today already, we should go the whole mile and make that really so, because _today_ the mappings are permanent, so this is a real problem ... Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, Nov 06, 2015 at 01:09:48PM +, Matt Fleming wrote: > On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote: > > > > Admittedly, we might need to use a certain amount of care to avoid > > interesting conflicts with the vmap mechanism. We might need to vmap > > all of the EFI stuff, and possibly even all the top-level entries that > > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) > > as a blank not-present region to avoid overlaps, but that's not a big > > deal. > > There shouldn't be any room for conflicting with vmap() because the VA > region where we map EFI regions is still carved out especially for us. > > Right Boris? Yap: ea00 - eaff (=40 bits) virtual memory map (1TB) vs ffef - EFI region in trampoline_pgd the new pagetable will make that issue moot too. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote: > > Admittedly, we might need to use a certain amount of care to avoid > interesting conflicts with the vmap mechanism. We might need to vmap > all of the EFI stuff, and possibly even all the top-level entries that > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) > as a blank not-present region to avoid overlaps, but that's not a big > deal. There shouldn't be any room for conflicting with vmap() because the VA region where we map EFI regions is still carved out especially for us. Right Boris? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because > it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) Sorry, this isn't true. I misled you with one of my earlier posts on this topic. Let me try and clear things up... Writing to EFI regions has to do with every invocation of the EFI runtime services - it's not limited to when you read/write/delete EFI variables. In fact, EFI variables really have nothing to do with this discussion, they're a completely opaque concept to the OS, we have no idea how the firmware implements them. Everything is done via the EFI boot/runtime services. The firmware itself will attempt to write to EFI regions when we invoke the EFI services because that's where the PE/COFF ".data" and ".bss" sections live along with the heap. There's even some relocation fixups that occur as SetVirtualAddressMap() time so it'll write to ".text" too. Now, the above PE/COFF sections are usually (always?) contained within EFI regions of type EfiRuntimeServicesCode. We know this is true because the firmware folks have told us so, and because stopping that is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI V2.5. The data sections within the region are also *not* guaranteed to be page granular because work was required in Tianocore for emitting sections with 4k alignment as part of the EFI_PROPERTIES_TABLE support. Ultimately, what this means is that if you were to attempt to dynamically fixup those regions that required write permission, you'd have to modify the mappings for the majority of the EFI regions anyway. And if you're blindly allowing write permission as a fixup, there's not much security to be had. > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. We already have separate EFI page tables, though with the caveat that we share some of swapper_pg_dir's PGD entries. The best solution would be to stop sharing entries and isolate the EFI mappings from every other page table structure, so that they're only used during the EFI service calls. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > And if this turns out to be due to EFI wanting those permissions, what > should we do? People have talked about running the EFI callbacks in > their own private page table setup, which sounds like the right idea, > but until that actually *happens* We have separate page tables today, for a few reasons, but mainly it's so that we can have an identity mapping of memory present in the region usually used by user processes - broken firmware still uses those identity mappings even after the kernel tells it they're invalid. Note that when I say "separate" I'm talking about trampoline_pgd[] which is also used by the x86 suspend/resume code. However, turns out that the issue with the current scheme is the fact that trampoline_pgd[] actually shares a couple of PGD entries with swapper_pg_dir as can be seen in setup_real_mode(), trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; trampoline_pgd[511] = init_level4_pgt[511].pgd; So when we map the EFI regions in efi_map_regions() we're inserting them into swapper_pg_dir also, which is why you're seeing the warnings. If I remember correctly the rationale for using trampoline_pgd[] was that it already did what we wanted (provided the identity mapping) and would save us the overhead of maintaining more page tables for no good reason. Obviously this entire thread is a good reason. I suggest we stop using trampoline_pgd[] (since it has a good reason for sharing the kernel mapping PGD entries) and create our own so that we can isolate EFI completely. For the immediate problem of the warnings spewing forth on all UEFI machines, at the very least the config options needs to be disabled by default, if not the patch reverted. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Matt Flemingwrote: > On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to an > > EFI > > variable and so), we can mark the faulting page RW- on the fly, because > > it > > appears that writable EFI sections, while not enumerated very well in > > 'old' > > firmware, are still supposed to be page granular. (Even 'new' firmware > > I > > wouldn't automatically trust to get the enumeration right...) > > Sorry, this isn't true. I misled you with one of my earlier posts on > this topic. Let me try and clear things up... > > Writing to EFI regions has to do with every invocation of the EFI > runtime services - it's not limited to when you read/write/delete EFI > variables. In fact, EFI variables really have nothing to do with this > discussion, they're a completely opaque concept to the OS, we have no > idea how the firmware implements them. Everything is done via the EFI > boot/runtime services. > > The firmware itself will attempt to write to EFI regions when we > invoke the EFI services because that's where the PE/COFF ".data" and > ".bss" sections live along with the heap. There's even some relocation > fixups that occur as SetVirtualAddressMap() time so it'll write to > ".text" too. > > Now, the above PE/COFF sections are usually (always?) contained within > EFI regions of type EfiRuntimeServicesCode. We know this is true > because the firmware folks have told us so, and because stopping that > is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI > V2.5. > > The data sections within the region are also *not* guaranteed to be > page granular because work was required in Tianocore for emitting > sections with 4k alignment as part of the EFI_PROPERTIES_TABLE > support. > > Ultimately, what this means is that if you were to attempt to > dynamically fixup those regions that required write permission, you'd > have to modify the mappings for the majority of the EFI regions > anyway. And if you're blindly allowing write permission as a fixup, > there's not much security to be had. I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X to RW-, i.e. it would add 'write' permission but remove 'execute' permission. Note that there would be no 'RWX' permission at any given moment - which is the dangerous combination. > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > > firmware land), then plan B would be to mark pages that generate write > > faults > > RWX as well, to not break functionality. (This 'mark it RWX' is not > > something > > that exploits would have easy access to, and we could also generate a > > warning > > [after the EFI call has finished] if it ever triggers.) > > > > Admittedly this approach might not be without its own complications, > > but it > > looks reasonably simple (I don't think we need per EFI call page > > tables, > > etc.), and does not assume much about the firmware being able to > > enumerate its > > permissions properly. Were we to merge EFI support today I'd have > > insisted on > > trying such an approach from day 1 on. > > We already have separate EFI page tables, though with the caveat that > we share some of swapper_pg_dir's PGD entries. The best solution would > be to stop sharing entries and isolate the EFI mappings from every > other page table structure, so that they're only used during the EFI > service calls. Absolutely. Can you try to fix this for v4.3? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On 7 November 2015 at 08:09, Ingo Molnarwrote: > > * Matt Fleming wrote: > >> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >> > >> > 3) We should fix the EFI permission problem without relying on the >> > firmware: it >> > appears we could just mark everything R-X optimistically, and if a >> > write fault >> > happens (it's pretty rare in fact, only triggers when we write to an >> > EFI >> > variable and so), we can mark the faulting page RW- on the fly, >> > because it >> > appears that writable EFI sections, while not enumerated very well in >> > 'old' >> > firmware, are still supposed to be page granular. (Even 'new' firmware >> > I >> > wouldn't automatically trust to get the enumeration right...) >> >> Sorry, this isn't true. I misled you with one of my earlier posts on >> this topic. Let me try and clear things up... >> >> Writing to EFI regions has to do with every invocation of the EFI >> runtime services - it's not limited to when you read/write/delete EFI >> variables. In fact, EFI variables really have nothing to do with this >> discussion, they're a completely opaque concept to the OS, we have no >> idea how the firmware implements them. Everything is done via the EFI >> boot/runtime services. >> >> The firmware itself will attempt to write to EFI regions when we >> invoke the EFI services because that's where the PE/COFF ".data" and >> ".bss" sections live along with the heap. There's even some relocation >> fixups that occur as SetVirtualAddressMap() time so it'll write to >> ".text" too. >> >> Now, the above PE/COFF sections are usually (always?) contained within >> EFI regions of type EfiRuntimeServicesCode. We know this is true >> because the firmware folks have told us so, and because stopping that >> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >> V2.5. >> >> The data sections within the region are also *not* guaranteed to be >> page granular because work was required in Tianocore for emitting >> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >> support. >> >> Ultimately, what this means is that if you were to attempt to >> dynamically fixup those regions that required write permission, you'd >> have to modify the mappings for the majority of the EFI regions >> anyway. And if you're blindly allowing write permission as a fixup, >> there's not much security to be had. > > I think you misunderstood my suggestion: the 'fixup' would be changing it > from R-X > to RW-, i.e. it would add 'write' permission but remove 'execute' permission. > > Note that there would be no 'RWX' permission at any given moment - which is > the > dangerous combination. > The problem with that is that /any/ page in the UEFI runtime region may intersect with both .text and .data of any of the PE/COFF images that make up the runtime firmware (since the PE/COFF sections are not necessarily page aligned). Such pages require RWX permissions. The UEFI memory map does not provide the information to identify those pages a priori (the entire region containing several PE/COFF images could be covered by a single entry) so it is hard to guess which pages should be allowed these RWX permissions. >> > If that 'supposed to be' turns out to be 'not true' (not unheard of in >> > firmware land), then plan B would be to mark pages that generate write >> > faults >> > RWX as well, to not break functionality. (This 'mark it RWX' is not >> > something >> > that exploits would have easy access to, and we could also generate a >> > warning >> > [after the EFI call has finished] if it ever triggers.) >> > >> > Admittedly this approach might not be without its own complications, >> > but it >> > looks reasonably simple (I don't think we need per EFI call page >> > tables, >> > etc.), and does not assume much about the firmware being able to >> > enumerate its >> > permissions properly. Were we to merge EFI support today I'd have >> > insisted on >> > trying such an approach from day 1 on. >> >> We already have separate EFI page tables, though with the caveat that >> we share some of swapper_pg_dir's PGD entries. The best solution would >> be to stop sharing entries and isolate the EFI mappings from every >> other page table structure, so that they're only used during the EFI >> service calls. > > Absolutely. Can you try to fix this for v4.3? > > Thanks, > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Matt Flemingwrote: > On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > > > And if this turns out to be due to EFI wanting those permissions, what > > should > > we do? People have talked about running the EFI callbacks in their own > > private > > page table setup, which sounds like the right idea, but until that actually > > *happens* > > We have separate page tables today, for a few reasons, but mainly it's > so that we can have an identity mapping of memory present in the > region usually used by user processes - broken firmware still uses > those identity mappings even after the kernel tells it they're > invalid. > > Note that when I say "separate" I'm talking about trampoline_pgd[] > which is also used by the x86 suspend/resume code. > > However, turns out that the issue with the current scheme is the fact > that trampoline_pgd[] actually shares a couple of PGD entries with > swapper_pg_dir as can be seen in setup_real_mode(), > > > trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); > trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; > trampoline_pgd[511] = init_level4_pgt[511].pgd; > > > So when we map the EFI regions in efi_map_regions() we're inserting > them into swapper_pg_dir also, which is why you're seeing the > warnings. > > If I remember correctly the rationale for using trampoline_pgd[] was > that it already did what we wanted (provided the identity mapping) and > would save us the overhead of maintaining more page tables for no good > reason. Obviously this entire thread is a good reason. > > I suggest we stop using trampoline_pgd[] (since it has a good reason > for sharing the kernel mapping PGD entries) and create our own so that > we can isolate EFI completely. Ok. Could you please make this fix a priority for upcoming EFI changes? > For the immediate problem of the warnings spewing forth on all UEFI > machines, at the very least the config options needs to be disabled by > default, if not the patch reverted. We'll certainly flip around the default, but reverting would be shooting the messenger: the EFI code is endangering everyone else today, and for no good reason as it appears... so the warning very much served its purpose in pointing out a valid problem. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Andy Lutomirskiwrote: > On Thu, Nov 5, 2015 at 10:55 PM, Ingo Molnar wrote: > > > > * Linus Torvalds wrote: > > > >> On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > >> > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > >> > > > >> > > I don't have that later debug output at all. Presumably some config > >> > difference. > >> > > >> > CONFIG_X86_PTDUMP_CORE iirc. > >> > >> No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > >> > >> Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > >> not default to 'y' unless it is made more useful if it actually > >> triggers. Ingo? > > > > Yeah, agreed absolutely. > > > > So this is a bit sad because RWX pages are a real problem in practice, > > especially > > since the EFI addresses are well predictable, but generating a warning > > without > > being able to fix it quickly is counterproductive as well, as it only annoys > > people and makes them turn off the option. (Which we could do as well to > > begin > > with, without the annoyance factor...) > > > > So the plan would be: > > > > 1) Make it default-n. > > > > 2) We should try to further improve the messages to make it easier to > > determine > > what's wrong. We _do_ try to output symbolic information in the > > warning, to > > make it easier to find buggy mappings, but these are not standard kernel > > mappings. So I think we need an e820 mappings based semi-symbolic > > printout of > > bad addresses - maybe even correlate it with the MMIO resource tree. > > > > 3) We should fix the EFI permission problem without relying on the > > firmware: it > > appears we could just mark everything R-X optimistically, and if a > > write fault > > happens (it's pretty rare in fact, only triggers when we write to an EFI > > variable and so), we can mark the faulting page RW- on the fly, because > > it > > appears that writable EFI sections, while not enumerated very well in > > 'old' > > firmware, are still supposed to be page granular. (Even 'new' firmware I > > wouldn't automatically trust to get the enumeration right...) > > I think it was Borislav who pointed out that this idea, which might > have been mine, is a bit silly. Why not just skip mapping the EFI > stuff in the init_pgd entirely and only map it in the EFI pgd? > > We'll have RWX stuff in the EFI pgd, but so what? If we're exposing > anything that runs with the EFI pgd loaded to untrusted input, I think > we've already lost. That's certainly true, I was simply confused about the life time of these mappings: I assumed they have to stay around. If they are meant to be and are partly temporary today already, we should go the whole mile and make that really so, because _today_ the mappings are permanent, so this is a real problem ... Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, Nov 06, 2015 at 01:09:48PM +, Matt Fleming wrote: > On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote: > > > > Admittedly, we might need to use a certain amount of care to avoid > > interesting conflicts with the vmap mechanism. We might need to vmap > > all of the EFI stuff, and possibly even all the top-level entries that > > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) > > as a blank not-present region to avoid overlaps, but that's not a big > > deal. > > There shouldn't be any room for conflicting with vmap() because the VA > region where we map EFI regions is still carved out especially for us. > > Right Boris? Yap: ea00 - eaff (=40 bits) virtual memory map (1TB) vs ffef - EFI region in trampoline_pgd the new pagetable will make that issue moot too. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because > it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) Sorry, this isn't true. I misled you with one of my earlier posts on this topic. Let me try and clear things up... Writing to EFI regions has to do with every invocation of the EFI runtime services - it's not limited to when you read/write/delete EFI variables. In fact, EFI variables really have nothing to do with this discussion, they're a completely opaque concept to the OS, we have no idea how the firmware implements them. Everything is done via the EFI boot/runtime services. The firmware itself will attempt to write to EFI regions when we invoke the EFI services because that's where the PE/COFF ".data" and ".bss" sections live along with the heap. There's even some relocation fixups that occur as SetVirtualAddressMap() time so it'll write to ".text" too. Now, the above PE/COFF sections are usually (always?) contained within EFI regions of type EfiRuntimeServicesCode. We know this is true because the firmware folks have told us so, and because stopping that is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI V2.5. The data sections within the region are also *not* guaranteed to be page granular because work was required in Tianocore for emitting sections with 4k alignment as part of the EFI_PROPERTIES_TABLE support. Ultimately, what this means is that if you were to attempt to dynamically fixup those regions that required write permission, you'd have to modify the mappings for the majority of the EFI regions anyway. And if you're blindly allowing write permission as a fixup, there's not much security to be had. > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. We already have separate EFI page tables, though with the caveat that we share some of swapper_pg_dir's PGD entries. The best solution would be to stop sharing entries and isolate the EFI mappings from every other page table structure, so that they're only used during the EFI service calls. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote: > > And if this turns out to be due to EFI wanting those permissions, what > should we do? People have talked about running the EFI callbacks in > their own private page table setup, which sounds like the right idea, > but until that actually *happens* We have separate page tables today, for a few reasons, but mainly it's so that we can have an identity mapping of memory present in the region usually used by user processes - broken firmware still uses those identity mappings even after the kernel tells it they're invalid. Note that when I say "separate" I'm talking about trampoline_pgd[] which is also used by the x86 suspend/resume code. However, turns out that the issue with the current scheme is the fact that trampoline_pgd[] actually shares a couple of PGD entries with swapper_pg_dir as can be seen in setup_real_mode(), trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd); trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd; trampoline_pgd[511] = init_level4_pgt[511].pgd; So when we map the EFI regions in efi_map_regions() we're inserting them into swapper_pg_dir also, which is why you're seeing the warnings. If I remember correctly the rationale for using trampoline_pgd[] was that it already did what we wanted (provided the identity mapping) and would save us the overhead of maintaining more page tables for no good reason. Obviously this entire thread is a good reason. I suggest we stop using trampoline_pgd[] (since it has a good reason for sharing the kernel mapping PGD entries) and create our own so that we can isolate EFI completely. For the immediate problem of the warnings spewing forth on all UEFI machines, at the very least the config options needs to be disabled by default, if not the patch reverted. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote: > > Admittedly, we might need to use a certain amount of care to avoid > interesting conflicts with the vmap mechanism. We might need to vmap > all of the EFI stuff, and possibly even all the top-level entries that > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) > as a blank not-present region to avoid overlaps, but that's not a big > deal. There shouldn't be any room for conflicting with vmap() because the VA region where we map EFI regions is still carved out especially for us. Right Boris? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
(resent with Matt's email address fixed.) * Ingo Molnar wrote: > > * Linus Torvalds wrote: > > > On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > > > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > > > > > I don't have that later debug output at all. Presumably some config > > > difference. > > > > > > CONFIG_X86_PTDUMP_CORE iirc. > > > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > > > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > > not default to 'y' unless it is made more useful if it actually > > triggers. Ingo? > > Yeah, agreed absolutely. > > So this is a bit sad because RWX pages are a real problem in practice, > especially > since the EFI addresses are well predictable, but generating a warning > without > being able to fix it quickly is counterproductive as well, as it only annoys > people and makes them turn off the option. (Which we could do as well to > begin > with, without the annoyance factor...) > > So the plan would be: > > 1) Make it default-n. > > 2) We should try to further improve the messages to make it easier to > determine > what's wrong. We _do_ try to output symbolic information in the warning, > to > make it easier to find buggy mappings, but these are not standard kernel > mappings. So I think we need an e820 mappings based semi-symbolic > printout of > bad addresses - maybe even correlate it with the MMIO resource tree. > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because > it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. > > Thanks, > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 10:55 PM, Ingo Molnar wrote: > > * Linus Torvalds wrote: > >> On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: >> > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: >> > > >> > > I don't have that later debug output at all. Presumably some config >> > difference. >> > >> > CONFIG_X86_PTDUMP_CORE iirc. >> >> No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. >> >> Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should >> not default to 'y' unless it is made more useful if it actually >> triggers. Ingo? > > Yeah, agreed absolutely. > > So this is a bit sad because RWX pages are a real problem in practice, > especially > since the EFI addresses are well predictable, but generating a warning without > being able to fix it quickly is counterproductive as well, as it only annoys > people and makes them turn off the option. (Which we could do as well to begin > with, without the annoyance factor...) > > So the plan would be: > > 1) Make it default-n. > > 2) We should try to further improve the messages to make it easier to > determine > what's wrong. We _do_ try to output symbolic information in the warning, > to > make it easier to find buggy mappings, but these are not standard kernel > mappings. So I think we need an e820 mappings based semi-symbolic > printout of > bad addresses - maybe even correlate it with the MMIO resource tree. > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) I think it was Borislav who pointed out that this idea, which might have been mine, is a bit silly. Why not just skip mapping the EFI stuff in the init_pgd entirely and only map it in the EFI pgd? We'll have RWX stuff in the EFI pgd, but so what? If we're exposing anything that runs with the EFI pgd loaded to untrusted input, I think we've already lost. Admittedly, we might need to use a certain amount of care to avoid interesting conflicts with the vmap mechanism. We might need to vmap all of the EFI stuff, and possibly even all the top-level entries that contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) as a blank not-present region to avoid overlaps, but that's not a big deal. > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. I think we have separate EFI page tables already for other reasons. I could be wrong -- I've never really understood the EFI mapping layout very well. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Linus Torvalds wrote: > On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > > > I don't have that later debug output at all. Presumably some config > > difference. > > > > CONFIG_X86_PTDUMP_CORE iirc. > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > not default to 'y' unless it is made more useful if it actually > triggers. Ingo? Yeah, agreed absolutely. So this is a bit sad because RWX pages are a real problem in practice, especially since the EFI addresses are well predictable, but generating a warning without being able to fix it quickly is counterproductive as well, as it only annoys people and makes them turn off the option. (Which we could do as well to begin with, without the annoyance factor...) So the plan would be: 1) Make it default-n. 2) We should try to further improve the messages to make it easier to determine what's wrong. We _do_ try to output symbolic information in the warning, to make it easier to find buggy mappings, but these are not standard kernel mappings. So I think we need an e820 mappings based semi-symbolic printout of bad addresses - maybe even correlate it with the MMIO resource tree. 3) We should fix the EFI permission problem without relying on the firmware: it appears we could just mark everything R-X optimistically, and if a write fault happens (it's pretty rare in fact, only triggers when we write to an EFI variable and so), we can mark the faulting page RW- on the fly, because it appears that writable EFI sections, while not enumerated very well in 'old' firmware, are still supposed to be page granular. (Even 'new' firmware I wouldn't automatically trust to get the enumeration right...) If that 'supposed to be' turns out to be 'not true' (not unheard of in firmware land), then plan B would be to mark pages that generate write faults RWX as well, to not break functionality. (This 'mark it RWX' is not something that exploits would have easy access to, and we could also generate a warning [after the EFI call has finished] if it ever triggers.) Admittedly this approach might not be without its own complications, but it looks reasonably simple (I don't think we need per EFI call page tables, etc.), and does not assume much about the firmware being able to enumerate its permissions properly. Were we to merge EFI support today I'd have insisted on trying such an approach from day 1 on. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 05, 2015 at 02:04:55PM -0800, Linus Torvalds wrote: > and there's quite a few other pages there that are RW but not marked > NX. I suspect they come from the EFI runtime services because the Yeah, at least the EFI mappings would need a bit more fiddling until they're NX: https://lkml.kernel.org/r/20151012124113.gd2...@codeblueprint.co.uk Perhaps we don't even have to make them NX - just set the PGD entry containing them before the EFI call and unset it after its done. I.e., something like that. > pattern seems to match what I see in that area, but there's at least a > PSE mapping at START_KERNEL_map too, etc. Maybe we should default n CONFIG_DEBUG_WX until stuff has been fixed...? -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 1:27 PM, Linus Torvalds wrote: > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. Yes, that seems to show the tables, and agrees with the problem address. So for me I have: WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780() x86/mm: Found insecure W+X mapping at address 8805f000/0x8805f000 and that 005f000 address also shows up in the firmware tables: ACPI: 6 ACPI AML tables successfully acquired and loaded ---[ User Space ]--- 0x-0x0005f000 380K RW GLB NX pte 0x0005f000-0x0006 4K RW GLB x pte 0x0006-0x00201664K RW GLB NX pte ... ---[ Low Kernel Mapping ]--- 0x8800-0x8805f000 380K RW GLB NX pte 0x8805f000-0x8806 4K RW GLB x pte 0x8806-0x88201664K RW GLB NX pte ... and there's quite a few other pages there that are RW but not marked NX. I suspect they come from the EFI runtime services because the pattern seems to match what I see in that area, but there's at least a PSE mapping at START_KERNEL_map too, etc. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 1:27 PM, Linus Torvalds wrote: > > I suspect CONFIG_EFI_PGT_DUMP instead. > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > not default to 'y' unless it is made more useful if it actually > triggers. Ingo? Actually, I guess I should have cc'd Steven Smalley too, since he's the author. Steven, the problem is that without other debug options, the CONFIG_DEBUG_WX option doesn't seem useful. It shows a problem, but it doesn't actually show any information about where that issue comes from or any other details to even *guess* at what is going on. And if this turns out to be due to EFI wanting those permissions, what should we do? People have talked about running the EFI callbacks in their own private page table setup, which sounds like the right idea, but until that actually *happens* Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > I don't have that later debug output at all. Presumably some config > difference. > > CONFIG_X86_PTDUMP_CORE iirc. No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should not default to 'y' unless it is made more useful if it actually triggers. Ingo? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 1:27 PM, Linus Torvaldswrote: > > I suspect CONFIG_EFI_PGT_DUMP instead. > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > not default to 'y' unless it is made more useful if it actually > triggers. Ingo? Actually, I guess I should have cc'd Steven Smalley too, since he's the author. Steven, the problem is that without other debug options, the CONFIG_DEBUG_WX option doesn't seem useful. It shows a problem, but it doesn't actually show any information about where that issue comes from or any other details to even *guess* at what is going on. And if this turns out to be due to EFI wanting those permissions, what should we do? People have talked about running the EFI callbacks in their own private page table setup, which sounds like the right idea, but until that actually *happens* Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 05, 2015 at 02:04:55PM -0800, Linus Torvalds wrote: > and there's quite a few other pages there that are RW but not marked > NX. I suspect they come from the EFI runtime services because the Yeah, at least the EFI mappings would need a bit more fiddling until they're NX: https://lkml.kernel.org/r/20151012124113.gd2...@codeblueprint.co.uk Perhaps we don't even have to make them NX - just set the PGD entry containing them before the EFI call and unset it after its done. I.e., something like that. > pattern seems to match what I see in that area, but there's at least a > PSE mapping at START_KERNEL_map too, etc. Maybe we should default n CONFIG_DEBUG_WX until stuff has been fixed...? -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 4, 2015 at 6:17 PM, Dave Joneswrote: > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > I don't have that later debug output at all. Presumably some config > difference. > > CONFIG_X86_PTDUMP_CORE iirc. No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should not default to 'y' unless it is made more useful if it actually triggers. Ingo? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 1:27 PM, Linus Torvaldswrote: > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. Yes, that seems to show the tables, and agrees with the problem address. So for me I have: WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780() x86/mm: Found insecure W+X mapping at address 8805f000/0x8805f000 and that 005f000 address also shows up in the firmware tables: ACPI: 6 ACPI AML tables successfully acquired and loaded ---[ User Space ]--- 0x-0x0005f000 380K RW GLB NX pte 0x0005f000-0x0006 4K RW GLB x pte 0x0006-0x00201664K RW GLB NX pte ... ---[ Low Kernel Mapping ]--- 0x8800-0x8805f000 380K RW GLB NX pte 0x8805f000-0x8806 4K RW GLB x pte 0x8806-0x88201664K RW GLB NX pte ... and there's quite a few other pages there that are RW but not marked NX. I suspect they come from the EFI runtime services because the pattern seems to match what I see in that area, but there's at least a PSE mapping at START_KERNEL_map too, etc. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
* Linus Torvaldswrote: > On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > > > I don't have that later debug output at all. Presumably some config > > difference. > > > > CONFIG_X86_PTDUMP_CORE iirc. > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > not default to 'y' unless it is made more useful if it actually > triggers. Ingo? Yeah, agreed absolutely. So this is a bit sad because RWX pages are a real problem in practice, especially since the EFI addresses are well predictable, but generating a warning without being able to fix it quickly is counterproductive as well, as it only annoys people and makes them turn off the option. (Which we could do as well to begin with, without the annoyance factor...) So the plan would be: 1) Make it default-n. 2) We should try to further improve the messages to make it easier to determine what's wrong. We _do_ try to output symbolic information in the warning, to make it easier to find buggy mappings, but these are not standard kernel mappings. So I think we need an e820 mappings based semi-symbolic printout of bad addresses - maybe even correlate it with the MMIO resource tree. 3) We should fix the EFI permission problem without relying on the firmware: it appears we could just mark everything R-X optimistically, and if a write fault happens (it's pretty rare in fact, only triggers when we write to an EFI variable and so), we can mark the faulting page RW- on the fly, because it appears that writable EFI sections, while not enumerated very well in 'old' firmware, are still supposed to be page granular. (Even 'new' firmware I wouldn't automatically trust to get the enumeration right...) If that 'supposed to be' turns out to be 'not true' (not unheard of in firmware land), then plan B would be to mark pages that generate write faults RWX as well, to not break functionality. (This 'mark it RWX' is not something that exploits would have easy access to, and we could also generate a warning [after the EFI call has finished] if it ever triggers.) Admittedly this approach might not be without its own complications, but it looks reasonably simple (I don't think we need per EFI call page tables, etc.), and does not assume much about the firmware being able to enumerate its permissions properly. Were we to merge EFI support today I'd have insisted on trying such an approach from day 1 on. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Thu, Nov 5, 2015 at 10:55 PM, Ingo Molnarwrote: > > * Linus Torvalds wrote: > >> On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: >> > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: >> > > >> > > I don't have that later debug output at all. Presumably some config >> > difference. >> > >> > CONFIG_X86_PTDUMP_CORE iirc. >> >> No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. >> >> Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should >> not default to 'y' unless it is made more useful if it actually >> triggers. Ingo? > > Yeah, agreed absolutely. > > So this is a bit sad because RWX pages are a real problem in practice, > especially > since the EFI addresses are well predictable, but generating a warning without > being able to fix it quickly is counterproductive as well, as it only annoys > people and makes them turn off the option. (Which we could do as well to begin > with, without the annoyance factor...) > > So the plan would be: > > 1) Make it default-n. > > 2) We should try to further improve the messages to make it easier to > determine > what's wrong. We _do_ try to output symbolic information in the warning, > to > make it easier to find buggy mappings, but these are not standard kernel > mappings. So I think we need an e820 mappings based semi-symbolic > printout of > bad addresses - maybe even correlate it with the MMIO resource tree. > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) I think it was Borislav who pointed out that this idea, which might have been mine, is a bit silly. Why not just skip mapping the EFI stuff in the init_pgd entirely and only map it in the EFI pgd? We'll have RWX stuff in the EFI pgd, but so what? If we're exposing anything that runs with the EFI pgd loaded to untrusted input, I think we've already lost. Admittedly, we might need to use a certain amount of care to avoid interesting conflicts with the vmap mechanism. We might need to vmap all of the EFI stuff, and possibly even all the top-level entries that contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*) as a blank not-present region to avoid overlaps, but that's not a big deal. > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. I think we have separate EFI page tables already for other reasons. I could be wrong -- I've never really understood the EFI mapping layout very well. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
(resent with Matt's email address fixed.) * Ingo Molnarwrote: > > * Linus Torvalds wrote: > > > On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones wrote: > > > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > > > > > > > > I don't have that later debug output at all. Presumably some config > > > difference. > > > > > > CONFIG_X86_PTDUMP_CORE iirc. > > > > No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead. > > > > Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should > > not default to 'y' unless it is made more useful if it actually > > triggers. Ingo? > > Yeah, agreed absolutely. > > So this is a bit sad because RWX pages are a real problem in practice, > especially > since the EFI addresses are well predictable, but generating a warning > without > being able to fix it quickly is counterproductive as well, as it only annoys > people and makes them turn off the option. (Which we could do as well to > begin > with, without the annoyance factor...) > > So the plan would be: > > 1) Make it default-n. > > 2) We should try to further improve the messages to make it easier to > determine > what's wrong. We _do_ try to output symbolic information in the warning, > to > make it easier to find buggy mappings, but these are not standard kernel > mappings. So I think we need an e820 mappings based semi-symbolic > printout of > bad addresses - maybe even correlate it with the MMIO resource tree. > > 3) We should fix the EFI permission problem without relying on the firmware: > it > appears we could just mark everything R-X optimistically, and if a write > fault > happens (it's pretty rare in fact, only triggers when we write to an EFI > variable and so), we can mark the faulting page RW- on the fly, because > it > appears that writable EFI sections, while not enumerated very well in > 'old' > firmware, are still supposed to be page granular. (Even 'new' firmware I > wouldn't automatically trust to get the enumeration right...) > > If that 'supposed to be' turns out to be 'not true' (not unheard of in > firmware land), then plan B would be to mark pages that generate write > faults > RWX as well, to not break functionality. (This 'mark it RWX' is not > something > that exploits would have easy access to, and we could also generate a > warning > [after the EFI call has finished] if it ever triggers.) > > Admittedly this approach might not be without its own complications, but > it > looks reasonably simple (I don't think we need per EFI call page tables, > etc.), and does not assume much about the firmware being able to > enumerate its > permissions properly. Were we to merge EFI support today I'd have > insisted on > trying such an approach from day 1 on. > > Thanks, > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > On Wed, Nov 4, 2015 at 3:39 PM, Dave Jones wrote: > > > > FWIW I'm seeing this too. > > > > [0.468368] ---[ Low Kernel Mapping ]--- > > [0.468381] 0x8800-0x8880 8M RW > > GLB NX pte > > [0.468391] 0x8880-0x8890 1M RW > > GLB x pte > > > > Linus, does that match your trace too ? The 2nd low kernel mapping? > > I don't have that later debug output at all. Presumably some config > difference. CONFIG_X86_PTDUMP_CORE iirc. Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 4, 2015 at 3:39 PM, Dave Jones wrote: > > FWIW I'm seeing this too. > > [0.468368] ---[ Low Kernel Mapping ]--- > [0.468381] 0x8800-0x8880 8M RW >GLB NX pte > [0.468391] 0x8880-0x8890 1M RW >GLB x pte > > Linus, does that match your trace too ? The 2nd low kernel mapping? I don't have that later debug output at all. Presumably some config difference. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 04, 2015 at 11:26:12AM -0800, Linus Torvalds wrote: > On Tue, Nov 3, 2015 at 3:16 AM, Ingo Molnar wrote: > > > > The new CONFIG_DEBUG_WX=y warning is marked default-y if > > CONFIG_DEBUG_RODATA=y is > > already eanbled, as a special exception, as these bugs are hard to notice > > and this > > check already found several live bugs. > > So this seems to be not very useful. > > ... > Freeing unused kernel memory: 1068K (8bcc - > 8bdcb000) > Write protecting the kernel read-only data: 12288k > Freeing unused kernel memory: 1944K (88000b61a000 - 88000b80) > Freeing unused kernel memory: 1372K (88000baa9000 - 88000bc0) > [ cut here ] > WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 > note_page+0x5dc/0x780() > x86/mm: Found insecure W+X mapping at address > 8805f000/0x8805f000 > ... > x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found. > ... > > because it doesn't seem to give you any idea where to look for the > issue. Those 165660 pages come out to 647MB. Odd number. > > Is this *supposed* to come out clean? FWIW I'm seeing this too. [3.293503] x86/mm: Found insecure W+X mapping at address 8880/0x8880 Looking further up the dmesg I see that range is.. [0.468368] ---[ Low Kernel Mapping ]--- [0.468381] 0x8800-0x8880 8M RW GLB NX pte [0.468391] 0x8880-0x8890 1M RW GLB x pte Linus, does that match your trace too ? The 2nd low kernel mapping? Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Tue, Nov 3, 2015 at 3:16 AM, Ingo Molnar wrote: > > The new CONFIG_DEBUG_WX=y warning is marked default-y if > CONFIG_DEBUG_RODATA=y is > already eanbled, as a special exception, as these bugs are hard to notice and > this > check already found several live bugs. So this seems to be not very useful. ... Freeing unused kernel memory: 1068K (8bcc - 8bdcb000) Write protecting the kernel read-only data: 12288k Freeing unused kernel memory: 1944K (88000b61a000 - 88000b80) Freeing unused kernel memory: 1372K (88000baa9000 - 88000bc0) [ cut here ] WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780() x86/mm: Found insecure W+X mapping at address 8805f000/0x8805f000 ... x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found. ... because it doesn't seem to give you any idea where to look for the issue. Those 165660 pages come out to 647MB. Odd number. Is this *supposed* to come out clean? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 4, 2015 at 3:39 PM, Dave Joneswrote: > > FWIW I'm seeing this too. > > [0.468368] ---[ Low Kernel Mapping ]--- > [0.468381] 0x8800-0x8880 8M RW >GLB NX pte > [0.468391] 0x8880-0x8890 1M RW >GLB x pte > > Linus, does that match your trace too ? The 2nd low kernel mapping? I don't have that later debug output at all. Presumably some config difference. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 04, 2015 at 11:26:12AM -0800, Linus Torvalds wrote: > On Tue, Nov 3, 2015 at 3:16 AM, Ingo Molnarwrote: > > > > The new CONFIG_DEBUG_WX=y warning is marked default-y if > > CONFIG_DEBUG_RODATA=y is > > already eanbled, as a special exception, as these bugs are hard to notice > > and this > > check already found several live bugs. > > So this seems to be not very useful. > > ... > Freeing unused kernel memory: 1068K (8bcc - > 8bdcb000) > Write protecting the kernel read-only data: 12288k > Freeing unused kernel memory: 1944K (88000b61a000 - 88000b80) > Freeing unused kernel memory: 1372K (88000baa9000 - 88000bc0) > [ cut here ] > WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 > note_page+0x5dc/0x780() > x86/mm: Found insecure W+X mapping at address > 8805f000/0x8805f000 > ... > x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found. > ... > > because it doesn't seem to give you any idea where to look for the > issue. Those 165660 pages come out to 647MB. Odd number. > > Is this *supposed* to come out clean? FWIW I'm seeing this too. [3.293503] x86/mm: Found insecure W+X mapping at address 8880/0x8880 Looking further up the dmesg I see that range is.. [0.468368] ---[ Low Kernel Mapping ]--- [0.468381] 0x8800-0x8880 8M RW GLB NX pte [0.468391] 0x8880-0x8890 1M RW GLB x pte Linus, does that match your trace too ? The 2nd low kernel mapping? Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote: > On Wed, Nov 4, 2015 at 3:39 PM, Dave Joneswrote: > > > > FWIW I'm seeing this too. > > > > [0.468368] ---[ Low Kernel Mapping ]--- > > [0.468381] 0x8800-0x8880 8M RW > > GLB NX pte > > [0.468391] 0x8880-0x8890 1M RW > > GLB x pte > > > > Linus, does that match your trace too ? The 2nd low kernel mapping? > > I don't have that later debug output at all. Presumably some config > difference. CONFIG_X86_PTDUMP_CORE iirc. Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] x86/mm changes for v4.4
On Tue, Nov 3, 2015 at 3:16 AM, Ingo Molnarwrote: > > The new CONFIG_DEBUG_WX=y warning is marked default-y if > CONFIG_DEBUG_RODATA=y is > already eanbled, as a special exception, as these bugs are hard to notice and > this > check already found several live bugs. So this seems to be not very useful. ... Freeing unused kernel memory: 1068K (8bcc - 8bdcb000) Write protecting the kernel read-only data: 12288k Freeing unused kernel memory: 1944K (88000b61a000 - 88000b80) Freeing unused kernel memory: 1372K (88000baa9000 - 88000bc0) [ cut here ] WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780() x86/mm: Found insecure W+X mapping at address 8805f000/0x8805f000 ... x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found. ... because it doesn't seem to give you any idea where to look for the issue. Those 165660 pages come out to 647MB. Odd number. Is this *supposed* to come out clean? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] x86/mm changes for v4.4
Linus, Please pull the latest x86-mm-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-for-linus # HEAD: e1a58320a38dfa72be48a0f1a3a92273663ba6db x86/mm: Warn on W^X mappings The main changes are: continued PAT work by Toshi Kani, plus a new boot time warning about insecure RWX kernel mappings, by Stephen Smalley. The new CONFIG_DEBUG_WX=y warning is marked default-y if CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as these bugs are hard to notice and this check already found several live bugs. Thanks, Ingo --> Stephen Smalley (1): x86/mm: Warn on W^X mappings Toshi Kani (11): x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO x86/asm: Move PUD_PAGE macros to page_types.h x86/asm: Add pud/pmd mask interfaces to handle large PAT bit x86/asm: Fix pud/pmd interfaces to handle large PAT bit x86/asm: Add pud_pgprot() and pmd_pgprot() x86/mm: Fix page table dump to show PAT bit x86/mm: Fix slow_virt_to_phys() to handle large PAT bit x86/mm: Fix gup_huge_p?d() to handle large PAT bit x86/mm: Fix try_preserve_large_page() to handle large PAT bit x86/mm: Fix __split_large_page() to handle large PAT bit x86/mm: Fix no-change case in try_preserve_large_page() arch/x86/Kconfig.debug | 36 - arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 + arch/x86/include/asm/page_64_types.h| 3 -- arch/x86/include/asm/page_types.h | 3 ++ arch/x86/include/asm/pgtable.h | 25 ++--- arch/x86/include/asm/pgtable_types.h| 40 -- arch/x86/mm/Makefile| 2 +- arch/x86/mm/dump_pagetables.c | 81 ++--- arch/x86/mm/gup.c | 18 +++ arch/x86/mm/init_32.c | 2 + arch/x86/mm/init_64.c | 2 + arch/x86/mm/pageattr.c | 79 +--- 12 files changed, 219 insertions(+), 74 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d8c0d3266173..3e0baf726eef 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI This is useful for kernel debugging when your machine crashes very early before the console code is initialized. +config X86_PTDUMP_CORE + def_bool n + config X86_PTDUMP bool "Export kernel pagetable layout to userspace via debugfs" depends on DEBUG_KERNEL select DEBUG_FS + select X86_PTDUMP_CORE ---help--- Say Y here if you want to show the kernel pagetable layout in a debugfs file. This information is only useful for kernel developers @@ -79,7 +83,8 @@ config X86_PTDUMP config EFI_PGT_DUMP bool "Dump the EFI pagetable" - depends on EFI && X86_PTDUMP + depends on EFI + select X86_PTDUMP_CORE ---help--- Enable this if you want to dump the EFI page table before enabling virtual mode. This can be used to debug miscellaneous @@ -105,6 +110,35 @@ config DEBUG_RODATA_TEST feature as well as for the change_page_attr() infrastructure. If in doubt, say "N" +config DEBUG_WX + bool "Warn on W+X mappings at boot" + depends on DEBUG_RODATA + default y + select X86_PTDUMP_CORE + ---help--- + Generate a warning if any W+X mappings are found at boot. + + This is useful for discovering cases where the kernel is leaving + W+X mappings after applying NX, as such mappings are a security risk. + + Look for a message in dmesg output like this: + + x86/mm: Checked W+X mappings: passed, no W+X pages found. + + or like this, if the check failed: + + x86/mm: Checked W+X mappings: FAILED, W+X pages found. + + Note that even if the check fails, your kernel is possibly + still fine, as W+X mappings are not a security hole in + themselves, what they do is that they make the exploitation + of other unfixed kernel bugs easier. + + There is no runtime or memory usage effect of this option + once the kernel has booted up - it's a one time check. + + If in doubt, say "Y". + config DEBUG_SET_MODULE_RONX bool "Set loadable kernel module data as NX and text as RO" depends on MODULES diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c index 175cc72c0f68..87a86e017f0e 100644 --- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c +++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c @@ -14,11 +14,13 @@ */ #undef CONFIG_64BIT #undef CONFIG_X86_64 +#undef CONFIG_PGTABLE_LEVELS #undef CONFIG_ILLEGAL_POINTER_VALUE #undef CONFIG_SPARSEMEM_VMEMMAP #undef CONFIG_NR_CPUS #define CONFIG_X86_32 1
[GIT PULL] x86/mm changes for v4.4
Linus, Please pull the latest x86-mm-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-for-linus # HEAD: e1a58320a38dfa72be48a0f1a3a92273663ba6db x86/mm: Warn on W^X mappings The main changes are: continued PAT work by Toshi Kani, plus a new boot time warning about insecure RWX kernel mappings, by Stephen Smalley. The new CONFIG_DEBUG_WX=y warning is marked default-y if CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as these bugs are hard to notice and this check already found several live bugs. Thanks, Ingo --> Stephen Smalley (1): x86/mm: Warn on W^X mappings Toshi Kani (11): x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO x86/asm: Move PUD_PAGE macros to page_types.h x86/asm: Add pud/pmd mask interfaces to handle large PAT bit x86/asm: Fix pud/pmd interfaces to handle large PAT bit x86/asm: Add pud_pgprot() and pmd_pgprot() x86/mm: Fix page table dump to show PAT bit x86/mm: Fix slow_virt_to_phys() to handle large PAT bit x86/mm: Fix gup_huge_p?d() to handle large PAT bit x86/mm: Fix try_preserve_large_page() to handle large PAT bit x86/mm: Fix __split_large_page() to handle large PAT bit x86/mm: Fix no-change case in try_preserve_large_page() arch/x86/Kconfig.debug | 36 - arch/x86/entry/vdso/vdso32/vclock_gettime.c | 2 + arch/x86/include/asm/page_64_types.h| 3 -- arch/x86/include/asm/page_types.h | 3 ++ arch/x86/include/asm/pgtable.h | 25 ++--- arch/x86/include/asm/pgtable_types.h| 40 -- arch/x86/mm/Makefile| 2 +- arch/x86/mm/dump_pagetables.c | 81 ++--- arch/x86/mm/gup.c | 18 +++ arch/x86/mm/init_32.c | 2 + arch/x86/mm/init_64.c | 2 + arch/x86/mm/pageattr.c | 79 +--- 12 files changed, 219 insertions(+), 74 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d8c0d3266173..3e0baf726eef 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -65,10 +65,14 @@ config EARLY_PRINTK_EFI This is useful for kernel debugging when your machine crashes very early before the console code is initialized. +config X86_PTDUMP_CORE + def_bool n + config X86_PTDUMP bool "Export kernel pagetable layout to userspace via debugfs" depends on DEBUG_KERNEL select DEBUG_FS + select X86_PTDUMP_CORE ---help--- Say Y here if you want to show the kernel pagetable layout in a debugfs file. This information is only useful for kernel developers @@ -79,7 +83,8 @@ config X86_PTDUMP config EFI_PGT_DUMP bool "Dump the EFI pagetable" - depends on EFI && X86_PTDUMP + depends on EFI + select X86_PTDUMP_CORE ---help--- Enable this if you want to dump the EFI page table before enabling virtual mode. This can be used to debug miscellaneous @@ -105,6 +110,35 @@ config DEBUG_RODATA_TEST feature as well as for the change_page_attr() infrastructure. If in doubt, say "N" +config DEBUG_WX + bool "Warn on W+X mappings at boot" + depends on DEBUG_RODATA + default y + select X86_PTDUMP_CORE + ---help--- + Generate a warning if any W+X mappings are found at boot. + + This is useful for discovering cases where the kernel is leaving + W+X mappings after applying NX, as such mappings are a security risk. + + Look for a message in dmesg output like this: + + x86/mm: Checked W+X mappings: passed, no W+X pages found. + + or like this, if the check failed: + + x86/mm: Checked W+X mappings: FAILED, W+X pages found. + + Note that even if the check fails, your kernel is possibly + still fine, as W+X mappings are not a security hole in + themselves, what they do is that they make the exploitation + of other unfixed kernel bugs easier. + + There is no runtime or memory usage effect of this option + once the kernel has booted up - it's a one time check. + + If in doubt, say "Y". + config DEBUG_SET_MODULE_RONX bool "Set loadable kernel module data as NX and text as RO" depends on MODULES diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c index 175cc72c0f68..87a86e017f0e 100644 --- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c +++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c @@ -14,11 +14,13 @@ */ #undef CONFIG_64BIT #undef CONFIG_X86_64 +#undef CONFIG_PGTABLE_LEVELS #undef CONFIG_ILLEGAL_POINTER_VALUE #undef CONFIG_SPARSEMEM_VMEMMAP #undef CONFIG_NR_CPUS #define CONFIG_X86_32 1