Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote:
> > On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu wrote:
> >> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook wrote:
> >>> v2:
> >>> - added call to free_init_pages(), as suggested by tglx
> >>
> >> something is wrong:
> >>
> >> [7.842479] Freeing unused kernel memory: 3844K (82e52000 - 83213000)
> >> [7.843305] Write protecting the kernel read-only data: 28672k
> >
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of &_end, as we only use partial of brk.
> > 2. [_brk_end, pm_end) page range is already converted. aka is not wasted.
>
> Are you sure? For me, _brk_end isn't far enough:

_brk_end is guaranteed to be <= _end. But we really want to use _brk_end
here, because if we have the following situation:

	_brk_start: 0x03ff
	_brk_end:   0x0300
	_end:       0x04016000

So we have the following PMDs setup:

	0x03e0  pmd rw nx  <- covers the top of .bss and the start of .brk
	0x0400  pmd rw nx  <- covers the end of .brk and some random unused

So if _brk_end is less than 0x0400, then cleanup_highmap() has zapped the
extra PMD already, so we don't want to call set_nx() on that. If _brk_end
is >= 0x0400, then we cover that last pmd with the set_nx() call.

Completely non obvious, like anything else in that area.

Thanks,

	tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Mon, Nov 17, 2014 at 12:27 PM, Kees Cook wrote:
> On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner wrote:
>>
>> So the initial patch to get rid of the X mapping is of course to just
>> extend the area to the PMD. A little bit different to your initial
>> patch, but essentially the same.
>>
>> - unsigned long all_end = PFN_ALIGN(&_end);
>> + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>>
>> I'm going to apply your V1 patch with the above roundup()
>> simplification. If a page of that area gets used later on then we are
>> going to split up the PMD anyway.
>
> That's fine by me. Yinghai Lu seems to have potentially better
> solutions, but my head is spinning after reading more of this thread.
> :) I just want to make sure that at the end of the day, there are no
> RW+x mappings.

Please check v3 that cleans up the highmap in the middle.

Before patch:

---[ High Kernel Mapping ]---
0x8000-0x8100        16M                 pmd
0x8100-0x8220        18M ro  PSE GLB x   pmd
0x8220-0x82c0        10M ro  PSE GLB NX  pmd
0x82c0-0x82e0         2M RW  PSE GLB NX  pmd
0x82e0-0x8300         2M RW      GLB NX  pte
0x8300-0x8320         2M RW  PSE GLB NX  pmd
0x8320-0x8340         2M RW      GLB NX  pte
0x8340-0x8420        14M RW  PSE GLB NX  pmd
0x8420-0x843a2000  1672K RW      GLB NX  pte
0x843a2000-0x8440   376K RW      GLB x   pte
0x8440-0xa000       444M                 pmd

After patch:

---[ High Kernel Mapping ]---
0x8000-0x8100        16M                 pmd
0x8100-0x8200        16M ro  PSE GLB x   pmd
0x8200-0x82011000    68K ro      GLB x   pte
0x82011000-0x8220  1980K                 pte
0x8220-0x82a0         8M ro  PSE GLB NX  pmd
0x82a0-0x82a1d000   116K ro      GLB NX  pte
0x82a1d000-0x82c0  1932K                 pte
0x82c0-0x82e0         2M RW  PSE GLB NX  pmd
0x82e0-0x82e52000   328K RW      GLB NX  pte
0x82e52000-0x8300  1720K                 pte
0x8300-0x8320         2M                 pmd
0x8320-0x83213000    76K                 pte
0x83213000-0x8340  1972K RW      GLB NX  pte
0x8340-0x8420        14M RW  PSE GLB NX  pmd
0x8420-0x84383000  1548K RW      GLB NX  pte
0x84383000-0x8440   500K                 pte
0x8440-0xa000       444M                 pmd

Thanks

Yinghai

Subject: [PATCH 1/2] x86, 64bit: add pfn_range_is_highmapped()

will use it to support holes in highmap.
Signed-off-by: Yinghai Lu
---
 arch/x86/include/asm/pgtable_64.h |  2 ++
 arch/x86/mm/init_64.c             | 22 ++++++++++++++++++++++
 arch/x86/mm/pageattr.c            | 16 +---------------
 3 files changed, 25 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,23 @@ void __init init_extra_mapping_uc(unsign
 	__init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }
 
+/* three holes at most */
+#define NR_RANGE 4
+static struct range pfn_highmapped[NR_RANGE];
+static int nr_pfn_highmapped;
+
+int pfn_range_is_highmapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_pfn_highmapped; i++)
+		if ((start_pfn >= pfn_highmapped[i].start) &&
+		    (end_pfn <= pfn_highmapped[i].end))
+			return 1;
+
+	return 0;
+}
+
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -409,6 +426,11 @@ void __init cleanup_highmap(void)
 		if (vaddr < (unsigned long) _text || vaddr > end)
 			set_pmd(pmd, __pmd(0));
 	}
+
+	nr_pfn_highmapped = add_range(pfn_highmapped, NR_RANGE,
+			nr_pfn_highmapped,
+			__pa_symbol(_text) >> PAGE_SHIFT,
+			__pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT);
 }
 
 static unsigned long __meminit
Index: linux-2.6/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -91,20 +91,6 @@ void arch_report_meminfo(struct seq_file
 static inline
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Mon, Nov 17, 2014 at 12:32 PM, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
>> I have a series for arm that is waiting to be picked up by rmk:
>> https://patchwork.ozlabs.org/patch/400383/
>
> It should've been in linux-next via my tree for the last two weeks
> or so.

Fantastic! Thank you!

-Kees

--
Kees Cook
Chrome OS Security
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
> I have a series for arm that is waiting to be picked up by rmk:
> https://patchwork.ozlabs.org/patch/400383/

It should've been in linux-next via my tree for the last two weeks
or so.

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner wrote:
> On Fri, 14 Nov 2014, Kees Cook wrote:
>> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote:
>> > should use attached one instead.
>> >
>> > 1. should use _brk_end instead of &_end, as we only use partial of brk.
>> > 2. [_brk_end, pm_end) page range is already converted. aka is not wasted.
>>
>> Are you sure? For me, _brk_end isn't far enough:
>>
>> [1.475572] all_end: 0x82df5000
>> [1.476736] _brk_end: 0x82dd6000
>
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.
>
> That happened because I missed the completely non obvious fact that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design,
> and is completely undocumented, as is everything else in that area.
>
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> - unsigned long all_end = PFN_ALIGN(&_end);
> + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

That's fine by me. Yinghai Lu seems to have potentially better
solutions, but my head is spinning after reading more of this thread.
:) I just want to make sure that at the end of the day, there are no
RW+x mappings.

> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of __bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.
>
> I spent quite some time to stare into that and I wonder about the
> following related issues:
>
> 1) Why is the mark_rodata_ro() business a debug configuration, i.e.
>    CONFIG_DEBUG_RODATA?
>
>    This does not make any sense at all. We really want RO and NX on by
>    default and AFAICT distros are turning that on anyway for obvious
>    reasons.

This is a historically badly named config item. Once arm and arm64 land
their CONFIG_DEBUG_RODATA implementations, I was going to suggest having
it renamed.

>    The only idiocy I found so far is the kgdb documentation, which
>    recommends turning it off. Sigh.
>
>    So that should be changed to:
>
>	config ARCH_HAS_RONX
>		bool
>
>	config DISABLE_RONX
>		def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES
>
>    plus
>
>	config ARCH_WANTS_KGDB_SECURITY_HOLES
>		bool
>
>	config KGDB_ENABLE_SECURITY_HOLES
>		bool "WTF?"
>		depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES
>
> 2) What is actually the modules counterpart for mark_rodata_ro()?
>
>    CONFIG_DEBUG_SET_MODULE_RONX
>
>    Of course not enabled by default, but enabled by distros again.
>    See #1.
>
>    Now what's interesting aside of the general fuckup is that
>    CONFIG_DEBUG_RODATA is supported by:
>
>	arch/x86 and arch/parisc
>
>    But CONFIG_DEBUG_SET_MODULE_RONX is supported by:
>
>	arch/arm arch/arm64 arch/s390 and arch/x86

I have a series for arm that is waiting to be picked up by rmk:
https://patchwork.ozlabs.org/patch/400383/

Laura has been working on a series for arm64:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/366777

So what you're seeing is us being in the middle of providing support
for it. It just happens that CONFIG_DEBUG_SET_MODULE_RONX is a bit
easier to implement, so it was completed first.

>    This does not make any sense at all.
>
>    Do arm/arm64/s390 have other means to make RO/NX work or are they
>    just doing it for modules? And how is that supposed to work with
>    KGDB if it is not aware of module sections being RO/NX? KGDB has
>    only extra magic for CONFIG_DEBUG_RODATA, but not for
>    CONFIG_DEBUG_SET_MODULE_RONX.
>
>    Now for extended fun the x86 help text for that option says:
>
>	... Such protection may interfere with run-time code
>	patching and dynamic kernel tracing - and they might also protect
>	against certain classes of kernel exploits.
>	If in doubt, say "N".
>
>    Patently wrong. More sigh.

Like the CONFIG naming, I hope to start cleaning these defaults up once
arm and arm64 are landed.

> 3) Why is mark_rodata_ro()
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner wrote:
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.

:)

> That happened because I missed the completely non obvious fact that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.

where is the info for everything else?

> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> - unsigned long all_end = PFN_ALIGN(&_end);
> + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

should we use _brk_end instead of &_end for the roundup?

> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of __bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.

So where does the highmap get removed? free_init_pages() via
free_initmem()?

With the code change like you suggested:

---
 arch/x86/kernel/setup.c | 6 +-----
 arch/x86/mm/init.c      | 6 ++++++
 arch/x86/mm/init_64.c   | 2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -286,10 +286,6 @@ static void __init cleanup_highmap(void)
 
 static void __init reserve_brk(void)
 {
-	if (_brk_end > _brk_start)
-		memblock_reserve(__pa_symbol(_brk_start),
-				 _brk_end - _brk_start);
-
 	/* Mark brk area as locked down and no longer taking any
 	   new allocations */
 	_brk_start = 0;
@@ -857,7 +853,7 @@ dump_kernel_offset(struct notifier_block
 void __init setup_arch(char **cmdline_p)
 {
 	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__bss_stop - (unsigned long)_text);
+			 (unsigned long)_end - (unsigned long)_text);
 
 	early_reserve_initrd();
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1122,7 +1122,7 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -637,9 +637,15 @@ void free_init_pages(char *what, unsigne
 
 void free_initmem(void)
 {
+	unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);
+
 	free_init_pages("unused kernel",
 			(unsigned long)(&__init_begin),
 			(unsigned long)(&__init_end));
+
+#ifdef CONFIG_X86_64
+	free_init_pages("unused brk", PFN_ALIGN(_brk_end), all_end);
+#endif
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD

got

[7.942636] Freeing unused brk memory: 500K (84383000 - 8440)

---[ High Kernel Mapping ]---
0x8000-0x8100        16M                 pmd
0x8100-0x8220        18M ro  PSE GLB x   pmd
0x8220-0x82c0        10M ro  PSE GLB NX  pmd
0x82c0-0x82e0         2M RW  PSE GLB NX  pmd
0x82e0-0x8300         2M RW      GLB NX  pte
0x8300-0x8320         2M RW  PSE GLB NX  pmd
0x8320-0x8340         2M RW      GLB NX  pte
0x8340-0x8420        14M RW  PSE GLB NX
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 1:26 PM, Thomas Gleixner wrote:
> On Sat, 15 Nov 2014, Yinghai Lu wrote:
>> +static pmd_t *last_pmd;
>>  /*
>>   * The head.S code sets up the kernel high mapping:
>>   *
>> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
>>  			continue;
>>  		if (vaddr < (unsigned long) _text || vaddr > end)
>>  			set_pmd(pmd, __pmd(0));
>> +		else
>> +			last_pmd = pmd;
>
> Why do you need to store this? You can compute this.

I'm not quite sure about the xen path.

>> +static void __init cleanup_highmap_tail(unsigned long addr)
>
> Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
> will free exactly that code.

I missed that. Please check this one, which addresses the three problems
you pointed out.

Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of &_end in mark_rodata_ro().
   _brk_end can move up to &_end, i.e. to __brk_limit. It's safe to use
   _brk_end when mark_rodata_ro() is called because extend_brk() is gone
   already at that point.
2. [_brk_end, pm_end) page range is already converted mem, and is not
   wasted.
3. add cleanup_highmap_tail() for [_brk_end, pm_end).

Kernel layout:

[0.00] .brk: [0x0437c000-0x043a1fff]

Actually used brk:

[0.272959] memblock_reserve: [0x000437c000-0x0004382fff] flags 0x0 BRK

Before patch:

---[ High Kernel Mapping ]---
...
0x8340-0x8420        14M RW  PSE GLB NX  pmd
0x8420-0x843a2000  1672K RW      GLB NX  pte
0x843a2000-0x8440   376K RW      GLB x   pte
0x8440-0xa000       444M                 pmd

After patch:

---[ High Kernel Mapping ]---
...
0x8340-0x8420        14M RW  PSE GLB NX  pmd
0x8420-0x84383000  1548K RW      GLB NX  pte
0x84383000-0x8440   500K                 pte
0x8440-0xa000       444M                 pmd

-v2: according to tglx, calculate the pmd position instead of passing
     last_pmd. cleanup_highmap_tail() can not have __init, as it is
     called from mark_rodata_ro(), and mark_rodata_ro() is called after
     free_initmem(). highmap_end_pfn() should keep PMD_SIZE alignment on
     !CONFIG_DEBUG_RODATA.

Signed-off-by: Yinghai Lu
---
 arch/x86/mm/init_64.c  | 22 +-
 arch/x86/mm/pageattr.c |  4
 2 files changed, 25 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -411,6 +411,23 @@ void __init cleanup_highmap(void)
 	}
 }
 
+static void cleanup_highmap_tail(unsigned long addr)
+{
+	int i;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_offset_k(addr);
+	pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+	pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+	pte = (pte_t *)pmd_page_vaddr(*pmd) + pte_index(addr);
+
+	for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++)
+		set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
 	      pgprot_t prot)
@@ -1124,7 +1141,8 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long all_end = PFN_ALIGN(_brk_end);
+	unsigned long pmd_end = roundup(all_end, PMD_SIZE);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
@@ -1137,6 +1155,8 @@ void mark_rodata_ro(void)
 	 * should also be not-executable.
 	 */
 	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	if (all_end < pmd_end)
+		cleanup_highmap_tail(all_end);
 
 	rodata_test();
 
Index: linux-2.6/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -100,7 +100,11 @@ static inline unsigned long highmap_star
 
 static inline unsigned long highmap_end_pfn(void)
 {
+#ifdef CONFIG_DEBUG_RODATA
+	return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
+#else
 	return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
+#endif
 }
 #endif
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 10:52 AM, Thomas Gleixner wrote:
>>> Are you sure? For me, _brk_end isn't far enough:
>>>
>>> [1.475572] all_end: 0x82df5000
>>> [1.476736] _brk_end: 0x82dd6000
>>
>> Yes. _brk_end should be smaller than &_end.
>
> Wrong. _brk_end can move up to _end, i.e. to __brk_limit.

Confused. What is wrong? Is _brk_end not smaller than &_end?

Yinghai
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, 14 Nov 2014, Kees Cook wrote: > On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote: > > should use attached one instead. > > > > 1. should use _brk_end instead of , as we only use partial of > >brk. > > 2. [_brk_end, pm_end) page range is already converted. aka > >is not wasted. > > Are you sure? For me, _brk_end isn't far enough: > > [1.475572] all_end: 0x82df5000 > [1.476736] _brk_end: 0x82dd6000 _brk_end is adjusted at boot time via extend_brk() up to __brk_limit, which is the same as _end. We usually do not use all of that space. So it's expected that _brk_end < _end. > Is this correct? It sounded like tglx wanted the pmd split, like this: Yes, I wanted to get rid of the high mapping for anything between _brk_end and _end, and I brought you on the wrong track with my suggestion to call free_init_pages(). Sorry about that. That happened because I missed the completely non obvious fact, that only the effective brk section is reserved for the kernel via reserve_brk(). So the area between _brk_end and _end is already reusable. Though that reuse works only by chance and not by design and is completely undocumented as everything else in that area. So the initial patch to get rid of the X mapping is of course to just extend the area to the PMD. A little bit different to your initial patch, but essentially the same. - unsigned long all_end = PFN_ALIGN(&_end); + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE); I'm going to apply your V1 patch with the above roundup() simplification. If a page of that area gets used later on then we are going to split up the PMD anyway. But still we want to get rid of that highmap between _brk_end and _end, but there is absolutely no reason to come up with extra silly functions for that. So the obvious solution is to let setup_arch() reserve the memory up to _end instead of _bss_stop, get rid of the extra reservation in reserve_brk() and then let free_initmem() release the area between _brk_end and _end. 
No extra hackery, no side effects, just works. I spent quite some time to stare into that and I wonder about the following related issues: 1) Why is the mark_rodata_ro() business a debug configuration, i.e. CONFIG_DEBUG_RODATA? This does not make any sense at all. We really want RO and NX on by default and AFAICT distros are turning that on anyway for obvious reasons. The only idiocy I found so far is the kgdb Documentation which recommends to turn it off. Sigh. So that should be changed to: config ARCH_HAS_RONX bool config DISABLE_RONX def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES plus config ARCH_WANTS_KGDB_SECURITY_HOLES bool config KGDB_ENABLE_SECURITY_HOLES bool "WTF?" depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES 2) What is actually the modules counterpart for mark_rodata_ro()? CONFIG_DEBUG_SET_MODULE_RONX Of course not enabled by default, but enabled by distros again. See #1. Now what's interesting aside of the general fuckup is that CONFIG_DEBUG_RODATA is supported by: arch/x86 and arch/parisc But CONFIG_DEBUG_SET_MODULE_RONX is supported by: arch/arm/ arch/arm64 arch/s390 and arch/x86 This does not make any sense at all. Do arm/arm64/s390 have other means to make RO/NX work or are they just doing it for modules? And how is that supposed to work with KGDB if it is not aware of modules sections being RO/NX? KGDB has only extra magic for CONFIG_DEBUG_RODATA, but not for CONFIG_DEBUG_SET_MODULE_RONX. Now for extended fun the x86 help text for that option says: ... Such protection may interfere with run-time code patching and dynamic kernel tracing - and they might also protect against certain classes of kernel exploits. If in doubt, say "N". Patently wrong. More sigh. 3) Why is mark_rodata_ro() called AFTER free_initmem() and therefore cannot be marked __init? Just because ... 
Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sat, 15 Nov 2014, Yinghai Lu wrote: > +static pmd_t *last_pmd; > /* > * The head.S code sets up the kernel high mapping: > * > @@ -408,9 +409,26 @@ void __init cleanup_highmap(void) > continue; > if (vaddr < (unsigned long) _text || vaddr > end) > set_pmd(pmd, __pmd(0)); > +else > +last_pmd = pmd; Why do you need to store this? You can compute this. > +static void __init cleanup_highmap_tail(unsigned long addr) Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which will free exactly that code. > +{ > +int i; > +pte_t *pte; > + > +if (!last_pmd || pmd_none(*last_pmd)) > +return; Aside of the above: This is complete and utter crap. Just another useless function which can be avoided by switching on brain before coding. > Index: linux-2.6/arch/x86/mm/pageattr.c > === > --- linux-2.6.orig/arch/x86/mm/pageattr.c > +++ linux-2.6/arch/x86/mm/pageattr.c > @@ -100,7 +100,7 @@ static inline unsigned long highmap_star > > static inline unsigned long highmap_end_pfn(void) > { > -return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT; > +return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT; > } Wrong as well for CONFIG_DEBUG_RODATA=n for obvious reasons. I know why I have your patches on blacklist Thanks, tglx
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, 14 Nov 2014, Yinghai Lu wrote: > On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook wrote: > > On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote: > > >> should use attached one instead. > >> > >> 1. should use _brk_end instead of &_end, as we only use partial of > >> brk. > >> 2. [_brk_end, pm_end) page range is already converted. aka > >> is not wasted. > > > > Are you sure? For me, _brk_end isn't far enough: > > > > [1.475572] all_end: 0x82df5000 > > [1.476736] _brk_end: 0x82dd6000 > > Yes. _brk_end should be smaller than &_end. Wrong. _brk_end can move up to _end, i.e. to __brk_limit. But it's safe to use _brk_end when mark_rodata_ro() is called because extend_brk() is gone already at that point. Thanks, tglx
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 10:52 AM, Thomas Gleixner <t...@linutronix.de> wrote: > > > Are you sure? For me, _brk_end isn't far enough: > > > > > > [1.475572] all_end: 0x82df5000 > > > [1.476736] _brk_end: 0x82dd6000 > > Yes. _brk_end should be smaller than &_end. > Wrong. _brk_end can move up to _end, i.e. to __brk_limit. Confused. What is wrong? _brk_end is not smaller than _end? Yinghai
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 1:26 PM, Thomas Gleixner <t...@linutronix.de> wrote: > On Sat, 15 Nov 2014, Yinghai Lu wrote: >> +static pmd_t *last_pmd; >> /* >> * The head.S code sets up the kernel high mapping: >> * >> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void) >> continue; >> if (vaddr < (unsigned long) _text || vaddr > end) >> set_pmd(pmd, __pmd(0)); >> +else >> +last_pmd = pmd; > Why do you need to store this? You can compute this. I'm not quite sure about the xen path. >> +static void __init cleanup_highmap_tail(unsigned long addr) > Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which will free exactly that code. I missed that. Please check this one that addresses the three problems that you pointed out. Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range 1. should use _brk_end instead of _end in mark_rodata_ro(). _brk_end can move up to _end, i.e. to __brk_limit. It's safe to use _brk_end when mark_rodata_ro() is called because extend_brk() is gone already at that point. 2. [_brk_end, pm_end) page range is already converted mem. and is not wasted. 3. add cleanup_highmap_tail for [_brk_end, pm_end). Kernel Layout: [0.00].brk: [0x0437c000-0x043a1fff] Actually used brk: [0.272959] memblock_reserve: [0x000437c000-0x0004382fff] flags 0x0 BRK Before patch: ---[ High Kernel Mapping ]--- ... 0x8340-0x8420 14M RW PSE GLB NX pmd 0x8420-0x843a2000 1672K RW GLB NX pte 0x843a2000-0x8440 376K RW GLB x pte 0x8440-0xa000 444M pmd After patch: ---[ High Kernel Mapping ]--- ... 0x8340-0x8420 14M RW PSE GLB NX pmd 0x8420-0x84383000 1548K RW GLB NX pte 0x84383000-0x8440 500K pte 0x8440-0xa000 444M pmd -v2: according to tglx, calculate the pmd position instead of passing last_pmd. cleanup_highmap_tail could not have __init, as it is called in mark_rodata_ro and mark_rodata_ro is called after free_initmem. 
highmap_end_pfn should keep PMD_SIZE alignment on !CONFIG_DEBUG_RODATA Signed-off-by: Yinghai Lu <ying...@kernel.org> --- arch/x86/mm/init_64.c | 22 +- arch/x86/mm/pageattr.c | 4 2 files changed, 25 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86/mm/init_64.c === --- linux-2.6.orig/arch/x86/mm/init_64.c +++ linux-2.6/arch/x86/mm/init_64.c @@ -411,6 +411,23 @@ void __init cleanup_highmap(void) } } +static void cleanup_highmap_tail(unsigned long addr) +{ +int i; +pgd_t *pgd; +pud_t *pud; +pmd_t *pmd; +pte_t *pte; + +pgd = pgd_offset_k(addr); +pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr); +pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr); +pte = (pte_t *)pmd_page_vaddr(*pmd) + pte_index(addr); + +for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++) +set_pte(pte, __pte(0)); +} + static unsigned long __meminit phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end, pgprot_t prot) @@ -1124,7 +1141,8 @@ void mark_rodata_ro(void) unsigned long end = (unsigned long) &__end_rodata_hpage_align; unsigned long text_end = PFN_ALIGN(&__stop___ex_table); unsigned long rodata_end = PFN_ALIGN(&__end_rodata); -unsigned long all_end = PFN_ALIGN(&_end); +unsigned long all_end = PFN_ALIGN(_brk_end); +unsigned long pmd_end = roundup(all_end, PMD_SIZE); printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n", (end - start) >> 10); @@ -1137,6 +1155,8 @@ void mark_rodata_ro(void) * should also be not-executable. 
*/ set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT); +if (all_end < pmd_end) +cleanup_highmap_tail(all_end); rodata_test(); Index: linux-2.6/arch/x86/mm/pageattr.c === --- linux-2.6.orig/arch/x86/mm/pageattr.c +++ linux-2.6/arch/x86/mm/pageattr.c @@ -100,7 +100,11 @@ static inline unsigned long highmap_star static inline unsigned long highmap_end_pfn(void) { +#ifdef CONFIG_DEBUG_RODATA +return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT; +#else return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT; +#endif } #endif
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner <t...@linutronix.de> wrote: > _brk_end is adjusted at boot time via extend_brk() up to __brk_limit, which is the same as _end. We usually do not use all of that space. So it's expected that _brk_end < _end. >> Is this correct? It sounded like tglx wanted the pmd split, like this: > Yes, I wanted to get rid of the high mapping for anything between _brk_end and _end, and I brought you on the wrong track with my suggestion to call free_init_pages(). Sorry about that. :) > That happened because I missed the completely non obvious fact, that only the effective brk section is reserved for the kernel via reserve_brk(). So the area between _brk_end and _end is already reusable. Though that reuse works only by chance and not by design and is completely undocumented as everything else in that area. where is info for everything else? > So the initial patch to get rid of the X mapping is of course to just extend the area to the PMD. A little bit different to your initial patch, but essentially the same. > > - unsigned long all_end = PFN_ALIGN(&_end); > + unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE); > > I'm going to apply your V1 patch with the above roundup() simplification. If a page of that area gets used later on then we are going to split up the PMD anyway. should use _brk_end instead of &_end for the roundup? > But still we want to get rid of that highmap between _brk_end and _end, but there is absolutely no reason to come up with extra silly functions for that. So the obvious solution is to let setup_arch() reserve the memory up to _end instead of __bss_stop, get rid of the extra reservation in reserve_brk() and then let free_initmem() release the area between _brk_end and _end. > No extra hackery, no side effects, just works. So where does the highmap get removed? free_init_pages via free_initmem()? 
with the code change like you suggested, --- arch/x86/kernel/setup.c | 6 +- arch/x86/mm/init.c | 6 ++ arch/x86/mm/init_64.c | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) Index: linux-2.6/arch/x86/kernel/setup.c === --- linux-2.6.orig/arch/x86/kernel/setup.c +++ linux-2.6/arch/x86/kernel/setup.c @@ -286,10 +286,6 @@ static void __init cleanup_highmap(void) static void __init reserve_brk(void) { -if (_brk_end > _brk_start) -memblock_reserve(__pa_symbol(_brk_start), - _brk_end - _brk_start); - /* Mark brk area as locked down and no longer taking any new allocations */ _brk_start = 0; @@ -857,7 +853,7 @@ dump_kernel_offset(struct notifier_block void __init setup_arch(char **cmdline_p) { memblock_reserve(__pa_symbol(_text), - (unsigned long)__bss_stop - (unsigned long)_text); + (unsigned long)_end - (unsigned long)_text); early_reserve_initrd(); Index: linux-2.6/arch/x86/mm/init_64.c === --- linux-2.6.orig/arch/x86/mm/init_64.c +++ linux-2.6/arch/x86/mm/init_64.c @@ -1122,7 +1122,7 @@ void mark_rodata_ro(void) unsigned long end = (unsigned long) &__end_rodata_hpage_align; unsigned long text_end = PFN_ALIGN(&__stop___ex_table); unsigned long rodata_end = PFN_ALIGN(&__end_rodata); -unsigned long all_end = PFN_ALIGN(&_end); +unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE); printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n", (end - start) >> 10); Index: linux-2.6/arch/x86/mm/init.c === --- linux-2.6.orig/arch/x86/mm/init.c +++ linux-2.6/arch/x86/mm/init.c @@ -637,9 +637,15 @@ void free_init_pages(char *what, unsigne void free_initmem(void) { +unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE); + free_init_pages("unused kernel", (unsigned long)(&__init_begin), (unsigned long)(&__init_end)); + +#ifdef CONFIG_X86_64 +free_init_pages("unused brk", PFN_ALIGN(_brk_end), all_end); +#endif } #ifdef CONFIG_BLK_DEV_INITRD got [7.942636] Freeing unused brk memory: 500K (84383000 - 8440) ---[ High Kernel Mapping ]--- 0x8000-0x8100 16M pmd 
0x8100-0x8220 18M ro PSE GLB x pmd 0x8220-0x82c0 10M ro PSE GLB NX pmd 0x82c0-0x82e0 2M RW PSE GLB NX pmd 0x82e0-0x8300 2M RW GLB NX pte 0x8300-0x8320 2M RW PSE GLB NX pmd 0x8320-0x8340 2M RW GLB NX pte 0x8340-0x8420 14M RW PSE GLB NX pmd
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 7:38 PM, Yinghai Lu wrote: > On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook wrote: >> Is this correct? It sounded like tglx wanted the pmd split, like this: >> >> 0x8220-0x82c010M RW PSE GLB NX pmd >> 0x82c0-0x82df5000 2004K RW GLB NX pte >> 0x82df5000-0x82e044K RW NX pte >> 0x82e0-0xc000 978M pmd > > Need to remove GLB ? Please check attached one that clean up the highmap tail... Subject: [RFC PATCH] x86, 64bit: cleanup highmap tail near partial 2M range 1. should use _brk_end instead of , as we only use partial of brk. 2. [_brk_end, pm_end) page range is already converted mem. and is not wasted. 3. add cleanup_highmap_tail for [_brk_end, pm_end). Kernel Layout: [0.00] .text: [0x0100-0x0200d5c8] [0.00] .rodata: [0x0220-0x02a1cfff] [0.00] .data: [0x02c0-0x02e50e7f] [0.00] .init: [0x02e52000-0x03212fff] [0.00].bss: [0x03221000-0x0437bfff] [0.00].brk: [0x0437c000-0x043a1fff] Actually used brk: [0.272959] memblock_reserve: [0x000437c000-0x0004382fff] flags 0x0 BRK Before patch: ---[ High Kernel Mapping ]--- 0x8000-0x8100 16M pmd 0x8100-0x8220 18M ro PSE GLB x pmd 0x8220-0x82c0 10M ro PSE GLB NX pmd 0x82c0-0x82e0 2M RW PSE GLB NX pmd 0x82e0-0x8300 2M RW GLB NX pte 0x8300-0x8320 2M RW PSE GLB NX pmd 0x8320-0x8340 2M RW GLB NX pte 0x8340-0x8420 14M RW PSE GLB NX pmd 0x8420-0x843a20001672K RW GLB NX pte 0x843a2000-0x8440 376K RW GLB x pte 0x8440-0xa000 444M pmd After patch: ---[ High Kernel Mapping ]--- 0x8000-0x8100 16M pmd 0x8100-0x8220 18M ro PSE GLB x pmd 0x8220-0x82c0 10M ro PSE GLB NX pmd 0x82c0-0x82e0 2M RW PSE GLB NX pmd 0x82e0-0x8300 2M RW GLB NX pte 0x8300-0x8320 2M RW PSE GLB NX pmd 0x8320-0x8340 2M RW GLB NX pte 0x8340-0x8420 14M RW PSE GLB NX pmd 0x8420-0x843830001548K RW GLB NX pte 0x84383000-0x8440 500K pte 0x8440-0xa000 444M pmd Signed-off-by: Yinghai Lu --- arch/x86/mm/init_64.c | 23 ++- arch/x86/mm/pageattr.c |2 +- 2 files changed, 23 insertions(+), 2 deletions(-) Index: linux-2.6/arch/x86/mm/init_64.c === --- 
linux-2.6.orig/arch/x86/mm/init_64.c +++ linux-2.6/arch/x86/mm/init_64.c @@ -375,6 +375,7 @@ void __init init_extra_mapping_uc(unsign __init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE); } +static pmd_t *last_pmd; /* * The head.S code sets up the kernel high mapping: * @@ -408,9 +409,26 @@ void __init cleanup_highmap(void) continue; if (vaddr < (unsigned long) _text || vaddr > end) set_pmd(pmd, __pmd(0)); +else +last_pmd = pmd; } } +static void __init cleanup_highmap_tail(unsigned long addr) +{ +int i; +pte_t *pte; + +if (!last_pmd || pmd_none(*last_pmd)) +return; + +pte = (pte_t *)pmd_page_vaddr(*last_pmd); +pte += pte_index(addr); + +for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++) +set_pte(pte, __pte(0)); +} + static unsigned long __meminit phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end, pgprot_t prot) @@ -1124,7 +1142,8 @@ void mark_rodata_ro(void) unsigned long end = (unsigned long) &__end_rodata_hpage_align; unsigned long text_end = PFN_ALIGN(&__stop___ex_table); unsigned long rodata_end = PFN_ALIGN(&__end_rodata); -unsigned long all_end = PFN_ALIGN(&_end); +unsigned long all_end = PFN_ALIGN(_brk_end); +unsigned long pmd_end = roundup(all_end, PMD_SIZE); printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n", (end - start) >> 10); @@ -1137,6 +1156,8 @@ void mark_rodata_ro(void) * should also be not-executable. */
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook wrote: > On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote: >> should use attached one instead. >> >> 1. should use _brk_end instead of &_end, as we only use partial of >> brk. >> 2. [_brk_end, pm_end) page range is already converted. aka >> is not wasted. > > Are you sure? For me, _brk_end isn't far enough: > > [1.475572] all_end: 0x82df5000 > [1.476736] _brk_end: 0x82dd6000 Yes. _brk_end should be smaller than &_end. > >> >> --- >> arch/x86/mm/init_64.c |6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> Index: linux-2.6/arch/x86/mm/init_64.c >> === >> --- linux-2.6.orig/arch/x86/mm/init_64.c >> +++ linux-2.6/arch/x86/mm/init_64.c >> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void) >> unsigned long end = (unsigned long) &__end_rodata_hpage_align; >> unsigned long text_end = PFN_ALIGN(&__stop___ex_table); >> unsigned long rodata_end = PFN_ALIGN(&__end_rodata); >> -unsigned long all_end = PFN_ALIGN(&_end); >> +unsigned long all_end = PFN_ALIGN(_brk_end); >> +unsigned long pmd_end = roundup(all_end, PMD_SIZE); >> >> printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n", >> (end - start) >> 10); >> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void) >> * The rodata/data/bss/brk section (but not the kernel text!) >> * should also be not-executable. >> */ >> -set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT); >> +set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT); >> >> rodata_test(); >> >> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void) >> set_memory_ro(start, (end-start) >> PAGE_SHIFT); >> #endif >> >> +/* all_end to pmd_end is handled via free_all_bootmem() */ >> free_init_pages("unused kernel", >> (unsigned long) __va(__pa_symbol(text_end)), >> (unsigned long) __va(__pa_symbol(rodata_start))); > > This patch produces the same results as my v1 patch: > > 0x8202d000-0x8220 1868K RW GLB NX pte > 0x8220-0x82e012M RW PSE GLB NX pmd > 0x82e0-0xc000 978M pmd > > Is this correct? 
It sounded like tglx wanted the pmd split, like this: > > 0x8220-0x82c010M RW PSE GLB NX pmd > 0x82c0-0x82df5000 2004K RW GLB NX pte > 0x82df5000-0x82e044K RW NX pte > 0x82e0-0xc000 978M pmd Need to remove GLB ? Yinghai
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 7:06 PM, Kees Cook wrote: > On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu wrote: >> >> [0.00] .text: [0x0100-0x0200d548] >> [0.00] .rodata: [0x0220-0x02a1cfff] >> [0.00] .data: [0x02c0-0x02e50e7f] >> [0.00] .init: [0x02e52000-0x03212fff] >> [0.00].bss: [0x03221000-0x0437bfff] >> [0.00].brk: [0x0437c000-0x043a1fff] > > And which CONFIG turns on this reporting? > My local patch. This brk does not shrink yet. Yinghai
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook wrote:
>> When setting up permissions on kernel memory at boot, the end of the
>> PMD that was split from bss remained executable. It should be NX like
>> the rest. This performs a PMD alignment instead of a PAGE alignment to
>> get the correct span of memory, and should be freed.
>>
>> Before:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0x8202d000-0x8220  1868K RW GLB NX pte
>> 0x8220-0x82c0        10M RW PSE GLB NX pmd
>> 0x82c0-0x82df5000  2004K RW GLB NX pte
>> 0x82df5000-0x82e0    44K RW GLB x pte
>> 0x82e0-0xc000       978M pmd
>>
>> After:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0x8202d000-0x8220  1868K RW GLB NX pte
>> 0x8220-0x82c0        10M RW PSE GLB NX pmd
>> 0x82c0-0x82df5000  2004K RW GLB NX pte
>> 0x82df5000-0x82e0    44K RW NX pte
>> 0x82e0-0xc000       978M pmd
>>
>> Signed-off-by: Kees Cook
>> ---
>> v2:
>> - added call to free_init_pages(), as suggested by tglx
>> ---
>>  arch/x86/mm/init_64.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 4cb8763868fc..0d498c922668 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
>>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>>         unsigned long all_end = PFN_ALIGN(&_end);
>> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>>                (end - start) >> 10);
>> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>>          * The rodata/data/bss/brk section (but not the kernel text!)
>>          * should also be not-executable.
>>          */
>> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>>         rodata_test();
>>
>> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
>>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +       free_init_pages("unused kernel", all_end, pmd_end);
>>         free_init_pages("unused kernel",
>>                         (unsigned long) __va(__pa_symbol(text_end)),
>>                         (unsigned long) __va(__pa_symbol(rodata_start)));
>
> something is wrong:
>
> [7.842479] Freeing unused kernel memory: 3844K (82e52000 - 83213000)
> [7.843305] Write protecting the kernel read-only data: 28672k
> [7.844433] BUG: Bad page state in process swapper/0 pfn:043c0
> [7.845093] page:ea10f000 count:0 mapcount:-127 mapping: (null) index:0x2
> [7.846388] flags: 0x10()
> [7.846871] page dumped because: nonzero mapcount
> [7.847343] Modules linked in:
> [7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4-yh-01896-g40204c8-dirty #23
> [7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [7.850014]  828300ca 880078babd68 81ff47d0 0001
> [7.850857]  ea10f000 880078babd98 8118c2bd 001d4cc0
> [7.851791]  ea10f000 ea10f000 880078babdf8
> [7.852700] Call Trace:
> [7.852991]  [] dump_stack+0x45/0x57
> [7.853494]  [] bad_page+0xfd/0x130
> [7.854130]  [] free_pages_prepare+0x13c/0x1c0
> [7.854808]  [] ? adjust_managed_page_count+0x5d/0x70
> [7.855575]  [] free_hot_cold_page+0x35/0x180
> [7.856326]  [] __free_pages+0x13/0x40
> [7.856854]  [] free_reserved_area+0xcd/0x140
> [7.857442]  [] free_init_pages+0x98/0xb0
> [7.858001]  [] mark_rodata_ro+0xb5/0x120
> [7.858622]  [] ? rest_init+0xc0/0xc0
> [7.859174]  [] kernel_init+0x1d/0x100
> [7.859724]  [] ret_from_fork+0x7c/0xb0
> [7.860279]  [] ? rest_init+0xc0/0xc0
> [7.860836] Disabling lock debugging due to kernel taint
> [7.861432] Freeing unused kernel memory: 376K (843a2000 - 8440)
> [7.866118] Freeing unused kernel memory: 1980K (880002011000 - 88000220)
> [7.870525] Freeing unused kernel memory: 1932K (880002a1d000 - 880002c0)

Also, what tree is this? "Freeing %s" went away in
c88442ec45f30d587b38b935a14acde4e217a926 (and should probably be
re-added, which is what I assume has happened.)

> [0.00]   .text: [0x0100-0x0200d548]
> [0.00] .rodata:
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote:
> On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu wrote:
>> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook wrote:
>>> v2:
>>> - added call to free_init_pages(), as suggested by tglx
>
>> something is wrong:
>>
>> [7.842479] Freeing unused kernel memory: 3844K (82e52000 - 83213000)
>> [7.843305] Write protecting the kernel read-only data: 28672k
>
> should use attached one instead.
>
> 1. should use _brk_end instead of _end, as we only use partial of brk.
> 2. [_brk_end, pmd_end) page range is already converted, aka it is not wasted.

Are you sure? For me, _brk_end isn't far enough:

[1.475572] all_end: 0x82df5000
[1.476736] _brk_end: 0x82dd6000

> ---
>  arch/x86/mm/init_64.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/init_64.c
> +++ linux-2.6/arch/x86/mm/init_64.c
> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>         unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
> -       unsigned long all_end = PFN_ALIGN(&_end);
> +       unsigned long all_end = PFN_ALIGN(_brk_end);
> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>                (end - start) >> 10);
> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>          * The rodata/data/bss/brk section (but not the kernel text!)
>          * should also be not-executable.
>          */
> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
>         rodata_test();
>
> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +       /* all_end to pmd_end is handled via free_all_bootmem() */
>         free_init_pages("unused kernel",
>                         (unsigned long) __va(__pa_symbol(text_end)),
>                         (unsigned long) __va(__pa_symbol(rodata_start)));

This patch produces the same results as my v1 patch:

0x8202d000-0x8220  1868K RW GLB NX pte
0x8220-0x82e0        12M RW PSE GLB NX pmd
0x82e0-0xc000       978M pmd

Is this correct? It sounded like tglx wanted the pmd split, like this:

0x8220-0x82c0        10M RW PSE GLB NX pmd
0x82c0-0x82df5000  2004K RW GLB NX pte
0x82df5000-0x82e0    44K RW NX pte
0x82e0-0xc000       978M pmd

-Kees

--
Kees Cook
Chrome OS Security
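For reference, the alignment arithmetic behind these two outcomes can be sketched outside the kernel. The snippet below is not from the thread; it assumes x86-64 constants (4 KiB pages, 2 MiB PMDs) and uses the low 32 bits of the truncated addresses quoted above as illustrative values:

```python
# Sketch of the PMD rounding discussed in this thread, assuming x86-64
# constants: 4 KiB pages (PAGE_SHIFT = 12) and 2 MiB PMDs.
PAGE_SHIFT = 12
PMD_SIZE = 2 << 20  # 2 MiB

def roundup(x, size):
    """Mirror of the kernel's roundup() for power-of-two sizes."""
    return (x + size - 1) & ~(size - 1)

# Low 32 bits of PFN_ALIGN(&_end) from the dmesg quote above.
all_end = 0x82df5000
pmd_end = roundup(all_end, PMD_SIZE)
print(hex(pmd_end))  # 0x82e00000: end of the PMD that was split from bss

# The executable tail the patch aims to cover: 11 pages, i.e. the
# 44K "RW GLB x" pte run in the "Before" mapping dump.
print((pmd_end - all_end) >> PAGE_SHIFT)  # 11
```

With `all_end` already past the last 2 MiB boundary, `roundup()` extends the `set_memory_nx()` range over the remainder of that split PMD, which is exactly the 44K difference between the v1 and v2 page counts.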
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook wrote:
>> v2:
>> - added call to free_init_pages(), as suggested by tglx
>
> something is wrong:
>
> [7.842479] Freeing unused kernel memory: 3844K (82e52000 - 83213000)
> [7.843305] Write protecting the kernel read-only data: 28672k

should use attached one instead.

1. should use _brk_end instead of _end, as we only use partial of brk.
2. [_brk_end, pmd_end) page range is already converted, aka it is not wasted.

---
 arch/x86/mm/init_64.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
        unsigned long end = (unsigned long) &__end_rodata_hpage_align;
        unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
        unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-       unsigned long all_end = PFN_ALIGN(&_end);
+       unsigned long all_end = PFN_ALIGN(_brk_end);
+       unsigned long pmd_end = roundup(all_end, PMD_SIZE);

        printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
               (end - start) >> 10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
         * The rodata/data/bss/brk section (but not the kernel text!)
         * should also be not-executable.
         */
-       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);

        rodata_test();

@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
        set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif

+       /* all_end to pmd_end is handled via free_all_bootmem() */
        free_init_pages("unused kernel",
                        (unsigned long) __va(__pa_symbol(text_end)),
                        (unsigned long) __va(__pa_symbol(rodata_start)));
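Whether the rounded-up range is safe to touch hinges on where the chosen end address falls relative to the last 2 MiB boundary, which is the case split tglx walks through earlier in the thread. The sketch below is hypothetical; the boundary and end addresses are illustrative values, not measured ones:

```python
# Hypothetical sketch of the two cases tglx describes: if the end
# address has crossed into the final 2 MiB PMD, rounding it up covers
# that PMD's executable tail; if it stops short of the boundary, the
# PMD beyond it was already zapped at boot and must be left alone.
PMD_SIZE = 2 << 20  # 2 MiB

def nx_end(end_addr):
    """Round end_addr up to the next PMD boundary (kernel roundup())."""
    return (end_addr + PMD_SIZE - 1) & ~(PMD_SIZE - 1)

PMD_BOUNDARY = 0x04000000  # illustrative 2 MiB-aligned boundary

# Case 1: end is past the boundary -> rounding extends over the split PMD.
print(hex(nx_end(0x04016000)))  # 0x4200000

# Case 2: end stops short of the boundary -> rounding stays at it, so
# the (already unmapped) PMD above is never passed to set_memory_nx().
print(hex(nx_end(0x03ff0000)))  # 0x4000000
```

This is why the choice between `_end` and `_brk_end` matters: they can land on opposite sides of the same boundary.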
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook wrote:
> When setting up permissions on kernel memory at boot, the end of the
> PMD that was split from bss remained executable. It should be NX like
> the rest. This performs a PMD alignment instead of a PAGE alignment to
> get the correct span of memory, and should be freed.
>
> Before:
> ---[ High Kernel Mapping ]---
> ...
> 0x8202d000-0x8220  1868K RW GLB NX pte
> 0x8220-0x82c0        10M RW PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW GLB NX pte
> 0x82df5000-0x82e0    44K RW GLB x pte
> 0x82e0-0xc000       978M pmd
>
> After:
> ---[ High Kernel Mapping ]---
> ...
> 0x8202d000-0x8220  1868K RW GLB NX pte
> 0x8220-0x82c0        10M RW PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW GLB NX pte
> 0x82df5000-0x82e0    44K RW NX pte
> 0x82e0-0xc000       978M pmd
>
> Signed-off-by: Kees Cook
> ---
> v2:
> - added call to free_init_pages(), as suggested by tglx
> ---
>  arch/x86/mm/init_64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4cb8763868fc..0d498c922668 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>         unsigned long all_end = PFN_ALIGN(&_end);
> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>                (end - start) >> 10);
> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>          * The rodata/data/bss/brk section (but not the kernel text!)
>          * should also be not-executable.
>          */
> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
>         rodata_test();
>
> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +       free_init_pages("unused kernel", all_end, pmd_end);
>         free_init_pages("unused kernel",
>                         (unsigned long) __va(__pa_symbol(text_end)),
>                         (unsigned long) __va(__pa_symbol(rodata_start)));

something is wrong:

[7.842479] Freeing unused kernel memory: 3844K (82e52000 - 83213000)
[7.843305] Write protecting the kernel read-only data: 28672k
[7.844433] BUG: Bad page state in process swapper/0 pfn:043c0
[7.845093] page:ea10f000 count:0 mapcount:-127 mapping: (null) index:0x2
[7.846388] flags: 0x10()
[7.846871] page dumped because: nonzero mapcount
[7.847343] Modules linked in:
[7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4-yh-01896-g40204c8-dirty #23
[7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[7.850014]  828300ca 880078babd68 81ff47d0 0001
[7.850857]  ea10f000 880078babd98 8118c2bd 001d4cc0
[7.851791]  ea10f000 ea10f000 880078babdf8
[7.852700] Call Trace:
[7.852991]  [] dump_stack+0x45/0x57
[7.853494]  [] bad_page+0xfd/0x130
[7.854130]  [] free_pages_prepare+0x13c/0x1c0
[7.854808]  [] ? adjust_managed_page_count+0x5d/0x70
[7.855575]  [] free_hot_cold_page+0x35/0x180
[7.856326]  [] __free_pages+0x13/0x40
[7.856854]  [] free_reserved_area+0xcd/0x140
[7.857442]  [] free_init_pages+0x98/0xb0
[7.858001]  [] mark_rodata_ro+0xb5/0x120
[7.858622]  [] ? rest_init+0xc0/0xc0
[7.859174]  [] kernel_init+0x1d/0x100
[7.859724]  [] ret_from_fork+0x7c/0xb0
[7.860279]  [] ? rest_init+0xc0/0xc0
[7.860836] Disabling lock debugging due to kernel taint
[7.861432] Freeing unused kernel memory: 376K (843a2000 - 8440)
[7.866118] Freeing unused kernel memory: 1980K (880002011000 - 88000220)
[7.870525] Freeing unused kernel memory: 1932K (880002a1d000 - 880002c0)

[0.00]   .text: [0x0100-0x0200d548]
[0.00] .rodata: [0x0220-0x02a1cfff]
[0.00]   .data: [0x02c0-0x02e50e7f]
[0.00]   .init: [0x02e52000-0x03212fff]
[0.00]    .bss: [0x03221000-0x0437bfff]
[0.00]    .brk: [0x0437c000-0x043a1fff]
Re: [PATCH v2] x86, mm: set NX across entire PMD at boot
On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu wrote:
>> should use attached one instead.
>>
>> 1. should use _brk_end instead of _end, as we only use partial of brk.
>> 2. [_brk_end, pmd_end) page range is already converted, aka it is not wasted.
>
> Are you sure? For me, _brk_end isn't far enough:
>
> [1.475572] all_end: 0x82df5000
> [1.476736] _brk_end: 0x82dd6000

Yes. _brk_end should be smaller than _end.

>> ---
>>  arch/x86/mm/init_64.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/arch/x86/mm/init_64.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/mm/init_64.c
>> +++ linux-2.6/arch/x86/mm/init_64.c
>> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>>         unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>>         unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>>         unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>> -       unsigned long all_end = PFN_ALIGN(&_end);
>> +       unsigned long all_end = PFN_ALIGN(_brk_end);
>> +       unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>>         printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>>                (end - start) >> 10);
>> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>>          * The rodata/data/bss/brk section (but not the kernel text!)
>>          * should also be not-executable.
>>          */
>> -       set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +       set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>>         rodata_test();
>>
>> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>>         set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +       /* all_end to pmd_end is handled via free_all_bootmem() */
>>         free_init_pages("unused kernel",
>>                         (unsigned long) __va(__pa_symbol(text_end)),
>>                         (unsigned long) __va(__pa_symbol(rodata_start)));
>
> This patch produces the same results as my v1 patch:
>
> 0x8202d000-0x8220  1868K RW GLB NX pte
> 0x8220-0x82e0        12M RW PSE GLB NX pmd
> 0x82e0-0xc000       978M pmd
>
> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> 0x8220-0x82c0        10M RW PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW GLB NX pte
> 0x82df5000-0x82e0    44K RW NX pte
> 0x82e0-0xc000       978M pmd

Need to remove GLB?

Yinghai