Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-18 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:
> > On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu  wrote:
> >> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook  wrote:
> >>> v2:
> >>>  - added call to free_init_pages(), as suggested by tglx
> >
> >> something is wrong:
> >>
> >> [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
> >> 83213000)
> >> [7.843305] Write protecting the kernel read-only data: 28672k
> >
> > 
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of , as we only use partial of
> >brk.
> > 2. [_brk_end, pm_end) page range is already converted. aka
> >is not wasted.
> 
> Are you sure? For me, _brk_end isn't far enough:

_brk_end is guaranteed to be <= _end. But we really want to use
_brk_end here, because if we have the following situation:

_brk_start: 0x03ff
_brk_end:   0x0300
_end:   0x04016000

So we have the following PMDs setup:

0x03e0 pmd rw nx<- covers the top of .bss
   and the start  of .brk
0x0400 pmd rw nx<- covers the end of .brk
   and some random unused

So if _brk_end is less than 0x0400, then cleanup_highmem() has
zapped the extra PMD already. So we don't want to call set_nx() on
that.

If _brk_end is >= 0x0400 then we cover that last pmd with the
set_nx call.

Completely non obvious as anything else in that area.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-18 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Kees Cook wrote:
 On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:
  On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu ying...@kernel.org wrote:
  On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook keesc...@chromium.org wrote:
  v2:
   - added call to free_init_pages(), as suggested by tglx
 
  something is wrong:
 
  [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
  83213000)
  [7.843305] Write protecting the kernel read-only data: 28672k
 
  
  should use attached one instead.
 
  1. should use _brk_end instead of end, as we only use partial of
 brk.
  2. [_brk_end, pm_end) page range is already converted. aka
 is not wasted.
 
 Are you sure? For me, _brk_end isn't far enough:

_brk_end is guaranteed to be = _end. But we really want to use
_brk_end here, because if we have the following situation:

_brk_start: 0x03ff
_brk_end:   0x0300
_end:   0x04016000

So we have the following PMDs setup:

0x03e0 pmd rw nx- covers the top of .bss
   and the start  of .brk
0x0400 pmd rw nx- covers the end of .brk
   and some random unused

So if _brk_end is less than 0x0400, then cleanup_highmem() has
zapped the extra PMD already. So we don't want to call set_nx() on
that.

If _brk_end is = 0x0400 then we cover that last pmd with the
set_nx call.

Completely non obvious as anything else in that area.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Yinghai Lu
On Mon, Nov 17, 2014 at 12:27 PM, Kees Cook  wrote:
> On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner  wrote:
>>
>> So the initial patch to get rid of the X mapping is of course to just
>> extend the area to the PMD. A little bit different to your initial
>> patch, but essentially the same.
>>
>> -   unsigned long all_end = PFN_ALIGN(&_end);
>> +   unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>>
>> I'm going to apply your V1 patch with the above roundup()
>> simplification. If a page of that area gets used later on then we are
>> going to split up the PMD anyway.
>
> That's fine by me. Yinghai Lu seems to have potentially better
> solutions, but my head is spinning after reading more of this thread.
> :) I just want to make sure that at the end of the day, there are no
> RW+x mappings.
>

Please check v3 that cleanup highmap in the middle.


Before patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd

After patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8200  16M ro PSE GLB x  pmd
0x8200-0x82011000  68K ro GLB x  pte
0x82011000-0x82201980K   pte
0x8220-0x82a0   8M ro PSE GLB NX pmd
0x82a0-0x82a1d000 116K ro GLB NX pte
0x82a1d000-0x82c01932K   pte
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x82e52000 328K RW GLB NX pte
0x82e52000-0x83001720K   pte
0x8300-0x8320   2M   pmd
0x8320-0x83213000  76K   pte
0x83213000-0x83401972K RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

Thanks

Yinghai
Subject: [PATCH 1/2] x86, 64bit: add pfn_range_is_highmapped()

will use it to support holes in highmap.

Signed-off-by: Yinghai Lu 

---
 arch/x86/include/asm/pgtable_64.h |2 ++
 arch/x86/mm/init_64.c |   22 ++
 arch/x86/mm/pageattr.c|   16 +---
 3 files changed, 25 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,23 @@ void __init init_extra_mapping_uc(unsign
 	__init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }
 
+/* three holes at most*/
+#define NR_RANGE 4
+static struct range pfn_highmapped[NR_RANGE];
+static int nr_pfn_highmapped;
+
+int pfn_range_is_highmapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_pfn_highmapped; i++)
+		if ((start_pfn >= pfn_highmapped[i].start) &&
+		(end_pfn <= pfn_highmapped[i].end))
+			return 1;
+
+	return 0;
+}
+
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -409,6 +426,11 @@ void __init cleanup_highmap(void)
 		if (vaddr < (unsigned long) _text || vaddr > end)
 			set_pmd(pmd, __pmd(0));
 	}
+
+	nr_pfn_highmapped = add_range(pfn_highmapped, NR_RANGE,
+			nr_pfn_highmapped,
+			__pa_symbol(_text) >> PAGE_SHIFT,
+			__pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT);
 }
 
 static unsigned long __meminit
Index: linux-2.6/arch/x86/mm/pageattr.c
===
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -91,20 +91,6 @@ void arch_report_meminfo(struct seq_file
 static inline 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Kees Cook
On Mon, Nov 17, 2014 at 12:32 PM, Russell King - ARM Linux
 wrote:
> On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
>> I have a series for arm that is waiting to be picked up by rmk:
>> https://patchwork.ozlabs.org/patch/400383/
>
> It should've been in linux-next via my tree for the last two weeks
> or so.

Fantastic! Thank you!

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Russell King - ARM Linux
On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
> I have a series for arm that is waiting to be picked up by rmk:
> https://patchwork.ozlabs.org/patch/400383/

It should've been in linux-next via my tree for the last two weeks
or so.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Kees Cook
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner  wrote:
> On Fri, 14 Nov 2014, Kees Cook wrote:
>> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:
>> > should use attached one instead.
>> >
>> > 1. should use _brk_end instead of , as we only use partial of
>> >brk.
>> > 2. [_brk_end, pm_end) page range is already converted. aka
>> >is not wasted.
>>
>> Are you sure? For me, _brk_end isn't far enough:
>>
>> [1.475572] all_end: 0x82df5000
>> [1.476736] _brk_end: 0x82dd6000
>
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.
>
> That happened because I missed the completely non obvious fact, that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.
>
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> -   unsigned long all_end = PFN_ALIGN(&_end);
> +   unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

That's fine by me. Yinghai Lu seems to have potentially better
solutions, but my head is spinning after reading more of this thread.
:) I just want to make sure that at the end of the day, there are no
RW+x mappings.

> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of _bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.
>
> I spent quite some time to stare into that and I wonder about the
> following related issues:
>
> 1) Why is the mark_rodata_ro() business a debug configuration, i.e
>CONFIG_DEBUG_RODATA?
>
>This does not make any sense at all. We really want RO and NX on by
>default and AFAICT distros are turning that on anyway for obvious
>reasons.

This is a historically badly named config item. Once arm and arm64
land their CONFIG_DEBUG_RODATA implementations, I was going to suggest
having this renamed.

>
>The only idiocity I found so far is the kgdb Documentation which
>recommends to turn it off. Sigh.
>
>So that should be changed to:
>
>   config ARCH_HAS_RONX
>   bool
>
>   config DISABLE_RONX
>   def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES
>
>plus
>
>   config ARCH_WANTS_KGDB_SECURITY_HOLES
>   bool
>
>   config KGDB_ENABLE_SECURITY_HOLES
>   bool "WTF?"
>   depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES
>
> 2) What is actually the modules counterpart for mark_rodata_ro()?
>
>CONFIG_DEBUG_SET_MODULE_RONX
>
>Of course not enabled by default, but enabled by distros again.
>
>See #1.
>
>Now what's interesting aside of the general fuckup is that
>CONFIG_DEBUG_RODATA is supported by:
>
>  arch/x86 and arch/parisc
>
>But CONFIG_DEBUG_SET_MODULE_RONX is supported by:
>
>  arch/arm/ arch/arm64 arch/s390 and arch/x86

I have a series for arm that is waiting to be picked up by rmk:
https://patchwork.ozlabs.org/patch/400383/

Laura has been working on a series for arm64:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/366777

So what you're seeing is us being in the middle of providing support
for it. It just happens that CONFIG_DEBUG_SET_MODULE_RONX is a bit
easier to implement, so it was completed first.

>
>This does not make any sense at all.
>
>Do arm/arm64/s390 have other means to make RO/NX work or are they
>just doing it for modules? And how is that supposed to work with
>KGDB if it is not aware of modules sections being RO/NX? KGDB has
>only extra magic for CONFIG_DEBUG_RODATA, but not for
>CONFIG_DEBUG_SET_MODULE_RONX.
>
>Now for extended fun the x86 help text for that option says:
>
> ... Such protection may interfere with run-time code
> patching and dynamic kernel tracing - and they might also protect
> against certain classes of kernel exploits.
> If in doubt, say "N".
>
>Patently wrong. More sigh.

Like the CONFIG naming, I 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Kees Cook
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner t...@linutronix.de wrote:
 On Fri, 14 Nov 2014, Kees Cook wrote:
 On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:
  should use attached one instead.
 
  1. should use _brk_end instead of end, as we only use partial of
 brk.
  2. [_brk_end, pm_end) page range is already converted. aka
 is not wasted.

 Are you sure? For me, _brk_end isn't far enough:

 [1.475572] all_end: 0x82df5000
 [1.476736] _brk_end: 0x82dd6000

 _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
 which is the same as _end. We usually do not use all of that space. So
 it's expected that _brk_end  _end.

 Is this correct? It sounded like tglx wanted the pmd split, like this:

 Yes, I wanted to get rid of the high mapping for anything between
 _brk_end and _end, and I brought you on the wrong track with my
 suggestion to call free_init_pages(). Sorry about that.

 That happened because I missed the completely non obvious fact, that
 only the effective brk section is reserved for the kernel via
 reserve_brk(). So the area between _brk_end and _end is already
 reusable. Though that reuse works only by chance and not by design and
 is completely undocumented as everything else in that area.

 So the initial patch to get rid of the X mapping is of course to just
 extend the area to the PMD. A little bit different to your initial
 patch, but essentially the same.

 -   unsigned long all_end = PFN_ALIGN(_end);
 +   unsigned long all_end = roundup((unsigned long) _end, PMD_SIZE);

 I'm going to apply your V1 patch with the above roundup()
 simplification. If a page of that area gets used later on then we are
 going to split up the PMD anyway.

That's fine by me. Yinghai Lu seems to have potentially better
solutions, but my head is spinning after reading more of this thread.
:) I just want to make sure that at the end of the day, there are no
RW+x mappings.

 But still we want to get rid of that highmap between _brk_end and
 _end, but there is absolutely no reason to come up with extra silly
 functions for that.

 So the obvious solution is to let setup_arch() reserve the memory up
 to _end instead of _bss_stop, get rid of the extra reservation in
 reserve_brk() and then let free_initmem() release the area between
 _brk_end and _end. No extra hackery, no side effects, just works.

 I spent quite some time to stare into that and I wonder about the
 following related issues:

 1) Why is the mark_rodata_ro() business a debug configuration, i.e
CONFIG_DEBUG_RODATA?

This does not make any sense at all. We really want RO and NX on by
default and AFAICT distros are turning that on anyway for obvious
reasons.

This is a historically badly named config item. Once arm and arm64
land their CONFIG_DEBUG_RODATA implementations, I was going to suggest
having this renamed.


The only idiocity I found so far is the kgdb Documentation which
recommends to turn it off. Sigh.

So that should be changed to:

   config ARCH_HAS_RONX
   bool

   config DISABLE_RONX
   def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES

plus

   config ARCH_WANTS_KGDB_SECURITY_HOLES
   bool

   config KGDB_ENABLE_SECURITY_HOLES
   bool WTF?
   depends on BROKEN  ARCH_WANTS_KGDB_SECURITY_HOLES

 2) What is actually the modules counterpart for mark_rodata_ro()?

CONFIG_DEBUG_SET_MODULE_RONX

Of course not enabled by default, but enabled by distros again.

See #1.

Now what's interesting aside of the general fuckup is that
CONFIG_DEBUG_RODATA is supported by:

  arch/x86 and arch/parisc

But CONFIG_DEBUG_SET_MODULE_RONX is supported by:

  arch/arm/ arch/arm64 arch/s390 and arch/x86

I have a series for arm that is waiting to be picked up by rmk:
https://patchwork.ozlabs.org/patch/400383/

Laura has been working on a series for arm64:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/366777

So what you're seeing is us being in the middle of providing support
for it. It just happens that CONFIG_DEBUG_SET_MODULE_RONX is a bit
easier to implement, so it was completed first.


This does not make any sense at all.

Do arm/arm64/s390 have other means to make RO/NX work or are they
just doing it for modules? And how is that supposed to work with
KGDB if it is not aware of modules sections being RO/NX? KGDB has
only extra magic for CONFIG_DEBUG_RODATA, but not for
CONFIG_DEBUG_SET_MODULE_RONX.

Now for extended fun the x86 help text for that option says:

 ... Such protection may interfere with run-time code
 patching and dynamic kernel tracing - and they might also protect
 against certain classes of kernel exploits.
 If in doubt, say N.

Patently wrong. More sigh.

Like the CONFIG naming, I hope to start cleaning these defaults up
once arm and arm64 are landed.

 3) Why is mark_rodata_ro() 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Russell King - ARM Linux
On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
 I have a series for arm that is waiting to be picked up by rmk:
 https://patchwork.ozlabs.org/patch/400383/

It should've been in linux-next via my tree for the last two weeks
or so.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Kees Cook
On Mon, Nov 17, 2014 at 12:32 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Nov 17, 2014 at 12:27:59PM -0800, Kees Cook wrote:
 I have a series for arm that is waiting to be picked up by rmk:
 https://patchwork.ozlabs.org/patch/400383/

 It should've been in linux-next via my tree for the last two weeks
 or so.

Fantastic! Thank you!

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-17 Thread Yinghai Lu
On Mon, Nov 17, 2014 at 12:27 PM, Kees Cook keesc...@chromium.org wrote:
 On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner t...@linutronix.de wrote:

 So the initial patch to get rid of the X mapping is of course to just
 extend the area to the PMD. A little bit different to your initial
 patch, but essentially the same.

 -   unsigned long all_end = PFN_ALIGN(_end);
 +   unsigned long all_end = roundup((unsigned long) _end, PMD_SIZE);

 I'm going to apply your V1 patch with the above roundup()
 simplification. If a page of that area gets used later on then we are
 going to split up the PMD anyway.

 That's fine by me. Yinghai Lu seems to have potentially better
 solutions, but my head is spinning after reading more of this thread.
 :) I just want to make sure that at the end of the day, there are no
 RW+x mappings.


Please check v3 that cleanup highmap in the middle.


Before patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd

After patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8200  16M ro PSE GLB x  pmd
0x8200-0x82011000  68K ro GLB x  pte
0x82011000-0x82201980K   pte
0x8220-0x82a0   8M ro PSE GLB NX pmd
0x82a0-0x82a1d000 116K ro GLB NX pte
0x82a1d000-0x82c01932K   pte
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x82e52000 328K RW GLB NX pte
0x82e52000-0x83001720K   pte
0x8300-0x8320   2M   pmd
0x8320-0x83213000  76K   pte
0x83213000-0x83401972K RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

Thanks

Yinghai
Subject: [PATCH 1/2] x86, 64bit: add pfn_range_is_highmapped()

will use it to support holes in highmap.

Signed-off-by: Yinghai Lu ying...@kernel.org

---
 arch/x86/include/asm/pgtable_64.h |2 ++
 arch/x86/mm/init_64.c |   22 ++
 arch/x86/mm/pageattr.c|   16 +---
 3 files changed, 25 insertions(+), 15 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,23 @@ void __init init_extra_mapping_uc(unsign
 	__init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }
 
+/* three holes at most*/
+#define NR_RANGE 4
+static struct range pfn_highmapped[NR_RANGE];
+static int nr_pfn_highmapped;
+
+int pfn_range_is_highmapped(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i  nr_pfn_highmapped; i++)
+		if ((start_pfn = pfn_highmapped[i].start) 
+		(end_pfn = pfn_highmapped[i].end))
+			return 1;
+
+	return 0;
+}
+
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -409,6 +426,11 @@ void __init cleanup_highmap(void)
 		if (vaddr  (unsigned long) _text || vaddr  end)
 			set_pmd(pmd, __pmd(0));
 	}
+
+	nr_pfn_highmapped = add_range(pfn_highmapped, NR_RANGE,
+			nr_pfn_highmapped,
+			__pa_symbol(_text)  PAGE_SHIFT,
+			__pa_symbol(roundup(_brk_end, PMD_SIZE))  PAGE_SHIFT);
 }
 
 static unsigned long __meminit
Index: linux-2.6/arch/x86/mm/pageattr.c
===
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -91,20 +91,6 @@ void arch_report_meminfo(struct seq_file
 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner  wrote:
>
> _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
> which is the same as _end. We usually do not use all of that space. So
> it's expected that _brk_end < _end.
>
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> Yes, I wanted to get rid of the high mapping for anything between
> _brk_end and _end, and I brought you on the wrong track with my
> suggestion to call free_init_pages(). Sorry about that.

:)

>
> That happened because I missed the completely non obvious fact, that
> only the effective brk section is reserved for the kernel via
> reserve_brk(). So the area between _brk_end and _end is already
> reusable. Though that reuse works only by chance and not by design and
> is completely undocumented as everything else in that area.

where is info for everything else?

>
> So the initial patch to get rid of the X mapping is of course to just
> extend the area to the PMD. A little bit different to your initial
> patch, but essentially the same.
>
> -   unsigned long all_end = PFN_ALIGN(&_end);
> +   unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);
>
> I'm going to apply your V1 patch with the above roundup()
> simplification. If a page of that area gets used later on then we are
> going to split up the PMD anyway.

should use _brk_end instead of &_end for the roundup?

>
> But still we want to get rid of that highmap between _brk_end and
> _end, but there is absolutely no reason to come up with extra silly
> functions for that.
>
> So the obvious solution is to let setup_arch() reserve the memory up
> to _end instead of _bss_stop, get rid of the extra reservation in
> reserve_brk() and then let free_initmem() release the area between
> _brk_end and _end. No extra hackery, no side effects, just works.

So where get the highmap to be removed?
free_init_pages via free_initmem()?

with the code change like you suggested,

---
 arch/x86/kernel/setup.c |6 +-
 arch/x86/mm/init.c  |6 ++
 arch/x86/mm/init_64.c   |2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -286,10 +286,6 @@ static void __init cleanup_highmap(void)

 static void __init reserve_brk(void)
 {
-if (_brk_end > _brk_start)
-memblock_reserve(__pa_symbol(_brk_start),
- _brk_end - _brk_start);
-
 /* Mark brk area as locked down and no longer taking any
new allocations */
 _brk_start = 0;
@@ -857,7 +853,7 @@ dump_kernel_offset(struct notifier_block
 void __init setup_arch(char **cmdline_p)
 {
 memblock_reserve(__pa_symbol(_text),
- (unsigned long)__bss_stop - (unsigned long)_text);
+ (unsigned long)_end - (unsigned long)_text);

 early_reserve_initrd();

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1122,7 +1122,7 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-unsigned long all_end = PFN_ALIGN(&_end);
+unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);

 printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
Index: linux-2.6/arch/x86/mm/init.c
===
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -637,9 +637,15 @@ void free_init_pages(char *what, unsigne

 void free_initmem(void)
 {
+unsigned long all_end = roundup((unsigned long)&_end, PMD_SIZE);
+
 free_init_pages("unused kernel",
 (unsigned long)(&__init_begin),
 (unsigned long)(&__init_end));
+
+#ifdef CONFIG_X86_64
+free_init_pages("unused brk", PFN_ALIGN(_brk_end), all_end);
+#endif
 }

 #ifdef CONFIG_BLK_DEV_INITRD


got
[7.942636] Freeing unused brk memory: 500K (84383000 -
8440)

---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 1:26 PM, Thomas Gleixner  wrote:
> On Sat, 15 Nov 2014, Yinghai Lu wrote:
>> +static pmd_t *last_pmd;
>>  /*
>>   * The head.S code sets up the kernel high mapping:
>>   *
>> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
>>  continue;
>>  if (vaddr < (unsigned long) _text || vaddr > end)
>>  set_pmd(pmd, __pmd(0));
>> +else
>> +last_pmd = pmd;
>
> Why do you need to store this? You can compute this.

I'm not quite sure about the xen path.

>
>> +static void __init cleanup_highmap_tail(unsigned long addr)
>
> Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
> will free exactly that code.

I missed that.

Please check this one that address three problems that you pointed out.

Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of &_end in mark_rodata_ro().
   _brk_end can move up to &_end, i.e. to __brk_limit.  It's safe to
   use _brk_end when mark_rodata_ro() is called because extend_brk()
   is gone already at that point.
2. [_brk_end, pm_end) page range is already converted mem. and
   is not wasted.
3. add cleanup_highmap_tail for [_brk_end, pm_end).

Kernel Layout:
[0.00].brk: [0x0437c000-0x043a1fff]

Actually used brk:
[0.272959] memblock_reserve: [0x000437c000-0x0004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
...
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd
After patch:
---[ High Kernel Mapping ]---
...
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

-v2: according to tglx
 caculate the pmd postion instead of passing last_pmd.
 cleanup_highmap_tail could not have __init, as it is called in mark_rodata_ro
 and mark_rodata_ro is called after free_initmem.
 highmap_end_pfn should keep PMD_SIZE alignment on !CONFIG_DEBUG_RODATA

Signed-off-by: Yinghai Lu 

---
 arch/x86/mm/init_64.c  |   22 +-
 arch/x86/mm/pageattr.c |4 
 2 files changed, 25 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -411,6 +411,23 @@ void __init cleanup_highmap(void)
 }
 }

+static void cleanup_highmap_tail(unsigned long addr)
+{
+int i;
+pgd_t *pgd;
+pud_t *pud;
+pmd_t *pmd;
+pte_t *pte;
+
+pgd = pgd_offset_k(addr);
+pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+pte = (pte_t *)pmd_page_vaddr(*pmd) + pte_index(addr);
+
+for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++)
+set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
   pgprot_t prot)
@@ -1124,7 +1141,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-unsigned long all_end = PFN_ALIGN(&_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
@@ -1137,6 +1155,8 @@ void mark_rodata_ro(void)
  * should also be not-executable.
  */
 set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+if (all_end < pmd_end)
+cleanup_highmap_tail(all_end);

 rodata_test();

Index: linux-2.6/arch/x86/mm/pageattr.c
===
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -100,7 +100,11 @@ static inline unsigned long highmap_star

 static inline unsigned long highmap_end_pfn(void)
 {
+#ifdef CONFIG_DEBUG_RODATA
+return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
+#else
 return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
+#endif
 }

 #endif
Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of &_end in mark_rodata_ro().
   _brk_end can move up to &_end, i.e. to __brk_limit.  It's safe to
   use _brk_end when mark_rodata_ro() is called because extend_brk()
   is gone already at that point.
2. [_brk_end, 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 10:52 AM, Thomas Gleixner  wrote:
>> >
>> > Are you sure? For me, _brk_end isn't far enough:
>> >
>> > [1.475572] all_end: 0x82df5000
>> > [1.476736] _brk_end: 0x82dd6000
>>
>> Yes. _brk_end should be small then &_end.
>
> Wrong. _brk_end can move up to _end, i.e. to __brk_limit.

Confused. What is wrong? _brk_end is not smaller than &_end?

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Kees Cook wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:
> > should use attached one instead.
> >
> > 1. should use _brk_end instead of , as we only use partial of
> >brk.
> > 2. [_brk_end, pm_end) page range is already converted. aka
> >is not wasted.
> 
> Are you sure? For me, _brk_end isn't far enough:
> 
> [1.475572] all_end: 0x82df5000
> [1.476736] _brk_end: 0x82dd6000

_brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
which is the same as _end. We usually do not use all of that space. So
it's expected that _brk_end < _end.
 
> Is this correct? It sounded like tglx wanted the pmd split, like this:

Yes, I wanted to get rid of the high mapping for anything between
_brk_end and _end, and I brought you on the wrong track with my
suggestion to call free_init_pages(). Sorry about that.

That happened because I missed the completely non obvious fact, that
only the effective brk section is reserved for the kernel via
reserve_brk(). So the area between _brk_end and _end is already
reusable. Though that reuse works only by chance and not by design and
is completely undocumented as everything else in that area.

So the initial patch to get rid of the X mapping is of course to just
extend the area to the PMD. A little bit different to your initial
patch, but essentially the same.

-   unsigned long all_end = PFN_ALIGN(&_end);
+   unsigned long all_end = roundup((unsigned long) &_end, PMD_SIZE);

I'm going to apply your V1 patch with the above roundup()
simplification. If a page of that area gets used later on then we are
going to split up the PMD anyway.
 
But still we want to get rid of that highmap between _brk_end and
_end, but there is absolutely no reason to come up with extra silly
functions for that.

So the obvious solution is to let setup_arch() reserve the memory up
to _end instead of _bss_stop, get rid of the extra reservation in
reserve_brk() and then let free_initmem() release the area between
_brk_end and _end. No extra hackery, no side effects, just works.

I spent quite some time to stare into that and I wonder about the
following related issues:

1) Why is the mark_rodata_ro() business a debug configuration, i.e
   CONFIG_DEBUG_RODATA?

   This does not make any sense at all. We really want RO and NX on by
   default and AFAICT distros are turning that on anyway for obvious
   reasons.

   The only idiocity I found so far is the kgdb Documentation which
   recommends to turn it off. Sigh.

   So that should be changed to:

  config ARCH_HAS_RONX
  bool   

  config DISABLE_RONX
  def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES

   plus

  config ARCH_WANTS_KGDB_SECURITY_HOLES
  bool

  config KGDB_ENABLE_SECURITY_HOLES
  bool "WTF?"
  depends on BROKEN && ARCH_WANTS_KGDB_SECURITY_HOLES

2) What is actually the modules counterpart for mark_rodata_ro()?

   CONFIG_DEBUG_SET_MODULE_RONX

   Of course not enabled by default, but enabled by distros again.

   See #1.
   
   Now what's interesting aside of the general fuckup is that
   CONFIG_DEBUG_RODATA is supported by:
  
 arch/x86 and arch/parisc

   But CONFIG_DEBUG_SET_MODULE_RONX is supported by:

 arch/arm/ arch/arm64 arch/s390 and arch/x86

   This does not make any sense at all.

   Do arm/arm64/s390 have other means to make RO/NX work or are they
   just doing it for modules? And how is that supposed to work with
   KGDB if it is not aware of modules sections being RO/NX? KGDB has
   only extra magic for CONFIG_DEBUG_RODATA, but not for
   CONFIG_DEBUG_SET_MODULE_RONX.

   Now for extended fun the x86 help text for that option says:
   
... Such protection may interfere with run-time code
patching and dynamic kernel tracing - and they might also protect
against certain classes of kernel exploits.
If in doubt, say "N".

   Patently wrong. More sigh.  

3) Why is mark_rodata_ro() called AFTER free_initmem() and therefor
   cannot be marked __init ?

   Just because ...

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Sat, 15 Nov 2014, Yinghai Lu wrote:
> +static pmd_t *last_pmd;
>  /*
>   * The head.S code sets up the kernel high mapping:
>   *
> @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
>  continue;
>  if (vaddr < (unsigned long) _text || vaddr > end)
>  set_pmd(pmd, __pmd(0));
> +else
> +last_pmd = pmd;

Why do you need to store this? You can compute this.
 
> +static void __init cleanup_highmap_tail(unsigned long addr)

Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
will free exactly that code.

> +{
> +int i;
> +pte_t *pte;
> +
> +if (!last_pmd || pmd_none(*last_pmd))
> +return;

Aside of the above: This is complete and utter crap. Just another
useless function which can be avoided by switching on brain before
coding.

> Index: linux-2.6/arch/x86/mm/pageattr.c
> ===
> --- linux-2.6.orig/arch/x86/mm/pageattr.c
> +++ linux-2.6/arch/x86/mm/pageattr.c
> @@ -100,7 +100,7 @@ static inline unsigned long highmap_star
> 
>  static inline unsigned long highmap_end_pfn(void)
>  {
> -return __pa_symbol(roundup(_brk_end, PMD_SIZE)) >> PAGE_SHIFT;
> +return __pa_symbol(PFN_ALIGN(_brk_end)) >> PAGE_SHIFT;
>  }

Wrong as well for CONFIG_DEBUG_RODATA=n for obvious reasons.

I know why I have your patches on blacklist 

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Yinghai Lu wrote:

> On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook  wrote:
> > On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:
> 
> >> should use attached one instead.
> >>
> >> 1. should use _brk_end instead of , as we only use partial of
> >>brk.
> >> 2. [_brk_end, pm_end) page range is already converted. aka
> >>is not wasted.
> >
> > Are you sure? For me, _brk_end isn't far enough:
> >
> > [1.475572] all_end: 0x82df5000
> > [1.476736] _brk_end: 0x82dd6000
> 
> Yes. _brk_end should be small then &_end.

Wrong. _brk_end can move up to _end, i.e. to __brk_limit.
 
But it's safe to use _brk_end when mark_rodata_ro() is called because
extend_brk() is gone already at that point.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Yinghai Lu wrote:

 On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook keesc...@chromium.org wrote:
  On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:
 
  should use attached one instead.
 
  1. should use _brk_end instead of end, as we only use partial of
 brk.
  2. [_brk_end, pm_end) page range is already converted. aka
 is not wasted.
 
  Are you sure? For me, _brk_end isn't far enough:
 
  [1.475572] all_end: 0x82df5000
  [1.476736] _brk_end: 0x82dd6000
 
 Yes. _brk_end should be small then _end.

Wrong. _brk_end can move up to _end, i.e. to __brk_limit.
 
But it's safe to use _brk_end when mark_rodata_ro() is called because
extend_brk() is gone already at that point.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Sat, 15 Nov 2014, Yinghai Lu wrote:
 +static pmd_t *last_pmd;
  /*
   * The head.S code sets up the kernel high mapping:
   *
 @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
  continue;
  if (vaddr  (unsigned long) _text || vaddr  end)
  set_pmd(pmd, __pmd(0));
 +else
 +last_pmd = pmd;

Why do you need to store this? You can compute this.
 
 +static void __init cleanup_highmap_tail(unsigned long addr)

Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
will free exactly that code.

 +{
 +int i;
 +pte_t *pte;
 +
 +if (!last_pmd || pmd_none(*last_pmd))
 +return;

Aside of the above: This is complete and utter crap. Just another
useless function which can be avoided by switching on brain before
coding.

 Index: linux-2.6/arch/x86/mm/pageattr.c
 ===
 --- linux-2.6.orig/arch/x86/mm/pageattr.c
 +++ linux-2.6/arch/x86/mm/pageattr.c
 @@ -100,7 +100,7 @@ static inline unsigned long highmap_star
 
  static inline unsigned long highmap_end_pfn(void)
  {
 -return __pa_symbol(roundup(_brk_end, PMD_SIZE))  PAGE_SHIFT;
 +return __pa_symbol(PFN_ALIGN(_brk_end))  PAGE_SHIFT;
  }

Wrong as well for CONFIG_DEBUG_RODATA=n for obvious reasons.

I know why I have your patches on blacklist 

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Thomas Gleixner
On Fri, 14 Nov 2014, Kees Cook wrote:
 On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:
  should use attached one instead.
 
  1. should use _brk_end instead of end, as we only use partial of
 brk.
  2. [_brk_end, pm_end) page range is already converted. aka
 is not wasted.
 
 Are you sure? For me, _brk_end isn't far enough:
 
 [1.475572] all_end: 0x82df5000
 [1.476736] _brk_end: 0x82dd6000

_brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
which is the same as _end. We usually do not use all of that space. So
it's expected that _brk_end  _end.
 
 Is this correct? It sounded like tglx wanted the pmd split, like this:

Yes, I wanted to get rid of the high mapping for anything between
_brk_end and _end, and I brought you on the wrong track with my
suggestion to call free_init_pages(). Sorry about that.

That happened because I missed the completely non obvious fact, that
only the effective brk section is reserved for the kernel via
reserve_brk(). So the area between _brk_end and _end is already
reusable. Though that reuse works only by chance and not by design and
is completely undocumented as everything else in that area.

So the initial patch to get rid of the X mapping is of course to just
extend the area to the PMD. A little bit different to your initial
patch, but essentially the same.

-   unsigned long all_end = PFN_ALIGN(_end);
+   unsigned long all_end = roundup((unsigned long) _end, PMD_SIZE);

I'm going to apply your V1 patch with the above roundup()
simplification. If a page of that area gets used later on then we are
going to split up the PMD anyway.
 
But still we want to get rid of that highmap between _brk_end and
_end, but there is absolutely no reason to come up with extra silly
functions for that.

So the obvious solution is to let setup_arch() reserve the memory up
to _end instead of _bss_stop, get rid of the extra reservation in
reserve_brk() and then let free_initmem() release the area between
_brk_end and _end. No extra hackery, no side effects, just works.

I spent quite some time to stare into that and I wonder about the
following related issues:

1) Why is the mark_rodata_ro() business a debug configuration, i.e
   CONFIG_DEBUG_RODATA?

   This does not make any sense at all. We really want RO and NX on by
   default and AFAICT distros are turning that on anyway for obvious
   reasons.

   The only idiocity I found so far is the kgdb Documentation which
   recommends to turn it off. Sigh.

   So that should be changed to:

  config ARCH_HAS_RONX
  bool   

  config DISABLE_RONX
  def_bool !ARCH_HAS_RONX || !KGDB_ENABLE_SECURITY_HOLES

   plus

  config ARCH_WANTS_KGDB_SECURITY_HOLES
  bool

  config KGDB_ENABLE_SECURITY_HOLES
  bool WTF?
  depends on BROKEN  ARCH_WANTS_KGDB_SECURITY_HOLES

2) What is actually the modules counterpart for mark_rodata_ro()?

   CONFIG_DEBUG_SET_MODULE_RONX

   Of course not enabled by default, but enabled by distros again.

   See #1.
   
   Now what's interesting aside of the general fuckup is that
   CONFIG_DEBUG_RODATA is supported by:
  
 arch/x86 and arch/parisc

   But CONFIG_DEBUG_SET_MODULE_RONX is supported by:

 arch/arm/ arch/arm64 arch/s390 and arch/x86

   This does not make any sense at all.

   Do arm/arm64/s390 have other means to make RO/NX work or are they
   just doing it for modules? And how is that supposed to work with
   KGDB if it is not aware of modules sections being RO/NX? KGDB has
   only extra magic for CONFIG_DEBUG_RODATA, but not for
   CONFIG_DEBUG_SET_MODULE_RONX.

   Now for extended fun the x86 help text for that option says:
   
... Such protection may interfere with run-time code
patching and dynamic kernel tracing - and they might also protect
against certain classes of kernel exploits.
If in doubt, say N.

   Patently wrong. More sigh.  

3) Why is mark_rodata_ro() called AFTER free_initmem() and therefor
   cannot be marked __init ?

   Just because ...

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 10:52 AM, Thomas Gleixner t...@linutronix.de wrote:
 
  Are you sure? For me, _brk_end isn't far enough:
 
  [1.475572] all_end: 0x82df5000
  [1.476736] _brk_end: 0x82dd6000

 Yes. _brk_end should be small then _end.

 Wrong. _brk_end can move up to _end, i.e. to __brk_limit.

Confused. What is wrong? _brk_end is not smaller than _end?

Yinghai
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 1:26 PM, Thomas Gleixner t...@linutronix.de wrote:
 On Sat, 15 Nov 2014, Yinghai Lu wrote:
 +static pmd_t *last_pmd;
  /*
   * The head.S code sets up the kernel high mapping:
   *
 @@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
  continue;
  if (vaddr  (unsigned long) _text || vaddr  end)
  set_pmd(pmd, __pmd(0));
 +else
 +last_pmd = pmd;

 Why do you need to store this? You can compute this.

I'm not quite sure about the xen path.


 +static void __init cleanup_highmap_tail(unsigned long addr)

 Brilliant stuff. mark_rodata_ro() is called AFTER free_initmem() which
 will free exactly that code.

I missed that.

Please check this one that address three problems that you pointed out.

Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of _end in mark_rodata_ro().
   _brk_end can move up to _end, i.e. to __brk_limit.  It's safe to
   use _brk_end when mark_rodata_ro() is called because extend_brk()
   is gone already at that point.
2. [_brk_end, pm_end) page range is already converted mem. and
   is not wasted.
3. add cleanup_highmap_tail for [_brk_end, pm_end).

Kernel Layout:
[0.00].brk: [0x0437c000-0x043a1fff]

Actually used brk:
[0.272959] memblock_reserve: [0x000437c000-0x0004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
...
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd
After patch:
---[ High Kernel Mapping ]---
...
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

-v2: according to tglx
 caculate the pmd postion instead of passing last_pmd.
 cleanup_highmap_tail could not have __init, as it is called in mark_rodata_ro
 and mark_rodata_ro is called after free_initmem.
 highmap_end_pfn should keep PMD_SIZE alignment on !CONFIG_DEBUG_RODATA

Signed-off-by: Yinghai Lu ying...@kernel.org

---
 arch/x86/mm/init_64.c  |   22 +-
 arch/x86/mm/pageattr.c |4 
 2 files changed, 25 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -411,6 +411,23 @@ void __init cleanup_highmap(void)
 }
 }

+static void cleanup_highmap_tail(unsigned long addr)
+{
+int i;
+pgd_t *pgd;
+pud_t *pud;
+pmd_t *pmd;
+pte_t *pte;
+
+pgd = pgd_offset_k(addr);
+pud = (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
+pmd = (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+pte = (pte_t *)pmd_page_vaddr(*pmd) + pte_index(addr);
+
+for (i = pte_index(addr); i  PTRS_PER_PTE; i++, pte++)
+set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
   pgprot_t prot)
@@ -1124,7 +1141,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) __end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
-unsigned long all_end = PFN_ALIGN(_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
(end - start)  10);
@@ -1137,6 +1155,8 @@ void mark_rodata_ro(void)
  * should also be not-executable.
  */
 set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
+if (all_end  pmd_end)
+cleanup_highmap_tail(all_end);

 rodata_test();

Index: linux-2.6/arch/x86/mm/pageattr.c
===
--- linux-2.6.orig/arch/x86/mm/pageattr.c
+++ linux-2.6/arch/x86/mm/pageattr.c
@@ -100,7 +100,11 @@ static inline unsigned long highmap_star

 static inline unsigned long highmap_end_pfn(void)
 {
+#ifdef CONFIG_DEBUG_RODATA
+return __pa_symbol(PFN_ALIGN(_brk_end))  PAGE_SHIFT;
+#else
 return __pa_symbol(roundup(_brk_end, PMD_SIZE))  PAGE_SHIFT;
+#endif
 }

 #endif
Subject: [PATCH v2] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of _end in mark_rodata_ro().
   _brk_end can move up to _end, i.e. to __brk_limit.  It's safe to
   use _brk_end when mark_rodata_ro() is called because extend_brk()
   is gone already at that point.
2. [_brk_end, pm_end) page 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-16 Thread Yinghai Lu
On Sun, Nov 16, 2014 at 3:44 PM, Thomas Gleixner t...@linutronix.de wrote:

 _brk_end is adjusted at boot time via extend_brk() up to __brk_limit,
 which is the same as _end. We usually do not use all of that space. So
 it's expected that _brk_end  _end.

 Is this correct? It sounded like tglx wanted the pmd split, like this:

 Yes, I wanted to get rid of the high mapping for anything between
 _brk_end and _end, and I brought you on the wrong track with my
 suggestion to call free_init_pages(). Sorry about that.

:)


 That happened because I missed the completely non obvious fact, that
 only the effective brk section is reserved for the kernel via
 reserve_brk(). So the area between _brk_end and _end is already
 reusable. Though that reuse works only by chance and not by design and
 is completely undocumented as everything else in that area.

where is info for everything else?


 So the initial patch to get rid of the X mapping is of course to just
 extend the area to the PMD. A little bit different to your initial
 patch, but essentially the same.

 -   unsigned long all_end = PFN_ALIGN(_end);
 +   unsigned long all_end = roundup((unsigned long) _end, PMD_SIZE);

 I'm going to apply your V1 patch with the above roundup()
 simplification. If a page of that area gets used later on then we are
 going to split up the PMD anyway.

should use _brk_end instead of _end for the roundup?


 But still we want to get rid of that highmap between _brk_end and
 _end, but there is absolutely no reason to come up with extra silly
 functions for that.

 So the obvious solution is to let setup_arch() reserve the memory up
 to _end instead of _bss_stop, get rid of the extra reservation in
 reserve_brk() and then let free_initmem() release the area between
 _brk_end and _end. No extra hackery, no side effects, just works.

So where get the highmap to be removed?
free_init_pages via free_initmem()?

with the code change like you suggested,

---
 arch/x86/kernel/setup.c |6 +-
 arch/x86/mm/init.c  |6 ++
 arch/x86/mm/init_64.c   |2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -286,10 +286,6 @@ static void __init cleanup_highmap(void)

 static void __init reserve_brk(void)
 {
-if (_brk_end  _brk_start)
-memblock_reserve(__pa_symbol(_brk_start),
- _brk_end - _brk_start);
-
 /* Mark brk area as locked down and no longer taking any
new allocations */
 _brk_start = 0;
@@ -857,7 +853,7 @@ dump_kernel_offset(struct notifier_block
 void __init setup_arch(char **cmdline_p)
 {
 memblock_reserve(__pa_symbol(_text),
- (unsigned long)__bss_stop - (unsigned long)_text);
+ (unsigned long)_end - (unsigned long)_text);

 early_reserve_initrd();

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1122,7 +1122,7 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) __end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
-unsigned long all_end = PFN_ALIGN(_end);
+unsigned long all_end = roundup((unsigned long)_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
(end - start)  10);
Index: linux-2.6/arch/x86/mm/init.c
===
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -637,9 +637,15 @@ void free_init_pages(char *what, unsigne

 void free_initmem(void)
 {
+unsigned long all_end = roundup((unsigned long)_end, PMD_SIZE);
+
 free_init_pages(unused kernel,
 (unsigned long)(__init_begin),
 (unsigned long)(__init_end));
+
+#ifdef CONFIG_X86_64
+free_init_pages(unused brk, PFN_ALIGN(_brk_end), all_end);
+#endif
 }

 #ifdef CONFIG_BLK_DEV_INITRD


got
[7.942636] Freeing unused brk memory: 500K (84383000 -
8440)

---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-15 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 7:38 PM, Yinghai Lu  wrote:
> On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook  wrote:
>> Is this correct? It sounded like tglx wanted the pmd split, like this:
>>
>> 0x8220-0x82c010M RW   PSE GLB NX pmd
>> 0x82c0-0x82df5000  2004K RW   GLB NX pte
>> 0x82df5000-0x82e044K RW   NX pte
>> 0x82e0-0xc000   978M pmd
>
> Need to remove GLB ?

Please check attached one that clean up the highmap tail...

Subject: [RFC PATCH] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of , as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted mem. and
   is not wasted.
3. add cleanup_highmap_tail for [_brk_end, pm_end).

Kernel Layout:
[0.00]   .text: [0x0100-0x0200d5c8]
[0.00] .rodata: [0x0220-0x02a1cfff]
[0.00]   .data: [0x02c0-0x02e50e7f]
[0.00]   .init: [0x02e52000-0x03212fff]
[0.00].bss: [0x03221000-0x0437bfff]
[0.00].brk: [0x0437c000-0x043a1fff]

Actually used brk:
[0.272959] memblock_reserve: [0x000437c000-0x0004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd
After patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

Signed-off-by: Yinghai Lu 

---
 arch/x86/mm/init_64.c  |   23 ++-
 arch/x86/mm/pageattr.c |2 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,7 @@ void __init init_extra_mapping_uc(unsign
 __init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }

+static pmd_t *last_pmd;
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
 continue;
 if (vaddr < (unsigned long) _text || vaddr > end)
 set_pmd(pmd, __pmd(0));
+else
+last_pmd = pmd;
 }
 }

+static void __init cleanup_highmap_tail(unsigned long addr)
+{
+int i;
+pte_t *pte;
+
+if (!last_pmd || pmd_none(*last_pmd))
+return;
+
+pte = (pte_t *)pmd_page_vaddr(*last_pmd);
+pte += pte_index(addr);
+
+for (i = pte_index(addr); i < PTRS_PER_PTE; i++, pte++)
+set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
   pgprot_t prot)
@@ -1124,7 +1142,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-unsigned long all_end = PFN_ALIGN(&_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
@@ -1137,6 +1156,8 @@ void mark_rodata_ro(void)
  * should also be not-executable.
  */
 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-15 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 7:38 PM, Yinghai Lu ying...@kernel.org wrote:
 On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook keesc...@chromium.org wrote:
 Is this correct? It sounded like tglx wanted the pmd split, like this:

 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   NX pte
 0x82e0-0xc000   978M pmd

 Need to remove GLB ?

Please check attached one that clean up the highmap tail...

Subject: [RFC PATCH] x86, 64bit: cleanup highmap tail near partial 2M range

1. should use _brk_end instead of end, as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted mem. and
   is not wasted.
3. add cleanup_highmap_tail for [_brk_end, pm_end).

Kernel Layout:
[0.00]   .text: [0x0100-0x0200d5c8]
[0.00] .rodata: [0x0220-0x02a1cfff]
[0.00]   .data: [0x02c0-0x02e50e7f]
[0.00]   .init: [0x02e52000-0x03212fff]
[0.00].bss: [0x03221000-0x0437bfff]
[0.00].brk: [0x0437c000-0x043a1fff]

Actually used brk:
[0.272959] memblock_reserve: [0x000437c000-0x0004382fff]
flags 0x0 BRK

Before patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843a20001672K RW GLB NX pte
0x843a2000-0x8440 376K RW GLB x  pte
0x8440-0xa000 444M   pmd
After patch:
---[ High Kernel Mapping ]---
0x8000-0x8100  16M   pmd
0x8100-0x8220  18M ro PSE GLB x  pmd
0x8220-0x82c0  10M ro PSE GLB NX pmd
0x82c0-0x82e0   2M RW PSE GLB NX pmd
0x82e0-0x8300   2M RW GLB NX pte
0x8300-0x8320   2M RW PSE GLB NX pmd
0x8320-0x8340   2M RW GLB NX pte
0x8340-0x8420  14M RW PSE GLB NX pmd
0x8420-0x843830001548K RW GLB NX pte
0x84383000-0x8440 500K   pte
0x8440-0xa000 444M   pmd

Signed-off-by: Yinghai Lu ying...@kernel.org

---
 arch/x86/mm/init_64.c  |   23 ++-
 arch/x86/mm/pageattr.c |2 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -375,6 +375,7 @@ void __init init_extra_mapping_uc(unsign
 __init_extra_mapping(phys, size, PAGE_KERNEL_LARGE_NOCACHE);
 }

+static pmd_t *last_pmd;
 /*
  * The head.S code sets up the kernel high mapping:
  *
@@ -408,9 +409,26 @@ void __init cleanup_highmap(void)
 continue;
 if (vaddr  (unsigned long) _text || vaddr  end)
 set_pmd(pmd, __pmd(0));
+else
+last_pmd = pmd;
 }
 }

+static void __init cleanup_highmap_tail(unsigned long addr)
+{
+int i;
+pte_t *pte;
+
+if (!last_pmd || pmd_none(*last_pmd))
+return;
+
+pte = (pte_t *)pmd_page_vaddr(*last_pmd);
+pte += pte_index(addr);
+
+for (i = pte_index(addr); i  PTRS_PER_PTE; i++, pte++)
+set_pte(pte, __pte(0));
+}
+
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
   pgprot_t prot)
@@ -1124,7 +1142,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) __end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
-unsigned long all_end = PFN_ALIGN(_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
(end - start)  10);
@@ -1137,6 +1156,8 @@ void mark_rodata_ro(void)
  * should also be 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook  wrote:
> On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:

>> should use attached one instead.
>>
>> 1. should use _brk_end instead of , as we only use partial of
>>brk.
>> 2. [_brk_end, pm_end) page range is already converted. aka
>>is not wasted.
>
> Are you sure? For me, _brk_end isn't far enough:
>
> [1.475572] all_end: 0x82df5000
> [1.476736] _brk_end: 0x82dd6000

Yes. _brk_end should be small then &_end.

>
>>
>> ---
>>  arch/x86/mm/init_64.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/arch/x86/mm/init_64.c
>> ===
>> --- linux-2.6.orig/arch/x86/mm/init_64.c
>> +++ linux-2.6/arch/x86/mm/init_64.c
>> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>>  unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>>  unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>>  unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>> -unsigned long all_end = PFN_ALIGN(&_end);
>> +unsigned long all_end = PFN_ALIGN(_brk_end);
>> +unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>>  printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>> (end - start) >> 10);
>> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>>   * The rodata/data/bss/brk section (but not the kernel text!)
>>   * should also be not-executable.
>>   */
>> -set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>>  rodata_test();
>>
>> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>>  set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +/* all_end to pmd_end is handled via free_all_bootmem() */
>>  free_init_pages("unused kernel",
>>  (unsigned long) __va(__pa_symbol(text_end)),
>>  (unsigned long) __va(__pa_symbol(rodata_start)));
>
> This patch produces the same results as my v1 patch:
>
> 0x8202d000-0x8220  1868K RW   GLB NX pte
> 0x8220-0x82e012M RW   PSE GLB NX pmd
> 0x82e0-0xc000   978M pmd
>
> Is this correct? It sounded like tglx wanted the pmd split, like this:
>
> 0x8220-0x82c010M RW   PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW   GLB NX pte
> 0x82df5000-0x82e044K RW   NX pte
> 0x82e0-0xc000   978M pmd

Need to remove GLB ?

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 7:06 PM, Kees Cook  wrote:
> On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu  wrote:
>>
>> [0.00]   .text: [0x0100-0x0200d548]
>> [0.00] .rodata: [0x0220-0x02a1cfff]
>> [0.00]   .data: [0x02c0-0x02e50e7f]
>> [0.00]   .init: [0x02e52000-0x03212fff]
>> [0.00].bss: [0x03221000-0x0437bfff]
>> [0.00].brk: [0x0437c000-0x043a1fff]
>
> And which CONFIG turns on this reporting?
>

My local patch.

this brk does not shrink yet.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Kees Cook
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu  wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook  wrote:
>> When setting up permissions on kernel memory at boot, the end of the
>> PMD that was split from bss remained executable. It should be NX like
>> the rest. This performs a PMD alignment instead of a PAGE alignment to
>> get the correct span of memory, and should be freed.
>>
>> Before:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0x8202d000-0x8220  1868K RW   GLB NX pte
>> 0x8220-0x82c010M RW   PSE GLB NX pmd
>> 0x82c0-0x82df5000  2004K RW   GLB NX pte
>> 0x82df5000-0x82e044K RW   GLB x  pte
>> 0x82e0-0xc000   978M pmd
>>
>> After:
>> ---[ High Kernel Mapping ]---
>> ...
>> 0x8202d000-0x8220  1868K RW   GLB NX pte
>> 0x8220-0x82c010M RW   PSE GLB NX pmd
>> 0x82c0-0x82df5000  2004K RW   GLB NX pte
>> 0x82df5000-0x82e044K RW   NX pte
>> 0x82e0-0xc000   978M pmd
>>
>> Signed-off-by: Kees Cook 
>> ---
>> v2:
>>  - added call to free_init_pages(), as suggested by tglx
>> ---
>>  arch/x86/mm/init_64.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 4cb8763868fc..0d498c922668 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
>> unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>> unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
>> unsigned long all_end = PFN_ALIGN(&_end);
>> +   unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>>
>> printk(KERN_INFO "Write protecting the kernel read-only data: 
>> %luk\n",
>>(end - start) >> 10);
>> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>>  * The rodata/data/bss/brk section (but not the kernel text!)
>>  * should also be not-executable.
>>  */
>> -   set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
>> +   set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>>
>> rodata_test();
>>
>> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
>> set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>>  #endif
>>
>> +   free_init_pages("unused kernel", all_end, pmd_end);
>> free_init_pages("unused kernel",
>> (unsigned long) __va(__pa_symbol(text_end)),
>> (unsigned long) __va(__pa_symbol(rodata_start)));
>
> something is wrong:
>
> [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
> 83213000)
> [7.843305] Write protecting the kernel read-only data: 28672k
> [7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
> [7.845093] page:ea10f000 count:0 mapcount:-127 mapping:
>   (null) index:0x2
> [7.846388] flags: 0x10()
> [7.846871] page dumped because: nonzero mapcount
> [7.847343] Modules linked in:
> [7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 3.18.0-rc4-yh-01896-g40204c8-dirty #23
> [7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
> 04/01/2014
> [7.850014]  828300ca 880078babd68 81ff47d0
> 0001
> [7.850857]  ea10f000 880078babd98 8118c2bd
> 001d4cc0
> [7.851791]  ea10f000 ea10f000 
> 880078babdf8
> [7.852700] Call Trace:
> [7.852991]  [] dump_stack+0x45/0x57
> [7.853494]  [] bad_page+0xfd/0x130
> [7.854130]  [] free_pages_prepare+0x13c/0x1c0
> [7.854808]  [] ? adjust_managed_page_count+0x5d/0x70
> [7.855575]  [] free_hot_cold_page+0x35/0x180
> [7.856326]  [] __free_pages+0x13/0x40
> [7.856854]  [] free_reserved_area+0xcd/0x140
> [7.857442]  [] free_init_pages+0x98/0xb0
> [7.858001]  [] mark_rodata_ro+0xb5/0x120
> [7.858622]  [] ? rest_init+0xc0/0xc0
> [7.859174]  [] kernel_init+0x1d/0x100
> [7.859724]  [] ret_from_fork+0x7c/0xb0
> [7.860279]  [] ? rest_init+0xc0/0xc0
> [7.860836] Disabling lock debugging due to kernel taint
> [7.861432] Freeing unused kernel memory: 376K (843a2000 -
> 8440)
> [7.866118] Freeing unused kernel memory: 1980K (880002011000 -
> 88000220)
> [7.870525] Freeing unused kernel memory: 1932K (880002a1d000 -
> 880002c0)

Also, what tree is this? "Freeing %s" went away in
c88442ec45f30d587b38b935a14acde4e217a926 (and should probably be
re-added, which is what I assume has happened.)

>
> [0.00]   .text: [0x0100-0x0200d548]
> [0.00] .rodata: 

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Kees Cook
On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu  wrote:
> On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu  wrote:
>> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook  wrote:
>>> v2:
>>>  - added call to free_init_pages(), as suggested by tglx
>
>> something is wrong:
>>
>> [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
>> 83213000)
>> [7.843305] Write protecting the kernel read-only data: 28672k
>
> 
> should use attached one instead.
>
> 1. should use _brk_end instead of , as we only use partial of
>brk.
> 2. [_brk_end, pm_end) page range is already converted. aka
>is not wasted.

Are you sure? For me, _brk_end isn't far enough:

[1.475572] all_end: 0x82df5000
[1.476736] _brk_end: 0x82dd6000

>
> ---
>  arch/x86/mm/init_64.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/mm/init_64.c
> ===
> --- linux-2.6.orig/arch/x86/mm/init_64.c
> +++ linux-2.6/arch/x86/mm/init_64.c
> @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
>  unsigned long end = (unsigned long) &__end_rodata_hpage_align;
>  unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
>  unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
> -unsigned long all_end = PFN_ALIGN(&_end);
> +unsigned long all_end = PFN_ALIGN(_brk_end);
> +unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
>  printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
> (end - start) >> 10);
> @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
>   * The rodata/data/bss/brk section (but not the kernel text!)
>   * should also be not-executable.
>   */
> -set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
>  rodata_test();
>
> @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
>  set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +/* all_end to pmd_end is handled via free_all_bootmem() */
>  free_init_pages("unused kernel",
>  (unsigned long) __va(__pa_symbol(text_end)),
>  (unsigned long) __va(__pa_symbol(rodata_start)));

This patch produces the same results as my v1 patch:

0x8202d000-0x8220  1868K RW   GLB NX pte
0x8220-0x82e012M RW   PSE GLB NX pmd
0x82e0-0xc000   978M pmd

Is this correct? It sounded like tglx wanted the pmd split, like this:

0x8220-0x82c010M RW   PSE GLB NX pmd
0x82c0-0x82df5000  2004K RW   GLB NX pte
0x82df5000-0x82e044K RW   NX pte
0x82e0-0xc000   978M pmd


-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu  wrote:
> On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook  wrote:
>> v2:
>>  - added call to free_init_pages(), as suggested by tglx

> something is wrong:
>
> [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
> 83213000)
> [7.843305] Write protecting the kernel read-only data: 28672k


should use attached one instead.

1. should use _brk_end instead of , as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted. aka
   is not wasted.

---
 arch/x86/mm/init_64.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-unsigned long all_end = PFN_ALIGN(&_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
  * The rodata/data/bss/brk section (but not the kernel text!)
  * should also be not-executable.
  */
-set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);

 rodata_test();

@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
 set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif

+/* all_end to pmd_end is handled via free_all_bootmem() */
 free_init_pages("unused kernel",
 (unsigned long) __va(__pa_symbol(text_end)),
 (unsigned long) __va(__pa_symbol(rodata_start)));
1. should use _brk_end instead of , as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted. aka
   is not wasted.

---
 arch/x86/mm/init_64.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
-	unsigned long all_end = PFN_ALIGN(&_end);
+	unsigned long all_end = PFN_ALIGN(_brk_end);
+	unsigned long pmd_end = roundup(all_end, PMD_SIZE);
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	   (end - start) >> 10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
 	 * The rodata/data/bss/brk section (but not the kernel text!)
 	 * should also be not-executable.
 	 */
-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
+	set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
 
 	rodata_test();
 
@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
+	/* all_end to pmd_end is handled via free_all_bootmem() */
 	free_init_pages("unused kernel",
 			(unsigned long) __va(__pa_symbol(text_end)),
 			(unsigned long) __va(__pa_symbol(rodata_start)));


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook  wrote:
> When setting up permissions on kernel memory at boot, the end of the
> PMD that was split from bss remained executable. It should be NX like
> the rest. This performs a PMD alignment instead of a PAGE alignment to
> get the correct span of memory, and should be freed.
>
> Before:
> ---[ High Kernel Mapping ]---
> ...
> 0x8202d000-0x8220  1868K RW   GLB NX pte
> 0x8220-0x82c010M RW   PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW   GLB NX pte
> 0x82df5000-0x82e044K RW   GLB x  pte
> 0x82e0-0xc000   978M pmd
>
> After:
> ---[ High Kernel Mapping ]---
> ...
> 0x8202d000-0x8220  1868K RW   GLB NX pte
> 0x8220-0x82c010M RW   PSE GLB NX pmd
> 0x82c0-0x82df5000  2004K RW   GLB NX pte
> 0x82df5000-0x82e044K RW   NX pte
> 0x82e0-0xc000   978M pmd
>
> Signed-off-by: Kees Cook 
> ---
> v2:
>  - added call to free_init_pages(), as suggested by tglx
> ---
>  arch/x86/mm/init_64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4cb8763868fc..0d498c922668 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
> unsigned long text_end = PFN_ALIGN(&__stop___ex_table);
> unsigned long rodata_end = PFN_ALIGN(&__end_rodata);
> unsigned long all_end = PFN_ALIGN(&_end);
> +   unsigned long pmd_end = roundup(all_end, PMD_SIZE);
>
> printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
>(end - start) >> 10);
> @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
>  * The rodata/data/bss/brk section (but not the kernel text!)
>  * should also be not-executable.
>  */
> -   set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
> +   set_memory_nx(rodata_start, (pmd_end - rodata_start) >> PAGE_SHIFT);
>
> rodata_test();
>
> @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
> set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>
> +   free_init_pages("unused kernel", all_end, pmd_end);
> free_init_pages("unused kernel",
> (unsigned long) __va(__pa_symbol(text_end)),
> (unsigned long) __va(__pa_symbol(rodata_start)));

something is wrong:

[7.842479] Freeing unused kernel memory: 3844K (82e52000 -
83213000)
[7.843305] Write protecting the kernel read-only data: 28672k
[7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
[7.845093] page:ea10f000 count:0 mapcount:-127 mapping:
  (null) index:0x2
[7.846388] flags: 0x10()
[7.846871] page dumped because: nonzero mapcount
[7.847343] Modules linked in:
[7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
3.18.0-rc4-yh-01896-g40204c8-dirty #23
[7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[7.850014]  828300ca 880078babd68 81ff47d0
0001
[7.850857]  ea10f000 880078babd98 8118c2bd
001d4cc0
[7.851791]  ea10f000 ea10f000 
880078babdf8
[7.852700] Call Trace:
[7.852991]  [] dump_stack+0x45/0x57
[7.853494]  [] bad_page+0xfd/0x130
[7.854130]  [] free_pages_prepare+0x13c/0x1c0
[7.854808]  [] ? adjust_managed_page_count+0x5d/0x70
[7.855575]  [] free_hot_cold_page+0x35/0x180
[7.856326]  [] __free_pages+0x13/0x40
[7.856854]  [] free_reserved_area+0xcd/0x140
[7.857442]  [] free_init_pages+0x98/0xb0
[7.858001]  [] mark_rodata_ro+0xb5/0x120
[7.858622]  [] ? rest_init+0xc0/0xc0
[7.859174]  [] kernel_init+0x1d/0x100
[7.859724]  [] ret_from_fork+0x7c/0xb0
[7.860279]  [] ? rest_init+0xc0/0xc0
[7.860836] Disabling lock debugging due to kernel taint
[7.861432] Freeing unused kernel memory: 376K (843a2000 -
8440)
[7.866118] Freeing unused kernel memory: 1980K (880002011000 -
88000220)
[7.870525] Freeing unused kernel memory: 1932K (880002a1d000 -
880002c0)

[0.00]   .text: [0x0100-0x0200d548]
[0.00] .rodata: [0x0220-0x02a1cfff]
[0.00]   .data: [0x02c0-0x02e50e7f]
[0.00]   .init: [0x02e52000-0x03212fff]
[0.00].bss: [0x03221000-0x0437bfff]
[0.00].brk: [0x0437c000-0x043a1fff]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook keesc...@chromium.org wrote:
 When setting up permissions on kernel memory at boot, the end of the
 PMD that was split from bss remained executable. It should be NX like
 the rest. This performs a PMD alignment instead of a PAGE alignment to
 get the correct span of memory, and should be freed.

 Before:
 ---[ High Kernel Mapping ]---
 ...
 0x8202d000-0x8220  1868K RW   GLB NX pte
 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   GLB x  pte
 0x82e0-0xc000   978M pmd

 After:
 ---[ High Kernel Mapping ]---
 ...
 0x8202d000-0x8220  1868K RW   GLB NX pte
 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   NX pte
 0x82e0-0xc000   978M pmd

 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
 v2:
  - added call to free_init_pages(), as suggested by tglx
 ---
  arch/x86/mm/init_64.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 index 4cb8763868fc..0d498c922668 100644
 --- a/arch/x86/mm/init_64.c
 +++ b/arch/x86/mm/init_64.c
 @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 unsigned long all_end = PFN_ALIGN(_end);
 +   unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
(end - start)  10);
 @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
  * The rodata/data/bss/brk section (but not the kernel text!)
  * should also be not-executable.
  */
 -   set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
 +   set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);

 rodata_test();

 @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
 set_memory_ro(start, (end-start)  PAGE_SHIFT);
  #endif

 +   free_init_pages(unused kernel, all_end, pmd_end);
 free_init_pages(unused kernel,
 (unsigned long) __va(__pa_symbol(text_end)),
 (unsigned long) __va(__pa_symbol(rodata_start)));

something is wrong:

[7.842479] Freeing unused kernel memory: 3844K (82e52000 -
83213000)
[7.843305] Write protecting the kernel read-only data: 28672k
[7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
[7.845093] page:ea10f000 count:0 mapcount:-127 mapping:
  (null) index:0x2
[7.846388] flags: 0x10()
[7.846871] page dumped because: nonzero mapcount
[7.847343] Modules linked in:
[7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
3.18.0-rc4-yh-01896-g40204c8-dirty #23
[7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
04/01/2014
[7.850014]  828300ca 880078babd68 81ff47d0
0001
[7.850857]  ea10f000 880078babd98 8118c2bd
001d4cc0
[7.851791]  ea10f000 ea10f000 
880078babdf8
[7.852700] Call Trace:
[7.852991]  [81ff47d0] dump_stack+0x45/0x57
[7.853494]  [8118c2bd] bad_page+0xfd/0x130
[7.854130]  [8118c42c] free_pages_prepare+0x13c/0x1c0
[7.854808]  [8118c64d] ? adjust_managed_page_count+0x5d/0x70
[7.855575]  [8118f285] free_hot_cold_page+0x35/0x180
[7.856326]  [8118f3e3] __free_pages+0x13/0x40
[7.856854]  [8118f4dd] free_reserved_area+0xcd/0x140
[7.857442]  [81091778] free_init_pages+0x98/0xb0
[7.858001]  [81092085] mark_rodata_ro+0xb5/0x120
[7.858622]  [81fe3240] ? rest_init+0xc0/0xc0
[7.859174]  [81fe325d] kernel_init+0x1d/0x100
[7.859724]  [820066ec] ret_from_fork+0x7c/0xb0
[7.860279]  [81fe3240] ? rest_init+0xc0/0xc0
[7.860836] Disabling lock debugging due to kernel taint
[7.861432] Freeing unused kernel memory: 376K (843a2000 -
8440)
[7.866118] Freeing unused kernel memory: 1980K (880002011000 -
88000220)
[7.870525] Freeing unused kernel memory: 1932K (880002a1d000 -
880002c0)

[0.00]   .text: [0x0100-0x0200d548]
[0.00] .rodata: [0x0220-0x02a1cfff]
[0.00]   .data: [0x02c0-0x02e50e7f]
[0.00]   .init: [0x02e52000-0x03212fff]
[0.00].bss: [0x03221000-0x0437bfff]
[0.00].brk: [0x0437c000-0x043a1fff]

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu ying...@kernel.org wrote:
 On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook keesc...@chromium.org wrote:
 v2:
  - added call to free_init_pages(), as suggested by tglx

 something is wrong:

 [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
 83213000)
 [7.843305] Write protecting the kernel read-only data: 28672k


should use attached one instead.

1. should use _brk_end instead of end, as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted. aka
   is not wasted.

---
 arch/x86/mm/init_64.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
 unsigned long end = (unsigned long) __end_rodata_hpage_align;
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
-unsigned long all_end = PFN_ALIGN(_end);
+unsigned long all_end = PFN_ALIGN(_brk_end);
+unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
(end - start)  10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
  * The rodata/data/bss/brk section (but not the kernel text!)
  * should also be not-executable.
  */
-set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
+set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);

 rodata_test();

@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
 set_memory_ro(start, (end-start)  PAGE_SHIFT);
 #endif

+/* all_end to pmd_end is handled via free_all_bootmem() */
 free_init_pages(unused kernel,
 (unsigned long) __va(__pa_symbol(text_end)),
 (unsigned long) __va(__pa_symbol(rodata_start)));
1. should use _brk_end instead of end, as we only use partial of
   brk.
2. [_brk_end, pm_end) page range is already converted. aka
   is not wasted.

---
 arch/x86/mm/init_64.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
 	unsigned long end = (unsigned long) __end_rodata_hpage_align;
 	unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 	unsigned long rodata_end = PFN_ALIGN(__end_rodata);
-	unsigned long all_end = PFN_ALIGN(_end);
+	unsigned long all_end = PFN_ALIGN(_brk_end);
+	unsigned long pmd_end = roundup(all_end, PMD_SIZE);
 
 	printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
 	   (end - start)  10);
@@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
 	 * The rodata/data/bss/brk section (but not the kernel text!)
 	 * should also be not-executable.
 	 */
-	set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
+	set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);
 
 	rodata_test();
 
@@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start)  PAGE_SHIFT);
 #endif
 
+	/* all_end to pmd_end is handled via free_all_bootmem() */
 	free_init_pages(unused kernel,
 			(unsigned long) __va(__pa_symbol(text_end)),
 			(unsigned long) __va(__pa_symbol(rodata_start)));


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Kees Cook
On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:
 On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu ying...@kernel.org wrote:
 On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook keesc...@chromium.org wrote:
 v2:
  - added call to free_init_pages(), as suggested by tglx

 something is wrong:

 [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
 83213000)
 [7.843305] Write protecting the kernel read-only data: 28672k

 
 should use attached one instead.

 1. should use _brk_end instead of end, as we only use partial of
brk.
 2. [_brk_end, pm_end) page range is already converted. aka
is not wasted.

Are you sure? For me, _brk_end isn't far enough:

[1.475572] all_end: 0x82df5000
[1.476736] _brk_end: 0x82dd6000


 ---
  arch/x86/mm/init_64.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 Index: linux-2.6/arch/x86/mm/init_64.c
 ===
 --- linux-2.6.orig/arch/x86/mm/init_64.c
 +++ linux-2.6/arch/x86/mm/init_64.c
 @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
  unsigned long end = (unsigned long) __end_rodata_hpage_align;
  unsigned long text_end = PFN_ALIGN(__stop___ex_table);
  unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 -unsigned long all_end = PFN_ALIGN(_end);
 +unsigned long all_end = PFN_ALIGN(_brk_end);
 +unsigned long pmd_end = roundup(all_end, PMD_SIZE);

  printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
 (end - start)  10);
 @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
   * The rodata/data/bss/brk section (but not the kernel text!)
   * should also be not-executable.
   */
 -set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
 +set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);

  rodata_test();

 @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
  set_memory_ro(start, (end-start)  PAGE_SHIFT);
  #endif

 +/* all_end to pmd_end is handled via free_all_bootmem() */
  free_init_pages(unused kernel,
  (unsigned long) __va(__pa_symbol(text_end)),
  (unsigned long) __va(__pa_symbol(rodata_start)));

This patch produces the same results as my v1 patch:

0x8202d000-0x8220  1868K RW   GLB NX pte
0x8220-0x82e012M RW   PSE GLB NX pmd
0x82e0-0xc000   978M pmd

Is this correct? It sounded like tglx wanted the pmd split, like this:

0x8220-0x82c010M RW   PSE GLB NX pmd
0x82c0-0x82df5000  2004K RW   GLB NX pte
0x82df5000-0x82e044K RW   NX pte
0x82e0-0xc000   978M pmd


-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Kees Cook
On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu ying...@kernel.org wrote:
 On Fri, Nov 14, 2014 at 12:45 PM, Kees Cook keesc...@chromium.org wrote:
 When setting up permissions on kernel memory at boot, the end of the
 PMD that was split from bss remained executable. It should be NX like
 the rest. This performs a PMD alignment instead of a PAGE alignment to
 get the correct span of memory, and should be freed.

 Before:
 ---[ High Kernel Mapping ]---
 ...
 0x8202d000-0x8220  1868K RW   GLB NX pte
 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   GLB x  pte
 0x82e0-0xc000   978M pmd

 After:
 ---[ High Kernel Mapping ]---
 ...
 0x8202d000-0x8220  1868K RW   GLB NX pte
 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   NX pte
 0x82e0-0xc000   978M pmd

 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
 v2:
  - added call to free_init_pages(), as suggested by tglx
 ---
  arch/x86/mm/init_64.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 index 4cb8763868fc..0d498c922668 100644
 --- a/arch/x86/mm/init_64.c
 +++ b/arch/x86/mm/init_64.c
 @@ -1124,6 +1124,7 @@ void mark_rodata_ro(void)
 unsigned long text_end = PFN_ALIGN(__stop___ex_table);
 unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 unsigned long all_end = PFN_ALIGN(_end);
 +   unsigned long pmd_end = roundup(all_end, PMD_SIZE);

 printk(KERN_INFO Write protecting the kernel read-only data: 
 %luk\n,
(end - start)  10);
 @@ -1135,7 +1136,7 @@ void mark_rodata_ro(void)
  * The rodata/data/bss/brk section (but not the kernel text!)
  * should also be not-executable.
  */
 -   set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
 +   set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);

 rodata_test();

 @@ -1147,6 +1148,7 @@ void mark_rodata_ro(void)
 set_memory_ro(start, (end-start)  PAGE_SHIFT);
  #endif

 +   free_init_pages(unused kernel, all_end, pmd_end);
 free_init_pages(unused kernel,
 (unsigned long) __va(__pa_symbol(text_end)),
 (unsigned long) __va(__pa_symbol(rodata_start)));

 something is wrong:

 [7.842479] Freeing unused kernel memory: 3844K (82e52000 -
 83213000)
 [7.843305] Write protecting the kernel read-only data: 28672k
 [7.844433] BUG: Bad page state in process swapper/0  pfn:043c0
 [7.845093] page:ea10f000 count:0 mapcount:-127 mapping:
   (null) index:0x2
 [7.846388] flags: 0x10()
 [7.846871] page dumped because: nonzero mapcount
 [7.847343] Modules linked in:
 [7.847719] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
 3.18.0-rc4-yh-01896-g40204c8-dirty #23
 [7.848809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
 04/01/2014
 [7.850014]  828300ca 880078babd68 81ff47d0
 0001
 [7.850857]  ea10f000 880078babd98 8118c2bd
 001d4cc0
 [7.851791]  ea10f000 ea10f000 
 880078babdf8
 [7.852700] Call Trace:
 [7.852991]  [81ff47d0] dump_stack+0x45/0x57
 [7.853494]  [8118c2bd] bad_page+0xfd/0x130
 [7.854130]  [8118c42c] free_pages_prepare+0x13c/0x1c0
 [7.854808]  [8118c64d] ? adjust_managed_page_count+0x5d/0x70
 [7.855575]  [8118f285] free_hot_cold_page+0x35/0x180
 [7.856326]  [8118f3e3] __free_pages+0x13/0x40
 [7.856854]  [8118f4dd] free_reserved_area+0xcd/0x140
 [7.857442]  [81091778] free_init_pages+0x98/0xb0
 [7.858001]  [81092085] mark_rodata_ro+0xb5/0x120
 [7.858622]  [81fe3240] ? rest_init+0xc0/0xc0
 [7.859174]  [81fe325d] kernel_init+0x1d/0x100
 [7.859724]  [820066ec] ret_from_fork+0x7c/0xb0
 [7.860279]  [81fe3240] ? rest_init+0xc0/0xc0
 [7.860836] Disabling lock debugging due to kernel taint
 [7.861432] Freeing unused kernel memory: 376K (843a2000 -
 8440)
 [7.866118] Freeing unused kernel memory: 1980K (880002011000 -
 88000220)
 [7.870525] Freeing unused kernel memory: 1932K (880002a1d000 -
 880002c0)

Also, what tree is this? Freeing %s went away in
c88442ec45f30d587b38b935a14acde4e217a926 (and should probably be
re-added, which is what I assume has happened.)


 [

Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 7:06 PM, Kees Cook keesc...@chromium.org wrote:
 On Fri, Nov 14, 2014 at 5:29 PM, Yinghai Lu ying...@kernel.org wrote:

 [0.00]   .text: [0x0100-0x0200d548]
 [0.00] .rodata: [0x0220-0x02a1cfff]
 [0.00]   .data: [0x02c0-0x02e50e7f]
 [0.00]   .init: [0x02e52000-0x03212fff]
 [0.00].bss: [0x03221000-0x0437bfff]
 [0.00].brk: [0x0437c000-0x043a1fff]

 And which CONFIG turns on this reporting?


My local patch.

this brk does not shrink yet.

Yinghai
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, mm: set NX across entire PMD at boot

2014-11-14 Thread Yinghai Lu
On Fri, Nov 14, 2014 at 6:46 PM, Kees Cook keesc...@chromium.org wrote:
 On Fri, Nov 14, 2014 at 6:29 PM, Yinghai Lu ying...@kernel.org wrote:

 should use attached one instead.

 1. should use _brk_end instead of end, as we only use partial of
brk.
 2. [_brk_end, pm_end) page range is already converted. aka
is not wasted.

 Are you sure? For me, _brk_end isn't far enough:

 [1.475572] all_end: 0x82df5000
 [1.476736] _brk_end: 0x82dd6000

Yes. _brk_end should be small then _end.



 ---
  arch/x86/mm/init_64.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 Index: linux-2.6/arch/x86/mm/init_64.c
 ===
 --- linux-2.6.orig/arch/x86/mm/init_64.c
 +++ linux-2.6/arch/x86/mm/init_64.c
 @@ -1124,7 +1124,8 @@ void mark_rodata_ro(void)
  unsigned long end = (unsigned long) __end_rodata_hpage_align;
  unsigned long text_end = PFN_ALIGN(__stop___ex_table);
  unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 -unsigned long all_end = PFN_ALIGN(_end);
 +unsigned long all_end = PFN_ALIGN(_brk_end);
 +unsigned long pmd_end = roundup(all_end, PMD_SIZE);

  printk(KERN_INFO Write protecting the kernel read-only data: %luk\n,
 (end - start)  10);
 @@ -1136,7 +1137,7 @@ void mark_rodata_ro(void)
   * The rodata/data/bss/brk section (but not the kernel text!)
   * should also be not-executable.
   */
 -set_memory_nx(rodata_start, (all_end - rodata_start)  PAGE_SHIFT);
 +set_memory_nx(rodata_start, (pmd_end - rodata_start)  PAGE_SHIFT);

  rodata_test();

 @@ -1148,6 +1149,7 @@ void mark_rodata_ro(void)
  set_memory_ro(start, (end-start)  PAGE_SHIFT);
  #endif

 +/* all_end to pmd_end is handled via free_all_bootmem() */
  free_init_pages(unused kernel,
  (unsigned long) __va(__pa_symbol(text_end)),
  (unsigned long) __va(__pa_symbol(rodata_start)));

 This patch produces the same results as my v1 patch:

 0x8202d000-0x8220  1868K RW   GLB NX pte
 0x8220-0x82e012M RW   PSE GLB NX pmd
 0x82e0-0xc000   978M pmd

 Is this correct? It sounded like tglx wanted the pmd split, like this:

 0x8220-0x82c010M RW   PSE GLB NX pmd
 0x82c0-0x82df5000  2004K RW   GLB NX pte
 0x82df5000-0x82e044K RW   NX pte
 0x82e0-0xc000   978M pmd

Need to remove GLB ?

Yinghai
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/