Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()

2018-08-04 Thread Andi Kleen
> [ Goes around and rummages ]
> 
> Oh, never mind, looking around reminded me why: we want to map the
> kernel text in the top 31 bits, so that we can use the faster
> -mcmodel=kernel because all symbols fit in sign-extended 32 bits.
> 
> Maybe there was some other reason too, but I think that's it.

No that was the only reason.

Large code model would be extremely expensive, and PIC linked
kernel also had some issues. So we ended up with this set up.

-Andi


Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()

2018-08-04 Thread Hugh Dickins
On Sat, 4 Aug 2018, Linus Torvalds wrote:
> On Fri, Aug 3, 2018 at 5:19 PM Hugh Dickins  wrote:
> >
> > I thought that virt_to_page() only works on virtual addresses
> > in the direct map
> 
> You're right that virt_to_page() does not work on any _actual_ virtual
> mappings (ie no user pages, and no vmalloc() pages etc). It does not
> follow page tables at all.
> 
> And on 32-bit, it literally ends up doing (see __phys_addr_nodebug()) a simple
> 
> #define __phys_addr_nodebug(x)  ((x) - PAGE_OFFSET)
> 
> However, on x86-64, we have *two* cases of direct mappings: we have
> the one at __START_KERNEL_map, and we have the one at PAGE_OFFSET.
> 
> And virt_to_page() handles both of those direct mappings.
> 
> Annoying? Yes. And it has caused bugs in the past. And I entirely
> forget why we needed it on x86-64.
> 
> [ Goes around and rummages ]
> 
> Oh, never mind, looking around reminded me why: we want to map the
> kernel text in the top 31 bits, so that we can use the faster
> -mcmodel=kernel because all symbols fit in sign-extended 32 bits.
> 
> Maybe there was some other reason too, but I think that's it.

Thanks a lot for writing that up. You shamed me into grepping
a little harder than I did yesterday, when all I could find were
"- PAGE_OFFSET" conversions (maybe I got lost in 32-bit-land).
I had missed __phys_addr_nodebug(), where the __START_KERNEL_map
alternative is handled.

Hugh


Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()

2018-08-04 Thread Linus Torvalds
On Fri, Aug 3, 2018 at 5:19 PM Hugh Dickins  wrote:
>
> I thought that virt_to_page() only works on virtual addresses
> in the direct map

You're right that virt_to_page() does not work on any _actual_ virtual
mappings (ie no user pages, and no vmalloc() pages etc). It does not
follow page tables at all.

And on 32-bit, it literally ends up doing (see __phys_addr_nodebug()) a simple

#define __phys_addr_nodebug(x)  ((x) - PAGE_OFFSET)

However, on x86-64, we have *two* cases of direct mappings: we have
the one at __START_KERNEL_map, and we have the one at PAGE_OFFSET.

And virt_to_page() handles both of those direct mappings.

Annoying? Yes. And it has caused bugs in the past. And I entirely
forget why we needed it on x86-64.

[ Goes around and rummages ]

Oh, never mind, looking around reminded me why: we want to map the
kernel text in the top 31 bits, so that we can use the faster
-mcmodel=kernel because all symbols fit in sign-extended 32 bits.

Maybe there was some other reason too, but I think that's it.

   Linus


Re: [PATCH 3/7] x86/mm/init: pass unconverted symbol addresses to free_init_pages()

2018-08-03 Thread Hugh Dickins
On Thu, 2 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen 
> 
> The x86 code has several places where it frees parts of kernel image:
> 
>  1. Unused SMP alternative
>  2. __init code
>  3. The hole between text and rodata
>  4. The hole between rodata and data
> 
> We call free_init_pages() to do this.  Strangely, we convert the
> symbol addresses to kernel direct map addresses in some cases
> (#3, #4) but not others (#1, #2).
> 
> The virt_to_page() and the other code in free_reserved_area() now
> works fine for for symbol addresses on x86, so don't bother
> converting the addresses to direct map addresses before freeing
> them.
> 

Thanks Dave, this series works for me.

But in trying to review it, I am feeling very stupid about this 3/7
(and/or the 2/7 before it: though I don't think that it's wrong).

I simply do not understand how this change works, and how #1 and #2
work. I thought that virt_to_page() only works on virtual addresses
in the direct map: the __va(__pa_symbol()) you remove below was a
good conversion from high kernel mapping to direct map. Without it,
how does virt_to_page() then find the right page?

Is it that there's a guaranteed common alignment between the direct
map and the high kernel mapping, and the 8... addresses
pass through the arithmetic in such a way that virt_to_page() comes
up with the right answer; and that is well-known and relied upon by
x86 developers? Or are you now adding a dependency on a coincidence?
Or does it not work at all, and the pages are actually not freed?

Hugh

p.s. if there were to be a respin (I hope not), I notice that in
both 1/7 and 2/7 commit comments, you've said " no " for " not ";
which always makes us pause and wonder if you meant " now ".

> Signed-off-by: Dave Hansen 
> Cc: Kees Cook 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Andrea Arcangeli 
> Cc: Juergen Gross 
> Cc: Josh Poimboeuf 
> Cc: Greg Kroah-Hartman 
> Cc: Peter Zijlstra 
> Cc: Hugh Dickins 
> Cc: Linus Torvalds 
> Cc: Borislav Petkov 
> Cc: Andy Lutomirski 
> Cc: Andi Kleen 
> ---
> 
>  b/arch/x86/mm/init_64.c |8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff -puN arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses 
> arch/x86/mm/init_64.c
> --- a/arch/x86/mm/init_64.c~x86-init-do-not-convert-symbol-addresses  
> 2018-08-02 14:14:48.380483277 -0700
> +++ b/arch/x86/mm/init_64.c   2018-08-02 14:14:48.383483277 -0700
> @@ -1283,12 +1283,8 @@ void mark_rodata_ro(void)
>   set_memory_ro(start, (end-start) >> PAGE_SHIFT);
>  #endif
>  
> - free_init_pages("unused kernel",
> - (unsigned long) __va(__pa_symbol(text_end)),
> - (unsigned long) __va(__pa_symbol(rodata_start)));
> - free_init_pages("unused kernel",
> - (unsigned long) __va(__pa_symbol(rodata_end)),
> - (unsigned long) __va(__pa_symbol(_sdata)));
> + free_init_pages("unused kernel", text_end, rodata_start);
> + free_init_pages("unused kernel", rodata_end, _sdata);
>  
>   debug_checkwx();
>  
> _