Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-25 Thread Yinghai Lu
On Tue, Dec 25, 2012 at 3:20 AM, Borislav Petkov <b...@alien8.de> wrote:
> On Mon, Dec 24, 2012 at 08:04:18PM -0800, Yinghai Lu wrote:
>> well, I updated for-x86-boot-v7 that stop #PF handler after
>> init_mem_mapping.
>>
>> it has fix for AMD system aka reverting far jmp to ret.
>
> -v7?
>
> You told me yesterday -v8 is the current branch. Do you have -v7 which
> does break KGDB and -v8 which breaks it and both branches are current?
>

-v7: stop #PF handler after init_mem_mapping, so it could break KGDB
if someone tries to use mdump.
-v8: stop #PF handler before x86_64_start_reservations.

Both have now been updated and work on AMD platforms after dropping
the lretq change, i.e. keeping lretq.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-25 Thread Borislav Petkov
On Mon, Dec 24, 2012 at 08:04:18PM -0800, Yinghai Lu wrote:
> well, I updated for-x86-boot-v7 that stop #PF handler after
> init_mem_mapping.
>
> it has fix for AMD system aka reverting far jmp to ret.

-v7?

You told me yesterday -v8 is the current branch. Do you have -v7 which
does break KGDB and -v8 which breaks it and both branches are current?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-24 Thread Yinghai Lu
On Mon, Dec 24, 2012 at 4:16 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 12/20/2012 08:56 AM, Yinghai Lu wrote:
>>>
>>>
>>> So in that case, kgdb is broken and will need to be fixed up.  That
>>> happens all the time with debugging tools.
>>
>>
>> If there is a way that we can make all parties happy, we really should
>> not break KGDB.
>>
>> Please reconsider to stop #PF handler in x86_64_start_kernel. in that case
>> 1. microcode update still can use #PF handler to find microcode in
>> ramdisk and use it.
>> 2. kernel that is loaded above 4G, could set mapping in C instead of
>> set that in head_64.S
>> and use ioremap to access zero_page
>> 3. KGDB still can call early_trap_init early before init_mem_mapping.
>>
>
> Yinghai, this is total and utter bullshit.
>
> We should *fix* kgdb, not pave around it.  I refuse to have kgdb be yet
> another Xen turning random kernel internals into ABIs.

Well, I updated for-x86-boot-v7 so that it stops the #PF handler after init_mem_mapping.

It also has the fix for AMD systems, i.e. reverting the far jmp back to ret.

Yinghai


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-24 Thread H. Peter Anvin

On 12/20/2012 08:56 AM, Yinghai Lu wrote:
>>
>> So in that case, kgdb is broken and will need to be fixed up.  That
>> happens all the time with debugging tools.
>>
>
> If there is a way that we can make all parties happy, we really should
> not break KGDB.
>
> Please reconsider to stop #PF handler in x86_64_start_kernel. in that case
> 1. microcode update still can use #PF handler to find microcode in
> ramdisk and use it.
> 2. kernel that is loaded above 4G, could set mapping in C instead of
> set that in head_64.S
> and use ioremap to access zero_page
> 3. KGDB still can call early_trap_init early before init_mem_mapping.
>

Yinghai, this is total and utter bullshit.

We should *fix* kgdb, not pave around it.  I refuse to have kgdb be yet
another Xen turning random kernel internals into ABIs.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-20 Thread Yinghai Lu
On Tue, Dec 18, 2012 at 1:07 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 12/18/2012 12:55 PM, Yinghai Lu wrote:
>> On Tue, Dec 18, 2012 at 12:49 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>> On 12/18/2012 12:43 PM, Yinghai Lu wrote:
>>
>>>
>>> That is putting the cart before the horse.  What is the specific requirement
>>> with kgdb here (I didn't see any email on that, please don't have private
>>> back conversations)?  Either way, however, kgdb is a tool to debug the
>>> kernel... having it a barrier for proper functionality of the kernel is not
>>> acceptable.
>>
>> did not hear back from Jason or Jan.
>>
>> Looks like last mail in LKML from Jason is about Oct 20
>>
>> looks like kgdb is want DB, BP, and PF are set at first.
>>
>> and just after that early_param for kgdbwait will get into to hold the 
>> kernel.
>>
>> then command from kgdb could dump ram etc.
>>
>
> So in that case, kgdb is broken and will need to be fixed up.  That
> happens all the time with debugging tools.
>

If there is a way that we can make all parties happy, we really should
not break KGDB.

Please reconsider stopping the #PF handler in x86_64_start_kernel. In that case:
1. microcode update can still use the #PF handler to find the microcode in
   the ramdisk and use it.
2. a kernel that is loaded above 4G can set up the mapping in C instead of
   setting it in head_64.S
   and use ioremap to access zero_page.
3. KGDB can still call early_trap_init early, before init_mem_mapping.

I put the change in the for-x86-boot-v8 branch.
The core patch is:
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=6fa4f1e68f0b67d0dc13d30c5ce6c3932697d08f

Thanks

Yinghai


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-18 Thread H. Peter Anvin
On 12/18/2012 12:55 PM, Yinghai Lu wrote:
> On Tue, Dec 18, 2012 at 12:49 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 12/18/2012 12:43 PM, Yinghai Lu wrote:
> 
>>
>> That is putting the cart before the horse.  What is the specific requirement
>> with kgdb here (I didn't see any email on that, please don't have private
>> back conversations)?  Either way, however, kgdb is a tool to debug the
>> kernel... having it a barrier for proper functionality of the kernel is not
>> acceptable.
> 
> did not hear back from Jason or Jan.
> 
> Looks like last mail in LKML from Jason is about Oct 20
> 
> looks like kgdb is want DB, BP, and PF are set at first.
> 
> and just after that early_param for kgdbwait will get into to hold the kernel.
> 
> then command from kgdb could dump ram etc.
> 

So in that case, kgdb is broken and will need to be fixed up.  That
happens all the time with debugging tools.

-hpa



Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-18 Thread Yinghai Lu
On Tue, Dec 18, 2012 at 12:49 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 12/18/2012 12:43 PM, Yinghai Lu wrote:

>
> That is putting the cart before the horse.  What is the specific requirement
> with kgdb here (I didn't see any email on that, please don't have private
> back conversations)?  Either way, however, kgdb is a tool to debug the
> kernel... having it a barrier for proper functionality of the kernel is not
> acceptable.

I did not hear back from Jason or Jan.

It looks like the last mail on LKML from Jason is from about Oct 20.

It looks like kgdb wants the DB, BP, and PF handlers to be set up first.

Just after that, the early_param for kgdbwait kicks in to hold the kernel.

Then commands from kgdb could dump RAM etc.


Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-18 Thread H. Peter Anvin

On 12/18/2012 12:43 PM, Yinghai Lu wrote:
> On Mon, Dec 17, 2012 at 11:15 PM, Yinghai Lu <ying...@kernel.org> wrote:
>> -v8: we need to keep that handler alive until init_mem_mapping and don't
>>  let early_trap_init to trash that early #PF handler.
>>  So split early_trap_pf_init out and move it down. - Yinghai
>
> Peter,
>
> looks like moving down early_trap_init would break kgdb.
>
> we could make temporary early pgt cover 1G, and kernel and stop updating later.
>
> please check attached patch.
>
> init_mem_mapping need to be change a little: map BRK at first then switch pgt.
>
> Thanks
>
> Yinghai
>

That is putting the cart before the horse.  What is the specific
requirement with kgdb here (I didn't see any email on that, please don't
have private back conversations)?  Either way, however, kgdb is a tool
to debug the kernel... having it a barrier for proper functionality of
the kernel is not acceptable.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-18 Thread Yinghai Lu
On Mon, Dec 17, 2012 at 11:15 PM, Yinghai Lu <ying...@kernel.org> wrote:
> -v8: we need to keep that handler alive until init_mem_mapping and don't
>  let early_trap_init to trash that early #PF handler.
>  So split early_trap_pf_init out and move it down. - Yinghai

Peter,

It looks like moving early_trap_init down would break kgdb.

We could make the temporary early pgt cover 1G and the kernel, and stop updating it later.

Please check the attached patch.

init_mem_mapping needs to be changed a little: map BRK first, then switch pgt.

Thanks

Yinghai


hpa_pf_set_page_table_5.patch
Description: Binary data


[PATCH v7 06/27] x86, 64bit: early #PF handler set page table

2012-12-17 Thread Yinghai Lu
From: "H. Peter Anvin" <h...@zytor.com>

Two use cases:
1. We will support loading and running the kernel above 4G, and zero_page
   and the ramdisk will be above 4G, too.
2. We need to access the ramdisk early to get the microcode and update it
   as early as possible.

We could use early_iomap to access them, but that would make the code
too messy and hard to unify with 32bit.

So here comes a #PF handler that sets up the page table.

When a #PF happens, the handler uses pages in __initdata to set up page
tables that cover the accessed page.
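
For readers who want to see the idea in isolation, here is a minimal
user-space sketch of "populate the page table on fault from a fixed pool".
It is a toy model, not the patch: the names toy_fault, next_level, pool and
top are made up for illustration, entries hold pool slot numbers rather than
physical addresses, and the mapping is identity. It also simply fails when
the pool runs out, which is a simplification and says nothing about what the
real handler does in that case.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES      512          /* entries per table, as on x86-64 */
#define POOL_TABLES  64           /* mirrors EARLY_DYNAMIC_PAGE_TABLES */
#define PRESENT      1ULL

/* Toy model: tables are referenced by pool slot, not by physical address. */
static uint64_t top[ENTRIES];                 /* stands in for early_level4_pgt */
static uint64_t pool[POOL_TABLES][ENTRIES];   /* stands in for early_dynamic_pgts */
static unsigned next_table;

/* Follow the table referenced from *slot, taking a fresh one if absent. */
static uint64_t *next_level(uint64_t *slot)
{
    if (!(*slot & PRESENT)) {
        if (next_table >= POOL_TABLES)
            return NULL;                      /* pool exhausted (toy behaviour) */
        memset(pool[next_table], 0, sizeof(pool[0]));
        *slot = ((uint64_t)next_table << 12) | PRESENT;
        next_table++;
    }
    return pool[*slot >> 12];
}

/* "Fault handler": map the 2M region containing addr. 0 on success, -1 on failure. */
int toy_fault(uint64_t addr)
{
    uint64_t *pud, *pmd;

    pud = next_level(&top[(addr >> 39) & (ENTRIES - 1)]);
    if (!pud)
        return -1;
    pmd = next_level(&pud[(addr >> 30) & (ENTRIES - 1)]);
    if (!pmd)
        return -1;
    /* Identity-map one 2M page in this toy model. */
    pmd[(addr >> 21) & (ENTRIES - 1)] = (addr & ~((1ULL << 21) - 1)) | PRESENT;
    return 0;
}

/* Number of pool tables consumed so far. */
unsigned toy_tables_used(void)
{
    return next_table;
}
```

The first touch of an address at 4G takes two tables from the pool (one PUD
level, one PMD level); a second touch 2M further on reuses both, which is why
a small static pool can be enough for the few early accesses involved.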

The code and pages are in __INIT sections, so they will not increase RAM usage.

The good point is: with the help of the #PF handler, we can set up the kernel
mapping from blank, and switch to init_level4_pgt later.

The switchover in head_64.S uses only three pages to handle the kernel
crossing 1G and 512G boundaries, with page sharing; this is the most
interesting part.
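
As background for the 1G and 512G figures: with 4-level paging and 2M pages,
one PMD entry covers 2M, one PUD entry covers 1G and one PGD entry covers
512G. The sketch below only does that index arithmetic; the helper names
(pgd_idx, pud_idx, pmd_idx, crosses) are illustrative and are not taken from
the patch, and it does not try to reproduce how head_64.S shares pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMD_SHIFT 21   /* one PMD entry covers 2 MiB   */
#define PUD_SHIFT 30   /* one PUD entry covers 1 GiB   */
#define PGD_SHIFT 39   /* one PGD entry covers 512 GiB */

/* Each level is indexed by 9 bits of the address. */
static unsigned idx(uint64_t addr, unsigned shift)
{
    return (unsigned)((addr >> shift) & 511);
}

unsigned pgd_idx(uint64_t addr) { return idx(addr, PGD_SHIFT); }
unsigned pud_idx(uint64_t addr) { return idx(addr, PUD_SHIFT); }
unsigned pmd_idx(uint64_t addr) { return idx(addr, PMD_SHIFT); }

/* True if [start, end) touches more than one region of size 1 << shift. */
bool crosses(uint64_t start, uint64_t end, unsigned shift)
{
    return (start >> shift) != ((end - 1) >> shift);
}
```

An image spanning 0x3ff00000..0x40100000 crosses a 1G boundary, so it needs
entries in two PMD-level tables; one spanning a 512G boundary additionally
needs two PUD-level tables.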

early_make_pgtable uses the kernel high mapping address to access the pages
used to set up the page table.
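
A small worked example of that address conversion, under stated assumptions:
the expression follows the one visible in the diff below,
(pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base. The value of
START_KERNEL_MAP is the usual x86-64 one; PFN_MASK here is an illustrative
mask (bits 12..51) and not necessarily the kernel's exact PTE_PFN_MASK, and
table_virt is a made-up helper name.

```c
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL  /* x86-64 __START_KERNEL_map */
#define PFN_MASK         0x000ffffffffff000ULL  /* illustrative: address bits 12..51 */

/*
 * Given a page-table entry that points at a table inside the kernel image,
 * return the address of that table in the kernel high mapping.
 * phys_base is the difference between the load address and the link address.
 */
uint64_t table_virt(uint64_t entry, uint64_t phys_base)
{
    return (entry & PFN_MASK) + START_KERNEL_MAP - phys_base;
}
```

With phys_base == 0, an entry of 0x1234063 yields 0xffffffff81234000. If the
same image is loaded 16M higher (phys_base == 0x1000000), the entry becomes
0x2234063 and the result is still 0xffffffff81234000, which is the point of
subtracting phys_base.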

-v4: Add phys_base offset to make kexec happy, and add
     init_mapping_kernel()   - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen.
     Also move init_level4_pgt back from BSS to DATA again,
     because we have to clear it anyway.  - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove the unneeded clear_page for init_level4_page;
     it is already filled with .fill 512,8,0 in head_64.S  - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and not
     let early_trap_init trash that early #PF handler.
     So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only covers kernel space instead of 1G, so it avoids
     touching possible mem holes. - Yinghai

Signed-off-by: Yinghai Lu <ying...@kernel.org>
---
 arch/x86/include/asm/pgtable_64_types.h |    4 +
 arch/x86/include/asm/processor.h        |    1 +
 arch/x86/kernel/head64.c                |   79 ++--
 arch/x86/kernel/head_64.S               |  202 +--
 arch/x86/kernel/setup.c                 |    2 +
 arch/x86/kernel/traps.c                 |    9 ++
 arch/x86/mm/init.c                      |    3 +-
 7 files changed, 204 insertions(+), 96 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END  _AC(0xffffffffff000000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES  64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 888184b..a0b58dd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -731,6 +731,7 @@ extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
 extern void early_trap_init(void);
+extern void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr early_gdt_descr;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7b215a5..cac61dc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,72 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-   pgd_t *pgd = pgd_offset_k(0UL);
-   pgd_clear(pgd);
-   __flush_tlb_all();
+   unsigned long i;
+
+   for (i = 0; i < PTRS_PER_PGD-1; i++)
+       early_level4_pgt[i].pgd = 0;
+
+   next_early_pgt = 0;
+
+   write_cr3(__pa(early_level4_pgt));
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+   unsigned long physaddr = address - __PAGE_OFFSET;
+   unsigned long i;
+   pgdval_t pgd, *pgd_p;
+   pudval_t *pud_p;
+   pmdval_t pmd, *pmd_p;
+
+
+   /* Invalid address or early pgt is done ?  */
+   if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
+       return -1;
+
+   pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
+   pgd = *pgd_p;
+
+   /*
+    * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+    * critical -- __PAGE_OFFSET would point us back into the dynamic
+    * range and we might end up looping forever...
+    */
+   if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+       pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
+   } else {
+   if (next_early_pgt >= 
