Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-12 Thread H. Peter Anvin

On 01/10/2013 12:27 PM, Borislav Petkov wrote:
>> So at that point, how can there be a Signed-off-by from them?
>>
>> And there are commits upstream that do not have a Signed-off-by from
>> the author.
>
> I certainly hope those are a very, very small number, if any.



There are indeed a handful, at which point the first Signed-off-by:
indicates that he, *based on his own first-hand knowledge*, knows that
the author intends, and is allowed, to release this patch under the
appropriate licensing terms (see the Developer's Certificate of Origin
document for the exact details).


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-10 Thread Borislav Petkov
On Thu, Jan 10, 2013 at 09:05:46AM -0800, Yinghai Lu wrote:
> On Thu, Jan 10, 2013 at 4:19 AM, Borislav Petkov  wrote:
> > This is not how SOB chaining works:
> >
> > SOB: Author
> > SOB: Handler - this is you, who has added it to the patchset
> > SOB: Committer - maintainer
> >
> > You need to read Documentation/SubmittingPatches if there are still
> > things unclear.
> 
> I really don't know what you are getting at here.
> 
> We have done it this way for a long time.
> 
> While reviewing some patches, Linus or HPA or Eric has a better idea
> and drafts a patch, without their Signed-off-by.
> 
> Then the first-version submitter continues the debugging and testing
> and gets the patch working.
> 
> In the end, he submits the patch with authorship from Linus or HPA or
> Eric.
> 
> So at that point, how can there be a Signed-off-by from them?
> 
> And there are commits upstream that do not have a Signed-off-by from
> the author.

I certainly hope those are a very, very small number, if any.

In any case, if you've taken hpa's (or anyone's, for that matter) patch,
it should have an SOB from the original author. Then, whether or not you
modify it, if it goes upstream through you it has to have your SOB as
well. And then the upstream maintainer adds his or hers, because he or
she is the one committing it.

This way, the chain of patch handling is clear when you look at it and
you can trace the path back to this patch's origin and how it came
upstream.

Here's the relevant portion of SubmittingPatches:

"Rule (b) allows you to adjust the code, but then it is very impolite
to change one submitter's code and make him endorse your bugs. To
solve this problem, it is recommended that you add a line between the
last Signed-off-by header and yours, indicating the nature of your
changes. While there is nothing mandatory about this, it seems like
prepending the description with your mail and/or name, all enclosed in
square brackets, is noticeable enough to make it obvious that you are
responsible for last-minute changes. Example :

Signed-off-by: Random J Developer 
[lu...@maintainer.example.org: struct foo moved from foo.c to foo.h]
Signed-off-by: Lucky K Maintainer "

In your case, the second SOB should be "Lucky K Developer 2" :-)
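
Applied to this thread, a purely illustrative sketch of the resulting
tag block (the addresses are placeholders, not anyone's real ones):

  From: H. Peter Anvin <author@example.org>

  Signed-off-by: H. Peter Anvin <author@example.org>
  [handler@example.org: debugged, tested and reworked the draft]
  Signed-off-by: Yinghai Lu <handler@example.org>
  Signed-off-by: Tip Maintainer <committer@example.org>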

This way the SOB chain tells you exactly who did what.

HTH.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-10 Thread Yinghai Lu
On Thu, Jan 10, 2013 at 4:19 AM, Borislav Petkov  wrote:
> This is not how SOB chaining works:
>
> SOB: Author
> SOB: Handler - this is you, who has added it to the patchset
> SOB: Committer - maintainer
>
> You need to read Documentation/SubmittingPatches if there are still
> things unclear.

I really don't know what you are getting at here.

We have done it this way for a long time.

While reviewing some patches, Linus or HPA or Eric has a better idea
and drafts a patch, without their Signed-off-by.

Then the first-version submitter continues the debugging and testing
and gets the patch working.

In the end, he submits the patch with authorship from Linus or HPA or Eric.

So at that point, how can there be a Signed-off-by from them?

And there are commits upstream that do not have a Signed-off-by from the author.
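
Roughly, in git terms -- just a sketch; the patch file name and the
fix-up steps below are hypothetical, not what was actually done:

  git am 0001-early-PF-handler-draft.patch   # keeps the original author's From:/authorship
  # ... debug, test and rework the patch ...
  git commit --amend -s                      # author stays unchanged, your own Signed-off-by is appended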

Thanks

Yinghai


Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-10 Thread Borislav Petkov
On Wed, Jan 09, 2013 at 05:56:07PM -0800, Yinghai Lu wrote:
> On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov  wrote:
> > Those -vXX version lines need to go under the "---" line. Alternatively,
> > you might want to add some of them to the commit message with a proper
> > explanation since they are not that trivial at a first glance, for
> > example the -v5, -v6, -v8, -v9 with a better explanation.
> 
> Mostly they are for tracking versions.

I know that! Please read my suggestion again.

> > This needs hpa's S-O-B.
> 
> He will add it later, when he puts the patch into the tip tree.

This is not how SOB chaining works:

SOB: Author
SOB: Handler - this is you, who has added it to the patchset
SOB: Committer - maintainer

You need to read Documentation/SubmittingPatches if there are still
things unclear.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-09 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 7:55 AM, Borislav Petkov  wrote:
> Those -vXX version lines need to go under the "---" line. Alternatively,
> you might want to add some of them to the commit message with a proper
> explanation since they are not that trivial at a first glance, for
> example the -v5, -v6, -v8, -v9 with a better explanation.

Mostly they are for tracking versions.

>
>>
>
> This needs hpa's S-O-B.

He will add it later, when he puts the patch into the tip tree.


Re: [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-07 Thread Borislav Petkov
On Thu, Jan 03, 2013 at 04:48:28PM -0800, Yinghai Lu wrote:
> From: "H. Peter Anvin" 
> 
> two use cases:
> 1. We will support load and run kernel above 4G, and zero_page, ramdisk
>will be above 4G, too
> 2. need to access ramdisk early to get microcode to update that as
>early possible.
> 
> We could use early_iomap to access them, but it will make code to
 too
> messy and hard to unified with 32bit.
s/unified/unify/

> 
> So here comes #PF handler to set page page.
> 
> When #PF happen, handler will use pages in __initdata to set page page

"When a page fault happens, the handler will use pages from __initdata
to cover the accessed page."

> to cover accessed page.
> 
> those code and page in __INIT sections, so will not increase ram usages.

Huh, what? Something is in __INIT and will not increase RAM usage?

> The good point is: with help of #PF handler, we can set kernel mapping
> from blank, and switch to init_level4_pgt later.

I think you want to say "we can create temporary, ad-hoc kernel mappings
and forget them later by switching to init_level4_pgt." ?

> switchover in head_64.S is only using three page to handle kernel
> crossing 1G, 512G with shareing page, most insteresting part.

Again, what?

> early_make_pgtable is using kernel high mapping address to access pages
> to set page table.
> 
> -v4: Add phys_base offset to make kexec happy, and add
>   init_mapping_kernel()   - Yinghai
> -v5: fix compiling with xen, and add back ident level3 and level2 for xen
>  also move back init_level4_pgt from BSS to DATA again.
>  because we have to clear it anyway.  - Yinghai
> -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
> -v7: remove not needed clear_page for init_level4_page
>  it is with fill 512,8,0 already in head_64.S  - Yinghai
> -v8: we need to keep that handler alive until init_mem_mapping and don't
>  let early_trap_init to trash that early #PF handler.
>  So split early_trap_pf_init out and move it down. - Yinghai
> -v9: switchover only cover kernel space instead of 1G so could avoid
>  touch possible mem holes. - Yinghai
> -v11: change far jmp back to far return to initial_code, that is needed
>  to fix failure that is reported by Konrad on AMD system.  - Yinghai

Those -vXX version lines need to go under the "---" line. Alternatively,
you might want to add some of them to the commit message with a proper
explanation since they are not that trivial at a first glance, for
example the -v5, -v6, -v8, -v9 with a better explanation.
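
For illustration, a rough sketch of the layout meant here (the diffstat
is abbreviated and purely exemplary):

  <commit message, with the important -vXX notes folded in as prose>

  Signed-off-by: ...
  ---
  -v8: keep the early #PF handler alive until init_mem_mapping
  -v9: have the switchover cover only kernel space

   arch/x86/kernel/head64.c | 81 ++--
   ...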

> 
> Signed-off-by: Yinghai Lu 

This needs hpa's S-O-B.

[ … ]

-- 
Regards/Gruss,
Boris.


[PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table

2013-01-03 Thread Yinghai Lu
From: "H. Peter Anvin" 

two use cases:
1. We will support load and run kernel above 4G, and zero_page, ramdisk
   will be above 4G, too
2. need to access ramdisk early to get microcode to update that as
   early possible.

We could use early_iomap to access them, but it will make code to
messy and hard to unified with 32bit.

So here comes #PF handler to set page page.

When #PF happen, handler will use pages in __initdata to set page page
to cover accessed page.

those code and page in __INIT sections, so will not increase ram usages.

The good point is: with help of #PF handler, we can set kernel mapping
from blank, and switch to init_level4_pgt later.

switchover in head_64.S is only using three page to handle kernel
crossing 1G, 512G with shareing page, most insteresting part.

early_make_pgtable is using kernel high mapping address to access pages
to set page table.

-v4: Add phys_base offset to make kexec happy, and add
init_mapping_kernel()   - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen
 also move back init_level4_pgt from BSS to DATA again.
 because we have to clear it anyway.  - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove not needed clear_page for init_level4_page
 it is with fill 512,8,0 already in head_64.S  - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and don't
 let early_trap_init to trash that early #PF handler.
 So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only cover kernel space instead of 1G so could avoid
 touch possible mem holes. - Yinghai
-v11: change far jmp back to far return to initial_code, that is needed
 to fix failure that is reported by Konrad on AMD system.  - Yinghai

Signed-off-by: Yinghai Lu 
---
 arch/x86/include/asm/pgtable_64_types.h |    4 +
 arch/x86/include/asm/processor.h        |    1 +
 arch/x86/kernel/head64.c                |   81 ++--
 arch/x86/kernel/head_64.S               |  210 +++
 arch/x86/kernel/setup.c                 |    2 +
 arch/x86/kernel/traps.c                 |    9 ++
 arch/x86/mm/init.c                      |    3 +-
 7 files changed, 219 insertions(+), 91 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include <asm/sparsemem.h>
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END  _AC(0xff00, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES  64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 888184b..bdee8bd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -731,6 +731,7 @@ extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
 extern void early_trap_init(void);
+void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr early_gdt_descr;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c0a25e0..25591f9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include <asm/e820.h>
 #include <asm/bios_ebda.h>
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-   pgd_t *pgd = pgd_offset_k(0UL);
-   pgd_clear(pgd);
-   __flush_tlb_all();
+   unsigned long i;
+
+   for (i = 0; i < PTRS_PER_PGD-1; i++)
+   early_level4_pgt[i].pgd = 0;
+
+   next_early_pgt = 0;
+
+   write_cr3(__pa(early_level4_pgt));
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+   unsigned long physaddr = address - __PAGE_OFFSET;
+   unsigned long i;
+   pgdval_t pgd, *pgd_p;
+   pudval_t *pud_p;
+   pmdval_t pmd, *pmd_p;
+
+   /* Invalid address or early pgt is done ?  */
+   if (physaddr >= MAXMEM || read_cr3() != __pa(early_level4_pgt))
+   return -1;
+
+   i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+   pgd_p = &early_level4_pgt[i].pgd;
+   pgd = *pgd_p;
+
+   /*
+* The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+* critical -- __PAGE_OFFSET would point us back into the dynamic
+* range and we might end up looping forever...
+*/
+   if (pgd && next_early_pgt < 
