Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>  struct tss_struct {
>   /*
> +  * Space for the temporary SYSENTER stack.  Used for the entry
> +  * trampoline as well.  Size it such that tss_struct ends up
> +  * as a multiple of PAGE_SIZE.  This calculation assumes that
> +  * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
> +  * long.
> +  */
> + unsigned long   SYSENTER_stack_canary;
> + unsigned long   SYSENTER_stack[(PAGE_SIZE - sizeof(struct 
> x86_hw_tss)) / sizeof(unsigned long) - 2];
> +
> + /*
>* The hardware state:
>*/
>   struct x86_hw_tss   x86_tss;
> @@ -337,15 +347,9 @@ struct tss_struct {
>* be within the limit.
>*/
>   unsigned long   io_bitmap[IO_BITMAP_LONGS + 1];
> -
> - /*
> -  * Space for the temporary SYSENTER stack.
> -  */
> - unsigned long   SYSENTER_stack_canary;
> - unsigned long   SYSENTER_stack[64];
>  } cacheline_aligned;


If io_bitmap[] is already page-size-aligned, how does it help us to move
SYSENTER_stack[]?

It seems like it would be easier to just leave SYSENTER_stack[] where it
is, make it SYSENTER_stack[0], and just find somewhere else to choose
how much to bloat the tss_struct *allocation* instead of trying to make
sure that sizeof(tss_struct) matches the allocation.

That SYSENTER_stack[] size calculation is pretty hideous. :)


Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-13 Thread Dave Hansen
On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>  struct tss_struct {
>   /*
> +  * Space for the temporary SYSENTER stack.  Used for the entry
> +  * trampoline as well.  Size it such that tss_struct ends up
> +  * as a multiple of PAGE_SIZE.  This calculation assumes that
> +  * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
> +  * long.
> +  */
> + unsigned long   SYSENTER_stack_canary;
> + unsigned long   SYSENTER_stack[(PAGE_SIZE - sizeof(struct 
> x86_hw_tss)) / sizeof(unsigned long) - 2];
> +
> + /*
>* The hardware state:
>*/
>   struct x86_hw_tss   x86_tss;
> @@ -337,15 +347,9 @@ struct tss_struct {
>* be within the limit.
>*/
>   unsigned long   io_bitmap[IO_BITMAP_LONGS + 1];
> -
> - /*
> -  * Space for the temporary SYSENTER stack.
> -  */
> - unsigned long   SYSENTER_stack_canary;
> - unsigned long   SYSENTER_stack[64];
>  } cacheline_aligned;


If io_bitmap[] is already page-size-aligned, how does it help us to move
SYSENTER_stack[]?

It seems like it would be easier to just leave SYSENTER_stack[] where it
is, make it SYSENTER_stack[0], and just find somewhere else to choose
how much to bloat the tss_struct *allocation* instead of trying to make
sure that sizeof(tss_struct) matches the allocation.

That SYSENTER_stack[] size calculation is pretty hideous. :)


Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-10 Thread Andy Lutomirski
On Fri, Nov 10, 2017 at 8:05 PM, Andy Lutomirski  wrote:
> The Intel SDM says (Volume 3, 7.2.1):
>
>Avoid placing a page boundary in the part of the TSS that the
>processor reads during a task switch (the first 104 bytes). The
>processor may not correctly perform address translations if a
>boundary occurs in this area. During a task switch, the processor
>reads and writes into the first 104 bytes of each TSS (using
>contiguous physical addresses beginning with the physical address
>of the first byte of the TSS). So, after TSS access begins, if
>part of the 104 bytes is not physically contiguous, the processor
>will access incorrect information without generating a page-fault
>exception.

Hmm.  I should add that I suspect we rarely if ever hit this problem
in practice because (a) we only ever task switch on 32-bit
doublefaults, (b) if the old register state gets corrupted by this
issue during a doublefault, we might not notice, and (c) there is
probably rarely a page boundary in the wrong place.  I suspect that
regular kernel entries have the same issue but that esp0 and ss0 were
always in the same page due to cacheline alignment.

FWIW, we really do virtually map the percpu section AFAICT.  The code
does not appear to guarantee that percpu variables are physically
contiguous.

I'd love to make this mapping RO, but the SDM advises against that.  I
don't know whether there's a real concern (on 64-bit) or whether it's
just being overly cautious.


Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-10 Thread Andy Lutomirski
On Fri, Nov 10, 2017 at 8:05 PM, Andy Lutomirski  wrote:
> The Intel SDM says (Volume 3, 7.2.1):
>
>Avoid placing a page boundary in the part of the TSS that the
>processor reads during a task switch (the first 104 bytes). The
>processor may not correctly perform address translations if a
>boundary occurs in this area. During a task switch, the processor
>reads and writes into the first 104 bytes of each TSS (using
>contiguous physical addresses beginning with the physical address
>of the first byte of the TSS). So, after TSS access begins, if
>part of the 104 bytes is not physically contiguous, the processor
>will access incorrect information without generating a page-fault
>exception.

Hmm.  I should add that I suspect we rarely if ever hit this problem
in practice because (a) we only ever task switch on 32-bit
doublefaults, (b) if the old register state gets corrupted by this
issue during a doublefault, we might not notice, and (c) there is
probably rarely a page boundary in the wrong place.  I suspect that
regular kernel entries have the same issue but that esp0 and ss0 were
always in the same page due to cacheline alignment.

FWIW, we really do virtually map the percpu section AFAICT.  The code
does not appear to guarantee that percpu variables are physically
contiguous.

I'd love to make this mapping RO, but the SDM advises against that.  I
don't know whether there's a real concern (on 64-bit) or whether it's
just being overly cautious.


[RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-10 Thread Andy Lutomirski
The Intel SDM says (Volume 3, 7.2.1):

   Avoid placing a page boundary in the part of the TSS that the
   processor reads during a task switch (the first 104 bytes). The
   processor may not correctly perform address translations if a
   boundary occurs in this area. During a task switch, the processor
   reads and writes into the first 104 bytes of each TSS (using
   contiguous physical addresses beginning with the physical address
   of the first byte of the TSS). So, after TSS access begins, if
   part of the 104 bytes is not physically contiguous, the processor
   will access incorrect information without generating a page-fault
   exception.

Merely cacheline-aligning the TSS doesn't actually guarantee that
the hardware TSS doesn't span a page.  Instead, page-align the
structure that contains it.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 301d41ca1fa1..97ded6e3edd3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -326,6 +326,16 @@ struct x86_hw_tss {
 
 struct tss_struct {
/*
+* Space for the temporary SYSENTER stack.  Used for the entry
+* trampoline as well.  Size it such that tss_struct ends up
+* as a multiple of PAGE_SIZE.  This calculation assumes that
+* io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
+* long.
+*/
+   unsigned long   SYSENTER_stack_canary;
+   unsigned long   SYSENTER_stack[(PAGE_SIZE - sizeof(struct 
x86_hw_tss)) / sizeof(unsigned long) - 2];
+
+   /*
 * The hardware state:
 */
struct x86_hw_tss   x86_tss;
@@ -337,15 +347,9 @@ struct tss_struct {
 * be within the limit.
 */
unsigned long   io_bitmap[IO_BITMAP_LONGS + 1];
-
-   /*
-* Space for the temporary SYSENTER stack.
-*/
-   unsigned long   SYSENTER_stack_canary;
-   unsigned long   SYSENTER_stack[64];
 } cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss);
 
 /*
  * sizeof(unsigned long) coming from an extra "long" at the end
-- 
2.13.6



[RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment

2017-11-10 Thread Andy Lutomirski
The Intel SDM says (Volume 3, 7.2.1):

   Avoid placing a page boundary in the part of the TSS that the
   processor reads during a task switch (the first 104 bytes). The
   processor may not correctly perform address translations if a
   boundary occurs in this area. During a task switch, the processor
   reads and writes into the first 104 bytes of each TSS (using
   contiguous physical addresses beginning with the physical address
   of the first byte of the TSS). So, after TSS access begins, if
   part of the 104 bytes is not physically contiguous, the processor
   will access incorrect information without generating a page-fault
   exception.

Merely cacheline-aligning the TSS doesn't actually guarantee that
the hardware TSS doesn't span a page.  Instead, page-align the
structure that contains it.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 301d41ca1fa1..97ded6e3edd3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -326,6 +326,16 @@ struct x86_hw_tss {
 
 struct tss_struct {
/*
+* Space for the temporary SYSENTER stack.  Used for the entry
+* trampoline as well.  Size it such that tss_struct ends up
+* as a multiple of PAGE_SIZE.  This calculation assumes that
+* io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
+* long.
+*/
+   unsigned long   SYSENTER_stack_canary;
+   unsigned long   SYSENTER_stack[(PAGE_SIZE - sizeof(struct 
x86_hw_tss)) / sizeof(unsigned long) - 2];
+
+   /*
 * The hardware state:
 */
struct x86_hw_tss   x86_tss;
@@ -337,15 +347,9 @@ struct tss_struct {
 * be within the limit.
 */
unsigned long   io_bitmap[IO_BITMAP_LONGS + 1];
-
-   /*
-* Space for the temporary SYSENTER stack.
-*/
-   unsigned long   SYSENTER_stack_canary;
-   unsigned long   SYSENTER_stack[64];
 } cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss);
 
 /*
  * sizeof(unsigned long) coming from an extra "long" at the end
-- 
2.13.6