Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>  struct tss_struct {
>  	/*
> +	 * Space for the temporary SYSENTER stack.  Used for the entry
> +	 * trampoline as well.  Size it such that tss_struct ends up
> +	 * as a multiple of PAGE_SIZE.  This calculation assumes that
> +	 * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
> +	 * long.
> +	 */
> +	unsigned long		SYSENTER_stack_canary;
> +	unsigned long		SYSENTER_stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) / sizeof(unsigned long) - 2];
> +
> +	/*
>  	 * The hardware state:
>  	 */
>  	struct x86_hw_tss	x86_tss;
> @@ -337,15 +347,9 @@ struct tss_struct {
>  	 * be within the limit.
>  	 */
>  	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
> -
> -	/*
> -	 * Space for the temporary SYSENTER stack.
> -	 */
> -	unsigned long		SYSENTER_stack_canary;
> -	unsigned long		SYSENTER_stack[64];
>  } ____cacheline_aligned;

If io_bitmap[] is already page-size-aligned, how does it help us to
move SYSENTER_stack[]?

It seems like it would be easier to just leave SYSENTER_stack[] where
it is, make it SYSENTER_stack[0], and just find somewhere else to
choose how much to bloat the tss_struct *allocation* instead of trying
to make sure that sizeof(tss_struct) matches the allocation.

That SYSENTER_stack[] size calculation is pretty hideous. :)
Re: [RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
On Fri, Nov 10, 2017 at 8:05 PM, Andy Lutomirski wrote:
> The Intel SDM says (Volume 3, 7.2.1):
>
>    Avoid placing a page boundary in the part of the TSS that the
>    processor reads during a task switch (the first 104 bytes).  The
>    processor may not correctly perform address translations if a
>    boundary occurs in this area.  During a task switch, the processor
>    reads and writes into the first 104 bytes of each TSS (using
>    contiguous physical addresses beginning with the physical address
>    of the first byte of the TSS).  So, after TSS access begins, if
>    part of the 104 bytes is not physically contiguous, the processor
>    will access incorrect information without generating a page-fault
>    exception.

Hmm.  I should add that I suspect we rarely if ever hit this problem in
practice because (a) we only ever task switch on 32-bit doublefaults,
(b) if the old register state gets corrupted by this issue during a
doublefault, we might not notice, and (c) there is probably rarely a
page boundary in the wrong place.  I suspect that regular kernel
entries have the same issue but that esp0 and ss0 were always in the
same page due to cacheline alignment.

FWIW, we really do virtually map the percpu section AFAICT.  The code
does not appear to guarantee that percpu variables are physically
contiguous.

I'd love to make this mapping RO, but the SDM advises against that.  I
don't know whether there's a real concern (on 64-bit) or whether it's
just being overly cautious.
[RFC 5/7] x86/asm: Rearrange struct cpu_tss to enlarge SYSENTER_stack and fix alignment
The Intel SDM says (Volume 3, 7.2.1):

    Avoid placing a page boundary in the part of the TSS that the
    processor reads during a task switch (the first 104 bytes).  The
    processor may not correctly perform address translations if a
    boundary occurs in this area.  During a task switch, the processor
    reads and writes into the first 104 bytes of each TSS (using
    contiguous physical addresses beginning with the physical address
    of the first byte of the TSS).  So, after TSS access begins, if
    part of the 104 bytes is not physically contiguous, the processor
    will access incorrect information without generating a page-fault
    exception.

Merely cacheline-aligning the TSS doesn't actually guarantee that the
hardware TSS doesn't span a page.  Instead, page-align the structure
that contains it.

Signed-off-by: Andy Lutomirski
---
 arch/x86/include/asm/processor.h | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 301d41ca1fa1..97ded6e3edd3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -326,6 +326,16 @@ struct x86_hw_tss {
 
 struct tss_struct {
 	/*
+	 * Space for the temporary SYSENTER stack.  Used for the entry
+	 * trampoline as well.  Size it such that tss_struct ends up
+	 * as a multiple of PAGE_SIZE.  This calculation assumes that
+	 * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one
+	 * long.
+	 */
+	unsigned long		SYSENTER_stack_canary;
+	unsigned long		SYSENTER_stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) / sizeof(unsigned long) - 2];
+
+	/*
 	 * The hardware state:
 	 */
 	struct x86_hw_tss	x86_tss;
@@ -337,15 +347,9 @@ struct tss_struct {
 	 * be within the limit.
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
-
-	/*
-	 * Space for the temporary SYSENTER stack.
-	 */
-	unsigned long		SYSENTER_stack_canary;
-	unsigned long		SYSENTER_stack[64];
 } ____cacheline_aligned;
 
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss);
 
 /*
  * sizeof(unsigned long) coming from an extra "long" at the end

-- 
2.13.6