Re: [PATCH 00/11] [v5] Use global pages with PTI

2018-04-09 Thread Tom Lendacky
On 4/9/2018 2:50 PM, Dave Hansen wrote:
> On 04/09/2018 11:59 AM, Tom Lendacky wrote:
>> On 4/9/2018 1:17 PM, Dave Hansen wrote:
>>> On 04/09/2018 11:04 AM, Tom Lendacky wrote:
>>>> On 4/6/2018 3:55 PM, Dave Hansen wrote:
>>>>> Changes from v4
>>>>>  * Fix compile error reported by Tom Lendacky
>>>> This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
>>>> I think you're missing the initialization of __default_kernel_pte_mask in
>>>> kaslr.c.
>>>
>>> This should be simple to fix (just add a -1 instead of 0), but let me
>>> double-check and actually boot the fix.
>>
>> Yup, added an "= ~0" and everything is good.
> 
> I'm testing at this commit in the tip tree:
> 
> 0564258... x86/pti: Leave kernel text global for !PCID
> 
> It seems to boot OK with RANDOMIZE_BASE=y for both PCID and non-PCID
> configurations.  Could you send along your .config so I can try to reproduce?
> 

Sure, I'll send it to you directly as an attachment.

Thanks,
Tom


Re: [PATCH 00/11] [v5] Use global pages with PTI

2018-04-09 Thread Dave Hansen
On 04/09/2018 11:59 AM, Tom Lendacky wrote:
> On 4/9/2018 1:17 PM, Dave Hansen wrote:
>> On 04/09/2018 11:04 AM, Tom Lendacky wrote:
>>> On 4/6/2018 3:55 PM, Dave Hansen wrote:
>>>> Changes from v4
>>>>  * Fix compile error reported by Tom Lendacky
>>> This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
>>> I think you're missing the initialization of __default_kernel_pte_mask in
>>> kaslr.c.
>>
>> This should be simple to fix (just add a -1 instead of 0), but let me
>> double-check and actually boot the fix.
> 
> Yup, added an "= ~0" and everything is good.

I'm testing at this commit in the tip tree:

0564258... x86/pti: Leave kernel text global for !PCID

It seems to boot OK with RANDOMIZE_BASE=y for both PCID and non-PCID
configurations.  Could you send along your .config so I can try to reproduce?


Re: [PATCH 00/11] [v5] Use global pages with PTI

2018-04-09 Thread Tom Lendacky
On 4/9/2018 1:17 PM, Dave Hansen wrote:
> On 04/09/2018 11:04 AM, Tom Lendacky wrote:
>> On 4/6/2018 3:55 PM, Dave Hansen wrote:
>>> Changes from v4
>>>  * Fix compile error reported by Tom Lendacky
>> This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
>> I think you're missing the initialization of __default_kernel_pte_mask in
>> kaslr.c.
> 
> This should be simple to fix (just add a -1 instead of 0), but let me
> double-check and actually boot the fix.

Yup, added an "= ~0" and everything is good.

Thanks,
Tom

> 


Re: [PATCH 00/11] [v5] Use global pages with PTI

2018-04-09 Thread Dave Hansen
On 04/09/2018 11:04 AM, Tom Lendacky wrote:
> On 4/6/2018 3:55 PM, Dave Hansen wrote:
>> Changes from v4
>>  * Fix compile error reported by Tom Lendacky
> This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
> I think you're missing the initialization of __default_kernel_pte_mask in
> kaslr.c.

This should be simple to fix (just add a -1 instead of 0), but let me
double-check and actually boot the fix.
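
To illustrate why a zero-initialized mask could keep the kernel from booting while "-1" (or "~0") works, here is a minimal Python sketch. It assumes the mask is ANDed into kernel PTE flag values, which this thread does not spell out. The helper apply_default_mask is hypothetical; the bit positions are the usual x86 ones for Present, RW and Global.

```python
# Sketch only: models a 64-bit PTE flag word and a default mask.
# Assumption: the kernel ANDs __default_kernel_pte_mask into PTE flags.
MASK64 = (1 << 64) - 1

PAGE_PRESENT = 1 << 0   # x86 Present bit
PAGE_RW = 1 << 1        # x86 Read/Write bit
PAGE_GLOBAL = 1 << 8    # x86 Global bit


def apply_default_mask(pte_flags, default_mask):
    """Hypothetical helper: keep only the flag bits allowed by the mask."""
    return pte_flags & default_mask & MASK64


flags = PAGE_PRESENT | PAGE_RW | PAGE_GLOBAL

# An uninitialized (zero) mask strips every bit, including Present.
with_zero_mask = apply_default_mask(flags, 0)

# "-1" and "~0" are the same all-ones value, so every bit survives.
with_all_ones = apply_default_mask(flags, ~0)

# A mask with only the Global bit cleared filters _PAGE_GLOBAL alone.
without_global = apply_default_mask(flags, ~PAGE_GLOBAL)

print(hex(with_zero_mask), hex(with_all_ones), hex(without_global))
```

With a zero mask even the Present bit is lost, which would match a kernel that builds but fails to boot.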


Re: [PATCH 00/11] [v5] Use global pages with PTI

2018-04-09 Thread Tom Lendacky
On 4/6/2018 3:55 PM, Dave Hansen wrote:
> Changes from v4
>  * Fix compile error reported by Tom Lendacky

This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
I think you're missing the initialization of __default_kernel_pte_mask in
kaslr.c.

Thanks,
Tom

>  * Avoid setting _PAGE_GLOBAL on non-present entries
> 
> Changes from v3:
>  * Fix whitespace issue noticed by willy
>  * Clarify comments about X86_FEATURE_PGE checks
>  * Clarify commit message around the necessity of _PAGE_GLOBAL
>    filtering when CR4.PGE=0 or PGE is unsupported.
> 
> Changes from v2:
> 
>  * Add performance numbers to changelogs
>  * Fix compile error resulting from use of x86-specific
>    __default_kernel_pte_mask in arch-generic mm/early_ioremap.c
>  * Delay kernel text cloning until after we are done messing
>    with it (patch 11).
>  * Blacklist K8 explicitly from mapping all kernel text as
>    global (this should never happen because K8 does not use
>    pti when pti=auto, but we are on the safe side). (patch 11)
> 
> --
> 
> The later versions of the KAISER patches (pre-PTI) allowed the
> user/kernel shared areas to be GLOBAL.  The thought was that this would
> reduce the TLB overhead of keeping two copies of these mappings.
> 
> During the switch over to PTI, we seem to have lost our ability to have
> GLOBAL mappings.  This adds them back.
> 
> To measure the benefits of this, I took a modern Atom system without
> PCIDs and ran a microbenchmark[1] (higher is better):
> 
> No Global Lines (baseline  ): 6077741 lseeks/sec
> 88 Global Lines (kern entry): 7528609 lseeks/sec (+23.9%)
> 94 Global Lines (all ktext ): 8433111 lseeks/sec (+38.8%)
> 
> On a modern Skylake desktop with PCIDs, the benefits are tangible, but not
> huge:
> 
> No Global pages (baseline): 15783951 lseeks/sec
> 28 Global pages (this set): 16054688 lseeks/sec
>  +270737 lseeks/sec (+1.71%)
> 
> I also double-checked with a kernel compile on the Skylake system (lower
> is better):
> 
> No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
> 28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
>  -1.195 seconds (-0.64%)
> 
> 1. https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
> 
> Cc: Andrea Arcangeli 
> Cc: Andy Lutomirski 
> Cc: Linus Torvalds 
> Cc: Kees Cook 
> Cc: Hugh Dickins 
> Cc: Juergen Gross 
> Cc: x...@kernel.org
> Cc: Nadav Amit 
> 


[PATCH 00/11] [v5] Use global pages with PTI

2018-04-06 Thread Dave Hansen
Changes from v4
 * Fix compile error reported by Tom Lendacky
 * Avoid setting _PAGE_GLOBAL on non-present entries

Changes from v3:
 * Fix whitespace issue noticed by willy
 * Clarify comments about X86_FEATURE_PGE checks
 * Clarify commit message around the necessity of _PAGE_GLOBAL
   filtering when CR4.PGE=0 or PGE is unsupported.

Changes from v2:

 * Add performance numbers to changelogs
 * Fix compile error resulting from use of x86-specific
   __default_kernel_pte_mask in arch-generic mm/early_ioremap.c
 * Delay kernel text cloning until after we are done messing
   with it (patch 11).
 * Blacklist K8 explicitly from mapping all kernel text as
   global (this should never happen because K8 does not use
   pti when pti=auto, but we are on the safe side). (patch 11)

--

The later versions of the KAISER patches (pre-PTI) allowed the
user/kernel shared areas to be GLOBAL.  The thought was that this would
reduce the TLB overhead of keeping two copies of these mappings.

During the switch over to PTI, we seem to have lost our ability to have
GLOBAL mappings.  This adds them back.

To measure the benefits of this, I took a modern Atom system without
PCIDs and ran a microbenchmark[1] (higher is better):

No Global Lines (baseline  ): 6077741 lseeks/sec
88 Global Lines (kern entry): 7528609 lseeks/sec (+23.9%)
94 Global Lines (all ktext ): 8433111 lseeks/sec (+38.8%)

On a modern Skylake desktop with PCIDs, the benefits are tangible, but not
huge:

No Global pages (baseline): 15783951 lseeks/sec
28 Global pages (this set): 16054688 lseeks/sec
 +270737 lseeks/sec (+1.71%)

I also double-checked with a kernel compile on the Skylake system (lower
is better):

No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
 -1.195 seconds (-0.64%)
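
The relative changes quoted above can be rechecked from the raw figures. The following is a small Python sketch; pct_change is a hypothetical helper and not part of the benchmark itself.

```python
def pct_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Atom without PCIDs, lseeks/sec (higher is better)
atom_base = 6077741
atom_kern_entry = pct_change(atom_base, 7528609)
atom_all_ktext = pct_change(atom_base, 8433111)

# Skylake with PCIDs, lseeks/sec (higher is better)
sky_base = 15783951
sky_delta = 16054688 - sky_base
sky_lseek = pct_change(sky_base, 16054688)

# Skylake kernel compile, seconds elapsed (lower is better)
sky_compile = pct_change(186.951, 185.756)

print(f"Atom kern entry: {atom_kern_entry:+.1f}%")
print(f"Atom all ktext:  {atom_all_ktext:+.1f}%")
print(f"Skylake lseek:   {sky_delta:+d} lseeks/sec ({sky_lseek:+.3f}%)")
print(f"Skylake compile: {sky_compile:+.2f}%")
```

The Skylake lseek gain works out to roughly 1.715%, which the table above reports as +1.71%.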

1. https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

Cc: Andrea Arcangeli 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Kees Cook 
Cc: Hugh Dickins 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Nadav Amit 

