Re: Avoid speculative indirect calls in kernel

2018-02-23 Thread Ywe Cærlyn

Patchmeister Torvalds:

"Or is Intel basically saying "we are committed to selling you shit
forever and ever, and never fixing anything"?"

Back in Celeron days, Intel was popular because you could clock the 
lesser cached Celeron 300mhz to ~500mhz.


Everybody knew then not to get anything pricier. But still Intel sells 
Xeons at 3x the price, for little noticeable gain?


Basically I did research on philosophy, and indeed the main concept of a 
culture is what determines a culture's behaviour. Even the internal 
design of the cpu seems to be inspired by "God".


I have tried the zén-realized version "Zün", instead, God of absolute 
reality. Because regressions in philosophy, is regressions in computing, 
is ultimately Bill Gates talking about fecal water.


--
Peaceful regards,
Ywe Cærlyn,



Re: Avoid speculative indirect calls in kernel

2018-01-12 Thread Dr. Greg Wettstein
On Jan 5, 12:12pm, Alan Cox wrote:
} Subject: Re: Avoid speculative indirect calls in kernel

Good morning to everyone, a bit behind on mail given everything which
has been going on.

> On Fri, 5 Jan 2018 01:54:13 +0100 (CET)
> Thomas Gleixner <t...@linutronix.de> wrote:
> 
> > On Thu, 4 Jan 2018, Jon Masters wrote:
> > > P.S. I've an internal document where I've been tracking "nice to haves"
> > > for later, and one of them is whether it makes sense to tag binaries as
> > > "trusted" (e.g. extended attribute, label, whatever). It was something I
> > > wanted to bring up at some point as potentially worth considering.  
> > 
> > Scratch that. There is no such thing as a trusted binary.

> There is if you are using signing and the like. I'm sure SELinux and
> friends will grow the ability to set per process policy but that's
> certainly not a priority.
>
> However the question is wrong. 'trusted' is a binary operator not a
> unary one.

Alan's observations are correct.

In our autonomous introspection work we apply the notion that
'trusted' is a binary characteristic of a context of execution (COE).
Its value is an expression of whether or not the information exchange
events it has been involved in have deviated from the desired
execution trajectory path of the system.

It is a decidedly different way of thinking about things.  Most
importantly it is a namespaceable characteristic.

We have already written the futuristic LSM that Alan alludes to in
order to implement per COE security policies and forensics for
actors/COE's that have gone over to the 'dark side'.

> Alan

Have a good weekend.

Dr. Greg

}-- End of excerpt from Alan Cox

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com
--
"Given a choice between a complex, difficult-to-understand,
 disconcerting explanation and a simplistic, comforting one, many
 prefer simplistic comfort if it's remotely plausible, especially if it
 involves blaming someone else for their problems."
-- Bob Lewis
   _Infoworld_


[PATCH v8 00/12] Retpoline: Avoid speculative indirect calls in kernel

2018-01-11 Thread David Woodhouse
This is a mitigation for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Using GCC patches available from the hjl/indirect/gcc-7-branch/master
branch of https://github.com/hjl-tools/gcc/commits/hjl and by manually
patching assembler code, all vulnerable indirect branches (that occur
after userspace first runs) are eliminated from the kernel.

They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.

Fedora 27 packages of the updated compiler are available at
https://koji.fedoraproject.org/koji/taskinfo?taskID=24065739


v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
    Change warning messages.
    Hide modpost warning message
v3: Update to the latest CET-capable retpoline version
    Reinstate ALTERNATIVE support
v4: Finish reconciling Andi's and my patch sets, bug fixes.
    Exclude objtool support for now
    Add 'noretpoline' boot option
    Add AMD retpoline alternative
v5: Silence MODVERSIONS warnings
    Use pause;jmp loop instead of lfence;jmp
    Switch to X86_FEATURE_RETPOLINE positive feature logic
    Emit thunks inline from assembler macros
    Merge AMD support into initial patch
v6: Update to latest GCC patches with no dots in symbols
    Fix MODVERSIONS properly(ish)
    Fix typo breaking 32-bit, introduced in V5
    Never set X86_FEATURE_RETPOLINE_AMD yet, pending confirmation
v7: Further bikeshedding on macro names
    Stuff RSB on kernel entry
    Implement 'spectre_v2=' command line option for IBRS/IBPB too
    Revert to precisely the asm sequences from the Google paper
v8: Re-enable (I won't say "fix") objtool support
    Use numeric labels for GCC compatibility
    Add support for RSB-stuffing on vmexit
    I don't know... other bloody bikeshedding. Can I sleep now?

Andi Kleen (1):
  x86/retpoline/irq32: Convert assembler indirect jumps

David Woodhouse (10):
  objtool: Allow alternatives to be ignored
  x86/retpoline: Add initial retpoline support
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps
  x86/retpoline: Fill return stack buffer on vmexit

Josh Poimboeuf (1):
  objtool: Detect jumps to retpoline thunks

 Documentation/admin-guide/kernel-parameters.txt |  28
 arch/x86/Kconfig                                |  13 ++
 arch/x86/Makefile                               |  10 ++
 arch/x86/crypto/aesni-intel_asm.S               |   5 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S     |   3 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S    |   3 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S       |   3 +-
 arch/x86/entry/entry_32.S                       |   5 +-
 arch/x86/entry/entry_64.S                       |  12 +-
 arch/x86/include/asm/asm-prototypes.h           |  25 +++
 arch/x86/include/asm/cpufeatures.h              |   2 +
 arch/x86/include/asm/mshyperv.h                 |  18 +-
 arch/x86/include/asm/nospec-branch.h            | 209
 arch/x86/include/asm/xen/hypercall.h            |   5 +-
 arch/x86/kernel/cpu/bugs.c                      | 158 +-
 arch/x86/kernel/ftrace_32.S                     |   6 +-
 arch/x86/kernel/ftrace_64.S                     |   8 +-
 arch/x86/kernel/irq_32.c                        |   9 +-
 arch/x86/kvm/svm.c                              |   4 +
 arch/x86/kvm/vmx.c                              |   4 +
 arch/x86/lib/Makefile                           |   1 +
 arch/x86/lib/checksum_32.S                      |   7 +-
 arch/x86/lib/retpoline.S                        |  48 ++
 tools/objtool/check.c                           |  69 +++-
 tools/objtool/check.h                           |   2 +-
 25 files changed, 616 insertions(+), 41 deletions(-)
 create mode 100644 arch/x86/include/asm/nospec-branch.h
 create mode 100644 arch/x86/lib/retpoline.S

-- 
2.7.4



Re: Avoid speculative indirect calls in kernel

2018-01-10 Thread Thomas Gleixner
On Tue, 9 Jan 2018, Dave Hansen wrote:
> On 01/09/2018 04:45 PM, Thomas Gleixner wrote:
> > On Mon, 8 Jan 2018, Andrea Arcangeli wrote:
> >> On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
> >> Did my best to do the cleanest patch for tip, but I now figured Dave's
> >> original comment was spot on: a _PAGE_NX clear then becomes necessary
> >> also after pud_alloc not only after p4d_alloc.
> >>
> >> pmd_alloc would run into the same with x86 32bit non-PAE too.
> 
> non-PAE doesn't have an NX bit. :)
> 
> But we #define _PAGE_NX down to 0 there so it's harmless.
> 
> >> So there are two choices, either going back to one single _PAGE_NX
> >> clear from the original Dave's original patch as below, or to add
> >> multiple clear after each level which was my objective and is more
> >> robust, but it may be overkill in this case. As long as it was one
> >> line it looked a clear improvement.
> >>
> >> Considering the caller in both cases is going to abort I guess we can
> >> use the one liner approach as Dave and Jiri did originally.
> > 
> > Dave ?
> 
> I agree with Andrea.  The patch in -tip potentially misses the pgd
> clearing if pud_alloc() sets a PGD.  It would also be nice to have that
> comment back.
> 
> Note that the -tip commit probably works in *practice* because for two
> adjacent calls to map_tboot_page() that share a PGD entry, the first
> will clear NX, *then* allocate and set the PGD (without NX clear).  The
> second call will *not* allocate but will clear the NX bit.
> 
> The patch I think we want is attached.

Color me confused. I have queued the one below in tip. It lacks the comment
and does the !NX at a different place.

Thanks,

tglx

8<- 

commit 262b6b30087246abf09d6275eb0c0dc421bcbe38
Author: Dave Hansen 
Date:   Sat Jan 6 18:41:14 2018 +0100

    x86/tboot: Unbreak tboot with PTI enabled

    This is another case similar to what EFI does: create a new set of
    page tables, map some code at a low address, and jump to it.  PTI
    mistakes this low address for userspace and mistakenly marks it
    non-executable in an effort to make it unusable for userspace.

    Undo the poison to allow execution.

    Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
    Signed-off-by: Dave Hansen 
    Signed-off-by: Andrea Arcangeli 
    Signed-off-by: Thomas Gleixner 
    Cc: Alan Cox 
    Cc: Tim Chen 
    Cc: Jon Masters 
    Cc: Dave Hansen 
    Cc: Andi Kleen 
    Cc: Jeff Law 
    Cc: Paolo Bonzini 
    Cc: Linus Torvalds 
    Cc: Greg Kroah-Hartman 
    Cc: David" 
    Cc: Nick Clifton 
    Cc: sta...@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180108102805.gk25...@redhat.com

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index a4eb27918ceb..75869a4b6c41 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -127,6 +127,7 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 	p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
 	if (!p4d)
 		return -1;
+	pgd->pgd &= ~_PAGE_NX;
 	pud = pud_alloc(&tboot_mm, p4d, vaddr);
 	if (!pud)
 		return -1;



Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-10 Thread Dr. David Alan Gilbert
* Woodhouse, David (d...@amazon.co.uk) wrote:
> On Mon, 2018-01-08 at 02:42 -0800, Paul Turner wrote:
> > 
> > While the cases above involve the crafting and use of poisoned
> > entries.  Recall also that one of the initial conditions was that we
> > should avoid RSB underflow as some CPUs may try to use other indirect
> > predictors when this occurs.
> 
> I think we should start by deliberately ignoring the CPUs which use the
> other indirect predictors on RSB underflow. Those CPUs don't perform
> *quite* so badly with IBRS anyway.
> 
> Let's get the minimum amount of RSB handling in to cope with the pre-
> SKL CPUs, and then see if we really do want to extend it to make SKL
> 100% secure in retpoline mode or not.

How do you make decisions on which CPU you're running on?
I'm worried about the case of a VM that starts off on an older host
and then gets live migrated to a new Skylake.
For Intel CPUs we've historically been safe to live migrate
to any newer host based on having all the features that the old one had;
with the guest still seeing the flags etc for the old CPU.
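
To make the concern concrete, here is a minimal user-space sketch (not code
from this series) of a model-number check based on the CPUID leaf 1 signature.
The names decode_cpu_sig() and sig_is_skylake() are hypothetical, and the three
family-6 model numbers are the Skylake client and server models as I understand
them. The point is only that such a check sees whatever signature the
hypervisor exposes, which after live migration may still be the old host's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Display family/model decoded from the CPUID leaf 1 EAX signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
};

struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned ext_model   = (eax >> 16) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;
    struct cpu_sig sig;

    /* The extended family only counts when the base family is 0xf. */
    sig.family = base_family;
    if (base_family == 0xf)
        sig.family += ext_family;

    /* The extended model only counts for base family 6 or 0xf. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        sig.model |= ext_model << 4;

    return sig;
}

/* Hypothetical helper: true for the assumed Skylake model numbers. */
bool sig_is_skylake(struct cpu_sig sig)
{
    if (sig.family != 6)
        return false;
    return sig.model == 0x4e || sig.model == 0x5e || sig.model == 0x55;
}
```

A guest started on a Haswell host (signature 0x000306C3, model 0x3c) and then
migrated to Skylake would keep reporting the Haswell signature, so a check like
this would keep answering "not Skylake".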

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-10 Thread Woodhouse, David
On Mon, 2018-01-08 at 02:42 -0800, Paul Turner wrote:
> 
> While the cases above involve the crafting and use of poisoned
> entries.  Recall also that one of the initial conditions was that we
> should avoid RSB underflow as some CPUs may try to use other indirect
> predictors when this occurs.

I think we should start by deliberately ignoring the CPUs which use the
other indirect predictors on RSB underflow. Those CPUs don't perform
*quite* so badly with IBRS anyway.

Let's get the minimum amount of RSB handling in to cope with the pre-
SKL CPUs, and then see if we really do want to extend it to make SKL
100% secure in retpoline mode or not.
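
For readers unfamiliar with the term, RSB underflow can be pictured with a toy
model like the one below. It is only a sketch under stated assumptions: the
depth of 16 is taken from the ">16 entries" case in Paul's list, all names are
hypothetical, and real hardware behaviour differs in detail. What the CPU does
after an underflow (for example, falling back to another indirect predictor)
is exactly the part this model does not cover.

```c
#include <stdbool.h>
#include <stddef.h>

#define RSB_DEPTH 16 /* assumed depth, from the ">16 entries" case */

/* Toy return stack buffer: a fixed-size circular stack of return addresses. */
struct rsb_model {
    unsigned long entries[RSB_DEPTH];
    size_t top;   /* index of the next slot to write */
    size_t valid; /* usable predictions, saturates at RSB_DEPTH */
};

/* A call pushes its return address; once full, the oldest entry is lost. */
void rsb_call(struct rsb_model *rsb, unsigned long ret_addr)
{
    rsb->entries[rsb->top] = ret_addr;
    rsb->top = (rsb->top + 1) % RSB_DEPTH;
    if (rsb->valid < RSB_DEPTH)
        rsb->valid++;
}

/* A return pops a prediction; false means the RSB was empty (underflow). */
bool rsb_ret(struct rsb_model *rsb, unsigned long *predicted)
{
    if (rsb->valid == 0)
        return false;
    rsb->top = (rsb->top + RSB_DEPTH - 1) % RSB_DEPTH;
    *predicted = rsb->entries[rsb->top];
    rsb->valid--;
    return true;
}

/* Count the returns that underflow when unwinding a call chain of 'depth'. */
size_t rsb_underflows(size_t depth)
{
    struct rsb_model rsb = { .top = 0, .valid = 0 };
    unsigned long predicted;
    size_t misses = 0;

    for (size_t i = 0; i < depth; i++)
        rsb_call(&rsb, 0x1000 + i);
    for (size_t i = 0; i < depth; i++)
        if (!rsb_ret(&rsb, &predicted))
            misses++;
    return misses;
}
```

In this model a call chain of 16 or fewer never underflows, while each level
beyond 16 costs one unpredicted return on the way back out.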

So let's go through your list of cases and attempt to distinguish the
underflow concerns (which I declare we don't care about for now) from
the pollution (which we care about especially for non-SMEP) concerns...

> The cases we care about here are:
> - When we return _into_ protected execution.  For the kernel, this
> means when we exit interrupt context into kernel context, since we may
> have emptied or reduced the number of RSB entries while in interrupt
> context.

Don't care about that particular example. That's underflow-only.

However, we *do* care about entry to kernel code from userspace, for
interrupts and system calls etc. Basically everywhere that the IBRS
code would be setting IBRS, we need to flush the RSB (if !SMEP, I
think).

> - Context switch (even if we are returning to user code, we need to
> at unwind the scheduler/triggering frames that preempted it
> previously, considering that detail, this is a subset of the above,
> but listed for completeness)

Don't care. This is underflow-only. (Which means I think we want to
drop Andi's patch?)

> - On VMEXIT (it turns out we need to worry about both poisoned
> entries, and no entries, the solution is a single refill
> nonetheless).

Do care. This fixes pollution from the guest, and even SMEP isn't
enough to make us not care.

> - Leaving deeper (>C1) c-states, which may have flushed hardware
> state

Don't care.

> - Where we are unwinding call-chains of >16 entries[*]

Don't care.

Overall, I think the RSB-stuffing is needed in all the same places that
it's needed with IBRS.
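
The triage above can be summarised as a small decision function. This is a
sketch of the reasoning in this mail only, not code from the series; the enum
and function names are hypothetical, and the SMEP condition on kernel entry
carries the same "I think" as in the text.

```c
#include <stdbool.h>

/* Hypothetical labels for the cases discussed above. */
enum rsb_event {
    RSB_EVENT_IRQ_RETURN_TO_KERNEL,   /* underflow-only */
    RSB_EVENT_KERNEL_ENTRY_FROM_USER, /* pollution from userspace */
    RSB_EVENT_CONTEXT_SWITCH,         /* underflow-only */
    RSB_EVENT_VMEXIT,                 /* pollution from the guest */
    RSB_EVENT_DEEP_CSTATE_EXIT,       /* underflow-only */
    RSB_EVENT_DEEP_CALL_CHAIN,        /* underflow-only */
};

/*
 * Returns true where RSB stuffing is wanted under the "ignore underflow,
 * care about pollution" policy described above.
 */
bool rsb_stuffing_needed(enum rsb_event event, bool cpu_has_smep)
{
    switch (event) {
    case RSB_EVENT_VMEXIT:
        /* Guest pollution matters even with SMEP. */
        return true;
    case RSB_EVENT_KERNEL_ENTRY_FROM_USER:
        /* Assumed to matter only without SMEP. */
        return !cpu_has_smep;
    case RSB_EVENT_IRQ_RETURN_TO_KERNEL:
    case RSB_EVENT_CONTEXT_SWITCH:
    case RSB_EVENT_DEEP_CSTATE_EXIT:
    case RSB_EVENT_DEEP_CALL_CHAIN:
        /* Underflow-only cases are deliberately ignored for now. */
        return false;
    }
    return false;
}
```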



Re: Avoid speculative indirect calls in kernel

2018-01-09 Thread Dave Hansen
On 01/09/2018 04:45 PM, Thomas Gleixner wrote:
> On Mon, 8 Jan 2018, Andrea Arcangeli wrote:
>> On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
>> Did my best to do the cleanest patch for tip, but I now figured Dave's
>> original comment was spot on: a _PAGE_NX clear then becomes necessary
>> also after pud_alloc not only after p4d_alloc.
>>
>> pmd_alloc would run into the same with x86 32bit non-PAE too.

non-PAE doesn't have an NX bit. :)

But we #define _PAGE_NX down to 0 there so it's harmless.
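
A minimal stand-alone illustration of why that is harmless, assuming NX lives
in bit 63 of a 64-bit entry (the DEMO_* names and clear_nx() are made up for
this sketch and are not kernel symbols):

```c
#include <stdint.h>

/* Assumed layouts: NX is bit 63 where it exists, and 0 where it does not. */
#define DEMO_PAGE_NX_64   (UINT64_C(1) << 63)
#define DEMO_PAGE_NX_NONE UINT64_C(0)

/* Same shape as "pgd->pgd &= ~_PAGE_NX", written as a pure function. */
uint64_t clear_nx(uint64_t entry, uint64_t nx_mask)
{
    return entry & ~nx_mask;
}
```

With a mask of 0 the complement is all ones, so the AND leaves the entry
untouched; with bit 63 as the mask only that bit is dropped.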

>> So there are two choices, either going back to one single _PAGE_NX
>> clear from the original Dave's original patch as below, or to add
>> multiple clear after each level which was my objective and is more
>> robust, but it may be overkill in this case. As long as it was one
>> line it looked a clear improvement.
>>
>> Considering the caller in both cases is going to abort I guess we can
>> use the one liner approach as Dave and Jiri did originally.
> 
> Dave ?

I agree with Andrea.  The patch in -tip potentially misses the pgd
clearing if pud_alloc() sets a PGD.  It would also be nice to have that
comment back.

Note that the -tip commit probably works in *practice* because for two
adjacent calls to map_tboot_page() that share a PGD entry, the first
will clear NX, *then* allocate and set the PGD (without NX clear).  The
second call will *not* allocate but will clear the NX bit.
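
The ordering problem can be reproduced with a toy model. This is a sketch under
simplifying assumptions, not kernel code: one top-level entry, a stand-in
demo_populate() that plays the role of whichever of p4d_alloc()/pud_alloc()
actually fills the PGD, and a PTI-style poison that sets NX when the entry is
first populated. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NX      (UINT64_C(1) << 63)
#define DEMO_PRESENT UINT64_C(1)

struct demo_pgd {
    uint64_t pgd;
};

/* Stand-in for the allocation that populates the PGD and poisons it with NX. */
static void demo_populate(struct demo_pgd *pgd)
{
    if (!(pgd->pgd & DEMO_PRESENT))
        pgd->pgd = DEMO_PRESENT | DEMO_NX;
}

/* Ordering as in -tip when the PGD is only populated by the later alloc. */
static void map_clear_early(struct demo_pgd *pgd)
{
    pgd->pgd &= ~DEMO_NX; /* no effect on a still-empty entry */
    demo_populate(pgd);   /* sets the entry, NX included */
}

/* Ordering as in the patch below: clear once everything is allocated. */
static void map_clear_late(struct demo_pgd *pgd)
{
    demo_populate(pgd);
    pgd->pgd &= ~DEMO_NX;
}

/* Is the entry executable after 'calls' mappings that share it? */
bool early_executable_after(int calls)
{
    struct demo_pgd pgd = { 0 };

    for (int i = 0; i < calls; i++)
        map_clear_early(&pgd);
    return !(pgd.pgd & DEMO_NX);
}

bool late_executable_after(int calls)
{
    struct demo_pgd pgd = { 0 };

    for (int i = 0; i < calls; i++)
        map_clear_late(&pgd);
    return !(pgd.pgd & DEMO_NX);
}
```

In this model the early-clear ordering leaves NX set after a single call and
only becomes executable once a second call hits the same entry, which is the
"works in practice" case described above. The late-clear ordering is
executable after the first call.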

The patch I think we want is attached.

From: Dave Hansen 

This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it.  PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.  Undo
the poison to allow execution.

Signed-off-by: Dave Hansen 
Cc: Ning Sun 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: tboot-de...@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
---

 b/arch/x86/kernel/tboot.c |   11 +++
 1 file changed, 11 insertions(+)

diff -puN arch/x86/kernel/tboot.c~pti-tboot-fix arch/x86/kernel/tboot.c
--- a/arch/x86/kernel/tboot.c~pti-tboot-fix	2018-01-05 21:50:55.74960 -0800
+++ b/arch/x86/kernel/tboot.c	2018-01-05 23:51:41.368536890 -0800
@@ -138,6 +138,17 @@ static int map_tboot_page(unsigned long
 		return -1;
 	set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
 	pte_unmap(pte);
+
+	/*
+	 * PTI poisons low addresses in the kernel page tables in the
+	 * name of making them unusable for userspace.  To execute
+	 * code at such a low address, the poison must be cleared.
+	 *
+	 * Note: 'pgd' actually gets set in p4d_alloc() _or_
+	 * pud_alloc() depending on 4/5-level paging.
+	 */
+	pgd->pgd &= ~_PAGE_NX;
+
 	return 0;
 }
 
_


Re: Avoid speculative indirect calls in kernel

2018-01-09 Thread Dave Hansen
On 01/09/2018 04:45 PM, Thomas Gleixner wrote:
> On Mon, 8 Jan 2018, Andrea Arcangeli wrote:
>> On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
>> Did my best to do the cleanest patch for tip, but I now figured Dave's
>> original comment was spot on: a _PAGE_NX clear then becomes necessary
>> also after pud_alloc not only after p4d_alloc.
>>
>> pmd_alloc would run into the same with x86 32bit non-PAE too.

non-PAE doesn't have an NX bit. :)

But we #define _PAGE_NX down to 0 there so it's harmless.

>> So there are two choices, either going back to one single _PAGE_NX
>> clear from the original Dave's original patch as below, or to add
>> multiple clear after each level which was my objective and is more
>> robust, but it may be overkill in this case. As long as it was one
>> line it looked a clear improvement.
>>
>> Considering the caller in both cases is going to abort I guess we can
>> use the one liner approach as Dave and Jiri did originally.
> 
> Dave ?

I agree with Andrea.  The patch in -tip potentially misses the pgd
clearing if pud_alloc() sets a PGD.  It would also be nice to have that
comment back.

Note that the -tip commit probably works in *practice* because for two
adjacent calls to map_tboot_page() that share a PGD entry, the first
will clear NX, *then* allocate and set the PGD (without NX clear).  The
second call will *not* allocate but will clear the NX bit.

The patch I think we want is attached.

From: Dave Hansen 

This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it.  PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.  Undo
the poison to allow execution.

Signed-off-by: Dave Hansen 
Cc: Ning Sun 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: tboot-de...@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
---

 b/arch/x86/kernel/tboot.c |   11 +++
 1 file changed, 11 insertions(+)

diff -puN arch/x86/kernel/tboot.c~pti-tboot-fix arch/x86/kernel/tboot.c
--- a/arch/x86/kernel/tboot.c~pti-tboot-fix	2018-01-05 21:50:55.74960 -0800
+++ b/arch/x86/kernel/tboot.c	2018-01-05 23:51:41.368536890 -0800
@@ -138,6 +138,17 @@ static int map_tboot_page(unsigned long
 		return -1;
 	set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
 	pte_unmap(pte);
+
+	/*
+	 * PTI poisons low addresses in the kernel page tables in the
+	 * name of making them unusable for userspace.  To execute
+	 * code at such a low address, the poison must be cleared.
+	 *
+	 * Note: 'pgd' actually gets set in p4d_alloc() _or_
+	 * pud_alloc() depending on 4/5-level paging.
+	 */
+	pgd->pgd &= ~_PAGE_NX;
+
 	return 0;
 }
 
_


Re: Avoid speculative indirect calls in kernel

2018-01-09 Thread Thomas Gleixner
On Mon, 8 Jan 2018, Andrea Arcangeli wrote:

> On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
> > Thanks for resending it.
> 
> Thanks to you for the PTI improvements!
> 
> Did my best to do the cleanest patch for tip, but I now figured Dave's
> original comment was spot on: a _PAGE_NX clear then becomes necessary
> also after pud_alloc not only after p4d_alloc.
> 
> pmd_alloc would run into the same with x86 32bit non-PAE too.
> 
> So there are two choices: either going back to one single _PAGE_NX
> clear from Dave's original patch as below, or adding
> multiple clears after each level, which was my objective and is more
> robust, but it may be overkill in this case. As long as it was one
> line it looked a clear improvement.
> 
> Considering the caller in both cases is going to abort I guess we can
> use the one liner approach as Dave and Jiri did originally.

Dave ?

> 
> It's up to you; doing it at each level would be more resilient in case
> the caller is changed.
> 
> For the efi_64 same issue, the current tip patch will work better, but
> it can still be cleaned up with pgd_efi instead of pgd_offset_k().
> 
> I got partly fooled because it worked great with 4 levels, but it
> wasn't OK anyway for 32-bit non-PAE. Sometimes it's the simpler stuff
> that gets more subtle.
> 
> Andrea
> 
> From 391517951e904cdd231dda9943c36a25a7bf01b9 Mon Sep 17 00:00:00 2001
> From: Dave Hansen 
> Date: Sat, 6 Jan 2018 18:41:14 +0100
> Subject: [PATCH 1/1] x86/kaiser/efi: unbreak tboot
> 
> This is another case similar to what EFI does: create a new set of
> page tables, map some code at a low address, and jump to it.  PTI
> mistakes this low address for userspace and mistakenly marks it
> non-executable in an effort to make it unusable for userspace.  Undo
> the poison to allow execution.
> 
> Signed-off-by: Dave Hansen 
> Cc: Ning Sun 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: tboot-de...@lists.sourceforge.net
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Andrea Arcangeli 
> ---
>  arch/x86/kernel/tboot.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
> index a4eb27918ceb..a2486f444073 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -138,6 +138,17 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
>   return -1;
>   set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
>   pte_unmap(pte);
> +
> + /*
> +  * PTI poisons low addresses in the kernel page tables in the
> +  * name of making them unusable for userspace.  To execute
> +  * code at such a low address, the poison must be cleared.
> +  *
> +  * Note: 'pgd' actually gets set in p4d_alloc() _or_
> +  * pud_alloc() depending on 4/5-level paging.
> +  */
> + pgd->pgd &= ~_PAGE_NX;
> +
>   return 0;
>  }
>  
> 


[PATCH v7 00/11] Retpoline: Avoid speculative indirect calls in kernel

2018-01-09 Thread David Woodhouse
This is a mitigation for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Using GCC patches available from the hjl/indirect/gcc-7-branch/master
branch of https://github.com/hjl-tools/gcc/commits/hjl and by manually
patching assembler code, all vulnerable indirect branches (that occur
after userspace first runs) are eliminated from the kernel.

They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.

Fedora 27 packages of the updated compiler are available at
https://koji.fedoraproject.org/koji/taskinfo?taskID=24065739


v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
    Change warning messages.
    Hide modpost warning message
v3: Update to the latest CET-capable retpoline version
    Reinstate ALTERNATIVE support
v4: Finish reconciling Andi's and my patch sets, bug fixes.
    Exclude objtool support for now
    Add 'noretpoline' boot option
    Add AMD retpoline alternative
v5: Silence MODVERSIONS warnings
    Use pause;jmp loop instead of lfence;jmp
    Switch to X86_FEATURE_RETPOLINE positive feature logic
    Emit thunks inline from assembler macros
    Merge AMD support into initial patch
v6: Update to latest GCC patches with no dots in symbols
    Fix MODVERSIONS properly(ish)
    Fix typo breaking 32-bit, introduced in V5
    Never set X86_FEATURE_RETPOLINE_AMD yet, pending confirmation
v7: Further bikeshedding on macro names
    Stuff RSB on kernel entry
    Implement 'spectre_v2=' command line option for IBRS/IBPB too
    Revert to precisely the asm sequences from the Google paper

Andi Kleen (3):
  x86/retpoline: Temporarily disable objtool when CONFIG_RETPOLINE=y
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline: Avoid return buffer underflows on context switch

David Woodhouse (8):
  x86/retpoline: Add initial retpoline support
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps

 Documentation/admin-guide/kernel-parameters.txt |  28 +
 arch/x86/Kconfig                                |  17 ++-
 arch/x86/Kconfig.debug                          |   6 +-
 arch/x86/Makefile                               |  10 ++
 arch/x86/crypto/aesni-intel_asm.S               |   5 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S     |   3 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S    |   3 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S       |   3 +-
 arch/x86/entry/entry_32.S                       |  22 +++-
 arch/x86/entry/entry_64.S                       |  29 -
 arch/x86/include/asm/asm-prototypes.h           |  25
 arch/x86/include/asm/cpufeatures.h              |   2 +
 arch/x86/include/asm/mshyperv.h                 |  18 +--
 arch/x86/include/asm/nospec-branch.h            | 158
 arch/x86/include/asm/xen/hypercall.h            |   5 +-
 arch/x86/kernel/cpu/bugs.c                      | 122 +-
 arch/x86/kernel/ftrace_32.S                     |   6 +-
 arch/x86/kernel/ftrace_64.S                     |   8 +-
 arch/x86/kernel/irq_32.c                        |   9 +-
 arch/x86/kernel/setup.c                         |   3 +
 arch/x86/lib/Makefile                           |   1 +
 arch/x86/lib/checksum_32.S                      |   7 +-
 arch/x86/lib/retpoline.S                        |  48 +++
 23 files changed, 499 insertions(+), 39 deletions(-)
 create mode 100644 arch/x86/include/asm/nospec-branch.h
 create mode 100644 arch/x86/lib/retpoline.S

-- 
2.7.4



Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Samir Bellabes
Alan Cox  writes:

> On Fri, 5 Jan 2018 01:54:13 +0100 (CET)
> Thomas Gleixner  wrote:
>
>> On Thu, 4 Jan 2018, Jon Masters wrote:
>> > P.S. I've an internal document where I've been tracking "nice to haves"
>> > for later, and one of them is whether it makes sense to tag binaries as
>> > "trusted" (e.g. extended attribute, label, whatever). It was something I
>> > wanted to bring up at some point as potentially worth considering.  
>> 
>> Scratch that. There is no such thing as a trusted binary.
>
> There is if you are using signing and the like. I'm sure SELinux and
> friends will grow the ability to set per process policy but that's
> certainly not a priority.

There was a proposed security module providing such a per-process
policy. 

When a process wants to execute a specific networking syscall for a
specific "transport protocol", the security module catches the syscall
at the LSM hook and asks the user for the "verdict" (authorized or not?).

Verdicts are put inside "tickets" (a struct of information
regarding the authorization). Verdicts can have a timeout or live
forever. They are managed by a hashtable.
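
A rough user-space sketch of that ticket idea, under stated assumptions: the struct layout, field names, hash function and bucket count are all invented for illustration and are not taken from the snet patches. It only shows verdicts stored in tickets, an optional timeout, and a chained hashtable lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TICKET_BUCKETS 16
#define TICKET_FOREVER 0 /* expiry value meaning "never expires" */

enum verdict { VERDICT_DENY = 0, VERDICT_ALLOW = 1 };

/* Hypothetical ticket: one verdict for a (process, protocol) pair. */
struct ticket {
	uint32_t pid;        /* process the verdict applies to */
	uint32_t protocol;   /* transport protocol number */
	enum verdict verdict;
	uint64_t expires_at; /* absolute time, or TICKET_FOREVER */
	struct ticket *next; /* chain inside one hash bucket */
};

struct ticket_table {
	struct ticket *buckets[TICKET_BUCKETS];
};

static size_t ticket_hash(uint32_t pid, uint32_t protocol)
{
	return (pid * 31u + protocol) % TICKET_BUCKETS;
}

/* Attach a caller-owned ticket at the head of its bucket chain. */
static void ticket_attach(struct ticket_table *t, struct ticket *tk)
{
	size_t b = ticket_hash(tk->pid, tk->protocol);

	tk->next = t->buckets[b];
	t->buckets[b] = tk;
}

/*
 * Look up a live ticket. Returns true and stores the verdict if one is
 * found; false means no usable ticket exists and the user must be asked.
 */
static bool ticket_lookup(const struct ticket_table *t, uint32_t pid,
			  uint32_t protocol, uint64_t now, enum verdict *out)
{
	const struct ticket *tk = t->buckets[ticket_hash(pid, protocol)];

	for (; tk; tk = tk->next) {
		if (tk->pid != pid || tk->protocol != protocol)
			continue;
		if (tk->expires_at != TICKET_FOREVER && tk->expires_at <= now)
			continue; /* timed out, ignore */
		*out = tk->verdict;
		return true;
	}
	return false;
}
```

A hook would call ticket_lookup() first and fall back to asking the user only when it returns false; removing expired tickets from the chains is left out of the sketch.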

The policy can be defined by attaching tickets to processes with a
userspace tool. The interface between the userspace command tool and the
kernel uses the netlink protocol.

I managed to do the same for processes and memory. Memory access requires
the process to deliver a valid ticket. Sharing memory is like "process
A has a ticket required to access memory of process B".

Of course, direct assignments, through asm code or operations like:
buffer[x] = y;
are impossible to catch at this level. That requires hooks at the asm
level.

As I understand it, Willy needs to build such a tool to distinguish "trusted"
binaries from others.

This is just the tip of the iceberg because, after starting to mark
processes as "trusted" or not, there is a need for an architecture to track
such operations, evaluate inconsistencies, and evaluate the convergence of such
a classification across thousands of binaries in a lot of
contexts. This was the big part of the job.


The last series I proposed was years ago under the name:
[RFC,v3,00/10] snet: Security for NETwork syscalls

and particularly:
[RFC,v3,08/10] snet: introduce snet_ticket
http://patchwork.ozlabs.org/patch/93808/ 



thanks;
sam


Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Andrea Arcangeli
On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
> Thanks for resending it.

Thanks to you for the PTI improvements!

Did my best to do the cleanest patch for tip, but I now figured Dave's
original comment was spot on: a _PAGE_NX clear then becomes necessary
also after pud_alloc not only after p4d_alloc.

pmd_alloc would run into the same with x86 32bit non-PAE too.

So there are two choices: either going back to one single _PAGE_NX
clear from Dave's original patch as below, or adding
multiple clears after each level, which was my objective and is more
robust, but it may be overkill in this case. As long as it was one
line it looked a clear improvement.

Considering the caller in both cases is going to abort I guess we can
use the one liner approach as Dave and Jiri did originally.

It's up to you; doing it at each level would be more resilient in case
the caller is changed.

For the efi_64 same issue, the current tip patch will work better, but
it can still be cleaned up with pgd_efi instead of pgd_offset_k().

I got partly fooled because it worked great with 4 levels, but it
wasn't OK anyway for 32-bit non-PAE. Sometimes it's the simpler stuff
that gets more subtle.

Andrea

From 391517951e904cdd231dda9943c36a25a7bf01b9 Mon Sep 17 00:00:00 2001
From: Dave Hansen 
Date: Sat, 6 Jan 2018 18:41:14 +0100
Subject: [PATCH 1/1] x86/kaiser/efi: unbreak tboot

This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it.  PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.  Undo
the poison to allow execution.

Signed-off-by: Dave Hansen 
Cc: Ning Sun 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: tboot-de...@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andrea Arcangeli 
---
 arch/x86/kernel/tboot.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index a4eb27918ceb..a2486f444073 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -138,6 +138,17 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 		return -1;
 	set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
 	pte_unmap(pte);
+
+	/*
+	 * PTI poisons low addresses in the kernel page tables in the
+	 * name of making them unusable for userspace.  To execute
+	 * code at such a low address, the poison must be cleared.
+	 *
+	 * Note: 'pgd' actually gets set in p4d_alloc() _or_
+	 * pud_alloc() depending on 4/5-level paging.
+	 */
+	pgd->pgd &= ~_PAGE_NX;
+
 	return 0;
 }
 


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Thomas Gleixner
On Mon, 8 Jan 2018, Ingo Molnar wrote:
> * Linus Torvalds  wrote:
> 
> > On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse  wrote:
> > > This is a mitigation for the 'variant 2' attack described in
> > > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> > 
> > Ok, I don't love the patches, but I see nothing horribly wrong here
> > either, and I assume the performance impact of this is pretty minimal.
> > 
> > Thomas? I'm obviously doing rc7 today without these, but I assume the
> > x86 maintainers are resigned to this all. And yes, we'll have at least
> > an rc8 this release..
> 
> I'm definitely resigned to them, and with these patches being disclosed so late
> we don't have any good choices left, so a tentative:
> 
> Acked-by: Ingo Molnar 

Just doing the last polishing on that. I'll add your ack


Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Thomas Gleixner
On Mon, 8 Jan 2018, Andrea Arcangeli wrote:
> On Fri, Jan 05, 2018 at 10:59:28AM +0100, Thomas Gleixner wrote:
> I sent you a better version of the efi_64.c fix from Jiri privately
> and you still miss the tboot fix in linux-tip so you still got a boot
> failure to fix there.

Missed that in the pile ...

> This is incremental with
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=WIP.x86/pti
> where the "Unbreak EFI old_memmap" fix is applied.
> 
> I respun it after doing the more correct fix in this case too (same
> as the efi_64.c improvement) while leaving the attribution to the fix
> to Dave as he did the hard part.

Thanks for resending it.

> From 0c480d1eeabd56379144a4ed6b6fb24f3b84e40e Mon Sep 17 00:00:00 2001
> From: Dave Hansen 
> Date: Sat, 6 Jan 2018 18:41:14 +0100
> Subject: [PATCH 1/1] x86/kaiser/efi: unbreak tboot
> 
> This is another case similar to what EFI does: create a new set of
> page tables, map some code at a low address, and jump to it.  PTI
> mistakes this low address for userspace and mistakenly marks it
> non-executable in an effort to make it unusable for userspace.  Undo
> the poison to allow execution.
> 
> Signed-off-by: Dave Hansen 
> Cc: Ning Sun 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: tboot-de...@lists.sourceforge.net
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Andrea Arcangeli 
> ---
>  arch/x86/kernel/tboot.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
> index a4eb27918ceb..75869a4b6c41 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -127,6 +127,7 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
>   p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
>   if (!p4d)
>   return -1;
> + pgd->pgd &= ~_PAGE_NX;
>   pud = pud_alloc(&tboot_mm, p4d, vaddr);
>   if (!pud)
>   return -1;
> 
> If I can help and assist in any other way let me know.
> 
> Thanks,
> Andrea
> 


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse  wrote:
> > This is a mitigation for the 'variant 2' attack described in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> 
> Ok, I don't love the patches, but I see nothing horribly wrong here
> either, and I assume the performance impact of this is pretty minimal.
> 
> Thomas? I'm obviously doing rc7 today without these, but I assume the
> x86 maintainers are resigned to this all. And yes, we'll have at least
> an rc8 this release..

I'm definitely resigned to them, and with these patches being disclosed so late
we don't have any good choices left, so a tentative:

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Willy Tarreau
On Mon, Jan 08, 2018 at 05:22:41PM +0100, Borislav Petkov wrote:
> On Sun, Jan 07, 2018 at 11:10:38PM +0100, Willy Tarreau wrote:
> >  I just want to be clear that the big drop some of us are facing is
> > not an option *at all* for certain processes in certain environments
> > and that we'll either continue to run with pti=off or with pti=on + a
> > finer grained setting ASAP.
> 
> And that's all I'm saying: do pti=off in that case. The finer-grained
> "solution" is just silly.

I disagree because I want to ensure, as much as possible, that occasional
unprivileged local users can't exploit it. pti=off gives them full
access. The finer-grained solution ensures that only a few processes
share the same risk as the kernel as they work together to deliver
the service. And that's what I've implemented in a patch series I
sent in another thread :-)

   https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1580131.html

Cheers,
Willy


Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Borislav Petkov
On Sun, Jan 07, 2018 at 11:10:38PM +0100, Willy Tarreau wrote:
>  I just want to be clear that the big drop some of us are facing is
> not an option *at all* for certain processes in certain environments
> and that we'll either continue to run with pti=off or with pti=on + a
> finer grained setting ASAP.

And that's all I'm saying: do pti=off in that case. The finer-grained
"solution" is just silly.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Alexei Starovoitov
On Mon, Jan 08, 2018 at 02:42:13AM -0800, Paul Turner wrote:
> 
> kernel->kernel independent of SMEP:
> While much harder to coordinate, facilities such as eBPF potentially
> allow exploitable return targets to be created.
> Generally speaking (particularly if eBPF has been disabled) the risk
> is _much_ lower here, since we can only return into kernel execution
> that was already occurring on another thread (which could e.g. likely
> be attacked there directly independent of RSB poisoning.)

we can remove bpf interpreter without losing features:
https://patchwork.ozlabs.org/patch/856694/
Ironically JIT is more secure than interpreter.



Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread David Woodhouse

> On Mon, Jan 8, 2018 at 2:45 AM, David Woodhouse 
> wrote:
>> On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
>>> One detail that is missing is that we still need RSB refill in some
>>> cases.
>>> This is not because the retpoline sequence itself will underflow (it
>>> is actually guaranteed not to, since it consumes only RSB entries
>>> that it generates.
>>> But either to avoid poisoning of the RSB entries themselves, or to
>>> avoid the hardware turning to alternate predictors on RSB underflow.
>>>
>>> Enumerating the cases we care about:
>>>
>>> • user->kernel in the absence of SMEP:
>>> In the absence of SMEP, we must worry about user-generated RSB
>>> entries being consumable by kernel execution.
>>> Generally speaking, for synchronous execution this will not occur
>>> (e.g. syscall, interrupt), however, one important case remains.
>>> When we context switch between two threads, we should flush the RSB
>>> so that execution generated from the unbalanced return path on the
>>> thread that we just scheduled into, cannot consume RSB entries
>>> potentially installed by the prior thread.
>>
>> Or IBPB here, yes? That's what we had in the original patch set when
>> retpoline came last, and what I assume will be put back again once we
>> *finally* get our act together and reinstate the full set of microcode
>> patches.
>
> IBPB is *much* more expensive than the sequence I suggested.
> If the kernel has been protected with a retpoline compilation, it is
> much faster to not use IBPB here; we only need to prevent
> ret-poisoning in this case.

Retpoline protects the kernel but IBPB is needed on context switch anyway
to protect userspace processes from each other.

But...

> A) I am enumerating all of the cases for completeness.  It was missed
> by many that this detail was necessary on this patch, independently of
> IBRS.
> B) On the parts duplicated in (A), for specifics that are contributory to
> correctness in both cases, we should not hand-wave over the fact that
> they may or may not be covered by another patch-set.  Users need to
> understand what's required for complete protection.  Particularly if they
> are backporting.

... yes, agreed. Now that we are putting retpoline first we shouldn't miss
things that we *were* doing anyway. TBH I really don't think we should
have split the patch sets apart; we'll work on getting the rest on top
ASAP.

-- 
dwmw2



Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Paul Turner
For Intel the manuals state that it's 16 entries -- 2.5.2.1
Agner also reports 16 (presumably experimentally measured)  e.g.
http://www.agner.org/optimize/microarchitecture.pdf [3.8]
For AMD it can be larger, for example 32 entries on Fam17h (but 16
entries on Fam16h).

For future-proofing a binary, or for a new AMD processor, 32 calls are
required.  I would suggest tuning this based on the current CPU, which
also covers the future case while saving cycles now.
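
As a rough sketch of that per-CPU tuning, the refill count could be
picked from the vendor and family.  The enum, the function name, the
treatment of every Intel part as 16 entries, and the fallback of 32 for
anything unrecognised are illustrative assumptions, not taken from any
posted patch; only the 16/32 figures come from the numbers above.

```c
/* Hypothetical vendor tags; real code would use the kernel's own CPU
 * identification rather than this enum. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_UNKNOWN };

/*
 * Return how many calls an RSB refill should issue on this CPU.
 * 16 for Intel and for AMD Fam16h, 32 for AMD Fam17h (figures quoted
 * in this thread).  Anything else falls back to the conservative 32.
 */
unsigned int rsb_fill_count(enum cpu_vendor vendor, unsigned int family)
{
	if (vendor == VENDOR_INTEL)
		return 16;	/* assumption: all Intel parts of interest */
	if (vendor == VENDOR_AMD && family == 0x16)
		return 16;
	return 32;		/* Fam17h, future parts, unknown vendors */
}
```

Defaulting to the larger count means an unrecognised CPU pays extra
cycles rather than being left with stale entries.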



On Mon, Jan 8, 2018 at 3:16 AM, Andrew Cooper  wrote:
> On 08/01/18 10:42, Paul Turner wrote:
>> A sequence for efficiently refilling the RSB is:
>> mov $8, %rax;
>> .align 16;
>>3: call 4f;
>>   3p: pause; call 3p;
>>  .align 16;
>>   4: call 5f;
>>   4p: pause; call 4p;
>>  .align 16;
>>5: dec %rax;
>>   jnz 3b;
>>   add $(16*8), %rsp;
>> This implementation uses 8 loops, with 2 calls per iteration.  This is
>> marginally faster than a single call per iteration.  We did not
>> observe useful benefit (particularly relative to text size) from
>> further unrolling.  This may also be usefully split into smaller (e.g.
>> 4 or 8 call)  segments where we can usefully pipeline/intermix with
>> other operations.  It includes retpoline type traps so that if an
>> entry is consumed, it cannot lead to controlled speculation.  On my
>> test system it took ~43 cycles on average.  Note that non-zero
>> displacement calls should be used as these may be optimized to not
>> interact with the RSB due to their use in fetching RIP for 32-bit
>> relocations.
>
> Guidance from both Intel and AMD still states that 32 calls are required
> in general.  Is your above code optimised for a specific processor which
> you know the RSB to be smaller on?
>
> ~Andrew


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Andrew Cooper
On 08/01/18 10:42, Paul Turner wrote:
> A sequence for efficiently refilling the RSB is:
> mov $8, %rax;
> .align 16;
>3: call 4f;
>   3p: pause; call 3p;
>  .align 16;
>   4: call 5f;
>   4p: pause; call 4p;
>  .align 16;
>5: dec %rax;
>   jnz 3b;
>   add $(16*8), %rsp;
> This implementation uses 8 loops, with 2 calls per iteration.  This is
> marginally faster than a single call per iteration.  We did not
> observe useful benefit (particularly relative to text size) from
> further unrolling.  This may also be usefully split into smaller (e.g.
> 4 or 8 call)  segments where we can usefully pipeline/intermix with
> other operations.  It includes retpoline type traps so that if an
> entry is consumed, it cannot lead to controlled speculation.  On my
> test system it took ~43 cycles on average.  Note that non-zero
> displacement calls should be used as these may be optimized to not
> interact with the RSB due to their use in fetching RIP for 32-bit
> relocations.

Guidance from both Intel and AMD still states that 32 calls are required
in general.  Is your above code optimised for a specific processor which
you know the RSB to be smaller on?

~Andrew


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Paul Turner
On Mon, Jan 8, 2018 at 2:45 AM, David Woodhouse  wrote:
> On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
>> One detail that is missing is that we still need RSB refill in some
>> cases.
>> This is not because the retpoline sequence itself will underflow (it
>> is actually guaranteed not to, since it consumes only RSB entries
>> that it generates.
>> But either to avoid poisoning of the RSB entries themselves, or to
>> avoid the hardware turning to alternate predictors on RSB underflow.
>>
>> Enumerating the cases we care about:
>>
>> • user->kernel in the absence of SMEP:
>> In the absence of SMEP, we must worry about user-generated RSB
>> entries being consumable by kernel execution.
>> Generally speaking, for synchronous execution this will not occur
>> (e.g. syscall, interrupt), however, one important case remains.
>> When we context switch between two threads, we should flush the RSB
>> so that execution generated from the unbalanced return path on the
>> thread that we just scheduled into, cannot consume RSB entries
>> potentially installed by the prior thread.
>
> Or IBPB here, yes? That's what we had in the original patch set when
> retpoline came last, and what I assume will be put back again once we
> *finally* get our act together and reinstate the full set of microcode
> patches.

IBPB is *much* more expensive than the sequence I suggested.
If the kernel has been protected with a retpoline compilation, it is
much faster to not use IBPB here; we only need to prevent
ret-poisoning in this case.

>
>> kernel->kernel independent of SMEP:
>> While much harder to coordinate, facilities such as eBPF potentially
>> allow exploitable return targets to be created.
>> Generally speaking (particularly if eBPF has been disabled) the risk
>> is _much_ lower here, since we can only return into kernel execution
>> that was already occurring on another thread (which could e.g. likely
>> be attacked there directly independent of RSB poisoning.)
>>
>> guest->hypervisor, independent of SMEP:
>> For guest ring0 -> host ring0 transitions, it is possible that the
>> tagging only includes that the entry was only generated in a ring0
>> context.  Meaning that a guest generated entry may be consumed by the
>> host.  This admits:
>
> We are also stuffing the RSB on vmexit in the IBRS/IBPB patch set,
> aren't we?

A) I am enumerating all of the cases for completeness.  It was missed
by many that this detail was necessary on this patch, independently of
IBRS.
B) On the parts duplicated in (A), for specifics that are contributory to
correctness in both cases, we should not hand-wave over the fact that
they may or may not be covered by another patch-set.  Users need to
understand what's required for complete protection.  Particularly if they
are backporting.


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Paul Turner
On Mon, Jan 8, 2018 at 2:38 AM, Jiri Kosina  wrote:
> On Mon, 8 Jan 2018, Paul Turner wrote:
>
>> user->kernel in the absence of SMEP:
>> In the absence of SMEP, we must worry about user-generated RSB entries
>> being consumable by kernel execution.
>> Generally speaking, for synchronous execution this will not occur (e.g.
>> syscall, interrupt), however, one important case remains.
>> When we context switch between two threads, we should flush the RSB so that
>> execution generated from the unbalanced return path on the thread that we
>> just scheduled into, cannot consume RSB entries potentially installed by
>> the prior thread.
>
> I am still unclear whether this closes it completely, as when HT is on,
> the RSB is shared between the threads, right? Therefore one thread can
> poison it for the other without even context switch happening.
>

See 2.6.1.1 [Replicated resources]:
  "The return stack predictor is replicated to improve branch
prediction of return instructions"

(This is part of the reason that the sequence is attractive; its use
of the RSB to control prediction naturally prevents cross-sibling
attack.)

> --
> Jiri Kosina
> SUSE Labs
>


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread David Woodhouse
On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
> One detail that is missing is that we still need RSB refill in some
> cases.
> This is not because the retpoline sequence itself will underflow (it
> is actually guaranteed not to, since it consumes only RSB entries
> that it generates.  
> But either to avoid poisoning of the RSB entries themselves, or to
> avoid the hardware turning to alternate predictors on RSB underflow.
> 
> Enumerating the cases we care about:
> 
> • user->kernel in the absence of SMEP:
> In the absence of SMEP, we must worry about user-generated RSB
> entries being consumable by kernel execution.
> Generally speaking, for synchronous execution this will not occur
> (e.g. syscall, interrupt), however, one important case remains.
> When we context switch between two threads, we should flush the RSB
> so that execution generated from the unbalanced return path on the
> thread that we just scheduled into, cannot consume RSB entries
> potentially installed by the prior thread.

Or IBPB here, yes? That's what we had in the original patch set when
retpoline came last, and what I assume will be put back again once we
*finally* get our act together and reinstate the full set of microcode
patches.

> kernel->kernel independent of SMEP:
> While much harder to coordinate, facilities such as eBPF potentially
> allow exploitable return targets to be created.
> Generally speaking (particularly if eBPF has been disabled) the risk
> is _much_ lower here, since we can only return into kernel execution
> that was already occurring on another thread (which could e.g. likely
> be attacked there directly independent of RSB poisoning.)
> 
> guest->hypervisor, independent of SMEP:
> For guest ring0 -> host ring0 transitions, it is possible that the
> tagging only includes that the entry was only generated in a ring0
> context.  Meaning that a guest generated entry may be consumed by the
> host.  This admits:

We are also stuffing the RSB on vmexit in the IBRS/IBPB patch set,
aren't we?




Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Paul Turner
[ First send did not make list because gmail ate its plain-text force
when I pasted content. ]

One detail that is missing is that we still need RSB refill in some cases.
This is not because the retpoline sequence itself will underflow (it
is actually guaranteed not to, since it consumes only RSB entries that
it generates), but either to avoid poisoning of the RSB entries
themselves, or to avoid the hardware turning to alternate predictors
on RSB underflow.

Enumerating the cases we care about:

user->kernel in the absence of SMEP:
In the absence of SMEP, we must worry about user-generated RSB entries
being consumable by kernel execution.
Generally speaking, for synchronous execution this will not occur
(e.g. syscall, interrupt), however, one important case remains.
When we context switch between two threads, we should flush the RSB so
that execution generated from the unbalanced return path on the thread
that we just scheduled into, cannot consume RSB entries potentially
installed by the prior thread.

kernel->kernel independent of SMEP:
While much harder to coordinate, facilities such as eBPF potentially
allow exploitable return targets to be created.
Generally speaking (particularly if eBPF has been disabled) the risk
is _much_ lower here, since we can only return into kernel execution
that was already occurring on another thread (which could e.g. likely
be attacked there directly independent of RSB poisoning.)

guest->hypervisor, independent of SMEP:
For guest ring0 -> host ring0 transitions, it is possible that the
tagging only includes that the entry was only generated in a ring0
context.  Meaning that a guest generated entry may be consumed by the
host.  This admits:

hypervisor_run_vcpu_implementation() {
  
  … run virtualized work (1)
   
  < update vmcs state, prior to any function return > (2)
   < return from hypervisor_run_vcpu_implementation() to handle VMEXIT > (3)
}

A guest to craft poisoned entries at (1) which, if not flushed at (2),
may immediately be eligible for consumption at (3).
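
To make steps (1) to (3) concrete, here is a toy model in C.  It is
only a sketch under stated assumptions: the RSB is treated as a
16-entry circular stack, and all type and function names are invented
for illustration; none of this mirrors real hardware internals or
hypervisor code.

```c
#include <stdbool.h>

#define RSB_DEPTH 16	/* assumed depth for this model */

enum rsb_origin { RSB_EMPTY, RSB_GUEST, RSB_HOST_BENIGN };

/* Toy model: a fixed-depth circular stack of return predictions. */
struct rsb {
	enum rsb_origin slot[RSB_DEPTH];
	unsigned int top;	/* index of the next slot to write */
};

void rsb_init(struct rsb *r)
{
	for (unsigned int i = 0; i < RSB_DEPTH; i++)
		r->slot[i] = RSB_EMPTY;
	r->top = 0;
}

/* A 'call' records a prediction, overwriting the oldest when full. */
void rsb_push(struct rsb *r, enum rsb_origin who)
{
	r->slot[r->top] = who;
	r->top = (r->top + 1) % RSB_DEPTH;
}

/* A 'ret' consumes the most recently recorded prediction. */
enum rsb_origin rsb_pop(struct rsb *r)
{
	r->top = (r->top + RSB_DEPTH - 1) % RSB_DEPTH;
	enum rsb_origin who = r->slot[r->top];
	r->slot[r->top] = RSB_EMPTY;
	return who;
}

/* Step (2): overwrite every slot with a benign host-generated entry. */
void rsb_stuff(struct rsb *r)
{
	for (unsigned int i = 0; i < RSB_DEPTH; i++)
		rsb_push(r, RSB_HOST_BENIGN);
}

/* Origin of the prediction consumed by the return at step (3). */
enum rsb_origin vmexit_return_prediction(bool stuff_on_vmexit)
{
	struct rsb r;

	rsb_init(&r);
	for (unsigned int i = 0; i < RSB_DEPTH; i++)
		rsb_push(&r, RSB_GUEST);	/* (1) guest crafts entries */
	if (stuff_on_vmexit)
		rsb_stuff(&r);			/* (2) flush before any ret */
	return rsb_pop(&r);			/* (3) first host return */
}
```

Without the stuffing at (2), the first host return in this model
consumes a guest-crafted entry; with it, only benign entries remain.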

The cases above involve the crafting and use of poisoned entries.
Recall also that one of the initial conditions was that we should
avoid RSB underflow, as some CPUs may try to use other indirect
predictors when this occurs.

The cases we care about here are:
- When we return _into_ protected execution.  For the kernel, this
means when we exit interrupt context into kernel context, since we may
have emptied or reduced the number of RSB entries while in interrupt
context.
- Context switch (even if we are returning to user code, we need to at
least unwind the scheduler/triggering frames that preempted it previously;
considering that detail, this is a subset of the above, but listed for
completeness)
- On VMEXIT (it turns out we need to worry about both poisoned
entries, and no entries, the solution is a single refill nonetheless).
- Leaving deeper (>C1) c-states, which may have flushed hardware state
- Where we are unwinding call-chains of >16 entries[*]

[*] This is obviously the trickiest case.  Fortunately, it is tough to
exploit since such call-chains are reasonably rare, and action must
typically be predicted at a considerable distance from where current
execution lies.  Both dramatically decrease the feasibility of an
attack and lower the bit-rate (the number of ops per attempt is
necessarily increased).  For our systems, since we control the binary
image, we can determine this through aggregate profiling of every
machine in the fleet.  I'm happy to provide those symbols, but the
list is obviously short of complete coverage due to code differences.
Generally, this is a level of paranoia no typical user will likely
care about, and it only applies to a subset of CPUs.


A sequence for efficiently refilling the RSB is:

    mov $8, %rax;
    .align 16;
 3: call 4f;
3p: pause; call 3p;
    .align 16;
 4: call 5f;
4p: pause; call 4p;
    .align 16;
 5: dec %rax;
    jnz 3b;
    add $(16*8), %rsp;

This implementation uses 8 loop iterations, with 2 calls per
iteration.  This is marginally faster than a single call per
iteration.  We did not observe useful benefit (particularly relative
to text size) from further unrolling.  This may also be usefully split
into smaller (e.g. 4 or 8 call) segments where we can usefully
pipeline/intermix with other operations.  It includes retpoline-type
traps so that if an entry is consumed, it cannot lead to controlled
speculation.  On my test system it took ~43 cycles on average.  Note
that non-zero displacement calls should be used, as zero-displacement
calls may be optimized to not interact with the RSB due to their use
in fetching RIP for 32-bit relocations.
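
To make the underflow and refill reasoning above concrete, here is a minimal C model of a return stack buffer. It is only a sketch under stated assumptions, not a description of real hardware: the 16-entry depth is taken from the ">16 entries" case above, the circular-overwrite behaviour is assumed, and every name (demo_rsb, demo_rsb_call, demo_rsb_ret, demo_rsb_refill, DEMO_TRAP_TARGET) is hypothetical.

```c
/* Toy model of a return stack buffer (RSB). All names are hypothetical. */
#define DEMO_RSB_ENTRIES 16u      /* assumed depth, from the ">16 entries" case */
#define DEMO_TRAP_TARGET 0xdeadUL /* stands in for the harmless pause/call trap */

struct demo_rsb {
    unsigned long slot[DEMO_RSB_ENTRIES];
    unsigned int top;   /* index of the next free slot, modulo the depth */
    unsigned int valid; /* live entries, capped at DEMO_RSB_ENTRIES */
};

/* A call pushes its return address; the oldest entry is lost when full. */
static void demo_rsb_call(struct demo_rsb *r, unsigned long ret_addr)
{
    r->slot[r->top] = ret_addr;
    r->top = (r->top + 1u) % DEMO_RSB_ENTRIES;
    if (r->valid < DEMO_RSB_ENTRIES)
        r->valid++;
}

/*
 * A return pops a prediction. Returns 1 and stores the predicted target,
 * or 0 on underflow (the case where some CPUs fall back to other
 * indirect predictors).
 */
static int demo_rsb_ret(struct demo_rsb *r, unsigned long *predicted)
{
    if (r->valid == 0)
        return 0;
    r->top = (r->top + DEMO_RSB_ENTRIES - 1u) % DEMO_RSB_ENTRIES;
    *predicted = r->slot[r->top];
    r->valid--;
    return 1;
}

/* Refill: one call per slot, so every stale or poisoned entry is replaced. */
static void demo_rsb_refill(struct demo_rsb *r)
{
    for (unsigned int i = 0; i < DEMO_RSB_ENTRIES; i++)
        demo_rsb_call(r, DEMO_TRAP_TARGET);
}
```

In this model, an entry pushed before demo_rsb_refill() can no longer be popped afterwards, because all 16 slots now hold the trap target. A chain of 20 calls followed by 20 returns gets predictions for only the innermost 16 returns; the remaining 4 underflow.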

On Mon, Jan 8, 2018 at 2:34 AM, Paul Turner  wrote:
> One detail that is missing is that we still need RSB refill in some cases.
> This is not because the retpoline sequence itself will underflow (it is
> actually guaranteed not to, since it consumes only RSB entries that it
> generates.
> But either to avoid poisoning 

Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Jiri Kosina
On Mon, 8 Jan 2018, Paul Turner wrote:

> user->kernel in the absence of SMEP:
> In the absence of SMEP, we must worry about user-generated RSB entries
> being consumable by kernel execution.
> Generally speaking, for synchronous execution this will not occur (e.g.
> syscall, interrupt), however, one important case remains.
> When we context switch between two threads, we should flush the RSB so that
> execution generated from the unbalanced return path on the thread that we
> just scheduled into, cannot consume RSB entries potentially installed by
> the prior thread.

I am still unclear whether this closes it completely, as when HT is on, 
the RSB is shared between the threads, right? Therefore one thread can 
poison it for the other without a context switch even happening.

-- 
Jiri Kosina
SUSE Labs



Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 10:59:28AM +0100, Thomas Gleixner wrote:
> I've seen the insanities which were crammed into the distro kernels, which
> have sysctls and whatever, but at the same time these kernels shipped in a

Debugfs tunables only; there are no sysctls. Quoting Greg:

http://lkml.kernel.org/r/20180107082026.ga11...@kroah.com

"It's a debugfs api, it can be changed at any time, to be anything we
want, and all is fine :)"

> haste do not even boot on a specific class of machines. [..]

If you refer to the two efi_64.c and tboot.c corner case boot failures
found over the last weekend, those affected upstream 4.15-rc, 4.14.12
and all PTI branches in linux-tip too (perhaps less reproducible there
because of differences in old_memmap handling).

I sent you a better version of the efi_64.c fix from Jiri privately,
and you are still missing the tboot fix in linux-tip, so you still
have a boot failure to fix there.

This is incremental with
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=WIP.x86/pti
where the "Unbreak EFI old_memmap" fix is applied.

I respun it after doing the more correct fix in this case too (same
as the efi_64.c improvement) while leaving the attribution for the fix
to Dave as he did the hard part.

From 0c480d1eeabd56379144a4ed6b6fb24f3b84e40e Mon Sep 17 00:00:00 2001
From: Dave Hansen 
Date: Sat, 6 Jan 2018 18:41:14 +0100
Subject: [PATCH 1/1] x86/kaiser/efi: unbreak tboot

This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it.  PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.  Undo
the poison to allow execution.

Signed-off-by: Dave Hansen 
Cc: Ning Sun 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: tboot-de...@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andrea Arcangeli 
---
 arch/x86/kernel/tboot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index a4eb27918ceb..75869a4b6c41 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -127,6 +127,7 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 	p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
 	if (!p4d)
 		return -1;
+	pgd->pgd &= ~_PAGE_NX;
 	pud = pud_alloc(&tboot_mm, p4d, vaddr);
 	if (!pud)
 		return -1;

If I can help and assist in any other way let me know.

Thanks,
Andrea
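
For readers unfamiliar with the one-line fix in the patch above, the following C sketch shows the bit manipulation in isolation. It is an illustration only: the demo_ names and DEMO_ constants are hypothetical stand-ins for the kernel's pgd_t and _PAGE_NX, and the only hardware fact assumed is that the x86-64 no-execute bit is bit 63 of a paging-structure entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's page-table flag definitions. */
#define DEMO_PAGE_PRESENT (UINT64_C(1) << 0)
#define DEMO_PAGE_NX      (UINT64_C(1) << 63) /* x86-64 no-execute bit */

typedef struct { uint64_t pgd; } demo_pgd_t;

/* An entry permits execution only if it is present and NX is clear. */
static bool demo_entry_is_executable(const demo_pgd_t *e)
{
    return (e->pgd & DEMO_PAGE_PRESENT) && !(e->pgd & DEMO_PAGE_NX);
}

/* What the commit message describes PTI doing to a low, user-looking entry. */
static void demo_poison_entry(demo_pgd_t *e)
{
    e->pgd |= DEMO_PAGE_NX;
}

/* The fix: clear only NX, leaving every other bit of the entry untouched. */
static void demo_unpoison_entry(demo_pgd_t *e)
{
    e->pgd &= ~DEMO_PAGE_NX;
}
```

Clearing the bit with a mask rather than rewriting the entry keeps the address and the remaining flags intact, which matches the shape of the `pgd->pgd &= ~_PAGE_NX` line in the patch.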


Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-08 Thread Thomas Gleixner
On Sun, 7 Jan 2018, Linus Torvalds wrote:

> On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse  wrote:
> > This is a mitigation for the 'variant 2' attack described in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> 
> Ok, I don't love the patches, but I see nothing horribly wrong here
> either, and I assume the performance impact of this is pretty minimal.
> 
> Thomas? I'm obviously doing rc7 today without these, but I assume the
> x86 maintainers are resigned to this all.

That seems to be the general mental state for lots of involved people.

Thanks,

tglx





Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Willy Tarreau
Hi Thomas,

On Mon, Jan 08, 2018 at 10:18:09AM +0100, Thomas Gleixner wrote:
> On Sun, 7 Jan 2018, Willy Tarreau wrote:
> > On Sun, Jan 07, 2018 at 07:55:11PM +0100, Borislav Petkov wrote:
> > > > Just like you have to trust your plane's pilot eventhough you don't
> > > > know him personally.
> > > 
> > > Funny you should make that analogy. Remember that germanwings pilot?
> > > People trusted him too.
> > > 
> > > Now imagine if the plane had protection against insane pilots... some of
> > > those people might still be alive, who knows...
> > 
> > Sure but despite this case many people continue to take the plane because
> > it's their only option to cross half of the world in a reasonable time.
> > 
> > Boris, I'm *not* contesting the performance resulting from the fixes,
> > and I would never have been able to produce them myself had I to, so
> > I'm really glad we have them. I just want to be clear that the big drop
> > some of us are facing is not an option *at all* for certain processes
> > in certain environments and that we'll either continue to run with
> > pti=off or with pti=on + a finer grained setting ASAP.
> 
> No argument about that. We've looked into per process PTI very early and
> decided not to go that route because of the time pressure and the risk. I'm
> glad that we managed to pull it off at all without breaking the world
> completely. It's surely doable and we all know that it has to be done, just
> not right now as we have to fast track at least the basic protections for
> the other two attack vectors.

I know that most people with the skills to do it are very busy, which is
why I started to take a look at it, not being involved at all in this and
having interest in seeing it done. For me the road is long, progressively
discovering asid/pcid etc in the code, you can guess I won't come up with
something testable any time soon ;-)

My idea would be to use a privileged prctl() call to set a new TIF_NOPTI
on the task and to see where to check for this to avoid switching to the
user-only PGD when returning to userspace. I have no idea if this is
doable at all nor if this would be sufficient (I hope so) but reading
the code to try to figure whether it makes sense cannot hurt.

> You can be sure, that all people involved hate it more than you do.

I'm definitely convinced about this, we're all proud to save one CPU
cycle here and there from time to time and having to suddenly flush TLBs
and throw hundreds or thousands of cycles at once down the drain must be
a very hard decision to take. And by the way I don't hate what was done
because there's a config option and I still have the choice. Other OS
users probably don't even have this choice, so thanks to all involved
for this!

Willy
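
The per-task opt-out idea in the message above can be sketched as a small C model. This is purely illustrative and makes several assumptions: no such prctl() option or TIF_NOPTI flag is claimed to exist in any kernel, the privilege check is reduced to a boolean, and all demo_ names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-task flag, named after the TIF_NOPTI idea in the email. */
#define DEMO_TIF_NOPTI (1u << 0)

enum demo_pgd { DEMO_PGD_KERNEL_FULL, DEMO_PGD_USER_ONLY };

struct demo_task {
    unsigned int flags;
    bool privileged; /* stands in for a real capability check */
};

/* Model of the proposed privileged call: 0 on success, -1 if not permitted. */
static int demo_set_nopti(struct demo_task *t, bool enable)
{
    if (!t->privileged)
        return -1;
    if (enable)
        t->flags |= DEMO_TIF_NOPTI;
    else
        t->flags &= ~DEMO_TIF_NOPTI;
    return 0;
}

/*
 * Decision taken on return to userspace: stay on the full page tables when
 * PTI is off globally or the task opted out, otherwise switch to the
 * user-only PGD.
 */
static enum demo_pgd demo_pgd_for_user_return(const struct demo_task *t,
                                              bool pti_enabled)
{
    if (!pti_enabled || (t->flags & DEMO_TIF_NOPTI))
        return DEMO_PGD_KERNEL_FULL;
    return DEMO_PGD_USER_ONLY;
}
```

The model only captures the decision point; whether skipping the switch on the exit path is sufficient on its own is exactly the open question raised in the message.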


Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Thomas Gleixner
On Sun, 7 Jan 2018, Willy Tarreau wrote:
> On Sun, Jan 07, 2018 at 07:55:11PM +0100, Borislav Petkov wrote:
> > > Just like you have to trust your plane's pilot eventhough you don't
> > > know him personally.
> > 
> > Funny you should make that analogy. Remember that germanwings pilot?
> > People trusted him too.
> > 
> > Now imagine if the plane had protection against insane pilots... some of
> > those people might still be alive, who knows...
> 
> Sure but despite this case many people continue to take the plane because
> it's their only option to cross half of the world in a reasonable time.
> 
> Boris, I'm *not* contesting the performance resulting from the fixes,
> and I would never have been able to produce them myself had I to, so
> I'm really glad we have them. I just want to be clear that the big drop
> some of us are facing is not an option *at all* for certain processes
> in certain environments and that we'll either continue to run with
> pti=off or with pti=on + a finer grained setting ASAP.

No argument about that. We've looked into per process PTI very early and
decided not to go that route because of the time pressure and the risk. I'm
glad that we managed to pull it off at all without breaking the world
completely. It's surely doable and we all know that it has to be done, just
not right now as we have to fast track at least the basic protections for
the other two attack vectors.

You can be sure, that all people involved hate it more than you do.

Thanks,

tglx




Re: [PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-07 Thread Linus Torvalds
On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse  wrote:
> This is a mitigation for the 'variant 2' attack described in
> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Ok, I don't love the patches, but I see nothing horribly wrong here
either, and I assume the performance impact of this is pretty minimal.

Thomas? I'm obviously doing rc7 today without these, but I assume the
x86 maintainers are resigned to this all. And yes, we'll have at least
an rc8 this release..

Linus


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2018 at 07:55:11PM +0100, Borislav Petkov wrote:
> > Just like you have to trust your plane's pilot eventhough you don't
> > know him personally.
> 
> Funny you should make that analogy. Remember that germanwings pilot?
> People trusted him too.
> 
> Now imagine if the plane had protection against insane pilots... some of
> those people might still be alive, who knows...

Sure but despite this case many people continue to take the plane because
it's their only option to cross half of the world in a reasonable time.

Boris, I'm *not* contesting the performance resulting from the fixes,
and I would never have been able to produce them myself had I to, so
I'm really glad we have them. I just want to be clear that the big drop
some of us are facing is not an option *at all* for certain processes
in certain environments and that we'll either continue to run with
pti=off or with pti=on + a finer grained setting ASAP.

I mean, the kernel is not the only sensitive part in a system (and
sometimes it's not sensitive at all). A kernel plus userland processes
deliver a service, each in its role. Breaking one or the other can be
similar, or sometimes the trouble can be worse for one than the other.
But for some situations, the good working condition of the combination
of the two is critical, and even a kernel compromise could be a detail
compared to the impact of something crashing at full load. Sometimes a
userspace compromise would already be critical enough that the risk
is not higher by accepting to take it for the kernel as well.

In my specific case, on LB appliances, I don't really care what happens
once haproxy has already been compromised, it's too late. End of the
game, all sensitive information is already disclosed at this point.
What I'd rather avoid however is the occasional sysop who has an account
on the machine to retrieve some stats once in a while suddenly being
able to get more than these stats. That's where I draw the line for
*this* use case. Plenty of others will have plenty of other perceptions
and that's fine.

Cheers,
Willy


[PATCH v6 00/10] Retpoline: Avoid speculative indirect calls in kernel

2018-01-07 Thread David Woodhouse
This is a mitigation for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Using GCC patches available from the hjl/indirect/gcc-7-branch/master
branch of https://github.com/hjl-tools/gcc/commits/hjl and by manually
patching assembler code, all vulnerable indirect branches (that occur
after userspace first runs) are eliminated from the kernel.

They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.

Fedora 27 packages of the updated compiler are available at
https://koji.fedoraproject.org/koji/taskinfo?taskID=24065739


v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
    Change warning messages.
    Hide modpost warning message
v3: Update to the latest CET-capable retpoline version
    Reinstate ALTERNATIVE support
v4: Finish reconciling Andi's and my patch sets, bug fixes.
    Exclude objtool support for now
    Add 'noretpoline' boot option
    Add AMD retpoline alternative
v5: Silence MODVERSIONS warnings
    Use pause;jmp loop instead of lfence;jmp
    Switch to X86_FEATURE_RETPOLINE positive feature logic
    Emit thunks inline from assembler macros
    Merge AMD support into initial patch
v6: Update to latest GCC patches with no dots in symbols
    Fix MODVERSIONS properly(ish)
    Fix typo breaking 32-bit, introduced in V5
    Never set X86_FEATURE_RETPOLINE_AMD yet, pending confirmation

Andi Kleen (3):
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline: Add boot time option to disable retpoline
  x86/retpoline: Exclude objtool with retpoline

David Woodhouse (7):
  x86/retpoline: Add initial retpoline support
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps

 Documentation/admin-guide/kernel-parameters.txt |  3 +
 arch/x86/Kconfig                                | 17 -
 arch/x86/Kconfig.debug                          |  6 +-
 arch/x86/Makefile                               | 10 +++
 arch/x86/crypto/aesni-intel_asm.S               |  5 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S     |  3 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S    |  3 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S       |  3 +-
 arch/x86/entry/entry_32.S                       |  5 +-
 arch/x86/entry/entry_64.S                       | 12 +++-
 arch/x86/include/asm/asm-prototypes.h           | 25 +++
 arch/x86/include/asm/cpufeatures.h              |  2 +
 arch/x86/include/asm/mshyperv.h                 | 18 ++---
 arch/x86/include/asm/nospec-branch.h            | 92 +
 arch/x86/include/asm/xen/hypercall.h            |  5 +-
 arch/x86/kernel/cpu/common.c                    |  3 +
 arch/x86/kernel/cpu/intel.c                     | 11 +++
 arch/x86/kernel/ftrace_32.S                     |  6 +-
 arch/x86/kernel/ftrace_64.S                     |  8 +--
 arch/x86/kernel/irq_32.c                        |  9 +--
 arch/x86/lib/Makefile                           |  1 +
 arch/x86/lib/checksum_32.S                      |  7 +-
 arch/x86/lib/retpoline.S                        | 48 +
 23 files changed, 264 insertions(+), 38 deletions(-)
 create mode 100644 arch/x86/include/asm/nospec-branch.h
 create mode 100644 arch/x86/lib/retpoline.S

-- 
2.7.4



Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Borislav Petkov
On Sun, Jan 07, 2018 at 06:44:51PM +0100, Willy Tarreau wrote:
> Exactly, but there's much more to gain by owning this process anyway in
> certain cases than just dumping a few hundred kernel bytes.

A few hundred? It is *all* machine bytes.

> That's where I consider that "trusted" is more "critical" than "safe" :
> if it dies, we all die anyway.

No, not die. Exploit it and since it is "trusted", use it to dump all
memory. All your memories belongs to us.

> Just like you have to trust your plane's pilot even though you don't
> know him personally.

Funny you should make that analogy. Remember that Germanwings pilot?
People trusted him too.

Now imagine if the plane had protection against insane pilots... some of
those people might still be alive, who knows...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Borislav Petkov
On Sun, Jan 07, 2018 at 09:21:44AM -0800, David Lang wrote:
> The point is that in many cases, if someone exploits the "trusted" process,
> they already have everything that the machine is able to do anyway.

...and then we don't need the per-process complication anyway.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Woodhouse, David
On Sun, 2018-01-07 at 21:01 +0300, Ivan Ivanov wrote:
> Make sure that your patches do not affect AMD CPUs,
> because they are unaffected by the Meltdown vulnerability
> for which this "30% slowdown Intel patch" is required

These patches *do* affect AMD CPUs, because they address one of the
issues to which AMD CPUs are also vulnerable.
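
To make the distinction concrete, here is a rough sketch of per-vendor selection as this exchange describes it: page-table isolation for Meltdown is skipped on AMD, while the retpoline mitigation for 'variant 2' stays on for both vendors. The struct, the function name and the conservative handling of unrecognised vendors are assumptions made for illustration, not what the patches implement; the vendor strings are the ones CPUID reports.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of which mitigations a CPU needs. */
struct demo_mitigations {
    bool kpti;       /* page-table isolation, for Meltdown          */
    bool retpoline;  /* indirect-branch mitigation, for 'variant 2' */
};

/* vendor_id is the CPUID vendor string, e.g. "GenuineIntel". */
struct demo_mitigations demo_select_mitigations(const char *vendor_id)
{
    struct demo_mitigations m;

    /*
     * Assumption: only AMD is treated as unaffected by Meltdown. Intel
     * and any unrecognised vendor take the safe, slow route.
     */
    m.kpti = strcmp(vendor_id, "AuthenticAMD") != 0;

    /* 'Variant 2' is not limited to one vendor, so nobody is exempt. */
    m.retpoline = true;

    return m;
}
```

With this split, an AMD system skips the page-table isolation cost but still gets the retpoline build.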





Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Ivan Ivanov
Make sure that your patches do not affect AMD CPUs,
because they are unaffected by the Meltdown vulnerability
for which this "30% slowdown Intel patch" is required.

All your security patches regarding Meltdown should be like:
*) if it's Intel, it is "cpu_insecure" ==> take the safe and slow route
*) if it's AMD, it is "secure cpu" ==> take the normal route

AMD users should not suffer because of Intel screwups.
If Intel is responsible, they should accept CPU returns.

Best regards
Ivan Ivanov,
coreboot developer and
open source enthusiast
 


2018-01-07 20:47 GMT+03:00 Willy Tarreau :
> On Sun, Jan 07, 2018 at 02:01:38PM +, Alan Cox wrote:
>> > I disagree. When there are patches that slow execution down up to 30%,
>> > I want to be able to mark a binary as "trusted" so that I can run it
>>
>> It's not a binary that is trusted - it's a binary in a given use case.
>> You could easily have the same binary being run in two situations on the
>> same box at the same time and run just one of them 'trusted'.
>
> That's what I like with the prctl approach. This can end up as a config
> option in the application itself. At least I'd see it like this in
> haproxy. Basically :
>   - start it with enough privileges (always the case to warrant chroot()
>     then setuid())
>
>   - if config option "disable-kpti" is set, run prctl() to disable it.
>
>
> It is sufficiently inconvenient to ensure that it's only done where
> relevant and regardless of the executable itself (ie it should not be
> an xattr on the FS for example).
>
> Willy


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2018 at 02:01:38PM +, Alan Cox wrote:
> > I disagree. When there are patches that slow execution down up to 30%,
> > I want to be able to mark a binary as "trusted" so that I can run it
> 
> It's not a binary that is trusted - it's a binary in a given use case.
> You could easily have the same binary being run in two situations on the
> same box at the same time and run just one of them 'trusted'.

That's what I like with the prctl approach. This can end up as a config
option in the application itself. At least I'd see it like this in
haproxy. Basically :
  - start it with enough privileges (always the case to warrant chroot()
    then setuid())

  - if config option "disable-kpti" is set, run prctl() to disable it.


It is sufficiently inconvenient to ensure that it's only done where
relevant and regardless of the executable itself (ie it should not be
an xattr on the FS for example).
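
As a rough sketch of that flow: the prctl() option below does not exist. DEMO_PR_SET_KPTI is a made-up placeholder value and both function names are hypothetical. The point is only the shape of it, with the opt-out decided by the application's own configuration and requested while the process still has its start-up privileges, before chroot() and setuid().

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>

/* HYPOTHETICAL: no such prctl option exists; placeholder value only. */
#define DEMO_PR_SET_KPTI 0x4b505449

/* True if this configuration line is the (hypothetical) opt-out keyword. */
bool demo_config_wants_kpti_off(const char *line)
{
    return strcmp(line, "disable-kpti") == 0;
}

/*
 * Call this while still privileged, before chroot() and setuid().
 * Returns 0 if the config did not ask for anything, 1 if the kernel
 * accepted the request, or a negative errno value if it refused.
 */
int demo_maybe_disable_kpti(const char *config_line)
{
    if (!demo_config_wants_kpti_off(config_line))
        return 0;

    /* Ask the kernel to turn the mitigation off for this process only. */
    if (prctl(DEMO_PR_SET_KPTI, 0UL, 0UL, 0UL, 0UL) == -1)
        return -errno;

    return 1;
}
```

A kernel that does not know the option is expected to reject the call, so the caller has to treat a negative return as "still protected" and carry on.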

Willy


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2018 at 03:14:10PM +0100, Borislav Petkov wrote:
> On Fri, Jan 05, 2018 at 08:13:33AM +0100, Willy Tarreau wrote:
> > I'm not fond of running the mitigations, but given that a few sysops can
> > connect to the machine to collect stats or counters, I think it would be
> > better to ensure these people can't happily play with the exploits to
> > dump stuff they shouldn't have access to.
> 
> So if someone exploits the "trusted" process, and then dumps all memory,
> you have practically lost.

Exactly, but there's much more to gain by owning this process anyway in
certain cases than just dumping a few hundred kernel bytes.

That's where I consider that "trusted" is more "critical" than "safe":
if it dies, we all die anyway. Just like you have to trust your plane's
pilot even though you don't know him personally.

Willy


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread David Lang
The point is that in many cases, if someone exploits the "trusted" process, they
already have everything that the machine is able to do anyway.


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Borislav Petkov
On Fri, Jan 05, 2018 at 08:13:33AM +0100, Willy Tarreau wrote:
> I'm not fond of running the mitigations, but given that a few sysops can
> connect to the machine to collect stats or counters, I think it would be
> better to ensure these people can't happily play with the exploits to
> dump stuff they shouldn't have access to.

So if someone exploits the "trusted" process, and then dumps all memory,
you have practically lost.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: Avoid speculative indirect calls in kernel

2018-01-07 Thread Alan Cox
> I disagree. When there are patches that slow execution down up to 30%,
> I want to be able to mark a binary as "trusted" so that I can run it

It's not a binary that is trusted - it's a binary in a given use case.
You could easily have the same binary being run in two situations on the
same box at the same time and run just one of them 'trusted'.



Re: Fwd: Avoid speculative indirect calls in kernel

2018-01-06 Thread Willy Tarreau
On Sat, Jan 06, 2018 at 10:04:26PM -0700, Kiernan Hager wrote:
> On Thu, Jan 4, 2018 at 5:54 PM, Thomas Gleixner  wrote:
> > On Thu, 4 Jan 2018, Jon Masters wrote:
> >> P.S. I've an internal document where I've been tracking "nice to haves"
> >> for later, and one of them is whether it makes sense to tag binaries as
> >> "trusted" (e.g. extended attribute, label, whatever). It was something I
> >> wanted to bring up at some point as potentially worth considering.
> >
> > Scratch that. There is no such thing as a trusted binary.
> >
> 
> I disagree. When there are patches that slow execution down up to 30%,
> I want to be able to mark a binary as "trusted" so that I can run it
> without those patches if it is important. This is a boon to
> configurability and helps lessen the significant performance impact of
> these patches. Besides, anything run as root can already not only
> read, but also write kernel memory and other processes' memory, so
> it's not like this particular ability for processes trusted by the
> user is anything new. This flag should probably only be settable by
> root though, for obvious reasons.

I think everyone agrees on this, but most developers are still very
busy trying to get all issues addressed first. We should simply start
to work in parallel on what could constitute the next steps without
polluting them.

BTW the performance loss can be even worse, I have a packet generator
here whose performance was divided by 4 in a VM :-) No tests yet on
bare metal (it's easier to reboot a VM).

Willy


Fwd: Avoid speculative indirect calls in kernel

2018-01-06 Thread Kiernan Hager
On Thu, Jan 4, 2018 at 5:54 PM, Thomas Gleixner  wrote:
> On Thu, 4 Jan 2018, Jon Masters wrote:
>> P.S. I've an internal document where I've been tracking "nice to haves"
>> for later, and one of them is whether it makes sense to tag binaries as
>> "trusted" (e.g. extended attribute, label, whatever). It was something I
>> wanted to bring up at some point as potentially worth considering.
>
> Scratch that. There is no such thing as a trusted binary.
>

I disagree. When there are patches that slow execution down up to 30%,
I want to be able to mark a binary as "trusted" so that I can run it
without those patches if it is important. This is a boon to
configurability and helps lessen the significant performance impact of
these patches. Besides, anything run as root can already not only
read, but also write kernel memory and other processes' memory, so
it's not like this particular ability for processes trusted by the
user is anything new. This flag should probably only be settable by
root though, for obvious reasons.


[PATCH v5 00/12] Retpoline: Avoid speculative indirect calls in kernel

2018-01-06 Thread David Woodhouse

This is a mitigation for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Using GCC patches available from the gcc-7_2_0-retpoline branch of
http://git.infradead.org/users/dwmw2/gcc-retpoline.git and by manually
patching assembler code, all vulnerable indirect branches (that occur
after userspace first runs) are eliminated from the kernel.

They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.

v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
    Change warning messages.
    Hide modpost warning message
v3: Update to the latest CET-capable retpoline version
    Reinstate ALTERNATIVE support
v4: Finish reconciling Andi's and my patch sets, bug fixes.
    Exclude objtool support for now
    Add 'noretpoline' boot option
    Add AMD retpoline alternative
v5: Silence MODVERSIONS warnings
    Use pause;jmp loop instead of lfence;jmp
    Switch to X86_FEATURE_RETPOLINE positive feature logic
    Emit thunks inline from assembler macros
    Merge AMD support into initial patch

Andi Kleen (4):
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline: Add boot time option to disable retpoline
  x86/retpoline: Exclude objtool with retpoline
  retpoline/modpost: Quieten MODVERSION retpoline build

David Woodhouse (8):
  x86/spectre: Add X86_BUG_SPECTRE_V[12]
  x86/retpoline: Add initial retpoline support
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps

 Documentation/admin-guide/kernel-parameters.txt |  3 +
 arch/x86/Kconfig                                | 17 -
 arch/x86/Kconfig.debug                          |  6 +-
 arch/x86/Makefile                               | 10 +++
 arch/x86/crypto/aesni-intel_asm.S               |  5 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S     |  3 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S    |  3 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S       |  3 +-
 arch/x86/entry/entry_32.S                       |  5 +-
 arch/x86/entry/entry_64.S                       | 12 +++-
 arch/x86/include/asm/cpufeatures.h              |  4 ++
 arch/x86/include/asm/mshyperv.h                 | 18 ++---
 arch/x86/include/asm/nospec-branch.h            | 91 +
 arch/x86/include/asm/xen/hypercall.h            |  5 +-
 arch/x86/kernel/cpu/common.c                    |  8 +++
 arch/x86/kernel/cpu/intel.c                     | 11 +++
 arch/x86/kernel/ftrace_32.S                     |  6 +-
 arch/x86/kernel/ftrace_64.S                     |  8 +--
 arch/x86/kernel/irq_32.c                        |  9 +--
 arch/x86/lib/Makefile                           |  1 +
 arch/x86/lib/checksum_32.S                      |  7 +-
 arch/x86/lib/retpoline.S                        | 30
 scripts/mod/modpost.c                           |  6 +-
 23 files changed, 231 insertions(+), 40 deletions(-)
 create mode 100644 arch/x86/include/asm/nospec-branch.h
 create mode 100644 arch/x86/lib/retpoline.S

-- 
2.7.4



Re: Avoid speculative indirect calls in kernel

2018-01-05 Thread james harvey
On Fri, Jan 5, 2018 at 5:40 AM, Woodhouse, David  wrote:
> On Thu, 2018-01-04 at 21:01 -0500, james harvey wrote:
>>
>>
>> I understand the GCC patches being discussed will fix the
>> vulnerability because newly compiled kernels will be compiled with a
>> GCC with these patches.
>
> The GCC patches work by eliminating all indirect branches, thus
> avoiding 'variant 2' of the three problems which have been discovered.
>
> Note that we also need to eliminate all the indirect branches which
> occurred in native assembler code too, and provide the 'thunks' that
> GCC uses instead, which is why there's a series of kernel patches to go
> with it.
>
> But building a kernel this way is *only* sufficient to protect the
> kernel. Attacks between userspace processes are still possible — you
> need the updated microcode, with branch-predictor flushes/restrictions,
> to protect existing userspace processes from each other.
>
>> But, are the GCC patches being discussed also expected to fix the
>> vulnerability because user binaries will be compiled using them?
>
> It would be possible to do that. Sensitive userspace processes could be
> built this way, rendering them invulnerable to 'variant 2' attacks
> without the kernel having to use the microcode features.
>
>> In such case, a binary could be maliciously changed back, or a custom GCC
>> made with the patches reverted.
>
> If the attacker can replace the sensitive binary, or replace the
> compiler with which the sensitive binary was compiled, then we have
> other problems. I'm not going to lose sleep over that.
>
>
> Note that *none* of this addresses 'variant 1'. There's a separate
> patch series which addresses likely 'variant 1 gadgets' in the kernel,
> which I haven't seen posted in public yet. And I'm not sure what we do
> about that for userspace except extending the existing Coverity ruleset
> and teaching GCC to emit barriers automatically in the right places,
> which is a bit far-fetched right now. Elena?

I could have written more clearly.  What I'm getting at is whether any
of the GCC patches are intended to prevent an exploit from being
attempted at all, rather than making binaries immune from attacks.  So,
I don't mean being able to modify a sensitive binary or the system's
compiler.  I mean someone with non-root SSH access who compiles GCC with
whatever patches wind up for variants 1-3 reverted, or uses an old
version, leaves that custom-compiled GCC in their user directory to
compile malicious code, and executes the compiled malicious code from
their user directory.  (Or they compile a malicious program on their own
system, compatible in architecture, kernel and library versions, using
GCC without the new patches, and just copy the binary over to their user
directory to execute.)  Or a malicious customer with a VM on a shared
machine, who has root access within their VM, reverts the GCC patches
in an attempt to see into the host or other VMs on the machine.


Re: Avoid speculative indirect calls in kernel

2018-01-05 Thread Alan Cox
On Fri, 5 Jan 2018 01:54:13 +0100 (CET)
Thomas Gleixner  wrote:

> On Thu, 4 Jan 2018, Jon Masters wrote:
> > P.S. I've an internal document where I've been tracking "nice to haves"
> > for later, and one of them is whether it makes sense to tag binaries as
> > "trusted" (e.g. extended attribute, label, whatever). It was something I
> > wanted to bring up at some point as potentially worth considering.  
> 
> Scratch that. There is no such thing as a trusted binary.

There is if you are using signing and the like. I'm sure SELinux and
friends will grow the ability to set per process policy but that's
certainly not a priority.

However the question is wrong. 'trusted' is a binary operator not a unary
one.

The question that matters is

If I am executing A and about to switch to B does B trust A

because if B trusts A (which in Linuxspeak is 'can A ptrace B') then
there's not much point worrying about protection between them because what
you are trying to prevent is already expressly permitted.

It's even more important if there is a cost to the barrier imposition
because not only can you skip it sometimes but your scheduler can
schedule considering that cost just as it does cache eviction costs.

Alan
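
The trust relation described above can be sketched as a toy model. This is
only an illustration under stated assumptions, not kernel code: `Task`,
`can_ptrace` and `needs_barrier` are hypothetical names, and the real ptrace
permission check in Linux considers considerably more than a uid and a
dumpable flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a process; not a kernel structure."""
    name: str
    uid: int
    dumpable: bool = True  # assumption: False means "refuses same-uid tracing"


def can_ptrace(a: Task, b: Task) -> bool:
    """Toy approximation of 'can A ptrace B'."""
    if a.uid == 0:
        # Assumption for the sketch: root may trace anything.
        return True
    return a.uid == b.uid and b.dumpable


def needs_barrier(prev: Task, nxt: Task) -> bool:
    """When switching from prev to nxt, a barrier only buys something
    if nxt does NOT already trust prev (prev cannot ptrace nxt)."""
    return not can_ptrace(prev, nxt)


if __name__ == "__main__":
    alice_a = Task("alice-editor", 1000)
    alice_b = Task("alice-shell", 1000)
    bob = Task("bob-shell", 1001)
    for prev, nxt in [(alice_a, alice_b), (bob, alice_a)]:
        print(prev.name, "->", nxt.name, "barrier:", needs_barrier(prev, nxt))
```

The relation is deliberately asymmetric: switching from a root task to an
unprivileged one needs no barrier in this model, while the reverse switch
does. A scheduler that knows this could weigh the barrier cost when choosing
what to run next.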


Re: Avoid speculative indirect calls in kernel

2018-01-05 Thread Alan Cox
> But, are the GCC patches being discussed also expected to fix the
> vulnerability because user binaries will be compiled using them?  In

If you have a system with just a few user binaries where you are
concerned about such a thing you might go that way.

> such case, a binary could be maliciously changed back, or a custom GCC
> made with the patches reverted.

If I can change your gcc or your binary then instead of removing the
speculation protection I can make it encrypt all your files instead. Much
simpler.

At the point I can do this you already lost.

Alan


Re: Avoid speculative indirect calls in kernel

2018-01-05 Thread Thomas Gleixner
On Thu, 4 Jan 2018, Jon Masters wrote:
> On 01/04/2018 07:54 PM, Thomas Gleixner wrote:
> > On Thu, 4 Jan 2018, Jon Masters wrote:
> >> P.S. I've an internal document where I've been tracking "nice to haves"
> >> for later, and one of them is whether it makes sense to tag binaries as
> >> "trusted" (e.g. extended attribute, label, whatever). It was something I
> >> wanted to bring up at some point as potentially worth considering.
> > 
> > Scratch that. There is no such thing as a trusted binary.
> 
> I agree with your sentiment, but for those mitigations that carry a
> significant performance overhead (for example IBRS at the moment, and on
> some other architectures where we might not end up with retpolines)
> there /could/ be some value in leaving them on by default but allowing a
> sysadmin to decide to trust a given application/container and accept the
> risk. Sure, it's selectively weakened security, I get that. I am not
> necessarily advocating this, just suggesting it be discussed.
> 
> [ I also totally get that you can extend variant 2 to have any
> application that interacts with another abuse it (even over a pipe or a
> socket, etc. provided they share the same cache and take untrusted data
> that can lead to some kind of load within a speculation window), and
> there are a ton of ways to still cause an attack in that case. ]

Correct.

We have neither the basic mitigations in place nor has anyone fully
understood the implications and possible further issues.

So can we please all sit back and fix the problems at hand in a sane way
before we start discussing things like selective trust or whatever?

I've seen the insanities which were crammed into the distro kernels, which
have sysctls and whatever, but at the same time these kernels, shipped in
haste, do not even boot on a specific class of machines. Great engineering
work.

The thing which sits between the ears is not an acronym for:

Big Revenue All Intelligence Nuked

But it seems that in some ways it has been degraded to exactly that. Or do
you have a sane explanation why quite a few of the chip vendors ignored the
textbooks from the 90s about speculative execution, which clearly say that
speculation has to stop on domain borders and permission violations?

We already lost a lot of precious time due to other even more disgusting
big corporate games and many of us haven't had a quiet moment in the past
two months.

So can we please fix the stuff on the oldest and most important principle
of engineering, "Correctness first", and then, once that is done, think about
ways to optimize that w/o digging yet another hole?

Thanks,

tglx






Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Willy Tarreau
On Thu, Jan 04, 2018 at 10:57:19PM -0800, Dave Hansen wrote:
> On 01/04/2018 10:49 PM, Willy Tarreau wrote:
> > On Fri, Jan 05, 2018 at 01:54:13AM +0100, Thomas Gleixner wrote:
> >> On Thu, 4 Jan 2018, Jon Masters wrote:
> >>> P.S. I've an internal document where I've been tracking "nice to haves"
> >>> for later, and one of them is whether it makes sense to tag binaries as
> >>> "trusted" (e.g. extended attribute, label, whatever). It was something I
> >>> wanted to bring up at some point as potentially worth considering.
> >> Scratch that. There is no such thing as a trusted binary.
> > I disagree with you on this Thomas. "trusted" means "we agree to share the
> > risk this binary takes because it's critical to our service". When you
> > build a load balancing appliance on which 100% of the service is assured
> > by a single executable and the rest is just config management, you'd better
> > trust that process.
> 
> So you want to run this "one binary" as fast as possible and without
> mitigations in place?  But, you want mitigations *available* on that
> system at the same time?  For what?  If there's only one binary, why not
> just disable the mitigations entirely?

I'm not fond of running the mitigations, but given that a few sysops can
connect to the machine to collect stats or counters, I think it would be
better to ensure these people can't happily play with the exploits to
dump stuff they shouldn't have access to. It's even easier to understand
on a database or key-value server for example, where you may expect the
highest performance the CPU can bring for a specific process and the rest
can be mitigated and will never ever notice any performance impact at all.

That's why I was saying in another thread that it would be nice over the
long term if we could 1) make the mitigation dynamic, and 2) make it
possible for an admin to disable it for certain processes/programs.

Don't get me wrong, I'm perfectly aware that it's far from being simple
and for now we need to get a reliable mitigation. I'm just saying that
the performance impact is a huge loss for certain use cases and that
once things settle down we should start to work on ways to recover what
was lost.

Regards,
Willy


Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Dave Hansen
On 01/04/2018 10:49 PM, Willy Tarreau wrote:
> On Fri, Jan 05, 2018 at 01:54:13AM +0100, Thomas Gleixner wrote:
>> On Thu, 4 Jan 2018, Jon Masters wrote:
>>> P.S. I've an internal document where I've been tracking "nice to haves"
>>> for later, and one of them is whether it makes sense to tag binaries as
>>> "trusted" (e.g. extended attribute, label, whatever). It was something I
>>> wanted to bring up at some point as potentially worth considering.
>> Scratch that. There is no such thing as a trusted binary.
> I disagree with you on this Thomas. "trusted" means "we agree to share the
> risk this binary takes because it's critical to our service". When you
> build a load balancing appliance on which 100% of the service is assured
> by a single executable and the rest is just config management, you'd better
> trust that process.

So you want to run this "one binary" as fast as possible and without
mitigations in place?  But, you want mitigations *available* on that
system at the same time?  For what?  If there's only one binary, why not
just disable the mitigations entirely?


