Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-10 Thread Jeff Law
On 01/09/2018 08:29 AM, Richard Earnshaw (lists) wrote:
> I'm in two minds about that.  There are certainly cases where you might
> want to use the generic expansion as part of handling a case where you
> have a more standard speculation barrier; in those cases you might want
> to emit your barrier and then let the generic code expand and provide
> the logical behaviour of the builtin.
> 
> However, if you don't have a barrier you need to ask yourself, will my
> architecture ever have an implementation that does do speculation?
> Unless you can be certain that it won't, you probably need to be warned
> that some code the programmer thinks might be vulnerable to spectre has
> not been compiled with that protection, otherwise if you run that code
> on a later implementation that does speculate, it might be vulnerable.
> 
> Let me give an example, we use the generic code expansion if we
> encounter the builtin when generating code for Thumb on pre-ARMv7
> devices.  We don't have instructions in 'thumb1' to guard against
> speculation and we really want to warn users that they've done this (it
> might be a mistake in how they're invoking the compiler).
> 
> I could add an extra parameter to the helper (bool warn_unimplemented),
> which would be true if called directly from the expanders, but would now
> allow back-ends to implement a trivial variant that just suppressed the
> warning.  They could also then use the generic expansion if all that was
> needed was a subsequent fence instruction.
Yea, I see your side as well on this -- advisable or not I suspect we're
going to see folks saying "my embedded chip doesn't have these problems,
I don't want to pay any of these costs" and I don't want to be warned
about a problem I know can't happen (today).

Anyway, I think relatively speaking this is minor compared to the stuff
we're discussing around the semantics of the builtin.  I can live with
iterating on this aspect based on feedback.

jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-10 Thread Jeff Law
On 01/09/2018 03:47 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 13:08, Alexander Monakov wrote:
>> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>>> This is quite tricky.  For ARM we have to have a speculated load.
>>
>> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
>> wouldn't work (applying CSEL to the address rather than loaded value), and
>> if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
>>
>> (on x86 the current Intel's recommendation is to emit LFENCE prior to the 
>> load)
> 
> That can be supported in the way you expand the builtin.  The builtin
> expander is given a (MEM (ptr)) , but it's up to the back-end where to
> put that in the expanded sequence to materialize the load, so you could
> write (sorry, don't know x86 asm very well, but I think this is how
> you'd put it)
> 
>   lfence
>   mov (ptr), dest
> 
> with branches around that as appropriate to support the remainder of the
> builtin's behaviour.
I think the argument is going to be that they don't want the branches
around to support the other test + failval semantics.  Essentially the
same position as IBM has with PPC.

> 
>> Is the main issue expressing the CSEL condition in the source code? Perhaps 
>> it is
>> possible to introduce
>>
>>   int guard = __builtin_nontransparent(predicate);
>>
>>   if (predicate)
>> foo = __builtin_load_no_speculate([addr], guard);
>>
>> ... or maybe even
>>
>>   if (predicate)
>> foo = arr[__builtin_loadspecbarrier(addr, guard)];
>>
>> where internally __builtin_nontransparent is the same as
>>
>>   guard = predicate;
>>   asm volatile("" : "+g"(guard));
>>
>> although admittedly this is not perfect since it forces evaluation of 'guard'
>> before the branch.
> 
> As I explained to Bernd last night, I think this is likely to be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.
Agreed.  Whatever PREDICATE happens to be, the compiler is going to go
through extreme measures to try and collapse PREDICATE down to a
compile-time constant, including splitting paths to the point where
PREDICATE is used in the conditional so that on one side it's constant
and the other it's non-constant.  It seems like this approach is likely
to be compromised by the optimizers.
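For reference, the __builtin_nontransparent idea under discussion can be written as a plain helper (a model only; the function name and the empty GNU asm are from the quoted proposal, and this carries no actual speculation guarantee — it merely makes the value opaque to passes that run after the asm):

```c
#include <assert.h>

/* Runnable sketch of the proposed __builtin_nontransparent.  The empty
   asm forces the compiler to treat 'guard' as unknown from this point
   on, but, as discussed above, it cannot stop earlier passes from
   simplifying 'predicate' itself before it is copied into 'guard'.  */
static int
nontransparent (int predicate)
{
  int guard = predicate;
  __asm__ volatile ("" : "+g" (guard)); /* value is opaque after this */
  return guard;
}
```

Functionally this is the identity, which is exactly the problem: any protection depends on what the optimizers can prove about the original predicate, not about guard.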


Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:36, Bernd Edlinger wrote:
>> Richard Earnshaw wrote:
>>   > Let me give an example, we use the generic code expansion if we
>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>   > speculation and we really want to warn users that they've done this (it
>>   > might be a mistake in how they're invoking the compiler).
>>
>> Instead of this warning in one of the unsupported cases, could you use
>> the DSB SYS + ISB  barrier as the white paper suggests?
>>
>>
>> Bernd.
> 
> Thumb1 doesn't have those instructions.
> 


I don't know if it helps, but a few years ago I wrote some
assembler macros that temporarily switch from THUMB to ARM
mode and back on an ARMv5 target; they used to work quite well:

# define HAL_ARM_MODE() ";"              \
    " .align 2;"                         \
    " bx pc;"                            \
    " nop;"                              \
    " .code 32;"

# define HAL_THUMB_MODE(_scratch_) ";"   \
    " orr " #_scratch_ ",pc,#1;"         \
    " bx " #_scratch_ ";"                \
    " .code 16;"

#define HAL_DISABLE_INTERRUPTS(_old_)   \
 asm volatile (  \
 HAL_ARM_MODE()  \
 "mrs %0,cpsr;"  \
 "orr r3,%0,%1;" \
 "msr cpsr,r3"   \
 HAL_THUMB_MODE(r3)  \
 : "=r"(_old_)   \
 : "i"(CPSR_INTR_MASK)   \
 : "r3"  \
 );

#define HAL_ENABLE_INTERRUPTS() \
 asm volatile (  \
 HAL_ARM_MODE()  \
 "mrs r3,cpsr;"  \
 "bic r3,r3,%0;" \
 "msr cpsr,r3"   \
 HAL_THUMB_MODE(r3)  \
 :   \
 : "i"(CPSR_INTR_MASK)   \
 : "r3"  \
 );


So the switch-back macro just needed a scratch register,
that's all.


Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
On 01/09/18 19:08, Richard Earnshaw (lists) wrote:
> 
> But we probably don't need to.
> 
>   int x __attribute__((mode(TI))) = 0;
> 
> $ arm-none-elf-gcc timode.c
> 
> /tmp/ti.c:1:1: error: unable to emulate ‘TI’
>   int x __attribute__((mode(TI))) = 0;
>   ^~~
> 

Yes, good, then you should probably do this:

+  if (TARGET_THUMB1)
+    return default_inhibit_load_speculation (mode, result, mem, lower_bound,
+                                             upper_bound, fail_result, cmpptr);


Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 18:00, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:57, Bernd Edlinger wrote:
>> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>>> On 09/01/18 17:36, Bernd Edlinger wrote:
 Richard Earnshaw wrote:
   > Let me give an example, we use the generic code expansion if we
   > encounter the builtin when generating code for Thumb on pre-ARMv7
   > devices.  We don't have instructions in 'thumb1' to guard against
   > speculation and we really want to warn users that they've done this (it
   > might be a mistake in how they're invoking the compiler).

 Instead of this warning in one of the unsupported cases, could you use
 the DSB SYS + ISB  barrier as the white paper suggests?


 Bernd.
>>>
>>> Thumb1 doesn't have those instructions.
>>> 
>> 
>> Ah, well, and in the case mode == TImode ?
>> 
>> Bernd.
> 
> I don't think we can ever get a TImode integral type on AArch32.
> Perhaps we should just ICE in that case...
> 
> R.

But we probably don't need to.

 int x __attribute__((mode(TI))) = 0;

$ arm-none-elf-gcc timode.c

/tmp/ti.c:1:1: error: unable to emulate ‘TI’
 int x __attribute__((mode(TI))) = 0;
 ^~~


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 17:57, Bernd Edlinger wrote:
> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>> On 09/01/18 17:36, Bernd Edlinger wrote:
>>> Richard Earnshaw wrote:
>>>   > Let me give an example, we use the generic code expansion if we
>>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>>   > speculation and we really want to warn users that they've done this (it
>>>   > might be a mistake in how they're invoking the compiler).
>>>
>>> Instead of this warning in one of the unsupported cases, could you use
>>> the DSB SYS + ISB  barrier as the white paper suggests?
>>>
>>>
>>> Bernd.
>>
>> Thumb1 doesn't have those instructions.
>> 
> 
> Ah, well, and in the case mode == TImode ?
> 
> Bernd.

I don't think we can ever get a TImode integral type on AArch32.
Perhaps we should just ICE in that case...

R.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:36, Bernd Edlinger wrote:
>> Richard Earnshaw wrote:
>>   > Let me give an example, we use the generic code expansion if we
>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>   > speculation and we really want to warn users that they've done this (it
>>   > might be a mistake in how they're invoking the compiler).
>>
>> Instead of this warning in one of the unsupported cases, could you use
>> the DSB SYS + ISB  barrier as the white paper suggests?
>>
>>
>> Bernd.
>
> Thumb1 doesn't have those instructions.
> 

Ah, well, and in the case mode == TImode ?

Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 17:36, Bernd Edlinger wrote:
> Richard Earnshaw wrote:
>  > Let me give an example, we use the generic code expansion if we
>  > encounter the builtin when generating code for Thumb on pre-ARMv7
>  > devices.  We don't have instructions in 'thumb1' to guard against
>  > speculation and we really want to warn users that they've done this (it
>  > might be a mistake in how they're invoking the compiler).
> 
> Instead of this warning in one of the unsupported cases, could you use
> the DSB SYS + ISB  barrier as the white paper suggests?
> 
> 
> Bernd.

Thumb1 doesn't have those instructions.

R.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
Richard Earnshaw wrote:
 > Let me give an example, we use the generic code expansion if we
 > encounter the builtin when generating code for Thumb on pre-ARMv7
 > devices.  We don't have instructions in 'thumb1' to guard against
 > speculation and we really want to warn users that they've done this (it
 > might be a mistake in how they're invoking the compiler).

Instead of this warning in one of the unsupported cases, could you use 
the DSB SYS + ISB barrier as the white paper suggests?


Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 17:26, Jeff Law wrote:
> On 01/04/2018 11:20 AM, Joseph Myers wrote:
>> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>>
>>> 1 - generic modifications to GCC providing the builtin function for all
>>> architectures and expanding to an implementation that gives the
>>> logical behaviour of the builtin only.  A warning is generated if
>>> this expansion path is used that code will execute correctly but
>>> without providing protection against speculative use.
>>
>> Presumably it would make sense to have a standard way for architectures 
>> with no speculative execution to just use the generic version, but without 
>> the warning?  E.g., split up default_inhibit_load_speculation so that it 
>> generates the warning then calls another function with the same prototype, 
>> so such architectures can just define the hook to use that other function?
>>
> Seems wise.  There's still architectures that don't speculate or don't
> speculate enough to trigger these problems.
> 
> Jeff
> 

I'm in two minds about that.  There are certainly cases where you might
want to use the generic expansion as part of handling a case where you
have a more standard speculation barrier; in those cases you might want
to emit your barrier and then let the generic code expand and provide
the logical behaviour of the builtin.

However, if you don't have a barrier you need to ask yourself, will my
architecture ever have an implementation that does do speculation?
Unless you can be certain that it won't, you probably need to be warned
that some code the programmer thinks might be vulnerable to spectre has
not been compiled with that protection, otherwise if you run that code
on a later implementation that does speculate, it might be vulnerable.

Let me give an example, we use the generic code expansion if we
encounter the builtin when generating code for Thumb on pre-ARMv7
devices.  We don't have instructions in 'thumb1' to guard against
speculation and we really want to warn users that they've done this (it
might be a mistake in how they're invoking the compiler).

I could add an extra parameter to the helper (bool warn_unimplemented),
which would be true if called directly from the expanders, but would now
allow back-ends to implement a trivial variant that just suppressed the
warning.  They could also then use the generic expansion if all that was
needed was a subsequent fence instruction.
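To make the split discussed here concrete, below is a hedged, self-contained C model (plain C stand-ins only — the real helper takes machine modes and rtx operands, and the `_model` names are invented for illustration).  The point is the factoring: the logical behaviour lives in one function, and the warning lives in a wrapper that a back-end can bypass:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static long sample[4] = { 10, 20, 30, 40 };

/* Logical behaviour only: load *ptr when cmpptr lies within
   [lower, upper); otherwise return failval.  A NULL bound is
   ignored, matching the builtin's documented shorthand.  */
static long
generic_load_no_speculate_model (const long *ptr, const void *lower,
                                 const void *upper, long failval,
                                 const void *cmpptr)
{
  const char *p = (const char *) cmpptr;
  if ((lower == NULL || p >= (const char *) lower)
      && (upper == NULL || p < (const char *) upper))
    return *ptr;
  return failval;
}

/* The split: the default hook warns when asked to, then falls through
   to the generic expansion, so a target with no speculation can reuse
   the logic while suppressing the warning.  */
static long
default_inhibit_load_speculation_model (const long *ptr, const void *lower,
                                        const void *upper, long failval,
                                        const void *cmpptr,
                                        bool warn_unimplemented)
{
  if (warn_unimplemented)
    fprintf (stderr, "warning: no speculation barrier on this target\n");
  return generic_load_no_speculate_model (ptr, lower, upper, failval, cmpptr);
}
```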

R.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Alexander Monakov
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> Read condition 1 i) again.
> 
> 
> i) the conditional select instruction has a register data dependency on
> a load R1, that has been executed speculatively, for one of its input
> registers, and

Sorry - I don't know how I missed that. I understand the motivation for the
behavior of the new built-in now; the CSDB specification is surprising, but
that is off-topic I guess. Seems quite unfortunate that CSDB is constraining
the generic built-in like this, though.

Thank you for clearing this up.

I suggest that the user documentation in extend.texi should explicitly call out
that the load itself still may be speculatively executed - only its consumers
can be expected to be fenced off. For example, after

+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.

it can go on to say e.g.,

  However note that the load of @code{*ptr} itself and the conditional branch
  leading to it may still be subject to speculative execution (and the load may
  have observable effects on the cache as a result).

Thanks.
Alexander


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 13:27, Alexander Monakov wrote:
> On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
>> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
>> > wouldn't work (applying CSEL to the address rather than loaded value), and
>> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> > anyhow, right? (by spilling the pointer)
>> 
>> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
> Again, I'm sorry, but I have to insist that what you're saying here
> contradicts
> the documentation linked from
> https://developer.arm.com/support/security-update
> The PDF currently says, in "Details of the CSDB barrier":
> 
>     Until the barrier completes:
>     1) For any load, store, data or instruction preload, RW2, appearing in
>     program order *after the barrier* [...]
> 
>     2) For any indirect branch (B2), appearing in program order
>     *after the barrier* [...]
> 
>     [...] the speculative execution of RW2/B2 does not influence the
>     allocations of entries in a cache [...]
> 
> It doesn't say anything about the behavior of CSDB being dependent on
> the loads
> encountered prior to it.  (and imho it doesn't make sense for a hardware
> implementation to work that way)

Read condition 1 i) again.


i) the conditional select instruction has a register data dependency on
a load R1, that has been executed speculatively, for one of its input
registers, and

To be executed speculatively it must be *after* a predicted operation
that has not yet resolved.  Once that operation has resolved it is no
longer speculative; it's committed (one way or another).

R.

> 
>> As I explained to Bernd last night, I think this is likely to be unsafe.
>> If there's some control path before __builtin_nontransparent that allows
>> 'predicate' to be simplified (eg by value range propagation), then your
>> guard doesn't protect against the speculation that you think it does.
>> Changing all the optimizers to guarantee that wouldn't happen (and
>> guaranteeing that all future optimizers won't introduce new problems of
>> that nature) is, I suspect, very non-trivial.
> 
> But note that in that case the compiler could have performed the same
> simplification in the original code as well, emitting straight-line
> machine code
> lacking speculatively executable parts in the first place.
> 
> Alexander



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Alexander Monakov
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> > wouldn't work (applying CSEL to the address rather than loaded value), and
> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
> > anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.

Again, I'm sorry, but I have to insist that what you're saying here contradicts
the documentation linked from https://developer.arm.com/support/security-update
The PDF currently says, in "Details of the CSDB barrier":

Until the barrier completes:
1) For any load, store, data or instruction preload, RW2, appearing in
program order *after the barrier* [...]

2) For any indirect branch (B2), appearing in program order
*after the barrier* [...]

[...] the speculative execution of RW2/B2 does not influence the
allocations of entries in a cache [...]

It doesn't say anything about the behavior of CSDB being dependent on the loads
encountered prior to it.  (and imho it doesn't make sense for a hardware
implementation to work that way)

> As I explained to Bernd last night, I think this is likely to be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.

But note that in that case the compiler could have performed the same
simplification in the original code as well, emitting straight-line machine code
lacking speculatively executable parts in the first place.

Alexander


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 17:24, Jeff Law wrote:
> On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
> So I think it's important for anyone reading this stuff to read
> Richard's paragraph carefully --  "an arbitrary read gadget".
> 
> I fully expect that over the course of time we're going to see other
> arbitrary read gadgets to be found.  Those gadgets may have lower
> bandwidth, have a higher degree of jitter or be harder to exploit, but
> they're out there just waiting to be discovered.
> 
> So I don't expect this to be the only mitigation we have to put into place.
> 
> Anyway...
> 
> 
>>
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
> So what are the cases where FAILVAL is useful rather than just using
> zero (or some other constant) all the time?

So some background first.  My initial attempts to define a builtin were
based entirely around trying to specify the builtin without any hard
functional behaviour at all.  The specification was a mess.  Things
just became so much easier to specify when I moved to defining a logical
behaviour on top of the intended side effects on speculation.  Having
done that it seemed sensible to permit the user to use the builtin in
more creative ways by allowing it to select the failure value.  The idea
was that code such as

  if (ptr >= base && ptr < limit) // bounds check
return *ptr;
  return FAILVAL;

could simply be rewritten as

  return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

and now you don't have to worry about writing the condition out twice or
any other such nonsense.  In this case the builtin does exactly what you
want it to do.  (It's also easier now to write test cases that check the
correctness of the builtin's expansion, since you can test for a
behaviour of the code's execution, even if you can't check the
speculation behaviour properly.)
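That functional equivalence can be checked with a small model (logical behaviour only — the real builtin additionally inhibits speculative use of the result; names here are illustrative):

```c
static int table[4] = { 5, 6, 7, 8 };

/* Open-coded bounds-checked load, as in the original form above.  */
static int
bounds_checked_read (const int *ptr, const int *base, const int *limit,
                     int failval)
{
  if (ptr >= base && ptr < limit)
    return *ptr;
  return failval;
}

/* Model of the builtin's logical behaviour: the same result, so test
   cases can check execution behaviour even where they cannot observe
   the speculation behaviour.  */
static int
load_no_speculate_model (const int *ptr, const int *base, const int *limit,
                         int failval)
{
  return (ptr >= base && ptr < limit) ? *ptr : failval;
}
```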



> 
> Similarly under what conditions would one want to use CMPPTR rather than
> PTR?

This was at the request of some kernel folk.  They have some cases where
CMPPTR may be a pointer to an atomic type and want to do something like

  if (cmpptr >= lower && cmpptr < upper)
val = __atomic_read_and_inc (cmpptr);

The atomics are complicated enough already and rewriting them all to
additionally inhibit speculation based on the result would be a
nightmare.  By separating out cmpptr from ptr you can now write

  if (cmpptr >= lower && cmpptr < upper)
{
  TYPE tmp_val = __atomic_read_and_inc (cmpptr);
  val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0,
 cmpptr);
}

It's meaningless in this case to check the bounds of tmp_val - it's just
a value pushed onto the stack in order to satisfy the requirement that
we need a load that is being speculatively executed; but we can still
test against the speculation conditions, which are still the range check
for cmpptr.

There may be other similar cases as well where you have an object that
you want to clean up that is somehow derived from cmpptr but is not
cmpptr itself.  You do have to be more careful with the extended form,
since it is possible to write sequences that don't inhibit speculation
in the way you might think they do, but with greater flexibility also
comes greater risk.  I don't think there's ever a problem if ptr is
somehow derived from dereferencing cmpptr.

R.

> 
> I wandered down through the LKML thread but didn't see anything which
> would shed light on those two questions.
> 
> jeff
>>



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 13:08, Alexander Monakov wrote:
> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>> This is quite tricky.  For ARM we have to have a speculated load.
> 
> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> wouldn't work (applying CSEL to the address rather than loaded value), and
> if it wouldn't, then ARM-specific lowering of the builtin can handle that
> anyhow, right? (by spilling the pointer)

The load has to feed /in/ to the csel/csdb sequence, not come after it.

> 
> (on x86 the current Intel's recommendation is to emit LFENCE prior to the 
> load)

That can be supported in the way you expand the builtin.  The builtin
expander is given a (MEM (ptr)) , but it's up to the back-end where to
put that in the expanded sequence to materialize the load, so you could
write (sorry, don't know x86 asm very well, but I think this is how
you'd put it)

lfence
mov (ptr), dest

with branches around that as appropriate to support the remainder of the
builtin's behaviour.

> Is the main issue expressing the CSEL condition in the source code? Perhaps 
> it is
> possible to introduce
> 
>   int guard = __builtin_nontransparent(predicate);
> 
>   if (predicate)
> foo = __builtin_load_no_speculate([addr], guard);
> 
> ... or maybe even
> 
>   if (predicate)
> foo = arr[__builtin_loadspecbarrier(addr, guard)];
> 
> where internally __builtin_nontransparent is the same as
> 
>   guard = predicate;
>   asm volatile("" : "+g"(guard));
> 
> although admittedly this is not perfect since it forces evaluation of 'guard'
> before the branch.

As I explained to Bernd last night, I think this is likely to be unsafe.
If there's some control path before __builtin_nontransparent that allows
'predicate' to be simplified (eg by value range propagation), then your
guard doesn't protect against the speculation that you think it does.
Changing all the optimizers to guarantee that wouldn't happen (and
guaranteeing that all future optimizers won't introduce new problems of
that nature) is, I suspect, very non-trivial.

R.

> 
> Alexander
> 



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Richard Earnshaw (lists)
On 08/01/18 16:10, Bernd Edlinger wrote:
> I thought about your new builtin again, and I wonder if
> something like that might work as well?
> 
> 
> cat despec.s
>   .arch armv7-a
>   .eabi_attribute 28, 1
>   .eabi_attribute 20, 1
>   .eabi_attribute 21, 1
>   .eabi_attribute 23, 3
>   .eabi_attribute 24, 1
>   .eabi_attribute 25, 1
>   .eabi_attribute 26, 2
>   .eabi_attribute 30, 4
>   .eabi_attribute 34, 1
>   .eabi_attribute 18, 4
>   .section .text.startup,"ax",%progbits
>   .align  2
>   .global despec_i
>   .syntax unified
>   .arm
>   .fpu vfpv3-d16
> 
> despec_i:
>   cmp r0,#0
>   beq L0
>   ldr r0,[r1]
>   moveq r0,r2
>   nop {0x14} @ CSDB
>   str r0,[r1]
>   mov r0,#1
>   bx lr
> L0:   mov r0,#0
>   bx lr
> 
> cat test.c
> extern int despec_i(int predicate, int *untrusted, int fallback);
> #define N 8
> int a[N] = {1,2,3,4,5,7,8,9};
> int a2[0x200];
> int test(int untrust)
> {
>int x = 0;
>if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
>{
>   int v = a[untrust] & 0x1 ? 0x100 : 0x0;
>   x = a2[v];
>}
>return x;
> }
> 
> 
> So this should feed the predicate through the builtin, and
> clear the untrusted value when the condition has been
> mis-predicted.
> 
> Wouldn't that be more flexible to use?
> Or am I missing something?

Yes, if you modified your test case to be something like:


int test(int untrust)
{
   int x = 0;

   if (untrust < 0)
 return x;

   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
  int v = a[untrust] & 0x1 ? 0x100 : 0x0;
  x = a2[v];
   }
   return x;
}

then the compiler can (and will) optimize the condition to

  if (despec (true && untrust < N, &untrust, 0))

and suddenly you don't have the protection against speculative execution
that you thought you had.  That makes the API exceedingly dangerous as
we can't easily detect and inhibit all the sources of information that
the compiler might use to make such deductions.  What happens, for
example if your original case is inlined into a wider function that has
additional tests?

R.


> 
> 
> As a side note: I noticed that "nop {0x14}" seems to produce the correct
> assembler opcode without the need for using a .insn code.
> 
> 
> Bernd.
> 



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Bernd Edlinger
I thought about your new builtin again, and I wonder if
something like that might work as well?


cat despec.s
.arch armv7-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.section .text.startup,"ax",%progbits
.align  2
.global despec_i
.syntax unified
.arm
.fpu vfpv3-d16

despec_i:
cmp r0,#0
beq L0
ldr r0,[r1]
moveq r0,r2
nop {0x14} @ CSDB
str r0,[r1]
mov r0,#1
bx lr
L0: mov r0,#0
bx lr

cat test.c
extern int despec_i(int predicate, int *untrusted, int fallback);
#define N 8
int a[N] = {1,2,3,4,5,7,8,9};
int a2[0x200];
int test(int untrust)
{
   int x = 0;
   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
  int v = a[untrust] & 0x1 ? 0x100 : 0x0;
  x = a2[v];
   }
   return x;
}


So this should feed the predicate through the builtin, and
clear the untrusted value when the condition has been
mis-predicted.

Wouldn't that be more flexible to use?
Or am I missing something?


As a side note: I noticed that "nop {0x14}" seems to produce the correct
assembler opcode without the need for using a .insn code.


Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Nick Clifton
Hi Guys,

  It seems to me that it might be worth taking a step back here
  and considering adding a security framework to gcc.  Mitigations
  for CVEs in the past have resulted in individual patches being
  added to gcc, often in a target-specific manner, and with no
  real framework to support them, document them, or indicate to
  an external tool that they have been applied.

  In addition security fixes often result in the generation of
  less optimal code, and so it might be useful to have a way to
  tell other parts of gcc that a given particular sequence should
  not be altered.

  Not that I am an expert in this area, but I do think that it is
  something that should be discussed...

Cheers
  Nick





Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-07 Thread Bernd Edlinger
Hi Richard,

I wonder how to use this builtin correctly.
Frankly, it sounds way too complicated.

Do you expect we need to use this stuff on every
array access now, or is that just a theoretical
thing that can only happen with Java byte-code
interpreters?

If I assume there is an array of int, then
the correct way to check the array bounds
would be something like:

int a[N];

int x = __builtin_load_no_speculate(ptr, a, a+N);

But what if ptr points to the last byte of a[N-1]?
This would load 3 bytes past the array, right?

Then you would have to write:

x = __builtin_load_no_speculate(ptr, a, (char*)(a+N) - 3);

which is just horrible.  Wouldn't it be better to fold that
into the condition?

I mean, instead of...

if (cmpptr >= lower && cmpptr < upper)

...use something like:

if (cmpptr >= lower && cmpptr <= upper - sizeof(TYP))


And what happens if the untrusted pointer is unaligned?
Wouldn't that also give quite surprising values to speculate
with?


Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/04/2018 11:20 AM, Joseph Myers wrote:
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> 
>> 1 - generic modifications to GCC providing the builtin function for all
>> architectures and expanding to an implementation that gives the
>> logical behaviour of the builtin only.  A warning is generated if
>> this expansion path is used that code will execute correctly but
>> without providing protection against speculative use.
> 
> Presumably it would make sense to have a standard way for architectures 
> with no speculative execution to just use the generic version, but without 
> the warning?  E.g., split up default_inhibit_load_speculation so that it 
> generates the warning then calls another function with the same prototype, 
> so such architectures can just define the hook to use that other function?
> 
Seems wise.  There are still architectures that don't speculate, or don't
speculate enough to trigger these problems.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
> 
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
So I think it's important for anyone reading this stuff to read
Richard's paragraph carefully --  "an arbitrary read gadget".

I fully expect that over the course of time we're going to see other
arbitrary read gadgets found.  Those gadgets may have lower
bandwidth, exhibit more jitter, or be harder to exploit, but
they're out there just waiting to be discovered.

So I don't expect this to be the only mitigation we have to put into place.

Anyway...


> 
> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (c++ style): failval will
> default to 0 in this case and if cmpptr is omitted ptr will be used
> for expansions of the range check.  In addition either lower or upper
> (but not both) may be a literal NULL and the expansion will then
> ignore that boundary condition when expanding.
So what are the cases where FAILVAL is useful rather than just using
zero (or some other constant) all the time?

Similarly under what conditions would one want to use CMPPTR rather than
PTR?

I wandered down through the LKML thread but didn't see anything which
would shed light on those two questions.

jeff
> 


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 03:47 AM, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
>  wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>>  wrote:

 Recently, Google Project Zero disclosed several classes of attack
 against speculative execution. One of these, known as variant-1
 (CVE-2017-5753), allows explicit bounds checks to be bypassed under
 speculation, providing an arbitrary read gadget. Further details can
 be found on the GPZ blog [1] and the documentation that is included
 with the first patch.

 This patch set adds a new builtin function for GCC to provide a
 mechanism for limiting speculation by a CPU after a bounds-checked
 memory access.  I've tried to design this in such a way that it can be
 used for any target where this might be necessary.  The patch set
 provides a generic implementation of the builtin and then
 target-specific support for Arm and AArch64.  Other architectures can
 utilize the internal infrastructure as needed.

 Most of the details of the builtin and the hooks that need to be
 implemented to support it are described in the updates to the manual,
 but a short summary is given below.

 TYP __builtin_load_no_speculate
 (const volatile TYP *ptr,
  const volatile void *lower,
  const volatile void *upper,
  TYP failval,
  const volatile void *cmpptr)

 Where TYP can be any integral type (signed or unsigned char, int,
 short, long, etc) or any pointer type.

 The builtin implements the following logical behaviour:

 inline TYP __builtin_load_no_speculate
  (const volatile TYP *ptr,
   const volatile void *lower,
   const volatile void *upper,
   TYP failval,
   const volatile void *cmpptr)
 {
   TYP result;

   if (cmpptr >= lower && cmpptr < upper)
 result = *ptr;
   else
 result = failval;
   return result;
 }

 in addition the specification of the builtin ensures that future
 speculation using *ptr may only continue iff cmpptr lies within the
 bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 
>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
Without going into details, I've speculated to the appropriate folks
that there are other vectors for these kinds of attacks, some more
exploitable than others.  Attacking the cache via loads/stores is just
one instance of a class of problems IMHO.

I fully expect we'll be fighting this class of problems for a while.
This is just the tip of the iceberg.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 04:57 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 11:35, Jakub Jelinek wrote:
>> On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
 But the condition could be just 'true' in the instruction encoding?  That is,
 why do you do both the jump-around and the csel?  Maybe I misunderstood
 the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
 the compiler messing up things.

>>>
>>> That's why the intrinsic takes explicit bounds not a simple bool
>>> expressing the conditions.  It prevents the optimizers from replacing
>>> the condition by value range propagation.  That does restrict the
>>
>> Preventing optimizers from replacing the condition can be done in many ways,
>> IFN, UNSPEC, ...
>> The question is if you really need it at the assembly level, or if you can
>> just do an unconditional branch or branch conditional on say comparison of
>> two constants, without any dynamic bounds.
>>
>>  Jakub
>>
> 
> The bounds /have/ to reflect the real speculation condition.  Otherwise
> the CSEL becomes ineffective.
I think this is probably the key that Jakub and Richard B. were missing.
It certainly hadn't clicked for me when we spoke earlier.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 03:40 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>  wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>> (const volatile TYP *ptr,
>>>  const volatile void *lower,
>>>  const volatile void *upper,
>>>  TYP failval,
>>>  const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>  (const volatile TYP *ptr,
>>>   const volatile void *lower,
>>>   const volatile void *upper,
>>>   TYP failval,
>>>   const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>> result = *ptr;
>>>   else
>>> result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
> 
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
I actually recommend reading the original papers referenced by Google in
addition to the ARM whitepaper.  This stuff is far from intuitive.  I've
been loosely involved for a month or so, but it really didn't start to
click for me until right before the holiday break.



jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 02:44 AM, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>  wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>> (const volatile TYP *ptr,
>>  const volatile void *lower,
>>  const volatile void *upper,
>>  TYP failval,
>>  const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 
> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?
> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?
I think "fallback" could potentially be any value; I don't think its
actual value matters much -- it could just as easily be "0" across the
board or some indeterminate value (as long as the indeterminate value
doesn't itself present an information leak).

The bounds are used to build a condition for the csel to select between
the loaded value and the fail value.  You need some way to select
between them.

If I've got anything wrong, I'm sure Richard E. will correct me.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Alexander Monakov
On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
> This is quite tricky.  For ARM we have to have a speculated load.

Sorry, I don't follow. On ARM, it is surprising that a CSEL-CSDB-LDR sequence
wouldn't work (applying CSEL to the address rather than the loaded value); and
if it wouldn't, then ARM-specific lowering of the builtin can handle that
anyhow, right? (by spilling the pointer)

(on x86 the current Intel's recommendation is to emit LFENCE prior to the load)

Is the main issue expressing the CSEL condition in the source code? Perhaps it
is possible to introduce

  int guard = __builtin_nontransparent(predicate);

  if (predicate)
foo = __builtin_load_no_speculate([addr], guard);

... or maybe even

  if (predicate)
foo = arr[__builtin_loadspecbarrier(addr, guard)];

where internally __builtin_nontransparent is the same as

  guard = predicate;
  asm volatile("" : "+g"(guard));

although admittedly this is not perfect since it forces evaluation of 'guard'
before the branch.

Alexander


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 11:35, Jakub Jelinek wrote:
> On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
>>> But the condition could be just 'true' in the instruction encoding?  That is,
>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>> the compiler messing up things.
>>>
>>
>> That's why the intrinsic takes explicit bounds not a simple bool
>> expressing the conditions.  It prevents the optimizers from replacing
>> the condition by value range propagation.  That does restrict the
> 
> Preventing optimizers from replacing the condition can be done in many ways,
> IFN, UNSPEC, ...
> The question is if you really need it at the assembly level, or if you can
> just do an unconditional branch or branch conditional on say comparison of
> two constants, without any dynamic bounds.
> 
>   Jakub
> 

The bounds /have/ to reflect the real speculation condition.  Otherwise
the CSEL becomes ineffective.

R.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 11:37, Alexander Monakov wrote:
> I believe the proposed behavior of the new builtin is over-specialized.
> In principle the following snippet may be open to exploitation too:
> 
>   if (predicate)
>     foo = arr[(secret >> untrusted) & 64];
> 
> assuming the attacker has a means to observe which part of 'arr' is
> brought into cache, but cannot set 'predicate' to true (so has to rely
> on the speculative population of the cache); and likewise if a store is
> predicated-off rather than a load.
> 
> So shouldn't, for generality, the new builtin work "as if" by cleansing
> the address rather than the result of the load, like the following? It
> would also be applicable to stores then.

This is quite tricky.  For ARM we have to have a speculated load.  So
one way to recast this might be to push 'untrusted' into a stack slot
that was then speculatively loaded back by the builtin.  To do that
you'd need to find a way of expressing predicate as a range check, so
something like

  if (predicate)
   {
 foo = arr[(secret >> __builtin_load_no_speculate (,
predicate_base, predicate_upper, 0,
predicate_val)) & 64];
   }
You could use similar techniques with stores as well.

I'm open to suggestions as to how we might express the conditions
better.  It's certainly the least pleasant part of this proposal, but
the seemingly obvious '_Bool condition' parameter won't work because the
compiler will just substitute true in too many situations where it
thinks it has proved that it knows the conditions.

R.


> 
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>> 
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
> 
> {
>   if (!(cmpptr >= lower && cmpptr < upper))
>     ptr = NULL;
> 
>   return *ptr;
> }
> 
> Alexander



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Alexander Monakov
I believe the proposed behavior of the new builtin is over-specialized.
In principle the following snippet may be open to exploitation too:

  if (predicate)
foo = arr[(secret >> untrusted) & 64];

assuming the attacker has a means to observe which part of 'arr' is brought into
cache, but cannot set 'predicate' to true (so has to rely on the speculative
population of the cache); and likewise if a store is predicated-off rather than
a load.

So shouldn't, for generality, the new builtin work "as if" by cleansing the
address rather than the result of the load, like the following? It would also be
applicable to stores then.

On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> inline TYP __builtin_load_no_speculate
>  (const volatile TYP *ptr,
>   const volatile void *lower,
>   const volatile void *upper,
>   TYP failval,
>   const volatile void *cmpptr)
> {
>   TYP result;
> 
>   if (cmpptr >= lower && cmpptr < upper)
> result = *ptr;
>   else
> result = failval;
>   return result;
> }

{
  if (!(cmpptr >= lower && cmpptr < upper))
ptr = NULL;

  return *ptr;
}

Alexander


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jakub Jelinek
On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
> > But the condition could be just 'true' in the instruction encoding?  That is,
> > why do you do both the jump-around and the csel?  Maybe I misunderstood
> > the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> > the compiler messing up things.
> > 
> 
> That's why the intrinsic takes explicit bounds not a simple bool
> expressing the conditions.  It prevents the optimizers from replacing
> the condition by value range propagation.  That does restrict the

Preventing optimizers from replacing the condition can be done in many ways,
IFN, UNSPEC, ...
The question is if you really need it at the assembly level, or if you can
just do an unconditional branch or branch conditional on say comparison of
two constants, without any dynamic bounds.

Jakub


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 10:47, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
>  wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>>  wrote:

 Recently, Google Project Zero disclosed several classes of attack
 against speculative execution. One of these, known as variant-1
 (CVE-2017-5753), allows explicit bounds checks to be bypassed under
 speculation, providing an arbitrary read gadget. Further details can
 be found on the GPZ blog [1] and the documentation that is included
 with the first patch.

 This patch set adds a new builtin function for GCC to provide a
 mechanism for limiting speculation by a CPU after a bounds-checked
 memory access.  I've tried to design this in such a way that it can be
 used for any target where this might be necessary.  The patch set
 provides a generic implementation of the builtin and then
 target-specific support for Arm and AArch64.  Other architectures can
 utilize the internal infrastructure as needed.

 Most of the details of the builtin and the hooks that need to be
 implemented to support it are described in the updates to the manual,
 but a short summary is given below.

 TYP __builtin_load_no_speculate
 (const volatile TYP *ptr,
  const volatile void *lower,
  const volatile void *upper,
  TYP failval,
  const volatile void *cmpptr)

 Where TYP can be any integral type (signed or unsigned char, int,
 short, long, etc) or any pointer type.

 The builtin implements the following logical behaviour:

 inline TYP __builtin_load_no_speculate
  (const volatile TYP *ptr,
   const volatile void *lower,
   const volatile void *upper,
   TYP failval,
   const volatile void *cmpptr)
 {
   TYP result;

   if (cmpptr >= lower && cmpptr < upper)
 result = *ptr;
   else
 result = failval;
   return result;
 }

 in addition the specification of the builtin ensures that future
 speculation using *ptr may only continue iff cmpptr lies within the
 bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 

That's why the intrinsic takes explicit bounds not a simple bool
expressing the conditions.  It prevents the optimizers from replacing
the condition by value range propagation.  That does restrict the
flexibility of the builtin somewhat but it does allow it to work
correctly at all optimization levels.

>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
> 
> Does ARM have a barrier like instruction suitable for use in the
> kernel patches that are floating around?
> 
> Richard.
> 
>> R.
>>
>>>
>>> Richard.
>>>
 Some optimizations are permitted to make the builtin easier to use.
 The final two arguments can both be omitted (c++ style): failval will
 default to 0 in this case and if cmpptr is omitted ptr will be used
 for expansions of the range check.  In addition either lower or upper
 (but not both) may be a literal NULL and the expansion will then
 ignore that boundary condition when expanding.

 The patch set is constructed as follows:
 1 - generic modifications to GCC providing the builtin function for all
 architectures and expanding to an implementation that gives the
 logical 

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Biener
On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
 wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>  wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>> (const volatile TYP *ptr,
>>>  const volatile void *lower,
>>>  const volatile void *upper,
>>>  TYP failval,
>>>  const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>  (const volatile TYP *ptr,
>>>   const volatile void *lower,
>>>   const volatile void *upper,
>>>   TYP failval,
>>>   const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>> result = *ptr;
>>>   else
>>> result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
>
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
>
>> So why isn't this just
>>
>>  T __builtin_load_no_speculate (T *);
>>
>> ?  Why that "fallback" and why the lower/upper bounds?
>
> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
> to restrict subsequent speculation, the CSEL needs a condition and the
> compiler must not be able to optimize it away based on logical
> reachability.  The fallback is used for the other operand of the CSEL
> instruction.

But the condition could be just 'true' in the instruction encoding?  That is,
why do you do both the jump-around and the csel?  Maybe I misunderstood
the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
the compiler messing up things.

>>
>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>
> As stated we'll shortly be submitting similar patches for LLVM.

How do you solve the aggregate load issue?  I would imagine
that speculating stores (by pre-fetching the affected cache line!)
could be another attack vector?  Not sure if any CPU speculatively
does that (but I see no good reason why a CPU shouldn't do that).

Does ARM have a barrier like instruction suitable for use in the
kernel patches that are floating around?

Richard.

> R.
>
>>
>> Richard.
>>
>>> Some optimizations are permitted to make the builtin easier to use.
>>> The final two arguments can both be omitted (c++ style): failval will
>>> default to 0 in this case and if cmpptr is omitted ptr will be used
>>> for expansions of the range check.  In addition either lower or upper
>>> (but not both) may be a literal NULL and the expansion will then
>>> ignore that boundary condition when expanding.
>>>
>>> The patch set is constructed as follows:
>>> 1 - generic modifications to GCC providing the builtin function for all
>>> architectures and expanding to an implementation that gives the
>>> logical behaviour of the builtin only.  A warning is generated if
>>> this expansion path is used that code will execute correctly but
>>> without providing protection against speculative use.
>>> 2 - AArch64 support
>>> 3 - AArch32 support (arm) for A32 and thumb2 states.
>>>
>>> These patches can be used with the header file that Arm recently
>>> published here: https://github.com/ARM-software/speculation-barrier.
>>>
>>> Kernel patches are also being developed, eg:
>>> 

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 09:44, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>  wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>> (const volatile TYP *ptr,
>>  const volatile void *lower,
>>  const volatile void *upper,
>>  TYP failval,
>>  const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 

It's not quite as simple as that.  You'll need to read the white paper
on Arm's web site to get a full grasp of this (linked from
https://www.arm.com/security-update).

> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?

Because, for Arm, we need to use CSEL and CSDB (AArch64) in tandem to
restrict subsequent speculation: the CSEL needs a condition, and the
compiler must not be able to optimize it away based on logical
reachability.  The fallback value is used for the other operand of the
CSEL instruction.
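For illustration, the expanded AArch64 sequence might look something
like the following.  This is a sketch only: the register assignments
(x0 = ptr/result, x1 = cmpptr, x2 = lower, x3 = upper, x5 = failval)
and the exact comparison/condition choices are my assumptions here, not
the actual output of the patches.

```asm
        cmp     x1, x2          // flags from cmpptr - lower
        ccmp    x1, x3, #2, hs  // if cmpptr >= lower, compare with upper;
                                // otherwise force C set so LO fails below
        ldr     x4, [x0]        // the load that must not leak
        csel    x0, x4, x5, lo  // in bounds: loaded value; else failval
        csdb                    // barrier: no further speculation on x0
```

The key point is that the CSEL's condition consumes the bounds
comparison directly, so the compiler cannot prove the select away, and
the CSDB then prevents speculation from continuing with the unselected
value.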

> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?

As stated, we'll shortly be submitting similar patches for LLVM.

R.

> 
> Richard.
> 
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments may both be omitted (C++ style): failval then
>> defaults to 0, and if cmpptr is omitted, ptr is used for the range
>> check.  In addition, either lower or upper (but not both) may be a
>> literal NULL, in which case the expansion ignores that boundary
>> condition.
>>
>> The patch set is constructed as follows:
>> 1 - generic modifications to GCC providing the builtin function for all
>> architectures and expanding to an implementation that gives the
>> logical behaviour of the builtin only.  If this expansion path is
>> used, a warning is generated: the code will execute correctly, but
>> without protection against speculative use.
>> 2 - AArch64 support
>> 3 - AArch32 support (arm) for A32 and thumb2 states.
>>
>> These patches can be used with the header file that Arm recently
>> published here: https://github.com/ARM-software/speculation-barrier.
>>
>> Kernel patches are also being developed, eg:
>> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
>> code like this will be able to use support directly from the compiler
>> in a portable manner.
>>
>> Similar patches are also being developed for LLVM and will be posted
>> to their development lists shortly.
>>
>> [1] More information on the topic can be found here:
>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
>> Arm specific information can be found here: 
>> https://www.arm.com/security-update
>>
>>
>>
>> Richard Earnshaw (3):
>>   [builtins] Generic support for __builtin_load_no_speculate()
>>   [aarch64] Implement support for __builtin_load_no_speculate.
>>   [arm] Implement support for the de-speculation intrinsic
>>
>>  gcc/builtin-types.def |  16 +
>>  gcc/builtins.c        |  99 

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw wrote:
>
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
>
> This patch set adds a new builtin function for GCC to provide a
> mechanism for limiting speculation by a CPU after a bounds-checked
> memory access.  I've tried to design this in such a way that it can be
> used for any target where this might be necessary.  The patch set
> provides a generic implementation of the builtin and then
> target-specific support for Arm and AArch64.  Other architectures can
> utilize the internal infrastructure as needed.
>
> Most of the details of the builtin and the hooks that need to be
> implemented to support it are described in the updates to the manual,
> but a short summary is given below.
>
> TYP __builtin_load_no_speculate
> (const volatile TYP *ptr,
>  const volatile void *lower,
>  const volatile void *upper,
>  TYP failval,
>  const volatile void *cmpptr)
>
> Where TYP can be any integral type (signed or unsigned char, int,
> short, long, etc) or any pointer type.
>
> The builtin implements the following logical behaviour:
>
> inline TYP __builtin_load_no_speculate
>  (const volatile TYP *ptr,
>   const volatile void *lower,
>   const volatile void *upper,
>   TYP failval,
>   const volatile void *cmpptr)
> {
>   TYP result;
>
>   if (cmpptr >= lower && cmpptr < upper)
> result = *ptr;
>   else
> result = failval;
>   return result;
> }
>
> In addition, the specification of the builtin ensures that future
> speculation using *ptr may continue only if cmpptr lies within the
> specified bounds.

I fail to see why the vulnerability wouldn't also affect aggregate
copies or bitfield accesses.  The vulnerability leaks not the loaded
value itself but (part of) the address, by populating the CPU cache
side-channel.

So why isn't this just

 T __builtin_load_no_speculate (T *);

?  Why that "fallback" and why the lower/upper bounds?

Did you talk with other compiler vendors (intel, llvm, ...?)?

Richard.

> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments may both be omitted (C++ style): failval then
> defaults to 0, and if cmpptr is omitted, ptr is used for the range
> check.  In addition, either lower or upper (but not both) may be a
> literal NULL, in which case the expansion ignores that boundary
> condition.
>
> The patch set is constructed as follows:
> 1 - generic modifications to GCC providing the builtin function for all
> architectures and expanding to an implementation that gives the
> logical behaviour of the builtin only.  If this expansion path is
> used, a warning is generated: the code will execute correctly, but
> without protection against speculative use.
> 2 - AArch64 support
> 3 - AArch32 support (arm) for A32 and thumb2 states.
>
> These patches can be used with the header file that Arm recently
> published here: https://github.com/ARM-software/speculation-barrier.
>
> Kernel patches are also being developed, eg:
> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
> code like this will be able to use support directly from the compiler
> in a portable manner.
>
> Similar patches are also being developed for LLVM and will be posted
> to their development lists shortly.
>
> [1] More information on the topic can be found here:
> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> Arm specific information can be found here: 
> https://www.arm.com/security-update
>
>
>
> Richard Earnshaw (3):
>   [builtins] Generic support for __builtin_load_no_speculate()
>   [aarch64] Implement support for __builtin_load_no_speculate.
>   [arm] Implement support for the de-speculation intrinsic
>
>  gcc/builtin-types.def         |  16 +
>  gcc/builtins.c                |  99 +
>  gcc/builtins.def              |  22 ++
>  gcc/c-family/c-common.c       | 164 ++
>  gcc/c-family/c-cppbuiltin.c   |   5 +-
>  gcc/config/aarch64/aarch64.c  |  92 
>  gcc/config/aarch64/aarch64.md |  28 
>  gcc/config/arm/arm.c          | 107 +++
>  gcc/config/arm/arm.md         |  40 ++-
>  gcc/config/arm/unspecs.md     |   1 +
>  gcc/doc/cpp.texi              |   4 ++
>  gcc/doc/extend.texi           |  53 ++
>  gcc/doc/tm.texi               |   6 ++
>  gcc/doc/tm.texi.in            |   2 +
>  gcc/target.def                |  20 ++
>  

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-04 Thread Joseph Myers
On Thu, 4 Jan 2018, Richard Earnshaw wrote:

> 1 - generic modifications to GCC providing the builtin function for all
> architectures and expanding to an implementation that gives the
> logical behaviour of the builtin only.  If this expansion path is
> used, a warning is generated: the code will execute correctly, but
> without protection against speculative use.

Presumably it would make sense to have a standard way for architectures 
with no speculative execution to just use the generic version, but without 
the warning?  E.g., split up default_inhibit_load_speculation so that it 
generates the warning then calls another function with the same prototype, 
so such architectures can just define the hook to use that other function?
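Such a split could be structured roughly as follows.  This is a sketch
only: the names mirror default_inhibit_load_speculation from the patch,
but the signatures are hypothetical C-level stand-ins for illustration;
the real hook operates on RTL during expansion, not on runtime values.

```c
#include <stdio.h>

/* Hypothetical: the generic expansion logic, shared by all targets.
   It provides the logical behaviour of the builtin only, with no
   speculation barrier.  */
static long
speculation_safe_load_generic (const long *ptr, const void *lower,
                               const void *upper, long failval,
                               const void *cmpptr)
{
  /* Compare as character pointers for portable pointer ordering.  */
  const char *cmp = (const char *) cmpptr;
  if (cmp >= (const char *) lower && cmp < (const char *) upper)
    return *ptr;
  return failval;
}

/* Hypothetical default hook: emit the warning, then defer to the
   generic logic.  A target that never speculates could point its hook
   directly at speculation_safe_load_generic, suppressing the warning
   while reusing the same expansion.  */
static long
default_inhibit_load_speculation (const long *ptr, const void *lower,
                                  const void *upper, long failval,
                                  const void *cmpptr)
{
  fprintf (stderr,
           "warning: this target provides no speculation barrier\n");
  return speculation_safe_load_generic (ptr, lower, upper, failval,
                                        cmpptr);
}
```

A non-speculating architecture would then get the protection-free but
warning-free expansion simply by choosing the inner function as its
hook definition.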

-- 
Joseph S. Myers
jos...@codesourcery.com