Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET
On 19/02/2021 08:15, Peter Zijlstra wrote:
> On Thu, Feb 18, 2021 at 08:11:38PM +0100, Borislav Petkov wrote:
>> On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
>>> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
>>>> Both vendors speculate after a near RET in some way:
>>>>
>>>> Intel:
>>>>
>>>> "Unlike near indirect CALL and near indirect JMP, the processor will not
>>>> speculatively execute the next sequential instruction after a near RET
>>>> unless that instruction is also the target of a jump or is a target in a
>>>> branch predictor."
>>> Right, the way I read that means it's not a problem for us here.
>> Look at that other thread: the instruction *after* the RET can be
>> speculatively executed if that instruction is the target of a jump or it
>> is in a branch predictor.
> Right, but that has nothing to do with the RET instruction itself. You
> can speculatively execute any random instruction by training the BTB,
> which is I suppose the entire point of things :-)
>
> So the way I read it is that: RET does not 'leak' speculation, but if
> you target the instruction after RET with any other speculation crud,
> of course you can get it to 'run'.
>
> And until further clarified, I'll stick with that :-)

https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf
Final page, Mitigation G-5

Some parts (before Milan I believe that CPUID rule translates into) may
speculatively execute the instructions sequentially following a
call/jmp indirect or ret instruction.

For Intel, it's just call/jmp instructions. From SDM Vol2 for CALL (and
similar for JMP)

"Certain situations may lead to the next sequential instruction after a
near indirect CALL being speculatively executed. If software needs to
prevent this (e.g., in order to prevent a speculative execution side
channel), then an LFENCE instruction opcode can be placed after the near
indirect CALL in order to block speculative execution."
In both cases, the reason LFENCE is given is for the CALL case, where
there is sequential architectural execution. JMP and RET do not have
architectural execution following them, so can use a shorter speculation
blocker.

When compiling with retpoline, all CALL/JMP indirects are removed, other
than within the __x86_indirect_thunk_%reg blocks, and those can be fixed
by hand. That just leaves RET speculation, which has no following
architectural execution, at which point `ret; int3` is the shortest way
of halting speculation, at half the size of `ret; lfence`.

With a gcc toolchain, it does actually work if you macro 'ret' (and
retl/q) to be .byte 0xc3, 0xcc, but this doesn't work for Clang IAS
which refuses to macro real instructions.

What would be massively helpful is if the toolchains could have their
existing ARM straight-line-speculation support hooked up appropriately
so we get some new code gen options on x86, and don't have to resort to
the macro bodges above.

~Andrew
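
As a rough illustration of the size comparison above, the following C
sketch lays out the standard x86 encodings (near RET is C3, INT3 is CC,
LFENCE is 0F AE E8) and checks the "half the size" claim at compile
time. The helper names show_seq() and bytes_saved_per_ret() are made up
for this example; none of this is kernel code, and it only counts bytes
rather than demonstrating any speculation behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Standard encodings: near RET = C3, INT3 = CC, LFENCE = 0F AE E8. */
static const unsigned char ret_int3[]   = { 0xc3, 0xcc };             /* ret; int3   */
static const unsigned char ret_lfence[] = { 0xc3, 0x0f, 0xae, 0xe8 }; /* ret; lfence */

/* Compile-time check that ret; int3 is half the size of ret; lfence. */
static_assert(sizeof(ret_int3) * 2 == sizeof(ret_lfence),
              "ret; int3 should be half the size of ret; lfence");

/* Hypothetical helper: print "name (N bytes): xx xx ..." and return N. */
size_t show_seq(const char *name, const unsigned char *bytes, size_t len)
{
    printf("%-12s (%zu bytes):", name, len);
    for (size_t i = 0; i < len; i++)
        printf(" %02x", bytes[i]);
    putchar('\n');
    return len;
}

/* Hypothetical helper: print both sequences and return the number of
 * bytes saved at each RET site by using int3 instead of lfence. */
size_t bytes_saved_per_ret(void)
{
    size_t small = show_seq("ret; int3", ret_int3, sizeof(ret_int3));
    size_t large = show_seq("ret; lfence", ret_lfence, sizeof(ret_lfence));
    return large - small;
}
```

Calling bytes_saved_per_ret() from a main() prints both byte sequences
and returns the per-site saving. The 0xc3, 0xcc pair is the same one the
'ret' macro mentioned above would emit.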
RE: [RFC PATCH] x86/retpolines: Prevent speculation after RET
From: Peter Zijlstra
> Sent: 18 February 2021 19:03
>
> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > Both vendors speculate after a near RET in some way:
> >
> > Intel:
> >
> > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > speculatively execute the next sequential instruction after a near RET
> > unless that instruction is also the target of a jump or is a target in a
> > branch predictor."
>
> Right, the way I read that means it's not a problem for us here.

They got a lawyer to write that sentence :-)
What on earth is that 'unless' clause about?
Either:
1) The instructions might be speculatively executed for some entirely
   different reason.
or:
2) The cpu might use the BTB to determine the instruction that follows
   the RET - and so might happen to execute the instruction that
   follows it.

I can't manage to read it in any way that suggests that the cpu will
ignore the fact it is a RET and start executing the instruction that
follows.
(Unlike some ARM cpus which do seem to do that.)

> > AMD:
> >
> > "Some AMD processors when they first encounter a branch do not stall
> > dispatch and use the branches dynamic execution to determine the target.
> > Therefore, they will speculatively dispatch the sequential instructions
> > after the branch. This happens for near return instructions where it is
> > not clear what code may exist sequentially after the return instruction.

Sounds like the conditional branch prediction (and the BTB?) get used for
RET instructions when the 'return address stack' is invalid.

> > This behavior also occurs with jmp/call instructions with indirect
> > targets. Software should place a LFENCE or another dispatch serializing
> > instruction after the return or jmp/call indirect instruction to prevent
> > this sequential speculation."
> >
> > The AMD side doesn't really need the LFENCE because it'll do LFENCE;
> > JMP/CALL due to X86_FEATURE_RETPOLINE_AMD, before it reaches
> > the RET.
>
> It never reached the RET.
>
> So all in all, I really don't see why we'd need this.

I read that as implying that some AMD cpu can sometimes treat the RET
as a conditional branch and so speculatively assume it isn't taken.
So you need an LFENCE (or ???) following the RET at the end of every
function.

David
Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET
On Thu, Feb 18, 2021 at 08:11:38PM +0100, Borislav Petkov wrote:
> On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > > Both vendors speculate after a near RET in some way:
> > >
> > > Intel:
> > >
> > > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > > speculatively execute the next sequential instruction after a near RET
> > > unless that instruction is also the target of a jump or is a target in a
> > > branch predictor."
> >
> > Right, the way I read that means it's not a problem for us here.
>
> Look at that other thread: the instruction *after* the RET can be
> speculatively executed if that instruction is the target of a jump or it
> is in a branch predictor.

Right, but that has nothing to do with the RET instruction itself. You
can speculatively execute any random instruction by training the BTB,
which is I suppose the entire point of things :-)

So the way I read it is that: RET does not 'leak' speculation, but if
you target the instruction after RET with any other speculation crud,
of course you can get it to 'run'.

And until further clarified, I'll stick with that :-)
Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET
On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > Both vendors speculate after a near RET in some way:
> >
> > Intel:
> >
> > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > speculatively execute the next sequential instruction after a near RET
> > unless that instruction is also the target of a jump or is a target in a
> > branch predictor."
>
> Right, the way I read that means it's not a problem for us here.

Look at that other thread: the instruction *after* the RET can be
speculatively executed if that instruction is the target of a jump or it
is in a branch predictor.

And yes, the text is confusing and no one from Intel has clarified
definitively yet what that text means exactly.

> Now, if AMD were to say something like: hey, that retpoline is pretty
> awesome, we ought to use that instead of an unconditional LFENCE, then
> sure, but as is, I don't think so.

AMD prefers the LFENCE instead of the retpoline sequence.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET
On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> Both vendors speculate after a near RET in some way:
>
> Intel:
>
> "Unlike near indirect CALL and near indirect JMP, the processor will not
> speculatively execute the next sequential instruction after a near RET
> unless that instruction is also the target of a jump or is a target in a
> branch predictor."

Right, the way I read that means it's not a problem for us here.

> AMD:
>
> "Some AMD processors when they first encounter a branch do not stall
> dispatch and use the branches dynamic execution to determine the target.
> Therefore, they will speculatively dispatch the sequential instructions
> after the branch. This happens for near return instructions where it is
> not clear what code may exist sequentially after the return instruction.
> This behavior also occurs with jmp/call instructions with indirect
> targets. Software should place a LFENCE or another dispatch serializing
> instruction after the return or jmp/call indirect instruction to prevent
> this sequential speculation."
>
> The AMD side doesn't really need the LFENCE because it'll do LFENCE;
> JMP/CALL due to X86_FEATURE_RETPOLINE_AMD, before it reaches
> the RET.

It never reached the RET.

So all in all, I really don't see why we'd need this. Now, if AMD were
to say something like: hey, that retpoline is pretty awesome, we ought
to use that instead of an unconditional LFENCE, then sure, but as is, I
don't think so.