> Date: Sat, 22 Jul 2017 02:25:29 -0400
> From: Todd Mortimer
>
> Hello tech,
>
> I've been working on llvm/clang some lately, and am experimenting with
> the llvm Pass infrastructure. Passes essentially let you perform
> arbitrary transforms on the program at various points in the compilation
> process.
>
> I've attached a patch that defines a machine function pass that adds:
>
> xor [rsp], rsp
>
> at the start of each function, and before every RET. This means that the
> return pointer on the stack is xored with the stack address where it
> happens to be. When the function returns, the same transform is applied
> again to reverse the process. The consequences of doing this are (a) the
> return address is obfuscated on the stack, which mitigates some
> info leaks and (b) all the RETs in the program become hard to use in ROP
> gadgets, because the return address is permuted before being popped.
> ASLR on the stack makes predicting stack addresses difficult, so if an
> attacker is dumping a ROP chain onto the stack that uses these gadgets,
> they will need to know the addresses they are writing to on the stack in
> order for them to work, which adds an extra hurdle.
>
> The performance cost here is 2 extra instructions per function call. I
> picked rsp to xor with the return address because it is cheap to use,
> non-constant, and makes the transform simple and easy to reason about. I
> am happy to bikeshed about better choices if people are interested.
>
> I've done some light testing on this, and it seems to work on my test
> programs and when I did it to my libc. I am not really sure if this is
> interesting enough to warrant more work, but I figured there isn't any
> harm in posting it up. So I'm not looking for okays, but figured it
> might be interesting to some others on the list.

Cool stuff. The downside is that this probably breaks backtraces,
which makes debugging hard, unless the DWARF debugging information is
also changed to reflect the xor operation. But I'm not sure that's
possible.
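
To make the backtrace problem concrete, here is a rough sketch in C of
what an unwinder would have to do for every frame. The helper names and
the addresses are made up for illustration and are not taken from the
patch; the only assumption is that the unwinder already knows the
address of the return-address slot (for example from the CFA).

```c
#include <assert.h>
#include <stdint.h>

/* Models `xor [rsp], rsp`: the value in the return-address slot is xored
 * with the address of that slot. Hypothetical helper, not from the patch. */
uint64_t xor_with_slot(uint64_t value, uint64_t slot_addr)
{
    return value ^ slot_addr;
}

/* What an unwinder would have to do for each frame: take the stored value
 * and the slot address (assumed known from the CFA) and xor again. */
uint64_t recover_return_address(uint64_t stored, uint64_t slot_addr)
{
    return xor_with_slot(stored, slot_addr);
}

/* Self-check with made-up addresses. */
void check_roundtrip(void)
{
    const uint64_t ret    = 0x401234ULL;    /* made-up return address */
    const uint64_t slot   = 0x7ffd1000ULL;  /* made-up stack slot address */
    const uint64_t stored = xor_with_slot(ret, slot);

    /* The value on the stack is not the real return address. */
    assert(stored != ret);
    /* xor is its own inverse, so the same slot address recovers it. */
    assert(recover_return_address(stored, slot) == ret);
    /* A wrong slot address (e.g. a gadget run at another rsp) does not. */
    assert(recover_return_address(stored, slot + 8) != ret);
}
```

The arithmetic is trivial. The hard part is presumably telling the
unwinder when to apply it, since the slot holds the plain return
address before the prologue xor has run and after the epilogue xor.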

Having a "constant" (per-process) cookie would make things easier for
debuggers, but would also weaken the mechanism. It might be possible to
have the kernel initialize such a cookie in the TIB (thread
information block) that is used for fast access to thread-local
variables.
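
A minimal sketch in C of the cookie variant and of why it is weaker.
Everything here is hypothetical: `struct fake_tib`, its `ret_cookie`
field and both functions are invented for illustration and do not
describe any existing TIB layout or kernel interface.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TIB carrying a kernel-initialized cookie. */
struct fake_tib {
    uint64_t ret_cookie;
};

/* Cookie variant of the transform: xor the return address with one
 * per-process value instead of with the stack slot address. Applying
 * it twice yields the original value again. */
uint64_t mask_with_cookie(const struct fake_tib *tib, uint64_t ret_addr)
{
    return ret_addr ^ tib->ret_cookie;
}

/* Why a constant cookie is weaker: a single leaked stack value whose
 * real return address is known gives away the cookie, and with it
 * every other masked return address in the process. */
uint64_t recover_cookie_from_leak(uint64_t leaked_masked, uint64_t known_ret)
{
    return leaked_masked ^ known_ret;
}
```

A debugger that can read the cookie only needs one xor per frame,
without tracking slot addresses, which is the convenience being traded
against the single-leak weakness shown in recover_cookie_from_leak().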