The AMD64 ABI allows a function to use 128 bytes beyond the bottom of the stack without updating the stack pointer - as an optimization known as the "red zone". If this function calls another function, it is supposed to update the stack pointer first; But with the SYSCALL instruction, it is not aware it ends up calling a function.
So our implementation of syscall needs to skip 128 bytes of the stack, which may be used by the caller's red zone, before making any use of the stack. I haven't been able to reproduce a failure without this patch, but I do believe it is necessary. Signed-off-by: Nadav Har'El <[email protected]> --- arch/x64/entry.S | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x64/entry.S b/arch/x64/entry.S index 48a0a71..cdd2c61 100644 --- a/arch/x64/entry.S +++ b/arch/x64/entry.S @@ -166,6 +166,10 @@ syscall_entry: .cfi_startproc simple # There is no ring transition and rflags are left unchanged. + # Skip the "red zone" allowed by the AMD64 ABI (the caller used a + # SYSCALL instruction and doesn't know he called a function): + subq $128, %rsp + # We need to save and restore the caller's %rbp anyway, so let's also # set it up properly for old-style frame-pointer backtracing to work # (e.g., backtrace_safe()). Also need to push the return address before @@ -279,6 +283,8 @@ syscall_entry: popq %rbp popq %rcx + addq $128, %rsp # undo red-zone skip + # jump to rcx where the syscall instruction put rip # (sysret would leave rxc cloberred so we have nothing to do to restore it) jmpq *%rcx -- 2.7.4 -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
