On Monday, February 22, 2021 at 7:30:31 AM UTC+2 jwkoz...@gmail.com wrote:

> I think I have an explanation of what is going on. Before I present it let 
> me recap the calling convention for aarch64:
>
> Caller:
>
>    1. If we need any of x0-x18 registers, save them. They are corruptible.
>    2. Move the first 8 parameters into registers x0-x7.
>    3. Push any additional parameters on the stack.
>    4. Use BL to call the function.
>    5. Evaluate the return code in x0.
>    6. Restore any of x0-x18 that we saved in step 1.
>
> Callee:
>
>    1. Push LR (x30) and x19-x29 onto the stack if used by this routine.
>    2. Do the work
>    3. Put return code in x0.
>    4. Pop LR and x19-x29 if pushed in step 1.
>    5. Use RET instruction to return execution to the caller (this will 
>    implicitly use LR (x30) as an address to return to).
>
> Now imagine the following scenario involving function F executing on 
> thread T1 that calls thread::yield() or another function calling yield():
>
>    1. Function F pushes one of the callee saved registers - x23 (just an 
>    example) - on the T1 stack becuase it uses it for something and it must do 
>    it per the calling convention.
>    2. Function F stores some value in x23.
>    3. Function F calls thread::yield() directly or indirectly.
>    4. Eventually, reschedule_from_interrupt() is called and it calls 
>    switch_to() to switch stack pointer to the new thread T2 stack. The debug 
>    version of  reschedule_from_interrupt() nor switch_to() stores x23 as they 
>    do not use this register (unlike the release version).
>    5. At some point, later reschedule_from_interrupt() is called again 
>    (not necessarily the very next time) and calls switch_to() that switches 
>    back to T1.
>    6. T1 resumes and eventually returns the control to the function F1 
>    right after it called yield().
>    7. The code in F1 after calling yield() reads the value of x23 ... and 
>    boom. The x23 quite likely contains garbage because it was never restored 
>    by F1 after calling yield() because per calling convention yield() or 
> other 
>    callees should have saved and restored. But it did not, did it? Or rather 
>    different routines on different threads running on the same cpu in between 
>    ruined it.
>
> Why does it all work with the release version? It does because the 
> reschedule_from_interrupt() compiled with -02 happens to use and save all 
> callee-saved registers x19-x28. So they happen to be restored to correct 
> values after the switch.
>
> So it seems that the right solution is to save and restore x19-x28 (callee 
> saved registers) in switch_to() like so:
>
> diff --git a/arch/aarch64/arch-switch.hh b/arch/aarch64/arch-switch.hh
> index dff7467c..45aff4a7 100644
> --- a/arch/aarch64/arch-switch.hh
> +++ b/arch/aarch64/arch-switch.hh
> @@ -27,6 +27,7 @@ void thread::switch_to()
>  
>      asm volatile("\n"
>                   "str x29,     %0  \n"
> +                "sub sp, sp, #0x50\n"
>                   "mov x2, sp       \n"
>                   "adr x1, 1f       \n" /* address of label */
>                   "stp x2, x1,  %1  \n"
> @@ -34,10 +35,23 @@ void thread::switch_to()
>                   "ldp x29, x0, %2  \n"
>                   "ldp x2, x1,  %3  \n"
>  
> +                "stp x19, x20, [sp, #0]\n"
> +                "stp x21, x22, [sp, #16]\n"
> +                "stp x23, x24, [sp, #32]\n"
> +                "stp x25, x26, [sp, #48]\n"
> +                "stp x27, x28, [sp, #64]\n"
> +
>                   "mov sp, x2       \n"
>                   "blr x1           \n"
>  
>                   "1:               \n" /* label */
> +
> +                "ldp x19, x20, [sp, #0]\n"
> +                "ldp x21, x22, [sp, #16]\n"
> +                "ldp x23, x24, [sp, #32]\n"
> +                "ldp x25, x26, [sp, #48]\n"
> +                "ldp x27, x28, [sp, #64]\n"
> +                "add sp, sp, #0x50\n"
>                   :
>                   : "Q"(old->_state.fp), "Ump"(old->_state.sp),
>                     "Ump"(this->_state.fp), "Ump"(this->_state.sp)
>
> And indeed the crashes in both -00 and -O1 go away.
>
> Does my explanation have holes? Or am I completely wrong?
>

I think you're completely right, well spotted.

I think you're completely right. Here's the equivalent from Linux.

Here's the Linux equivalent:

SYM_FUNC_START(cpu_switch_to)
        mov     x10, #THREAD_CPU_CONTEXT
        add     x8, x0, x10
        mov     x9, sp
        stp     x19, x20, [x8], #16             // store callee-saved 
registers
        stp     x21, x22, [x8], #16
        stp     x23, x24, [x8], #16
        stp     x25, x26, [x8], #16
        stp     x27, x28, [x8], #16
        stp     x29, x9, [x8], #16
        str     lr, [x8]
        add     x8, x1, x10
        ldp     x19, x20, [x8], #16             // restore callee-saved 
registers
        ldp     x21, x22, [x8], #16
        ldp     x23, x24, [x8], #16
        ldp     x25, x26, [x8], #16
        ldp     x27, x28, [x8], #16
        ldp     x29, x9, [x8], #16
        ldr     lr, [x8]
        mov     sp, x9
        msr     sp_el0, x1
        ptrauth_keys_install_kernel x1, x8, x9, x10
        scs_save x0, x8
        scs_load x1, x8
        ret
SYM_FUNC_END(cpu_switch_to)
NOKPROBE(cpu_switch_to)


 

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/a6d7b7a4-4757-4959-9c6b-1e63c24997den%40googlegroups.com.

Reply via email to