On Wednesday, March 3, 2021 at 4:57:18 PM UTC-5 Waldek Kozaczuk wrote:
> As issue #1124 and the thread
> https://groups.google.com/g/osv-dev/c/ystZElFNdBI
> on the mailing list explain, some tests in the aarch64 debug build
> fail in a consistent manner with page faults, suggesting that the
> values of parameters passed from function to function somehow get
> corrupted. After painful research, I pinpointed the crashes to the
> reschedule_from_interrupt() method, which does not work correctly in
> voluntary (non-preemptive) context-switch scenarios where the
> thread::yield() method is called by an app. It turns out that in those
> situations the so-called callee-saved registers used by an app function
> calling thread::yield() are not saved during the context switch and
> then not restored when that thread gets resumed later. The following
> steps illustrate such a scenario:
>
> 1. Function F pushes one of the callee-saved registers - x23 (just an
>    example) - onto the T1 stack because it uses it for something and
>    must do so per the calling convention.
> 2. Function F stores some value in x23.
> 3. Function F calls thread::yield() directly or indirectly.
> 4. Eventually, reschedule_from_interrupt() is called and it calls
>    switch_to() to switch the stack pointer to the new thread T2 stack.
>    The debug version of neither reschedule_from_interrupt() nor
>    switch_to() stores x23, as they do not use this register (unlike the
>    release version).
> 5. At some later point, reschedule_from_interrupt() is called again (not
>    necessarily the very next time) and calls switch_to(), which switches
>    back to T1.
> 6. T1 resumes and eventually returns control to function F right after
>    its call to yield().
> 7. The code in F after calling yield() reads the value of x23 ... and
>    boom. The x23 quite likely contains garbage because it was never
>    restored for F after it called yield(); per the calling convention,
>    yield() or its other callees should have saved and restored it. But
>    they did not, did they?
> Or rather, different routines on different threads running on the same
> cpu in between ruined it.
>
> One way to solve this problem would be to add all callee-saved registers
> (x19-x28 and d8-d15 per
> https://github.com/ARM-software/abi-aa/blob/f52e1ad3f81254497a83578dc102f6aac89e52d0/aapcs64/aapcs64.rst -
> see "6.1.1 General-purpose Registers" and "6.1.2 SIMD and Floating-Point
> Registers" about the v8-v15 registers)
> to the list of clobbered registers of the inline assembly in
> thread::switch_to() in arch/aarch64/arch-switch.hh. But this does not
> always work, as some versions of gcc (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99342), depending on the
> optimization level (-O0, -O2, etc.), may or may not inline switch_to()
> into reschedule_from_interrupt() and generate machine code that does not
> do what we intend. On top of that, the code in
> reschedule_from_interrupt() that destroys any terminated thread on the
> current cpu uses the thread-local storage (TLS) pointer register -
> tpidr_el0 - which is not re-read after switch_to() and causes
> interesting crashes, as issue #1121 describes.
>
> Given all of the above, and the fact that reschedule_from_interrupt()
> calls switch_to() to switch threads - which is very tricky to do right
> using pure C++ and some inline assembly - this patch implements a
> high-level version of reschedule_from_interrupt() in assembly for
> aarch64 (the x64 one stays as is).
>
> This patch modifies include/osv/sched.hh and core/sched.cc with a number
> of `#ifdef AARCH64_PORT_STUB` guards to keep reschedule_from_interrupt()
> as is for x64 and split it into 2 new methods for aarch64 -
> schedule_next_thread() and destroy_current_cpu_terminating_thread().
> The two new functions correspond to the parts of
> reschedule_from_interrupt() up to the switch_to() call and after it.
> Please note that schedule_next_thread() returns a simple 2-field struct
> with pointers to the old and new thread state (pc, sp, tcb, etc.) that
> gets used by reschedule_from_interrupt().
>
> On top of this, this patch adds arch/aarch64/sched.S, which implements
> reschedule_from_interrupt() as a C function in assembly. The
> reschedule_from_interrupt in essence calls the new
> schedule_next_thread(), saves all callee-saved registers on the old
> thread stack, executes a context switch as the old switch_to() did,
> restores the callee-saved registers from the new thread stack, and
> finally calls destroy_current_cpu_terminating_thread().
>
> This patch has been tested with the -O0, -O1, -O2 optimization levels
> (release, debug 1 and 2) on real ARM hardware and in emulated TCG mode.
>
> Please note that the new code makes the context switch a little more
> expensive (~3%) per these numbers:
>
> BEFORE:
> taskset -c 2-5 ./scripts/run.py -c 1 -e '/tests/misc-ctxsw.so'
> OSv v0.55.0-204-g837a9b5d
> getauxval() stubbed
> eth0: 192.168.122.15
> Booted up in 58.19 ms
> Cmdline: /tests/misc-ctxsw.so
> 664 colocated
> 665 apart
> 665 nopin
>
> AFTER:
> taskset -c 2-5 ./scripts/run.py -c 1 -e '/tests/misc-ctxsw.so'
> OSv v0.55.0-205-g7e7c49f7
> getauxval() stubbed
> eth0: 192.168.122.15
> Booted up in 58.20 ms
> Cmdline: /tests/misc-ctxsw.so
> 683 colocated
> 690 apart
> 690 nopin
>
> Fixes #1121
> Fixes #1124
>
> Signed-off-by: Waldemar Kozaczuk <[email protected]>
> ---
>  Makefile                          |  1 +
>  arch/aarch64/arch-switch.hh       | 38 +++--------------
>  arch/aarch64/arch-thread-state.hh |  1 +
>  arch/aarch64/sched.S              | 70 +++++++++++++++++++++++++++++++
>  core/sched.cc                     | 63 ++++++++++++++++++++++++++--
>  include/osv/sched.hh              | 34 ++++++++++++++-
>  6 files changed, 170 insertions(+), 37 deletions(-)
>  create mode 100644 arch/aarch64/sched.S
>
> diff --git a/Makefile b/Makefile
> index bbc57252..e1866c3b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -878,6 +878,7 @@ objects += arch/$(arch)/memset.o
>  objects += arch/$(arch)/memcpy.o
>  objects += arch/$(arch)/memmove.o
>  objects += arch/$(arch)/tlsdesc.o
> +objects += arch/$(arch)/sched.o
>  objects += $(libfdt)
>  endif
>
> diff --git a/arch/aarch64/arch-switch.hh b/arch/aarch64/arch-switch.hh
> index dff7467c..1a78d132 100644
> --- a/arch/aarch64/arch-switch.hh
> +++ b/arch/aarch64/arch-switch.hh
> @@ -20,32 +20,6 @@ void thread_main_c(sched::thread* t);
>
>  namespace sched {
>
> -void thread::switch_to()
> -{
> -    thread* old = current();
> -    asm volatile ("msr tpidr_el0, %0; isb; " :: "r"(_tcb) : "memory");
> -
> -    asm volatile("\n"
> -        "str x29, %0  \n"
> -        "mov x2, sp   \n"
> -        "adr x1, 1f   \n" /* address of label */
> -        "stp x2, x1, %1  \n"
> -
> -        "ldp x29, x0, %2 \n"
> -        "ldp x2, x1, %3  \n"
> -
> -        "mov sp, x2  \n"
> -        "blr x1      \n"
> -
> -        "1:  \n" /* label */
> -        :
> -        : "Q"(old->_state.fp), "Ump"(old->_state.sp),
> -          "Ump"(this->_state.fp), "Ump"(this->_state.sp)
> -        : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8",
> -          "x9", "x10", "x11", "x12", "x13", "x14", "x15",
> -          "x16", "x17", "x18", "x30", "memory");
> -}
> -
>  void thread::switch_to_first()
>  {
>      asm volatile ("msr tpidr_el0, %0; isb; " :: "r"(_tcb) : "memory");
> @@ -59,14 +33,13 @@ void thread::switch_to_first()
>
>      asm volatile("\n"
>          "ldp x29, x0, %0  \n"
> -        "ldp x2, x1, %1  \n"
> -        "mov sp, x2  \n"
> -        "blr x1      \n"
> +        "ldp x22, x21, %1  \n"
> +        "mov sp, x22  \n"
> +        "blr x21      \n"
>          :
>          : "Ump"(this->_state.fp), "Ump"(this->_state.sp)
> -        : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8",
> -          "x9", "x10", "x11", "x12", "x13", "x14", "x15",
> -          "x16", "x17", "x18", "x30", "memory");
> +        : "x0", "x19", "x20", "x21", "x22", "x23", "x24",
> +          "x25", "x26", "x27", "x28", "x29", "x30", "memory");
>  }
>
>  void thread::init_stack()
> @@ -150,6 +123,7 @@ void thread::setup_tcb()
>      void* p = aligned_alloc(64, total_tls_size + sizeof(*_tcb));
>      _tcb = (thread_control_block *)p;
>      _tcb[0].tls_base = &_tcb[1];
> +    _state.tcb = p;
>      //
>      // First goes kernel TLS data
>      auto kernel_tls = _tcb[0].tls_base;
> diff --git a/arch/aarch64/arch-thread-state.hh b/arch/aarch64/arch-thread-state.hh
> index af424472..6f1b680d 100644
> --- a/arch/aarch64/arch-thread-state.hh
> +++ b/arch/aarch64/arch-thread-state.hh
> @@ -14,6 +14,7 @@ struct thread_state {
>
>      void* sp;
>      void* pc;
> +    void* tcb;
>  };
>
>  #endif /* ARCH_THREAD_STATE_HH_ */
> diff --git a/arch/aarch64/sched.S b/arch/aarch64/sched.S
> new file mode 100644
> index 00000000..1e69f8fd
> --- /dev/null
> +++ b/arch/aarch64/sched.S
> @@ -0,0 +1,70 @@
> +#void cpu::reschedule_from_interrupt(bool called_from_yield,
> +#                                    thread_runtime::duration preempt_after)
> +.global reschedule_from_interrupt
> +.type reschedule_from_interrupt, @function
> +reschedule_from_interrupt:
> +    #.cfi_startproc
> +    stp x29, x30, [sp, #-176]!
> +    #.cfi_adjust_cfa_offset 176
> +    mov x29, sp
> +    #.cfi_def_cfa_register x29
> +
> +    #Call cpu_schedule_next_thread() to determine next thread to switch to if any
> +    bl cpu_schedule_next_thread
> +
> +    #The cpu_schedule_next_thread returns thread_switch_data in x0 and x1,
> +    #where x0 holds old_thread_state and x1 holds new_thread_state
> +    #If cpu_schedule_next_thread() returns thread_switch_data with null pointers, exit
> +    cmp x0, #0
> +    b.eq 2f
> +
> +    #Store all callee-save registers on the old thread stack
> +    stp x19, x20, [sp, #16]
> +    stp x21, x22, [sp, #32]
> +    stp x23, x24, [sp, #48]
> +    stp x25, x26, [sp, #64]
> +    stp x27, x28, [sp, #80]
> +
> +    stp d8, d9, [sp, #96]
> +    stp d10, d11, [sp, #112]
> +    stp d12, d13, [sp, #128]
> +    stp d14, d15, [sp, #144]
> +
> +    #Switch to new thread
> +    ldr x2, [x1, #32]     //Fetch _tcb of the new thread
> +    msr tpidr_el0, x2     //Set thread pointer
> +    isb
> +
> +    str x29, [x0, #0]     //Save frame pointer of the old thread
> +    mov x3, sp            //Fetch old thread stack pointer
> +    adr x4, 1f            //Fetch old thread instruction pointer
> +    stp x3, x4, [x0, #16] //Save old thread sp and pc
> +
> +    ldp x29, x0, [x1, #0] //Set frame pointer and this (x0) of the new thread
> +    //Please note that the pc may point to thread_main_c(thread*) which is
> +    //why we have to set x0 (1st argument) to new thread object
> +    ldp x3, x4, [x1, #16] //Fetch new thread sp and pc
> +    mov sp, x3            //Set new thread stack pointer
> +    blr x4                //Jump to the new thread pc
> +
> +1:
> +    #Restore all callee-save registers from the new thread stack
> +    ldp x19, x20, [sp, #16]
> +    ldp x21, x22, [sp, #32]
> +    ldp x23, x24, [sp, #48]
> +    ldp x25, x26, [sp, #64]
> +    ldp x27, x28, [sp, #80]
> +
> +    ldp d8, d9, [sp, #96]
> +    ldp d10, d11, [sp, #112]
> +    ldp d12, d13, [sp, #128]
> +    ldp d14, d15, [sp, #144]

I think that one more register needs to be saved and restored during the
context switch - FPCR (Floating-Point Control Register):

"The FPSR is a status register that holds the cumulative exception bits
of the floating-point unit. It contains the fields IDC, IXC, UFC, OFC,
DZC, IOC and QC. These fields are not preserved across a public
interface and may have any value on entry to a subroutine.

The FPCR is used to control the behavior of the floating-point unit. It
is a global register with the following properties.

- *The exception-control bits (8-12), rounding mode bits (22-23) and
  flush-to-zero bits (24) may be modified by calls to specific support
  functions that affect the global state of the application.*
- All other bits are reserved and must not be modified. It is not
  defined whether the bits read as zero or one, or whether they are
  preserved across a public interface."

Saving and restoring it makes tst-fpu.cc work now for aarch64 where
yield() is used. I do not think we need to save/restore FPSR - the
status register.
> +
> +    #Call destroy_current_cpu_terminating_thread()
> +    bl destroy_current_cpu_terminating_thread
> +
> +2:
> +    ldp x29, x30, [sp], #176
> +    #.cfi_def_cfa sp, 0
> +    ret
> +    #.cfi_endproc
> diff --git a/core/sched.cc b/core/sched.cc
> index 06f849d1..74b4ab95 100644
> --- a/core/sched.cc
> +++ b/core/sched.cc
> @@ -225,13 +225,34 @@ void thread::cputime_estimator_get(
>  void cpu::schedule()
>  {
>      WITH_LOCK(irq_lock) {
> +#ifdef AARCH64_PORT_STUB
> +        reschedule_from_interrupt(sched::cpu::current(), false, thyst);
> +#else
>          current()->reschedule_from_interrupt();
> -    }
> -}
> -
> +#endif
> +    }
> +}
> +
> +// In the aarch64 port, reschedule_from_interrupt() needs to be implemented
> +// in assembly (please see arch/aarch64/sched.S) to give us better control
> +// over which registers are used and which ones are saved and restored during
> +// the context switch. In essence, most of the original reschedule_from_interrupt()
> +// code up to switch_to() is shared with the aarch64-specific cpu::schedule_next_thread()
> +// that is called from reschedule_from_interrupt() in the arch/aarch64/sched.S assembly.
> +// The logic executed after switch_to(), which makes up destroy_current_cpu_terminating_thread(),
> +// is called from arch/aarch64/sched.S as well.
> +// In the end, we define reschedule_from_interrupt() in C++ for x86_64, and schedule_next_thread()
> +// and destroy_current_cpu_terminating_thread() in C++ for aarch64.
> +#ifdef AARCH64_PORT_STUB
> +inline thread_switch_data cpu::schedule_next_thread(bool called_from_yield,
> +    thread_runtime::duration preempt_after)
> +{
> +    thread_switch_data switch_data;
> +#else
>  void cpu::reschedule_from_interrupt(bool called_from_yield,
>      thread_runtime::duration preempt_after)
>  {
> +#endif
>      trace_sched_sched();
>      assert(sched::exception_depth <= 1);
>      need_reschedule = false;
> @@ -259,7 +280,11 @@ void cpu::reschedule_from_interrupt(bool called_from_yield,
>      // lowest runtime, and update the timer until the next thread's turn.
>      if (runqueue.empty()) {
>          preemption_timer.cancel();
> +#ifdef AARCH64_PORT_STUB
> +        return switch_data;
> +#else
>          return;
> +#endif
>      } else if (!called_from_yield) {
>          auto &t = *runqueue.begin();
>          if (p->_runtime.get_local() < t._runtime.get_local()) {
> @@ -268,7 +293,11 @@ void cpu::reschedule_from_interrupt(bool called_from_yield,
>              if (delta > 0) {
>                  preemption_timer.set(now + delta);
>              }
> +#ifdef AARCH64_PORT_STUB
> +            return switch_data;
> +#else
>              return;
> +#endif
>          }
>      }
>      // If we're here, p no longer has the lowest runtime. Before queuing
> @@ -336,19 +365,39 @@ void cpu::reschedule_from_interrupt(bool called_from_yield,
>      if (lazy_flush_tlb.exchange(false, std::memory_order_seq_cst)) {
>          mmu::flush_tlb_local();
>      }
> +#ifdef AARCH64_PORT_STUB
> +    switch_data.old_thread_state = &(p->_state);
> +    switch_data.new_thread_state = &(n->_state);
> +    return switch_data;
> +}
> +#else
>      n->switch_to();
> +#endif
>
>      // Note: after the call to n->switch_to(), we should no longer use any of
>      // the local variables, nor "this" object, because we just switched to n's
>      // stack and the values we can access now are those that existed in the
>      // reschedule call which scheduled n out, and will now be returning.
>      // So to get the current cpu, we must use cpu::current(), not "this".
> +#ifdef AARCH64_PORT_STUB
> +extern "C" void destroy_current_cpu_terminating_thread()
> +{
> +#endif
>      if (cpu::current()->terminating_thread) {
>          cpu::current()->terminating_thread->destroy();
>          cpu::current()->terminating_thread = nullptr;
>      }
>  }
>
> +#ifdef AARCH64_PORT_STUB
> +extern "C" thread_switch_data cpu_schedule_next_thread(cpu* cpu,
> +    bool called_from_yield,
> +    thread_runtime::duration preempt_after)
> +{
> +    return cpu->schedule_next_thread(called_from_yield, preempt_after);
> +}
> +#endif
> +
>  void cpu::timer_fired()
>  {
>      // nothing to do, preemption will happen if needed
> @@ -521,7 +570,11 @@ void thread::pin(cpu *target_cpu)
>      // Note that wakeme is on the same CPU, and irq is disabled,
>      // so it will not actually run until we stop running.
>      wakeme->wake_with([&] { do_wakeme = true; });
> +#ifdef AARCH64_PORT_STUB
> +    reschedule_from_interrupt(source_cpu, false, thyst);
> +#else
>      source_cpu->reschedule_from_interrupt();
> +#endif
>  }
>  // wakeme will be implicitly join()ed here.
>  }
> @@ -760,7 +813,11 @@ void thread::yield(thread_runtime::duration preempt_after)
>      }
>      trace_sched_yield_switch();
>
> +#ifdef AARCH64_PORT_STUB
> +    reschedule_from_interrupt(cpu::current(), true, preempt_after);
> +#else
>      cpu::current()->reschedule_from_interrupt(true, preempt_after);
> +#endif
>  }
>
>  void thread::set_priority(float priority)
> diff --git a/include/osv/sched.hh b/include/osv/sched.hh
> index 0fb44f77..1dacd623 100644
> --- a/include/osv/sched.hh
> +++ b/include/osv/sched.hh
> @@ -31,6 +31,9 @@ typedef float runtime_t;
>
>  extern "C" {
>  void smp_main();
> +#ifdef AARCH64_PORT_STUB
> +void destroy_current_cpu_terminating_thread();
> +#endif
>  };
>  void smp_launch();
>
> @@ -321,7 +324,12 @@ constexpr thread_runtime::duration thyst = std::chrono::milliseconds(5);
>  constexpr thread_runtime::duration context_switch_penalty = std::chrono::microseconds(10);
>
> -
> +#ifdef AARCH64_PORT_STUB
> +struct thread_switch_data {
> +    thread_state* old_thread_state = nullptr;
> +    thread_state* new_thread_state = nullptr;
> +};
> +#endif
>  /**
>   * OSv thread
>   */
> @@ -600,7 +608,9 @@ private:
>          unsigned allowed_initial_states_mask = 1 << unsigned(status::waiting));
>      static void sleep_impl(timer &tmr);
>      void main();
> +#ifndef AARCH64_PORT_STUB
>      void switch_to();
> +#endif
>      void switch_to_first();
>      void prepare_wait();
>      void wait();
> @@ -702,7 +712,12 @@ private:
>      std::vector<char*> _tls;
>      bool _app;
>      std::shared_ptr<osv::application_runtime> _app_runtime;
> +public: //TODO: For whatever reason we need to make this public even though destroy_current_cpu_terminating_thread is defined as a friend below; the compiler still complains.
>      void destroy();
> +private:
> +#ifdef AARCH64_PORT_STUB
> +    friend void ::destroy_current_cpu_terminating_thread();
> +#endif
>      friend class thread_ref_guard;
>      friend void thread_main_c(thread* t);
>      friend class wait_guard;
> @@ -884,8 +899,13 @@ struct cpu : private timer_base::client {
>       * @param preempt_after fire the preemption timer after this time period
>       * (relevant only when "called_from_yield" is TRUE)
>       */
> +#ifdef AARCH64_PORT_STUB
> +    thread_switch_data schedule_next_thread(bool called_from_yield = false,
> +        thread_runtime::duration preempt_after = thyst);
> +#else
>      void reschedule_from_interrupt(bool called_from_yield = false,
> -        thread_runtime::duration preempt_after = thyst);
> +            thread_runtime::duration preempt_after = thyst);
> +#endif
>      void enqueue(thread& t);
>      void init_idle_thread();
>      virtual void timer_fired() override;
> @@ -895,6 +915,12 @@
>      int renormalize_count;
>  };
>
> +#ifdef AARCH64_PORT_STUB
> +extern "C" void reschedule_from_interrupt(cpu* cpu,
> +    bool called_from_yield,
> +    thread_runtime::duration preempt_after);
> +#endif
> +
>  class cpu::notifier {
>  public:
>      explicit notifier(std::function<void ()> cpu_up);
> @@ -996,7 +1022,11 @@ inline bool preemptable()
>  inline void preempt()
>  {
>      if (preemptable()) {
> +#ifdef AARCH64_PORT_STUB
> +        reschedule_from_interrupt(sched::cpu::current(), false, thyst);
> +#else
>          sched::cpu::current()->reschedule_from_interrupt();
> +#endif
>      } else {
>          // preempt_enable() will pick this up eventually
>          need_reschedule = true;
> --
> 2.27.0

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/32e8da0b-a3b3-440b-86e2-e2d521f8c38dn%40googlegroups.com.
