Please note that adding emms instruction right after saving FPU state is enough to make the new unit test pass, however it is not enough to fix issues #536, #1010 and #1018. Which means that somehow sometimes in voluntary task switch scenarios (sleep, etc) FPU gets in a non-clean state. So adding emms right after the context switch on new thread (arch-cpu.cc) cleans FPU status word and sets new thread in proper state to execute FPU instructions.
I do not understand what would cause FPU to get in such state in voluntary task switch scenarios - possibly some bugs in user code, GCC or some underlying assumption that kernel typically does not use FPU and OSv does. So even though this patch fixes scenarios like #536 and #101, it leaves me worrying a bit that there may be cases when FPU is in a non-clean state when our thread scheduler calculates time duration (see https://github.com/cloudius-systems/osv/blob/master/core/sched.cc#L94-L117) and ends up with wrong results line NaN. Waldek On Sunday, December 16, 2018 at 5:05:01 PM UTC-5, Waldek Kozaczuk wrote: > > This patch add "emms" instruction in critical places > of OSv thread scheduler code to reset FPU state after it is saved > upon involuntary context switch (for example interrupt) and right > after executing context switch on new thread to clear FPU status word. > Clearing FPU status word is necessary because old thread we are swiching > from, might have used MMX instructions and any code executing after that > - new thread we are switching to - on the same CPU that needs to execute > traditional FPU instructions would fail with FPU overflow exception > leading > to all kind of mysterious and nasty bugs. For specific description look > at the comments added to each line adding emms instruction. > > This patch also adds new unit test that verifies that FPU status > word is cleared upon involuntary context switch from thread using MMX > instruction to a thread that uses traditional FPU instructions - > printf of a float number. > > Please note there are cases other than involuntary context switch > where FPU gets into non-clean state right after context switch which > may lead to other kinds of bugs like infinite loop in printf leading > to a page fault (see #536, #1010 and #1018). This patch fixes those as > well > however we have not been able to provide a unit test that would > demonstrate > this. > > Fixes #1019 > Fixes #1020 > > Fixes #536 - based on around 10 successful test runs where it would crash > every other > time before the patch > > Fixes #1010 - based on over 100 successful test runs where it would crash > almost every time > > Fixes #1018 - based on over successful 50 test runs with /os/threads API > being used > in tight loop and before patch would crash sporadically > > Signed-off-by: Waldemar Kozaczuk <[email protected]> > --- > arch/x64/arch-cpu.cc | 10 ++++++ > arch/x64/arch-switch.hh | 10 +++++- > modules/tests/Makefile | 4 +-- > tests/tst-mmx-fpu.cc | 69 +++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 90 insertions(+), 3 deletions(-) > create mode 100644 tests/tst-mmx-fpu.cc > > diff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc > index bb96af3d..f3bab022 100644 > --- a/arch/x64/arch-cpu.cc > +++ b/arch/x64/arch-cpu.cc > @@ -61,11 +61,21 @@ void fpu_state_init_fxsave(processor::fpu_state *s) { > extern "C" > void fpu_state_save_xsave(processor::fpu_state *s) { > processor::xsave(s, -1); > + // Reset FPU state after saving it as the thread we are about > + // to switch from might have been using MMX registers and > + // the thread scheduler that does FPU calculations needs > + // FPU to be a clean state > + asm volatile("emms"); > } > > extern "C" > void fpu_state_save_fxsave(processor::fpu_state *s) { > processor::fxsave(s); > + // Reset FPU state after saving it as the thread we are about > + // to switch from might have been using MMX registers and > + // the thread scheduler that does FPU calculations needs > + // FPU to be a clean state > + asm volatile("emms"); > } > > extern "C" > diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh > index e3769540..c859ba89 100644 > --- a/arch/x64/arch-switch.hh > +++ b/arch/x64/arch-switch.hh > @@ -102,7 +102,15 @@ void thread::switch_to() > [rbp]"i"(offsetof(thread_state, rbp)), > [rip]"i"(offsetof(thread_state, rip)) > : "rbx", "rdx", "rsi", "rdi", "r8", "r9", > - "r10", "r11", "r12", "r13", "r14", "r15", "memory"); > + "r10", "r11", "r12", "r13", "r14", "r15", "memory") > + // As the catch-all solution, reset FPU state and more specifically > + // its status word as the old thread we have just switched from due > to voluntary > + // or involuntary reason may have been using MMX or caused FPU to get > + // in a state where it needs to be cleared. Otherwise the new thread > we have > + // just switched to, may experience FPU overflow exceptions leading > + // to very nasty bugs like infinite loop or printing wrong floating > + // point number > + asm volatile ("emms"); > processor::fldcw(fpucw); > processor::ldmxcsr(mxcsr); > } > diff --git a/modules/tests/Makefile b/modules/tests/Makefile > index ece5f846..5c94e549 100644 > --- a/modules/tests/Makefile > +++ b/modules/tests/Makefile > @@ -116,8 +116,8 @@ tests := tst-pthread.so misc-ramdisk.so tst-vblk.so > tst-bsd-evh.so \ > tst-ifaddrs.so tst-pthread-affinity-inherit.so > tst-sem-timed-wait.so \ > tst-ttyname.so tst-pthread-barrier.so tst-feexcept.so tst-math.so > \ > tst-sigaltstack.so tst-fread.so tst-tcp-cork.so tst-tcp-v6.so \ > - tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so > - > + tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so \ > + tst-mmx-fpu.so > # libstatic-thread-variable.so tst-static-thread-variable.so \ > > tests += testrunner.so > diff --git a/tests/tst-mmx-fpu.cc b/tests/tst-mmx-fpu.cc > new file mode 100644 > index 00000000..163d9f26 > --- /dev/null > +++ b/tests/tst-mmx-fpu.cc > @@ -0,0 +1,69 @@ > +#include <stdio.h> > +#include <unistd.h> > +#include <pthread.h> > +#include <math.h> > +#include <assert.h> > +#include <string.h> > +#include <stdlib.h> > +#include <time.h> > +#include <atomic> > + > +static std::atomic<bool> mmx_finished = { false }; > + > +static void *mmx(void *arg) { > + printf("MMX START: %f\n", 10.5); > + > + // Execute some non-sensical code using MMX instructions > + // in long-enough loop to be preempted by OSv thread scheduler > + for (unsigned long i = 0; i < 500000000l; i++) { > + asm volatile ("movq $0x0fffffff, %rdx \n\t"); > + > + asm volatile ("movq %rdx, %mm0 \n\t"); > + asm volatile ("movq %rdx, %mm1 \n\t"); > + asm volatile ("paddsb %mm0, %mm1 \n\t"); > + } > + asm volatile ("emms"); > + > + printf("MMX END: %f\n", 11.5); > + return NULL; > +} > + > +static void *fpu(void *arg) { > + printf("FPU START: %f\n", 10.5); > + > + struct timespec ts; > + ts.tv_sec = 0; > + ts.tv_nsec = 1000; > + > + char buffer[10]; > + while (!mmx_finished.load()) { > + sprintf(buffer, "%f", 0.5); > + assert(!strcmp(buffer, "0.500000")); > + nanosleep(&ts, NULL); > + } > + > + printf("FPU END: %f\n", 11.5); > + return NULL; > +} > + > +// This test verifies that two threads - one using MMX instructions > +// and another one using FPU and yielding (see sleep()) - can execute > +// correctly without affecting each other. To that end OSv > +// needs to make sure that CPU is properly reset after MMX code > +// gets preempted so that code on new thread can safely execute > +// code using traditional FPU instructions > +// > +int main() { > + printf("Hello from C code, %f\n", NAN); > + > + pthread_t mmx_thread, fpu_thread; > + > + pthread_create(&mmx_thread, NULL, mmx, NULL); > + pthread_create(&fpu_thread, NULL, fpu, NULL); > + > + pthread_join(mmx_thread, NULL); > + mmx_finished.store(true); > + pthread_join(fpu_thread, NULL); > + > + return 0; > +} > -- > 2.19.1 > > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
