Thanks for reviewing this patch. On Wednesday, January 23, 2019 at 5:10:56 PM UTC-5, Nadav Har'El wrote: > > On Wed, Dec 19, 2018 at 2:26 PM Waldemar Kozaczuk <[email protected] > <javascript:>> wrote: > >> This patch adds "emms" instruction in critical places >> > > Thank you so much for working on this, and sorry for having delayed my > review for SO long. > I'm basically happy with this patch except one thing that nags me - I > don't understand why the > *second* EMMS call, the one in switch_to(), is necessary. And I'd really > like to understand. > I have a *guess* below why it might be necessary, and I would be really > grateful if you could > check my guess. I also have a couple of other small comments below. > > >> of OSv thread scheduler code to reset FPU state after it is saved >> upon involuntary context switch (for example interrupt) and right >> after executing context switch on new thread to clear FPU status word. >> Clearing FPU status word is necessary because old thread we are switching >> from, might have used MMX instructions and any code executing after that >> - new thread we are switching to - on the same CPU that needs to execute >> traditional FPU instructions would fail with FPU overflow exception >> leading >> to all kind of mysterious and nasty bugs. For specific description look >> at the comments added to each line adding emms instruction. >> >> This patch also adds new unit test that verifies that FPU status >> word is cleared upon involuntary context switch from thread using MMX >> instruction to a thread that uses traditional FPU instructions - >> printf of a float number. >> >> Please note there are cases other than involuntary context switch >> where FPU gets into non-clean state right after context switch which >> may lead to other kinds of bugs like infinite loop in printf leading >> to a page fault (see #536, #1010 and #1018). This patch fixes those as >> well >> however we have not been able to provide a unit test that would >> demonstrate >> this. >> >> Fixes #1019 >> Fixes #1020 >> >> Fixes #536 - based on around 10 successful test runs where it would crash >> every other >> time before the patch >> >> Fixes #1010 - based on over 100 successful test runs where it would crash >> almost every time >> >> Fixes #1018 - based on over successful 50 test runs with /os/threads API >> being used >> in tight loop and before patch would crash sporadically >> >> Signed-off-by: Waldemar Kozaczuk <[email protected] <javascript:>> >> --- >> arch/x64/arch-cpu.cc | 10 ++++++ >> arch/x64/arch-switch.hh | 8 +++++ >> modules/tests/Makefile | 4 +-- >> tests/tst-mmx-fpu.cc | 69 +++++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 89 insertions(+), 2 deletions(-) >> create mode 100644 tests/tst-mmx-fpu.cc >> >> diff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc >> index bb96af3d..f3bab022 100644 >> --- a/arch/x64/arch-cpu.cc >> +++ b/arch/x64/arch-cpu.cc >> @@ -61,11 +61,21 @@ void fpu_state_init_fxsave(processor::fpu_state *s) { >> extern "C" >> void fpu_state_save_xsave(processor::fpu_state *s) { >> processor::xsave(s, -1); >> + // Reset FPU state after saving it as the thread we are about >> + // to switch from might have been using MMX registers and >> + // the thread scheduler that does FPU calculations needs >> + // FPU to be a clean state >> + asm volatile("emms"); >> } >> >> extern "C" >> void fpu_state_save_fxsave(processor::fpu_state *s) { >> processor::fxsave(s); >> + // Reset FPU state after saving it as the thread we are about >> + // to switch from might have been using MMX registers and >> + // the thread scheduler that does FPU calculations needs >> + // FPU to be a clean state >> + asm volatile("emms"); >> > > Nitpick: Here you duplicate this code (and comment) twice, but this was > supposed to be a low-level > function that does either xsave or fxsave, only. Maybe it makes more sense > to add the extra "emms" > call (and comment) in arch/x64/arch-cpu.hh, in the save_fpu::save() > function, so it will only need to > appear once? > I do not remember why I did not do this. Will try to remove this duplication.
> > } >> >> extern "C" >> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh >> index e3769540..08db318a 100644 >> --- a/arch/x64/arch-switch.hh >> +++ b/arch/x64/arch-switch.hh >> @@ -103,6 +103,14 @@ void thread::switch_to() >> [rip]"i"(offsetof(thread_state, rip)) >> : "rbx", "rdx", "rsi", "rdi", "r8", "r9", >> "r10", "r11", "r12", "r13", "r14", "r15", "memory"); >> + // As the catch-all solution, reset FPU state and more specifically >> + // its status word as the old thread we have just switched from due >> to voluntary >> + // or involuntary reason may have been using MMX or caused FPU to get >> + // in a state where it needs to be cleared. Otherwise the new thread >> we have >> + // just switched to, may experience FPU overflow exceptions leading >> + // to very nasty bugs like infinite loop or printing wrong floating >> + // point number >> > > I'm not sure I fully understand this comment, or why this "emms" is needed > at all. I tried to explain in > https://github.com/cloudius-systems/osv/issues/1020 that > it was *not* needed, but if I remember correctly you discovered that it > *is* needed > so my explanation was obviously wrong somehow. > > So I'm really curious *why* this EMMS is needed beyond the one we added to > the > FPU save) because it's possible that the actual scheduler code used the > FPU and > then *inlined *the thread::switch_to() causing switch_to() to begin with > an unclear > FPU stack. Can you please help me verify if this is (or isn't) the reason > why this > second EMMS is needed? You can check this by adding __attribute__ > ((noinline) > to thread::switch_to(). If this makes this EMMS call unnecessary, we found > the > reason (if this EMMS is still necessary, then we still don't understand > something...). > This patch did not help fix the issue #1010 (I did not test #1018 and #536 this time but given the stack trace looks identical I think I am guessing "noinline" fix would not help there either. But who knows. Here is how I changed the code on top of what this patch does: diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh index 08db318a..8d685ebd 100644 --- a/arch/x64/arch-switch.hh +++ b/arch/x64/arch-switch.hh @@ -74,7 +74,8 @@ void (*resolve_set_fsbase(void))(u64 v) void set_fsbase(u64 v) __attribute__((ifunc("resolve_set_fsbase"))); -void thread::switch_to() +__attribute__((noinline)) void thread::switch_to() +//void thread::switch_to() { thread* old = current(); // writing to fs_base invalidates memory accesses, so surround with @@ -110,7 +111,7 @@ void thread::switch_to() // just switched to, may experience FPU overflow exceptions leading // to very nasty bugs like infinite loop or printing wrong floating // point number - asm volatile ("emms"); + //asm volatile ("emms"); processor::fldcw(fpucw); processor::ldmxcsr(mxcsr); } Also running this before and after applying your suggestion did not change anything (it seems to me that switch_to was not inlined in first place, is my thinking wrong?). Otherwise the "switch_to" symbol should not show up in the elf, should it? readelf -s build/release/loader-stripped.elf | grep switch 1566: 00000000003f69c0 124 FUNC GLOBAL DEFAULT 2 _ZN5sched6thread15switch_ 9593: 00000000003f6910 172 FUNC GLOBAL DEFAULT 2 _ZN5sched6thread9switch_t 14840: 00000000003a2330 21 FUNC GLOBAL DEFAULT 2 _ZN3mmu29switch_to_runtim Finally OSv scheduler for sure does do floating point calculation, but is it a way that corrupts FPU, I do not know. I believe thread_runtime::time_until() calls taulog() that uses floats to calculate some expression. See https://github.com/cloudius-systems/osv/blob/a3cd022fcda2c88eae89476aa6c29e3c4be04926/core/sched.cc#L1759-L1773. The time_until() is called by cpu::reschedule_from_interrupt() which eventually calls switch_to(). > > By the way, which of the various tests you ran allows you to check that > this EMMS > here is necessary? > The second EMMS in switch_to() is necessary to fix #536, #1010 and #1018. The #1010 is easiest to reproduce - I added original steps there. I you have time I am sending a patch with some shell scripts that let you replicate this issue (look at fpu_test_scripts/run_ffmpeg_extract_frames.sh). > > P.S. instead of a long comment which I'm not sure is 100% accurate, you > can also just point to issue 1020 in the comment. > > Sure. > > + asm volatile ("emms"); >> processor::fldcw(fpucw); >> processor::ldmxcsr(mxcsr); >> } >> diff --git a/modules/tests/Makefile b/modules/tests/Makefile >> index ece5f846..5c94e549 100644 >> --- a/modules/tests/Makefile >> +++ b/modules/tests/Makefile >> @@ -116,8 +116,8 @@ tests := tst-pthread.so misc-ramdisk.so tst-vblk.so >> tst-bsd-evh.so \ >> tst-ifaddrs.so tst-pthread-affinity-inherit.so >> tst-sem-timed-wait.so \ >> tst-ttyname.so tst-pthread-barrier.so tst-feexcept.so tst-math.so >> \ >> tst-sigaltstack.so tst-fread.so tst-tcp-cork.so tst-tcp-v6.so \ >> - tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so >> - >> + tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so \ >> + tst-mmx-fpu.so >> # libstatic-thread-variable.so tst-static-thread-variable.so \ >> >> tests += testrunner.so >> diff --git a/tests/tst-mmx-fpu.cc b/tests/tst-mmx-fpu.cc >> new file mode 100644 >> index 00000000..163d9f26 >> --- /dev/null >> +++ b/tests/tst-mmx-fpu.cc >> @@ -0,0 +1,69 @@ >> +#include <stdio.h> >> +#include <unistd.h> >> +#include <pthread.h> >> +#include <math.h> >> +#include <assert.h> >> +#include <string.h> >> +#include <stdlib.h> >> +#include <time.h> >> +#include <atomic> >> + >> +static std::atomic<bool> mmx_finished = { false }; >> + >> +static void *mmx(void *arg) { >> + printf("MMX START: %f\n", 10.5); >> + >> + // Execute some non-sensical code using MMX instructions >> + // in long-enough loop to be preempted by OSv thread scheduler >> + for (unsigned long i = 0; i < 500000000l; i++) { >> + asm volatile ("movq $0x0fffffff, %rdx \n\t"); >> + >> + asm volatile ("movq %rdx, %mm0 \n\t"); >> + asm volatile ("movq %rdx, %mm1 \n\t"); >> + asm volatile ("paddsb %mm0, %mm1 \n\t"); >> + } >> + asm volatile ("emms"); >> + >> + printf("MMX END: %f\n", 11.5); >> + return NULL; >> +} >> + >> +static void *fpu(void *arg) { >> + printf("FPU START: %f\n", 10.5); >> + >> + struct timespec ts; >> + ts.tv_sec = 0; >> + ts.tv_nsec = 1000; >> + >> + char buffer[10]; >> + while (!mmx_finished.load()) { >> + sprintf(buffer, "%f", 0.5); >> + assert(!strcmp(buffer, "0.500000")); >> + nanosleep(&ts, NULL); >> + } >> + >> + printf("FPU END: %f\n", 11.5); >> + return NULL; >> +} >> + >> +// This test verifies that two threads - one using MMX instructions >> +// and another one using FPU and yielding (see sleep()) - can execute >> +// correctly without affecting each other. To that end OSv >> +// needs to make sure that CPU is properly reset after MMX code >> +// gets preempted so that code on new thread can safely execute >> +// code using traditional FPU instructions >> > > Would be nice to refer to the relevant issue and say something about what > happened before it was fixed (i.e., did this test crash? fail on an > assertion > above? or what?). > By the way it is enough to have emms instruction added to fpu_state_save_* functions to have this unit test pass. But it is not enough to fix #536, #1010 and #1018 -> the second emms in switch_to() is necessary. > > +// >> +int main() { >> + printf("Hello from C code, %f\n", NAN); >> + >> + pthread_t mmx_thread, fpu_thread; >> + >> + pthread_create(&mmx_thread, NULL, mmx, NULL); >> + pthread_create(&fpu_thread, NULL, fpu, NULL); >> + >> + pthread_join(mmx_thread, NULL); >> + mmx_finished.store(true); >> + pthread_join(fpu_thread, NULL); >> + >> + return 0; >> +} >> -- >> 2.19.1 >> >> -- >> You received this message because you are subscribed to the Google Groups >> "OSv Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
