On Friday, January 25, 2019 at 6:31:11 PM UTC-5, Waldek Kozaczuk wrote:
>
> Thanks for reviewing this patch.
>
> On Wednesday, January 23, 2019 at 5:10:56 PM UTC-5, Nadav Har'El wrote:
>>
>> On Wed, Dec 19, 2018 at 2:26 PM Waldemar Kozaczuk <[email protected]> 
>> wrote:
>>
>>> This patch adds "emms" instruction in critical places
>>>
>>
>> Thank you so much for working on this, and sorry for having delayed my 
>> review for SO long.
>> I'm basically happy with this patch except one thing that nags me - I 
>> don't understand why the
>> *second* EMMS call, the one in switch_to(), is necessary. And I'd really 
>> like to understand.
>> I have a *guess* below why it might be necessary, and I would be really 
>> grateful if you could
>> check my guess. I also have a couple of other small comments below.
>>  
>>
>>> of OSv thread scheduler code to reset FPU state after it is saved
>>> upon involuntary context switch (for example interrupt) and right
>>> after executing context switch on new thread to clear FPU status word.
>>> Clearing FPU status word is necessary because old thread we are switching
>>> from, might have used MMX instructions and any code executing after that
>>> - new thread we are switching to - on the same CPU that needs to execute
>>> traditional FPU instructions would fail with FPU overflow exception 
>>> leading
>>> to all kind of mysterious and nasty bugs. For specific description look
>>> at the comments added to each line adding emms instruction.
>>>
>>> This patch also adds new unit test that verifies that FPU status
>>> word is cleared upon involuntary context switch from thread using MMX
>>> instruction to a thread that uses traditional FPU instructions -
>>> printf of a float number.
>>>
>>> Please note there are cases other than involuntary context switch
>>> where FPU gets into non-clean state right after context switch which
>>> may lead to other kinds of bugs like infinite loop in printf leading
>>> to a page fault (see #536, #1010 and #1018). This patch fixes those as 
>>> well
>>> however we have not been able to provide a unit test that would 
>>> demonstrate
>>> this.
>>>
>>> Fixes #1019
>>> Fixes #1020
>>>
>>> Fixes #536 - based on around 10 successful test runs where it would 
>>> crash every other
>>> time before the patch
>>>
>>> Fixes #1010 - based on over 100 successful test runs where it would 
>>> crash almost every time
>>>
>>> Fixes #1018 - based on over successful 50 test runs with /os/threads API 
>>> being used
>>> in tight loop and before patch would crash sporadically
>>>
>>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>>> ---
>>>  arch/x64/arch-cpu.cc    | 10 ++++++
>>>  arch/x64/arch-switch.hh |  8 +++++
>>>  modules/tests/Makefile  |  4 +--
>>>  tests/tst-mmx-fpu.cc    | 69 +++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 89 insertions(+), 2 deletions(-)
>>>  create mode 100644 tests/tst-mmx-fpu.cc
>>>
>>> diff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc
>>> index bb96af3d..f3bab022 100644
>>> --- a/arch/x64/arch-cpu.cc
>>> +++ b/arch/x64/arch-cpu.cc
>>> @@ -61,11 +61,21 @@ void fpu_state_init_fxsave(processor::fpu_state *s) {
>>>  extern "C"
>>>  void fpu_state_save_xsave(processor::fpu_state *s) {
>>>      processor::xsave(s, -1);
>>> +    // Reset FPU state after saving it as the thread we are about
>>> +    // to switch from might have been using MMX registers and
>>> +    // the thread scheduler that does FPU calculations needs
>>> +    // FPU to be a clean state
>>> +    asm volatile("emms");
>>>  }
>>>
>>>  extern "C"
>>>  void fpu_state_save_fxsave(processor::fpu_state *s) {
>>>      processor::fxsave(s);
>>> +    // Reset FPU state after saving it as the thread we are about
>>> +    // to switch from might have been using MMX registers and
>>> +    // the thread scheduler that does FPU calculations needs
>>> +    // FPU to be a clean state
>>> +    asm volatile("emms");
>>>
>>
>> Nitpick: Here you duplicate this code (and comment) twice, but this was 
>> supposed to be a low-level
>> function that does either xsave or fxsave, only. Maybe it makes more 
>> sense to add the extra "emms" 
>> call (and comment) in arch/x64/arch-cpu.hh, in the save_fpu::save() 
>> function, so it will only need to
>> appear once?
>>
> I do not remember why I did not do this. Will try to remove this 
> duplication. 
>
>>
>>  }
>>>
>>>  extern "C"
>>> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
>>> index e3769540..08db318a 100644
>>> --- a/arch/x64/arch-switch.hh
>>> +++ b/arch/x64/arch-switch.hh
>>> @@ -103,6 +103,14 @@ void thread::switch_to()
>>>             [rip]"i"(offsetof(thread_state, rip))
>>>           : "rbx", "rdx", "rsi", "rdi", "r8", "r9",
>>>             "r10", "r11", "r12", "r13", "r14", "r15", "memory");
>>> +    // As the catch-all solution, reset FPU state and more specifically
>>> +    // its status word as the old thread we have just switched from due 
>>> to voluntary
>>> +    // or involuntary reason may have been using MMX or caused FPU to 
>>> get
>>> +    // in a state where it needs to be cleared. Otherwise the new 
>>> thread we have
>>> +    // just switched to, may experience FPU overflow exceptions leading
>>> +    // to very nasty bugs like infinite loop or printing wrong floating
>>> +    // point number
>>>
>>
>> I'm not sure I fully understand this comment, or why this "emms" is needed
>> at all. I tried to explain in 
>> https://github.com/cloudius-systems/osv/issues/1020 that
>> it was *not* needed, but if I remember correctly you discovered that it 
>> *is* needed
>> so my explanation was obviously wrong somehow.
>>
>> So I'm really curious *why* this EMMS is needed beyond the one we added 
>> to the
>> FPU save) because it's possible that the actual scheduler code used the 
>> FPU and
>> then *inlined *the thread::switch_to() causing switch_to() to begin with 
>> an unclear
>> FPU stack. Can you please help me verify if this is (or isn't) the reason 
>> why this
>> second EMMS is needed? You can check this by adding  __attribute__ 
>> ((noinline)
>> to thread::switch_to(). If this makes this EMMS call unnecessary, we 
>> found the
>> reason (if this EMMS is still necessary, then we still don't understand 
>> something...).
>>
>
> This patch did not help fix the issue #1010 (I did not test #1018 and #536 
> this time but given the stack trace looks identical I think I am guessing 
> "noinline" fix would not help there either. But who knows. 
>
I meant your  "__attribute__ ((noinline)" suggestion on top of my patch 
(see the diff below) did not fix #1010.

>
> Here is how I changed the code on top of what this patch does:
> diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
> index 08db318a..8d685ebd 100644
> --- a/arch/x64/arch-switch.hh
> +++ b/arch/x64/arch-switch.hh
> @@ -74,7 +74,8 @@ void (*resolve_set_fsbase(void))(u64 v)
>
>  void set_fsbase(u64 v) __attribute__((ifunc("resolve_set_fsbase")));
>
> -void thread::switch_to()
> +__attribute__((noinline)) void thread::switch_to()
> +//void thread::switch_to()
>  {
>      thread* old = current();
>      // writing to fs_base invalidates memory accesses, so surround with
> @@ -110,7 +111,7 @@ void thread::switch_to()
>      // just switched to, may experience FPU overflow exceptions leading
>      // to very nasty bugs like infinite loop or printing wrong floating
>      // point number
> -    asm volatile ("emms");
> +    //asm volatile ("emms");
>      processor::fldcw(fpucw);
>      processor::ldmxcsr(mxcsr);
>  }
>
> Also running this before and after applying your suggestion did not change 
> anything (it seems to me that switch_to was not inlined in first place, is 
> my thinking wrong?). Otherwise the "switch_to" symbol should not show up in 
> the elf, should it?
> readelf -s build/release/loader-stripped.elf | grep switch
>   1566: 00000000003f69c0   124 FUNC    GLOBAL DEFAULT    2 
> _ZN5sched6thread15switch_
>   9593: 00000000003f6910   172 FUNC    GLOBAL DEFAULT    2 
> _ZN5sched6thread9switch_t
>  14840: 00000000003a2330    21 FUNC    GLOBAL DEFAULT    2 
> _ZN3mmu29switch_to_runtim
>
> Finally OSv scheduler for sure does do floating point calculation, but is 
> it a way that corrupts FPU, I do not know. I believe 
> thread_runtime::time_until() 
> calls taulog() that uses floats to calculate some expression. See 
> https://github.com/cloudius-systems/osv/blob/a3cd022fcda2c88eae89476aa6c29e3c4be04926/core/sched.cc#L1759-L1773.
>  
> The time_until() is called by cpu::reschedule_from_interrupt() which 
> eventually calls switch_to().
>
>>
>> By the way, which of the various tests you ran allows you to check that 
>> this EMMS
>> here is necessary?
>>
> The second EMMS in switch_to() is necessary to fix #536, #1010 and #1018. 
> The #1010 is easiest to reproduce - I added original steps there. I you 
> have time I am sending a patch with some shell scripts that let you 
> replicate this issue (look 
> at fpu_test_scripts/run_ffmpeg_extract_frames.sh).
>  
>
>>
>> P.S. instead of a long comment which I'm not sure is 100% accurate, you
>> can also just point to issue 1020 in the comment.
>>
>> Sure. 
>
>>
>> +    asm volatile ("emms");
>>>      processor::fldcw(fpucw);
>>>      processor::ldmxcsr(mxcsr);
>>>  }
>>> diff --git a/modules/tests/Makefile b/modules/tests/Makefile
>>> index ece5f846..5c94e549 100644
>>> --- a/modules/tests/Makefile
>>> +++ b/modules/tests/Makefile
>>> @@ -116,8 +116,8 @@ tests := tst-pthread.so misc-ramdisk.so tst-vblk.so 
>>> tst-bsd-evh.so \
>>>         tst-ifaddrs.so tst-pthread-affinity-inherit.so 
>>> tst-sem-timed-wait.so \
>>>         tst-ttyname.so tst-pthread-barrier.so tst-feexcept.so 
>>> tst-math.so \
>>>         tst-sigaltstack.so tst-fread.so tst-tcp-cork.so tst-tcp-v6.so \
>>> -       tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so
>>> -
>>> +       tst-calloc.so tst-crypt.so tst-non-fpic.so tst-small-malloc.so \
>>> +       tst-mmx-fpu.so
>>>  #      libstatic-thread-variable.so tst-static-thread-variable.so \
>>>
>>>  tests += testrunner.so
>>> diff --git a/tests/tst-mmx-fpu.cc b/tests/tst-mmx-fpu.cc
>>> new file mode 100644
>>> index 00000000..163d9f26
>>> --- /dev/null
>>> +++ b/tests/tst-mmx-fpu.cc
>>> @@ -0,0 +1,69 @@
>>> +#include <stdio.h>
>>> +#include <unistd.h>
>>> +#include <pthread.h>
>>> +#include <math.h>
>>> +#include <assert.h>
>>> +#include <string.h>
>>> +#include <stdlib.h>
>>> +#include <time.h>
>>> +#include <atomic>
>>> +
>>> +static std::atomic<bool> mmx_finished = { false };
>>> +
>>> +static void *mmx(void *arg) {
>>> +    printf("MMX START: %f\n", 10.5);
>>> +
>>> +    // Execute some non-sensical code using MMX instructions
>>> +    // in long-enough loop to be preempted by OSv thread scheduler
>>> +    for (unsigned long i = 0; i < 500000000l; i++) {
>>> +        asm volatile ("movq $0x0fffffff, %rdx \n\t");
>>> +
>>> +        asm volatile ("movq %rdx, %mm0 \n\t");
>>> +        asm volatile ("movq %rdx, %mm1 \n\t");
>>> +        asm volatile ("paddsb %mm0, %mm1 \n\t");
>>> +    }
>>> +    asm volatile ("emms");
>>> +
>>> +    printf("MMX END: %f\n", 11.5);
>>> +    return NULL;
>>> +}
>>> +
>>> +static void *fpu(void *arg) {
>>> +    printf("FPU START: %f\n", 10.5);
>>> +
>>> +    struct timespec ts;
>>> +    ts.tv_sec = 0;
>>> +    ts.tv_nsec = 1000;
>>> +
>>> +    char buffer[10];
>>> +    while (!mmx_finished.load()) {
>>> +        sprintf(buffer, "%f", 0.5);
>>> +        assert(!strcmp(buffer, "0.500000"));
>>> +        nanosleep(&ts, NULL);
>>> +    }
>>> +
>>> +    printf("FPU END: %f\n", 11.5);
>>> +    return NULL;
>>> +}
>>> +
>>> +// This test verifies that two threads - one using MMX instructions
>>> +// and another one using FPU and yielding (see sleep()) - can execute
>>> +// correctly without affecting each other. To that end OSv
>>> +// needs to make sure that CPU is properly reset after MMX code
>>> +// gets preempted so that code on new thread can safely execute
>>> +// code using traditional FPU instructions
>>>
>>
>> Would be nice to refer to the relevant issue and say something about what
>> happened before it was fixed (i.e., did this test crash? fail on an 
>> assertion
>> above? or what?).
>>
> By the way it is enough to have emms instruction added to  
> fpu_state_save_* functions  to have this unit test pass. But it is not 
> enough to fix #536, #1010 and #1018 -> the second emms in switch_to() is 
> necessary.
>
>>
>> +//
>>> +int main() {
>>> +    printf("Hello from C code, %f\n", NAN);
>>> +
>>> +    pthread_t mmx_thread, fpu_thread;
>>> +
>>> +    pthread_create(&mmx_thread, NULL, mmx, NULL);
>>> +    pthread_create(&fpu_thread, NULL, fpu, NULL);
>>> +
>>> +    pthread_join(mmx_thread, NULL);
>>> +    mmx_finished.store(true);
>>> +    pthread_join(fpu_thread, NULL);
>>> +
>>> +    return 0;
>>> +}
>>> -- 
>>> 2.19.1
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OSv Development" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to