On Thu, Nov 29, 2018 at 9:19 PM Waldek Kozaczuk <[email protected]>
wrote:

> Implemented checksum logic (please let me know if I have made any stupid
> mistakes).
>
> diff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc
> index bb96af3d..34bf6081 100644
> --- a/arch/x64/arch-cpu.cc
> +++ b/arch/x64/arch-cpu.cc
> @@ -42,6 +42,14 @@ exception_guard::~exception_guard()
>      sched::cpu::current()->arch.exit_exception();
>  }
>
> +static inline u64 calculate_fpu_state_checksum(processor::fpu_state *s) {
> +    u64 val = 0;
> +    char* data = s->legacy;
> +    for (u64 i = 0; i < sizeof(processor::fpu_state) - sizeof(u64); i++)
>

What is this "- sizeof(u64)"?

> +       val += data[i];
>

Addition isn't the greatest hash function, but it is probably good enough
to detect random corruption.
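(If we ever want something a bit stronger that is still cheap, a hypothetical
FNV-1a-style variant of the same function - not part of the patch above, and
keeping the same "- sizeof(u64)" bound it uses - could look like this:

    static inline u64 calculate_fpu_state_checksum(processor::fpu_state *s) {
        const unsigned char* data = (const unsigned char*)s;
        u64 val = 14695981039346656037ULL;             // FNV-1a offset basis
        for (u64 i = 0; i < sizeof(processor::fpu_state) - sizeof(u64); i++) {
            val ^= data[i];                            // mix in one byte...
            val *= 1099511628211ULL;                   // ...then multiply by the FNV prime
        }
        return val;
    }
)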


> +    return val;
> +}
> +
>  extern "C"
>  [[gnu::target("no-sse")]]
>  void fpu_state_init_xsave(processor::fpu_state *s) {
> @@ -61,21 +69,25 @@ void fpu_state_init_fxsave(processor::fpu_state *s) {
>  extern "C"
>  void fpu_state_save_xsave(processor::fpu_state *s) {
>      processor::xsave(s, -1);
> +    s->checksum = calculate_fpu_state_checksum(s);
>

I hope you verified that indeed xsave or fxsave was used in your test
setup, and not the old x87 FPU saving code?
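(One hypothetical way to double-check from inside the guest - this is not OSv
code, just the compiler's cpuid helper; CPUID leaf 1 reports XSAVE support in
ECX bit 26 and FXSR support in EDX bit 24:

    #include <cpuid.h>
    #include <stdio.h>

    static void report_fpu_save_support() {
        unsigned int eax, ebx, ecx, edx;
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            // this only says what the CPU supports, not which path OSv picked,
            // but if xsave is 0 the xsave path cannot have been used
            printf("xsave: %u, fxsr: %u\n", (ecx >> 26) & 1, (edx >> 24) & 1);
        }
    }
)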

>  }
>
>  extern "C"
>  void fpu_state_save_fxsave(processor::fpu_state *s) {
>      processor::fxsave(s);
> +    s->checksum = calculate_fpu_state_checksum(s);
>  }
>
>  extern "C"
>  void fpu_state_restore_xsave(processor::fpu_state *s) {
>      processor::xrstor(s, -1);
> +    assert(s->checksum == calculate_fpu_state_checksum(s));
>  }
>
>  extern "C"
>  void fpu_state_restore_fxsave(processor::fpu_state *s) {
>      processor::fxrstor(s);
> +    assert(s->checksum == calculate_fpu_state_checksum(s));
>  }
>
>  extern "C"
> diff --git a/arch/x64/processor.hh b/arch/x64/processor.hh
> index 6b1fbf66..a26fb028 100644
> --- a/arch/x64/processor.hh
> +++ b/arch/x64/processor.hh
> @@ -343,6 +343,7 @@ struct fpu_state {
>      char xsavehdr[24];
>      char reserved[40];
>      char ymm[256];
> +    u64 checksum;
>  } __attribute__((packed));
>
>  inline void fxsave(fpu_state* s)
>
> :-( Unfortunately, no fpu_state corruption was detected, and I am still
> getting the same faults in the same place.
>
> On Thursday, November 29, 2018 at 12:28:53 PM UTC-5, Waldek Kozaczuk wrote:
>>
>> Nadav,
>>
>> I have found this email thread around patches adding AVX support -
>> https://groups.google.com/forum/#!msg/osv-dev/DIbzudXr0jo/glTrLgjFOssJ;context-place=forum/osv-dev
>> .
>>
>> There is a comment you left specifically about the "per-thread fpu save
>> area". Maybe that is something we need to pursue once we confirm that the
>> fpu_state on the interrupt stack somehow gets corrupted. I still need to code
>> the checksum logic to verify that it indeed happens.
>>
>> "When we leave a thread during an involuntary context switch (e.g.,
>> interrupt), we need to XSAVEOPT its state in the same place we last
>> XRESTORed it from when entering this thread - so this can't be a random
>> location on the stack chosen at interrupt time (as in inplace_fpu_state)
>> because then it won't be the same address used last time and the
>> optimization won't work. So we need to have a per-thread fpu save area, and
>> save the state *there* on interrupt. Funny enough, we already had such a
>> tread->_fpu area in the past, but removed it (commit
>> 202b2cccc14a43d088002a80f7add51e3e4bc4ce)."
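(For illustration only, a hypothetical sketch of that per-thread idea - this is
not code that exists in OSv today:

    // give each thread a fixed save area, so XSAVEOPT always sees the same address
    struct thread_arch_sketch {
        processor::fpu_state fpu;
    };

    // on an involuntary switch away from a thread:
    //     processor::xsave(&thread->arch.fpu, -1);
    // and when switching back to it:
    //     processor::xrstor(&thread->arch.fpu, -1);
    // because the address is stable across switches, XSAVEOPT's optimization
    // can actually take effect.
)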
>>
>> On Wednesday, November 28, 2018 at 5:03:31 PM UTC-5, Waldek Kozaczuk
>> wrote:
>>>
>>>
>>>
>>> On Wednesday, November 28, 2018 at 4:51:08 PM UTC-5, Nadav Har'El wrote:
>>>>
>>>> On Wed, Nov 28, 2018 at 9:20 PM Waldek Kozaczuk <[email protected]>
>>>> wrote:
>>>>
>>>>> The title of this thread indicates that the crash happens sporadically,
>>>>> and indeed that is the case with the scenario described in the 1st and 2nd
>>>>> emails of this thread. The symptom (a stack trace with the fmt_fp() function
>>>>> captured by gdb) looks identical to what was reported in
>>>>> https://github.com/cloudius-systems/osv/issues/1010 and
>>>>> https://github.com/cloudius-systems/osv/issues/536. We cannot be
>>>>> certain whether the root cause is the same. However, it is worth
>>>>> mentioning that the crash happens with the same symptom almost every time
>>>>> if I run the scenario described in issue #1010, which is different from
>>>>> (but similar to) the scenario described in the 1st and 2nd emails of this
>>>>> thread. I am emphasizing this because we have a way to replicate one of
>>>>> these crashes (*possibly with the same root cause*) in a very repeatable
>>>>> way, in case we have ideas of things to try.
>>>>>
>>>>>
>>>>
>>>> That is good. If we give up on this effort (at least temporarily),
>>>> please summarize what you know about reproducing this issue in a new
>>>> bug-tracker issue (and reference 1010 and 536) so we could return to it
>>>> later.
>>>>
>>>>
>>>>> Regarding your excellent explanation of how non-FPU and FPU state is
>>>>> saved/restored by OSv, I wanted to run through a scenario to see if I
>>>>> understand it well (please forgive my ignorance if I am missing something
>>>>> obvious). Imagine we have 2 user threads (uT1 and uT2) executing some *logic
>>>>> that involves floating point operations in a loop* (like the one in
>>>>> the disassembled code). FFmpeg is a good example, as it must be very CPU
>>>>> intensive and do all kinds of FP computations to transcode video. Let us
>>>>> say that on a given CPU this sequence happens over time:
>>>>>
>>>>> ->uT1 is running (executing a long FP logic loop)
>>>>> --> a timer-triggered interrupt happens because the allotted time slice
>>>>> expired [some FPU registers had in-flight data = FPU has state S1]
>>>>> ----> OSv saves the FPU state and other registers using the fpu_lock construct
>>>>>
>>>>
>>>> So far, right. This state is saved on the interrupt stack, which,
>>>> while thread uT1 is running, is uT1->_arch->interrupt_stack.
>>>> By the way, note that the regular (non-FPU) registers are also on this
>>>> stack - the code in entry.S saves them on the stack and passes a pointer
>>>> to them when it calls interrupt().
>>>>
>>>>> -------> OSv identifies the next thread to run, which is uT2
>>>>> ---------> OSv restores the FPU to state S1 (fpu_lock goes out of scope)
>>>>> and other registers as well, and switches to uT2
>>>>>
>>>>
>>>> No, this is not what happens now (at least, not if you mean that "S1"
>>>> was uT1's state...). Before interrupt() ends or fpu_lock runs out of scope,
>>>> interrupt calls sched::preempt().
>>>> This is a weird function. It doesn't do something quickly and return
>>>> :-) Rather, it calls the scheduler to decide which thread to run next. If
>>>> it decides to run uT1 again, it just returns normally (and the FPU state is
>>>> restored and the interrupt returns).
>>>> But if it decides to run uT2 it doesn't quite return... Rather, it
>>>> *switches* to thread uT2. This means we switch to uT2's stack, and jump to
>>>> the program counter where uT2 was before. And where was uT2 running before?
>>>> One of the likely options was that it was itself in interrupt() calling
>>>> preempt() :-) So now, finally, when uT2 resumes, its call to preempt()
>>>> "returns", and we reach the end of the interrupt() function. Remember that
>>>> at this point we're running uT2, with uT2's stack. So when interrupt()
>>>> ends, the fpu_lock goes out of scope, and the FPU state is popped from
>>>> *uT2*'s stack. This was uT2's FPU state, saved long ago, and now we're
>>>> restoring it, and when interrupt() returns, uT2's correct FPU state will
>>>> have been restored.
>>>>
>>> Thanks, I get it now. The key thing is that the FPU state is saved (pushed)
>>> onto the current thread's stack and restored (popped) from the new thread's
>>> stack.
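(To make that mechanism concrete, a minimal hypothetical sketch of the RAII
pattern - not the actual fpu_lock implementation in OSv:

    struct fpu_lock_sketch {
        processor::fpu_state state;                        // lives in this stack frame
        fpu_lock_sketch()  { processor::fxsave(&state); }  // save on entering interrupt()
        ~fpu_lock_sketch() { processor::fxrstor(&state); } // restore when it leaves scope
    };

    void interrupt_sketch() {
        fpu_lock_sketch guard;  // saved on the *current* thread's interrupt stack
        sched::preempt();       // may switch to another thread; when this thread is
                                // eventually resumed, execution continues here, on the
                                // same stack, so the destructor restores the state
                                // that this same thread saved earlier
    }
)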
>>>
>>>>
>>>>
>>>>> --------------> uT2 is running (executing a long FP logic loop)
>>>>> --> a timer-triggered interrupt happens again because the allotted time
>>>>> slice expired [some FPU registers had in-flight data = FPU has state S2]
>>>>> ... same thing
>>>>> ... same thing
>>>>> ---------> OSv restores the FPU to state S2 and other registers as well,
>>>>> and switches back to uT1
>>>>> *--------------> uT1 resumes where it left off above, and instead of the
>>>>> FPU being in the expected state S1, it sees the FPU in state S2 -> crash????*
>>>>>
>>>>> What am I missing in my scenario?
>>>>>
>>>>> Lastly, I am adding the disassembled portion of fmt_fp() with the patch
>>>>> below applied, which makes the repeatable crash (from issue #1010) go away,
>>>>> or at least become far less frequent.
>>>>>
>>>>> diff --git a/libc/stdio/vfprintf.c b/libc/stdio/vfprintf.c
>>>>> index aac790c0..1e116038 100644
>>>>> --- a/libc/stdio/vfprintf.c
>>>>> +++ b/libc/stdio/vfprintf.c
>>>>> @@ -8,6 +8,7 @@
>>>>> #include <inttypes.h>
>>>>> #include <math.h>
>>>>> #include <float.h>
>>>>> +#include <assert.h>
>>>>>  /* Some useful macros */
>>>>> @@ -296,9 +297,14 @@ static int fmt_fp(FILE *f, long double y, int w,
>>>>> int p, int fl, int t)
>>>>>        if (e2<0) a=r=z=big;
>>>>>        else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;
>>>>> +        int steps = 0;
>>>>>        do {
>>>>>                *z = y;
>>>>>                y = 1000000000*(y-*z++);
>>>>> +                steps++;
>>>>> +               if(steps > 2000) {
>>>>> +                  assert(0);
>>>>> +                }
>>>>>        } while (y);
>>>>>         while (e2>0) {
>>>>>
>>>>> Your explanation was that because of the added assert() function call we
>>>>> force the compiler to generate code to save/restore any FPU registers in use.
>>>>>
>>>>
>>>> Yes, my thesis (and I'm too tired now to verify this in the assembler
>>>> code below) was that because there is this function call (and the compiler
>>>> doesn't know if it is commonly called or not), the compiler doesn't want to
>>>> leave y in a register, so it will load and store it from memory. But this
>>>> is just a guess.
>>>> You can try instead of assert(0), do
>>>>
>>>> asm volatile ("cli; hlt" : : : "memory");
>>>>
>>>> This is not a function call, so it will not change the compiled code.
>>>> It will allow you to connect with a debugger to print more stuff at that
>>>> point. Note that this cli; hlt stuff will only hang the single CPU that ran
>>>> this command (if you can reproduce this with -c1, this will be the only cpu
>>>> so things will be easier).
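(For example, the patched loop from the snippet above would then look something
like this, with the asm statement simply replacing the assert(0):

            int steps = 0;
            do {
                    *z = y;
                    y = 1000000000*(y-*z++);
                    steps++;
                    if (steps > 2000) {
                            /* hang this CPU so it can be inspected from gdb */
                            asm volatile ("cli; hlt" : : : "memory");
                    }
            } while (y);
)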
>>>>
>>>>
>>>>
>>>>> But if we do not see the crash happen any more with this patch applied,
>>>>> then the assert function never gets called and the FPU is not
>>>>> saved/restored, so how does this help make the crash go away?
>>>>>
>>>>> Portion of disassembled code:
>>>>>       if (e2<0) a=r=z=big;
>>>>>         else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;
>>>>>
>>>>>         int steps = 0;
>>>>>         do {
>>>>>                 *z = y;
>>>>>      501:       d9 bd 0e e3 ff ff       fnstcw -0x1cf2(%rbp)
>>>>>         if (e2<0) a=r=z=big;
>>>>>      507:       48 8d 85 40 e3 ff ff    lea    -0x1cc0(%rbp),%rax
>>>>>      50e:       45 85 c9                test   %r9d,%r9d
>>>>>      511:       48 8d 95 cc fe ff ff    lea    -0x134(%rbp),%rdx
>>>>>      518:       48 89 c1                mov    %rax,%rcx
>>>>>                 *z = y;
>>>>>      51b:       0f b7 85 0e e3 ff ff    movzwl -0x1cf2(%rbp),%eax
>>>>>         if (e2<0) a=r=z=big;
>>>>>      522:       48 0f 49 ca             cmovns %rdx,%rcx
>>>>>                 *z = y;
>>>>>      526:       80 cc 0c                or     $0xc,%ah
>>>>>         if (e2<0) a=r=z=big;
>>>>>      529:       48 89 8d d8 e2 ff ff    mov    %rcx,-0x1d28(%rbp)
>>>>>                 y = 1000000000*(y-*z++);
>>>>>      530:       48 8d 59 04             lea    0x4(%rcx),%rbx
>>>>>      534:       48 8d 91 44 1f 00 00    lea    0x1f44(%rcx),%rdx
>>>>>                 *z = y;
>>>>>      53b:       66 89 85 0c e3 ff ff    mov    %ax,-0x1cf4(%rbp)
>>>>>      542:       d9 c0                   fld    %st(0)
>>>>>      544:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
>>>>>      54a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
>>>>>      550:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
>>>>>      556:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
>>>>>      55d:       89 01                   mov    %eax,(%rcx)
>>>>>                 y = 1000000000*(y-*z++);
>>>>>      55f:       89 c0                   mov    %eax,%eax
>>>>>      561:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
>>>>>      568:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
>>>>>      56e:       de e9                   fsubrp %st,%st(1)
>>>>>      570:       d9 05 00 00 00 00       flds   0x0(%rip)        # 576
>>>>> <fmt_fp+0x176>
>>>>>      576:       dc c9                   fmul   %st,%st(1)
>>>>>                 steps++;
>>>>>                 if(steps > 2000) {
>>>>>      578:       eb 48                   jmp    5c2 <fmt_fp+0x1c2>
>>>>>      57a:       d9 c9                   fxch   %st(1)
>>>>>      57c:       eb 04                   jmp    582 <fmt_fp+0x182>
>>>>>      57e:       66 90                   xchg   %ax,%ax
>>>>>      580:       d9 c9                   fxch   %st(1)
>>>>>                 *z = y;
>>>>>      582:       d9 c0                   fld    %st(0)
>>>>>      584:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
>>>>>      58a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
>>>>>      590:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
>>>>>                 y = 1000000000*(y-*z++);
>>>>>      596:       48 83 c3 04             add    $0x4,%rbx
>>>>>                 *z = y;
>>>>>      59a:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
>>>>>      5a1:       89 43 fc                mov    %eax,-0x4(%rbx)
>>>>>                 y = 1000000000*(y-*z++);
>>>>>      5a4:       89 c0                   mov    %eax,%eax
>>>>>      5a6:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
>>>>>      5ad:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
>>>>>      5b3:       de e9                   fsubrp %st,%st(1)
>>>>>      5b5:       d8 c9                   fmul   %st(1),%st
>>>>>                 if(steps > 2000) {
>>>>>      5b7:       48 39 d3                cmp    %rdx,%rbx
>>>>>      5ba:       0f 84 99 10 00 00       je     1659 <fmt_fp+0x1259>
>>>>>      5c0:       d9 c9                   fxch   %st(1)
>>>>>                    assert(0);
>>>>>                 }
>>>>>         } while (y);
>>>>>      5c2:       d9 ee                   fldz
>>>>>      5c4:       d9 ca                   fxch   %st(2)
>>>>>      5c6:       db ea                   fucomi %st(2),%st
>>>>>      5c8:       dd da                   fstp   %st(2)
>>>>>      5ca:       7a ae                   jp     57a <fmt_fp+0x17a>
>>>>>      5cc:       75 b2                   jne    580 <fmt_fp+0x180>
>>>>>      5ce:       dd d8                   fstp   %st(0)
>>>>>      5d0:       dd d8                   fstp   %st(0)
>>>>>
>>>>> On Wednesday, November 28, 2018 at 8:14:05 AM UTC-5, Nadav Har'El
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 28, 2018 at 2:18 PM Waldek Kozaczuk <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Nov 28, 2018, at 03:58, Nadav Har'El <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> The situation is different with involuntary context switches. When
>>>>>>> an asynchronous event, e.g., an interrupt, occurs, the user thread is 
>>>>>>> in a
>>>>>>> random position in the code. It may be using all its registers, and the 
>>>>>>> FPU
>>>>>>> state (including the old-style FPU and the new SSE and AVX registers).
>>>>>>> Because our interrupt handler (which may do anything from running the
>>>>>>> scheduler to reading a page from disk on page fault) may need to use 
>>>>>>> any of
>>>>>>> these registers, all of them, including the FPU, need to be saved on
>>>>>>> interrupt time. The interrupt has a separate stack, and the FPU is 
>>>>>>> saved on
>>>>>>> this stack (see fpu_lock use in interrupt()). When the interrupt 
>>>>>>> finishes,
>>>>>>> this FPU is restored. This includes involuntary context switching: 
>>>>>>> thread A
>>>>>>> receives an interrupt, saves the FPU, does something and decides to 
>>>>>>> switch
>>>>>>> to thread B, and a while later we switch back to thread A at which point
>>>>>>> the interrupt handler "returns" and restores the FPU state.
>>>>>>>
>>>>>>> Does the involuntary case include the scenario where the time designated
>>>>>>> for the current thread by the scheduler expires? I would imagine this
>>>>>>> would qualify as an interrupt?
>>>>>>>
>>>>>>
>>>>>> Indeed. We set a timer to when this thread's runtime quota will
>>>>>> expire, and this timer generates an interrupt. Check out interrupt() -
>>>>>> after saving the FPU state and acknowledging the interrupt (EOI), it 
>>>>>> calls
>>>>>> the scheduler (sched::preempt()). This will decide which thread to run 
>>>>>> next
>>>>>> - it may run the same thread again, or a different thread. When
>>>>>> sched::preempt() returns - possibly a long time later - it means the
>>>>>> scheduler decided to run *this* thread again. At that point, interrupt()
>>>>>> returns and just before returning it restores the FPU state 
>>>>>> automatically.
>>>>>>
