Re: OSv crashes fairly sporadically with page fault when transcoding video with ffmpeg

Waldek Kozaczuk Thu, 29 Nov 2018 09:29:04 -0800

Nadav,

I have found this email thread around patches adding AVX support 
- 
https://groups.google.com/forum/#!msg/osv-dev/DIbzudXr0jo/glTrLgjFOssJ;context-place=forum/osv-dev.


There is a comment you left specifically about "per-thread fpu save area". 
Maybe that is something we need to pursue once we confirm that the 
fpu_state on interrupt stack somehow gets corrupted. I still need to code 
that checksum logic to verify that indeed it happens.

"When we leave a thread during an involuntary context switch (e.g., 
interrupt), we need to XSAVEOPT its state in the same place we last 
XRESTORed it from when entering this thread - so this can't be a random 
location on the stack chosen at interrupt time (as in inplace_fpu_state) 
because then it won't be the same address used last time and the 
optimization won't work. So we need to have a per-thread fpu save area, and 
save the state *there* on interrupt. Funny enough, we already had such a 
tread->_fpu area in the past, but removed it (commit 
202b2cccc14a43d088002a80f7add51e3e4bc4ce)."

On Wednesday, November 28, 2018 at 5:03:31 PM UTC-5, Waldek Kozaczuk wrote:
>
>
>
> On Wednesday, November 28, 2018 at 4:51:08 PM UTC-5, Nadav Har'El wrote:
>>
>> On Wed, Nov 28, 2018 at 9:20 PM Waldek Kozaczuk <[email protected]> 
>> wrote:
>>
>>> The title of this thread indicates that the crash happens sporadically 
>>> and indeed it is the case with the scenario described in 1st and 2nd email 
>>> of this thread. The symptom (stack trace with fmt_fp() function captured by 
>>> gdb) looks identical to what was reported in 
>>> https://github.com/cloudius-systems/osv/issues/1010 and this 
>>> https://github.com/cloudius-systems/osv/issues/536. We cannot be 
>>> certain about whether the root cause is the same. However it is worth 
>>> mentioning that crash happens with same symptom almost every time if I run 
>>> the scenario described in issue #1010 which is different (but similar) than 
>>> scenario described in 1st and 2nd email of this thread. I am 
>>> emphasizing this because we have a way to replicate one of these crashes 
>>> (*possibly 
>>> same root cause*) in very repeatable way in case we have ideas of 
>>> things to try.
>>>
>>
>> That is good. If we give up on this effort (at least temporarily), please 
>> summarize what you know about reproducing this issue in a new bug-tracker 
>> issue (and reference 1010 and 536) so we could return to it later.
>>
>>
>>> Regarding your excellent explanations how non-FPU and FPU state is 
>>> saved/restored by OSv I wanted to run a scenario to see if I understand it 
>>> well (please forgive my ignorance if I am missing something obvious). 
>>> Imagine we have 2 user threads (uT1 and uT2) executing some *logic that 
>>> involves floating point operations in a loop* (like the one in the 
>>> disassembled code). Ffmpeg is a good example as it must be very CPU 
>>> intensive and doing all kind of FP computations to transcode video. Let us 
>>> say on given cpu this sequence happens in time:
>>>
>>> ->uT1 is running (executing long FP logic loop) 
>>> --> timer triggered interrupt happens because allotted time slice 
>>> expired [some FPU registers had some inflight data = FPU has state S1)
>>> ----> OSv saves FPU state and other registers using fpu_lock construct
>>>
>>
>> So far, right. This state is saved inside the interrupt stack, which 
>> while thread uT1 is running, is  uT1->_arch->interrupt_stack.
>> By the way, note that the regular (non-FPU) registers are also on this 
>> stack - the code in entry.S saves them on the stack and passes the pointer 
>> to it when it calls interrupt().
>>
>> -------> OSv identifies next thread to run which is uT2
>>> ---------> OSv restores FPU to state S1 (fpu_lock goes out of scope) and 
>>> other registers as well and switches to uT2
>>>
>>
>> No, this is not what happens now (at least, not if you mean that "S1"  
>> was uT1's state...). Before interrupt() ends or fpu_lock runs out of scope, 
>> interrupt calls sched::preempt().
>> This is a weird function. It doesn't do something quickly and return :-) 
>> Rather, it calls the scheduler to decide which thread to run next. If it 
>> decides to run uT1 again, it just returns normally (and the FPU state is 
>> restored and the interrupt returns). 
>> But if it decides to run uT2 it doesn't quite return... Rather, it 
>> *switches* to thread uT2. This means we switch to uT2's stack, and jump to 
>> the program counter where uT2 was before. And where was uT2 running before? 
>> One of the likely options was that it was itself in interrupt() calling 
>> preempt() :-) So now, finally, when uT2 resumes, its call to preempt() 
>> "returns", and we reach the end of the interrupt() function. Remember that 
>> at this point we're running uT2, with uT2's stack. So when interrupt() 
>> ends, the fpu_lock goes out of scope, and the FPU state is popped from 
>> *uT2*'s stack. This is was uT2's FPU state, saved long ago, and now we're 
>> restoring it, and when interrupt() returns, uT2's correct FPU state will 
>> have been restored. 
>>
> Thanks. I get it now. The key thing is that FPU state is saved (pushed) to 
> current and restored from (popped) new thread's stack. 
>
>>  
>>
>>> --------------> UT2 is running (executing long FP logic loop) 
>>> --> timer triggered interrupt happens again because allotted time slice 
>>> expired [some FPU registers had some inflight data = FPU has state S2)
>>> ... same thing 
>>> ... same thing 
>>> ---------> OSv restores FPU to state S2 and other registers as well and 
>>> switches back to uT1
>>> *--------------> UT1 is resuming where it left above and instead of FPU 
>>> be in expected S1 state it sees FPU in state S2 -> crash????*
>>>
>>> What am I missing in my scenario?
>>>
>>> Lastly I am adding the disassembled portion of fmt_fp() with applied 
>>> patch like below that makes the repeatable crash (from issue #1010) go away 
>>> or at least way less frequent. 
>>>
>>> diff --git a/libc/stdio/vfprintf.c b/libc/stdio/vfprintf.c
>>> index aac790c0..1e116038 100644
>>> --- a/libc/stdio/vfprintf.c
>>> +++ b/libc/stdio/vfprintf.c
>>> @@ -8,6 +8,7 @@
>>> #include <inttypes.h>
>>> #include <math.h>
>>> #include <float.h>
>>> +#include <assert.h>
>>>  /* Some useful macros */
>>> @@ -296,9 +297,14 @@ static int fmt_fp(FILE *f, long double y, int w, 
>>> int p, int fl, int t)
>>>        if (e2<0) a=r=z=big;
>>>        else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;
>>> +        int steps = 0;
>>>        do {
>>>                *z = y;
>>>                y = 1000000000*(y-*z++);
>>> +                steps++;
>>> +               if(steps > 2000) {
>>> +                  assert(0);
>>> +                }
>>>        } while (y);
>>>         while (e2>0) {
>>>
>>> Your explanation was that because of added assert() function call we 
>>> force compiler to generate code to save/restore any used FPU registers.
>>>
>>
>> Yes, my thesis (and I'm too tired now to verify this in the assembler 
>> code below) was that because there is this function call (and the compiler 
>> doesn't know if it is commonly called or not), the compiler doesn't want to 
>> leave y in a register, so it will load and store it from memory. But this 
>> is just a guess.
>> You can try instead of assert(0), do
>>
>> asm volatile ("cli; hlt" : : : "memory");
>>
>> This is not a function call, so it will not change the compiled code. It 
>> will allow you to connect with a debugger to print more stuff at that 
>> point. Note that this cli; hlt stuff will only hang the single CPU that ran 
>> this command (if you can reproduce this with -c1, this will be the only cpu 
>> so things will be easier).
>>
>>  
>>
>>> But if we do not see the crash with this patch applied happeny any more, 
>>> then the assert function does not get called and FPU not saved/restored, so 
>>> how this helps make this crash go away?
>>>
>>> Portion of disassembled code:
>>>       if (e2<0) a=r=z=big;
>>>         else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;
>>>
>>>         int steps = 0;
>>>         do {
>>>                 *z = y;
>>>      501:       d9 bd 0e e3 ff ff       fnstcw -0x1cf2(%rbp)
>>>         if (e2<0) a=r=z=big;
>>>      507:       48 8d 85 40 e3 ff ff    lea    -0x1cc0(%rbp),%rax
>>>      50e:       45 85 c9                test   %r9d,%r9d
>>>      511:       48 8d 95 cc fe ff ff    lea    -0x134(%rbp),%rdx
>>>      518:       48 89 c1                mov    %rax,%rcx
>>>                 *z = y;
>>>      51b:       0f b7 85 0e e3 ff ff    movzwl -0x1cf2(%rbp),%eax
>>>         if (e2<0) a=r=z=big;
>>>      522:       48 0f 49 ca             cmovns %rdx,%rcx
>>>                 *z = y;
>>>      526:       80 cc 0c                or     $0xc,%ah
>>>         if (e2<0) a=r=z=big;
>>>      529:       48 89 8d d8 e2 ff ff    mov    %rcx,-0x1d28(%rbp)
>>>                 y = 1000000000*(y-*z++);
>>>      530:       48 8d 59 04             lea    0x4(%rcx),%rbx
>>>      534:       48 8d 91 44 1f 00 00    lea    0x1f44(%rcx),%rdx
>>>                 *z = y;
>>>      53b:       66 89 85 0c e3 ff ff    mov    %ax,-0x1cf4(%rbp)
>>>      542:       d9 c0                   fld    %st(0)
>>>      544:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
>>>      54a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
>>>      550:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
>>>      556:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
>>>      55d:       89 01                   mov    %eax,(%rcx)
>>>                 y = 1000000000*(y-*z++);
>>>      55f:       89 c0                   mov    %eax,%eax
>>>      561:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
>>>      568:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
>>>      56e:       de e9                   fsubrp %st,%st(1)
>>>      570:       d9 05 00 00 00 00       flds   0x0(%rip)        # 576 
>>> <fmt_fp+0x176>
>>>      576:       dc c9                   fmul   %st,%st(1)
>>>                 steps++;
>>>                 if(steps > 2000) {
>>>      578:       eb 48                   jmp    5c2 <fmt_fp+0x1c2>
>>>      57a:       d9 c9                   fxch   %st(1)
>>>      57c:       eb 04                   jmp    582 <fmt_fp+0x182>
>>>      57e:       66 90                   xchg   %ax,%ax
>>>      580:       d9 c9                   fxch   %st(1)
>>>                 *z = y;
>>>      582:       d9 c0                   fld    %st(0)
>>>      584:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
>>>      58a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
>>>      590:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
>>>                 y = 1000000000*(y-*z++);
>>>      596:       48 83 c3 04             add    $0x4,%rbx
>>>                 *z = y;
>>>      59a:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
>>>      5a1:       89 43 fc                mov    %eax,-0x4(%rbx)
>>>                 y = 1000000000*(y-*z++);
>>>      5a4:       89 c0                   mov    %eax,%eax
>>>      5a6:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
>>>      5ad:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
>>>      5b3:       de e9                   fsubrp %st,%st(1)
>>>      5b5:       d8 c9                   fmul   %st(1),%st
>>>                 if(steps > 2000) {
>>>      5b7:       48 39 d3                cmp    %rdx,%rbx
>>>      5ba:       0f 84 99 10 00 00       je     1659 <fmt_fp+0x1259>
>>>      5c0:       d9 c9                   fxch   %st(1)
>>>                    assert(0);
>>>                 }
>>>         } while (y);
>>>      5c2:       d9 ee                   fldz
>>>      5c4:       d9 ca                   fxch   %st(2)
>>>      5c6:       db ea                   fucomi %st(2),%st
>>>      5c8:       dd da                   fstp   %st(2)
>>>      5ca:       7a ae                   jp     57a <fmt_fp+0x17a>
>>>      5cc:       75 b2                   jne    580 <fmt_fp+0x180>
>>>      5ce:       dd d8                   fstp   %st(0)
>>>      5d0:       dd d8                   fstp   %st(0)
>>>
>>> On Wednesday, November 28, 2018 at 8:14:05 AM UTC-5, Nadav Har'El wrote:
>>>>
>>>>
>>>> On Wed, Nov 28, 2018 at 2:18 PM Waldek Kozaczuk <[email protected]> 
>>>> wrote:
>>>>
>>>>> On Nov 28, 2018, at 03:58, Nadav Har'El <[email protected]> wrote:
>>>>>
>>>>>
>>>>> The situation is different with involuntary context switches. When an 
>>>>> asynchronous event, e.g., an interrupt, occurs, the user thread is in a 
>>>>> random position in the code. It may be using all its registers, and the 
>>>>> FPU 
>>>>> state (including the old-style FPU and the new SSE and AVX registers). 
>>>>> Because our interrupt handler (which may do anything from running the 
>>>>> scheduler to reading a page from disk on page fault) may need to use any 
>>>>> of 
>>>>> these registers, all of them, including the FPU, need to be saved on 
>>>>> interrupt time. The interrupt has a separate stack, and the FPU is saved 
>>>>> on 
>>>>> this stack (see fpu_lock use in interrupt()). When the interrupt 
>>>>> finishes, 
>>>>> this FPU is restored. This includes involuntary context switching: thread 
>>>>> A 
>>>>> receives an interrupt, saves the FPU, does something and decides to 
>>>>> switch 
>>>>> to thread B, and a while later we switch back to thread A at which point 
>>>>> the interrupt handler "returns" and restores the  FPU state.
>>>>>
>>>>> Does involuntary case include scenario when enough time designated for 
>>>>> current thread by scheduler expires? I would imaging this would qualify 
>>>>> as 
>>>>> interrupt?
>>>>>
>>>>
>>>> Indeed. We set a timer to when this thread's runtime quota will expire, 
>>>> and this timer generates an interrupt. Check out interrupt() - after 
>>>> saving 
>>>> the FPU state and acknowledging the interrupt (EOI), it calls the 
>>>> scheduler 
>>>> (sched::preempt()). This will decide which thread to run next - it may run 
>>>> the same thread again, or a different thread. When sched::preempt() 
>>>> returns 
>>>> - possibly a long time later, it means the scheduler decided to run *this* 
>>>> thread again. At that point, interrupt() returns and just before returning 
>>>> it restores the FPU state automatically.  
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OSv Development" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: OSv crashes fairly sporadically with page fault when transcoding video with ffmpeg

Reply via email to