Re: OSv crashes fairly sporadically with page fault when transcoding video with ffmpeg

Waldek Kozaczuk Wed, 28 Nov 2018 11:20:49 -0800

The title of this thread says the crash happens sporadically, and that is
indeed the case with the scenario described in the 1st and 2nd emails of
this thread. The symptom (a stack trace with the fmt_fp() function captured
by gdb) looks identical to what was reported in
https://github.com/cloudius-systems/osv/issues/1010 and in
https://github.com/cloudius-systems/osv/issues/536. We cannot be certain
that the root cause is the same. However, it is worth mentioning that the
crash happens with the same symptom almost every time I run the scenario
described in issue #1010, which is different from (but similar to) the
scenario described in the 1st and 2nd emails of this thread. I am
emphasizing this because we have a very repeatable way to reproduce one of
these crashes (*possibly the same root cause*) in case we have ideas of
things to try.


Regarding your excellent explanation of how the non-FPU and FPU state is
saved/restored by OSv, I wanted to walk through a scenario to see if I
understand it well (please forgive my ignorance if I am missing something
obvious). Imagine we have 2 user threads (uT1 and uT2) executing some
*logic that involves floating point operations in a loop* (like the one in
the disassembled code). Ffmpeg is a good example, as it is very CPU
intensive and does all kinds of FP computations to transcode video. Let us
say that on a given CPU this sequence happens over time:

->uT1 is running (executing a long FP logic loop)
--> a timer-triggered interrupt happens because the allotted time slice
expired [some FPU registers hold in-flight data = FPU has state S1]
----> OSv saves the FPU state and other registers using the fpu_lock construct
-------> OSv identifies the next thread to run, which is uT2
---------> OSv restores the FPU to state S1 (fpu_lock goes out of scope) and
other registers as well and switches to uT2
--------------> uT2 is running (executing a long FP logic loop)
--> a timer-triggered interrupt happens again because the allotted time
slice expired [some FPU registers hold in-flight data = FPU has state S2]
... same thing
... same thing
---------> OSv restores the FPU to state S2 and other registers as well and
switches back to uT1
*--------------> uT1 resumes where it left off above, and instead of the FPU
being in the expected S1 state it sees the FPU in state S2 -> crash????*

What am I missing in my scenario?

Lastly, I am adding the disassembled portion of fmt_fp() built with the
patch below applied. This patch makes the repeatable crash (from issue
#1010) go away, or at least makes it far less frequent.

diff --git a/libc/stdio/vfprintf.c b/libc/stdio/vfprintf.c
index aac790c0..1e116038 100644
--- a/libc/stdio/vfprintf.c
+++ b/libc/stdio/vfprintf.c
@@ -8,6 +8,7 @@
 #include <inttypes.h>
 #include <math.h>
 #include <float.h>
+#include <assert.h>
 /* Some useful macros */
@@ -296,9 +297,14 @@ static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t)
       if (e2<0) a=r=z=big;
       else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;
+        int steps = 0;
       do {
               *z = y;
               y = 1000000000*(y-*z++);
+                steps++;
+               if(steps > 2000) {
+                  assert(0);
+                }
       } while (y);
        while (e2>0) {

Your explanation was that, because of the added assert() function call, we
force the compiler to generate code to save/restore any FPU registers in
use. But if we no longer see the crash with this patch applied, then the
assert function never gets called and the FPU is not saved/restored, so how
does this help make the crash go away?
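
To make the loop in question easier to reason about outside of OSv, here is
a minimal standalone sketch of it in C. The names peel_digits, MAX_STEPS and
cap are made up for this example and do not exist in musl or OSv; the loop
body mirrors the two statements from fmt_fp(), and returning -1 stands in
for the assert(0) from the patch. It assumes 0 <= y < 2^32 so that the
integer part fits in a uint32_t. With a healthy FPU the loop should end
after a handful of iterations, because every pass strips the integer part
and shifts the remaining fraction up by nine decimal digits.

```c
#include <assert.h>
#include <stdint.h>

/* Same bound as the patch in this thread; the name is hypothetical. */
#define MAX_STEPS 2000

/*
 * Peel base-1e9 "digits" off y the way the loop in fmt_fp() does:
 * store the integer part, then scale the remaining fraction by 1e9.
 * Returns the number of iterations, or -1 if the loop is still running
 * after MAX_STEPS passes (the case the patch turns into assert(0)).
 * Precondition (assumed by this sketch): 0 <= y < 2^32.
 */
int peel_digits(long double y, uint32_t *out, int cap)
{
    int steps = 0;
    do {
        uint32_t d = (uint32_t)y;      /* truncating store, like *z = y */
        if (steps < cap)
            out[steps] = d;            /* keep only the first cap digits */
        y = 1000000000 * (y - d);      /* like y = 1000000000*(y-*z++) */
        if (++steps > MAX_STEPS)
            return -1;                 /* runaway loop */
    } while (y);
    return steps;
}
```

For example, peel_digits(3.25L, out, 8) should return 2 with out[0] == 3
and out[1] == 250000000. A return value of -1 would mean y never reached
zero, which is the runaway case the patch is trying to catch.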

Portion of disassembled code:
      if (e2<0) a=r=z=big;
        else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1;

        int steps = 0;
        do {
                *z = y;
     501:       d9 bd 0e e3 ff ff       fnstcw -0x1cf2(%rbp)
        if (e2<0) a=r=z=big;
     507:       48 8d 85 40 e3 ff ff    lea    -0x1cc0(%rbp),%rax
     50e:       45 85 c9                test   %r9d,%r9d
     511:       48 8d 95 cc fe ff ff    lea    -0x134(%rbp),%rdx
     518:       48 89 c1                mov    %rax,%rcx
                *z = y;
     51b:       0f b7 85 0e e3 ff ff    movzwl -0x1cf2(%rbp),%eax
        if (e2<0) a=r=z=big;
     522:       48 0f 49 ca             cmovns %rdx,%rcx
                *z = y;
     526:       80 cc 0c                or     $0xc,%ah
        if (e2<0) a=r=z=big;
     529:       48 89 8d d8 e2 ff ff    mov    %rcx,-0x1d28(%rbp)
                y = 1000000000*(y-*z++);
     530:       48 8d 59 04             lea    0x4(%rcx),%rbx
     534:       48 8d 91 44 1f 00 00    lea    0x1f44(%rcx),%rdx
                *z = y;
     53b:       66 89 85 0c e3 ff ff    mov    %ax,-0x1cf4(%rbp)
     542:       d9 c0                   fld    %st(0)
     544:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
     54a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
     550:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
     556:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
     55d:       89 01                   mov    %eax,(%rcx)
                y = 1000000000*(y-*z++);
     55f:       89 c0                   mov    %eax,%eax
     561:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
     568:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
     56e:       de e9                   fsubrp %st,%st(1)
     570:       d9 05 00 00 00 00       flds   0x0(%rip)        # 576 <fmt_fp+0x176>
     576:       dc c9                   fmul   %st,%st(1)
                steps++;
                if(steps > 2000) {
     578:       eb 48                   jmp    5c2 <fmt_fp+0x1c2>
     57a:       d9 c9                   fxch   %st(1)
     57c:       eb 04                   jmp    582 <fmt_fp+0x182>
     57e:       66 90                   xchg   %ax,%ax
     580:       d9 c9                   fxch   %st(1)
                *z = y;
     582:       d9 c0                   fld    %st(0)
     584:       d9 ad 0c e3 ff ff       fldcw  -0x1cf4(%rbp)
     58a:       df bd 00 e3 ff ff       fistpll -0x1d00(%rbp)
     590:       d9 ad 0e e3 ff ff       fldcw  -0x1cf2(%rbp)
                y = 1000000000*(y-*z++);
     596:       48 83 c3 04             add    $0x4,%rbx
                *z = y;
     59a:       48 8b 85 00 e3 ff ff    mov    -0x1d00(%rbp),%rax
     5a1:       89 43 fc                mov    %eax,-0x4(%rbx)
                y = 1000000000*(y-*z++);
     5a4:       89 c0                   mov    %eax,%eax
     5a6:       48 89 85 f8 e2 ff ff    mov    %rax,-0x1d08(%rbp)
     5ad:       df ad f8 e2 ff ff       fildll -0x1d08(%rbp)
     5b3:       de e9                   fsubrp %st,%st(1)
     5b5:       d8 c9                   fmul   %st(1),%st
                if(steps > 2000) {
     5b7:       48 39 d3                cmp    %rdx,%rbx
     5ba:       0f 84 99 10 00 00       je     1659 <fmt_fp+0x1259>
     5c0:       d9 c9                   fxch   %st(1)
                   assert(0);
                }
        } while (y);
     5c2:       d9 ee                   fldz
     5c4:       d9 ca                   fxch   %st(2)
     5c6:       db ea                   fucomi %st(2),%st
     5c8:       dd da                   fstp   %st(2)
     5ca:       7a ae                   jp     57a <fmt_fp+0x17a>
     5cc:       75 b2                   jne    580 <fmt_fp+0x180>
     5ce:       dd d8                   fstp   %st(0)
     5d0:       dd d8                   fstp   %st(0)

On Wednesday, November 28, 2018 at 8:14:05 AM UTC-5, Nadav Har'El wrote:
>
>
> On Wed, Nov 28, 2018 at 2:18 PM Waldek Kozaczuk wrote:
>
>> On Nov 28, 2018, at 03:58, Nadav Har'El wrote:
>>
>>
>> The situation is different with involuntary context switches. When an 
>> asynchronous event, e.g., an interrupt, occurs, the user thread is in a 
>> random position in the code. It may be using all its registers, and the FPU 
>> state (including the old-style FPU and the new SSE and AVX registers). 
>> Because our interrupt handler (which may do anything from running the 
>> scheduler to reading a page from disk on page fault) may need to use any of 
>> these registers, all of them, including the FPU, need to be saved on 
>> interrupt time. The interrupt has a separate stack, and the FPU is saved on 
>> this stack (see fpu_lock use in interrupt()). When the interrupt finishes, 
>> this FPU is restored. This includes involuntary context switching: thread A 
>> receives an interrupt, saves the FPU, does something and decides to switch 
>> to thread B, and a while later we switch back to thread A at which point 
>> the interrupt handler "returns" and restores the FPU state.
>>
>> Does the involuntary case include the scenario where the time the
>> scheduler allotted to the current thread expires? I would imagine this
>> would qualify as an interrupt?
>>
>
> Indeed. We set a timer to when this thread's runtime quota will expire, 
> and this timer generates an interrupt. Check out interrupt() - after saving 
> the FPU state and acknowledging the interrupt (EOI), it calls the scheduler 
> (sched::preempt()). This will decide which thread to run next - it may run 
> the same thread again, or a different thread. When sched::preempt() returns 
> - possibly a long time later, it means the scheduler decided to run *this* 
> thread again. At that point, interrupt() returns and just before returning 
> it restores the FPU state automatically.  
>
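
The save/restore sequence described in the quoted reply above can be
modelled with a few lines of C. This is only a toy sketch of my reading of
it, not OSv code: toy_thread, irq_enter, irq_return and run_scenario are
invented names, the FPU is reduced to a single int, and the model assumes
each thread has its own interrupt stack (and therefore its own saved copy),
which the reply implies but does not spell out. Under those assumptions the
restore happens in the context of the thread that took the interrupt, so
uT1 gets S1 back even though uT2 left S2 in the FPU.

```c
#include <assert.h>

/* Toy stand-in for the FPU register file of one CPU. */
typedef struct { int regs; } fpu_state;

/* Assumption: every thread has its own interrupt stack, so each thread
 * gets a private slot for the FPU state saved at interrupt entry. */
typedef struct {
    fpu_state saved_on_irq_stack;
    int in_interrupt;
} toy_thread;

enum { S1 = 111, S2 = 222 };   /* arbitrary markers for the two FPU states */

static fpu_state cpu_fpu;      /* what the hardware FPU currently holds */

/* Interrupt entry in thread t: models constructing fpu_lock in interrupt(). */
void irq_enter(toy_thread *t)
{
    t->saved_on_irq_stack = cpu_fpu;
    t->in_interrupt = 1;
}

/* Interrupt return in thread t: models fpu_lock going out of scope.
 * This only runs once the scheduler has switched back to t. */
void irq_return(toy_thread *t)
{
    assert(t->in_interrupt);
    cpu_fpu = t->saved_on_irq_stack;
    t->in_interrupt = 0;
}

/* Replays the two-thread timeline and returns what uT1 sees on resume. */
int run_scenario(void)
{
    toy_thread ut1 = {{0}, 0}, ut2 = {{0}, 0};

    cpu_fpu.regs = S1;   /* uT1 is in the middle of its FP loop */
    irq_enter(&ut1);     /* timer fires: S1 is parked on uT1's stack */
                         /* scheduler switches to uT2; no restore here */
    cpu_fpu.regs = S2;   /* uT2 runs its own FP loop */
    irq_enter(&ut2);     /* timer fires: S2 is parked on uT2's stack */
                         /* scheduler switches back to uT1 */
    irq_return(&ut1);    /* uT1's handler returns, restoring uT1's slot */
    return cpu_fpu.regs;
}
```

In this model run_scenario() returns S1, not S2. If that matches what OSv
really does, then the timeline I wrote earlier in this message has the
restore step in the wrong place, and the crash would need another
explanation.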
