Re: OSv crashes fairly sporadically with page fault when transcoding video with ffmpeg

Nadav Har'El Mon, 26 Nov 2018 06:27:48 -0800

On Thu, Nov 22, 2018 at 9:55 PM Waldek Kozaczuk <[email protected]>
wrote:


> I see this crash:
>
> frame=  275 fps= 47 q=-0.0 size=     520kB time=00:00:1pa2ge. f2a1 ublitt
> roauttes=i d3e4 8a.p6pklbiictast/iso ns,p eeadd=d2r200001d00000 0 0
> [registers]
> RIP: 0x000000000044ef5f <???+4517727>
> RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
> RAX: 0x8000000000000000  RBX: 0x0000200001d00004  RCX: 0x0000000000000002
> RDX: 0x0000200001cff66c
> RSI: 0xfffffffffffffffc  RDI: 0x8000000000000000  RBP: 0x0000200001cff7a0
> R8:  0x0000000000004000
> R9:  0x00000000ffffffe5  R10: 0x0000000000004000  R11: 0x8000000000000000
> R12: 0x0000000000000000
> R13: 0x00000000ffffffe5  R14: 0x0000000000004000  R15: 0x8000000000000000
> RSP: 0x0000200001cfda50
> Aborted
>
> [backtrace]
> 0x0000000000346ce2 <???+3435746>
> 0x0000000000347946 <mmu::vm_fault(unsigned long, exception_frame*)+310>
> 0x00000000003a222b <page_fault+123>
> 0x00000000003a10a6 <???+3805350>
>
> Please note that ffmpeg is constantly printing to screen (vga or serial
> console?) some output about progress.
>
> Once connected to gdb I see this stacktrace:
>
> (gdb) bt
> #0  0x00000000003a83d2 in processor::cli_hlt () at
> arch/x64/processor.hh:247
> #1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
> #2  osv::halt () at arch/x64/power.cc:24
> #3  0x000000000023ef34 in abort (fmt=fmt@entry=0x63095b "Aborted\n") at
> runtime.cc:132
> #4  0x0000000000202765 in abort () at runtime.cc:98
> #5  0x0000000000346ce3 in mmu::vm_sigsegv (addr=<optimized out>,
> ef=0xffff800006550068) at core/mmu.cc:1316
> #6  0x0000000000347947 in mmu::vm_fault (addr=addr@entry=35184402497536,
> ef=ef@entry=0xffff800006550068) at core/mmu.cc:1330
> #7  0x00000000003a222c in page_fault (ef=0xffff800006550068) at
> arch/x64/mmu.cc:38
> #8  <signal handler called>
> #9  0x000000000044ef5f in fmt_fp (f=0x200001cffa50, y=0, w=0, p=2, fl=0,
> t=102) at libc/stdio/vfprintf.c:300
> #10 0x0000000000000000 in ?? ()
>
> I wonder if this is related to
> https://github.com/cloudius-systems/osv/issues/1010 (though no httpserver
> at all) and this https://github.com/cloudius-systems/osv/issues/536.
>
> Please note that the common thing between all these stack traces is
> fmt_fp() function in libc/stdio/vfprintf.c:300. Coincidence?
>

Interesting. Please open a new issue about it, but link to #1010 and #536,
which I agree are probably all exactly the same bug.

Such crashes always smell like FPU-state-saving bugs (which are notoriously
difficult to fix), but in this case I wonder...
The code in line 300 is:

        do {
                *z = y;
                y = 1000000000*(y-*z++);
        } while (y);


Can you print "y" and "z" with the debugger?
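
For example, a gdb session along these lines should show them. This is only a
sketch: frame 9 is where fmt_fp() appears in the backtrace quoted above, and
in an optimized build some values may show up as <optimized out>.

```
(gdb) frame 9
(gdb) print y
(gdb) print z
(gdb) info locals
```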

The FPU bugs usually cause crashes when memory-copying code that uses the FPU
overwrites random parts of memory. So why do we always get a crash here, in
exactly the same place, with this "z"? What overwrote it, and when? (z is set
just a couple of lines above and is incremented here in a tight loop, so what
can overwrite it?)

Another possibility is that "y" is kept in a floating point register and
clobbered by some interrupt that doesn't save FPU state (or something like
that) which causes the loop to continue forever.
If you can reliably reproduce this, you can add various printouts or
variables to help debug what goes wrong with "z" or "y".
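
To illustrate the second theory, here is a minimal standalone sketch of the
same loop, not the actual musl or OSv code. The function name
split_base_1e9(), the slot limit, and the range check before the conversion
are my own additions for the demonstration. It shows that the loop stops
after a couple of iterations for a sane "y", but never reaches zero if "y"
has been turned into something like NaN, so "z" just keeps advancing:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the loop at vfprintf.c:300.
 * Returns the number of slots written, or -1 if the loop would have
 * walked past the end of buf (the original loop has no such check).
 */
static int split_base_1e9(long double y, uint32_t *buf, int slots)
{
    uint32_t *z = buf;

    assert(slots > 0);
    do {
        if (z == buf + slots)
            return -1;  /* in the real code z would keep going */
        /*
         * Converting NaN or an out-of-range value to uint32_t is
         * undefined behavior in C, so the demo stores 0 in that case.
         */
        *z = (y >= 0 && y < 4294967296.0L) ? (uint32_t)y : 0;
        y = 1000000000 * (y - *z++);
    } while (y);        /* NaN compares unequal to 0, so this stays true */

    return (int)(z - buf);
}
```

With a 64-entry buffer, split_base_1e9(0.5L, buf, 64) finishes after two
slots, while passing NAN hits the guard and returns -1. In the unguarded
original, "z" would instead run off the end of its array and eventually hit
an unmapped page, which would at least be consistent with a crash in this
exact spot.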

Finally, I'm not familiar with this code, so this could also be a musl bug:
see for example https://www.openwall.com/lists/musl/2014/03/22/7 which says
musl did have known bugs in this area. You should be able to fairly easily
port a newer musl's vfprintf.c to OSv (run "diff libc/stdio/vfprintf.c
musl/src/stdio/vfprintf.c" to see the only changes OSv deliberately made to
musl's vfprintf.c).
