On Thu, Nov 22, 2018 at 9:55 PM Waldek Kozaczuk <[email protected]> wrote:
> I see this crash:
>
> frame= 275 fps= 47 q=-0.0 size= 520kB time=00:00:1pa2ge. f2a1 ublitt
> roauttes=i d3e4 8a.p6pklbiictast/iso ns,p eeadd=d2r200001d00000 0 0
> [registers]
> RIP: 0x000000000044ef5f <???+4517727>
> RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
> RAX: 0x8000000000000000 RBX: 0x0000200001d00004 RCX: 0x0000000000000002
> RDX: 0x0000200001cff66c
> RSI: 0xfffffffffffffffc RDI: 0x8000000000000000 RBP: 0x0000200001cff7a0
> R8: 0x0000000000004000
> R9: 0x00000000ffffffe5 R10: 0x0000000000004000 R11: 0x8000000000000000
> R12: 0x0000000000000000
> R13: 0x00000000ffffffe5 R14: 0x0000000000004000 R15: 0x8000000000000000
> RSP: 0x0000200001cfda50
> Aborted
>
> [backtrace]
> 0x0000000000346ce2 <???+3435746>
> 0x0000000000347946 <mmu::vm_fault(unsigned long, exception_frame*)+310>
> 0x00000000003a222b <page_fault+123>
> 0x00000000003a10a6 <???+3805350>
>
> Please note that ffmpeg is constantly printing to screen (vga or serial
> console?) some output about progress.
>
> Once connected to gdb I see this stacktrace:
>
> (gdb) bt
> #0 0x00000000003a83d2 in processor::cli_hlt () at
> arch/x64/processor.hh:247
> #1 arch::halt_no_interrupts () at arch/x64/arch.hh:48
> #2 osv::halt () at arch/x64/power.cc:24
> #3 0x000000000023ef34 in abort (fmt=fmt@entry=0x63095b "Aborted\n") at
> runtime.cc:132
> #4 0x0000000000202765 in abort () at runtime.cc:98
> #5 0x0000000000346ce3 in mmu::vm_sigsegv (addr=<optimized out>,
> ef=0xffff800006550068) at core/mmu.cc:1316
> #6 0x0000000000347947 in mmu::vm_fault (addr=addr@entry=35184402497536,
> ef=ef@entry=0xffff800006550068) at core/mmu.cc:1330
> #7 0x00000000003a222c in page_fault (ef=0xffff800006550068) at
> arch/x64/mmu.cc:38
> #8 <signal handler called>
> #9 0x000000000044ef5f in fmt_fp (f=0x200001cffa50, y=0, w=0, p=2, fl=0,
> t=102) at libc/stdio/vfprintf.c:300
> #10 0x0000000000000000 in ?? ()
>
> I wonder if this is related to
> https://github.com/cloudius-systems/osv/issues/1010 (though no httpserver
> at all) and this https://github.com/cloudius-systems/osv/issues/536.
>
> Please note that the common thing between all these stack traces is
> fmt_fp() function in libc/stdio/vfprintf.c:300. Coincidence?
>

Interesting. Please open a new issue about it, but link to #1010 and #536, which I agree are probably all exactly the same bug.

Such crashes always smell like FPU-state-saving bugs (which are notoriously difficult to fix), but in this case I wonder... The code in line 300 is:

    do { *z = y; y = 1000000000*(y-*z++); } while (y);

Can you print "y" and "z" with the debugger? The FPU bugs usually cause crashes when memory-copying code that uses the FPU overwrites random parts of memory. So why do we always get a crash here, in exactly the same place, with this "z", and what overwrote it and when? ("z" is set just a couple of lines above and incremented here in a tight loop, so what can overwrite it?)

Another possibility is that "y" is kept in a floating point register and clobbered by some interrupt that doesn't save FPU state (or something like that), which causes the loop to continue forever.

If you can reliably reproduce this, you can add various printouts or variables to help debug what goes wrong with "z" or "y".

Finally, I'm not familiar with this code - this can also be a musl bug - see for example https://www.openwall.com/lists/musl/2014/03/22/7 which says musl did have known bugs in this area. You should be able to fairly easily port a newer musl's vfprintf.c to OSv (do diff libc/stdio/vfprintf.c musl/src/stdio/vfprintf.c to see the only changes OSv deliberately made to musl's vfprintf.c).
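To make the "loop continues forever" possibility concrete, here is a minimal standalone sketch of the loop quoted from line 300, with a bounds check added. It is only an illustration: split_base_1e9(), its buffer and its cap parameter are hypothetical names of mine, not anything in musl or OSv, and I am assuming "y" is a long double and "z" points into an array of uint32_t. For a sane, finite "y" the loop stops after a few iterations once the fraction is used up. A clobbered "y" (a NaN, for instance) never compares equal to zero, so the unguarded loop would keep incrementing "z" until it walks off mapped memory.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of musl or OSv): replays the loop from
 * vfprintf.c line 300 on a caller-supplied buffer of `cap` entries.
 * Returns the number of base-1e9 chunks written, or -1 if y cannot be
 * converted to uint32_t or the loop would run past the buffer.
 */
static int split_base_1e9(long double y, uint32_t *buf, size_t cap)
{
    uint32_t *z = buf;

    assert(buf != NULL);
    /* NaN, negative or too-large y would make the uint32_t conversion undefined. */
    if (isnan(y) || y < 0 || y >= 4294967296.0L)
        return -1;
    do {
        if ((size_t)(z - buf) >= cap)
            return -1;                /* runaway loop: z left the buffer */
        *z = (uint32_t)y;             /* integer part becomes one chunk */
        y = 1000000000 * (y - *z++);  /* scale the remaining fraction */
    } while (y);
    return (int)(z - buf);
}
```

Calling split_base_1e9(1.5L, buf, 8) writes 1 and 500000000 into buf and returns 2, while a NaN input returns -1 instead of looping. A similar guard (or a printout of "y" and of z's distance from the start of its array) dropped temporarily into fmt_fp() might show which of the two variables goes bad.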
