I have updated #1010, which I think may be a duplicate of #536. For #1010 
I ran an experiment: I changed the priority field in the JSON thread 
structure from float to long (so it no longer goes through fmt_fp()), and 
the crash went away. So I wonder if this problem is similar, in that the 
underlying cause might be related to how OSv handles FPU state when 
concurrent threads printf floating-point numbers (%f), which would explain 
why it happens in fmt_fp(). Alternatively, there may be a bug in this code 
that has since been fixed in musl.

I will try to create a simple test that printfs a floating-point number 
from multiple threads to see if we can narrow it down even more.
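
A rough sketch of what such a test might look like, assuming plain pthreads. 
The names run_fp_printf_stress(), format_worker() and struct worker_arg are 
made up for illustration and are not from the OSv tree. It uses snprintf() 
instead of printf() so that each thread can check its own result; in musl 
both should end up in the same fmt_fp() code, but that is an assumption worth 
confirming against the libc that OSv actually builds. The "%.2f" format 
mirrors the p=2, t=102 ('f') arguments seen in the backtrace.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 64  /* arbitrary upper bound for this sketch */

struct worker_arg {
    int iterations;   /* how many times this thread formats the value */
    long mismatches;  /* results that differed from the expected text */
};

/* Format the same double with %.2f in a loop and verify the text each time. */
static void *format_worker(void *p)
{
    struct worker_arg *arg = p;
    char buf[64];

    for (int i = 0; i < arg->iterations; i++) {
        double value = 1234.5;  /* exactly representable in binary */
        snprintf(buf, sizeof buf, "%.2f", value);
        if (strcmp(buf, "1234.50") != 0)
            arg->mismatches++;
    }
    return NULL;
}

/*
 * Run nthreads workers concurrently.
 * Returns the total number of mismatches, or -1 if the thread count is
 * out of range or a thread could not be started.
 */
long run_fp_printf_stress(int nthreads, int iterations)
{
    pthread_t threads[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].mismatches = 0;
        if (pthread_create(&threads[i], NULL, format_worker, &args[i]) != 0)
            break;
        started++;
    }

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].mismatches;
    }

    return started == nthreads ? total : -1;
}
```

Calling run_fp_printf_stress(8, 1000000) from main() with -c 8 would roughly 
match the setup in the reproduction below. A non-zero return value, or a 
page fault in fmt_fp(), would point towards FPU state handling rather than 
anything specific to ffmpeg or httpserver. If this passes, switching 
snprintf() to printf() would bring the console path back into the picture.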

On Thursday, November 22, 2018 at 2:58:40 PM UTC-5, Waldek Kozaczuk wrote:
>
> To reproduce (may not happen every time):
>
> 1) build with ffmpeg (enable h265 in ffmpeg Makefile)
> ./scripts/build image=ffmpeg
>
> 2) In one terminal start ffmpeg to receive video on host:
> cd apps/ffmpeg/ROOTFS
> LD_LIBRARY_PATH=. ./ffmpeg.so -i tcp://0.0.0.0:12345?listen -c copy 
> /home/wkozaczuk/test.mp4
>
> 3) In another terminal start ffmpeg to transcode and send video:
> ./scripts/run.py -c 8 -e '/ffmpeg.so -i 
> http://clips.vorwaerts-gmbh.de/VfE_html5.mp4 -c:v libx265 -crf 28 -c:a 
> aac -b:a 128k -f mpegts tcp://192.168.122.1:12345'
>
> Waldek
>
> On Thursday, November 22, 2018 at 2:55:03 PM UTC-5, Waldek Kozaczuk wrote:
>>
>> I see this crash:
>>
>> frame=  275 fps= 47 q=-0.0 size=     520kB time=00:00:1pa2ge. f2a1 ublitt 
>> roauttes=i d3e4 8a.p6pklbiictast/iso ns,p eeadd=d2r200001d00000 0 0
>> [registers]
>> RIP: 0x000000000044ef5f <???+4517727>
>> RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
>> RAX: 0x8000000000000000  RBX: 0x0000200001d00004  RCX: 
>> 0x0000000000000002  RDX: 0x0000200001cff66c
>> RSI: 0xfffffffffffffffc  RDI: 0x8000000000000000  RBP: 
>> 0x0000200001cff7a0  R8:  0x0000000000004000
>> R9:  0x00000000ffffffe5  R10: 0x0000000000004000  R11: 
>> 0x8000000000000000  R12: 0x0000000000000000
>> R13: 0x00000000ffffffe5  R14: 0x0000000000004000  R15: 
>> 0x8000000000000000  RSP: 0x0000200001cfda50
>> Aborted
>>
>> [backtrace]
>> 0x0000000000346ce2 <???+3435746>
>> 0x0000000000347946 <mmu::vm_fault(unsigned long, exception_frame*)+310>
>> 0x00000000003a222b <page_fault+123>
>> 0x00000000003a10a6 <???+3805350>
>>
>> Please note that ffmpeg is constantly printing to screen (vga or serial 
>> console?) some output about progress.
>>
>> Once connected to gdb I see this stacktrace:
>>
>> (gdb) bt
>> #0  0x00000000003a83d2 in processor::cli_hlt () at 
>> arch/x64/processor.hh:247
>> #1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
>> #2  osv::halt () at arch/x64/power.cc:24
>> #3  0x000000000023ef34 in abort (fmt=fmt@entry=0x63095b "Aborted\n") at 
>> runtime.cc:132
>> #4  0x0000000000202765 in abort () at runtime.cc:98
>> #5  0x0000000000346ce3 in mmu::vm_sigsegv (addr=<optimized out>, 
>> ef=0xffff800006550068) at core/mmu.cc:1316
>> #6  0x0000000000347947 in mmu::vm_fault (addr=addr@entry=35184402497536, 
>> ef=ef@entry=0xffff800006550068) at core/mmu.cc:1330
>> #7  0x00000000003a222c in page_fault (ef=0xffff800006550068) at 
>> arch/x64/mmu.cc:38
>> #8  <signal handler called>
>> #9  0x000000000044ef5f in fmt_fp (f=0x200001cffa50, y=0, w=0, p=2, fl=0, 
>> t=102) at libc/stdio/vfprintf.c:300
>> #10 0x0000000000000000 in ?? ()
>>
>> I wonder if this is related to 
>> https://github.com/cloudius-systems/osv/issues/1010 (though no 
>> httpserver is involved here) and to 
>> https://github.com/cloudius-systems/osv/issues/536.
>>
>> Please note that the common thing between all these stack traces is 
>> fmt_fp() function in libc/stdio/vfprintf.c:300. Coincidence?
>>
>> Waldek
>>
>
