Nadav, I have found this email thread around patches adding AVX support - https://groups.google.com/forum/#!msg/osv-dev/DIbzudXr0jo/glTrLgjFOssJ;context-place=forum/osv-dev.
There is a comment you left specifically about "per-thread fpu save area". Maybe that is something we need to pursue once we confirm that the fpu_state on interrupt stack somehow gets corrupted. I still need to code that checksum logic to verify that indeed it happens. "When we leave a thread during an involuntary context switch (e.g., interrupt), we need to XSAVEOPT its state in the same place we last XRESTORed it from when entering this thread - so this can't be a random location on the stack chosen at interrupt time (as in inplace_fpu_state) because then it won't be the same address used last time and the optimization won't work. So we need to have a per-thread fpu save area, and save the state *there* on interrupt. Funny enough, we already had such a tread->_fpu area in the past, but removed it (commit 202b2cccc14a43d088002a80f7add51e3e4bc4ce)." On Wednesday, November 28, 2018 at 5:03:31 PM UTC-5, Waldek Kozaczuk wrote: > > > > On Wednesday, November 28, 2018 at 4:51:08 PM UTC-5, Nadav Har'El wrote: >> >> On Wed, Nov 28, 2018 at 9:20 PM Waldek Kozaczuk <[email protected]> >> wrote: >> >>> The title of this thread indicates that the crash happens sporadically >>> and indeed it is the case with the scenario described in 1st and 2nd email >>> of this thread. The symptom (stack trace with fmt_fp() function captured by >>> gdb) looks identical to what was reported in >>> https://github.com/cloudius-systems/osv/issues/1010 and this >>> https://github.com/cloudius-systems/osv/issues/536. We cannot be >>> certain about whether the root cause is the same. However it is worth >>> mentioning that crash happens with same symptom almost every time if I run >>> the scenario described in issue #1010 which is different (but similar) than >>> scenario described in 1st and 2nd email of this thread. I am >>> emphasizing this because we have a way to replicate one of these crashes >>> (*possibly >>> same root cause*) in very repeatable way in case we have ideas of >>> things to try. >>> >> >> That is good. If we give up on this effort (at least temporarily), please >> summarize what you know about reproducing this issue in a new bug-tracker >> issue (and reference 1010 and 536) so we could return to it later. >> >> >>> Regarding your excellent explanations how non-FPU and FPU state is >>> saved/restored by OSv I wanted to run a scenario to see if I understand it >>> well (please forgive my ignorance if I am missing something obvious). >>> Imagine we have 2 user threads (uT1 and uT2) executing some *logic that >>> involves floating point operations in a loop* (like the one in the >>> disassembled code). Ffmpeg is a good example as it must be very CPU >>> intensive and doing all kind of FP computations to transcode video. Let us >>> say on given cpu this sequence happens in time: >>> >>> ->uT1 is running (executing long FP logic loop) >>> --> timer triggered interrupt happens because allotted time slice >>> expired [some FPU registers had some inflight data = FPU has state S1) >>> ----> OSv saves FPU state and other registers using fpu_lock construct >>> >> >> So far, right. This state is saved inside the interrupt stack, which >> while thread uT1 is running, is uT1->_arch->interrupt_stack. >> By the way, note that the regular (non-FPU) registers are also on this >> stack - the code in entry.S saves them on the stack and passes the pointer >> to it when it calls interrupt(). >> >> -------> OSv identifies next thread to run which is uT2 >>> ---------> OSv restores FPU to state S1 (fpu_lock goes out of scope) and >>> other registers as well and switches to uT2 >>> >> >> No, this is not what happens now (at least, not if you mean that "S1" >> was uT1's state...). Before interrupt() ends or fpu_lock runs out of scope, >> interrupt calls sched::preempt(). >> This is a weird function. It doesn't do something quickly and return :-) >> Rather, it calls the scheduler to decide which thread to run next. If it >> decides to run uT1 again, it just returns normally (and the FPU state is >> restored and the interrupt returns). >> But if it decides to run uT2 it doesn't quite return... Rather, it >> *switches* to thread uT2. This means we switch to uT2's stack, and jump to >> the program counter where uT2 was before. And where was uT2 running before? >> One of the likely options was that it was itself in interrupt() calling >> preempt() :-) So now, finally, when uT2 resumes, its call to preempt() >> "returns", and we reach the end of the interrupt() function. Remember that >> at this point we're running uT2, with uT2's stack. So when interrupt() >> ends, the fpu_lock goes out of scope, and the FPU state is popped from >> *uT2*'s stack. This is was uT2's FPU state, saved long ago, and now we're >> restoring it, and when interrupt() returns, uT2's correct FPU state will >> have been restored. >> > Thanks. I get it now. The key thing is that FPU state is saved (pushed) to > current and restored from (popped) new thread's stack. > >> >> >>> --------------> UT2 is running (executing long FP logic loop) >>> --> timer triggered interrupt happens again because allotted time slice >>> expired [some FPU registers had some inflight data = FPU has state S2) >>> ... same thing >>> ... same thing >>> ---------> OSv restores FPU to state S2 and other registers as well and >>> switches back to uT1 >>> *--------------> UT1 is resuming where it left above and instead of FPU >>> be in expected S1 state it sees FPU in state S2 -> crash????* >>> >>> What am I missing in my scenario? >>> >>> Lastly I am adding the disassembled portion of fmt_fp() with applied >>> patch like below that makes the repeatable crash (from issue #1010) go away >>> or at least way less frequent. >>> >>> diff --git a/libc/stdio/vfprintf.c b/libc/stdio/vfprintf.c >>> index aac790c0..1e116038 100644 >>> --- a/libc/stdio/vfprintf.c >>> +++ b/libc/stdio/vfprintf.c >>> @@ -8,6 +8,7 @@ >>> #include <inttypes.h> >>> #include <math.h> >>> #include <float.h> >>> +#include <assert.h> >>> /* Some useful macros */ >>> @@ -296,9 +297,14 @@ static int fmt_fp(FILE *f, long double y, int w, >>> int p, int fl, int t) >>> if (e2<0) a=r=z=big; >>> else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1; >>> + int steps = 0; >>> do { >>> *z = y; >>> y = 1000000000*(y-*z++); >>> + steps++; >>> + if(steps > 2000) { >>> + assert(0); >>> + } >>> } while (y); >>> while (e2>0) { >>> >>> Your explanation was that because of added assert() function call we >>> force compiler to generate code to save/restore any used FPU registers. >>> >> >> Yes, my thesis (and I'm too tired now to verify this in the assembler >> code below) was that because there is this function call (and the compiler >> doesn't know if it is commonly called or not), the compiler doesn't want to >> leave y in a register, so it will load and store it from memory. But this >> is just a guess. >> You can try instead of assert(0), do >> >> asm volatile ("cli; hlt" : : : "memory"); >> >> This is not a function call, so it will not change the compiled code. It >> will allow you to connect with a debugger to print more stuff at that >> point. Note that this cli; hlt stuff will only hang the single CPU that ran >> this command (if you can reproduce this with -c1, this will be the only cpu >> so things will be easier). >> >> >> >>> But if we do not see the crash with this patch applied happeny any more, >>> then the assert function does not get called and FPU not saved/restored, so >>> how this helps make this crash go away? >>> >>> Portion of disassembled code: >>> if (e2<0) a=r=z=big; >>> else a=r=z=big+sizeof(big)/sizeof(*big) - LDBL_MANT_DIG - 1; >>> >>> int steps = 0; >>> do { >>> *z = y; >>> 501: d9 bd 0e e3 ff ff fnstcw -0x1cf2(%rbp) >>> if (e2<0) a=r=z=big; >>> 507: 48 8d 85 40 e3 ff ff lea -0x1cc0(%rbp),%rax >>> 50e: 45 85 c9 test %r9d,%r9d >>> 511: 48 8d 95 cc fe ff ff lea -0x134(%rbp),%rdx >>> 518: 48 89 c1 mov %rax,%rcx >>> *z = y; >>> 51b: 0f b7 85 0e e3 ff ff movzwl -0x1cf2(%rbp),%eax >>> if (e2<0) a=r=z=big; >>> 522: 48 0f 49 ca cmovns %rdx,%rcx >>> *z = y; >>> 526: 80 cc 0c or $0xc,%ah >>> if (e2<0) a=r=z=big; >>> 529: 48 89 8d d8 e2 ff ff mov %rcx,-0x1d28(%rbp) >>> y = 1000000000*(y-*z++); >>> 530: 48 8d 59 04 lea 0x4(%rcx),%rbx >>> 534: 48 8d 91 44 1f 00 00 lea 0x1f44(%rcx),%rdx >>> *z = y; >>> 53b: 66 89 85 0c e3 ff ff mov %ax,-0x1cf4(%rbp) >>> 542: d9 c0 fld %st(0) >>> 544: d9 ad 0c e3 ff ff fldcw -0x1cf4(%rbp) >>> 54a: df bd 00 e3 ff ff fistpll -0x1d00(%rbp) >>> 550: d9 ad 0e e3 ff ff fldcw -0x1cf2(%rbp) >>> 556: 48 8b 85 00 e3 ff ff mov -0x1d00(%rbp),%rax >>> 55d: 89 01 mov %eax,(%rcx) >>> y = 1000000000*(y-*z++); >>> 55f: 89 c0 mov %eax,%eax >>> 561: 48 89 85 f8 e2 ff ff mov %rax,-0x1d08(%rbp) >>> 568: df ad f8 e2 ff ff fildll -0x1d08(%rbp) >>> 56e: de e9 fsubrp %st,%st(1) >>> 570: d9 05 00 00 00 00 flds 0x0(%rip) # 576 >>> <fmt_fp+0x176> >>> 576: dc c9 fmul %st,%st(1) >>> steps++; >>> if(steps > 2000) { >>> 578: eb 48 jmp 5c2 <fmt_fp+0x1c2> >>> 57a: d9 c9 fxch %st(1) >>> 57c: eb 04 jmp 582 <fmt_fp+0x182> >>> 57e: 66 90 xchg %ax,%ax >>> 580: d9 c9 fxch %st(1) >>> *z = y; >>> 582: d9 c0 fld %st(0) >>> 584: d9 ad 0c e3 ff ff fldcw -0x1cf4(%rbp) >>> 58a: df bd 00 e3 ff ff fistpll -0x1d00(%rbp) >>> 590: d9 ad 0e e3 ff ff fldcw -0x1cf2(%rbp) >>> y = 1000000000*(y-*z++); >>> 596: 48 83 c3 04 add $0x4,%rbx >>> *z = y; >>> 59a: 48 8b 85 00 e3 ff ff mov -0x1d00(%rbp),%rax >>> 5a1: 89 43 fc mov %eax,-0x4(%rbx) >>> y = 1000000000*(y-*z++); >>> 5a4: 89 c0 mov %eax,%eax >>> 5a6: 48 89 85 f8 e2 ff ff mov %rax,-0x1d08(%rbp) >>> 5ad: df ad f8 e2 ff ff fildll -0x1d08(%rbp) >>> 5b3: de e9 fsubrp %st,%st(1) >>> 5b5: d8 c9 fmul %st(1),%st >>> if(steps > 2000) { >>> 5b7: 48 39 d3 cmp %rdx,%rbx >>> 5ba: 0f 84 99 10 00 00 je 1659 <fmt_fp+0x1259> >>> 5c0: d9 c9 fxch %st(1) >>> assert(0); >>> } >>> } while (y); >>> 5c2: d9 ee fldz >>> 5c4: d9 ca fxch %st(2) >>> 5c6: db ea fucomi %st(2),%st >>> 5c8: dd da fstp %st(2) >>> 5ca: 7a ae jp 57a <fmt_fp+0x17a> >>> 5cc: 75 b2 jne 580 <fmt_fp+0x180> >>> 5ce: dd d8 fstp %st(0) >>> 5d0: dd d8 fstp %st(0) >>> >>> On Wednesday, November 28, 2018 at 8:14:05 AM UTC-5, Nadav Har'El wrote: >>>> >>>> >>>> On Wed, Nov 28, 2018 at 2:18 PM Waldek Kozaczuk <[email protected]> >>>> wrote: >>>> >>>>> On Nov 28, 2018, at 03:58, Nadav Har'El <[email protected]> wrote: >>>>> >>>>> >>>>> The situation is different with involuntary context switches. When an >>>>> asynchronous event, e.g., an interrupt, occurs, the user thread is in a >>>>> random position in the code. It may be using all its registers, and the >>>>> FPU >>>>> state (including the old-style FPU and the new SSE and AVX registers). >>>>> Because our interrupt handler (which may do anything from running the >>>>> scheduler to reading a page from disk on page fault) may need to use any >>>>> of >>>>> these registers, all of them, including the FPU, need to be saved on >>>>> interrupt time. The interrupt has a separate stack, and the FPU is saved >>>>> on >>>>> this stack (see fpu_lock use in interrupt()). When the interrupt >>>>> finishes, >>>>> this FPU is restored. This includes involuntary context switching: thread >>>>> A >>>>> receives an interrupt, saves the FPU, does something and decides to >>>>> switch >>>>> to thread B, and a while later we switch back to thread A at which point >>>>> the interrupt handler "returns" and restores the FPU state. >>>>> >>>>> Does involuntary case include scenario when enough time designated for >>>>> current thread by scheduler expires? I would imaging this would qualify >>>>> as >>>>> interrupt? >>>>> >>>> >>>> Indeed. We set a timer to when this thread's runtime quota will expire, >>>> and this timer generates an interrupt. Check out interrupt() - after >>>> saving >>>> the FPU state and acknowledging the interrupt (EOI), it calls the >>>> scheduler >>>> (sched::preempt()). This will decide which thread to run next - it may run >>>> the same thread again, or a different thread. When sched::preempt() >>>> returns >>>> - possibly a long time later, it means the scheduler decided to run *this* >>>> thread again. At that point, interrupt() returns and just before returning >>>> it restores the FPU state automatically. >>>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "OSv Development" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> For more options, visit https://groups.google.com/d/optout. >>> >> -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
