I also checked whether y == 0 in gdb when I connected after the crash, and it was indeed true.
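For reference, here is a minimal C++ model of the `while(y)` test discussed in the quoted message. FLDZ pushes 0.0, and FUCOMI compares it with y, writing ZF, PF and CF into EFLAGS. All three are set when the comparison is unordered, that is, when either operand is NaN. The struct and function names are hypothetical, not OSv code. The model also assumes that the compiler leaves the loop only on an "equal" result (ZF=1, PF=0). That is the usual code shape for a floating-point `!= 0` test, but it has not been checked against the actual binary. Under those assumptions, a real 0 (or -0) ends the loop, while a NaN y keeps it spinning forever. On x86-64, the processor itself pushes RFLAGS when it delivers an interrupt, and IRETQ restores it, so these flags survive a plain interrupt. The context-switch path is a separate question that this sketch does not model.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model (not OSv code): the EFLAGS bits FUCOMI writes.
// FUCOMI also clears OF, SF and AF, which are omitted here.
struct FucomiFlags {
    bool zf;  // zero flag
    bool pf;  // parity flag
    bool cf;  // carry flag
};

// Models FUCOMI ST(0), ST(i): compares st0 with sti.
FucomiFlags fucomi(long double st0, long double sti) {
    if (std::isnan(st0) || std::isnan(sti))
        return {true, true, true};     // unordered: ZF=PF=CF=1
    if (st0 > sti)
        return {false, false, false};  // greater: all clear
    if (st0 < sti)
        return {false, false, true};   // less: CF=1
    return {true, false, false};       // equal: ZF=1
}

// Assumed code shape for `while (y)`: FLDZ pushes 0.0, FUCOMI compares it with
// y, and the loop continues unless the result is "equal" (ZF=1 and PF=0).
bool while_y_continues(long double y) {
    FucomiFlags f = fucomi(0.0L, y);
    return !f.zf || f.pf;
}

// Spot checks of the model.
void check_model() {
    assert(!while_y_continues(0.0L));         // y == 0 ends the loop
    assert(!while_y_continues(-0.0L));        // -0 compares equal to 0
    assert(while_y_continues(1.0L));          // nonzero keeps looping
    assert(while_y_continues(std::nan("")));  // NaN is unordered, so PF=1
}
```

Because NaN compares unequal to everything, the same loop would also never terminate if its body preserves the NaN (for example, NaN / 2 is still NaN). This matches the suspicion in the quoted message.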
What about EFLAGS, which FUCOMI sets? Are we saving those?

> On Nov 27, 2018, at 17:53, Nadav Har'El <[email protected]> wrote:
>
>> On Tue, Nov 27, 2018 at 10:11 PM Nadav Har'El <[email protected]> wrote:
>> Indeed, this seems like a loop that works on FPU registers and the FPU
>> stack. The loop's actual test, while(y), is the "fucomi" instruction, which
>> compares two floating-point values, one of which is a zero created by
>> "fldz". My completely unproven suspicion is that in the middle of this loop
>> we get an interrupt (possibly also leading to a context switch, running
>> another thread, and only much later returning to this thread), and for some
>> reason the floating-point state (which includes the register stack, etc.)
>> is not saved correctly, or not restored correctly (perhaps restored from a
>> corrupted array?). If after such corruption "y" (in whatever register it
>> sits) becomes, for example, NaN, the loop will never finish. I wonder if we
>> can print these registers from gdb to see whether gdb showing "y=0" is
>> really correct.
>
> Ok, so I started theorizing what might cause this...
> If I remember correctly, OSv currently always saves the FPU state on some
> stack, using the fpu_lock type.
> Could we possibly be using stacks that are too small to hold this FPU state?
> In arch/x64/arch-cpu.hh we set a 4096-byte stack for nested exceptions, a
> 4096-byte stack for interrupts, and a 4096*4-byte stack for normal
> exceptions. Maybe one of these is too small? If you can easily reproduce
> this bug, can you add a zero to all of these and see if maybe the bug goes
> away with bigger stacks?

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
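To put rough numbers on the stack-size theory above, the sketch below does two things. First, it asks the CPU how large its XSAVE area is: CPUID leaf 0xD, sub-leaf 0, returns in EBX the size needed for the features currently enabled in XCR0. Second, it computes how many nested saves of a given size fit in a stack. It relies on `__get_cpuid_count` from the GCC/Clang `<cpuid.h>` header. It also assumes, without having checked OSv's fpu_lock code, that the saved FPU state is an XSAVE image of that size placed on the current stack. The `frame_overhead` parameter is a hypothetical stand-in for the rest of each frame, and the helper names are illustrative only. For scale, the legacy FXSAVE image is 512 bytes, while XSAVE with AVX-512 enabled needs about 2.7 KB. A 4096-byte stack would therefore hold only one save of that size.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Bytes XSAVE needs for the features currently enabled in XCR0, read from
// CPUID leaf 0xD, sub-leaf 0, register EBX. Returns 0 when unavailable.
unsigned xsave_area_bytes() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return ebx;
#endif
    return 0;
}

// Hypothetical helper: how many nested FPU saves fit in stack_bytes if each
// save uses save_bytes plus frame_overhead bytes (an assumed per-frame cost),
// rounded up to the 64-byte alignment XSAVE requires.
std::size_t nested_saves_that_fit(std::size_t stack_bytes, std::size_t save_bytes,
                                  std::size_t frame_overhead) {
    std::size_t per_save = save_bytes + frame_overhead;
    per_save = (per_save + 63) / 64 * 64;  // round up to a multiple of 64
    return per_save == 0 ? 0 : stack_bytes / per_save;
}

// Worked numbers, ignoring per-frame overhead.
void check_budget() {
    assert(nested_saves_that_fit(4096, 512, 0) == 8);     // FXSAVE-sized image
    assert(nested_saves_that_fit(4096, 2688, 0) == 1);    // AVX-512-sized XSAVE
    assert(nested_saves_that_fit(40960, 2688, 0) == 15);  // "add a zero"
}
```

In this model, the suggested "add a zero" experiment takes a 4096-byte stack from 1 to 15 saves of an AVX-512-sized image. The 4096*4-byte exception stack holds 6.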
