On Tue, Nov 27, 2018 at 10:11 PM Nadav Har'El <[email protected]> wrote:
> Indeed, seems like a loop that works on fpu registers and stack. The
> actual loop's test, while(y) is the "fucomi" instruction which compares two
> floating point values one of which being a zero created by "fldz". My
> completely unproven suspicion is that in the middle of this loop we get an
> interrupt (possibly also leading to a context switch, running another
> thread, and only much later returning to this thread), and for some reason
> the floating point state (which includes the register stack, etc.) is not
> saved correctly - or not restored correctly (perhaps restored from a
> corrupted array?). If after such corruption, "y" (in whatever register it
> sits) becomes, for example, NaN, the loop will never finish. I wonder if we
> can print these registers from gdb to see if perhaps gdb showing "y=0"
> isn't really correct.
>

Ok, so I started theorizing about what might cause this...

If I remember correctly, OSv currently always saves the FPU state on the
stack, using the fpu_lock type. Could we be using stacks that are too
small to hold this FPU state? In arch/x64/arch-cpu.hh we set a 4096-byte
stack for nested exceptions, a 4096-byte stack for interrupts, and a
4096*4-byte stack for normal exceptions. Maybe one of these is too small?

If you can easily reproduce this bug, can you add a zero to all of these
(i.e., make each stack ten times bigger) and see if the bug goes away?

-- 
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
