On Tue, Nov 27, 2018 at 10:11 PM Nadav Har'El wrote:

> Indeed, it seems like a loop that works on FPU registers and stack. The
> actual loop's test, while(y), is the "fucomi" instruction, which compares
> two floating-point values, one of which is a zero created by "fldz". My
> completely unproven suspicion is that in the middle of this loop we get an
> interrupt (possibly also leading to a context switch, running another
> thread, and only much later returning to this thread), and for some reason
> the floating point state (which includes the register stack, etc.) is not
> saved correctly - or not restored correctly (perhaps restored from a
> corrupted array?). If, after such corruption, "y" (in whatever register it
> sits in) becomes, for example, NaN, the loop will never finish. I wonder if
> we can print these registers from gdb to check whether gdb's "y=0" is
> really correct.
>
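A minimal C++ sketch of why a NaN would make such a loop spin forever: `while (y)` on a double is false only when y compares equal to zero, and NaN compares unequal to everything (fucomi reports that comparison as "unordered", and so does its SSE counterpart). The real loop body is not in this thread, so `iterations_until_zero()` is a hypothetical stand-in that just halves y, and `iterations_after_corruption()` is a hypothetical model of the suspected corrupted register that starts from a quiet NaN. The `max_iters` cap exists only so the demonstration terminates.

```cpp
#include <limits>

// Hypothetical stand-in for the loop being debugged (its real body is not
// shown in the thread): keep halving y until the `while (y)` test sees zero.
// Returns the iteration count, or -1 if max_iters is reached first.
long iterations_until_zero(double y, long max_iters) {
    long n = 0;
    while (y) {              // NaN != 0.0, so a NaN y keeps this condition true
        if (n == max_iters) {
            return -1;       // give up: without the cap this would never end
        }
        y *= 0.5;            // arithmetic on NaN yields NaN again
        ++n;
    }
    return n;
}

// Hypothetical model of the suspected corruption: the register holding y
// comes back from a bad FPU-state restore as NaN instead of a finite value.
long iterations_after_corruption(long max_iters) {
    const double corrupted_y = std::numeric_limits<double>::quiet_NaN();
    return iterations_until_zero(corrupted_y, max_iters);
}
```

With a finite starting value such as 1.0 the loop ends once y underflows to zero; with NaN it hits the cap and returns -1.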

OK, so I started theorizing about what might cause this...
If I remember correctly, OSv currently always saves the FPU state on some
stack, using the fpu_lock type.
Could we possibly be using stacks which are too small to hold this FPU
state?
In arch/x64/arch-cpu.hh we set a 4096-byte stack for nested exceptions, a
4096-byte stack for interrupts, and a 4096*4-byte stack for normal
exceptions. Maybe one of these is too small? If you can easily reproduce
this bug, can you add a zero to each of these sizes (i.e., make them ten
times larger) and see whether the bug goes away with the bigger stacks?
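To put rough numbers on the stack-size question, here is a sketch that asks the CPU how large its XSAVE area is (CPUID leaf 0xD, sub-leaf 0, EBX: the size needed for the state components currently enabled in XCR0) and divides a stack size by it. It assumes, without checking the OSv source, that fpu_lock saves state with an XSAVE-family instruction into a buffer on the current stack; `xsave_area_bytes()` and `copies_that_fit()` are hypothetical helpers, not OSv functions. The result is only an upper bound, because the same stack also holds the interrupt frame, saved registers and locals, and XSAVE buffers must be 64-byte aligned.

```cpp
#include <cstddef>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Size of the legacy FXSAVE area, fixed by the x86 architecture.
constexpr std::size_t kFxsaveBytes = 512;

// Hypothetical helper: bytes XSAVE needs for the state components currently
// enabled in XCR0, from CPUID leaf 0xD, sub-leaf 0 (EBX). Falls back to the
// FXSAVE size when that leaf is unavailable or we are not on x86.
std::size_t xsave_area_bytes() {
#if defined(__x86_64__) || defined(__i386__)
    if (__get_cpuid_max(0, nullptr) >= 0xD) {
        unsigned eax, ebx, ecx, edx;
        __cpuid_count(0xD, 0, eax, ebx, ecx, edx);
        if (ebx != 0) {
            return ebx;
        }
    }
#endif
    return kFxsaveBytes;
}

// Hypothetical helper: how many whole FPU-state copies fit in a stack of the
// given size, ignoring everything else that also lives on that stack.
std::size_t copies_that_fit(std::size_t stack_bytes, std::size_t state_bytes) {
    return state_bytes == 0 ? 0 : stack_bytes / state_bytes;
}
```

For example, `copies_that_fit(4096, xsave_area_bytes())` returning 1 would mean a 4096-byte interrupt stack has room for one saved FPU state and little else; with AVX-512 state enabled, the XSAVE area alone is well over 2 KB.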

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.