I was also reading about "lazy" vs. "eager" FPU state save/restore in Linux
and how, for security reasons (I believe this is the LazyFP issue,
CVE-2018-3665), users are being advised to switch to eager mode. As I
understand it, eager means the FPU state gets saved/restored on every
context switch, regardless of whether the FPU registers are actually used.

Is OSv using the "eager" or the "lazy" strategy? I am guessing it is
probably the lazy one.
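
To make sure I have the distinction right, here is a toy model of the two
strategies. It is only a sketch: every name in it (FpuState, Thread, Cpu,
switch_to, use_fpu, count_saves) is made up for illustration and none of it
is taken from OSv or Linux. The use_fpu() call stands in for the
"device not available" trap a real kernel would take in lazy mode.

```cpp
#include <array>
#include <cassert>

// Toy model only: these types are hypothetical, not OSv or Linux code.
struct FpuState { std::array<double, 8> regs{}; };

struct Thread {
    FpuState saved;  // this thread's FPU state while it is not on the CPU
};

struct Cpu {
    FpuState fpu;                 // stands in for the hardware FPU registers
    Thread* current = nullptr;    // thread currently running
    Thread* fpu_owner = nullptr;  // lazy mode: whose state is loaded in `fpu`
    bool lazy = false;
    int saves = 0;                // number of full state saves performed

    void switch_to(Thread* next) {
        if (!lazy) {
            // Eager: always save the outgoing state and load the incoming one.
            if (current) { current->saved = fpu; ++saves; }
            fpu = next->saved;
        }
        // Lazy: leave the registers untouched and defer the work to use_fpu().
        current = next;
    }

    // Must be called before the current thread touches the FPU.
    void use_fpu() {
        if (lazy && fpu_owner != current) {
            if (fpu_owner) { fpu_owner->saved = fpu; ++saves; }
            fpu = current->saved;
            fpu_owner = current;
        }
    }
};

// Ping-pong between two threads where only thread `a` uses the FPU.
// Returns how many state saves the chosen strategy performed.
int count_saves(bool lazy, int switches) {
    Thread a, b;
    Cpu cpu;
    cpu.lazy = lazy;
    for (int i = 0; i < switches; ++i) {
        Thread* next = (i % 2 == 0) ? &a : &b;
        cpu.switch_to(next);
        if (next == &a) {
            cpu.use_fpu();
            cpu.fpu.regs[0] += 1.0;  // pretend floating-point work
        }
    }
    return cpu.saves;
}
```

In this model count_saves(false, 10) does a save on every switch after the
first, while count_saves(true, 10) never saves at all because only one
thread ever touches the FPU. That saving is the appeal of lazy mode; the
cost is that another thread's register contents stay in the FPU until the
next owner change.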

Also, I am not sure about Cassandra (and the original, older bug), but
ffmpeg is very heavy on floating-point arithmetic, so maybe that exposes
FPU bugs more easily.
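
On the NaN theory in Nadav's message quoted below, a small sketch of why a
corrupted "y" would keep a while(y) loop spinning. The loop body here is
invented (I do not know what the real loop does); only the while(y) test
mirrors what was described, and the iteration cap exists just so the
example terminates.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for a value clobbered by a bad FPU state restore.
double corrupted_value() {
    return std::numeric_limits<double>::quiet_NaN();
}

// Runs a `while (y)` style loop and returns the number of iterations,
// stopping early at max_iters so a never-ending loop can be observed.
int count_iterations(double y, int max_iters) {
    int n = 0;
    while (y && n < max_iters) {   // `y` converts to bool as (y != 0)
        y = std::floor(y / 2.0);   // made-up body that drives y toward 0
        ++n;
    }
    return n;
}
```

With y = 8 the loop ends after a handful of iterations, but with
corrupted_value() it always runs to the cap: NaN != 0 is true, and any
arithmetic on NaN yields NaN again, so the test can never become false.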

On Tue, Nov 27, 2018 at 6:07 PM Waldek Kozaczuk <[email protected]>
wrote:

> I also checked whether y == 0 in gdb when I connected after the crash,
> and indeed it was true.
>
> What about eflags which are set by FUCOMI? Are we saving those?
>
> On Nov 27, 2018, at 17:53, Nadav Har'El <[email protected]> wrote:
>
>
> On Tue, Nov 27, 2018 at 10:11 PM Nadav Har'El <[email protected]> wrote:
>
>> Indeed, it seems like a loop that works on the FPU registers and stack. The
>> actual loop's test, while(y), is the "fucomi" instruction, which compares two
>> floating point values, one of which is a zero created by "fldz". My
>> completely unproven suspicion is that in the middle of this loop we get an
>> interrupt (possibly also leading to a context switch, running another
>> thread, and only much later returning to this thread), and for some reason
>> the floating point state (which includes the register stack, etc.) is not
>> saved correctly - or not restored correctly (perhaps restored from a
>> corrupted array?). If after such corruption, "y" (in whatever register it
>> sits) becomes, for example, NaN, the loop will never finish. I wonder if we
>> can print these registers from gdb to see if perhaps gdb showing "y=0"
>> isn't really correct.
>>
>
> Ok, so I started theorizing what might cause this...
> If I remember correctly, OSv currently always saves the FPU state on some
> stack, using the fpu_lock type.
> Could we possibly be using stacks which are too small to hold this FPU
> state?
> In arch/x64/arch-cpu.hh we set a 4096 byte stack for nested exceptions,
> 4096 byte stack for interrupts, and 4096*4 byte stack for normal
> exceptions. Maybe one of these is too small? If you can easily reproduce
> this bug, can you add a zero to all of these and see if maybe the bug goes
> away with bigger stacks?
>
>
