On Sat, Dec 1, 2018 at 6:26 PM Waldek Kozaczuk <[email protected]> wrote:

> Here we go again ... I was thinking more about it.
>
> First of all, regarding potential corruption of the FPU state during an
> involuntary interrupt task switch: unless my checksum-verification code is
> itself wrong (see my previous email with the patch), there could be
> another possibility - we are telling xsave/xrstor to save/restore to/from
> the wrong location in memory. Meaning that whatever got saved is not
> corrupted, but somehow we are restoring the wrong data for whatever
> reason. Probably very unlikely.
>

Yes, I think it is unlikely - since you are checking the checksum, it means
we are restoring valid saved state, just someone else's saved state. I can't
see how that could happen.

If we rule out corruption of the saved FPU state, I can think of three
remaining options:

1. This is not a saved-FPU-state corruption - printf() really goes into an
endless loop when it gets a certain input. I'm still hoping you can get
more detailed printouts of what exactly printf()'s *input* was when it
went into this endless loop. One thing you can do is to have a variable
(perhaps a thread-local global, if you don't want it optimized out?) holding
printf_fp's last input, and when you get a crash, print this global.
Or something.
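For example, a minimal sketch of that idea (all names here are invented for
illustration - the real hook would go inside the library's printf_fp path):

```cpp
#include <cstdio>

// Hypothetical debugging aid: mirror the last value handed to the
// floating-point formatting path in a thread-local variable, so a crash
// handler (or gdb) can dump it after the endless loop is detected.
thread_local double last_fp_input = 0.0;

int debug_printf_fp(const char* fmt, double v) {
    last_fp_input = v;  // recorded before formatting starts; survives the hang
    return snprintf(nullptr, 0, fmt, v);  // stand-in for the real formatting work
}
```

Being thread-local, the variable won't be clobbered by concurrent printf()
calls on other CPUs, and the compiler can't optimize it away across the call.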

2. This is a situation of some asynchronous event where we ran a handler -
especially the scheduler, which uses the FPU - but we *forgot* to save the
FPU state. I don't know what this could be. This can perhaps be debugged with
tracepoints - enable various tracepoints, especially the scheduler's and the
fpu saves/restores, and after the crash look at the tracepoint log. Can we
find unexpected scheduler calls without an fpu save?

3. This is a situation where we saved the FPU state, but only *after* doing
some FPU work. We had and fixed this kind of issue in
4349bfd3583df03d44cda480049e630081d3a20b, but it's more surprising to see
this kind of issue with x87 registers rather than SSE - I can't think of a
reason why the compiler would touch these registers without our intent.


> Secondly, I think that part of our theory - that this famous loop in
> vfprintf loops infinitely until it tries to access an unmapped page of
> memory - is very likely correct. But it might be happening for a different
> reason. So far we thought that simply the number is NaN or corrupt, which
> makes the FPU instruction calculate NaN forever. But what if, for whatever
> reason, we are experiencing an FPU stack overflow - all registers are
> taken, and any instruction trying to load data onto the stack keeps
> failing because it is full?
>

This is an interesting point. The eight x87 registers are freely
addressable (st0 through st7), but can *also* be used as a stack. The
current position is stored as three bits in the floating-point status word
(FSW), and fxsave (or the newer xsave) saves it as well on involuntary
context switches.
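As a concrete illustration (x86-64 GCC/Clang inline assembly, just a
sketch): the current stack position can be read back with FNSTSW:

```cpp
#include <cstdint>

// Read the x87 status word with FNSTSW; the stack-top pointer (TOP)
// lives in bits 11-13 of FSW.
inline uint16_t read_fsw() {
    uint16_t w;
    asm volatile("fnstsw %0" : "=m"(w));
    return w;
}

inline unsigned x87_top() {
    return (read_fsw() >> 11) & 0x7;  // extract the 3-bit TOP field
}

// Reset the x87 unit: empties the stack (TOP = 0) and masks exceptions.
inline void x87_reset() {
    asm volatile("fninit");
}
```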

The calling convention when calling a function (i.e., a voluntary context
switch) is that this stack is empty - everything has been popped. The same
is true on return (except that st0 can be used for returning a
floating-point value, if this is a function returning one). I don't know
what will happen if a function violates this convention.


> I believe this exception is disabled, but I can enable it to see if I am
> correct and it happens somewhere in the code. Now, what if after a voluntary
> or involuntary thread switch the FPU stack is full and the thread switched
> to simply does not know it? Whose responsibility is it in this case to make
> sure that the FPU is in a clean (empty) state?
>

When we switch *back* to an old thread, we should switch to its old FPU
stack, including its old position in the x87 stack, so if it's full, it's
this thread's fault.
But on a *new* thread, we do need to make sure it got a clean slate. We had
a bug in this long ago, which 0e8d9b5e0a1af02a31799fe1d40e434bc2c2a5fb
fixed - thread_main_c() now calls processor::init_fpu(). This calls the
FNINIT instruction, which, among other things, resets the stack position (in
FSW) to 0.
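A quick demonstration of this behavior (x86-64 inline assembly, illustrative
only): FLD decrements TOP, so one push onto an empty stack wraps TOP from 0
to 7, and FNINIT brings it back to 0:

```cpp
#include <cstdint>

static inline unsigned current_top() {
    uint16_t w;
    asm volatile("fnstsw %0" : "=m"(w));  // fetch the status word
    return (w >> 11) & 0x7;               // TOP is bits 11-13
}

// Push one value onto an empty x87 stack and report TOP. FLD decrements
// TOP, so it wraps from 0 to 7; the trailing FNINIT resets it to 0 again,
// leaving the FPU clean for the caller.
unsigned top_after_one_push() {
    asm volatile("fninit");               // empty stack, TOP = 0
    double d = 1.0;
    asm volatile("fldl %0" :: "m"(d));    // push: TOP wraps to 7
    unsigned t = current_top();
    asm volatile("fninit");               // clean up
    return t;
}
```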

But now that I think about this: in the case of an involuntary switch to the
kernel (e.g., an interrupt), what FPU state does the kernel code get? What if
the kernel gets a full fpu stack and then can't do its own FPU
calculations? Should fpu_lock() run fpu_init() after saving the FPU state?
In any case, I don't think this has anything to do with the problem at
hand. I'll open a separate issue.


> Maybe the involuntary switch has/should have two sub-cases:
> 1) the thread we are switching to was previously switched out
> involuntarily, and we need to restore its FPU state from the interrupt
> stack, which we do
> 2) the thread we are switching to voluntarily yielded and ... I think the
> FPU should be in a somewhat clean state, at least empty, unless something
> left it full before, right?
>

I think you may have indeed found a problem (I'm not sure it's related to
the bug we're chasing, but a problem nonetheless, and it *may* be related):

We already have these two cases, implicitly: As I explained in a previous
mail, when we context switch *back* to a thread, we get back to where it
was when it gave up the CPU:
1. If this was an involuntary context switch, we find ourselves returning
(very belatedly) from the scheduler to the end of the interrupt() function,
and now this function ends and pops the saved fpu (from the thread's
interrupt stack), and finally returns from interrupt() to the original
thread.
2. If this was a voluntary context switch, the scheduler returns into
whatever function called it (sleep(), read(), etc.), and that function
returns to its caller. The floating point state was never saved (it doesn't
need to be, because a function was called) but, on return, it was never
restored either. The thread just gets a copy of whatever junk was there from
the previous thread or OSv code that ran. The junk st0..st7 registers are
fine - the caller cannot count on anything in there anyway. But the control
registers might now contain junk as well, and those OSv does have to
restore, which it does since commit
0959f20863b119c5ca7d1e663bc22e6329814386. However, we never restore the
floating point *status* word (FSW). So if one thread was interrupted with a
full fp stack, another thread we switch to might be returned-to with a full
stack.
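The FSW point can be demonstrated directly (x86-64 inline assembly,
illustrative only): FLDCW reloads the control word, but the status word -
and with it TOP - keeps whatever the previous code left there:

```cpp
#include <cstdint>

static inline uint16_t x87_status() {
    uint16_t w;
    asm volatile("fnstsw %0" : "=m"(w));
    return w;
}

// Save FCW, push one value (TOP wraps 0 -> 7), then restore FCW with
// FLDCW. TOP stays 7, because FLDCW never touches the status word -
// exactly the situation a thread returned-to after a voluntary switch
// could find itself in.
unsigned top_after_fldcw() {
    asm volatile("fninit");                // clean slate: TOP = 0
    uint16_t cw;
    asm volatile("fnstcw %0" : "=m"(cw));  // save the control word
    double d = 1.0;
    asm volatile("fldl %0" :: "m"(d));     // push: TOP wraps to 7
    asm volatile("fldcw %0" :: "m"(cw));   // restore the control word only
    unsigned top = (x87_status() >> 11) & 0x7;
    asm volatile("fninit");                // leave the stack empty again
    return top;
}
```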

*You can try adding an FNINIT in arch-switch.hh *before*
processor::fldcw(fpucw) and see if it fixes anything.*

By the way, to be honest, all this discussion is about x87. I don't know
what the modern SSE equivalent is :-(

Finally, it may indeed make sense to enable fpu stack overflow exceptions
(and their SSE equivalent??) to see if we have any of these problems.
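Even with the exception masked (the default after FNINIT), a stack overflow
leaves sticky evidence in FSW - the invalid-operation (IE, bit 0) and
stack-fault (SF, bit 6) flags - so it can already be checked without
unmasking anything (x86-64 inline assembly sketch):

```cpp
#include <cstdint>

// Overflow the 8-register x87 stack with a 9th FLD. With the invalid-
// operation exception masked, this does not trap; it just sets the sticky
// IE (bit 0) and SF (bit 6) flags in the status word.
bool sticky_overflow_flags() {
    asm volatile("fninit");                // empty stack, all flags cleared
    double d = 1.0;
    for (int i = 0; i < 9; i++)
        asm volatile("fldl %0" :: "m"(d)); // the 9th push overflows
    uint16_t w;
    asm volatile("fnstsw %0" : "=m"(w));
    asm volatile("fninit");                // clean up the overflowed stack
    return (w & 0x41) == 0x41;             // IE and SF both set
}
```

A periodic check of these sticky bits could tell us whether an overflow ever
happens, even before wiring up an actual exception handler.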





> My thinking might be all wrong. But just food for thought.
>
> Waldek
>
> Sent from my iPhone
>
> On Dec 1, 2018, at 00:46, Waldek Kozaczuk <[email protected]> wrote:
>
> My theory has fallen apart: I modified httpserver like this, to pass a
> constant float value of 0.5, and I am still getting the same crash:
> index 358a86c6..126ab5f4 100644
> --- a/modules/httpserver-api/api/os.cc
> +++ b/modules/httpserver-api/api/os.cc
> @@ -131,7 +131,7 @@ void init(routes& routes)
>              thread.migrations = t.stat_migrations.get();
>              thread.preemptions = t.stat_preemptions.get();
>              thread.name = t.name();
> -            thread.priority = t.priority();
> +            thread.priority = 1.0;//t.priority();
>              thread.stack_size = t.get_stack_info().size;
>              thread.status = t.get_status();
>              threads.list.push(thread);
>
>  I even created a separate test program that keeps sprintf'ing both NAN
> and regular float numbers using 5 threads, to stand in for ffmpeg, and I
> could not replicate the crash running the exact same scenario as with
> httpserver.
>
> As I wrote in the other thread, I added checksum logic to detect FPU state
> corruption, but unfortunately it does not detect anything. But is this
> logic actually correct?
>
> diff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc
> index bb96af3d..34bf6081 100644
> --- a/arch/x64/arch-cpu.cc
> +++ b/arch/x64/arch-cpu.cc
> @@ -42,6 +42,14 @@ exception_guard::~exception_guard()
>      sched::cpu::current()->arch.exit_exception();
>  }
>
> +static inline u64 calculate_fpu_state_checksum(processor::fpu_state *s) {
> +    u64 val = 0;
> +    char* data = s->legacy;
> +    for (u64 i = 0; i < sizeof(processor::fpu_state) - sizeof(u64); i++)
> +       val += data[i];
> +    return val;
> +}
> +
>  extern "C"
>  [[gnu::target("no-sse")]]
>  void fpu_state_init_xsave(processor::fpu_state *s) {
> @@ -61,21 +69,25 @@ void fpu_state_init_fxsave(processor::fpu_state *s) {
>  extern "C"
>  void fpu_state_save_xsave(processor::fpu_state *s) {
>      processor::xsave(s, -1);
> +    s->checksum = calculate_fpu_state_checksum(s);
>  }
>
>  extern "C"
>  void fpu_state_save_fxsave(processor::fpu_state *s) {
>      processor::fxsave(s);
> +    s->checksum = calculate_fpu_state_checksum(s);
>  }
>
>  extern "C"
>  void fpu_state_restore_xsave(processor::fpu_state *s) {
>      processor::xrstor(s, -1);
> +    assert(s->checksum == calculate_fpu_state_checksum(s));
>  }
>
>  extern "C"
>  void fpu_state_restore_fxsave(processor::fpu_state *s) {
>      processor::fxrstor(s);
> +    assert(s->checksum == calculate_fpu_state_checksum(s));
>  }
>
>  extern "C"
> diff --git a/arch/x64/processor.hh b/arch/x64/processor.hh
> index 6b1fbf66..f699d5b8 100644
> --- a/arch/x64/processor.hh
> +++ b/arch/x64/processor.hh
> @@ -343,6 +343,7 @@ struct fpu_state {
>      char xsavehdr[24];
>      char reserved[40];
>      char ymm[256];
> +    u64 checksum;
>  } __attribute__((packed));
>
>  inline void fxsave(fpu_state* s)
>
> On Friday, November 30, 2018 at 7:43:31 PM UTC-5, Nadav Har'El wrote:
>>
>> On Thu, Nov 29, 2018 at 9:33 PM Waldek Kozaczuk <[email protected]>
>> wrote:
>>
>>> I wonder if, per this - https://software.intel.com/en-us/node/523375 -
>>> we are enabling the correct bits for xsave, if xsave is what is being used here.
>>>
>>> “The second operand is a save/restore mask specifying the saved/restored
>>> extended states. The value of the mask is ANDed with
>>> XFEATURE_ENABLED_MASK(XCR0). A particular extended state is saved/restored
>>> only if the corresponding bit of both save/restore mask and
>>> XFEATURE_ENABLED_MASK is set to '1'.”
>>>
>>
>> Interesting question. In arch/x64/arch-cpu.hh, arch_cpu::init_on_cpu(),
>> you can see:
>>
>>     if (features().xsave) {
>>         auto bits = xcr0_x87 | xcr0_sse;
>>         if (features().avx) {
>>             bits |= xcr0_avx;
>>         }
>>         write_xcr(xcr0, bits);
>>     }
>>
>> So we're turning on the "x87", "sse", and "avx" bits. Are there any other
>> bits we need to set (or at least, not zero with this overwrite-all-bits
>> code)?
>>
>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OSv Development" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/osv-dev/ZAjvi_vEhnA/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
