On Sat, Dec 1, 2018 at 6:26 PM Waldek Kozaczuk <[email protected]> wrote:
> Here we go again ... I was thinking more about it. > > First of regarding potential corruption of FPU state during involuntary > interrupt task switch, unless my code that verifies it is not the case > using the checksum logic (see my previous email with the patch), there > could be another possibility - we are telling xsave/xstor to save/restore > to/from wrong location in memory. Meaning that whatever got saved stays not > corrupted but somehow we are restoring wrong data for whatever reason. > Probably very unlikely. > Yes, I think it is unlikely - since you are checking the checksum, it means we are restoring valid saved state, but someone else's saved state. I don't have an idea how this can happen. If we rule out corruption of the saved FPU state, I can think of three remaining options: 1. This is not a saved-FPU-state corruption - printf() really goes into an endless loop when it gets a certain input. I'm still hoping you can get more detailed printouts on what exactly what printf()'s *input* when it went into this endless loop. One thing you can do is to have a variable (perhaps thread-local global if you don't want it optimized out?) holding that printf_fp's last input, and when you get a crash, print this global. Or something. 2. This is a situation of some asynchronous even where we ran a handler - especially the scheduler which uses the FPU - but we *forgot* to save the FPU state. I don't know what this can be. This can perhaps be debugged with tracepoints - enable various tracepoints, especially the scheduler, and fpu saves/restores, and after the crash look at the tracepoint log. Can we find unexpected scheduler calls without an fpu save? 3. This is a situation where we saved the FPU, but *after* doing some FPU work. We had and fix this kind of issue in 4349bfd3583df03d44cda480049e630081d3a20b but it's more surprising to see this kind of with x87 registers, not SSE, I can't think of a reason why the compiler will touch these registers without our intent. > Secondly I think that part of our theory, that this famous loop in > vfprintf loops infinitely until tries to access unmapped page of memory, is > very likely correct. But it might happening for different reason. So far we > thought that simply the number is NaN or corrupt that makes FPU Instruction > calculate NAN forever. But what if for whatever reason we are experiencing > FPU stack overflow - all registers are taken and any instruction trying to > load data into it keeps failing because it is full. > This is an interesting point. The eight x87 registers are freely addressable (st0 through st7), but can *also* be used as a stack. The current position is stored as three bits in the floating-point status word (FSW), and the fxsave (or newer xsave...) saves this as well on involuntary context switches. The calling convention when calling a function (i.e., voluntary context switch) is that this stack is popped. The same is true on return (except st0 can be used for returning a floating-point value, if this is a function returning a floating point value). I don't know what will happen if a function violates this convention. > I believe this exception is disabled but I can enable it to see if I am > correct and it happens somewhere in the code. Now what if after voluntary > or involuntary thread switch the FPU stack is full and the thread switched > to simply does not know it. Whose responsibility is in this case to make > sure that FPU is in clean state (empty)? > When we switch *back* to an old thread, we should switch to its old FPU stack, including its old position in the x87 stack, so if it's full, it's this thread's fault. But on a *new* thread, we do need to make sure it got a clean slate. We had a bug in this long ago, which 0e8d9b5e0a1af02a31799fe1d40e434bc2c2a5fb fixed - thread_main_c() now calls processor::init_fpu(). This calls the FNINIT instruction, which among other things, resets the stack postion (in FSW) to 0. But now that I think about this, in case of an involuntary switch to the kernel (e.g., interrupt) what FPU state does the kernel code get? What if the kernel gets a full fpu stack and then can't do its own FPU calculations? Should fpu_lock() run fpu_init() after saving the FPU state? In any case, I don't think this has anything to do with the problem at hand. I'll open a separate issue. > Maybe the involuntary switch has/should have two sub-cases: > 1) the thread we are switching to before was also involuntarily switched > and we need to restore its FPU state from interrupt stack which we do > 2) the thread we are switching to voluntarily yielded and ... I think FPU > should be in somewhat clean state, at least be empty unless something left > it full before, right? > I think you may have indeed found a problem (I'm not sure it's related to the bug we're chasing, but a problem nontheless, and *may* be related): We already have these two cases, implicitly: As I explained in a previous mail, when we context switch *back* to a thread, we get back to where it was when it gave up the CPU: 1. If this was an involuntary context switch, we find ourselves returning (very belatedly) from the scheduler to the end of the interrupt() function, and now this function ends and pops the saved fpu (from the thread's interrupt stack), and finally returns from interrupt() to the original thread. 2. If this was a voluntary context switch, the scheduler returns into whatever function which called it (sleep(), read(), etc.) and that returns to. The floating point state was never saved (it doesn't need to be, because a function was called) but, on return, it was never restored. It just gets a copy of whatever junk was there from the previous thread or OSv code that ran. The junk st0..st7 registers are fine - the caller cannot count on anything in there too. But the control registers might now have junk as well, and that OSv does have to restore, and that it does since commit 0959f20863b119c5ca7d1e663bc22e6329814386. However, we will never restore the floating point *status* word (FSW). So if one thread was interrupted with a full fp stack, another thread we switch to might be returned-to with a full stack. *You can try adding an FNINIT in arch-switch.hh *before* processor::fldcw(fpucw) and see if it fixes anything.* By the way, to be honest, all this discussion is about x87. I don't know what is the modern SSE equivalent :-( Finally, it may indeed make sense to enable fpu stack overflow exceptions (and their SSE equivalent??) to see if we have any of these problems. > My thinking might be all wrong. But just food for thought. > > Waldek > > Sent from my iPhone > > On Dec 1, 2018, at 00:46, Waldek Kozaczuk <[email protected]> wrote: > > My theory has fallen apart: I modified httpserver like this to pass > constant float value of 0.5 and still getting same crash: > > index 358a86c6..126ab5f4 100644 > --- a/modules/httpserver-api/api/os.cc > +++ b/modules/httpserver-api/api/os.cc > @@ -131,7 +131,7 @@ void init(routes& routes) > thread.migrations = t.stat_migrations.get(); > thread.preemptions = t.stat_preemptions.get(); > thread.name = t.name(); > - thread.priority = t.priority(); > + thread.priority = 1.0;//t.priority(); > thread.stack_size = t.get_stack_info().size; > thread.status = t.get_status(); > threads.list.push(thread); > > I even created separate test program that keeps sprintf'ing both NAN and > regular float numbers using 5 threads to replace ffmpeg and I could not > replicate same program running exact same scenario with httpserver. > > As I wrote in other thread I added checksum logic to detect FPU state > corruption that unfortunately does not detect anything. But is this logic > actually correct? > > Ediff --git a/arch/x64/arch-cpu.cc b/arch/x64/arch-cpu.cc > index bb96af3d..34bf6081 100644 > --- a/arch/x64/arch-cpu.cc > +++ b/arch/x64/arch-cpu.cc > @@ -42,6 +42,14 @@ exception_guard::~exception_guard() > sched::cpu::current()->arch.exit_exception(); > } > > +static inline u64 calculate_fpu_state_checksum(processor::fpu_state *s) { > + u64 val = 0; > + char* data = s->legacy; > + for (u64 i = 0; i < sizeof(processor::fpu_state) - sizeof(u64); i++) > + val += data[i]; > + return val; > +} > + > extern "C" > [[gnu::target("no-sse")]] > void fpu_state_init_xsave(processor::fpu_state *s) { > @@ -61,21 +69,25 @@ void fpu_state_init_fxsave(processor::fpu_state *s) { > extern "C" > void fpu_state_save_xsave(processor::fpu_state *s) { > processor::xsave(s, -1); > + s->checksum = calculate_fpu_state_checksum(s); > } > > extern "C" > void fpu_state_save_fxsave(processor::fpu_state *s) { > processor::fxsave(s); > + s->checksum = calculate_fpu_state_checksum(s); > } > > extern "C" > void fpu_state_restore_xsave(processor::fpu_state *s) { > processor::xrstor(s, -1); > + assert(s->checksum == calculate_fpu_state_checksum(s)); > } > > extern "C" > void fpu_state_restore_fxsave(processor::fpu_state *s) { > processor::fxrstor(s); > + assert(s->checksum == calculate_fpu_state_checksum(s)); > } > > extern "C" > diff --git a/arch/x64/processor.hh b/arch/x64/processor.hh > index 6b1fbf66..f699d5b8 100644 > --- a/arch/x64/processor.hh > +++ b/arch/x64/processor.hh > @@ -343,6 +343,7 @@ struct fpu_state { > char xsavehdr[24]; > char reserved[40]; > char ymm[256]; > + u64 checksum; > } __attribute__((packed)); > > inline void fxsave(fpu_state* s) > > On Friday, November 30, 2018 at 7:43:31 PM UTC-5, Nadav Har'El wrote: >> >> On Thu, Nov 29, 2018 at 9:33 PM Waldek Kozaczuk <[email protected]> >> wrote: >> >>> I wonder if per this - https://software.intel.com/en-us/node/523375 - >>> we are enabling correct bits for xsave if caste is what being used here. >>> >>> “The second operand is a save/restore mask specifying the saved/restored >>> extended states. The value of the mask is ANDed with >>> XFEATURE_ENABLED_MASK(XCR0). A particular extended state is saved/restored >>> only if the corresponding bit of both save/restore mask and >>> XFEATURE_ENABLED_MASK is set to '1'.” >>> >> >> Interesting question. In arch/x64/arch-cpu.hh, arch_cpu::init_on_cpu(), >> you can see: >> >> if (features().xsave) { >> auto bits = xcr0_x87 | xcr0_sse; >> if (features().avx) { >> bits |= xcr0_avx; >> } >> write_xcr(xcr0, bits); >> } >> >> So we're turning on the "x87", "sse", and "avx" bits. Are there any other >> bits we need to set (or at least, not zero with this overwrite-all-bits >> code)? >> >> -- > You received this message because you are subscribed to a topic in the > Google Groups "OSv Development" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/osv-dev/ZAjvi_vEhnA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > For more options, visit https://groups.google.com/d/optout. > > -- > You received this message because you are subscribed to the Google Groups > "OSv Development" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
