On Sat, Feb 25, 2017 at 11:18:30AM +1100, Bruce Evans wrote:
> On Fri, 24 Feb 2017, John Baldwin wrote:
> 
> > On Friday, February 24, 2017 07:14:05 PM Konstantin Belousov wrote:
> >> On Fri, Feb 24, 2017 at 08:53:27AM -0800, Rodney W. Grimes wrote:
> >>>> Author: kib
> >>>> Date: Fri Feb 24 16:02:01 2017
> >>>> New Revision: 314210
> >>>> URL: https://svnweb.freebsd.org/changeset/base/314210
> >>>>
> >>>> Log:
> >>>>   MFC r313154:
> >>>>   For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE
> >>>>   and device npx.
> >>>
> >>> Um, why?????
> 
> Since the static configuration is only useful for testing, and complicates
> the code, and tends to rot (it was most recently broken by only adding
> fcmpset() in the !CPU_DISABLE_CMPXCHG clause of atomic.h and then using
> fcmpset() in MI code without any ifdef).
> 
> >>> Makes it much easier to test soft float if we can remove
> >>> the npx device.   Or has soft float support died yet again?
> 
> Not much easier.  First you have to fix the bitrot.  Then maybe rewrite
> the static configuration code to be less complicated and more maintainable.
> It is easier to use dynamic configuration for everything.  This only costs
> a few hundred bytes (unless you have really large and complicated ifdefs
> to omit more code statically) and a few cycles quite often at runtime.  No
> one cares much about the few cycles.
> 
> >> Soft float was removed a very long time ago.
> >
> > I think it was gone in 5.0.
> >
> >>> Yes, an i386 without an FPU is ancient but why are we removing working
> >>> functionality?
> >> This question gives the impression that you think the kernel would not
> >> boot on a machine without an FPU.  The code to tolerate such a configuration
> >> is there, but it is not tested, for obvious reasons.
> >>
> >> A completely different issue is that userspace requires an FPU and e.g. /bin/sh
> >> traps on the next setjmp(3) call.
> 
> This is my context-switching code for the FPU in setjmp().  It requires
> either an FPU or an emulator, and there was an emulator when it was written.
> 
> When I discussed removing the static configuration code with kib, I
> mentioned that SSE is still not properly supported on i386, but haven't
> got around to responding with the details.  The main one is this code
> in setjmp().  It is still completely missing SSE support.  So FPU+SSE
> states set by fenv get restored inconsistently by longjmp().  The current
> brokenness seems to be:
> - i386 restores the control word for the i387 only.  fenv has dynamic
>    SSE tests, and it sets both the x87 and the SSE control words for
>    things like fesetround().  Then longjmp() leaves the SSE control word
>    inconsistent.
We do not have unused space in the jmp_buf to preserve SSE state.
Do you mean to restore the SSE MXCSR control bits to the same state as the x87 cw?
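
If that is the intent, a minimal sketch of deriving the MXCSR control
bits from a saved x87 control word could look like the following.  The
function name is hypothetical.  The bit positions are the architectural
ones: x87 exception masks in bits 0-5 and rounding control in bits
10-11; MXCSR exception masks in bits 7-12 and rounding control in bits
13-14.  It only computes a value and does not touch the hardware
registers, and it is not what longjmp() currently does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return `mxcsr` with its exception masks and
 * rounding control replaced by the ones found in the x87 control
 * word `cw`.  Exception flags, DAZ and FTZ in `mxcsr` are preserved.
 */
uint32_t
mxcsr_from_x87_cw(uint32_t mxcsr, uint16_t cw)
{
	const uint32_t mxcsr_masks = 0x1F80;	/* MXCSR bits 7-12 */
	const uint32_t mxcsr_rc = 0x6000;	/* MXCSR bits 13-14 */

	mxcsr &= ~(mxcsr_masks | mxcsr_rc);
	mxcsr |= (uint32_t)(cw & 0x003F) << 7;	/* x87 masks, bits 0-5 */
	mxcsr |= (uint32_t)(cw & 0x0C00) << 3;	/* x87 RC, bits 10-11 */
	return (mxcsr);
}
```

With the hardware-default values, mxcsr_from_x87_cw(0x1F80, 0x037F)
leaves the MXCSR unchanged, while a control word with round-up
selected (0x0B7F) yields 0x5F80.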

> - both amd64 and i386 attempt to preserve the exception flags.  This
>    might be useful, but is quite broken.
> 
>    My original fixes cleared the i387 env before restoring the control
>    word.  Other systems were broken in not restoring anything or more
>    broken in not clearing anything.  According to das's commits that
>    changed this, other systems still didn't restore the control word(s)
>    16 years later in 2008, and C99 doesn't require restoring the control
>    words, but C99 does require restoring the status word.  I couldn't
>    find where C99 requires this.
> 
>    For the i387 or any FPU with similar or even more imprecise exceptions,
>    leaving unmasked exceptions to trap again later would be painful.
> 
>    Anyway, das's changes don't really work:
>    - kib committed my fix to stop clearing x87 exception flags in the
>      kernel part of the SIGFPE handler in 2012.  This makes x87 exceptions
>      faults (repeat at the FP instruction that trapped, if a SIGFPE handler
>      returns), which is too surprising for most applications, but few notice
>      because few even unmask the traps.  So it is now possible to recover
>      the exception flags.  But this is quite complicated, and far too hard
>      to do in longjmp() from a signal handler.  So the flags aren't actually
>      preserved for the longjmp() from a signal handler.
>    - for the x87, there is other state to restore or clear.  The status word
>      contains the stack pointer together with the exception flags.  The tag
>      word contains the status of the 8 registers on the stack.  The stack
>      pointer can be anything provided the tag word is clear.  For longjmp()
>      from signal handlers, we depend on signal handlers getting a clean
>      state so that there is no garbage on the stack.  I'm not sure if all
>      combinations of kernels and libraries are consistent about this (old
>      signal handlers don't do this, but should only be used with old
>      setjmp()/longjmp() that clear the state).  For longjmp() not from
>      signal handlers, we depend on the ABI guaranteeing that there is
>      no garbage on the stack or in the FP env when longjmp() is called.
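
For reference, the x87 state discussed in the quoted list can be
decoded as in the sketch below.  The helper names are hypothetical.
The layout assumed is the one stored by FNSTENV/FNSAVE: the status
word holds the exception flags in bits 0-5 and the stack top pointer
in bits 11-13, and the full 16-bit tag word uses two bits per register
with 11b meaning "empty".  The abridged tag byte saved by FXSAVE uses a
different encoding and is not handled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception flags (IE, DE, ZE, OE, UE, PE) from the x87 status word. */
unsigned int
x87_exception_flags(uint16_t sw)
{
	return (sw & 0x003F);
}

/* Stack top pointer (TOP field, bits 11-13) from the x87 status word. */
unsigned int
x87_stack_top(uint16_t sw)
{
	return ((sw >> 11) & 0x7);
}

/* True if the full tag word marks all 8 registers as empty (11b each). */
bool
x87_stack_is_empty(uint16_t tw)
{
	return (tw == 0xFFFF);
}
```

When x87_stack_is_empty() holds, the value of x87_stack_top() does not
matter, which is the condition the quoted text relies on.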
> 
> Dynamic SSE tests can probably be done better using vdso, but only if
> vdso is almost free.  I don't know if it requires a syscall to set up,
> or is automatic but costs all applications to set up.  Otherwise,
> sigsetjmp() and siglongjmp() are already very slow (they do too many
> syscalls), so even another syscall to detect SSE would not make them
> much slower.  They can share the SSE detection with fenv.  This would
> work OK, but is inconvenient since fenv lives in libm.
SSE detection is done by CPUID and a single bit test; it does not
require a syscall. There are some plans and WIP to bring an analog
of the cpu_feature* variables into libc, but it is not critically
important. Libc and, indirectly, libthr already detect XSAVE support and
use it for full context saving and for deferred signal delivery in
threaded critical sections.
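
As an illustration of the CPUID test, here is a minimal userland
sketch.  It assumes a GCC- or Clang-compatible compiler that ships
<cpuid.h> with __get_cpuid(); the function name cpu_has_sse() is made
up for this example and is not an existing libc interface.  SSE is
reported in CPUID leaf 1, EDX bit 25.

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: true if CPUID reports SSE support. */
bool
cpu_has_sse(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if leaf 1 is not available. */
	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
		return (false);
	return ((edx & (1u << 25)) != 0);	/* CPUID.1:EDX.SSE */
#else
	return (false);		/* not an x86 CPU */
#endif
}
```

The result does not change at runtime, so a caller could compute it
once and cache it, much as the kernel does with its cpu_feature
variables.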

sigsetjmp and siglongjmp only execute one syscall each on x86.

In principle, we could increase the jmp_buf size so that the *jmp
functions save and restore more hardware registers, but this is
quite painful and should be shown to provide real benefits beyond
theoretical correctness.  E.g., a high-profile application that
depends on restoration of SSE state after siglongjmp from a signal
handler would justify it; otherwise the ABI breakage makes it better
not to try.

> 
> >> Also, we do not run on a real 386, only on 486+, and the Intel 486SX was
> >> probably the only CPU model which has all 486 features but no FPU.
> >
> > Yes, we effectively require an FPU on i386.  I'd be tempted to start requiring
> > a built-in FPU (so INT 16 vs IRQ 13) so we could further reduce diffs with
> > amd64 and eventually have an "x86" fpu.c.  That would only drop support for
> 
> That would be backwards.  amd64 should have an "x86" npx.c.  The kernel does
> almost no FPU handling, so npx is a better name than fpu.  It is even better
> than in 1991, since the numeric extensions are not limited to floating
> point ones.  First there was mmx, then Intel started unimproving names with
> sse, then Intel got tired of changing the prefix and started using
> nondescript sse numbers.
> 
> > systems using a 486sx with an external FPU.  Those systems are probably happier
> > running FreeBSD 4.x than 12 anyway.
> 
> Bruce