[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

Mark Linimon changed:

           What    |Removed                     |Added
         Resolution|---                         |Overcome By Events
           Assignee|b...@freebsd.org            |bugmeis...@freebsd.org
             Status|New                         |Closed

--- Comment #5 from Mark Linimon ---
^Triage: close as OBE.

I'm sorry that this PR never got looked at, but by now, 10.X is long out of support.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #4 from Jilles Tjoelker ---
It may be reasonable to make i386 fesetround() a non-inline function, at least when compiling without SSE (where __test_sse() may be called). In that case, compilers are likely to save caller-save registers already, so part of the cost of a function call is already paid, even though an actual function call only happens the first time.

Alternatively, __test_sse() could be called somewhere during startup, so the function call in the inlined fesetround() is not needed. This will reduce the code size of fesetround() calls considerably, but rounding to float or double is still likely to use an incorrect rounding mode.

When compiling with SSE, there is no __test_sse() call and the behaviour for the x87 is similar; however, if SSE2 is enabled, the x87 is probably only used for long double.

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
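Whether a given build evaluates double expressions on the x87 (with extended range) or in plain double (as with SSE2) can be checked through the C99 FLT_EVAL_METHOD macro in <float.h>. The following is a minimal sketch, not FreeBSD code; the helper name describe_eval_method is hypothetical. x87 code generation on i386 typically reports 2 and SSE2 code generation (and amd64) typically reports 0, but check your own compiler.

```c
#include <float.h>

/*
 * Hypothetical helper: describe how this translation unit evaluates
 * floating-point expressions, using the C99 FLT_EVAL_METHOD values.
 * 2 means float and double expressions are evaluated with long double
 * range and precision (typical of x87 code generation); 0 means each
 * type is evaluated in its own format (typical of SSE2).
 */
static const char *describe_eval_method(void)
{
#if !defined(FLT_EVAL_METHOD)
    return "unknown: FLT_EVAL_METHOD is not defined";
#elif FLT_EVAL_METHOD == 0
    return "0: float and double evaluated in their own format";
#elif FLT_EVAL_METHOD == 1
    return "1: float and double evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "2: float and double evaluated as long double";
#elif FLT_EVAL_METHOD == -1
    return "-1: indeterminate";
#else
    return "implementation-defined value";
#endif
}
```

On i386, compiling the same file with and without gcc's -msse2 -mfpmath=sse is one way to compare the two cases.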
Re: [Bug 204671] clang floating point wrong around Inf (i386)
On Sun, 22 Nov 2015 a bug that doesn't want replies wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> --- Comment #3 from ne...@segfault.kiev.ua ---
> (In reply to Jilles Tjoelker from comment #2)
>
> Jilles, thanks for the excellent explanation. This shows that I have lost track of some important advances in floating point (like the role of FENV_ACCESS and the need for it). But,
>
>> The conversion for printf happens during the inlined fesetround() call, after
>> setting the x87 rounding mode and before calling a function __test_sse to
>> check whether SSE is available.
>
> Isn't this the issue by itself? If fesetround() performs an action which generally should be atomic, no intervention must be allowed during this setting. If it can't be expressed in the inlined version using C, either "asm volatile" should be used, or a fully separate function.

The asm is already volatile. Even more ordering might (and probably would) make a difference, but this would be from accidentally avoiding the compiler bugs. Even an inline function gives a sequence point. The assignment is supposed to be complete before this.

>> You will generally have fewer problems with weirdly changing floating point
>> results if you use SSE instead of the x87 FPU, assuming your CPUs are new
>> enough.
>
> Yep, SSE is better in all senses, if supported and exploited. But the latter is

No, SSE isn't better in all senses. It doesn't even support extra precision for long doubles. SSE with 128-bit long doubles in hardware would be better, but i386 would also need 80-bit long doubles in SSE for some compatibility. If fully exploited, 80-bit long doubles are also better for scalar double precision code (they give more accuracy using faster, simpler methods). I intentionally didn't fully exploit the x87 in libm, since the more complicated methods are still needed for arches that don't have x87. The main thing that compiler writers don't like about the x87 is that its extra precision is (almost) always present (and not choosable at compile time for every operation). But this is a feature.

> a separate issue. The default compiler installation for the i386 target still targets the least capable CPU (as far as I can see from compiling without any options in make.conf). The old-style option override (CFLAGS?= in make.conf) doesn't work anymore.

CFLAGS?= in make.conf never worked, since sys.mk sets CFLAGS earlier. I use something like:

.if ${CFLAGS} == "-O -pipe"
CFLAGS+=-mcpu=athlon-xp
.endif

The ifdef avoids doing anything if CFLAGS is not the default (newer FreeBSD has the bad default of -O2 instead of -O). Then it adds to CFLAGS instead of overriding it. I set CFLAGS on the command line a lot, and the ifdef prevents changing this if the command line is not the default.

Bruce
[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #3 from ne...@segfault.kiev.ua ---
(In reply to Jilles Tjoelker from comment #2)

Jilles, thanks for the excellent explanation. This shows that I have lost track of some important advances in floating point (like the role of FENV_ACCESS and the need for it). But,

> The conversion for printf happens during the inlined fesetround() call, after
> setting the x87 rounding mode and before calling a function __test_sse to
> check whether SSE is available.

Isn't this the issue by itself? If fesetround() performs an action which generally should be atomic, no intervention must be allowed during this setting. If it can't be expressed in the inlined version using C, either "asm volatile" should be used, or a fully separate function.

> You will generally have fewer problems with weirdly changing floating point
> results if you use SSE instead of the x87 FPU, assuming your CPUs are new
> enough.

Yep, SSE is better in all senses, if supported and exploited. But the latter is a separate issue. The default compiler installation for the i386 target still targets the least capable CPU (as far as I can see from compiling without any options in make.conf). The old-style option override (CFLAGS?= in make.conf) doesn't work anymore. With the current install base, I'd prefer to see an option in the installer which suggests something like "-march=nocona -mtune=native" for local builds. (This also touches the very old topic of having a subtarget for binary builds for modern processors, since 99% of them are at least P3, and delivering them via freebsd-update... but this is definitely not this ticket's issue.) For this particular installation, I had neither a strong reason nor the inspiration to convert it to a 64-bit one, and there are still many users like me. So make.conf will be loaded with weird constructs like:

NO_CPU_CFLAGS=true
NO_CPU_COPTFLAGS=true
.if ${.CURDIR:N*/BSD/src/*} == ""
CFLAGS+= -march=nocona -mtune=k8 -mmmx -msse -msse2
COPTFLAGS+= -march=i686 -mtune=k8
.endif

> Clang has a bug about the pragma, https://llvm.org/bugs/show_bug.cgi?id=8100,
> which has been open for five years with various duplicates but no other
> significant action.

As long as they rely on the GCC frontend, I doubt this will be fixed until the GCC developers implement support for it on their side.
Re: [Bug 204671] clang floating point wrong around Inf (i386)
On Sat, 21 Nov 2015 a bug that suppresses replies in mail wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> Jilles Tjoelker changed:
>
>            What    |Removed                     |Added
>                  CC|                            |jil...@freebsd.org
>
> --- Comment #2 from Jilles Tjoelker ---
> This is related to the strangeness that is the x87 FPU. Internally, the x87 performs calculations in extended precision. Even if the precision control is set to double precision, like FreeBSD and Windows do by default but Linux and Solaris do not, the x87 registers still have greater range than double precision.

Which versions of Windows do it? I only have Windows/DOS compilers from 1995 or earlier, and they do it. I think Visual Studio (?) does it for compatibility. Does Windows actually require this as an ABI? Then it should also disallow clang's bug of using SSE on 32-bit systems.

> As a result, the addition 1e308 + 1e308 does not overflow, but produces a result of approximately 2e308 in an x87 register. When this result is stored to memory in double precision format, overflow or rounding will occur.

For C (C90 and later) compilers, also when this result is assigned to a variable of type double or cast to double. This sometimes loses precision, is always slow (typically 2-4 times slower) and is rarely needed, so it is broken by default in gcc and clang on i386 with x87. Recent versions of gcc can be turned into C compilers in this respect using -fexcess-precision=standard. Standards directives like -std=c99 (but not -std=gnu99) also give this perfectly correct slowness to unsuspecting users who don't want the slowness but want a C compiler in other respects. clang now knows that -fexcess-precision exists, but doesn't support it. It also doesn't support this implicitly for -std=c99.

For C11 compilers, also when this result is returned. This gives further destruction of precision and slowness and is broken by default. IIRC, -std=c99 gives this bug even for C99 mode in gcc. clang doesn't support this even with -std=c11.

> What happens in t1.c is that the conversion from extended to double precision happens two times. The conversion for printing the bytes happens directly after the calculation and therefore uses the modified rounding mode. The conversion for printf happens during the inlined fesetround() call, after setting the x87 rounding mode and before calling a function __test_sse to check whether SSE is available. (After that, the value is stored and loaded again a few times.) Therefore, the conversion for printf uses an incorrect rounding mode.

Both conversions are done after the fesetround() call in program order. This is asking for trouble. But since there is an assignment before the call, there is no problem if the compiler is a C compiler. clang is far from being a C compiler and does unnatural ordering that gives trouble:

    program order:           runtime order:
    add                      add
    assign                   assign (to memory var) for printing in hex
    restore rounding mode    restore rounding mode
    print as double          assign (to memory var) for printing as double
    print as hex             print as double
                             print as hex

> Global variables force the compiler to store values to memory more often and may therefore reduce x87 weirdnesses.

-ffloat-store is often recommended for causing the slow store. Before -fexcess-precision, there was no similar hack for fixing casts. But it is an easier and more controllable hack to use a volatile variable. See STRICT_ASSIGN() in FreeBSD libm. Even minimised use of this gives slowness and loses precision. So in some functions I have started using double_t to avoid the slowness (especially if the compiler is a C compiler) and keep the extra precision intentionally. Some hacks are needed to avoid destroying the extra precision on return. (Since the extra precision is intentional, it doesn't take the C11 bug to require destroying it on return.)

The expression huge*huge is used often in FreeBSD libm to raise the overflow flag and return +Inf. It doesn't actually work for that. Some broken compilers invalidly optimize it and similar expressions for raising underflow to just returning a value; the value is then correct but the flags are not. But the code is buggy. With extra precision, it asks for and should get a value larger than DBL_MAX and no exception. The C11 bug breaks this. This gives a wrong value for use in expressions, but the use is often to store to a value of type double; then, if the compiler is a C compiler or due to some accident like storing to memory, the value is sometimes converted to double.

A special case test program for comparing functions does rounding mode flipping almost exactly the same as t1.c and differs only in care taken with
[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

Jilles Tjoelker changed:

           What    |Removed                     |Added
                 CC|                            |jil...@freebsd.org

--- Comment #2 from Jilles Tjoelker ---
This is related to the strangeness that is the x87 FPU. Internally, the x87 performs calculations in extended precision. Even if the precision control is set to double precision, like FreeBSD and Windows do by default but Linux and Solaris do not, the x87 registers still have greater range than double precision. As a result, the addition 1e308 + 1e308 does not overflow, but produces a result of approximately 2e308 in an x87 register. When this result is stored to memory in double precision format, overflow or rounding will occur.

What happens in t1.c is that the conversion from extended to double precision happens two times. The conversion for printing the bytes happens directly after the calculation and therefore uses the modified rounding mode. The conversion for printf happens during the inlined fesetround() call, after setting the x87 rounding mode and before calling a function __test_sse to check whether SSE is available. (After that, the value is stored and loaded again a few times.) Therefore, the conversion for printf uses an incorrect rounding mode.

Global variables force the compiler to store values to memory more often and may therefore reduce x87 weirdnesses.

Following the C standard, you would have to use #pragma STDC FENV_ACCESS ON to make this work reliably. However, neither gcc nor clang supports this pragma. They follow an ad hoc approach to floating point exceptions and modes. In gcc you can use -frounding-math to prevent some problematic optimizations, but clang doesn't even support that. Clang has a bug about the pragma, https://llvm.org/bugs/show_bug.cgi?id=8100, which has been open for five years with various duplicates but no other significant action.

You will generally have fewer problems with weirdly changing floating point results if you use SSE instead of the x87 FPU, assuming your CPUs are new enough. SSE performs calculations in the precision specified by the program (single or double), so it does not matter when or if a value is spilled to memory. As noted above, GCC and clang are still ignorant of the side effects on the floating point exceptions and modes, though.
[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #1 from ne...@segfault.kiev.ua ---
Not reproduced on Kubuntu/i386 14.04, clang-3.6 from packages, AMD FX-8150 => seems FreeBSD specific.
[Bug 204671] clang floating point wrong around Inf (i386)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

            Bug ID: 204671
           Summary: clang floating point wrong around Inf (i386)
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ne...@segfault.kiev.ua

Created attachment 163320
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163320=edit
source files

The test program, when called as "./t", performs a single arithmetic operation with the specified rounding mode and prints the result. In some cases, the output is wrong.

Conditions to reproduce:

1. Clang of any available version (confirmed on 3.4 from base, clang36-3.6.2, clang37-3.7.2 from ports). I can't reproduce this issue with gcc-4.8.5 or gcc-5.2.0_1 from ports.
2. i386 (amd64 isn't affected, I guess, because the issue is tied to the FPU variant).
3. No high -march= ("native" makes the issue disappear, I guess for the same FPU-related reason: clang starts emitting SSE for this CPU).
4. -O or a higher optimization level (-O0 isn't affected).

The OS is: FreeBSD 10.2-RELEASE-p7 i386.
The CPU on the test machine is: AMD Athlon(tm) 64 Processor 3500+ (Origin="AuthenticAMD" Id=0x50ff2 Family=0xf Model=0x5f Stepping=2).

The proper results are (as I see from available IEEE754 documents):

$ ./t1 1e308 + 1e308 0
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 1
r=1.797693134862316e+308( 7F EF FF FF FF FF FF FF)
$ ./t1 1e308 + 1e308 2
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 3
r=1.797693134862316e+308( 7F EF FF FF FF FF FF FF)

This satisfies the standard requirement that, e.g., for roundTowardZero, "the result shall be the format's floating-point number closest to and no greater in magnitude than the infinitely precise result."

The variant with t1.c from the attachment where the issue is exposed (compiled as "cc -o t1 t1.c -g -Wall -W -lm -O"):

$ ./t1 1e308 + 1e308 0
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 1
r=inf ( 7F EF FF FF FF FF FF FF)
$ ./t1 1e308 + 1e308 2
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 3
r=inf ( 7F EF FF FF FF FF FF FF)

So, the binary representation of the result is correct, but the printf output is not. The same compilation with -DNO_HEX always prints "inf" (which rules out an aliasing issue):

$ ./t1 1e308 + 1e308 0
r=inf ()
$ ./t1 1e308 + 1e308 1
r=inf ()
$ ./t1 1e308 + 1e308 2
r=inf ()
$ ./t1 1e308 + 1e308 3
r=inf ()

The variant in t2.c uses a global union instead of a local on-stack one for binary printing. The behavior differs: the binary representation always shows inf:

$ ./t2 1e308 + 1e308 0
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 1
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 2
r=inf ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 3
r=inf ( 7F F0 00 00 00 00 00 00)

Again, adding -DNO_HEX still prints "inf" in all cases. But a variant with "r" declared as a global variable instead of a local one (-DR_GLOBAL for both source versions) makes the issue disappear.