[Bug 204671] clang floating point wrong around Inf (i386)

2023-12-30 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

Mark Linimon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |Overcome By Events
           Assignee|b...@freebsd.org            |bugmeis...@freebsd.org
             Status|New                         |Closed

--- Comment #5 from Mark Linimon  ---
^Triage: close as OBE.

I'm sorry that this PR never got looked at, but by now, 10.X is long out of
support.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 204671] clang floating point wrong around Inf (i386)

2015-11-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #4 from Jilles Tjoelker  ---
It may be reasonable to make i386 fesetround() a non-inline function, at least
when compiling without SSE (__test_sse() may be called). In that case,
compilers are likely to save caller-save registers already, so part of the cost
of a function call is already paid, even though an actual function call only
happens the first time.

Alternatively, __test_sse() could be called somewhere during startup, so the
function call in the inlined fesetround() is not needed. This will reduce code
size of fesetround() calls considerably, but rounding to float or double is
still likely to use an incorrect rounding mode.

When compiling with SSE, there is no __test_sse() call and the behaviour for
the x87 is similar; however, if SSE2 is enabled, the x87 is probably only used
for long double.

___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


Re: [Bug 204671] clang floating point wrong around Inf (i386)

2015-11-22 Thread Bruce Evans

On Sun, 22 Nov 2015 a bug that doesn't want replies wrote:


> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> --- Comment #3 from ne...@segfault.kiev.ua ---
> (In reply to Jilles Tjoelker from comment #2)
>
> Jilles, thanks for the excellent explanation. This exposes I have lost some
> important advances in floating point (like FENV_ACCESS role and need). But,
>
>> The conversion for printf happens during the inlined fesetround() call, after
>> setting the x87 rounding mode and before calling a function __test_sse to check
>> whether SSE is available.
>
> Isn't this the issue by itself? If fesetround() makes an action which generally
> shall be atomic, no intervention must be allowed during this setting. If it
> can't be explained in inlined version using C, either "asm volatile" should be
> used, or a fully separate function.


The asm is already volatile.  Even more ordering might (and probably would)
make a difference, but this would be from accidentally avoiding the compiler
bugs.  Even an inline function gives a sequence point.  The assignment is
supposed to be complete before this.


>> You will generally have fewer problems with weirdly changing floating point
>> results if you use SSE instead of the x87 FPU, assuming your CPUs are new
>> enough.
>
> Yep, SSE is better in all senses, if supported and exploited. But the latter is


No, SSE isn't better in all senses.  It doesn't even support extra
precision for long doubles.  SSE with 128-bit long doubles in hardware
would be better, but i386 would also need 80-bit long doubles in SSE
for some compatibility.  If fully exploited, 80-bit long doubles are
also better for scalar double precision code (they give more accuracy
using faster, simpler methods).  I intentionally didn't fully exploit
the x87 in libm, since the more complicated methods are still needed
for arches that don't have x87.  The main thing that compiler writers
don't like about the x87 is that its extra precision is (almost) always
present (and not choosable at compile time for every operation).  But
this is a feature.
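
To see which evaluation model a given build actually uses, something like the
following could be tried. It is only a sketch: the helper names are made up
for illustration, and the values it reports depend on the compiler, the
target and the flags (x87 versus SSE code generation in particular).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/*
 * Hypothetical helper: print how this build evaluates floating point.
 * FLT_EVAL_METHOD (C99): 0 = evaluate to the operand type, 1 = evaluate
 * float and double as double, 2 = evaluate everything as long double,
 * negative = indeterminable.
 */
static void report_fp_evaluation(void)
{
    /* C guarantees long double is at least as precise as double. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
#ifdef FLT_EVAL_METHOD
    printf("FLT_EVAL_METHOD           = %d\n", (int)FLT_EVAL_METHOD);
#endif
    printf("double mantissa bits      = %d\n", DBL_MANT_DIG);
    printf("long double mantissa bits = %d\n", LDBL_MANT_DIG);
    printf("double max exponent       = %d\n", DBL_MAX_EXP);
    printf("long double max exponent  = %d\n", LDBL_MAX_EXP);
}

/* Hypothetical helper: 1 if long double is wider than double here. */
static int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

Calling report_fp_evaluation() from main() in builds with and without SSE
code generation is one way to compare the two models; note that the macros
describe the types and the compiler's claimed evaluation method, not the
run-time precision control setting of the x87.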


> a separate issue. Default compiler installation for the i386 target still uses
> the least possible CPU (as far as I see from compilation without any options in
> make.conf). Old style option update (CFLAGS?= in make.conf) doesn't work
> anymore.


CFLAGS?= in make.conf never worked, since sys.mk sets CFLAGS earlier.

I use something like:

.if ${CFLAGS} == "-O -pipe"
CFLAGS+=-mcpu=athlon-xp
.endif

The .if avoids doing anything if CFLAGS is not the default (newer FreeBSD
has the bad default of -O2 instead of -O).  Then it adds to CFLAGS instead
of overriding it.  I set CFLAGS on the command line a lot, and the .if
prevents changing this if the command line is not the default.

Bruce


[Bug 204671] clang floating point wrong around Inf (i386)

2015-11-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #3 from ne...@segfault.kiev.ua ---
(In reply to Jilles Tjoelker from comment #2)

Jilles, thanks for the excellent explanation. This exposes I have lost some
important advances in floating point (like FENV_ACCESS role and need). But,

> The conversion for printf happens during the inlined fesetround() call, after 
> setting the x87 rounding mode and before calling a function __test_sse to 
> check whether SSE is available.

Isn't this the issue by itself? If fesetround() makes an action which generally
shall be atomic, no intervention must be allowed during this setting. If it
can't be explained in inlined version using C, either "asm volatile" should be
used, or a fully separate function.

> You will generally have fewer problems with weirdly changing floating point 
> results if you use SSE instead of the x87 FPU, assuming your CPUs are new 
> enough.

Yep, SSE is better in all senses, if supported and exploited. But the latter is
a separate issue. Default compiler installation for the i386 target still uses
the least possible CPU (as far as I see from compilation without any options in
make.conf). Old style option update (CFLAGS?= in make.conf) doesn't work
anymore. With the current install base, I'd prefer to see an option in
installer which suggests something like "-march=nocona -mtune=native" for local
builds. (This also hints at the very old topic with having a subtarget for
binary builds for modern processors, since 99% of them are at least P3, and
deliver them for freebsd-update... but this is definitely not the current
ticket issue...) For this particular installation, I had neither strong reason
nor inspiration to convert it to 64-bit one, so still are many users. So,
make.conf will be loaded with weird constructs like

NO_CPU_CFLAGS=true
NO_CPU_COPTFLAGS=true
.if ${.CURDIR:N*/BSD/src/*} == ""
CFLAGS+= -march=nocona -mtune=k8 -mmmx -msse -msse2
COPTFLAGS+= -march=i686 -mtune=k8
.endif

> Clang has a bug about the pragma, https://llvm.org/bugs/show_bug.cgi?id=8100, 
> which has been open for five years with various duplicates but no other 
> significant action.

As soon as they rely on GCC frontend, I doubt this will be fixed until GCC guys
implement its support on their side.



Re: [Bug 204671] clang floating point wrong around Inf (i386)

2015-11-21 Thread Bruce Evans

On Sat, 21 Nov 2015 a bug that suppresses replies in mail wrote:


> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> Jilles Tjoelker  changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jil...@freebsd.org
>
> --- Comment #2 from Jilles Tjoelker  ---
> This is related to the strangeness that is the x87 FPU. Internally, the x87
> performs calculations in extended precision. Even if the precision control is
> set to double precision, like FreeBSD and Windows do by default but Linux and
> Solaris do not, the x87 registers still have greater range than double
> precision.


Which versions of Windows do it?  I only have Windows/DOS compilers
from 1995 or earlier, and they do it.  I think Visual Studio (?) does
it for compatibility.  Does Windows actually require this as an ABI?
Then it should also disallow clang's bug of using SSE on 32-bit systems.


> As a result, the addition 1e308 + 1e308 does not overflow, but produces a
> result of approximately 2e308 in an x87 register. When this result is stored to
> memory in double precision format, overflow or rounding will occur.


For C (C90 and later) compilers, also when this result is assigned or cast
to a variable of type double.  This sometimes loses precision and is always
slow (typically 2-4 times slower) and is rarely needed, so it is broken
by default in gcc and clang on i386 with x87.  Recent versions of gcc can
be turned into C compilers in this respect using -fexcess-precision=standard.
Standards directives like -std=c99 but not -std=gnu99 also give this
perfectly correct slowness for unsuspecting users that don't want the
slowness but want a C compiler in other respects.  clang now knows that
-fexcess-precision exists, but doesn't support it.  It also doesn't support
this implicitly for -std=c99.

For C11 compilers, also when this result is returned.  This gives further
destruction of precision and slowness and is broken by default.  IIRC,
-std=c99 gives this bug even for C99 mode in gcc.  clang doesn't support
this even with -std=c11.



> What happens in t1.c is that the conversion from extended to double precision
> happens two times. The conversion for printing the bytes happens directly after
> the calculation and therefore uses the modified rounding mode. The conversion
> for printf happens during the inlined fesetround() call, after setting the x87
> rounding mode and before calling a function __test_sse to check whether SSE is
> available. (After that, the value is stored and loaded again a few times.)
> Therefore, the conversion for printf uses an incorrect rounding mode.


Both conversions are done after the fesetround() call in program order.
This is asking for trouble.  But since there is an assignment before the
call, there is no problem if the compiler is a C compiler.  clang is far
from being a C compiler and does unnatural ordering that gives trouble:

program order:            runtime order:
add                       add
assign                    assign (to memory var) for printing in hex
restore rounding mode     restore rounding mode
print as double           assign (to memory var) for printing as double
print as hex              print as double
                          print as hex


> Global variables force the compiler to store values to memory more often and
> may therefore reduce x87 weirdnesses.


-ffloat-store is often recommended for causing the slow store.  Before
-fexcess-precision, there was no similar hack for fixing casts.

But it is an easier and more controllable hack to use a volatile variable.
See STRICT_ASSIGN() in FreeBSD libm.  Even minimised use of this gives
slowness and loses precision.  So in some functions I have started using
double_t to avoid the slowness (especially if the compiler is a C compiler)
and keep the extra precision intentionally.  Some hacks are needed to
avoid destroying the extra precision on return.  (Since the extra precision
is intentional, it doesn't take the C11 bug to require destroying it on
return.)
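
The volatile-variable hack can be sketched roughly as below. This is written
in the spirit of STRICT_ASSIGN() but is not copied from FreeBSD libm; the
function name strict_double() is hypothetical and only illustrates the idea
of forcing a store at double precision.

```c
#include <assert.h>
#include <float.h>

/*
 * Hypothetical helper (not the libm macro): force x through a volatile
 * double so that any extra precision or range held in a register is
 * discarded by an actual store to memory.
 */
static double strict_double(double x)
{
    volatile double tmp = x;    /* the store rounds to double format */

    /* double must not be wider than long double for this to make sense */
    assert(sizeof(double) <= sizeof(long double));
    return tmp;                 /* reload the already rounded value */
}
```

For example, strict_double(1.0 + DBL_EPSILON / 4) should compare equal to 1.0
under the default rounding mode whether or not the sum was first evaluated
with extra precision, because 1 + 2^-54 lies below the halfway point between
1.0 and the next double. The store is what costs time, which is the slowness
complained about above.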

The expression huge*huge is used often in FreeBSD libm to raise the overflow
flag and return +Inf.  It doesn't actually work for that.  Some broken
compilers invalidly optimize it and similar expressions for raising
underflow to just returning a value; the value is then correct but the
flags are not.  But the code is buggy.  With extra precision, it asks
for and should get a value larger than DBL_MAX and no exception.  The
C11 bug breaks this.  This gives a wrong value for use in
expressions, but the use is often to store to a value of type double;
then if the compiler is a C compiler or due to some accident like
storing to memory, the value is sometimes converted to double.

A special case test program for comparing functions does rounding
mode flipping almost exactly the same as t1.c and differs only in
care taken with 

[Bug 204671] clang floating point wrong around Inf (i386)

2015-11-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

Jilles Tjoelker  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jil...@freebsd.org

--- Comment #2 from Jilles Tjoelker  ---
This is related to the strangeness that is the x87 FPU. Internally, the x87
performs calculations in extended precision. Even if the precision control is
set to double precision, like FreeBSD and Windows do by default but Linux and
Solaris do not, the x87 registers still have greater range than double
precision.

As a result, the addition 1e308 + 1e308 does not overflow, but produces a
result of approximately 2e308 in an x87 register. When this result is stored to
memory in double precision format, overflow or rounding will occur.
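
The effect can be imitated portably by doing the addition in long double on
purpose. The sketch below assumes IEC 60559 style conversions (Annex F) and
uses hypothetical helper names; on targets where long double is no wider than
double the intermediate sum simply overflows as well.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/*
 * Hypothetical helper: add in long double, standing in for an x87
 * register that has more exponent range than double.
 */
static long double wide_sum(double a, double b)
{
    return (long double)a + (long double)b;
}

/*
 * Hypothetical helper: narrow to double through a volatile store, which
 * is where overflow or rounding occurs (assuming Annex F behaviour).
 */
static double narrow_to_double(long double x)
{
    volatile double d = (double)x;

    assert(sizeof(double) <= sizeof(long double));
    return d;
}
```

With 80-bit long double, wide_sum(1e308, 1e308) is a finite value near 2e308,
and narrow_to_double() of it is +Inf under the default round-to-nearest mode.
Under round-toward-zero the same narrowing would give DBL_MAX instead, which
is why it matters which rounding mode is in effect when the store happens.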

What happens in t1.c is that the conversion from extended to double precision
happens two times. The conversion for printing the bytes happens directly after
the calculation and therefore uses the modified rounding mode. The conversion
for printf happens during the inlined fesetround() call, after setting the x87
rounding mode and before calling a function __test_sse to check whether SSE is
available. (After that, the value is stored and loaded again a few times.)
Therefore, the conversion for printf uses an incorrect rounding mode.

Global variables force the compiler to store values to memory more often and
may therefore reduce x87 weirdnesses.

Following the C standard, you would have to use #pragma STDC FENV_ACCESS ON
to make this work reliably. However, neither gcc nor clang support this pragma.
They follow an ad hoc approach to floating point exceptions and modes. In gcc
you can use -frounding-math to prevent some problematic optimizations but clang
doesn't even support that. Clang has a bug about the pragma,
https://llvm.org/bugs/show_bug.cgi?id=8100, which has been open for five years
with various duplicates but no other significant action.

You will generally have fewer problems with weirdly changing floating point
results if you use SSE instead of the x87 FPU, assuming your CPUs are new
enough. SSE performs calculations in the precision specified by the program
(single or double), so it does not matter when or if a value is spilled to
memory. As noted above, GCC and clang are still ignorant about the side effects
with the floating point exceptions and modes, though.



[Bug 204671] clang floating point wrong around Inf (i386)

2015-11-20 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

--- Comment #1 from ne...@segfault.kiev.ua ---
Not reproduced on Kubuntu/i386 14.04, clang-3.6 from packages, AMD FX-8150 =>
seems FreeBSD specific.



[Bug 204671] clang floating point wrong around Inf (i386)

2015-11-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671

            Bug ID: 204671
           Summary: clang floating point wrong around Inf (i386)
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ne...@segfault.kiev.ua

Created attachment 163320
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163320&action=edit
source files

The test program, being called as "./t",
performs a single arithmetic operation with the specified rounding and
prints its result. In some cases, the output is wrong.

Conditions to reproduce:
1. Clang of any available version (confirmed on 3.4 from base,
clang36-3.6.2, clang37-3.7.2 from ports). I can't get this issue with
gcc-4.8.5, gcc-5.2.0_1 from ports.
2. i386 (amd64 isn't affected, I guess, because the issue is bound to FPU
variant).
3. no high -march= ("native" causes issues to disappear, I guess, for
the same connection to FPU; clang starts emitting SSE for this CPU).
4. -O or higher optimization level (-O0 isn't affected).

The OS is: FreeBSD 10.2-RELEASE-p7 i386.
The CPU on the test machine is: AMD Athlon(tm) 64 Processor 3500+
(Origin="AuthenticAMD"  Id=0x50ff2  Family=0xf  Model=0x5f  Stepping=2).

The proper results are (as I see from available IEEE754 documents):

$ ./t1 1e308 + 1e308 0
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 1
r=1.797693134862316e+308( 7F EF FF FF FF FF FF FF)
$ ./t1 1e308 + 1e308 2
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 3
r=1.797693134862316e+308( 7F EF FF FF FF FF FF FF)

This satisfies the standard requirement that, e.g., "roundTowardZero,
the result shall be the format's floating-point number closest to and no
greater in magnitude than the infinitely precise result."

The variant with t1.c from attachment when the issue is exposed
(compiled as "cc -o t1 t1.c -g -Wall -W -lm -O"):

$ ./t1 1e308 + 1e308 0
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 1
r=inf   ( 7F EF FF FF FF FF FF FF)
$ ./t1 1e308 + 1e308 2
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t1 1e308 + 1e308 3
r=inf   ( 7F EF FF FF FF FF FF FF)

So, the binary representation of result is correct, but the printf
output is not.

The same compilation with -DNO_HEX always prints "inf" (so, it rejects a
guess of an aliasing issue):

$ ./t1 1e308 + 1e308 0
r=inf   ()
$ ./t1 1e308 + 1e308 1
r=inf   ()
$ ./t1 1e308 + 1e308 2
r=inf   ()
$ ./t1 1e308 + 1e308 3
r=inf   ()

The variant in t2.c uses a global union instead of a local on-stack
one for binary printing. The behavior differs: the binary representation
now always shows "inf" as well:

$ ./t2 1e308 + 1e308 0
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 1
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 2
r=inf   ( 7F F0 00 00 00 00 00 00)
$ ./t2 1e308 + 1e308 3
r=inf   ( 7F F0 00 00 00 00 00 00)

Again, with -DNO_HEX added, "inf" is still printed in all cases.

But: a variant with "r" declared as global variable instead of local one
(-DR_GLOBAL for both source versions) stops the issue.
