Re: Optimization bug with floating-point?

2019-03-25 Thread Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> All,
> 
> There seems to be an optimization bug with clang on
> 
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE  i386
> 
> IOW, if you do numerical work on i386, you may want to check your
> results.

This is now 

https://bugs.llvm.org/show_bug.cgi?id=41224

-- 
Steve
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Optimization bug with floating-point?

2019-03-14 Thread John Baldwin
On 3/14/19 1:08 PM, Steve Kargl wrote:
> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>>> AFAICT, all libm float routines need to be modified to conditionally
>>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>>> issues in FP and libm.  FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision.  I don't recall exactly where.
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
> 
> My understanding of the situation is that FreeBSD i386/387 sets
> the FPU to 53-bit precision (whether at start up or first access
> is immaterial).  This was done long ago to prevent issues with
> different optimization levels leaving different intermediate
> results in registers with extended precision.  You can observe
> the problem with the toy program I posted and clang.  Compile it
> with -O0 and -O2.  With the former you have max ULP of 2.9 (the
> desired result); with the latter you have a max ULP of 23.xxx.
> I have observed a 6 billion ULP issue when running my testsuite.
> As pointed out by John Baldwin, GCC is aware of the FPU setting.
> The problem with clang is that it seems to unconditionally assume
> the FPU is set to 64-bit precision.   It is unclear if clang is
> generating the desired result for float routines in libm.  The
> only way to guarantee the desired result is to use fpsetprec(FP_PD),
> or fix clang to take into account the FPU environment.

OTOH, note that every other OS in 32-bit mode uses 64-bit precision,
and amd64 also uses 64-bit precision by default IIUC.  FreeBSD/i386
is definitely unique in this regard.  Linux doesn't do it, none of
the other BSD's do it (only Dragonfly does b/c they inherited it
from FreeBSD).  None of Solaris, Windows, etc. do it either if the
gcc sources are to be trusted as a reference.

That said, I think it must have to do with how clang vs GCC is
handling saving the values in memory and whether or not it does
truncation to 53 bits when stored in memory somehow.  I was trying
to poke around in GCC's sources to figure out if it was doing anything
differently, but I couldn't find a difference in terms of function
pointers, etc.  The only difference is the constants used in a set
of structures.  I haven't tried to track down what those struct
member values control though.

-- 
John Baldwin


Re: Optimization bug with floating-point?

2019-03-14 Thread Konstantin Belousov
On Thu, Mar 14, 2019 at 12:59:14PM -0700, John Baldwin wrote:
> On 3/14/19 12:20 PM, Konstantin Belousov wrote:
> > On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> >> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >>> AFAICT, all libm float routines need to be modified to conditionally
> >>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >>> issues in FP and libm.  FreeBSD needs to issue an erratum about
> >>> the numerical issues with clang.
> >>
> >> I vaguely recall looking into the x87 initialisation a long time ago
> >> and STR that the startup code (either crtX or in the kernel) does
> >> a fninit() to set the precision.  I don't recall exactly where.
> > At boot, a clean initial FPU state is stored in fpu_initialstate.
> > Then on first FPU access from userspace  (first for the given process
> > context), this saved state is copied into hardware registers.  The
> > quirk is that for i386 binaries on amd64, we adjust fpu control word
> > to what is expected by i386 binaries.
> > 
> >>
> >> IMO, calling fpsetprec() in every libm float function is overkill. It
> >> should be enough to fpsetprec() before main() and add a note in the
> >> man pages that libm is built to use the default FPU configuration and
> >> changing the configuration (precision or rounding) may result in larger
> >> errors.
> > Changing default precision in crt1 would break the ABI.
> 
> So what I don't understand then is what is gcc doing different than clang
> in this case.  I assume neither GCC _nor_ clang are adjusting the FPU in
> > compiler-generated code, and in fact as Steve's earlier tests show, the
> precision is set to PD by default when a clang-built binary is run.

Precision control only affects elementary floating-point instructions.
Could this be the cause?

SDM vol 1 8.1.5.2 Precision Control Field
The precision-control bits only affect the results of the following
floating-point instructions: FADD, FADDP, FIADD, FSUB, FSUBP, FISUB,
FSUBR, FSUBRP, FISUBR, FMUL, FMULP, FIMUL, FDIV, FDIVP, FIDIV, FDIVR,
FDIVRP, FIDIVR, and FSQRT.
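
For reference, a minimal sketch of how the precision-control field can be
decoded from an x87 control word value.  The helper name
x87_precision_bits() is made up for illustration and is not taken from
FreeBSD or from the test program; the field position (bits 8-9) and its
encodings follow the SDM section quoted above.

```c
#include <stdint.h>

/* Precision-control (PC) field: bits 8-9 of the x87 control word. */
#define X87_PC_SHIFT 8
#define X87_PC_MASK  0x3u

/*
 * Hypothetical helper: return the number of significand bits selected
 * by the PC field, or 0 for the reserved encoding (01b).
 */
int
x87_precision_bits(uint16_t control_word)
{
	switch ((control_word >> X87_PC_SHIFT) & X87_PC_MASK) {
	case 0:
		return 24;	/* single precision */
	case 2:
		return 53;	/* double precision */
	case 3:
		return 64;	/* double-extended precision */
	default:
		return 0;	/* 01b is reserved */
	}
}
```

With the FNINIT default of 0x037F this yields 64.  A control word such as
0x127F, which I believe is the initial value FreeBSD/i386 uses (treat that
as an assumption), yields 53.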


Re: Optimization bug with floating-point?

2019-03-14 Thread Steve Kargl
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >AFAICT, all libm float routines need to be modified to conditionally
> >include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >issues in FP and libm.  FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
> 
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision.  I don't recall exactly where.
> 
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.

My understanding of the situation is that FreeBSD i386/387 sets
the FPU to 53-bit precision (whether at start up or first access
is immaterial).  This was done long ago to prevent issues with
different optimization levels leaving different intermediate
results in registers with extended precision.  You can observe
the problem with the toy program I posted and clang.  Compile it
with -O0 and -O2.  With the former you have max ULP of 2.9 (the
desired result); with the latter you have a max ULP of 23.xxx.
I have observed a 6 billion ULP issue when running my testsuite.
As pointed out by John Baldwin, GCC is aware of the FPU setting.
The problem with clang is that it seems to unconditionally assume
the FPU is set to 64-bit precision.   It is unclear if clang is
generating the desired result for float routines in libm.  The
only way to guarantee the desired result is to use fpsetprec(FP_PD),
or fix clang to take into account the FPU environment.
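
For readers unfamiliar with the ULP figures quoted here, a minimal sketch
of how the error of a float result can be expressed in units in the last
place relative to a double-precision reference.  This is not the ulpfd()
routine from the test program (which is not shown in this thread); the
name ulp_error_float() is hypothetical, and the sketch assumes a finite,
nonzero reference in the normal float range.

```c
#include <float.h>
#include <math.h>

/*
 * Hypothetical helper: |computed - reference| measured in float ULPs.
 * Assumes reference is finite, nonzero, and not subnormal as a float.
 */
double
ulp_error_float(float computed, double reference)
{
	int exponent;
	double spacing, diff;

	/* |reference| = m * 2^exponent with 0.5 <= m < 1. */
	(void)frexp(reference, &exponent);

	/* Gap between adjacent floats in the binade containing reference. */
	spacing = ldexp(1.0, exponent - FLT_MANT_DIG);

	diff = (double)computed - reference;
	if (diff < 0)
		diff = -diff;
	return diff / spacing;
}
```

A test driver would take the maximum of this value over many inputs; a
correctly rounded float function would stay at or below 0.5.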

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-14 Thread John Baldwin
On 3/14/19 12:20 PM, Konstantin Belousov wrote:
> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>>> AFAICT, all libm float routines need to be modified to conditionally
>>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>>> issues in FP and libm.  FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision.  I don't recall exactly where.
> At boot, a clean initial FPU state is stored in fpu_initialstate.
> Then on first FPU access from userspace  (first for the given process
> context), this saved state is copied into hardware registers.  The
> quirk is that for i386 binaries on amd64, we adjust fpu control word
> to what is expected by i386 binaries.
> 
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
> Changing default precision in crt1 would break the ABI.

So what I don't understand then is what is gcc doing different than clang
in this case.  I assume neither GCC _nor_ clang are adjusting the FPU in
compiler-generated code, and in fact as Steve's earlier tests show, the
precision is set to PD by default when a clang-built binary is run.

-- 
John Baldwin


Re: Optimization bug with floating-point?

2019-03-14 Thread Konstantin Belousov
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >AFAICT, all libm float routines need to be modified to conditionally
> >include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >issues in FP and libm.  FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
> 
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision.  I don't recall exactly where.
At boot, a clean initial FPU state is stored in fpu_initialstate.
Then on first FPU access from userspace  (first for the given process
context), this saved state is copied into hardware registers.  The
quirk is that for i386 binaries on amd64, we adjust fpu control word
to what is expected by i386 binaries.

> 
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.
Changing default precision in crt1 would break the ABI.


Re: Optimization bug with floating-point?

2019-03-14 Thread Peter Jeremy
On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>AFAICT, all libm float routines need to be modified to conditionally
>include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>issues in FP and libm.  FreeBSD needs to issue an erratum about
>the numerical issues with clang.

I vaguely recall looking into the x87 initialisation a long time ago
and STR that the startup code (either crtX or in the kernel) does
a fninit() to set the precision.  I don't recall exactly where.

IMO, calling fpsetprec() in every libm float function is overkill. It
should be enough to fpsetprec() before main() and add a note in the
man pages that libm is built to use the default FPU configuration and
changing the configuration (precision or rounding) may result in larger
errors.

-- 
Peter Jeremy




Re: Optimization bug with floating-point?

2019-03-14 Thread Steve Kargl
On Wed, Mar 13, 2019 at 11:30:07PM -0700, Steve Kargl wrote:
> 
> Spent a couple hours wandering in contrib/llvm.  Have no idea
> how to fix clang to actually work on i386/387.  Any ideas 
> would be welcomed.
> 
> AFAICT, all libm float routines need to be modified to conditionally
> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> issues in FP and libm.  FreeBSD needs to issue an erratum about
> the numerical issues with clang.
> 

Probably beating a dead horse, but I'll continue as someone
might actually be able to help me fix clang.

A program built with clang can determine the precision that
the FPU on i386 is using at run time.

#include <err.h>
#include <ieeefp.h>
#include <stdio.h>

int
main(void)
{
   fp_prec_t p;

   /* Query the x87 precision-control setting of the running process. */
   p = fpgetprec();

   switch (p) {
   case FP_PS:
      printf("24 bit (single-precision)\n");
      break;
   case FP_PRS:
      printf("reserved\n");
      break;
   case FP_PD:
      printf("53 bit (double-precision)\n");
      break;
   case FP_PE:
      printf("64 bit (extended-precision)\n");
      break;
   default:
      errx(1, "unable to determine precision");
   }

   return 0;
}

%  cc -o z -O2 d.c && ./z
53 bit (double-precision)

It is likely that one (or more) files in contrib/llvm/Target/X86
need to be fixed.  Unfortunately, there are 116 files, which are written
in languages I do not know.

Any pointers of which file(s) to poke?

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-14 Thread Steve Kargl
On Wed, Mar 13, 2019 at 02:24:55PM -0700, Steve Kargl wrote:
> On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> > On 3/13/19 9:40 AM, Steve Kargl wrote:
> > > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> > >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> > >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> > 
> >  gcc8 --version
> >  gcc8 (FreeBSD Ports Collection) 8.3.0
> > 
> >  gcc8 -fno-builtin -o z a.c -lm && ./z
> >  gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >  gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >  gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> > 
> >  Max ULP: 2.297073
> >  Count: 0   (# of ULP that exceed 21)
> > >>>
> > >>> clang agrees with gcc8 if one changes ...
> > >>>
> >  int
> >  main(void)
> >  {
> > double re, im, u, ur, ui;
> > float complex f;
> > float x, y;
> > >>>
> > >>> this line to "volatile float x, y".
> > >>
> > >> So it seems to be a regression in clang 7 vs clang 6?
> > > 
> > > /usr/local/bin/clang60 has the same problem.  
> > > 
> > > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> > >   Maximum ULP: 23.061242
> > > # of ULP > 21: 39
> > > 
> > > Adding volatile as in the above "fixes" the problem.
> > > 
> > > AFAICT, this is an i386/387 code generation problem.  Perhaps,
> > > an alignment issue?
> > 
> > Oh, I misread your earlier e-mail to say that clang60 worked.
> > 
> > One issue I'm aware of is that clang does not have any support for the
> > special arrangement FreeBSD/i386 uses where it uses different precision
> > for registers vs in-memory for some of the floating point types (GCC has
> > a special hack that is only used on FreeBSD for this but isn't used on
> > any other OS's).  I wonder if that could be a factor?  Volatile probably
> > forces a round trip between memory which might explain why this is the
> > case.
> > 
> 
> I went looking for this special hack.  In gcc/gccx/config/i386,
> one finds 
> 
> /* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
>    compiler get the contents of <float.h> and std::numeric_limits correct.  */
> #undef TARGET_96_ROUND_53_LONG_DOUBLE
> #define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)
> 
> So, taking this as a hunch, I added ieeefp.h to my test program
> and called 'fpsetprec(FP_PD)' as the first executable statement.
> This then results in
> 
> % cc -fno-builtin -m32 -O2 -o z b.o a.c -lm && ./z
> Max u: 2.297073
> Count: 0
> 
> So, is there a way to correctly build clang for i386/387
> to automatically set the precision correctly?
> 

Spent a couple hours wandering in contrib/llvm.  Have no idea
how to fix clang to actually work on i386/387.  Any ideas 
would be welcomed.

AFAICT, all libm float routines need to be modified to conditionally
include ieeefp.h and call fpsetprec(FP_PD).  This will work around
issues in FP and libm.  FreeBSD needs to issue an erratum about
the numerical issues with clang.

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> 
>  gcc8 --version
>  gcc8 (FreeBSD Ports Collection) 8.3.0
> 
>  gcc8 -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> 
>  Max ULP: 2.297073
>  Count: 0   (# of ULP that exceed 21)
> 
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
>  int
>  main(void)
>  {
> double re, im, u, ur, ui;
> float complex f;
> float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> > 
> > /usr/local/bin/clang60 has the same problem.  
> > 
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> > # of ULP > 21: 39
> > 
> > Adding volatile as in the above "fixes" the problem.
> > 
> > AFAICT, this is an i386/387 code generation problem.  Perhaps,
> > an alignment issue?
> 
> Oh, I misread your earlier e-mail to say that clang60 worked.
> 
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
> 

I went looking for this special hack.  In gcc/gccx/config/i386,
one finds 

/* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
   compiler get the contents of <float.h> and std::numeric_limits correct.  */
#undef TARGET_96_ROUND_53_LONG_DOUBLE
#define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

So, taking this as a hunch, I added ieeefp.h to my test program
and called 'fpsetprec(FP_PD)' as the first executable statement.
This then results in

% cc -fno-builtin -m32 -O2 -o z b.o a.c -lm && ./z
Max u: 2.297073
Count: 0

So, is there a way to correctly build clang for i386/387
to automatically set the precision correctly?
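
A minimal sketch of the workaround described above, written so that it
still compiles on other platforms.  The wrapper name
force_x87_double_precision() is hypothetical; fpsetprec() and FP_PD are
the ieeefp.h interfaces already used in this thread, and the guard macros
are an assumption about where the call is wanted.

```c
#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>
#endif

/*
 * Hypothetical wrapper: ask for 53-bit x87 precision on FreeBSD/i386.
 * Returns 1 if the request was made, 0 where it is compiled out.
 */
int
force_x87_double_precision(void)
{
#if defined(__FreeBSD__) && defined(__i386__)
	(void)fpsetprec(FP_PD);	/* previous setting is ignored here */
	return 1;
#else
	return 0;
#endif
}
```

In the test program this would be called as the first statement of
main(), before any floating-point arithmetic is done.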

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 10:40:28AM -0700, Conrad Meyer wrote:
> Hi John,
> 
> On Wed, Mar 13, 2019 at 10:17 AM John Baldwin wrote:
> > One issue I'm aware of is that clang does not have any support for the
> > special arrangement FreeBSD/i386 uses where it uses different precision
> > for registers vs in-memory for some of the floating point types (GCC has
> > a special hack that is only used on FreeBSD for this but isn't used on
> > any other OS's).  I wonder if that could be a factor?  Volatile probably
> > forces a round trip between memory which might explain why this is the
> > case.
> >
> > I wonder what your test program does on i386 Linux with GCC?
> 
> $ uname -sr
> Linux 4.20.4
> $ gcc --version
> gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
> ...
> $ rpm -qf /usr/lib/libm-2.27.so
> glibc-2.27-37.fc28.i686
> 
> $ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
> $ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
> 
> Uh.
> 
> kargl.c: In function ‘main’:
> kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>if (ur > u) u = ur;
>   ^

Whoops.  There are a number of variations on a theme named a.c.
Initializing u to 0 doesn't change the outcome with clang on
FreeBSD.

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> 
>  gcc8 --version
>  gcc8 (FreeBSD Ports Collection) 8.3.0
> 
>  gcc8 -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>  gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> 
>  Max ULP: 2.297073
>  Count: 0   (# of ULP that exceed 21)
> 
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
>  int
>  main(void)
>  {
> double re, im, u, ur, ui;
> float complex f;
> float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> > 
> > /usr/local/bin/clang60 has the same problem.  
> > 
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> > # of ULP > 21: 39
> > 
> > Adding volatile as in the above "fixes" the problem.
> > 
> > AFAICT, this is an i386/387 code generation problem.  Perhaps,
> > an alignment issue?
> 
> Oh, I misread your earlier e-mail to say that clang60 worked.
> 
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
> 
> I wonder what your test program does on i386 Linux with GCC?

I don't have an i386 Linux environment.  I tried comparing the
assembly generated with and without volatile, but it proves
difficult as register numbers are changed between the 2 listings
so almost all lines mismatch.

If I move ranged(), rangef(), dp_csinh(), and ulpfd() into b.c
so a.c only contains main(), add appropriate prototypes to a.c,
and comment out the printf() statements, I still see the problem.
Judging from the diff, there is a difference in the spills and
loads in 2 places.

% diff -uw without_volatile with_volatile
--- without_volatile    2019-03-13 10:51:33.244226000 -0700
+++ with_volatile       2019-03-13 10:51:54.088095000 -0700
@@ -35,11 +35,13 @@
        movl    %esi, 68(%esp)          # 4-byte Spill
        calll   rangef
        fadds   .LCPI0_0
-       fstpl   24(%esp)                # 8-byte Folded Spill
+       fstps   28(%esp)
        calll   rangef
        fadds   .LCPI0_1
-       fstl    100(%esp)               # 8-byte Folded Spill
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fstps   24(%esp)
+       flds    28(%esp)
+       flds    24(%esp)
+       fxch    %st(1)
        fstps   48(%esp)
        fstps   52(%esp)
        movl    48(%esp), %eax
@@ -49,13 +51,13 @@
        calll   csinhf
        movl    %eax, %esi
        movl    %edx, %edi
+       flds    28(%esp)
+       flds    24(%esp)
        leal    72(%esp), %eax
        movl    %eax, 20(%esp)
        leal    80(%esp), %eax
        movl    %eax, 16(%esp)
-       fldl    100(%esp)               # 8-byte Folded Reload
        fstpl   8(%esp)
-       fldl    24(%esp)                # 8-byte Folded Reload
        fstpl   (%esp)
        calll   dp_csinh
        movl    %esi, 40(%esp)
@@ -75,7 +77,7 @@
        fnstsw  %ax
 # kill: def $ah killed $ah killed $ax
        sahf
-       fstl    24(%esp)                # 8-byte Folded Spill
+       fstl    100(%esp)               # 8-byte Folded Spill
        ja      .LBB0_3
 # %bb.2:                                # %for.body
 #   in Loop: Header=BB0_1 Depth=1
@@ -114,7 +116,7 @@
 #   in Loop: Header=BB0_1 Depth=1
        fstp    %st(2)
        fldl    92(%esp)                # 8-byte Folded Reload
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fldl    100(%esp)               # 8-byte Folded Reload
        fucomp  %st(1)
        fnstsw  %ax
 # kill: def $ah killed $ah killed $ax

Adding ieeefp.h to a.c and fpsetprec(FP_PE) in main()
produces a massive diff, but still wrong results if
volatile is not used.

Clang appears to be broken for FP on i386/387.

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Conrad Meyer
Hi John,

On Wed, Mar 13, 2019 at 10:17 AM John Baldwin wrote:
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

$ uname -sr
Linux 4.20.4
$ gcc --version
gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
...
$ rpm -qf /usr/lib/libm-2.27.so
glibc-2.27-37.fc28.i686

$ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0
$ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0

Uh.

kargl.c: In function ‘main’:
kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
   if (ur > u) u = ur;
  ^

If I initialize 'u' (to, e.g., -1e52), I get:
Max ULP: 1.959975
Count: 0

at -O2 and -O3 as well.
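
A small sketch of why the start value matters.  Reading an uninitialised
variable is undefined, so what follows is only one plausible explanation
of the "nan" output: if 'u' happens to start out as NaN, every comparison
against it is false and the running maximum never changes.  The function
name running_max_step() is made up for illustration; its body mirrors the
`if (ur > u) u = ur;` line from the warning.

```c
#include <math.h>

/*
 * Hypothetical helper: one step of the running maximum used in the
 * test program's loop.
 */
double
running_max_step(double current, double candidate)
{
	/* Any ordered comparison with NaN is false, so NaN is sticky. */
	if (candidate > current)
		current = candidate;
	return current;
}
```

Starting from a defined value (0 is enough for a non-negative quantity
such as an ULP error) gives the expected maximum; starting from NaN
returns NaN no matter what follows.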

Best,
Conrad


Re: Optimization bug with floating-point?

2019-03-13 Thread John Baldwin
On 3/13/19 9:40 AM, Steve Kargl wrote:
> On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
>> On 3/13/19 8:16 AM, Steve Kargl wrote:
>>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:

 gcc8 --version
 gcc8 (FreeBSD Ports Collection) 8.3.0

 gcc8 -fno-builtin -o z a.c -lm && ./z
 gcc8 -O -fno-builtin -o z a.c -lm && ./z
 gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
 gcc8 -O3 -fno-builtin -o z a.c -lm && ./z

 Max ULP: 2.297073
 Count: 0   (# of ULP that exceed 21)

>>>
>>> clang agrees with gcc8 if one changes ...
>>>
 int
 main(void)
 {
double re, im, u, ur, ui;
float complex f;
float x, y;
>>>
>>> this line to "volatile float x, y".
>>
>> So it seems to be a regression in clang 7 vs clang 6?
>>
> 
> /usr/local/bin/clang60 has the same problem.  
> 
> % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
>   Maximum ULP: 23.061242
> # of ULP > 21: 39
> 
> Adding volatile as in the above "fixes" the problem.
> 
> AFAICT, this is an i386/387 code generation problem.  Perhaps,
> an alignment issue?

Oh, I misread your earlier e-mail to say that clang60 worked.

One issue I'm aware of is that clang does not have any support for the
special arrangement FreeBSD/i386 uses where it uses different precision
for registers vs in-memory for some of the floating point types (GCC has
a special hack that is only used on FreeBSD for this but isn't used on
any other OS's).  I wonder if that could be a factor?  Volatile probably
forces a round trip between memory which might explain why this is the
case.
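
To illustrate the round trip through memory, a minimal sketch (not code
from the test program; the function name is hypothetical).  A volatile
float must actually be stored as a 4-byte float and read back, so any
extra bits a wider register may have carried are rounded away at that
point.

```c
/*
 * Hypothetical helper: force a value through a 32-bit float slot in
 * memory and return what is read back.
 */
float
round_trip_through_memory(double x)
{
	volatile float slot = (float)x;	/* volatile store, 4 bytes */

	return slot;			/* volatile load from memory */
}
```

For example, 1 + 2^-30 is not representable as a float and comes back
from this function as exactly 1.0f.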

I wonder what your test program does on i386 Linux with GCC?

-- 
John Baldwin


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> On 3/13/19 8:16 AM, Steve Kargl wrote:
> > On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>
> >> gcc8 --version
> >> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>
> >> gcc8 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>
> >> Max ULP: 2.297073
> >> Count: 0   (# of ULP that exceed 21)
> >>
> > 
> > clang agrees with gcc8 if one changes ...
> > 
> >> int
> >> main(void)
> >> {
> >>double re, im, u, ur, ui;
> >>float complex f;
> >>float x, y;
> > 
> > this line to "volatile float x, y".
> 
> So it seems to be a regression in clang 7 vs clang 6?
> 

/usr/local/bin/clang60 has the same problem.  

% /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
  Maximum ULP: 23.061242
# of ULP > 21: 39

Adding volatile as in the above "fixes" the problem.

AFAICT, this is an i386/387 code generation problem.  Perhaps,
an alignment issue?

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: Optimization bug with floating-point?

2019-03-13 Thread John Baldwin
On 3/13/19 8:16 AM, Steve Kargl wrote:
> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>
>> gcc8 --version
>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>
>> gcc8 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>
>> Max ULP: 2.297073
>> Count: 0   (# of ULP that exceed 21)
>>
> 
> clang agrees with gcc8 if one changes ...
> 
>> int
>> main(void)
>> {
>>double re, im, u, ur, ui;
>>float complex f;
>>float x, y;
> 
> this line to "volatile float x, y".

So it seems to be a regression in clang 7 vs clang 6?

-- 
John Baldwin


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 04:56:26PM +0100, Hans Petter Selasky wrote:
> On 3/13/19 4:50 PM, Steve Kargl wrote:
> > Using sin() and cos() directly as in
> > 
> > /* Double precision csinh() without using C's double complex.s */
> > void
> > dp_csinh(double x, double y, double *re, double *im)
> > {
> > double c, s;
> > *re = sinh(x) * cos(y);
> > *im = cosh(x) * sin(y);
> > }
> > 
> > does not change the result.  I'll also note that libm
> > is compiled by clang, and I do not recompile it for the
> > tests.  Both gcc8 and cc are using the same libm.
> > 
> > I've also tested clang on amd64 with -m32; it fails
> > as well.
> 
> Hi,
> 
> I cannot reproduce this failure with 11-stable userland. Can you check with
> objdump that clang doesn't optimise it to sincos()?

It doesn't.

% nm z | grep sin
 U csinhf
00401360 T dp_csinh
 U sin
 U sinh

> FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on 
> LLVM 3.8.0)
> Target: x86_64-unknown-freebsd11.0

The test does not fail on x86_64 unless you add the -m32 option,
which forces i386 behavior.

cc --version
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250)
(based on LLVM 7.0.1)
Target: x86_64-unknown-freebsd13.0

cc -fno-builtin -O2 -o z a.c -lm && ./z
Max u: 2.297073
Count: 0

cc -fno-builtin -O2 -o z a.c -lm -m32 && ./z
Max u: 23.061242
Count: 39
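
One way to see what -m32 changes is to ask the compiler how it evaluates
floating-point expressions on each target.  The following is a minimal
sketch and not part of the test program a.c; describe_eval_method() and
print_eval_method() are hypothetical helper names.  It reports the C99
FLT_EVAL_METHOD macro from <float.h>, which typically differs between
x87 code generation (i386) and SSE code generation (amd64).

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: map a FLT_EVAL_METHOD value to a short description. */
static const char *
describe_eval_method(int method)
{
	switch (method) {
	case 0:
		return "each operation is evaluated in its own type";
	case 1:
		return "float and double are evaluated as double";
	case 2:
		return "all operations are evaluated as long double";
	default:
		return "indeterminable or implementation-defined";
	}
}

/* Print the evaluation method the compiler reports for this target. */
static void
print_eval_method(void)
{
	printf("FLT_EVAL_METHOD = %d (%s)\n", (int)FLT_EVAL_METHOD,
	    describe_eval_method((int)FLT_EVAL_METHOD));
}
```

Calling print_eval_method() from main() in one binary built with -m32 and
one built without it shows whether the two targets agree.  The macro only
reports what the compiler claims; it says nothing about the precision
control word the FPU is actually running with.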

> Thread model: posix
> InstalledDir: /usr/bin
> 
> cc -lm -O2 -Wall test.c && ./a.out
> Max ULP: 2.297073
> Count: 0

add -m32.

> 
> clang40 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0
> 
> clang50 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0
> 
> clang60 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Hans Petter Selasky

On 3/13/19 4:50 PM, Steve Kargl wrote:
> Using sin() and cos() directly as in
>
> /* Double precision csinh() without using C's double complex. */
> void
> dp_csinh(double x, double y, double *re, double *im)
> {
>    double c, s;
>    *re = sinh(x) * cos(y);
>    *im = cosh(x) * sin(y);
> }
>
> does not change the result.  I'll also note that libm
> is compiled by clang, and I do not recompile it for the
> tests.  Both gcc8 and cc are using the same libm.
>
> I've also tested clang on amd64 with -m32; it fails
> as well.

Hi,

I cannot reproduce this failure with 11-stable userland. Can you check with
objdump that clang doesn't optimise it to sincos()?


FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on
LLVM 3.8.0)
Target: x86_64-unknown-freebsd11.0
Thread model: posix
InstalledDir: /usr/bin

cc -lm -O2 -Wall test.c && ./a.out
Max ULP: 2.297073
Count: 0

clang40 -lm -O2 test6.c
> ./a.out
Max ULP: 2.297073
Count: 0

clang50 -lm -O2 test6.c
> ./a.out
Max ULP: 2.297073
Count: 0

clang60 -lm -O2 test6.c
> ./a.out
Max ULP: 2.297073
Count: 0

--HPS


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Wed, Mar 13, 2019 at 04:41:51PM +0100, Hans Petter Selasky wrote:
> On 3/13/19 4:16 PM, Steve Kargl wrote:
> > On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>
> >> gcc8 --version
> >> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>
> >> gcc8 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>
> >> Max ULP: 2.297073
> >> Count: 0   (# of ULP that exceed 21)
> >>
> > 
> > clang agrees with gcc8 if one changes ...
> > 
> >> int
> >> main(void)
> >> {
> >> double re, im, u, ur, ui;
> >> float complex f;
> >> float x, y;
> > 
> > this line to "volatile float x, y".
> > 
> 
> Can you try to use:
> 
> #define sincos(x,p,q) do { \
>  *(p) = sin(x); \
>  *(q) = cos(x); \
> } while (0)
> 
> 
> Instead of libm's sincos(). Might be a bug in there.
> 

Using sin() and cos() directly as in 

/* Double precision csinh() without using C's double complex. */
void
dp_csinh(double x, double y, double *re, double *im)
{
   double c, s;
   *re = sinh(x) * cos(y);
   *im = cosh(x) * sin(y);
}

does not change the result.  I'll also note that libm
is compiled by clang, and I do not recompile it for the
tests.  Both gcc8 and cc are using the same libm.

I've also tested clang on amd64 with -m32; it fails
as well.

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Hans Petter Selasky

On 3/13/19 4:16 PM, Steve Kargl wrote:
> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>
>> gcc8 --version
>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>
>> gcc8 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>
>> Max ULP: 2.297073
>> Count: 0   (# of ULP that exceed 21)
>>
>
> clang agrees with gcc8 if one changes ...
>
>> int
>> main(void)
>> {
>>    double re, im, u, ur, ui;
>>    float complex f;
>>    float x, y;
>
> this line to "volatile float x, y".
>


Can you try to use:

#define sincos(x,p,q) do { \
*(p) = sin(x); \
*(q) = cos(x); \
} while (0)


Instead of libm's sincos(). Might be a bug in there.

--HPS


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> 
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
> 
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> 
> Max ULP: 2.297073
> Count: 0   (# of ULP that exceed 21)
> 

clang agrees with gcc8 if one changes ...

> int
> main(void)
> {
>    double re, im, u, ur, ui;
>    float complex f;
>    float x, y;

this line to "volatile float x, y".
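
A possible reading of why the qualifier matters, offered as a guess and
not as a diagnosis: in practice every access to a volatile object is a
real store or load, so a value kept in a volatile float is rounded to
float precision in memory and cannot linger in an x87 register with
extra bits.  The sketch below shows that effect in isolation;
round_via_memory() is a hypothetical helper and not code from a.c.

```c
/* Round a value to float by forcing it through a volatile float in memory. */
static float
round_via_memory(double v)
{
	volatile float tmp = (float)v;	/* the store may not be optimised away */

	return tmp;			/* the reload yields the rounded float */
}
```

For example, round_via_memory(0.1) equals (float)0.1 and, once widened
back to double, no longer equals 0.1.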

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> 
> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
> 
> 
> Max ULP: 23.061242
> Count: 39  (# of ULP that exceeds 21)
> 

These results do not change if one uses /usr/local/bin/clang60.

-- 
Steve


Re: Optimization bug with floating-point?

2019-03-13 Thread Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> All,
> 
> There seems to be an optimization bug with clang on
> 
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE  i386
> 
> IOW, if you do numerical work on i386, you may want to check your
> results.
> 
> The program demonstrating the issue is at the end of this email.
> 
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
> 
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> 
> Max ULP: 2.297073
> Count: 0   (# of ULP that exceed 21)
> 

The above results do not change if one adds -ffloat-store to
the command line.


> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
> 
> Max ULP: 23.061242
> Count: 39  (# of ULP that exceeds 21)

Clang doesn't support -ffloat-store, so the above does not change.

-- 
Steve