Re: Optimization bug with floating-point?
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> All,
>
> There seems to be an optimization bug with clang on
>
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE i386
>
> IOW, if you do numerical work on i386, you may want to check your
> results.

This is now https://bugs.llvm.org/show_bug.cgi?id=41224

-- 
Steve
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Optimization bug with floating-point?
On 3/14/19 1:08 PM, Steve Kargl wrote:
> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>>> AFAICT, all libm float routines need to be modified to conditionally
>>> include ieeefp.h and call fpsetprec(FP_PD). This will work around
>>> issues in FP and libm. FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision. I don't recall exactly where.
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
>
> My understanding of the situation is that FreeBSD i386/387 sets
> the FPU to 53-bit precision (whether at start up or first access
> is immaterial). This was done long ago to prevent issues with
> different optimization levels leaving different intermediate
> results in registers with extended precision. You can observe
> the problem with the toy program I posted and clang. Compile it
> with -O0 and -O2. With the former you have max ULP of 2.9 (the
> desired result); with the latter you have a max ULP of 23.xxx.
> I have observed a 6 billion ULP issue when running my testsuite.
> As pointed out by John Baldwin, GCC is aware of the FPU setting.
> The problem with clang is that it seems to unconditionally assume
> the FPU is set to 64-bit precision. It is unclear if clang is
> generating the desired result for float routines in libm. The
> only way to guarantee the desired result is to use fpsetprec(FP_PD),
> or fix clang to take into account the FPU environment.

OTOH, note that every other OS in 32-bit mode uses 64-bit precision, and
amd64 also uses 64-bit precision by default IIUC. FreeBSD/i386 is
definitely unique in this regard. Linux doesn't do it, none of the other
BSDs do it (only Dragonfly does b/c they inherited it from FreeBSD).
None of Solaris, Windows, etc. do it either if the gcc sources are to be
trusted as a reference.

That said, I think it must have to do with how clang vs GCC is handling
saving the values in memory and whether or not it does truncation to
53 bits when stored in memory somehow. I was trying to poke around in
GCC's sources to figure out if it was doing anything differently, but I
couldn't find a difference in terms of function pointers, etc. The only
difference is the constants used in a set of structures. I haven't tried
to track down what those struct member values control though.

-- 
John Baldwin
Re: Optimization bug with floating-point?
On Thu, Mar 14, 2019 at 12:59:14PM -0700, John Baldwin wrote:
> On 3/14/19 12:20 PM, Konstantin Belousov wrote:
> > On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> >> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >>> AFAICT, all libm float routines need to be modified to conditionally
> >>> include ieeefp.h and call fpsetprec(FP_PD). This will work around
> >>> issues in FP and libm. FreeBSD needs to issue an erratum about
> >>> the numerical issues with clang.
> >>
> >> I vaguely recall looking into the x87 initialisation a long time ago
> >> and STR that the startup code (either crtX or in the kernel) does
> >> a fninit() to set the precision. I don't recall exactly where.
> > At boot, a clean initial FPU state is stored in fpu_initialstate.
> > Then on first FPU access from userspace (first for the given process
> > context), this saved state is copied into hardware registers. The
> > quirk is that for i386 binaries on amd64, we adjust fpu control word
> > to what is expected by i386 binaries.
> >
> >>
> >> IMO, calling fpsetprec() in every libm float function is overkill. It
> >> should be enough to fpsetprec() before main() and add a note in the
> >> man pages that libm is built to use the default FPU configuration and
> >> changing the configuration (precision or rounding) may result in larger
> >> errors.
> > Changing default precision in crt1 would break the ABI.
>
> So what I don't understand then is what gcc is doing differently from clang
> in this case. I assume neither GCC _nor_ clang is adjusting the FPU in
> compiler-generated code, and in fact as Steve's earlier tests show, the
> precision is set to PD by default when a clang-built binary is run.

Precision control only affects the elementary floating-point
instructions. Could this be the cause?

SDM vol 1 8.1.5.2 Precision Control Field
The precision-control bits only affect the results of the following
floating-point instructions: FADD, FADDP, FIADD, FSUB, FSUBP, FISUB,
FSUBR, FSUBRP, FISUBR, FMUL, FMULP, FIMUL, FDIV, FDIVP, FIDIV, FDIVR,
FDIVRP, FIDIVR, and FSQRT.
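To make the SDM wording concrete, here is a minimal sketch in C that decodes the precision-control field from a control-word value. The helper name x87_precision_bits() is hypothetical and not from any FreeBSD or compiler source; the only fact it relies on is that the PC field occupies bits 8-9 of the x87 control word. It decodes a number you hand it and does not read the live FPU state.

```c
#include <stdint.h>

/* Precision-control (PC) field: bits 8-9 of the x87 FPU control word. */
#define X87_PC_SHIFT 8
#define X87_PC_MASK  0x3u

/*
 * Return the significand width, in bits, selected by the PC field of
 * control word `cw`, or 0 for the reserved encoding.
 * (Hypothetical helper, for illustration only.)
 */
int
x87_precision_bits(uint16_t cw)
{
    switch ((cw >> X87_PC_SHIFT) & X87_PC_MASK) {
    case 0:  return 24;   /* 00b: single precision */
    case 2:  return 53;   /* 10b: double precision */
    case 3:  return 64;   /* 11b: double extended precision */
    default: return 0;    /* 01b: reserved */
    }
}
```

For example, 0x037F (the value FNINIT loads) decodes to 64 bits, while 0x127F (which, as far as I know, is the initial control word FreeBSD/i386 uses; treat that as an assumption) decodes to 53 bits.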
Re: Optimization bug with floating-point?
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >AFAICT, all libm float routines need to be modified to conditionally
> >include ieeefp.h and call fpsetprec(FP_PD). This will work around
> >issues in FP and libm. FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
>
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision. I don't recall exactly where.
>
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.

My understanding of the situation is that FreeBSD i386/387 sets
the FPU to 53-bit precision (whether at start up or first access
is immaterial). This was done long ago to prevent issues with
different optimization levels leaving different intermediate
results in registers with extended precision. You can observe
the problem with the toy program I posted and clang. Compile it
with -O0 and -O2. With the former you have max ULP of 2.9 (the
desired result); with the latter you have a max ULP of 23.xxx.
I have observed a 6 billion ULP issue when running my testsuite.
As pointed out by John Baldwin, GCC is aware of the FPU setting.
The problem with clang is that it seems to unconditionally assume
the FPU is set to 64-bit precision. It is unclear if clang is
generating the desired result for float routines in libm. The
only way to guarantee the desired result is to use fpsetprec(FP_PD),
or fix clang to take into account the FPU environment.

-- 
Steve
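One way to see what a compiler assumes about intermediate precision is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below only maps the standard values to text; the function names eval_method_name() and current_eval_method() are made up for this illustration, and nothing here claims what clang or GCC actually report on FreeBSD/i386.

```c
#include <float.h>

/*
 * Describe a C99 FLT_EVAL_METHOD value: the width in which the compiler
 * says it evaluates float and double expressions.
 * (Hypothetical helper, for illustration only.)
 */
const char *
eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operation evaluated in its own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* What this compiler claims for the current target and flags. */
const char *
current_eval_method(void)
{
    return eval_method_name((int)FLT_EVAL_METHOD);
}
```

Printing current_eval_method() together with LDBL_MANT_DIG from a -m32 build with each compiler would show whether the two disagree about the working precision.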
Re: Optimization bug with floating-point?
On 3/14/19 12:20 PM, Konstantin Belousov wrote:
> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>>> AFAICT, all libm float routines need to be modified to conditionally
>>> include ieeefp.h and call fpsetprec(FP_PD). This will work around
>>> issues in FP and libm. FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision. I don't recall exactly where.
> At boot, a clean initial FPU state is stored in fpu_initialstate.
> Then on first FPU access from userspace (first for the given process
> context), this saved state is copied into hardware registers. The
> quirk is that for i386 binaries on amd64, we adjust fpu control word
> to what is expected by i386 binaries.
>
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
> Changing default precision in crt1 would break the ABI.

So what I don't understand then is what gcc is doing differently from clang
in this case. I assume neither GCC _nor_ clang is adjusting the FPU in
compiler-generated code, and in fact as Steve's earlier tests show, the
precision is set to PD by default when a clang-built binary is run.

-- 
John Baldwin
Re: Optimization bug with floating-point?
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
> >AFAICT, all libm float routines need to be modified to conditionally
> >include ieeefp.h and call fpsetprec(FP_PD). This will work around
> >issues in FP and libm. FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
>
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision. I don't recall exactly where.
At boot, a clean initial FPU state is stored in fpu_initialstate.
Then on first FPU access from userspace (first for the given process
context), this saved state is copied into hardware registers. The
quirk is that for i386 binaries on amd64, we adjust the FPU control word
to what is expected by i386 binaries.

>
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.
Changing default precision in crt1 would break the ABI.
Re: Optimization bug with floating-point?
On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>AFAICT, all libm float routines need to be modified to conditionally
>include ieeefp.h and call fpsetprec(FP_PD). This will work around
>issues in FP and libm. FreeBSD needs to issue an erratum about
>the numerical issues with clang.

I vaguely recall looking into the x87 initialisation a long time ago
and STR that the startup code (either crtX or in the kernel) does
a fninit() to set the precision. I don't recall exactly where.

IMO, calling fpsetprec() in every libm float function is overkill. It
should be enough to fpsetprec() before main() and add a note in the
man pages that libm is built to use the default FPU configuration and
changing the configuration (precision or rounding) may result in larger
errors.

-- 
Peter Jeremy
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 11:30:07PM -0700, Steve Kargl wrote:
>
> Spent a couple hours wandering in contrib/llvm. Have no idea
> how to fix clang to actually work on i386/387. Any ideas
> would be welcomed.
>
> AFAICT, all libm float routines need to be modified to conditionally
> include ieeefp.h and call fpsetprec(FP_PD). This will work around
> issues in FP and libm. FreeBSD needs to issue an erratum about
> the numerical issues with clang.
>

Probably beating a dead horse, but I'll continue as someone might
actually be able to help me fix clang. clang has the ability to
determine the default precision that the FPU on i386 is using.

/*
 * The header names were stripped by the list archive; the three
 * below are the ones this program needs.
 */
#include <err.h>
#include <ieeefp.h>
#include <stdio.h>

int
main(void)
{
   fp_prec_t p;

   /* Query the x87 precision-control setting of this process. */
   p = fpgetprec();
   switch(p) {
   case FP_PS:
      printf("24 bit (single-precision)\n");
      break;
   case FP_PRS:
      printf("reserved\n");
      break;
   case FP_PD:
      printf("53 bit (double-precision)\n");
      break;
   case FP_PE:
      printf("64 bit (extended-precision)\n");
      break;
   default:
      errx(1, "unable to determine precision");
   }
   return 0;
}

% cc -o z -O2 d.c && ./z
53 bit (double-precision)

It is likely that one (or more) files in contrib/llvm/Target/X86
need to be fixed. Unfortunately, there are 116 files, which are
written in languages I do not know. Any pointers on which file(s)
to poke?

-- 
Steve
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 02:24:55PM -0700, Steve Kargl wrote:
> On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> > On 3/13/19 9:40 AM, Steve Kargl wrote:
> > > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> > >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> > >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> > >>>>
> > >>>> gcc8 --version
> > >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> > >>>>
> > >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> > >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> > >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> > >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> > >>>>
> > >>>> Max ULP: 2.297073
> > >>>> Count: 0 (# of ULP that exceed 21)
> > >>>>
> > >>>
> > >>> clang agrees with gcc8 if one changes ...
> > >>>
> > >>>> int
> > >>>> main(void)
> > >>>> {
> > >>>>    double re, im, u, ur, ui;
> > >>>>    float complex f;
> > >>>>    float x, y;
> > >>>
> > >>> this line to "volatile float x, y".
> > >>
> > >> So it seems to be a regression in clang 7 vs clang 6?
> > >
> > > /usr/local/bin/clang60 has the same problem.
> > >
> > > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> > > Maximum ULP: 23.061242
> > > # of ULP > 21: 39
> > >
> > > Adding volatile as in the above "fixes" the problem.
> > >
> > > AFAICT, this is an i386/387 code generation problem. Perhaps,
> > > an alignment issue?
> >
> > Oh, I misread your earlier e-mail to say that clang60 worked.
> >
> > One issue I'm aware of is that clang does not have any support for the
> > special arrangement FreeBSD/i386 uses where it uses different precision
> > for registers vs in-memory for some of the floating point types (GCC has
> > a special hack that is only used on FreeBSD for this but isn't used on
> > any other OSes). I wonder if that could be a factor? Volatile probably
> > forces a round trip through memory which might explain why this is the
> > case.
> >
>
> I went looking for this special hack. In gcc/gccx/config/i386,
> one finds
>
> /* FreeBSD sets the rounding precision of the FPU to 53 bits. Let the
>    compiler get the contents of <float.h> and std::numeric_limits correct. */
> #undef TARGET_96_ROUND_53_LONG_DOUBLE
> #define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)
>
> So, taking this as a hunch, I added ieeefp.h to my test program
> and called 'fpsetprec(FP_PD)' as the first executable statement.
> This then results in
>
> % cc -fno-builtin -m32 -O2 -o z b.o a.c -lm && ./z
> Max u: 2.297073
> Count: 0
>
> So, is there a way to correctly build clang for i386/387
> to automatically set the precision correctly?
>

Spent a couple hours wandering in contrib/llvm. Have no idea
how to fix clang to actually work on i386/387. Any ideas
would be welcomed.

AFAICT, all libm float routines need to be modified to conditionally
include ieeefp.h and call fpsetprec(FP_PD). This will work around
issues in FP and libm. FreeBSD needs to issue an erratum about
the numerical issues with clang.

-- 
Steve
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>> gcc8 --version
> >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>> Max ULP: 2.297073
> >>>> Count: 0 (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> >
> > /usr/local/bin/clang60 has the same problem.
> >
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> > Maximum ULP: 23.061242
> > # of ULP > 21: 39
> >
> > Adding volatile as in the above "fixes" the problem.
> >
> > AFAICT, this is an i386/387 code generation problem. Perhaps,
> > an alignment issue?
>
> Oh, I misread your earlier e-mail to say that clang60 worked.
>
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OSes). I wonder if that could be a factor? Volatile probably
> forces a round trip through memory which might explain why this is the
> case.
>

I went looking for this special hack. In gcc/gccx/config/i386,
one finds

/* FreeBSD sets the rounding precision of the FPU to 53 bits. Let the
   compiler get the contents of <float.h> and std::numeric_limits correct. */
#undef TARGET_96_ROUND_53_LONG_DOUBLE
#define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

So, taking this as a hunch, I added ieeefp.h to my test program
and called 'fpsetprec(FP_PD)' as the first executable statement.
This then results in

% cc -fno-builtin -m32 -O2 -o z b.o a.c -lm && ./z
Max u: 2.297073
Count: 0

So, is there a way to correctly build clang for i386/387
to automatically set the precision correctly?

-- 
Steve
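The workaround described above (calling fpsetprec(FP_PD) before any floating-point work) could be wrapped roughly as follows. This is only a sketch: the wrapper name force_double_precision() is hypothetical, and the preprocessor guard is an assumption so that the same file still compiles on targets without <ieeefp.h>. fpsetprec() and FP_PD themselves come from FreeBSD's <ieeefp.h>, as used in the thread.

```c
#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>
#endif

/*
 * Ask the x87 FPU to round results to a 53-bit significand.
 * Returns 1 if the setting was applied, 0 if this build target is not
 * FreeBSD/i386 (the call is then a no-op).
 * (Hypothetical wrapper, for illustration only.)
 */
int
force_double_precision(void)
{
#if defined(__FreeBSD__) && defined(__i386__)
    (void)fpsetprec(FP_PD);   /* returns the previous precision setting */
    return 1;
#else
    return 0;
#endif
}
```

Calling force_double_precision() as the first statement of main() mirrors what the test program did; whether libm routines should do the same is exactly what is being debated here.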
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 10:40:28AM -0700, Conrad Meyer wrote:
> Hi John,
>
> On Wed, Mar 13, 2019 at 10:17 AM John Baldwin wrote:
> > One issue I'm aware of is that clang does not have any support for the
> > special arrangement FreeBSD/i386 uses where it uses different precision
> > for registers vs in-memory for some of the floating point types (GCC has
> > a special hack that is only used on FreeBSD for this but isn't used on
> > any other OS's). I wonder if that could be a factor? Volatile probably
> > forces a round trip between memory which might explain why this is the
> > case.
> >
> > I wonder what your test program does on i386 Linux with GCC?
>
> $ uname -sr
> Linux 4.20.4
> $ gcc --version
> gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
> ...
> $ rpm -qf /usr/lib/libm-2.27.so
> glibc-2.27-37.fc28.i686
>
> $ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
> $ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
>
> Uh.
>
> kargl.c: In function ‘main’:
> kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>    if (ur > u) u = ur;
>           ^

Whoops. There are a number of variations on a theme named a.c.
Initializing u to 0 doesn't change the outcome with clang
on FreeBSD.

-- 
Steve
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>> gcc8 --version
> >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>> Max ULP: 2.297073
> >>>> Count: 0 (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> >
> > /usr/local/bin/clang60 has the same problem.
> >
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> > Maximum ULP: 23.061242
> > # of ULP > 21: 39
> >
> > Adding volatile as in the above "fixes" the problem.
> >
> > AFAICT, this is an i386/387 code generation problem. Perhaps,
> > an alignment issue?
>
> Oh, I misread your earlier e-mail to say that clang60 worked.
>
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OSes). I wonder if that could be a factor? Volatile probably
> forces a round trip through memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

I don't have an i386 Linux environment.

I tried comparing the assembly generated with and without volatile,
but it proves difficult as register numbers are changed between
the 2 listings so almost all lines mismatch.

If I move ranged(), rangef(), dp_csinh(), and ulpfd() into b.c
so a.c only contains main(), add appropriate prototypes to a.c,
and comment out the printf() statements, I still see the problem.
Judging from the diff, there is a difference in the spills and
loads in 2 places.

% diff -uw without_volatile with_volatile
--- without_volatile    2019-03-13 10:51:33.244226000 -0700
+++ with_volatile       2019-03-13 10:51:54.088095000 -0700
@@ -35,11 +35,13 @@
        movl    %esi, 68(%esp)          # 4-byte Spill
        calll   rangef
        fadds   .LCPI0_0
-       fstpl   24(%esp)                # 8-byte Folded Spill
+       fstps   28(%esp)
        calll   rangef
        fadds   .LCPI0_1
-       fstl    100(%esp)               # 8-byte Folded Spill
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fstps   24(%esp)
+       flds    28(%esp)
+       flds    24(%esp)
+       fxch    %st(1)
        fstps   48(%esp)
        fstps   52(%esp)
        movl    48(%esp), %eax
@@ -49,13 +51,13 @@
        calll   csinhf
        movl    %eax, %esi
        movl    %edx, %edi
+       flds    28(%esp)
+       flds    24(%esp)
        leal    72(%esp), %eax
        movl    %eax, 20(%esp)
        leal    80(%esp), %eax
        movl    %eax, 16(%esp)
-       fldl    100(%esp)               # 8-byte Folded Reload
        fstpl   8(%esp)
-       fldl    24(%esp)                # 8-byte Folded Reload
        fstpl   (%esp)
        calll   dp_csinh
        movl    %esi, 40(%esp)
@@ -75,7 +77,7 @@
        fnstsw  %ax
                                        # kill: def $ah killed $ah killed $ax
        sahf
-       fstl    24(%esp)                # 8-byte Folded Spill
+       fstl    100(%esp)               # 8-byte Folded Spill
        ja      .LBB0_3
 # %bb.2:                                # %for.body
                                        #   in Loop: Header=BB0_1 Depth=1
@@ -114,7 +116,7 @@
                                        #   in Loop: Header=BB0_1 Depth=1
        fstp    %st(2)
        fldl    92(%esp)                # 8-byte Folded Reload
-       fldl    24(%esp)                # 8-byte Folded Reload
+       fldl    100(%esp)               # 8-byte Folded Reload
        fucomp  %st(1)
        fnstsw  %ax
                                        # kill: def $ah killed $ah killed $ax

Adding ieeefp.h to a.c and fpsetprec(FP_PE) in main() produces
a massive diff, but still wrong results if volatile is not used.

Clang appears to be broken for FP on i386/387.

-- 
Steve
Re: Optimization bug with floating-point?
Hi John,

On Wed, Mar 13, 2019 at 10:17 AM John Baldwin wrote:
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's). I wonder if that could be a factor? Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

$ uname -sr
Linux 4.20.4
$ gcc --version
gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
...
$ rpm -qf /usr/lib/libm-2.27.so
glibc-2.27-37.fc28.i686

$ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0
$ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0

Uh.

kargl.c: In function ‘main’:
kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
   if (ur > u) u = ur;
          ^

If I initialize 'u' (to, e.g., -1e52), I get:

Max ULP: 1.959975
Count: 0

at -O2 and -O3 as well.

Best,
Conrad
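For readers who have not seen the test program, the "Max ULP" figure can be computed along these lines. This is a minimal sketch and not the ulpfd() routine from a.c, which was not posted in this part of the thread; the names ulp_error_float() and max_ulp() are hypothetical. It measures a float result against a double reference in units of the float spacing at the reference, and it starts the running maximum from a defined value, which is the detail the warning above is about.

```c
#include <float.h>
#include <math.h>

/*
 * Error of a float result, in units in the last place (ULP) of float,
 * measured against a more precise double reference.
 * (Hypothetical helper, for illustration only.)
 */
double
ulp_error_float(float computed, double reference)
{
    int exponent;
    double ulp;

    /* reference = m * 2^exponent with 0.5 <= |m| < 1 */
    (void)frexp(reference, &exponent);
    if (exponent < FLT_MIN_EXP)
        exponent = FLT_MIN_EXP;   /* zero and subnormals share one spacing */

    /* Spacing of float values in the binade that holds the reference. */
    ulp = ldexp(1.0, exponent - FLT_MANT_DIG);
    return fabs((double)computed - reference) / ulp;
}

/* Largest ULP error over n samples; the maximum starts at 0, not garbage. */
double
max_ulp(const float *computed, const double *reference, int n)
{
    double worst = 0.0;

    for (int i = 0; i < n; i++) {
        double u = ulp_error_float(computed[i], reference[i]);
        if (u > worst)
            worst = u;
    }
    return worst;
}
```

Note that a NaN sample never compares greater than the running maximum, so it is silently skipped here; a real test harness may want to count such cases separately. Link with -lm where the platform requires it.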
Re: Optimization bug with floating-point?
On 3/13/19 9:40 AM, Steve Kargl wrote:
> On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
>> On 3/13/19 8:16 AM, Steve Kargl wrote:
>>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>>>
>>>> gcc8 --version
>>>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>>>
>>>> gcc8 -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>>>
>>>> Max ULP: 2.297073
>>>> Count: 0 (# of ULP that exceed 21)
>>>>
>>>
>>> clang agrees with gcc8 if one changes ...
>>>
>>>> int
>>>> main(void)
>>>> {
>>>>    double re, im, u, ur, ui;
>>>>    float complex f;
>>>>    float x, y;
>>>
>>> this line to "volatile float x, y".
>>
>> So it seems to be a regression in clang 7 vs clang 6?
>>
>
> /usr/local/bin/clang60 has the same problem.
>
> % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> Maximum ULP: 23.061242
> # of ULP > 21: 39
>
> Adding volatile as in the above "fixes" the problem.
>
> AFAICT, this is an i386/387 code generation problem. Perhaps,
> an alignment issue?

Oh, I misread your earlier e-mail to say that clang60 worked.

One issue I'm aware of is that clang does not have any support for the
special arrangement FreeBSD/i386 uses where it uses different precision
for registers vs in-memory for some of the floating point types (GCC has
a special hack that is only used on FreeBSD for this but isn't used on
any other OSes). I wonder if that could be a factor? Volatile probably
forces a round trip through memory which might explain why this is the
case.

I wonder what your test program does on i386 Linux with GCC?

-- 
John Baldwin
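The "round trip through memory" idea can be illustrated in isolation. The sketch below is not taken from a.c; round_through_float() is a hypothetical helper that shows why a volatile float behaves differently from a value kept in a wide x87 register: the store has to happen, and a 4-byte store rounds to a 24-bit significand.

```c
/*
 * Round a double to float precision by forcing it through a float-sized
 * memory slot. The volatile qualifier obliges the compiler to perform
 * the store and the reload, so a wider register copy cannot be reused.
 * (Hypothetical helper, for illustration only.)
 */
double
round_through_float(double wide)
{
    volatile float slot = (float)wide;   /* store: rounds to a 24-bit significand */

    return (double)slot;                 /* reload the rounded value */
}
```

For example, 1 + 2^-30 is representable as a double but not as a float, so round_through_float() returns exactly 1.0 for it, while a value such as 0.5 passes through unchanged.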
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> On 3/13/19 8:16 AM, Steve Kargl wrote:
> > On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>
> >> gcc8 --version
> >> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>
> >> gcc8 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>
> >> Max ULP: 2.297073
> >> Count: 0 (# of ULP that exceed 21)
> >>
> >
> > clang agrees with gcc8 if one changes ...
> >
> >> int
> >> main(void)
> >> {
> >>    double re, im, u, ur, ui;
> >>    float complex f;
> >>    float x, y;
> >
> > this line to "volatile float x, y".
>
> So it seems to be a regression in clang 7 vs clang 6?
>

/usr/local/bin/clang60 has the same problem.

% /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
Maximum ULP: 23.061242
# of ULP > 21: 39

Adding volatile as in the above "fixes" the problem.

AFAICT, this is an i386/387 code generation problem. Perhaps,
an alignment issue?

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Re: Optimization bug with floating-point?
On 3/13/19 8:16 AM, Steve Kargl wrote:
> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>
>> gcc8 --version
>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>
>> gcc8 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>
>> Max ULP: 2.297073
>> Count: 0 (# of ULP that exceed 21)
>>
>
> clang agrees with gcc8 if one changes ...
>
>> int
>> main(void)
>> {
>>    double re, im, u, ur, ui;
>>    float complex f;
>>    float x, y;
>
> this line to "volatile float x, y".

So it seems to be a regression in clang 7 vs clang 6?

-- 
John Baldwin
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 04:56:26PM +0100, Hans Petter Selasky wrote:
> On 3/13/19 4:50 PM, Steve Kargl wrote:
> > Using sin() and cos() directly as in
> >
> > /* Double precision csinh() without using C's double complex. */
> > void
> > dp_csinh(double x, double y, double *re, double *im)
> > {
> >    double c, s;
> >    *re = sinh(x) * cos(y);
> >    *im = cosh(x) * sin(y);
> > }
> >
> > does not change the result. I'll also note that libm
> > is compiled by clang, and I do not recompile it for the
> > tests. Both gcc8 and cc are using the same libm.
> >
> > I've also tested clang on amd64 with -m32; it fails
> > as well.
>
> Hi,
>
> I cannot see this is failing with 11-stable userland. Can you check with
> objdump() that clang doesn't optimise it to sincos() ?

It doesn't.

% nm z | grep sin
         U csinhf
00401360 T dp_csinh
         U sin
         U sinh

> FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on
> LLVM 3.8.0)
> Target: x86_64-unknown-freebsd11.0

The test does not fail on x86_64 unless you add the -m32 option,
which forces i386 behavior.

cc --version
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
Target: x86_64-unknown-freebsd13.0

cc -fno-builtin -O2 -o z a.c -lm && ./z
Max u: 2.297073
Count: 0

cc -fno-builtin -O2 -o z a.c -lm -m32 && ./z
Max u: 23.061242
Count: 39

> Thread model: posix
> InstalledDir: /usr/bin
>
> cc -lm -O2 -Wall test.c && ./a.out
> Max ULP: 2.297073
> Count: 0

add -m32.

>
> clang40 -lm -O2 test6.c
> > ./a.out
> Max ULP: 2.297073
> Count: 0
>
> clang50 -lm -O2 test6.c
> > ./a.out
> Max ULP: 2.297073
> Count: 0
>
> clang60 -lm -O2 test6.c
> > ./a.out
> Max ULP: 2.297073
> Count: 0

-- 
Steve
Re: Optimization bug with floating-point?
On 3/13/19 4:50 PM, Steve Kargl wrote:
> Using sin() and cos() directly as in
>
> /* Double precision csinh() without using C's double complex.s */
> void
> dp_csinh(double x, double y, double *re, double *im)
> {
>    double c, s;
>    *re = sinh(x) * cos(y);
>    *im = cosh(x) * sin(y);
> }
>
> does not change the result. I'll also note that libm
> is compiled by clang, and I do not recompile it for the
> tests. Both gcc8 and cc are using the same libm.
>
> I've also tested clang of amd64 with the -m32, it fails
> as well.

Hi,

I cannot see this is failing with 11-stable userland. Can you check with
objdump() that clang doesn't optimise it to sincos() ?

FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on
LLVM 3.8.0)
Target: x86_64-unknown-freebsd11.0
Thread model: posix
InstalledDir: /usr/bin

cc -lm -O2 -Wall test.c && ./a.out
Max ULP: 2.297073
Count: 0

clang40 -lm -O2 test6.c > ./a.out
Max ULP: 2.297073
Count: 0

clang50 -lm -O2 test6.c > ./a.out
Max ULP: 2.297073
Count: 0

clang60 -lm -O2 test6.c > ./a.out
Max ULP: 2.297073
Count: 0

--HPS
Re: Optimization bug with floating-point?
On Wed, Mar 13, 2019 at 04:41:51PM +0100, Hans Petter Selasky wrote:
> On 3/13/19 4:16 PM, Steve Kargl wrote:
> > On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>
> >> gcc8 --version
> >> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>
> >> gcc8 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>
> >> Max ULP: 2.297073
> >> Count: 0 (# of ULP that exceed 21)
> >>
> >
> > clang agrees with gcc8 if one changes ...
> >
> >> int
> >> main(void)
> >> {
> >>    double re, im, u, ur, ui;
> >>    float complex f;
> >>    float x, y;
> >
> > this line to "volatile float x, y".
> >
>
> Can you try to use:
>
> #define sincos(x,p,q) do { \
>     *(p) = sin(x); \
>     *(q) = cos(x); \
> } while (0)
>
>
> Instead of libm's sincos(). Might be a bug in there.
>

Using sin() and cos() directly as in

/* Double precision csinh() without using C's double complex.s */
void
dp_csinh(double x, double y, double *re, double *im)
{
   double c, s;
   *re = sinh(x) * cos(y);
   *im = cosh(x) * sin(y);
}

does not change the result. I'll also note that libm
is compiled by clang, and I do not recompile it for the
tests. Both gcc8 and cc are using the same libm.

I've also tested clang of amd64 with the -m32, it fails
as well.

--
Steve
Re: Optimization bug with floating-point?
On 3/13/19 4:16 PM, Steve Kargl wrote:
> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>
>> gcc8 --version
>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>
>> gcc8 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>
>> Max ULP: 2.297073
>> Count: 0 (# of ULP that exceed 21)
>>
>
> clang agrees with gcc8 if one changes ...
>
>> int
>> main(void)
>> {
>>    double re, im, u, ur, ui;
>>    float complex f;
>>    float x, y;
>
> this line to "volatile float x, y".
>

Can you try to use:

#define sincos(x,p,q) do { \
    *(p) = sin(x); \
    *(q) = cos(x); \
} while (0)

Instead of libm's sincos(). Might be a bug in there.

--HPS
Re: Optimization bug with floating-point?
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
>
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 2.297073
> Count: 0 (# of ULP that exceed 21)
>

clang agrees with gcc8 if one changes ...

> int
> main(void)
> {
>    double re, im, u, ur, ui;
>    float complex f;
>    float x, y;

this line to "volatile float x, y".

--
Steve
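One plausible reading of why "volatile" changes the outcome: every
access to a volatile object has to go through memory, so a value held
in an x87 register with extra precision gets rounded to a real 24-bit
float significand on the store. The sketch below only illustrates that
rounding step in isolation. The helper name is hypothetical and this is
not code from a.c; whether this is the actual mechanism behind the
clang result is an assumption.

```c
/*
 * Hypothetical helper: round a double to float precision by forcing
 * the value through a volatile float object, then widen it again.
 */
static double
round_through_float(double v)
{
	volatile float f = (float)v;	/* store rounds to float */

	return ((double)f);		/* reload the rounded value */
}
```

For example, 16777217.0 (2^24 + 1) is not representable as a float, so
round_through_float(16777217.0) comes back as 16777216.0, while 1.5
passes through unchanged.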
Re: Optimization bug with floating-point?
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>
> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
>
>
> Max ULP: 23.061242
> Count: 39 (# of ULP that exceeds 21)
>

These results do not change if one uses /usr/local/bin/clang60.

--
Steve
Re: Optimization bug with floating-point?
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> All,
>
> There seems to be an optimization bug with clang on
>
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE i386
>
> IOW, if you do numerical work on i386, you may want to check your
> results.
>
> The program demonstrating the issue is at the end of this email.
>
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
>
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 2.297073
> Count: 0 (# of ULP that exceed 21)
>

The above results do not change if one adds -ffloat-store to the
command line.

> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 23.061242
> Count: 39 (# of ULP that exceeds 21)

Clang doesn't support -ffloat-store, so the above does not change.

--
Steve
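The test program a.c is not reproduced in this part of the thread, so
the following is only an assumption about how a "Max ULP" figure like
the ones above could be computed: take the float result, compare it
against a double-precision reference, and divide the difference by the
spacing of floats near the reference. The helper names are
hypothetical, and the spacing helper is only valid for normal, finite
values.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: spacing between adjacent floats near x, returned
 * as a double.  Valid only for normal, finite x (not zero, subnormal,
 * infinity, or NaN).
 */
static double
ulp_of_float(float x)
{
	uint32_t bits;
	uint64_t dbits;
	double ulp;
	int e;

	memcpy(&bits, &x, sizeof(bits));
	e = (int)((bits >> 23) & 0xff) - 127;		/* unbiased exponent */
	dbits = (uint64_t)(e - 23 + 1023) << 52;	/* 2^(e-23) as a double */
	memcpy(&ulp, &dbits, sizeof(ulp));
	return (ulp);
}

/*
 * Hypothetical helper: error of a float result, in units of the float
 * spacing near the double-precision reference value.
 */
static double
ulp_error(float approx, double exact)
{
	double d = (double)approx - exact;

	if (d < 0)
		d = -d;
	return (d / ulp_of_float((float)exact));
}
```

With a measure of this kind, a result that is off by one float spacing
from the reference scores 1.0, so the 2.3 vs. 23 figures reported above
would differ by roughly a factor of ten in error.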