Bug#572746: libm: sinf/cosf performance is awful on amd64
On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
> On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
> > On amd64, only sincos has an optimized version,
>
> It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!

Where did you get this result? In my tests both the x87 FPU and the current glibc code give the following result when using double variables and the sincos() function:

  sin (1e22) = 0.46261304076460174617
  cos (1e22) = -0.88656030506363692201

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100321151449.gb11...@hall.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-21 16:14:49 +0100, Aurelien Jarno wrote:
> On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
> > It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!
>
> Where did you get this result? In my tests both the x87 FPU and the current glibc code give the following result when using double variables and the sincos() function:
>
>   sin (1e22) = 0.46261304076460174617
>   cos (1e22) = -0.88656030506363692201

Actually the sincos() function uses the x87 FPU (fsincos instruction), so it's not surprising that you get the same result. Try with GCC >= 4.4 (so that constant expressions are evaluated with MPFR):

xvii:~ cat tst.c
#include <stdio.h>
#include <math.h>

int main (void)
{
  printf ("sin (1e22) = %.17g\n", sin(1e22));
  printf ("cos (1e22) = %.17g\n", cos(1e22));
  return 0;
}
xvii:~ gcc tst.c -o tst
xvii:~ ./tst
sin (1e22) = -0.85220084976718879
cos (1e22) = 0.52321478539513899

And if you don't trust MPFR, you can still try with gp (pari).

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
On Sun, Mar 21, 2010 at 04:30:36PM +0100, Vincent Lefevre wrote:
> On 2010-03-21 16:14:49 +0100, Aurelien Jarno wrote:
> > On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
> > > It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!
> >
> > Where did you get this result? In my tests both the x87 FPU and the current glibc code give the following result when using double variables and the sincos() function:
> >
> >   sin (1e22) = 0.46261304076460174617
> >   cos (1e22) = -0.88656030506363692201
>
> Actually the sincos() function uses the x87 FPU (fsincos instruction), so it's not surprising that you get the same result.

That just means that the precision is not an argument for switching to the x87 FPU version, contrary to what your first email suggested. Both are wrong.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-21 20:55:01 +0100, Aurelien Jarno wrote:
> On Sun, Mar 21, 2010 at 04:30:36PM +0100, Vincent Lefevre wrote:
> > Actually the sincos() function uses the x87 FPU (fsincos instruction), so it's not surprising that you get the same result.
>
> That just means that the precision is not an argument for switching to the x87 FPU version, contrary to what your first email suggested. Both are wrong.

My e-mail didn't suggest that. On the contrary, the glibc sincos() implementation shouldn't use the x87 FPU version, but should use the generic C implementation (like sin and cos) instead.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
> On amd64, only sincos has an optimized version,

It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-06 11:42:51 +0100, Jerome Vizcaino wrote:
> After many tests and research I've come to the conclusion that the float variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.

Note that on amd64, sin and cos may be slow, but at least they are mostly correct (in rounding to nearest only). In 32-bit mode (-m32), the hardware fsin/fcos instructions (and implementation) are used, but they are buggy on large arguments, just like sincos in 64-bit mode (which uses the hardware instruction fsincos), and not as accurate as the MathLib version on small arguments (even though MathLib has a bug in its error analysis for sin).

If you don't care about correctness, you can still use the -ffast-math GCC option (you don't use it in your Makefile). Depending on the application, this can be OK, but don't complain if you get incorrect, inaccurate or unexpected results in some cases.

This was for sin/cos. I don't know how sinf/cosf are implemented in 32-bit and 64-bit modes, but make sure they don't have the same problem. If they do, this bug is invalid.

Regards,

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 17.03.2010 11:29, Vincent Lefevre wrote:
> On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
> > On amd64, only sincos has an optimized version,
>
> It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!

Sorry, but I don't see the error. Both are correct values, as is any value from -1 to 1. The double 1e22 is outside the *relevant* precision for double, i.e. a whole range of 2*pi is described with the same number (1e22).

Maybe using long double (and sincosl) you will have the expected results (I did not calculate the minimum precision of long double).

ciao
cate
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-17 13:41:04 +0100, Giacomo A. Catenazzi wrote:
> On 17.03.2010 11:29, Vincent Lefevre wrote:
> > On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
> > > On amd64, only sincos has an optimized version,
> >
> > It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!
>
> Sorry, but I don't see the error. Both are correct values, as is any value from -1 to 1. The double 1e22 is outside the *relevant* precision for double, i.e. a whole range of 2*pi is described with the same number (1e22).

No, this is wrong. This could be correct with interval arithmetic, but floating-point arithmetic works differently: inputs are regarded as *exact*. One reason is that the implementation cannot know whether the input is really exact or not (this information is not passed to the function). Another reason is that some formulas (based on correlation) such as sin(x)^2 + cos(x)^2 = 1, or some algorithms, may fail if the implementation treats inputs as some arbitrary point in an interval.

Also, IEEE 754 requires correct rounding. Support for elementary functions is optional (Section 9), but since they are in the library, it is much better to support them. Users/developers who know that their software would not be affected by a poorly accurate implementation could still use compiler options.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 17.03.2010 14:36, Vincent Lefevre wrote:
> On 2010-03-17 13:41:04 +0100, Giacomo A. Catenazzi wrote:
> > On 17.03.2010 11:29, Vincent Lefevre wrote:
> > > On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
> > > > On amd64, only sincos has an optimized version,
> > >
> > > It may be optimized, but completely buggy. For instance, on 1e22, sincos returns 0.46261304076460174617 for the sine instead of -0.85220084976718879499 (correctly rounded value). Even the sign is incorrect!
> >
> > Sorry, but I don't see the error. Both are correct values, as is any value from -1 to 1. The double 1e22 is outside the *relevant* precision for double, i.e. a whole range of 2*pi is described with the same number (1e22).
>
> No, this is wrong. This could be correct with interval arithmetic, but floating-point arithmetic works differently: inputs are regarded as *exact*.

Are you sure? From the C standard (not really the standard, but the draft n1256: 5.2.4.2.2, point 5):

: The accuracy of the floating-point operations (+, -, *, /) and of
: the library functions in <math.h> and <complex.h> that return
: floating-point results is implementation-defined,
: as is the accuracy of the conversion between floating-point
: internal representations and string representations performed by
: the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>.
: The implementation may state that the accuracy is unknown.

And also looking in POSIX, I find no further requirement, so are you sure that 1e22 should be taken as exact?

So maybe the bug is to define __STDC_IEC_559__ in such a case. OTOH, section F.9.1 doesn't require (my interpretation) trigonometric functions to be IEC 60559 functions. It has such a requirement for more elementary functions, e.g. for sqrt (see section F.3).

OTOH I think you are an expert on such standards, so it is possible/probable that I'm completely wrong.

ciao
cate
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-17 17:14:37 +0100, Giacomo A. Catenazzi wrote:
> From the C standard (not really the standard, but the draft n1256: 5.2.4.2.2, point 5):
>
> : The accuracy of the floating-point operations (+, -, *, /) and of
> : the library functions in <math.h> and <complex.h> that return
> : floating-point results is implementation-defined,
> : as is the accuracy of the conversion between floating-point
> : internal representations and string representations performed by
> : the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>.
> : The implementation may state that the accuracy is unknown.

The values are defined by 5.2.4.2.2p2 (see the equality). The model is that a floating-point number represents a real number, not an interval. The "accuracy" is a measure of the difference between the exact value and the (rounded) value returned by the function.

> So maybe the bug is to define __STDC_IEC_559__ in such a case. OTOH, section F.9.1 doesn't require (my interpretation) trigonometric functions to be IEC 60559 functions. It has such a requirement for more elementary functions, e.g. for sqrt (see section F.3).

Yes, elementary functions are not covered by Annex F (which is based on the old IEEE 754-1985 standard). However, a C implementation could claim to conform to IEEE 754 (i.e. more than conforming to ISO C). This is more a matter of quality of implementation.

Note: I don't know about other languages, whose implementations could also use the glibc. They could be more restrictive than C.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
Hi,

I am not complaining about the sin/cos performance, only about the float variants.

Using -ffast-math gives a nice performance boost but leads to bad results (in our cases, which may be different from the simple given example), so it's not really a workaround.

In fact, I don't really care about super-accuracy, and the standard glibc implementations give us good results. Maybe sinf/cosf in 32 bits is a bit broken, but that worked for us in terms of precision and performance. On 64 bits, I can only rely on precision, as performance is gone :(

Jerome
Bug#572746: libm: sinf/cosf performance is awful on amd64
On 2010-03-17 19:31:16 +0100, Jerome Vizcaino wrote:
> I am not complaining about the sin/cos performance, only about the float variants.

OK. I haven't looked at the code, but if sinf() simply calls sin(), this is suboptimal and there would be room for a performance boost without sacrificing accuracy.

For large arguments, it would always be slower if you care about not too bad accuracy. If you don't care because any value between -1 and 1 is OK, then you could write a wrapper with:

  return fabsf(x) < some_bound ? sinf(x) : 0;

> Using -ffast-math gives a nice performance boost but leads to bad results (in our cases, which may be different from the simple given example), so it's not really a workaround.

Yes, that's the problem with optimizations that change the semantics (in the same way, users can complain about the 32-bit implementation of trigonometric functions). Note that there are other compiler options that allow finer-grained control of these optimizations. You might want to look at them (e.g. man gcc).

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Bug#572746: libm: sinf/cosf performance is awful on amd64
Ok, running sinf/cosf with bounded values gives better performance results (close to the sin/cos ones). I think the bounding trick should be documented in the manpages as a note for amd64 (at least), because the behavior is different on i386 and clearly not expected...

Anyway, I still can't get the performance I had on the same hardware:
+ 0.94 secs on 32 bits, sinf/cosf without bounded values
+ 1.07 secs on 64 bits, sinf/cosf with manual bounding

Do you know how the asm in the lib bounds the input value (I mean for the optimised sin/cos versions, for example)?

Jerome
Bug#572746: libm: sinf/cosf performance is awful on amd64
What do you mean by sinf/cosf being twice as fast? Are you referring to calling it with bounded values?

Thank you.
Jerome
Bug#572746: libm: sinf/cosf performance is awful on amd64
Jerome Vizcaino wrote:
> What do you mean by sinf/cosf being twice as fast? Are you referring to calling it with bounded values?

Yes, with the current code and bounded values, it is twice as fast. This is no longer the case with the assembly code, as the same FPU instruction (fcos/fsin/fsincos) is used for the three versions (float, double, long double).

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
On Sat, Mar 06, 2010 at 11:42:51AM +0100, Jerome Vizcaino wrote:
> Package: libc6
> Version: 2.10.2-6
> Severity: normal
>
> Hi,
>
> After many tests and research I've come to the conclusion that the float variants of sin/cos (and maybe others) are abnormally slow on Debian amd64. The performance loss is really impressive (around 8 to 9 times slower).
>
> I've attached the prog used to make my experiments and used it in the following cases.
> + Lenny-amd64: sinf/cosf is really slow
> + Lenny-i386: float performance is ok (faster than the cos/sin using double)
> + Sid-amd64: sinf/cosf slow
> + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.

On amd64, only sincos has an optimized version, sincosf is using the generic C implementation. On i386, there are optimized versions of both sincos and sincosf.

> + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64, the tests run fine !
> => The problem is not compiler related.
>
> There seems to be a problem with the way libm is compiled for the amd64 architecture on Debian. This is why the OpenSuse test was run: the problem is somewhere in the compile chain or debian specific patches.

The problem is clearly not Debian specific, and is also present upstream. OpenSuse is probably using a patch to work around the problem.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
On Sun, Mar 07, 2010 at 04:17:08PM +0100, Aurelien Jarno wrote:
> On Sat, Mar 06, 2010 at 11:42:51AM +0100, Jerome Vizcaino wrote:
> > Package: libc6
> > Version: 2.10.2-6
> > Severity: normal
> >
> > Hi,
> >
> > After many tests and research I've come to the conclusion that the float variants of sin/cos (and maybe others) are abnormally slow on Debian amd64. The performance loss is really impressive (around 8 to 9 times slower).
> >
> > I've attached the prog used to make my experiments and used it in the following cases.
> > + Lenny-amd64: sinf/cosf is really slow
> > + Lenny-i386: float performance is ok (faster than the cos/sin using double)
> > + Sid-amd64: sinf/cosf slow
> > + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.
>
> On amd64, only sincos has an optimized version, sincosf is using the generic C implementation. On i386, there are optimized versions of both sincos and sincosf.
>
> > + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64, the tests run fine !
> > => The problem is not compiler related.
> >
> > There seems to be a problem with the way libm is compiled for the amd64 architecture on Debian. This is why the OpenSuse test was run: the problem is somewhere in the compile chain or debian specific patches.
>
> The problem is clearly not Debian specific, and is also present upstream. OpenSuse is probably using a patch to work around the problem.

This is confirmed: they're using an AMD version of the libm library on x86_64, still coded in C for the sincosf function.

A quick and dirty implementation of sincosf in x86_64 assembly gives me a speed around 4% slower than sincos. What kind of performance ratio do you get on SuSe?

The solution seems to be to write each *f function in x86_64 assembly, but that'll probably take time.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
Hi,

I could not say for sure what the difference between sin and sinf (for example) is on Suse, but the performance ratio I had on 32 bits stayed the same on 64 bits. This is why I was surprised to get such impressive slowness when moving to Debian :(

Thanks for pointing out the Suse patch: as we only have Suse or Debian at work, I could not do more comparisons.

How about including the patches from OpenSuse? Is it possible as a quick workaround?

Thanks for your help.
Jerome
Bug#572746: libm: sinf/cosf performance is awful on amd64
On Sun, Mar 07, 2010 at 07:43:46PM +0100, Jerome Vizcaino wrote:
> Hi,
>
> I could not say for sure what the difference between sin and sinf (for example) is on Suse, but the performance ratio I had on 32 bits stayed the same on 64 bits. This is why I was surprised to get such impressive slowness when moving to Debian :(
>
> Thanks for pointing out the Suse patch: as we only have Suse or Debian at work, I could not do more comparisons.
>
> How about including the patches from OpenSuse? Is it possible as a quick workaround?

The patches from OpenSuse are ugly and very invasive, and they do not seem to include the recent errno changes for C99 compliance (though I haven't tested them). I am not really sure we want that.

I have started to rewrite part of the functions in assembly. While this new assembly code behaves correctly with your testcase, it is twice as slow as the current version when using normal arguments. I have modified your code a bit to stay within a reasonable range of arguments, and also to test the l version of the functions.

Here is the result with the original code (using C code for the f version):

| Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 0.516700 sec
| Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.056214 sec
| Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.089871 sec

With the assembly code instead (using the FPU instructions), I get:

| Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 1.010248 sec
| Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.055434 sec
| Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.095374 sec

As I expect most code to use values between -2pi and 2pi, I am not really sure we should change the current code.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
On Mon, Mar 08, 2010 at 12:17:41AM +0100, Jerome Vizcaino wrote:
> Ok about the patches: there had to be a reason for those not to be merged upstream.

Also I think they have never been submitted upstream. They are from 2002...

> Some of my co-workers noticed the performance improvement when bounding values between -pi and pi, but the thing is, this kind of trick does not need to be applied on 32 bits systems... Libm's behavior is not really arch-proof in terms of performance, which is a bit confusing (and I would expect 64 bit to be better than 32 bits when dealing with maths... :(

cosf/sinf is actually twice as fast on 64-bit as on 32-bit in normal cases, but can be a lot slower in the cases that concern you.

> Have you tried pushing your code upstream? Maybe this would be useful for future versions.

It's currently not clean enough to be sent upstream. I'll try to send it upstream, but I don't have too much hope.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#572746: libm: sinf/cosf performance is awful on amd64
Ok about the patches: there had to be a reason for those not to be merged upstream.

Some of my co-workers noticed the performance improvement when bounding values between -pi and pi, but the thing is, this kind of trick does not need to be applied on 32 bits systems... Libm's behavior is not really arch-proof in terms of performance, which is a bit confusing (and I would expect 64 bit to be better than 32 bits when dealing with maths... :(

Have you tried pushing your code upstream? Maybe this would be useful for future versions.

Thanks for your help
Jerome
Bug#572746: libm: sinf/cosf performance is awful on amd64
Package: libc6
Version: 2.10.2-6
Severity: normal

Hi,

After many tests and much research I've come to the conclusion that the
float variants of sin/cos (and maybe others) are abnormally slow on
Debian amd64. The performance loss is really impressive (around 8 to 9
times slower).

I've attached the program used for my experiments and ran it in the
following cases:

+ Lenny-amd64: sinf/cosf is really slow
+ Lenny-i386: float performance is OK (faster than the cos/sin using
  double)
+ Sid-amd64: sinf/cosf slow
+ Lenny-amd64 using lenny-i386 binary and 32-bit libs: float
  performance is OK.
+ OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on
  lenny-amd64, the tests run fine!

=> The problem is not compiler related.

There seems to be a problem with the way libm is compiled for the amd64
architecture on Debian. This is why the OpenSuse test was run: the
problem is somewhere in the compile chain or in Debian-specific patches.

We're using these functions extensively for calculations and this is a
real problem. Using cos/sin as a temporary workaround would do the
trick, but this is still slower than the sinf/cosf implementations that
work so well on 32-bit computers...

Thank you
Jerome

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin  2.10.2-6   Embedded GNU C Library: Binaries
ii  libgcc1   1:4.4.3-3  GCC support library

libc6 recommends no packages.
Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]  1.5.28    Debian configuration management sy
pn  glibc-doc              <none>    (no description available)
ii  locales                2.10.2-6  Embedded GNU C Library: National L

-- debconf information excluded

CC=gcc
CFLAGS=-DNDEBUG -O3 -D_ISOC99_SOURCE -Wall -Wextra
LDFLAGS=-lm

all: test_trig

clean:
	rm test_trig

test_trig: test_trig.c

#include <math.h>
#include <sys/time.h>
#include <stdio.h>

int main(void)
{
  const int nbElement_i = 1000;
  int i=0;
  float f1=0.0f, f2=0.0f, f3=0.0f;
  struct timeval tv1, tv2;

  printf("Testing %d sinf and cosf... ", nbElement_i);
  fflush(stdout);
  gettimeofday(&tv1, NULL);
  for(i=0; i<nbElement_i; i++){
    f1 += cosf(i);
    f2 += sinf(i);
  }
  // This is needed for gcc to know a and b results
  // really matters, otherwise sin and cos could
  // be ignored.
  f3 = f1+f2;
  gettimeofday(&tv2, NULL);
  //
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec);

  f1 = 0.0f;
  f2 = 0.0f;
  printf("Testing %d sin and cos (with float args)... ", nbElement_i);
  fflush(stdout);
  gettimeofday(&tv1, NULL);
  for(i=0; i<nbElement_i; i++){
    f1 += cos(i);
    f2 += sin(i);
  }
  // This is needed for gcc to know a and b results
  // really matters, otherwise sin and cos could
  // be ignored.
  f3 = f1+f2;
  gettimeofday(&tv2, NULL);
  //
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, tv2.tv_sec - tv1.tv_sec, tv2.tv_usec - tv1.tv_usec);

  return 0;
}