Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-21 Thread Aurelien Jarno
On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
 On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
  On amd64, only sincos has an optimized version,
 
 It may be optimized, but completely buggy. For instance, on 1e22,
 sincos returns 0.46261304076460174617 for the sine instead of
 -0.85220084976718879499 (correctly rounded value). Even the sign
 is incorrect!
 

Where did you get this result? In my tests both the x87 FPU and the
current glibc code give the following result when using double variables
and the sincos() function:

sin (1e22) =  0.46261304076460174617
cos (1e22) = -0.88656030506363692201


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100321151449.gb11...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-21 Thread Vincent Lefevre
On 2010-03-21 16:14:49 +0100, Aurelien Jarno wrote:
 On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
  It may be optimized, but completely buggy. For instance, on 1e22,
  sincos returns 0.46261304076460174617 for the sine instead of
  -0.85220084976718879499 (correctly rounded value). Even the sign
  is incorrect!
 
 Where did you get this result? In my tests both the x87 FPU and the
 current glibc code give the following result when using double variables
 and the sincos() function:
 
 sin (1e22) =  0.46261304076460174617
 cos (1e22) = -0.88656030506363692201

Actually the sincos() function uses the x87 FPU (fsincos instruction),
so it's not surprising that you get the same result.

Try with GCC >= 4.4 (so that constant expressions are evaluated
with MPFR):

xvii:~> cat tst.c
#include <stdio.h>
#include <math.h>
int main (void)
{
  printf ("sin (1e22) = %.17g\n", sin(1e22));
  printf ("cos (1e22) = %.17g\n", cos(1e22));
  return 0;
}
xvii:~> gcc tst.c -o tst
xvii:~> ./tst
sin (1e22) = -0.85220084976718879
cos (1e22) = 0.52321478539513899

And if you don't trust MPFR, you can still try with gp (pari).

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100321153036.gd1...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-21 Thread Aurelien Jarno
On Sun, Mar 21, 2010 at 04:30:36PM +0100, Vincent Lefevre wrote:
 On 2010-03-21 16:14:49 +0100, Aurelien Jarno wrote:
  On Wed, Mar 17, 2010 at 11:29:00AM +0100, Vincent Lefevre wrote:
   It may be optimized, but completely buggy. For instance, on 1e22,
   sincos returns 0.46261304076460174617 for the sine instead of
   -0.85220084976718879499 (correctly rounded value). Even the sign
   is incorrect!
  
  Where did you get this result? In my tests both the x87 FPU and the
  current glibc code give the following result when using double variables
  and the sincos() function:
  
  sin (1e22) =  0.46261304076460174617
  cos (1e22) = -0.88656030506363692201
 
 Actually the sincos() function uses the x87 FPU (fsincos instruction),
 so it's not surprising that you get the same result.
 

That just means that the precision is not an argument for switching to
the x87 FPU version, contrary to what your first email suggested. Both
are wrong.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/20100321195501.gc11...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-21 Thread Vincent Lefevre
On 2010-03-21 20:55:01 +0100, Aurelien Jarno wrote:
 On Sun, Mar 21, 2010 at 04:30:36PM +0100, Vincent Lefevre wrote:
  Actually the sincos() function uses the x87 FPU (fsincos instruction),
  so it's not surprising that you get the same result.
 
 That just means that the precision is not an argument for switching to
 the x87 FPU version, contrary to what your first email suggested. Both
 are wrong.

My e-mail didn't suggest that. On the contrary, the glibc sincos()
implementation shouldn't use the x87 FPU version, but should use
the generic C implementation (like sin and cos) instead.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100321211150.gb8...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Vincent Lefevre
On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
 On amd64, only sincos has an optimized version,

It may be optimized, but completely buggy. For instance, on 1e22,
sincos returns 0.46261304076460174617 for the sine instead of
-0.85220084976718879499 (correctly rounded value). Even the sign
is incorrect!

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100317102900.ga29...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Vincent Lefevre
On 2010-03-06 11:42:51 +0100, Jerome Vizcaino wrote:
 After many tests and research I've come to the conclusion that the
 float variants of sin/cos (and maybe others) are abnormally slow on
 Debian amd64.

Note that on amd64, sin and cos may be slow, but at least they are
mostly correct (in rounding to nearest only). In 32-bit mode (-m32),
the hardware fsin/fcos instructions (and implementation) are used,
but they are buggy on large arguments, just like sincos in 64-bit
mode (which uses the hardware instruction fsincos), and not as
accurate as the MathLib version on small arguments (even though
MathLib has a bug in its error analysis for sin).

If you don't care about correctness, you can still use the -ffast-math
GCC option (you don't use it in your Makefile). Depending on the
application, this can be OK, but don't complain if you get incorrect,
inaccurate or unexpected results in some cases.

This was for sin/cos. I don't know how sinf/cosf are implemented
in 32-bit and 64-bit modes, but make sure they don't have the same
problem. If they do, this bug is invalid.

Regards,

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100317104738.gb29...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Giacomo A. Catenazzi

On 17.03.2010 11:29, Vincent Lefevre wrote:
 On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
  On amd64, only sincos has an optimized version,
 
 It may be optimized, but completely buggy. For instance, on 1e22,
 sincos returns 0.46261304076460174617 for the sine instead of
 -0.85220084976718879499 (correctly rounded value). Even the sign
 is incorrect!


Sorry, but I don't see the error. Both are correct values,
as is any value from -1 to 1.

The double 1e22 is outside the *relevant* precision for double,
i.e. a whole range of 2*pi is described by the same number (1e22).

Maybe using long double (and sincosl) you will get the expected results
(I did not calculate the minimum precision of long double).


ciao
cate



--
Archive: http://lists.debian.org/4ba0cde0.1020...@debian.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Vincent Lefevre
On 2010-03-17 13:41:04 +0100, Giacomo A. Catenazzi wrote:
 On 17.03.2010 11:29, Vincent Lefevre wrote:
 On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
 On amd64, only sincos has an optimized version,
 
 It may be optimized, but completely buggy. For instance, on 1e22,
 sincos returns 0.46261304076460174617 for the sine instead of
 -0.85220084976718879499 (correctly rounded value). Even the sign
 is incorrect!
 
 Sorry, but I don't see the error. Both are correct values,
 as is any value from -1 to 1.
 
 The double 1e22 is outside the *relevant* precision for double,
 i.e. a whole range of 2*pi is described by the same number (1e22).

No, this is wrong. This could be correct with interval arithmetic,
but floating-point arithmetic works differently: inputs are
regarded as *exact*. One reason is that the implementation cannot
know whether the input is really exact or not (this information
is not passed to the function). Another reason is that some
formulas (based on correlation) such as sin(x)^2 + cos(x)^2 = 1
or algorithms may fail if the implementation assumes that inputs
are regarded as some arbitrary point in an interval.
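Both positions can be checked numerically. The sketch below (hypothetical helper names; it assumes `double` is IEEE 754 binary64 and that `uint64_t` shares its byte order, as on amd64) confirms that the literal 1e22 is exactly the integer 10^22, because 10^22 = 2^22 * 5^22 and 5^22 = 2384185791015625 < 2^53, and it measures the gap to the next representable double.

```c
#include <stdint.h>
#include <string.h>

/* Gap between x and the next larger double, obtained by incrementing
   the IEEE 754 bit pattern. Valid for finite x >= 0 below DBL_MAX. */
double gap_above(double x)
{
    uint64_t bits;
    double next;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;                       /* adjacent double above x */
    memcpy(&next, &bits, sizeof next);
    return next - x;                 /* exact: the values are adjacent */
}

/* True if x is exactly 10^22, written as 5^22 * 2^22; both factors
   and their product are exactly representable, so no rounding occurs. */
int is_exactly_1e22(double x)
{
    return x == 2384185791015625.0 * 4194304.0;
}
```

On such a platform `gap_above(1e22)` is 2^21 = 2097152, far wider than 2*pi (the spacing argument), yet `is_exactly_1e22(1e22)` holds: the input is a single exact number with a well-defined sine.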

Also, IEEE 754 requires correct rounding. Support for elementary
functions is optional (Section 9), but since they are in the
library, it is much better to support them. Users/developers
who know that their software would not be affected by a poorly
accurate implementation could still use compiler options.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100317133616.gr1...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Giacomo A. Catenazzi

On 17.03.2010 14:36, Vincent Lefevre wrote:
 On 2010-03-17 13:41:04 +0100, Giacomo A. Catenazzi wrote:
  On 17.03.2010 11:29, Vincent Lefevre wrote:
   On 2010-03-07 16:17:08 +0100, Aurelien Jarno wrote:
    On amd64, only sincos has an optimized version,
   
   It may be optimized, but completely buggy. For instance, on 1e22,
   sincos returns 0.46261304076460174617 for the sine instead of
   -0.85220084976718879499 (correctly rounded value). Even the sign
   is incorrect!
  
  Sorry, but I don't see the error. Both are correct values,
  as is any value from -1 to 1.
  
  The double 1e22 is outside the *relevant* precision for double,
  i.e. a whole range of 2*pi is described by the same number (1e22).
 
 No, this is wrong. This could be correct with interval arithmetic,
 but floating-point arithmetic works differently: inputs are
 regarded as *exact*.


Are you sure?

From the C standard (not really the standard, but the draft n1256:
5.2.4.2.2, point 5):


: The accuracy of the floating-point operations (+, -, *, /) and of
: the library functions in <math.h> and <complex.h> that return
: floating-point results is implementation-defined,
: as is the accuracy of the conversion between floating-point
: internal representations and string representations performed by
: the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>.
: The implementation may state that the accuracy is unknown.

And also looking at POSIX, I find no further requirement,
so are you sure that 1e22 should be taken as exact?


So maybe the bug is to define __STDC_IEC_559__ in such a case.

OTOH, section F.9.1 doesn't require (in my interpretation)
trigonometric functions to be IEC 60559 functions. It has such a requirement
for more basic functions, e.g. for sqrt (see section F.3).


OTOH I think you are an expert on such standards, so
it is possible/probable that I'm completely wrong.

ciao
cate



--
Archive: http://lists.debian.org/4ba0ffed.6070...@debian.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Vincent Lefevre
On 2010-03-17 17:14:37 +0100, Giacomo A. Catenazzi wrote:
 From the C standard (not really the standard, but the draft n1256:
 5.2.4.2.2, point 5):
 
 : The accuracy of the floating-point operations (+, -, *, /) and of
 : the library functions in <math.h> and <complex.h> that return
 : floating-point results is implementation-defined,
 : as is the accuracy of the conversion between floating-point
 : internal representations and string representations performed by
 : the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>.
 : The implementation may state that the accuracy is unknown.

The values are defined by 5.2.4.2.2p2 (see the equality). The model
is that a floating-point number represents a real number, not an
interval.

"Accuracy" is a measure of the difference between the exact value
and the (rounded) value returned by the function.
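Accuracy in this sense is usually expressed in ulps (units in the last place). A minimal sketch, assuming IEEE 754 binary64 doubles and ignoring NaNs, that counts how many representable doubles separate a computed result from a reference value; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a double to a signed integer whose ordering matches the ordering
   of the real values; +0.0 and -0.0 both map to 0. NaNs not handled. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles have the sign bit set: reflect them so that
       larger magnitudes become more negative integers. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Distance between a and b counted in representable doubles (ulps). */
uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    /* Unsigned subtraction avoids signed overflow for far-apart values. */
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}
```

A correctly rounded result is at distance 0 from the reference, while `ulp_distance(0.46261304076460174617, -0.85220084976718879499)`, the fsincos sine of 1e22 against the correctly rounded one, is astronomically large.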

 So maybe the bug is to define __STDC_IEC_559__ in such a case.
 
 OTOH, section F.9.1 doesn't require (in my interpretation)
 trigonometric functions to be IEC 60559 functions. It has such a requirement
 for more basic functions, e.g. for sqrt (see section F.3).

Yes, elementary functions are not covered by Annex F (which is based on
the old IEEE 754-1985 standard). However, a C implementation could
claim to conform to IEEE 754 (i.e. more than conforming to
ISO C). This is more a matter of quality of implementation.
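A C implementation advertises Annex F (IEC 60559) conformance through the predefined macro Giacomo mentions, so the claim can be inspected at compile time. A minimal sketch; the function name is hypothetical.

```c
/* Returns 1 if the implementation predefines __STDC_IEC_559__, i.e.
   claims Annex F (IEC 60559 / IEEE 754) conformance, and 0 otherwise.
   Annex F does not specify the accuracy of sin/cos, so a result of 1
   says nothing about how sin(1e22) is computed. */
int claims_annex_f(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```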

Note: I don't know about other languages, whose implementation could
also use the glibc. They could be more restrictive than C.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100317165728.gs1...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Jerome Vizcaino
Hi,

I do not complain about the sin/cos performance but only on the float variants.

Using -ffast-math gives a nice performance boost but leads to bad results (in our
cases, which may be different from the simple given example), so it's not really
a workaround.
In fact, I don't really care about super-accuracy, and the standard glibc
implementations give us good results.
Maybe sinf/cosf in 32 bits is a bit broken, but it worked for us in terms of
precision and performance. On 64 bits, I can only rely on precision, as
performance is gone :(

Jerome

-- 
Archive: http://lists.debian.org/201003171931.16812.vizcaino_jer...@yahoo.fr



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-17 Thread Vincent Lefevre
On 2010-03-17 19:31:16 +0100, Jerome Vizcaino wrote:
 I do not complain about the sin/cos performance but only on the
 float variants.

OK. I haven't looked at the code, but if sinf() simply calls sin(),
this is suboptimal and there would be room for a performance boost
without sacrificing accuracy.

For large arguments, it would always be slower if you care about
not-too-bad accuracy. If you don't care because any value between
-1 and 1 is OK, then you could write a wrapper with:

  return fabsf(x) < some_bound ? sinf(x) : 0;

 Using -ffast-math gives a nice performance boost but leads to bad
 results (in our cases which may be different from the simple given
 example) so it's not really a workaround.

Yes, that's the problem with optimizations that change the semantics
(in the same way, users can complain about the 32-bit implementation
of trigonometric functions). Note that there are other compiler
options that allow finer-grained control of these optimizations.
You might want to look at them (e.g. man gcc).

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)



--
Archive: http://lists.debian.org/20100317194313.gu1...@prunille.vinc17.org



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-09 Thread Jerome Vizcaino
OK, running sinf/cosf with bounded values gives better performance results
(close to the sin/cos ones).

I think the bounding trick should be mentioned in the manpages as a note for
amd64 (at least), because the behavior is different on i386 and clearly not
expected...

Anyway, I still can't get the performance I had on the same hardware:
+ 0.94 secs on 32 bits, sinf/cosf without bounded values
+ 1.07 secs on 64 bits, sinf/cosf with manual bounding.

Do you know how the asm in the lib bounds the input value (I mean for the
optimised sin/cos versions, for example)?

Jerome

--
Archive: http://lists.debian.org/201003090911.17967.vizcaino_jer...@yahoo.fr



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-08 Thread Jerome Vizcaino
What do you mean, sinf/cosf is supposed to be twice as fast?
You're mentioning calling it with bounded values?

Thank you.

Jerome

-- 
Archive: http://lists.debian.org/201003080927.08652.vizcaino_jer...@yahoo.fr



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-08 Thread Aurelien Jarno
Jerome Vizcaino a écrit :
 What do you mean, sinf/cosf is supposed to be twice as fast?
 You're mentioning calling it with bounded values?
 

Yes, with the current code and bounded values, it is twice as fast. This
is not the case anymore with the assembly code, as the same FPU
instruction (fcos/fsin/fsincos) is used for the three versions (float,
double, long double).

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/4b94dc2b.7010...@aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Aurelien Jarno
On Sat, Mar 06, 2010 at 11:42:51AM +0100, Jerome Vizcaino wrote:
 Package: libc6
 Version: 2.10.2-6
 Severity: normal
 
 Hi,
 
 After many tests and research I've come to the conclusion that the float
 variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.
 The performance loss is really impressive (around 8 to 9 times slower).
 I've attached the prog used to make my experiments and used it in the
 following cases.
 
 + Lenny-amd64: sinf/cosf is really slow
 + Lenny-i386: float performance is ok (faster than the cos/sin using double)
 + Sid-amd64: sinf/cosf slow
 + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.

On amd64, only sincos has an optimized version, sincosf is using the
generic C implementation. On i386, there are optimized versions of both
sincos and sincosf.

 + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64,
 the tests run fine!
 => The problem is not compiler related.
 
 There seems to be a problem with the way libm is compiled for the amd64
 architecture on Debian.
 This is why the OpenSuse test was run: the problem is somewhere in the
 compile chain or Debian-specific patches.
 

The problem is clearly not Debian specific, and is also present
upstream. OpenSuse is probably using a patch to work around the problem.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/20100307151708.ga...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Aurelien Jarno
On Sun, Mar 07, 2010 at 04:17:08PM +0100, Aurelien Jarno wrote:
 On Sat, Mar 06, 2010 at 11:42:51AM +0100, Jerome Vizcaino wrote:
  Package: libc6
  Version: 2.10.2-6
  Severity: normal
  
  Hi,
  
  After many tests and research I've come to the conclusion that the float
  variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.
  The performance loss is really impressive (around 8 to 9 times slower).
  I've attached the prog used to make my experiments and used it in the
  following cases.
  
  + Lenny-amd64: sinf/cosf is really slow
  + Lenny-i386: float performance is ok (faster than the cos/sin using double)
  + Sid-amd64: sinf/cosf slow
  + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.
 
 On amd64, only sincos has an optimized version, sincosf is using the
 generic C implementation. On i386, there are optimized versions of both
 sincos and sincosf.
 
  + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on
  lenny-amd64, the tests run fine!
  => The problem is not compiler related.
  
  There seems to be a problem with the way libm is compiled for the amd64
  architecture on Debian.
  This is why the OpenSuse test was run: the problem is somewhere in the
  compile chain or Debian-specific patches.
  
 
 The problem is clearly not Debian specific, and is also present
 upstream. OpenSuse is probably using a patch to work around the problem.
 

This is confirmed: they are using an AMD version of the libm library on
x86_64, still coded in C for the sincosf function.

A quick and dirty implementation of sincosf in x86_64 assembly gives me a
speed around 4% slower than sincos. What kind of performance ratio do
you get on SuSE?

The solution seems to be to write each *f function in x86_64 assembly, but
that'll probably take time.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/20100307161759.gc...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Jerome Vizcaino
Hi,

I could not say for sure what the difference between sin and sinf (for example)
is on Suse, but the performance ratio I had on 32 bits stayed the same on 64 bits.
This is why I was surprised to get such impressive slowness when moving to Debian :(
Thanks for pointing out the Suse patch: as we only have Suse or Debian at work,
I could not do more comparisons.

How about including the patches from OpenSuse? Would that be possible as a quick
workaround?

Thanks for your help.

Jerome

-- 
Archive: http://lists.debian.org/201003071943.46266.vizcaino_jer...@yahoo.fr



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Aurelien Jarno
On Sun, Mar 07, 2010 at 07:43:46PM +0100, Jerome Vizcaino wrote:
 Hi,
 
 I could not say for sure what the difference between sin and sinf (for example)
 is on Suse, but the performance ratio I had on 32 bits stayed the same on 64 bits.
 This is why I was surprised to get such impressive slowness when moving to Debian :(
 Thanks for pointing out the Suse patch: as we only have Suse or Debian at work,
 I could not do more comparisons.
 
 How about including the patches from OpenSuse? Would that be possible as a quick
 workaround?
 

The patches from OpenSuse are ugly and very invasive, and they do not
seem to include the recent errno changes for C99 compliance (though I
haven't tested them). I am not really sure we want that. I have started
to rewrite some of the functions in assembly.

While this new assembly code behaves correctly with your testcase, it is
twice as slow as the current version when using normal arguments. I
have modified your code a bit to stay within a reasonable range of
arguments, and to also test the l versions of the functions.

Here is the result with the original code (using C code for the f version):

| Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 0.516700 sec
| Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.056214 sec
| Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.089871 sec

Here is what I get with the assembly code instead (using the FPU instructions):

| Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 1.010248 sec
| Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.055434 sec
| Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.095374 sec

As I expect most code to use values between -2pi and 2pi, I am not
really sure we should change the current code.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/20100307210105.ge...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Aurelien Jarno
On Mon, Mar 08, 2010 at 12:17:41AM +0100, Jerome Vizcaino wrote:
 Ok about the patches: there had to be a reason for those not to be merged
 upstream.

Also I think they have never been submitted upstream. They are from
2002...

 Some of my co-workers noticed the performance improvement when bounding values
 between -pi and pi, but the thing is, this kind of trick does not need to be
 applied on 32-bit systems... Libm's behavior is not really arch-proof in
 terms of performance, which is a bit confusing (and I would expect 64 bits to
 be better than 32 bits when dealing with maths... :(

cosf/sinf is actually twice as fast on 64-bit as on 32-bit in normal
cases, but can be a lot slower in the cases that concern you.

 Have you tried pushing your code upstream? Maybe this would be useful for
 future versions.
 

It's currently not clean enough to be sent upstream. I'll try to send it
upstream, but I don't have too much hope.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
Archive: http://lists.debian.org/20100307232342.gg...@hall.aurel32.net



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-07 Thread Jerome Vizcaino
Ok about the patches: there had to be a reason for those not to be merged
upstream.
Some of my co-workers noticed the performance improvement when restricting
values to between -pi and pi, but the thing is, this kind of trick does not
need to be applied on 32-bit systems... Libm's behavior is not really
consistent across architectures in terms of performance, which is a bit
confusing (and I would expect 64 bits to be better than 32 bits when dealing
with maths... :(
Have you tried pushing your code upstream? Maybe this would be useful for
future versions.

Thanks for your help

Jerome

On Sunday 07 March 2010, Aurelien Jarno wrote:
 On Sun, Mar 07, 2010 at 07:43:46PM +0100, Jerome Vizcaino wrote:
  Hi,
 
  I could not say for sure what the difference between sin and sinf (for
  example) is on Suse, but the performance ratio I had on 32 bits stayed the
  same on 64 bits. This is why I was surprised by the dramatic slowdown
  when moving to Debian :( Thanks for pointing out the Suse patch: as we
  only have Suse or Debian at work, I could not do more comparisons.
 
  How about including the patches from OpenSuse? Is it possible as a quick
  workaround?
 
 The patches from OpenSuse are ugly and very invasive, and they do not
 seem to include the recent errno changes for C99 compliance (though I
 haven't tested them). I am not really sure we want that. I have started
 to rewrite part of the functions in assembly.
 
 While this new assembly code behaves correctly with your testcase, it is
 twice as slow as the current version when using normal arguments. I
 have modified your code a bit to stay within a reasonable range of
 arguments, and also to test the l version of the functions.
 
 Here is the result with the original code (using C code for the f
 version):
 | Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 0.516700 sec
 | Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.056214 sec
 | Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.089871 sec
 
 Here is the result with assembly code instead (using the FPU
 instructions):
 | Testing 1000 sinf, cosf and tanf... Result: 19764686.00, Duration: 1.010248 sec
 | Testing 1000 sin, cos and tan (with float args)... Result: 19764686.00, Duration: 1.055434 sec
 | Testing 1000 sinl, cosl and tanl (with float args)... Result: 19764686.00, Duration: 1.095374 sec
 
 As I expect most code to use values between -2pi and 2pi, I am not
 really sure we should change the current code.
 




-- 
Archive: http://lists.debian.org/201003080017.41964.vizcaino_jer...@yahoo.fr



Bug#572746: libm: sinf/cosf performance is awful on amd64

2010-03-06 Thread Jerome Vizcaino
Package: libc6
Version: 2.10.2-6
Severity: normal

Hi,

After many tests and some research, I've come to the conclusion that the float
variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.
The performance loss is really impressive (around 8 to 9 times slower).
I've attached the program I used for my experiments and ran it in the
following cases:

+ Lenny-amd64: sinf/cosf is really slow
+ Lenny-i386: float performance is OK (faster than cos/sin using double)
+ Sid-amd64: sinf/cosf slow
+ Lenny-amd64 using the lenny-i386 binary and 32-bit libs: float performance is OK.

+ OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64,
the tests run fine!
=> The problem is not compiler-related.

There seems to be a problem with the way libm is compiled for the amd64
architecture on Debian.
This is why the OpenSuse test was run: the problem is somewhere in the build
chain or in Debian-specific patches.

We use these functions extensively for calculations, so this is a real problem.
Using cos/sin as a temporary workaround would do the trick, but this is still
slower than the sinf/cosf implementations that work so well on 32-bit
computers...

Thank you

Jerome

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin  2.10.2-6   Embedded GNU C Library: Binaries
ii  libgcc1   1:4.4.3-3  GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0] 1.5.28 Debian configuration management sy
pn  glibc-doc none (no description available)
ii  locales   2.10.2-6   Embedded GNU C Library: National L

-- debconf information excluded
CC=gcc
CFLAGS=-DNDEBUG -O3 -D_ISOC99_SOURCE -Wall -Wextra
LDLIBS=-lm

all: test_trig

clean:
	rm -f test_trig

test_trig: test_trig.c

#include <math.h>
#include <sys/time.h>
#include <stdio.h>

/* Elapsed time between two gettimeofday() samples, in microseconds.
 * Subtracting tv_sec and tv_usec separately can give a negative usec
 * part when a second boundary is crossed, so fold both into one count. */
static long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
  return (long)(end->tv_sec - start->tv_sec) * 1000000L
       + (long)(end->tv_usec - start->tv_usec);
}

int main(void) 
{
  const int nbElement_i = 1000;
  int i=0;
  float f1=0.0f, f2=0.0f, f3=0.0f;
  long usec = 0;
  
  struct timeval tv1, tv2; 

  printf("Testing %d sinf and cosf... ", nbElement_i);
  fflush(stdout);
  
  gettimeofday(&tv1, NULL);

  for(i=0; i<nbElement_i; i++){
    f1 += cosf(i); 
    f2 += sinf(i);
  }

  // Using f1 and f2 tells gcc that the results really
  // matter; otherwise the sinf and cosf calls could be
  // optimized away.
  f3 = f1+f2; 

  gettimeofday(&tv2, NULL);

  usec = elapsed_usec(&tv1, &tv2);
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, usec / 1000000L, usec % 1000000L);

  f1 = 0.0f; f2 = 0.0f;
  printf("Testing %d sin and cos (with float args)... ", nbElement_i);
  fflush(stdout);
  
  gettimeofday(&tv1, NULL);

  for(i=0; i<nbElement_i; i++){
    f1 += cos(i); 
    f2 += sin(i);
  }

  // Same trick for the double-precision calls.
  f3 = f1+f2; 

  gettimeofday(&tv2, NULL);

  usec = elapsed_usec(&tv1, &tv2);
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, usec / 1000000L, usec % 1000000L);
  
  return 0;
}