Re: sin() implementation

2017-09-29 Thread Joerg Sonnenberger
On Fri, Sep 29, 2017 at 11:16:24AM +0200, Alexandre Ratchov wrote:
> On Wed, Sep 27, 2017 at 03:09:48PM +0200, Joerg Sonnenberger wrote:
> > On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> > > Even on modern amd64s integer arithmetics and bitwise operations are
> > > faster (and more precise in many cases) than floating point
> > > equivalents.
> > 
> > Can you actually substanciate this claim?
> 
> [OT]
> 
> I'm working on typical signal processing code doing mostly
> multiplications, additions and table lookups. For instance, when we do
> the following with floats:
> 
>   float a, b, x, y;
>   y = a * x + b;
> 
> we get 24 bits of precision as floats have 24-bit mantissa. If, with
> proper typedefs and #ifdefery, we replace floats by ints representing
> the same numbers as 1:4:27 fixed-points:
> 
>   int a, b, x, y;
>   y = ((long long)a * x >> 27) + b;
> 
> we get 28 bits of precision, i.e. 4 extra bits of precision compared
> to floats.
> 
> I just did a quick test on a i5, gcc 4.2. The fixed-point version
> consumes 10% to 50% less CPU depending on the algorithm. To test, I
> just run the algorithm during 10 seconds and measure the throughput.

You are comparing Apples and Oranges. Just for starters, it is quite
common in newer (as in: less than 20 years old) calling conventions to
pass the function arguments via registers. It is also quite common that
moving data from a floating point register to a general purpose register
requires going via the stack. On super scalar architectures this means
that the basic input checks can often be done in parallel to the start
of the range reduction etc.

I'm not saying that libm should always use FP math. All I've been saying
is that the code was written under certain assumptions and many of them
haven't been valid for a long time.

Joerg



Re: sin() implementation

2017-09-29 Thread Alexandre Ratchov
On Fri, Sep 29, 2017 at 11:28:42AM +0200, Otto Moerbeek wrote:
> On Fri, Sep 29, 2017 at 11:16:24AM +0200, Alexandre Ratchov wrote:
> 
> > On Wed, Sep 27, 2017 at 03:09:48PM +0200, Joerg Sonnenberger wrote:
> > > On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> > > > Even on modern amd64s integer arithmetics and bitwise operations are
> > > > faster (and more precise in many cases) than floating point
> > > > equivalents.
> > > 
> > > Can you actually substanciate this claim?
> > 
> > [OT]
> > 
> > I'm working on typical signal processing code doing mostly
> > multiplications, additions and table lookups. For instance, when we do
> > the following with floats:
> > 
> > float a, b, x, y;
> > y = a * x + b;
> > 
> > we get 24 bits of precision as floats have 24-bit mantissa. If, with
> > proper typedefs and #ifdefery, we replace floats by ints representing
> > the same numbers as 1:4:27 fixed-points:
> > 
> > int a, b, x, y;
> > y = ((long long)a * x >> 27) + b;
> > 
> > we get 28 bits of precision, i.e. 4 extra bits of precision compared
> > to floats.
> > 
> > I just did a quick test on a i5, gcc 4.2. The fixed-point version
> > consumes 10% to 50% less CPU depending on the algorithm. To test, I
> > just run the algorithm during 10 seconds and measure the throughput.
> > 
> > In the past, I've tested on other CPUs, with other compilers and the
> > floating point version of this code was never faster than the
> > fixed-point.
> 
> Add to that this is platform indepedent code and that we run or at
> least ran on platforms having softfloat, where the speed difference is
> significant.
> 

This was my initial motivation, zaurus had no fpu.



Re: sin() implementation

2017-09-29 Thread Otto Moerbeek
On Fri, Sep 29, 2017 at 11:16:24AM +0200, Alexandre Ratchov wrote:

> On Wed, Sep 27, 2017 at 03:09:48PM +0200, Joerg Sonnenberger wrote:
> > On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> > > Even on modern amd64s integer arithmetics and bitwise operations are
> > > faster (and more precise in many cases) than floating point
> > > equivalents.
> > 
> > Can you actually substanciate this claim?
> 
> [OT]
> 
> I'm working on typical signal processing code doing mostly
> multiplications, additions and table lookups. For instance, when we do
> the following with floats:
> 
>   float a, b, x, y;
>   y = a * x + b;
> 
> we get 24 bits of precision as floats have 24-bit mantissa. If, with
> proper typedefs and #ifdefery, we replace floats by ints representing
> the same numbers as 1:4:27 fixed-points:
> 
>   int a, b, x, y;
>   y = ((long long)a * x >> 27) + b;
> 
> we get 28 bits of precision, i.e. 4 extra bits of precision compared
> to floats.
> 
> I just did a quick test on a i5, gcc 4.2. The fixed-point version
> consumes 10% to 50% less CPU depending on the algorithm. To test, I
> just run the algorithm during 10 seconds and measure the throughput.
> 
> In the past, I've tested on other CPUs, with other compilers and the
> floating point version of this code was never faster than the
> fixed-point.

Add to that this is platform indepedent code and that we run or at
least ran on platforms having softfloat, where the speed difference is
significant.

-Otto



Re: sin() implementation

2017-09-29 Thread Alexandre Ratchov
On Wed, Sep 27, 2017 at 03:09:48PM +0200, Joerg Sonnenberger wrote:
> On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> > Even on modern amd64s integer arithmetics and bitwise operations are
> > faster (and more precise in many cases) than floating point
> > equivalents.
> 
> Can you actually substanciate this claim?

[OT]

I'm working on typical signal processing code doing mostly
multiplications, additions and table lookups. For instance, when we do
the following with floats:

float a, b, x, y;
y = a * x + b;

we get 24 bits of precision as floats have 24-bit mantissa. If, with
proper typedefs and #ifdefery, we replace floats by ints representing
the same numbers as 1:4:27 fixed-points:

int a, b, x, y;
y = ((long long)a * x >> 27) + b;

we get 28 bits of precision, i.e. 4 extra bits of precision compared
to floats.

I just did a quick test on a i5, gcc 4.2. The fixed-point version
consumes 10% to 50% less CPU depending on the algorithm. To test, I
just run the algorithm during 10 seconds and measure the throughput.

In the past, I've tested on other CPUs, with other compilers and the
floating point version of this code was never faster than the
fixed-point.



Re: sin() implementation

2017-09-27 Thread Theo de Raadt
> On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> > Even on modern amd64s integer arithmetics and bitwise operations are
> > faster (and more precise in many cases) than floating point
> > equivalents.
> 
> Can you actually substanciate this claim? The basic x87 instructions
> (FLD, FST, FCOM etc) all run in one cycle with a latency of 2 according
> to Agner. That's generally not worth the cost of moving from x87 to GP
> registers. Precision might have been a problem two decades ago when many
> compilers were too bad at correct parsing of float literals, but that
> has been fixed in the mean time even without using C99 hex float
> literals. The short answer is that the original msun code has been
> optimized for a hardware environment that is substantially different
> from modern CPUs. Many of the small details are no longer optimal.

The integer arithmetics and bitwise operations are not slower than the
float ones.  Hence, since this is machine independent code, the approach
taken is sane.



Re: sin() implementation

2017-09-27 Thread Joerg Sonnenberger
On Wed, Sep 27, 2017 at 08:40:26AM +0200, Alexandre Ratchov wrote:
> Even on modern amd64s integer arithmetics and bitwise operations are
> faster (and more precise in many cases) than floating point
> equivalents.

Can you actually substanciate this claim? The basic x87 instructions
(FLD, FST, FCOM etc) all run in one cycle with a latency of 2 according
to Agner. That's generally not worth the cost of moving from x87 to GP
registers. Precision might have been a problem two decades ago when many
compilers were too bad at correct parsing of float literals, but that
has been fixed in the mean time even without using C99 hex float
literals. The short answer is that the original msun code has been
optimized for a hardware environment that is substantially different
from modern CPUs. Many of the small details are no longer optimal.

Joerg



Re: sin() implementation

2017-09-27 Thread Alexandre Ratchov
On Tue, Sep 26, 2017 at 01:27:52PM +0200, Jan Stary wrote:
> I picked sin() as an example while trying to walk though the implementation
> of (pieces of) libm.  If someone has the time for it, I have some questions.
> 
> I understand the implementation originaly stems from Sun's libm of 1993.
> (As does that of FreeBSD and NetBSD.) It has been tweaked over the years,
> but the code that actually computes the values is mostly untouched.
> 
> s_sin.c normalizes the argument to [-pi/4, +pi/4].
> This is how |x| <= pi/4 is tested:
> 
>   GET_HIGH_WORD(ix,x);
>   ix &= 0x7fff;
>   if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> 
> Why is it done like that? Is it faster or more portable
> or in any way better that comparing the value itself to M_PI_4?
> (Or was it, in 1993?)

Even on modern amd64s integer arithmetics and bitwise operations are
faster (and more precise in many cases) than floating point
equivalents. Recently I did few measurements on "real life" signal
processing code (intel i5-2500K).

I don't know if this is true for above condition and even if it was,
I'm not sure the optimization would be visible in the execution time
of programs using sinf() a lot.

Hex constants would be "useful" to avoid rounding errors in
decimal->binary conversions done during compilation. This could be
avoided by using hex float literals, but these didn't exist in 1993.

> For Inf and NaN arguments, NaN is returned as follows:
> 
>   /* sin(Inf or NaN) is NaN */
>   else if (ix>=0x7ff0) return x-x;

interestingly, 0x7f81 is also NaN but doesn't pass the test.
AFAICS, to implement what the comment says, the test would be:

else if (ix >= 0x7f80) return x - x;

> 
> Again, why the integer conversion? Would
> 
>   else if (isinf(x) || isnan(x))
> 
> be slower? Less portable?
> 
> Also, is "return x-x" just a fast/clever way to return NaN
> for x being Inf _or_ NaN, preferable to return nan("")?
> 

Above function could also just set x to 0x7fff.

For sure nan(3) and isnan(3) were not common in 1993, so this might be
for portability. Furthermore, we used to support non-IEEE-754 machines
which may need this, I don't known.



Re: sin() implementation

2017-09-26 Thread Jimmy Hess
On Tue, Sep 26, 2017 at 6:27 AM, Jan Stary  wrote:


> s_sin.c normalizes the argument to [-pi/4, +pi/4].
> This is how |x| <= pi/4 is tested:
> GET_HIGH_WORD(ix,x);
> ix &= 0x7fff;
> if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
>
> Why is it done like that? Is it faster or more portable
> or in any way better that comparing the value itself to M_PI_4?
>

See  s_cos.c  s_tan.c;   and many others start with a similar block.
Probably the integer comparison is efficient (fewer average CPU and bus
resources per second tied up per operation).   It will  depend upon your
computer's  characteristics,  and  applications running, and how many times
per second  sin()  is  invoked   on average.

Assuming the compiler optimization of the above code block is as good as it
could be:
that should hinge on whether a single integer copy, and 32-bit integer
subtraction
and Bitwise AND on your CPU will burn more processor work cycles than  two
floating-point copy,
two  floating-point subtraction plus   one  sign negate  operation against
IEEE 64-bit double.

The basic rule of thumb says  that if a  performance-significant task can
be completed using Integers
instead, then maximum effort should be made to avoid/minimize unnecessary
usage of
floating-point operations  (without an excessive increase in the # of
operations),


As  3 to 4 times the processor work cycles for execution path to
 3-argument  64-bit floating point
calculation Versus the corresponding 3-argument operation using 32-bit
integers.

More floating point operations/sec = lower system efficiency,  or
less-effective use of the overall available
compute capacity Versus   the use of  processors'  regular,  typically
more-optimized ALU pipelines.

> if (-M_PI_4 < x && x < M_PI_4)


> >For Inf and NaN arguments, NaN is returned as follows:
> >/* sin(Inf or NaN) is NaN */
> >else if (ix>=0x7ff0) return x-x;
>


> > Again, why the integer conversion? Would
> >
> >else if (isinf(x) || isnan(x))


> Also, is "return x-x" just a fast/clever way to return NaN
>

Well,  the original code is applying a >=  test,whereas isinf() and
isnan() are
shown to be equality  tests  that check for two specific values of the
exponent field.
The  ||  branching is involving more computations though

General application code should  use  the new  isinf(),  isnan(),   and
nan("")  macros,
but they have not always been part of the C standard,  and I imagine this
code predates it.

The  NaN/InF values are  runtime-specific in the current C standard,   so
it is
only sort of legitimate to do that,   Because  this  code is part of the
 libc runtime's
implementation itself.

In this context,   the libc implementation has enough information to safely
rely
on the assumption to make the decision without resorting to a macro call
 that
invokes a number of other operations,   and so

Is free  and perfectly fine  to use that   ix >= 0x7ff0  to  equate
to a  NaN  condition.

That appears to potentially streamline the number of computations required
for this case,
so there  might actually be a performance benefit to not use the macros
here as well.

-- 
-JH


Re: sin() implementation

2017-09-26 Thread Otto Moerbeek
On Tue, Sep 26, 2017 at 01:27:52PM +0200, Jan Stary wrote:

> I picked sin() as an example while trying to walk though the implementation
> of (pieces of) libm.  If someone has the time for it, I have some questions.
> 
> I understand the implementation originaly stems from Sun's libm of 1993.
> (As does that of FreeBSD and NetBSD.) It has been tweaked over the years,
> but the code that actually computes the values is mostly untouched.
> 
> s_sin.c normalizes the argument to [-pi/4, +pi/4].
> This is how |x| <= pi/4 is tested:
> 
>   GET_HIGH_WORD(ix,x);
>   ix &= 0x7fff;
>   if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> 
> Why is it done like that? Is it faster or more portable
> or in any way better that comparing the value itself to M_PI_4?
> (Or was it, in 1993?)

The code is written in a way to use as little as fp ops as possible.

> 
> For Inf and NaN arguments, NaN is returned as follows:
> 
>   /* sin(Inf or NaN) is NaN */
>   else if (ix>=0x7ff0) return x-x;
> 
> Again, why the integer conversion? Would
> 
>   else if (isinf(x) || isnan(x))
> 
> be slower? Less portable?

These functions were not generally available when the code was
written. iirc these were only standarized in C99.

> 
> Also, is "return x-x" just a fast/clever way to return NaN
> for x being Inf _or_ NaN, preferable to return nan("")?

Same as above.

-Otto


> 
> I have other questions in k_sin.c and elsewhere, but I'll stop here.
> Please see my naive diff below. I tried this on amd64, macppc and armv7;
> regress/lib/libm/fpaccuracy results in zero difference. The only
> other thing I tried is "sox -n out.raw synth 10 sin" which also gives
> the same result. (That's hardly "testing" of course.)
> 
> Surely I am missing something elementary.
> I'll be glad for any insight.
> 
>   Thank you for your time
> 
>   Jan
> 
> 
> Index: s_sin.c
> ===
> RCS file: /cvs/src/lib/libm/src/s_sin.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 s_sin.c
> --- s_sin.c   12 Sep 2016 19:47:02 -  1.11
> +++ s_sin.c   26 Sep 2017 08:54:03 -
> @@ -50,17 +50,12 @@ double
>  sin(double x)
>  {
>   double y[2],z=0.0;
> - int32_t n, ix;
> + int32_t n;
>  
> -/* High word of x. */
> - GET_HIGH_WORD(ix,x);
> -
> -/* |x| ~< pi/4 */
> - ix &= 0x7fff;
> - if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> -
> -/* sin(Inf or NaN) is NaN */
> - else if (ix>=0x7ff0) return x-x;
> + if (-M_PI_4 < x && x < M_PI_4)
> + return __kernel_sin(x,z,0);
> + else if (isinf(x) || isnan(x))
> + return nan("");
>  
>  /* argument reduction needed */
>   else {



Re: sin() implementation

2017-09-26 Thread Otto Moerbeek
On Tue, Sep 26, 2017 at 06:21:16PM +0200, Jan Stary wrote:

> These (diff below) seem to be obvious typos in k_sin.c,
> but only in comments. The comment that says
> 
>   if x < 2^-27 (hx<0x3e40), return x with inexact if x!=0
> 
> also puzzles me a bit: what the code does is
> 
>   GET_HIGH_WORD(ix,x);
>   ix &= 0x7fff;   /* high word of x */
>   if(ix<0x3e40)   /* |x| < 2**-27 */
>   {if((int)x==0) return x;}   /* generate inexact */
> 
> If the high word of a double x is less than 0x3e40,
> how could (int)x be anything else than zero?

The result might be zero, but you want the inexact flag to be set in
those cases. I think case 2 describes it correctly: only return 0 with
the inexact flag not set if x equals zero.

-Otto

> 
> The "if x!= 0" also puzzles me: the sine of zero _is_ zero, exactly.
> What would change with this:
> 
>   if (ix < 0x3e40)
>   return x;
> 
> What is the role of the iy indicator?
> Is it easier/faster to test the integer indicator for iy == 0
> than it would be to test y == 0 itself? Or is there another reason?
> 
>  Jan
> 
> 
> Index: k_sin.c
> ===
> RCS file: /cvs/src/lib/libm/src/k_sin.c,v
> retrieving revision 1.3
> diff -u -p -r1.3 k_sin.c
> --- k_sin.c   27 Oct 2009 23:59:30 -  1.3
> +++ k_sin.c   26 Sep 2017 16:07:45 -
> @@ -10,15 +10,15 @@
>   * 
>   */
>  
> -/* __kernel_sin( x, y, iy)
> +/* __kernel_sin(x, y, iy)
>   * kernel sin function on [-pi/4, pi/4], pi/4 ~ 0.7854
>   * Input x is assumed to be bounded by ~pi/4 in magnitude.
>   * Input y is the tail of x.
> - * Input iy indicates whether y is 0. (if iy=0, y assume to be 0). 
> + * Input iy indicates whether y is 0. (if iy=0, assume y to be 0). 
>   *
>   * Algorithm
>   *   1. Since sin(-x) = -sin(x), we need only to consider positive x. 
> - *   2. if x < 2^-27 (hx<0x3e40 0), return x with inexact if x!=0.
> + *   2. if x < 2^-27 (hx<0x3e40), return x with inexact if x!=0.
>   *   3. sin(x) is approximated by a polynomial of degree 13 on
>   *  [0,pi/4]
>   *313



Re: sin() implementation

2017-09-26 Thread Jan Stary
On Sep 26 17:41:17, h...@stare.cz wrote:
> > > s_sin.c normalizes the argument to [-pi/4, +pi/4].
> > > This is how |x| <= pi/4 is tested:
> > > 
> > >   GET_HIGH_WORD(ix,x);
> > >   ix &= 0x7fff;
> > >   if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> > > 
> > > Why is it done like that? Is it faster or more portable
> > > or in any way better that comparing the value itself to M_PI_4?
> > > (Or was it, in 1993?)
> > 
> > Hm, is 0x3fe921fb the nearest float <= pi/4?
> > s_sinf.c uses 0x3f490fd8 for that test.
> 
> Yes.
> 
> #include 
> #include 
> #include "math_private.h"
> 
> int
> main()
> {
>   double d = M_PI_4;
>   double f = M_PI_4;

float f, sorry.

>   int32_t i, j;
> 
>   GET_HIGH_WORD(i, d);
>   GET_FLOAT_WORD(j, f);
> 
>   printf("double %f, high  word %#x\n", d, i);
>   printf("float  %f, float word %#x\n", f, j);
> 
>   return 0;
> }
> 
> This says
> 
>   double 0.785398, high  word 0x3fe921fb
>   float  0.785398, float word 0x3f490fdb
> 
> In case of double, it's exactly 0x3fe921fb.
> But why then does s_sinf.c use 0x3f490fd8 instead of 0x3f490fdb?
> 
>   Jan
> 



Re: sin() implementation

2017-09-26 Thread Jan Stary
These (diff below) seem to be obvious typos in k_sin.c,
but only in comments. The comment that says

if x < 2^-27 (hx<0x3e40), return x with inexact if x!=0

also puzzles me a bit: what the code does is

GET_HIGH_WORD(ix,x);
ix &= 0x7fff;   /* high word of x */
if(ix<0x3e40)   /* |x| < 2**-27 */
{if((int)x==0) return x;}   /* generate inexact */

If the high word of a double x is less than 0x3e40,
how could (int)x be anything else than zero?

The "if x!= 0" also puzzles me: the sine of zero _is_ zero, exactly.
What would change with this:

if (ix < 0x3e40)
return x;

What is the role of the iy indicator?
Is it easier/faster to test the integer indicator for iy == 0
than it would be to test y == 0 itself? Or is there another reason?

 Jan


Index: k_sin.c
===
RCS file: /cvs/src/lib/libm/src/k_sin.c,v
retrieving revision 1.3
diff -u -p -r1.3 k_sin.c
--- k_sin.c 27 Oct 2009 23:59:30 -  1.3
+++ k_sin.c 26 Sep 2017 16:07:45 -
@@ -10,15 +10,15 @@
  * 
  */
 
-/* __kernel_sin( x, y, iy)
+/* __kernel_sin(x, y, iy)
  * kernel sin function on [-pi/4, pi/4], pi/4 ~ 0.7854
  * Input x is assumed to be bounded by ~pi/4 in magnitude.
  * Input y is the tail of x.
- * Input iy indicates whether y is 0. (if iy=0, y assume to be 0). 
+ * Input iy indicates whether y is 0. (if iy=0, assume y to be 0). 
  *
  * Algorithm
  * 1. Since sin(-x) = -sin(x), we need only to consider positive x. 
- * 2. if x < 2^-27 (hx<0x3e40 0), return x with inexact if x!=0.
+ * 2. if x < 2^-27 (hx<0x3e40), return x with inexact if x!=0.
  * 3. sin(x) is approximated by a polynomial of degree 13 on
  *[0,pi/4]
  *  313



Re: sin() implementation

2017-09-26 Thread Jan Stary
> > s_sin.c normalizes the argument to [-pi/4, +pi/4].
> > This is how |x| <= pi/4 is tested:
> > 
> > GET_HIGH_WORD(ix,x);
> > ix &= 0x7fff;
> > if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> > 
> > Why is it done like that? Is it faster or more portable
> > or in any way better that comparing the value itself to M_PI_4?
> > (Or was it, in 1993?)
> 
> Hm, is 0x3fe921fb the nearest float <= pi/4?
> s_sinf.c uses 0x3f490fd8 for that test.

Yes.

#include 
#include 
#include "math_private.h"

int
main()
{
double d = M_PI_4;
double f = M_PI_4;
int32_t i, j;

GET_HIGH_WORD(i, d);
GET_FLOAT_WORD(j, f);

printf("double %f, high  word %#x\n", d, i);
printf("float  %f, float word %#x\n", f, j);

return 0;
}

This says

  double 0.785398, high  word 0x3fe921fb
  float  0.785398, float word 0x3f490fdb

In case of double, it's exactly 0x3fe921fb.
But why then does s_sinf.c use 0x3f490fd8 instead of 0x3f490fdb?

Jan



Re: sin() implementation

2017-09-26 Thread Jan Stary
On Sep 26 13:27:52, h...@stare.cz wrote:
> I picked sin() as an example while trying to walk though the implementation
> of (pieces of) libm.  If someone has the time for it, I have some questions.
> 
> I understand the implementation originaly stems from Sun's libm of 1993.
> (As does that of FreeBSD and NetBSD.)

And Android https://android.googlesource.com/platform/external/fdlibm/
and Julia http://openlibm.org/, apparently.

> It has been tweaked over the years,
> but the code that actually computes the values is mostly untouched.
> 
> s_sin.c normalizes the argument to [-pi/4, +pi/4].
> This is how |x| <= pi/4 is tested:
> 
>   GET_HIGH_WORD(ix,x);
>   ix &= 0x7fff;
>   if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> 
> Why is it done like that? Is it faster or more portable
> or in any way better that comparing the value itself to M_PI_4?
> (Or was it, in 1993?)

Hm, is 0x3fe921fb the nearest float <= pi/4?
s_sinf.c uses 0x3f490fd8 for that test.


Jan

> For Inf and NaN arguments, NaN is returned as follows:
> 
>   /* sin(Inf or NaN) is NaN */
>   else if (ix>=0x7ff0) return x-x;
> 
> Again, why the integer conversion? Would
> 
>   else if (isinf(x) || isnan(x))
> 
> be slower? Less portable?
> 
> Also, is "return x-x" just a fast/clever way to return NaN
> for x being Inf _or_ NaN, preferable to return nan("")?
> 
> I have other questions in k_sin.c and elsewhere, but I'll stop here.
> Please see my naive diff below. I tried this on amd64, macppc and armv7;
> regress/lib/libm/fpaccuracy results in zero difference. The only
> other thing I tried is "sox -n out.raw synth 10 sin" which also gives
> the same result. (That's hardly "testing" of course.)
> 
> Surely I am missing something elementary.
> I'll be glad for any insight.
> 
>   Thank you for your time
> 
>   Jan
> 
> 
> Index: s_sin.c
> ===
> RCS file: /cvs/src/lib/libm/src/s_sin.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 s_sin.c
> --- s_sin.c   12 Sep 2016 19:47:02 -  1.11
> +++ s_sin.c   26 Sep 2017 08:54:03 -
> @@ -50,17 +50,12 @@ double
>  sin(double x)
>  {
>   double y[2],z=0.0;
> - int32_t n, ix;
> + int32_t n;
>  
> -/* High word of x. */
> - GET_HIGH_WORD(ix,x);
> -
> -/* |x| ~< pi/4 */
> - ix &= 0x7fff;
> - if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
> -
> -/* sin(Inf or NaN) is NaN */
> - else if (ix>=0x7ff0) return x-x;
> + if (-M_PI_4 < x && x < M_PI_4)
> + return __kernel_sin(x,z,0);
> + else if (isinf(x) || isnan(x))
> + return nan("");
>  
>  /* argument reduction needed */
>   else {
> 



sin() implementation

2017-09-26 Thread Jan Stary
I picked sin() as an example while trying to walk though the implementation
of (pieces of) libm.  If someone has the time for it, I have some questions.

I understand the implementation originaly stems from Sun's libm of 1993.
(As does that of FreeBSD and NetBSD.) It has been tweaked over the years,
but the code that actually computes the values is mostly untouched.

s_sin.c normalizes the argument to [-pi/4, +pi/4].
This is how |x| <= pi/4 is tested:

GET_HIGH_WORD(ix,x);
ix &= 0x7fff;
if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);

Why is it done like that? Is it faster or more portable
or in any way better that comparing the value itself to M_PI_4?
(Or was it, in 1993?)

For Inf and NaN arguments, NaN is returned as follows:

/* sin(Inf or NaN) is NaN */
else if (ix>=0x7ff0) return x-x;

Again, why the integer conversion? Would

else if (isinf(x) || isnan(x))

be slower? Less portable?

Also, is "return x-x" just a fast/clever way to return NaN
for x being Inf _or_ NaN, preferable to return nan("")?

I have other questions in k_sin.c and elsewhere, but I'll stop here.
Please see my naive diff below. I tried this on amd64, macppc and armv7;
regress/lib/libm/fpaccuracy results in zero difference. The only
other thing I tried is "sox -n out.raw synth 10 sin" which also gives
the same result. (That's hardly "testing" of course.)

Surely I am missing something elementary.
I'll be glad for any insight.

Thank you for your time

Jan


Index: s_sin.c
===
RCS file: /cvs/src/lib/libm/src/s_sin.c,v
retrieving revision 1.11
diff -u -p -r1.11 s_sin.c
--- s_sin.c 12 Sep 2016 19:47:02 -  1.11
+++ s_sin.c 26 Sep 2017 08:54:03 -
@@ -50,17 +50,12 @@ double
 sin(double x)
 {
double y[2],z=0.0;
-   int32_t n, ix;
+   int32_t n;
 
-/* High word of x. */
-   GET_HIGH_WORD(ix,x);
-
-/* |x| ~< pi/4 */
-   ix &= 0x7fff;
-   if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);
-
-/* sin(Inf or NaN) is NaN */
-   else if (ix>=0x7ff0) return x-x;
+   if (-M_PI_4 < x && x < M_PI_4)
+   return __kernel_sin(x,z,0);
+   else if (isinf(x) || isnan(x))
+   return nan("");
 
 /* argument reduction needed */
else {