On Thu, 5 Jul 2012 16:28:07 +0200, Mark Marshall wrote:
> Hi.
Hi!

Sorry to spill some maths here, but I thought I could help a tiny bit.

> It's probably worth adding some instructions to help calculate some
> hard math functions as well.
Timothy once said it would be for later, since OG is not a true GPGPU.
Let's keep it simple and working first :-)

> Square root and inverse square root spring to mind.  We have a long
> pipeline, which means we could get a lot done.  It would also be
> helpful (if we didn't want to calculate the whole square root) to
> calculate a partial answer.  It's then much easier to calculate the
> complete answer in only a few instructions.
>
> t0 = approx-square-root(X)
> t1 = (t0 + X/t0) / 2
>
> If t0 has n bits of accuracy, t1 has 2n bits of accuracy.
No, the Newton method doubles the accuracy at each step, which means n+1 bits, not 2n.

The "2n" behaviour is the Newton-Raphson method, which has more complex
steps but exponential (rather than linear) convergence. The SHARC DSP
user's manual describes a lookup table that takes 6 bits from the
mantissa and gives 4 bits of mantissa. With 3 Newton-Raphson iterations
it then gets 8, 16 and finally 32 accurate mantissa bits (the SHARC is a
40-bit FP DSP). That suits OG's requirements :-)
This method is also enhanced a lot in more modern architectures. I once
read a paper about the AMD implementation, which combines 2 big tables
to skip one iteration... but that costs a lot of silicon, so the
operation must be needed often (distance computations, mainly?).

The SHARC's simple hardcoded LUT could fit in 64 bytes of OG's cache RAM,
so no special instruction is required.
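To make that concrete, here is a rough C sketch of such a seed lookup,
assuming IEEE-754 single precision (OG's FP format may differ). The names
and the table layout are mine: the SHARC packs 4-bit entries into 64 bytes
of ROM, while this version just fills full-float entries at start-up,
which is easier to read:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 128-entry seed table, indexed by the LSB of the exponent (its parity)
 * concatenated with the top 6 mantissa bits, in the spirit of the SHARC
 * scheme.  Entry i approximates 1/sqrt(2^p * m) for its slice of [1,4). */
static float seed_tab[128];

static void init_seed_tab(void)
{
    for (int i = 0; i < 128; i++) {
        int    p = i >> 6;                            /* exponent parity */
        double m = 1.0 + ((i & 0x3F) + 0.5) / 64.0;   /* slice midpoint  */
        seed_tab[i] = (float)(1.0 / sqrt(ldexp(m, p)));
    }
}

/* First guess for 1/sqrt(x), x > 0 and normal. */
static float rsqrt_seed(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);              /* reinterpret the bits */

    int e   = (int)((bits >> 23) & 0xFF) - 127;  /* unbiased exponent    */
    int p   = e & 1;                             /* exponent parity      */
    int idx = (p << 6) | ((bits >> 17) & 0x3F);  /* 7-bit table index    */

    /* x = 2^(e-p) * (2^p * m) with e-p even, so
     * 1/sqrt(x) = 2^(-(e-p)/2) * 1/sqrt(2^p * m)                        */
    return ldexpf(seed_tab[idx], -(e - p) / 2);
}

int main(void)
{
    init_seed_tab();
    float x = 1234.5f;
    printf("seed %g vs true %g\n", rsqrt_seed(x), 1.0 / sqrt(x));
    return 0;
}

A couple of Newton-Raphson passes on top of such a seed then take it to
full single precision, as in the manual excerpt further below.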

> (There's a really nice geometric proof of this, which I love.  We
> have a rectangle of area X.  We need to convert it to a square of
> area X.  A more "square" rectangle will have one side that is the
> average of the sides of our initial rectangle.)
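
(Just to make that averaging step concrete, here is a tiny C sketch of
the iteration quoted above; the deliberately poor starting guess is mine,
there is no seed table here.)

#include <stdio.h>

/* Heron's averaging step: replace the guess t by the mean of t and x/t,
 * i.e. squeeze the rectangle (t, x/t) a little closer to a square. */
static float sqrt_heron(float x, float t, int passes)
{
    for (int i = 0; i < passes; i++)
        t = 0.5f * (t + x / t);
    return t;
}

int main(void)
{
    float x = 1234.5f;
    printf("%.7g\n", sqrt_heron(x, x, 16));   /* ~35.13545, from guess x */
    return 0;
}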

Look at page B40 (n°542) of
http://www.analog.com/static/imported-files/processor_manuals/50836807228561ADSP2106xSHARCProcessorUsersManual_Revision2_1.pdf

   Fn = RSQRTS Fx   (ALU floating-point compute operation)

   Function: Creates a 4-bit accurate seed for 1/√Fx, the reciprocal
   square root of Fx. The mantissa of the seed is determined from a ROM
   table using the LSB of the biased exponent of Fx concatenated with
   the 6 MSBs (excluding the hidden bit) of the mantissa of Fx as an
   index. The unbiased exponent of the seed is calculated as the twos
   complement of the unbiased Fx exponent, shifted right by one bit and
   decremented by one; i.e., if e is the unbiased exponent of Fx, then
   the unbiased exponent of Fn = -INT[e/2] - 1. The sign of the seed is
   the sign of the input. ±Zero returns ±Infinity and sets the overflow
   flag. +Infinity returns +Zero. A NAN input or a negative nonzero
   input returns an all 1s result.

   The following code calculates a floating-point reciprocal square
   root (1/√x) using a Newton-Raphson iteration algorithm.* The result
   is accurate to one LSB in whichever format mode, 32-bit or 40-bit,
   is set (32-bit only for ADSP-21010). To calculate the square root,
   simply multiply the result by the original input. The following
   inputs are required: F0=input, F8=3.0, F1=0.5. The result is
   returned in F4. (The four highlighted instructions can be removed if
   only a ±1 LSB accurate single-precision result is necessary.)

   F4=RSQRTS F0;            {Fetch 4-bit seed}
   F12=F4*F4;               {F12=X0^2}
   F12=F12*F0;              {F12=C*X0^2}
   F4=F1*F4, F12=F8-F12;    {F4=.5*X0, F12=3-C*X0^2}
   F4=F4*F12;               {F4=X1=.5*X0(3-C*X0^2)}
   F12=F4*F4;               {F12=X1^2}
   F12=F12*F0;              {F12=C*X1^2}
   F4=F1*F4, F12=F8-F12;    {F4=.5*X1, F12=3-C*X1^2}
   F4=F4*F12;               {F4=X2=.5*X1(3-C*X1^2)}
   F12=F4*F4;               {F12=X2^2}
   F12=F12*F0;              {F12=C*X2^2}
   F4=F1*F4, F12=F8-F12;    {F4=.5*X2, F12=3-C*X2^2}
   F4=F4*F12;               {F4=X3=.5*X2(3-C*X2^2)}
(snip)

* Cavanagh, J. 1984. Digital Computer Arithmetic. McGraw-Hill. Page 278.
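
In plain C, the quoted sequence boils down to a handful of multiplies and
subtractions; here is a quick sketch of mine, with a crude exponent-halving
guess standing in for the RSQRTS seed:

#include <math.h>
#include <stdio.h>

/* One Newton-Raphson step for y ~ 1/sqrt(c):
 * y' = 0.5 * y * (3 - c*y*y), i.e. the F4/F12 dance in the listing. */
static float nr_step(float c, float y)
{
    return 0.5f * y * (3.0f - c * y * y);
}

int main(void)
{
    float c = 1234.5f;

    /* Crude seed: halve the exponent (a stand-in for the ROM seed). */
    int e;
    frexpf(c, &e);                       /* c = m * 2^e, m in [0.5,1) */
    float y = ldexpf(1.0f, -e / 2);      /* roughly 2^(-e/2)          */

    for (int i = 0; i < 4; i++) {        /* each pass roughly doubles */
        y = nr_step(c, y);               /* the number of good bits   */
        printf("pass %d: 1/sqrt = %.9g\n", i, y);
    }
    printf("sqrt(c)  = %.9g  (just multiply back by c)\n", c * y);
    return 0;
}

With the 4-bit hardware seed instead of this crude guess, two or three
such passes are enough, exactly as in the listing above.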


Note that the earliest mentions of NR iterations I have seen are in Cray
manuals (early 70s, or was it the 60s on the CDC?), but I have read that
the Newton method was already known in the ancient Middle East (the
Babylonians?).

Anyway, computing FP SQRT is just a lookup then a bunch of multiplies and subtractions.
No big deal ;-)

> MM
YG
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
