On Thu, 5 Jul 2012 16:28:07 +0200, Mark Marshall wrote:
Hi.
Hi !
Sorry to spill some maths here, but I thought I could help a tiny bit.
It's probably worth adding some instructions to help calculate some
hard math functions as well.
Timothy once said it would be for later, since OG is not a true GPGPU.
Let's keep it simple and working first :-)
Square root and inverse square root
spring to mind. We have a long pipeline, which means we could get a
lot done. It would also be helpful (if we didn't want to calculate
the whole square root) to calculate a partial answer. It's then much
easier to calculate the complete answer in only a few instructions.
t0 = aprox-square-root(X)
t1 = (t0 + X/t0) / 2
If t0 has n bits of accuracy t1 has 2n bits of accuracy.
no, the Newton method doubles the accuracy, which means n+1 bits.
the "2n" method is the Newton-Raphson method, which has more complex
steps
but exponential (vs linear) convergence. In the SHARC DSP user's
manual,
it mentions a lookup table that takes 6 bits from the mantissa and
gives 4 bits of mantissa. With 3 iterations of Newton-Raphson,
it gets 8 then 16 then 32 bits of accurate mantissa (the SHARC is a
40-bit FP DSP). That suits OG's requirements :-)
This method is also enhanced a lot in more modern architectures,
I once read a paper about the AMD implementation, that combines 2 big
tables
to skip one iteration... but this requires lots of silicon so it must
be
needed often (distance computation, mainly ?).
The SHARC's simple hardcoded LUT could fit in 64 bytes of OG's cache
RAM so no special
instruction is required.
(Theres a
really nice geometric proof of this, which I love. We have a
rectangle of area X. We need to convert it to a square of area X.
A more "square" rectangle will have one side that is the average of
the sides of our initial rectangle).
Look at page B40 (n°542) of
http://www.analog.com/static/imported-files/processor_manuals/50836807228561ADSP2106xSHARCProcessorUsersManual_Revision2_1.pdf
B ALU=Floating-Point Operations
Compute
Fn RSQRTS Fx
Syntax:
Fn = RSQRTS Fx
Function: Creates a 4-bit accurate seed for 1/√Fx, the
reciprocal square root
of Fx. The mantissa of the seed is determined from a ROM table
using the
LSB of the biased exponent of Fx concatenated with the 6 MSBs
(excluding
the hidden bit) of the mantissa of Fx as an index. The unbiased
exponent
of the seed is calculated as the twos complement of the unbiased
Fx
exponent, shifted right by one bit and decremented by one; i.e.,
if e is the
unbiased exponent of Fx, then the unbiased exponent of
Fn = –INT[e/2] – 1. The sign of the seed is the sign of the
input. ±Zero
returns ±Infinity and sets the overflow flag. +Infinity returns
+Zero. A
NAN input or a negative nonzero input returns an all 1s result.
The following code calculates a floating-point reciprocal square
root
(1/√x) using a Newton-Raphson iteration algorithm.* The result
is
accurate to one LSB in whichever format mode, 32-bit or 40-bit,
is set
(32-bit only for ADSP-21010). To calculate the square root,
simply
multiply the result by the original input. The following inputs
are
required: F0=input, F8=3.0, F1=0.5. The result is returned in
F4. (The four
highlighted instructions can be removed if only a ±1 LSB
accurate single-
precision result is necessary.)
F4=RSQRTS F0; {Fetch 4-bit
seed}
F12=F4*F4; {F12=X0^2}
F12=F12*F0; {F12=C*X0^2}
F4=F1*F4, F12=F8-F12; {F4=.5*X0,
F12=3-C*X0^2}
F4=F4*F12;
{F4=X1=.5*X0(3-C*X0^2)}
F12=F4*F4; {F12=X1^2}
F12=F12*F0; {F12=C*X1^2}
F4=F1*F4, F12=F8-F12; {F4=.5*X1,
F12=3-C*X1^2}
F4=F4*F12;
{F4=X2=.5*X1(3-C*X1^2)}
F12=F4*F4; {F12=X2^2}
F12=F12*F0; {F12=C*X2^2}
F4=F1*F4, F12=F8-F12; {F4=.5*X2,
F12=3-C*X2^2}
F4=F4*F12;
{F4=X3=.5*X2(3-C*X2^2)}
(snip)
* Cavanagh, J. 1984. Digital Computer Arithmetic. McGraw-Hill.
Page 278.
Note that the earliest mention of NR iterations I've seen are on Cray
manuals (early 70s,
or was it the 60s on the CDC ?),
but I have read that the Newton method was already known in the middle
East (Babylonians ?)
Anyway, computing FP SQRT is just a lookup then a bunch of multiplies
and substractions.
No big deal ;-)
MM
YG
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)