Re: [Open-graphics] OGA2 simulator: Something compiles and runs

whygee Thu, 05 Jul 2012 18:09:03 -0700

On Thu, 5 Jul 2012 16:28:07 +0200, Mark Marshall wrote:

Hi.

Hi !


Sorry to spill some maths here, but I thought I could help a tiny bit.

It's probably worth adding some instructions to help calculate some
hard math functions as well.

Timothy once said it would be for later, since OG is not a true GPGPU.
Let's keep it simple and working first :-)

  Square root and inverse square root
spring to mind. We have a long pipeline, which means we could get a
lot done.  It would also be helpful (if we didn't want to calculate
the whole square root) to calculate a partial answer.  It's then much
easier to calculate the complete answer in only a few instructions.

t0 = aprox-square-root(X)
t1 = (t0 + X/t0) / 2

If t0 has n bits of accuracy t1 has 2n bits of accuracy.

no, the Newton method doubles the accuracy, which means n+1 bits.

the "2n" method is the Newton-Raphson method, which has more complexstepsbut exponential (vs linear) convergence. In the SHARC DSP user'smanual,

it mentions a lookup table that takes 6 bits from the mantissa and
gives 4 bits of mantissa. With 3 iterations of Newton-Raphson,
it gets 8 then 16 then 32 bits of accurate mantissa (the SHARC is a
40-bit FP DSP). That suits OG's requirements :-)
This method is also enhanced a lot in more modern architectures,

I once read a paper about the AMD implementation, that combines 2 bigtablesto skip one iteration... but this requires lots of silicon so it mustbe

needed often (distance computation, mainly ?).

The SHARC's simple hardcoded LUT could fit in 64 bytes of OG's cacheRAM so no special

instruction is required.

   (Theres a
really nice geometric proof of this, which I love.  We have a
rectangle of area X.  We need to convert it to a square of area X. 
A more "square" rectangle will have one side that is the average of
the sides of our initial rectangle).


Look at page B40 (n°542) of
http://www.analog.com/static/imported-files/processor_manuals/50836807228561ADSP2106xSHARCProcessorUsersManual_Revision2_1.pdf

   B ALU=Floating-Point Operations
       Compute
       Fn RSQRTS Fx
       Syntax:
                  Fn = RSQRTS Fx

Function: Creates a 4-bit accurate seed for 1/√Fx, thereciprocal square rootof Fx. The mantissa of the seed is determined from a ROM tableusing theLSB of the biased exponent of Fx concatenated with the 6 MSBs(excludingthe hidden bit) of the mantissa of Fx as an index. The unbiasedexponentof the seed is calculated as the twos complement of the unbiasedFxexponent, shifted right by one bit and decremented by one; i.e.,if e is the

       unbiased exponent of Fx, then the unbiased exponent of

Fn = –INT[e/2] – 1. The sign of the seed is the sign of theinput. ±Zeroreturns ±Infinity and sets the overflow flag. +Infinity returns+Zero. A

       NAN input or a negative nonzero input returns an all 1s result.

The following code calculates a floating-point reciprocal squareroot(1/√x) using a Newton-Raphson iteration algorithm.* The resultisaccurate to one LSB in whichever format mode, 32-bit or 40-bit,is set(32-bit only for ADSP-21010). To calculate the square root,simplymultiply the result by the original input. The following inputsarerequired: F0=input, F8=3.0, F1=0.5. The result is returned inF4. (The fourhighlighted instructions can be removed if only a ±1 LSBaccurate single-

       precision result is necessary.)

F4=RSQRTS F0; {Fetch 4-bitseed}

                  F12=F4*F4;                            {F12=X0^2}
                  F12=F12*F0;                           {F12=C*X0^2}

F4=F1*F4, F12=F8-F12; {F4=.5*X0,F12=3-C*X0^2}F4=F4*F12;{F4=X1=.5*X0(3-C*X0^2)}

                  F12=F4*F4;                            {F12=X1^2}
                  F12=F12*F0;                           {F12=C*X1^2}

F4=F1*F4, F12=F8-F12; {F4=.5*X1,F12=3-C*X1^2}F4=F4*F12;{F4=X2=.5*X1(3-C*X1^2)}

                    F12=F4*F4;                           {F12=X2^2}
                    F12=F12*F0;                          {F12=C*X2^2}

F4=F1*F4, F12=F8-F12; {F4=.5*X2,F12=3-C*X2^2}F4=F4*F12;{F4=X3=.5*X2(3-C*X2^2)}

(snip)

* Cavanagh, J. 1984. Digital Computer Arithmetic. McGraw-Hill.Page 278.

Note that the earliest mention of NR iterations I've seen are on Craymanuals (early 70s,

or was it the 60s on the CDC ?),

but I have read that the Newton method was already known in the middleEast (Babylonians ?)

Anyway, computing FP SQRT is just a lookup then a bunch of multipliesand substractions.

No big deal ;-)

MM

YG
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Re: [Open-graphics] OGA2 simulator: Something compiles and runs

Reply via email to