> What we really need to decide is how to handle operations (such as
> matrix math) which require a series of steps for the ALU to complete.

Or many chained ALUs.

> We can use CISC like operations where the series of operations is in
> microcode.  Or we can have a RISC like control of the ALU where we have
> SIMD that can operate on up to 4 32 bit floats in parallel and issue a
> series of instructions to do what one CISC like operation would do.
> But, you can combine these like the transputer and have the OP
> instruction which calls a macrocode subroutine to do these things.
>

That's the principle of µcode.

> I can't see having more than one ALU per shader since it should have 4
> 32 bit float hardware multipliers.  Since the standard shader operation
> is multiplying three 4x4 matrices, it is the hardware multipliers that
> is going to boost throughput.
>

I don't think it's wise to use a SIMD ALU here. All scalar code would run
on the SIMD FPU with 3 of its 4 FMUL units idle. Because the workload is
strongly parallel, I think it's better to stay scalar.
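
To put a rough number on the idle-FMUL point, here is a minimal sketch in
Python. It is only a toy model: the function name, the instruction mixes
and the assumption that every instruction occupies all 4 lanes for one
issue slot are mine, not measurements of any real shader.

```python
def fmul_utilization(op_widths, lanes=4):
    """Fraction of FMUL lane-slots doing useful work.

    op_widths: number of useful lanes per issued instruction
               (1 = scalar op, 4 = full 4-float vector op).
    Assumption: each instruction holds all `lanes` FMUL units
    for one issue slot, whatever its useful width.
    """
    for w in op_widths:
        if not 1 <= w <= lanes:
            raise ValueError("op width must be between 1 and lanes")
    issued = len(op_widths) * lanes   # lane-slots consumed
    useful = sum(op_widths)           # lane-slots doing real work
    return useful / issued


scalar_code = [1] * 8        # purely scalar instruction stream
vector_code = [4] * 8        # ideal 4-wide vector stream
mixed_code = [4, 1, 1, 3]    # hypothetical mix

print(fmul_utilization(scalar_code))  # 0.25
print(fmul_utilization(vector_code))  # 1.0
print(fmul_utilization(mixed_code))   # 0.5625
```

Under the same assumptions, 4 independent scalar units each working on a
different pixel would correspond to calling the function with lanes=1,
which always gives 1.0 as long as there is enough parallel work.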

32-bit floating-point operations are the most frequently used
instructions, so performance will depend on the number of such units and
on how efficiently they are used.

Besides, a complex 128-bit data path is always harder to route, so it is
necessarily slower than a CPU core with a 32-bit internal data path.

Nicolas Boulay

> --
> JRT
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
>

