On 4/20/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 4/20/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
>> I don't think it's wise to use the SIMD ALU here. All scalar code will
>> use the SIMD FPU with 3 FMUL units idle. Because everything is strongly
>> parallel, I think it's better to stay scalar.
>
> If there are enough independent scalars that can be scheduled, you can
> pack them and run them in parallel.
>

So you need logic to detect that a pack is possible, and you need a switch
that can connect the different register banks to the FPUs.

No.  The compiler can optimize this stuff.

Have a look at the instruction set you came up with; 19 out of 27 ops are vector ops.  It is going to be far more important to optimise vector ops than to ensure full utilisation of the silicon at all times.  If you can do a dot product in 10 instructions (8 load, 1 vmul, 1 store), that is a big gain over 16 instructions (8 load, 4 smul, 3 add, 1 store).  If you have a wide memory bus and can fetch four floats in one op, so much the better; now we are down to 4 instructions instead of 16.  Vector operations are very likely to dominate a shader (any 3D processing, for that matter), therefore the whole architecture should have the goal of optimising vector operations.

Tom

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
