> On 4/21/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> >
>> > We don't.  However, we could use software (the compiler/assembler)
>> > to do it.
>>
>> So you need pack/unpack instructions, and balanced use of them, because
>> they also take one cycle. Or you need individually selectable words
>> inside the SIMD register, which produces a big switch.
>
> Here's a solution:
>
> One fully-pipelined vmul unit that can start a new vmul every cycle.
> The vmul unit can do scalar individual ops (where the other three muls
> have a bubble).
> Sometimes, the compiler can do whole-program analysis, carefully
> allocating registers and using pack/unpack/swizzle instructions to get
> a bit more throughput.

It could. But gcc doesn't do it. Intel's icc did it a little. Do you
think you could have enough compiler people to do better than Intel?
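
To make the trade-off concrete, here is a rough issue-slot count. It is only
a sketch under assumptions: a 4-wide vmul that starts one op per cycle, and
pack/unpack instructions that each cost one issue slot (as noted earlier in
the thread). The function names and the default of one pack and one unpack
per vector are hypothetical, not measurements of any real design.

```python
import math

LANES = 4  # assumed SIMD width: one vmul performs four multiplies


def scalar_issue_slots(n_muls):
    """Each scalar mul takes one vmul issue slot; the other three lanes idle."""
    return n_muls


def packed_issue_slots(n_muls, pack_ops=1, unpack_ops=1):
    """Issue slots when muls are grouped four at a time.

    pack_ops and unpack_ops are the assumed number of one-slot
    pack/unpack instructions needed around each vector multiply.
    """
    vectors = math.ceil(n_muls / LANES)
    return vectors * (1 + pack_ops + unpack_ops)
```

With these assumptions, 8 independent muls cost 8 slots as scalars and 6
when packed, but if each vector needs 3 packs and 1 unpack the packed
version costs 10. So the gain depends entirely on how well the compiler
avoids the shuffling.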

> We don't need separate hardware for smul, just a little multiplexing on
> vmul.

Yep. Why does nobody like the idea of keeping everything simple with 4
scalar cores?

Could someone take yesterday's compiled code and write part of it in both
vector and scalar form, to show that the vector version is faster?

> It's the divides that'll kill us.  Can we cheat by always using
> reciprocal and multiply?

There is no divide needed.
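
For reference, the reciprocal-and-multiply trick asked about above could look
roughly like this. It is a sketch only: the Newton-Raphson refinement, the
linear starting estimate, and the names `reciprocal` and `divide` are my
assumptions for illustration, not something proposed for the hardware.

```python
import math


def reciprocal(b, iterations=4):
    """Approximate 1/b using only multiplies and subtracts (Newton-Raphson)."""
    if b == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # abs(b) == mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(abs(b))
    # Linear first guess for 1/mantissa on [0.5, 1)
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    for _ in range(iterations):
        # Each step roughly doubles the number of correct bits
        x = x * (2.0 - mantissa * x)
    # Undo the scaling and restore the sign of b
    return math.copysign(math.ldexp(x, -exponent), b)


def divide(a, b):
    """a / b computed as a * (1 / b); no divide instruction is used."""
    return a * reciprocal(b)
```

The result can differ from a true divide in the last bit or so, which is
the usual price of this shortcut.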


_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
