On 4/21/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> > We don't. However, we could use software (the compiler/assembler) to do
> > it.
>
> So you need pack/unpack instructions and have to balance their use, because
> each one also takes a cycle. Or you need individually selectable words inside
> the SIMD register, which produces a big switch.

Here's a solution: one fully pipelined vmul unit that can start a new vmul
every cycle. The same unit can also do individual scalar ops, in which case
the other three multiplier lanes carry a bubble. Sometimes the compiler can
do whole-program analysis, carefully allocating registers and using
pack/unpack/swizzle instructions to get a bit more throughput. We don't need
separate hardware for smul, just a little multiplexing on vmul.

It's the divides that'll kill us. Can we cheat by always using reciprocal and
multiply?
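
To make the idea concrete, here is a rough behavioural sketch in Python, not a hardware design. It assumes a four-lane unit (implied by "the other three muls"), models the multiplexing as a per-lane mask, and treats a divide as a reciprocal followed by a multiply on the same unit. The names LANES, vmul, smul, rcp and div_via_rcp are made up for illustration, and rcp here is exact, whereas a real reciprocal unit would probably be approximate.

```python
LANES = 4  # assumed SIMD width; the thread implies four multipliers


def vmul(a, b, mask=(True,) * LANES):
    """One issue slot of the vector multiplier.

    Lanes whose mask bit is False carry a bubble, modelled here as None.
    """
    return [x * y if m else None for x, y, m in zip(a, b, mask)]


def smul(x, y):
    """Scalar multiply issued on lane 0 of the same unit; lanes 1..3 idle."""
    pad = [0.0] * (LANES - 1)
    mask = (True,) + (False,) * (LANES - 1)
    return vmul([x] + pad, [y] + pad, mask)[0]


def rcp(x):
    """Stand-in for a hardware reciprocal (assumption: exact, unlike real hardware)."""
    return 1.0 / x


def div_via_rcp(a, b):
    """Compute a / b as a * (1 / b), reusing the multiplier instead of a divider."""
    return smul(a, rcp(b))
```

A scalar op costs one issue slot and wastes three lanes, which is where the compiler's packing would win some throughput back. Note that a * (1 / b) is not always bit-identical to a / b in floating point, so the "cheat" trades a little accuracy for not having a divider.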
