On 4/21/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > One fully-pipelined vmul unit that can start a new vmul every cycle.
> > The vmul unit can do scalar individual ops (where the other three muls
> > have a bubble).
> > Sometimes, the compiler can do whole-program analysis, carefully
> > allocating registers and using pack/unpack/swizzle instructions to get
> > a bit more throughput.
>
> It could. But gcc don't do it. Icc from intel did it a little bit. Do you
> think you could have enough compiler people to do better than Intel ?

Part of GCC's problem is that it's trying to be very general. We'd develop
a very special-purpose compiler, and we can evolve the compiler and
architecture together so that it is more convenient to compile for. I
suspect that there are challenges with vectorizing for SSE that we can
avoid.

> > We don't need separate hardware for smul, just a little multiplexing on
> > vmul.
>
> Yep. Why nobody like the idea to keep every thing simple with 4 scalar core ?

The control and routing hardware for doing vectors is simpler than doing
four scalars at the same time.
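
To put rough numbers on the scalar-on-vmul case discussed above, here is a minimal Python sketch of issue-slot counting. It assumes a 4-lane vmul that starts one operation per cycle, and it assumes the scalar multiplies are independent so the compiler is free to pack them. The function names are made up for illustration, and the cost of any pack/unpack/swizzle instructions is ignored, so the packed figure is an optimistic bound rather than a prediction.

```python
LANES = 4  # assumed vector width of the vmul unit


def issue_slots_unpacked(n_scalar_muls):
    """Each scalar multiply takes its own vmul issue slot.

    The other three lanes carry a bubble for that slot.
    """
    return n_scalar_muls


def issue_slots_packed(n_scalar_muls):
    """Independent scalar multiplies packed four to a vmul.

    Ceiling division: a partly filled final vmul still costs a slot.
    Pack/unpack/swizzle overhead is deliberately not modelled.
    """
    return -(-n_scalar_muls // LANES)


def lane_utilization(n_scalar_muls, slots):
    """Fraction of multiplier lanes doing useful work over the given slots."""
    return n_scalar_muls / (slots * LANES)


if __name__ == "__main__":
    n = 10
    for name, slots in (("unpacked", issue_slots_unpacked(n)),
                        ("packed", issue_slots_packed(n))):
        print(name, slots, "slots, utilization", lane_utilization(n, slots))
```

Under these assumptions, unpacked scalar work uses a quarter of the multiplier lanes, and packing recovers most of that only when enough independent multiplies are available and the shuffling is cheap.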
