2009/9/28 Timothy Normand Miller <[email protected]>

> On Wed, Sep 23, 2009 at 6:36 AM, Kenneth Ostby <[email protected]> wrote:
> > Nicolas Boulay:
> >>2009/9/23 Kenneth Ostby <[email protected]>:
> >>> Hi,
> >>>
> >>> Nicolas Boulay:
> >>>>2009/9/23 Hugh Fisher <[email protected]>:
> >>>>> Andre Pouliot wrote:
> <...>
> These instructions are rare:
>
> - div
> - convert
> - memory load/store (more common than the others but rarer than add and
>   mult)
>
> These instructions are common:
>
> - add
> - mult
>
> This can be had for free:
>
> - Flow control (but it will be completely absent from many kernels)
>
>
> So there's no point in adding extra instruction bits for anything
> other than add and mult.  Also, since we won't tend to mix fp and int,
> there's no point in providing simultaneous access to fp and int add
> and mul.
>
> So if we do LIW, I propose this:
>
> Slot 0:  any instruction (add, sub, div, flow control, memory, etc.)
> Slot 1:  any of fp add, fp mul, int add, int mul not conflicting with
>          slot 0.
>
> I'm assuming that we include vector instructions (even if they get
> unrolled into scalars).
>
> Now, you could do an int add and an fp add at the same time or an add
> and a mul, or whatever.  Or you could do one at the same time as a
> memory op.
>
> I fear that the code bloat will be so bad that we'll get killed by the
> icache misses, completely defeating the gains we get from LIW.
>

Slot 1 could also read 3 registers at a time. This enables a fused
multiply-add operation (a*b+c); a 3-way add (a+b+c) could also be done.
I don't know the size of a 3-way adder compared to two adders, but I
think it is smaller, and it removes read-after-write dependencies.

I think it's one of the easiest ways to speed up code if SW
compatibility is not an issue.

Nicolas
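
For illustration only, here is a minimal Python sketch of the two-slot rule quoted above. The opcode names, the mapping of opcodes to functional units, and the reading of "not conflicting" as "does not use the same functional unit as slot 0" are all assumptions made for this sketch; none of them come from the thread or from any actual design.

```python
# Hypothetical model of the proposed two-slot LIW bundle.
# Opcode names and the opcode-to-unit mapping are illustrative assumptions.

# Operations the proposal allows in slot 1.
SLOT1_OPS = {"fadd", "fmul", "iadd", "imul"}

# Assumed functional unit occupied by each opcode.
UNIT = {
    "fadd": "fp_add",
    "fmul": "fp_mul",
    "iadd": "int_add",
    "imul": "int_mul",
    "isub": "int_add",  # assumption: sub shares the integer adder
    "div": "div",
    "load": "mem",
    "store": "mem",
    "branch": "flow",
}


def bundle_is_legal(slot0, slot1=None):
    """Return True if (slot0, slot1) fits the proposed encoding.

    Slot 0 may hold any known instruction. Slot 1 may be empty or hold
    one of the four add/mul operations, provided it does not need the
    same functional unit as slot 0 (the assumed meaning of "conflict").
    """
    if slot0 not in UNIT:
        return False
    if slot1 is None:
        return True
    if slot1 not in SLOT1_OPS:
        return False
    return UNIT[slot0] != UNIT[slot1]


if __name__ == "__main__":
    for pair in [("load", "fadd"), ("iadd", "fadd"),
                 ("fadd", "fadd"), ("iadd", "div")]:
        print(pair, bundle_is_legal(*pair))
```

Under these assumptions a memory op pairs with an fp add, and an int add pairs with an fp add, while two fp adds in one bundle are rejected, as is a div in slot 1.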
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
