2009/9/28 Timothy Normand Miller <[email protected]>

> On Wed, Sep 23, 2009 at 6:36 AM, Kenneth Ostby <[email protected]> wrote:
> > Nicolas Boulay:
> >>2009/9/23 Kenneth Ostby <[email protected]>:
> >>> Hi,
> >>>
> >>> Nicolas Boulay:
> >>>>2009/9/23 Hugh Fisher <[email protected]>:
> >>>>> Andre Pouliot wrote:
>
 <...>

> These instructions are rare:
>
> - div
> - convert
> - memory load/store (more common than the others but rarer than add and
> mult)
>
> These instructions are common:
>
> - add
> - mult
>
> This can be had for free:
>
> - Flow control (but it will be completely absent from many kernels)
>
>
> So there's no point in adding extra instruction bits for anything
> other than add and mult.  Also, since we won't tend to mix fp and int,
> there's no point in providing simultaneous access to fp and int add
> and mul.
>
> So if we do LIW, I propose this:
>
> Slot 0:  any instruction (add, sub, div, flow control, memory, etc.)
> Slot 1:  any of fp add, fp mul, int add, int mul not conflicting with slot
> 0.
>
> I'm assuming that we include vector instructions (even if they get
> unrolled into scalars).
>
> Now, you could do an int add and an fp add at the same time or an add
> and a mul, or whatever.  Or you could do one at the same time as a
> memory op.
>
> I fear that the code bloat will be so bad that we'll get killed by the
> icache misses, completely defeating the gains we get from LIW.
>
>
Slot 1 could also read 3 registers at a time. This enables a fused multiply-add
(a*b+c), and a 3-way add (a+b+c) could be done as well. I don't know the size
of a 3-way adder compared to two 2-input adders, but I think it is smaller, and
it removes a read-after-write dependency. I think it's one of the easiest ways
to speed up code if software compatibility is not an issue.

Nicolas
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
