Re: [Open-graphics] Operations for the ALU

Timothy Normand Miller Mon, 16 Jul 2012 09:22:37 -0700

On Mon, Jul 16, 2012 at 11:19 AM, Mark Marshall
<[email protected]> wrote:
> Hi.
>
> How much thought has there been on what operations the ALU will perform
> (floating point)?
>
> I can see that we need add, sub and multiply.
>
> I think that divide and inverse square root would be nice to have, but I
> understand that they are costly in terms of area etc..


I should be keeping better notes to keep track of these ideas... which
I think I'll do when I start work in September.  I think one of the
design tradeoffs that we need to explore is how to implement these
operations.  Surely they will happen.  Do we require the compiler to
emit code (a subroutine that ends up enlarging the code a bit for
every shader program that uses it)?  Do we add divide to pools of
"complex operations" units shared among shaders?  Do we give every
pipeline its own divide unit?

Although a divider can be pipelined, I've never seen any CPU implement
it that way.  The reason is that divide is a relatively rare operation.
In an OOO processor, an unpipelined divider (one whose issue interval
equals its latency) typically costs little instruction throughput
relative to what you'd get from a fully pipelined divider.  As a
result, it would be terribly wasteful of resources to have a pipelined
divider.  The reason I mention this is that to integrate it
_seamlessly_ into our in-order pipeline would require that we pipeline
the divide.  We could do it less seamlessly by explicitly queueing, as
we do with memory requests, and that same interface would be used
whether we had one divider per pipeline, thread processor, or group of
thread processors.
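
To make the "rare operation" argument concrete, here is a minimal
Python sketch of a queued divide unit.  It is only an illustrative
model, not anything from our simulator: the function name, the
20-cycle latency and the request times are all assumptions.  An
iterative divider is modelled with an initiation interval equal to its
latency, a fully pipelined one with an initiation interval of 1.

```python
def divider_finish_time(request_cycles, latency, initiation_interval):
    """Cycle at which the last queued divide completes.

    request_cycles: sorted cycles at which divides are requested.
    initiation_interval: 1 for a fully pipelined divider,
                         `latency` for an iterative (unpipelined) one.
    """
    next_free = 0   # first cycle at which the unit can accept a new divide
    finish = 0
    for t in request_cycles:
        start = max(t, next_free)          # wait in the queue if the unit is busy
        next_free = start + initiation_interval
        finish = start + latency
    return finish

LATENCY = 20  # hypothetical divide latency in cycles

sparse = [0, 100, 200]   # divides are rare: the two designs tie
dense = [0, 1, 2, 3]     # back-to-back divides: pipelining pays off

sparse_iterative = divider_finish_time(sparse, LATENCY, LATENCY)
sparse_pipelined = divider_finish_time(sparse, LATENCY, 1)
dense_iterative = divider_finish_time(dense, LATENCY, LATENCY)
dense_pipelined = divider_finish_time(dense, LATENCY, 1)
```

With the sparse request pattern both designs finish on the same cycle,
so the extra pipeline registers buy nothing; only the dense pattern
separates them.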

In simulation, we have the luxury of doing the ridiculous, so I'm sure
we'll compare several options for performance and energy:

1. Newton-Raphson divider from compiler
2. Full-precision divider from compiler
3. Fully pipelined in each shader pipeline
4. Iterative, queued for each pipeline
5. Pipelined, queued for each thread processor
6. Iterative, queued for each thread processor
7. Pipelined, queued for N thread processors
8. Iterative, queued for N thread processors
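
For reference, option 1 could look roughly like the Python sketch
below.  This is not what our compiler would emit; the function names,
the linear seed and the iteration count are assumptions chosen for
illustration.  It refines a reciprocal estimate with
x <- x * (2 - d * x), which needs only multiply and subtract.

```python
import math

def nr_reciprocal(d, iterations=4):
    """Approximate 1/d with Newton-Raphson, using only multiply and subtract."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # Split |d| into mantissa m in [0.5, 1) and exponent e: |d| = m * 2**e.
    m, e = math.frexp(abs(d))
    # Linear seed for 1/m on [0.5, 1); its relative error is at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        x = x * (2.0 - m * x)   # relative error roughly squares on each step
    r = math.ldexp(x, -e)       # undo the exponent scaling
    return -r if d < 0 else r

def nr_divide(n, d, iterations=4):
    """n / d computed as n * (1/d)."""
    return n * nr_reciprocal(d, iterations)
```

With this seed, three steps bring the relative error to about 1.4e-10,
which already covers a 24-bit single-precision mantissa; the fourth
step is only needed for double precision.  The result is not
guaranteed to be correctly rounded, which is part of what separates
option 1 from option 2.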

>
> There are some other functions that we could have in hardware, but we can
> possibly work around with the other operations (power, sin, cos).

One relatively analytical thing we can do is measure the frequency of
these operations in shader programs and, by Amdahl's law, bound the
speedup possible from different implementations.  Then we can
implement a few and simulate them.
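
The Amdahl's law estimate is simple enough to write down.  The sketch
below is only an illustration: the 5% divide fraction and the 10x
local speedup are made-up placeholders, not measurements from any
shader.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of run time is sped up by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Hypothetical numbers: divides take 5% of run time, hardware makes them 10x faster.
overall = amdahl_speedup(0.05, 10.0)

# Upper bound if divides became free: 1 / (1 - fraction).
upper_bound = 1.0 / (1.0 - 0.05)
```

Note that `fraction` is a share of execution time, so instruction
counts from shader programs have to be weighted by each operation's
cost before they go in.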

> If we decide not to have full divide, square root, etc. has anyone thought
> about having a half-way instruction that would produce a good first
> estimate?  Other chips (IA64?) have these instructions.  If we can get half
> or a quarter of the bits of accuracy that we need from an initial
> instruction then completing the result is much easier.  I thought about
> adding a reciprocal instruction, but that's not much easier to calculate
> than a full divide.

In an ASIC, we may be able to afford a few look-up table ROMs.

The advantage of reciprocal is that there's a single operand, which
makes it more practical for a lookup table.
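
A table indexed by k bits of one operand needs 2**k entries, where a
two-operand table would need 2**(2k).  Here is a minimal Python sketch
of a reciprocal-estimate table; the 7-bit index, the midpoint entries
and all the names are assumptions for illustration, not a proposed ROM
layout.

```python
import math

TABLE_BITS = 7  # hypothetical index width: 2**7 = 128 entries

# Entry i holds the reciprocal of the midpoint of the i-th slice of [1, 2).
RECIP_TABLE = [
    1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]

def recip_estimate(d):
    """Table-based estimate of 1/d for positive, finite, nonzero d."""
    m, e = math.frexp(d)      # d = m * 2**e with m in [0.5, 1)
    m *= 2.0                  # renormalise so that m is in [1, 2)
    e -= 1
    index = int((m - 1.0) * (1 << TABLE_BITS))   # top TABLE_BITS mantissa bits
    return math.ldexp(RECIP_TABLE[index], -e)

def refine(d, x):
    """One Newton-Raphson step toward 1/d, starting from estimate x."""
    return x * (2.0 - d * x)
```

With these assumptions the table alone is good to about 8 bits, and
each `refine` step roughly doubles that, so two steps are enough for a
24-bit mantissa.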

>
> I was also wondering if it was feasible to split the divide up into two
> opcodes, the first opcode would do half of the work and would leave the
> partial result in some special flip-flops.  The second op-code would
> complete the operation and put the result in the correct place (I know that
> this is a bit nasty, but it lets us extend our pipeline to almost any depth
> we need).

IIRC, early SPARC processors had a "multiply step" instruction
(MULScc).  This sounds similar in spirit.

Ultimately, we need to find the minimum-energy solution.  We run all
of our benchmarks, compute energy, compute geometric mean.  Whatever
gives us the lowest energy number is the one we should favor in an
embedded design.  What gives us the best performance per area is
perhaps what we should favor for a desktop design.
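
The selection step could be sketched in Python as below.  The function
names and the shape of the `results` dictionary are assumptions, and
any numbers fed to it would have to come from real simulation runs.

```python
import math

def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0.0 for v in values):
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def lowest_energy_design(results):
    """Pick the design with the lowest geometric-mean energy.

    results: dict mapping a design name to a list of per-benchmark
             energies (same benchmarks, same order, for every design).
    """
    return min(results, key=lambda name: geometric_mean(results[name]))
```

Calling `lowest_energy_design` with one entry per option in the list
above would give the embedded-design candidate; swapping energy for
area divided by performance would give the desktop one.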



-- 
Timothy Normand Miller, PhD
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
