On 2009-10-10, Timothy Normand Miller wrote:
> In light of your comment, I say we ditch the icache space optimization
> for now.  Or at most, we might consider feeding TWO pipelines from the
> same icache, since the BRAMs are dual-ported, but we can even make
> that an afterthought.

How many bits can we read off a BRAM in one cycle?  In HQ we configured
the BRAM as 512 words of 32 bits, so with two ports, is 64 bits the
limit?  I'm thinking about how much control we can give the program
over the ALU by translating one instruction into microinstructions that
configure each stage of the ALU.  That could allow a single full-width
FP operation, two half-width ones, or several simple integer operations.
It would go something like this:

The first stage receives (insn_kind, reg_a, reg_b, insn_no); it fetches
reg_a and reg_b from the thread context and looks up insn_no in the
microinstruction store.  The microinstruction contains maybe one byte
per stage, which determines what function to perform on the data
from the previous stage.  The available functions are carefully
selected, with FP add/sub and FP mult as the hardest constraints.
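A rough C sketch of that per-stage decode, purely to make the idea concrete: the names, the 4-stage depth, the store size, and the function encoding are all my own illustrative assumptions, not a committed design.

```c
#include <assert.h>
#include <stdint.h>

#define N_STAGES 4          /* assumed pipeline depth           */
#define N_UINSNS 32         /* assumed microinstruction count   */

/* One byte per stage: the function selector for that stage. */
static uint8_t uinsn_store[N_UINSNS][N_STAGES];

/* Each stage applies the function its microinstruction byte selects
   to the data arriving from the previous stage. */
static uint32_t apply_stage(uint8_t fn, uint32_t a, uint32_t b)
{
    switch (fn) {
    case 0:  return a;          /* pass-through       */
    case 1:  return a + b;      /* integer add        */
    case 2:  return a - b;      /* integer subtract   */
    case 3:  return a & b;      /* bitwise and        */
    case 4:  return a ^ b;      /* bitwise xor        */
    default: return a;
    }
}

/* The first stage receives (reg_a, reg_b, insn_no); later stages feed
   the intermediate result through their own selected function. */
uint32_t run_pipeline(uint8_t insn_no, uint32_t reg_a, uint32_t reg_b)
{
    uint32_t acc = apply_stage(uinsn_store[insn_no][0], reg_a, reg_b);
    for (int s = 1; s < N_STAGES; s++)
        acc = apply_stage(uinsn_store[insn_no][s], acc, reg_b);
    return acc;
}
```

In hardware the switch would be a mux per stage rather than a loop, but the point stands: one microinstruction, one byte per stage, fully determines what each stage does.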

We may have a standard set of microinstructions, but if we get ambitious
with the compiler, it could create its own, adapted to the specific
kernel.  The furthest we could go here is probably 32-bit instructions
and 32-bit microinstruction sequences.

Then again, maybe it's better to encode this in the pipeline and save
the BRAMs, especially since I'm arguing not to group threads.

> And moreover, we need to think about how many independent paths there
> will be to the global dcache.  Lots of shaders trying to hit memory at
> once will bog down, serialized really.  The main reason we have so may
> shaders, actually, is because the proportion of math and flow control
> instructions in a kernel should be high compared to the number of
> memory accesses.

See below.

> > The proposed architecture is nice given a mostly linear flow of
> > instructions which only use local memory, but can deal with the more
> > general case effectively?  If threads were much more lightweight, it
> > would seem easier to come up with a solution.
> 
> What did you have in mind?

Given that we only save 17/20 of the space, I don't have a feasible
solution, but for the curious:

    Let a continuation point be an address which a) is either the
    target or the address of the instruction after a conditional jump,
    as chosen by the compiler for that instruction, or b) directly
    follows a load instruction.  We declare continuation points to be a
    limited resource, in the sense that each kernel can have a finite
    number.  Thus, we can use a queue for each continuation point to
    collect threads which have reached that point.
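To sketch what those per-continuation-point queues would amount to (the limits, queue depth, and the idea that a parked thread reduces to a thread id are all assumptions for illustration):

```c
#include <assert.h>

#define MAX_CPOINTS 8       /* assumed per-kernel limit on continuation points */
#define QUEUE_DEPTH 16      /* assumed queue capacity                          */

/* A thread parked at a continuation point is just a thread id here;
   real hardware would also carry a handle to its register state. */
typedef struct {
    int tid[QUEUE_DEPTH];
    int head, tail, count;
} cp_queue;

static cp_queue cpoints[MAX_CPOINTS];

/* A thread reaching continuation point cp parks in that queue;
   returns 0 when the queue is full and the caller must stall. */
int cp_enqueue(int cp, int tid)
{
    cp_queue *q = &cpoints[cp];
    if (q->count == QUEUE_DEPTH) return 0;
    q->tid[q->tail] = tid;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return 1;
}

/* The scheduler drains a queue to dispatch its threads again;
   returns -1 when nothing is waiting at that point. */
int cp_dequeue(int cp)
{
    cp_queue *q = &cpoints[cp];
    if (q->count == 0) return -1;
    int tid = q->tid[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return tid;
}
```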

On the other hand, I think we could use some queueing and sorting to
optimise memory access now, if we go with independent threads.  We
have 60 pipelines, so it seems reasonable to spend a bit of logic to
keep them busy.  Instead of letting the ALU do loads, we send these
instructions to a shared memory unit.  It may be tempting to add one or
two extra threads per ALU to keep the ALU busy, but due to the cost and
the low frequency of loads, it may be better to send a "phantom" down
the ALU in place of the thread doing the load.  The result of the load
can be fetched back via a short return queue on each ALU.  This could
be just one or two slots if we allow stalling in rare cases.  As soon
as a "phantom" comes out of the ALU, a real thread is dequeued and
passed down in its place.
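The phantom swap might look something like the following sketch.  Everything here is assumed for illustration: the two-slot return queue from the paragraph above, plus names like `issue`, `retire`, and `load_complete` which are mine.

```c
#include <assert.h>

/* What flows down one ALU slot: a real thread or a phantom. */
enum slot { REAL, PHANTOM };

#define RETQ_DEPTH 2        /* short return queue, one or two slots */

static int retq[RETQ_DEPTH];    /* tids whose load results are back */
static int retq_count;

/* A thread issuing a load is replaced by a phantom; the load itself
   goes to the shared memory unit (not modelled here). */
enum slot issue(int is_load)
{
    return is_load ? PHANTOM : REAL;
}

/* The shared memory unit delivers a completed load for thread tid. */
void load_complete(int tid)
{
    if (retq_count < RETQ_DEPTH)
        retq[retq_count++] = tid;
}

/* When a slot exits the ALU: a real thread just continues; a phantom
   is swapped for a thread whose load has returned.  Returns the tid
   resuming, or -1 for the rare stall when no load is back yet. */
int retire(enum slot s, int tid)
{
    if (s == REAL)
        return tid;
    if (retq_count == 0)
        return -1;                  /* stall: return queue empty */
    int resumed = retq[0];
    retq[0] = retq[1];
    retq_count--;
    return resumed;
}
```

The stall case (`-1`) is exactly the "rare cases" above: a phantom emerges before any load has come back, so that ALU slot idles for a cycle.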

Once memory requests go to a shared unit, maybe we can spend some
transistors on it?  We have four memory rows, as far as I understand.
Compare each incoming request against the open rows and queue it for
the correct row if it matches one.  Otherwise pass it into a holding
area, where we apply some heuristics I haven't quite worked out to
elect which row to open next once one of the former queues dries out.
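The steering step could be sketched as below.  The four rows come from the text; the address split (row in the upper bits, 10-bit column assumed) and all names are illustrative guesses, not the actual memory layout.

```c
#include <assert.h>
#include <stdint.h>

#define N_ROWS 4            /* four memory rows, per the text above */

/* Row currently open on each row queue; -1 means idle. */
static int32_t open_row[N_ROWS] = { -1, -1, -1, -1 };

/* Where an incoming request gets steered. */
enum steer { TO_ROW_QUEUE, TO_HOLDING };

/* Assumed address split: low 10 bits column, upper bits row. */
static int32_t row_of(uint32_t addr) { return (int32_t)(addr >> 10); }

/* Compare the incoming request against each open row; on a match,
   append it to that row's queue (index via *which), otherwise park
   it in the holding area until a queue dries out and the election
   heuristic picks the next row to open. */
enum steer steer_request(uint32_t addr, int *which)
{
    int32_t row = row_of(addr);
    for (int i = 0; i < N_ROWS; i++) {
        if (open_row[i] == row) {
            *which = i;
            return TO_ROW_QUEUE;
        }
    }
    *which = -1;
    return TO_HOLDING;
}
```

In hardware the loop is four parallel comparators; the interesting (and still open) part is the election heuristic for the holding area, which this sketch deliberately leaves out.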
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
