On 7/20/06, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
Timothy Miller wrote:
> On 7/20/06, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
>
>> A major consideration is what the GPU is going to do regarding
>> pipelining.  A pipelined read is going to have (I think) minimum:
>>
>> 1/2 clock after address generation for row address compare
>>
>> 3 clocks for CAS Latency.
>>
>> 1/2 clock to get the data to the GPU for the first 4 pixels (or 32
>> bytes).
>>
>> The 1/2 clocks may be overly optimistic.
>>
>> Then getting the second 4 pixels or (32 bytes) could overlap
>> address output for the next read.
>>
>> How many pixels will the GPU process at once?
>>
>> Will it be able to generate an address each clock (200 MHz)?
>
> In the Spartan, the GPU is unlikely to run at 200MHz.  We'll get the
> speed from the ASIC.

This is important to the selection of memory access methods.  If the GPU
can't keep the pipeline full, it might be better to interleave the
screen refresh.

In fact, they will be interleaved.
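The quoted figures work out to about 4 clocks before the first word arrives. A quick sketch of that arithmetic, assuming the 200 MHz clock, CAS latency 3, and the half-clock estimates quoted above (which may be optimistic):

```python
# Sketch of the first-word read latency from the figures quoted above.
# Assumptions: 200 MHz memory clock, CAS latency 3, and the half-clock
# estimates for row-address compare and data capture.

CLOCK_MHZ = 200
clock_ns = 1000 / CLOCK_MHZ            # 5 ns per clock at 200 MHz

row_compare = 0.5   # clocks: row address compare after address generation
cas_latency = 3.0   # clocks: CAS latency
data_capture = 0.5  # clocks: getting the first 4 pixels to the GPU

total_clocks = row_compare + cas_latency + data_capture
total_ns = total_clocks * clock_ns

print(f"{total_clocks} clocks = {total_ns} ns before the first 32 bytes arrive")
```

That 4-clock bubble is exactly what the pipelining (and the interleaving above) has to cover.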


> But in any case, the basic design of the GPU separates read request
> generation from read data receipt.  In between those two units is a
> fifo that absorbs the read latency.

I don't see a FIFO absorbing negative time, so I must be missing
something.  Pipelining requires that the address be issued before the
data comes back.  The memory controller needs a FIFO to remember what to
do with the address when it comes up in the memory.

Unit REQ sends a request to the memory controller (an address) (fifo A).
Unit REQ also puts appropriate metadata into a queue (fifo B).

Unit RCV waits until metadata arrives in the queue (B) and matching
read data has arrived from the memory controller (fifo C).  When that
happens, it dequeues them and goes on.

With this design, we can get <fifo-length> requests ahead of the
memory controller, absorbing pipeline and CAS latency, along with
somewhat reducing the impact of row misses.
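A minimal sketch of that three-FIFO decoupling (not OGP code; the names and the toy "memory controller" are mine): unit REQ runs ahead, pushing addresses into fifo A and metadata into fifo B, and unit RCV only proceeds when both the metadata (B) and the matching read data (C) have arrived.

```python
# Sketch of the REQ/RCV decoupling described above.  fifo A carries
# requests to the memory controller, fifo B carries metadata the GPU
# side remembers, fifo C carries read data coming back.
from collections import deque

fifo_a = deque()  # requests to the memory controller (addresses)
fifo_b = deque()  # metadata remembered by the requesting side
fifo_c = deque()  # read data returned by the memory controller

def req(address, metadata):
    """Unit REQ: issue a request and remember what it was for."""
    fifo_a.append(address)
    fifo_b.append(metadata)

def memory_controller():
    """Stand-in for the controller: service one pending address."""
    if fifo_a:
        addr = fifo_a.popleft()
        fifo_c.append("data@%x" % addr)  # pretend read data

def rcv():
    """Unit RCV: proceed only when metadata and data have both arrived."""
    if fifo_b and fifo_c:
        return fifo_b.popleft(), fifo_c.popleft()
    return None  # stall: that request's latency is not yet absorbed

# REQ can run <fifo-length> requests ahead of the controller:
for i in range(4):
    req(0x1000 + 32 * i, {"dest": i})
memory_controller()  # controller has serviced only the first request
print(rcv())         # -> ({'dest': 0}, 'data@1000')
print(rcv())         # -> None: second request still in flight
```

The stall in the last line is the point: RCV only blocks when the FIFOs drain, so as long as REQ stays far enough ahead, the CAS and pipeline latency is hidden.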

> Deep pipelining and fifos will probably have the GPU effectively
> processing a few hundred pixels at once, at least.

I don't appear to have been clear.  How *wide* will the GPU be?  How
many pixels will it process in parallel?  Not how many will be in the
pipeline and cache total.  My thought is that if the GPU and the memory
bus are not the same width, you would need a very small cache
between them.  Perhaps this would be a good idea in any case.

The intended width is two pixels.  For simple solid fills, the GPU
won't be able to saturate the memory bandwidth.  If you use a few 3D
features, the total demand will exceed memory bandwidth, and the
arbiter can (by virtue of having many different types of pending
requests queued up) process accesses in bulk, minimizing the impact of
row misses and keeping all memory controllers busy all of the time.
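Rough arithmetic behind the fill claim, assuming the earlier figures (4 pixels = 32 bytes, i.e. 8 bytes per pixel) and a GPU clocked at the memory rate; the real OGP numbers may differ:

```python
# Assumptions (not OGP-confirmed numbers): 8 bytes per pixel, taken from
# the "4 pixels (or 32 bytes)" figure earlier in the thread, and one
# pipelined 32-byte access per clock from memory at its best.

BYTES_PER_PIXEL = 8          # from "4 pixels (or 32 bytes)" above
GPU_WIDTH_PIXELS = 2         # the intended GPU width

gpu_demand = GPU_WIDTH_PIXELS * BYTES_PER_PIXEL   # bytes per clock
mem_supply = 32                                   # bytes per pipelined access

print(f"solid fill demand: {gpu_demand} B/clock vs memory: {mem_supply} B/clock")
```

Under these assumptions a plain fill asks for 16 of 32 bytes per clock, so it leaves bandwidth idle; it's the extra 3D traffic that gives the arbiter enough queued requests to batch accesses and keep the controllers busy.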
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)