On 7/20/06, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
Timothy Miller wrote:
> On 7/19/06, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
>> Timothy Miller wrote:
>> > I suspect you're referring to the fact that you can transfer data on
>> > one bank while another is being precharged or activated.  I did some
>> > calculations on that (years ago), and especially for graphics, the
>> > performance advantage is trivial.  It's not worth the extra logic.
>>
>> IIUC, we are *not* going to use page/row mode read/write access to the
>> DRAM.
>
> I'm not sure what you mean.  Can you tell me how that is different
> from some other mode?

There is page/row mode and there is random.

Oh, like how accesses are reordered for bursts?  I have no idea what
that's for, so I have ignored it.


> We'll open different rows on each bank.  This means we can do
> back-to-back accesses to different objects in different parts of
> memory without row misses.
>
So, we will need to have 4 row registers and comparators to determine a
row hit.  This means that the state machine is going to need to deal
with row misses.  Not really complicated unless we want more than one
thing happening at the same time.

Exactly, so let's just do one thing at a time.
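
For anyone following along, the "4 row registers and comparators" idea can be sketched in a few lines. This is only an illustrative model, not the controller design: the class name, method name and the "hit"/"miss"/"activate" labels are made up for the example, and it handles one access at a time, as agreed above.

```python
class BankRowTracker:
    """Hypothetical model: one open-row register per bank, one compare per access."""

    def __init__(self, num_banks=4):
        # None means the bank has no row open yet (it is precharged/idle).
        self.open_row = [None] * num_banks

    def access(self, bank, row):
        """Classify one access and update the row register for that bank."""
        current = self.open_row[bank]
        if current == row:
            return "hit"          # row already open: read/write can go straight ahead
        self.open_row[bank] = row
        if current is None:
            return "activate"     # bank idle: only an activate is needed
        return "miss"             # wrong row open: precharge, then activate


tracker = BankRowTracker()
print(tracker.access(0, 5))  # first touch of bank 0
print(tracker.access(0, 5))  # same row again
print(tracker.access(0, 6))  # different row in the same bank
```

Only the "miss" case costs the full precharge plus activate sequence, which is why keeping a different row open in each bank helps when accesses jump between objects.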

>>
>> I tend to agree that the logic gets messy and there is little gain.
>>
>> I do have some questions about this.
>>
>> Do we have to do refresh?  Note that you don't on a video card with 4 or
>> 8 MB of memory, you just set up the address lines so that each row gets
>> read once per scan line.  I presume that this isn't going to work with a
>> lot of memory; there are more than 512 rows.
>
> I've considered that, but it seems to me that there's little point in
> my logic to have a counter when the RAM chip already has one.  The
> only way to gain anything would be to keep track of how long ago any
> one row was last accessed, but that's a lot of logic for practically
> no gain.

Actually, I said the wrong thing.  The problem is the 4 banks.  If all
of the screen memory is going to be in one of them, it won't work.  For
it to work with no refresh, you would have to have 1/4 of the screen
memory in each bank.  You have 8K lines in each bank for a total of 32K
lines and to have it work with 480i you would need to access 16 lines
for each scan line.  This doesn't work because you can only figure on
512 lines for the picture.  You could do it differently. Hmmm?  But that
would mean 64 or 128 different memory lines per scan line.  It isn't
going to work very well.

Yeah.  Back in the days when 1/2 your 24MiB framebuffer was used for
video, that could work.  Unfortunately, that won't work in this case.
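
The arithmetic behind the "refresh by scanning out the screen" idea can be written down as a rough check. It uses only the figures quoted above (8K rows per bank, 4 banks, about 512 picture lines) and assumes, as the argument does, that every row must be touched once per frame. The constant names are invented for the example.

```python
# Figures taken from the discussion above; names are illustrative only.
ROWS_PER_BANK = 8 * 1024   # "8K lines in each bank"
BANKS = 4
PICTURE_LINES = 512        # "you can only figure on 512 lines for the picture"

# Rows that each scan line would have to touch in a single bank.
rows_per_scanline_per_bank = ROWS_PER_BANK // PICTURE_LINES

# Rows that each scan line would have to touch across all banks.
rows_per_scanline_total = rows_per_scanline_per_bank * BANKS

print(rows_per_scanline_per_bank, rows_per_scanline_total)
```

That gives 16 rows per scan line in each bank and 64 across the four banks, which is the lower of the two totals mentioned and far more distinct rows than normal scanout would ever open.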


>> Are we going to use evenly distributed refresh, or refresh enable
>> synchronized with the horizontal scan (i.e. refresh on horizontal sync)?
>
> I'd refresh one row so that the whole memory gets refreshed in its max
> refresh period.  Doing one refresh or a couple back to back doesn't
> make much difference.  They're not very frequent.

The only issue is that you don't want refresh of memory to conflict with
refresh of the screen.

I see what you're saying.  Let's see.  Worst case, refresh will take:
read2precharge + precharge2refresh + refresh2activate + activate2read,
which is something like 5 + 3 + 14 + 3 = 25.

Can we stand to have 25 cycles inserted randomly into video?  What is
the typical length of time between a video request and when the data is
needed?  (i.e. how many scanlines per second are most video modes?)
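
A quick way to put numbers on that question, as a sketch only. The four cycle counts are the ones given above. The 200 MHz clock and the rate at which video drains the fifo are assumptions for illustration, and the helper names are made up.

```python
# Cycle counts quoted above for a worst-case refresh.
REFRESH_STEPS = {
    "read_to_precharge": 5,
    "precharge_to_refresh": 3,
    "refresh_to_activate": 14,
    "activate_to_read": 3,
}

CLOCK_MHZ = 200  # assumption: the counts are in 200 MHz memory clock cycles


def stall_cycles(steps):
    """Total cycles the memory is unavailable to video."""
    return sum(steps.values())


def stall_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds."""
    return cycles * 1000 / clock_mhz


def words_needed(cycles, cycles_per_word):
    """Fifo words video consumes during the stall (rounded up).

    cycles_per_word is hypothetical: how many memory cycles pass between
    video taking one word out of the fifo and taking the next.
    """
    return -(-cycles // cycles_per_word)


total = stall_cycles(REFRESH_STEPS)
print(total, stall_ns(total, CLOCK_MHZ))
print(words_needed(total, 16))
```

Under those assumptions the stall is 125 ns, so even a few words of slack in the video fifo would cover it. The real margin depends on the video mode's pixel clock, which is the open question above.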


>>   Either way, I presume that we will need a refresh posted counter since
>> refresh has to wait for the controller to be (forced?) into the idle
>> state.
>
> Yeah.  We'll have a separate counter for that.
>
>> The spec says 400 MHz DDR memory (DDR400B, 200 MHz clock) so I presume
>> that we don't need to be able to adjust CAS latency (it will always be
>> 3).
>
> I would design it to be adjustable, just in case we can get faster CAS
> latency for a high-perf model or something.  Besides, I want a
> generalized design that I can license to others for other designs.

OK, so we want a register to hold a 3 bit code for the number of half
cycles for t(CAS).  This will be used to determine the number of NOP
commands in the memory cycle after CAS and also when DQS becomes valid.

We should handle the read latency as a separate pipeline.  The read
command goes to memory and also sets off a shift register that
triggers at the right time.  It's pipelined, so we really can't do it
any other way.
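
A behavioural sketch of that shift register, to show the idea rather than the actual RTL. It counts whole clock cycles for simplicity, whereas the register proposed above would hold half cycles. The class and method names are made up.

```python
class ReadLatencyPipe:
    """Hypothetical model of a read-valid shift register.

    A 1 is shifted in whenever a read command is issued; when it falls out
    the far end, the data for that read is on the bus.
    """

    def __init__(self, latency_cycles):
        if latency_cycles < 1:
            raise ValueError("latency must be at least one cycle")
        self.stages = [False] * latency_cycles

    def clock(self, read_issued):
        """Advance one cycle; return True if read data is valid this cycle."""
        self.stages.insert(0, read_issued)  # new command enters the first stage
        return self.stages.pop()            # oldest stage falls out the end


pipe = ReadLatencyPipe(3)  # e.g. a latency of 3 cycles
issued = [True, True, False, False, False, False]
print([pipe.clock(cmd) for cmd in issued])
```

Because every issued read has its own bit in flight, back-to-back reads need no extra bookkeeping, and changing the latency only changes the length of the register.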

>> IIUC, screen refresh will read 4 pixels at a time.  Are we going to use
>> any other methods to accelerate reading screen refresh?  Specifically,
>> row/page mode?  That is, will screen refresh read be just another read
>> except that it has the highest priority in bus arbitration.
>
> Basically.  We'll read video data in bulk into a fifo.
>
> What option is there besides page/row mode?  I'm confused about that.
>
I guess that with this type of memory that is the only option.

Well, you said something about 'random', which is how I treat these
memories.  Since we're talking about DDR here, then I always do bursts
of length 2, starting on an even address.

>>
>> I presume that there will be a separate 16 byte read buffer for screen
>> refresh, but that isn't part of the memory controller but it might need
>> to be synced with the refresh.  I would probably use two to make a short
>> FIFO since memory read isn't going to be totally deterministic and you
>> have to read with a burst of 2.
>
> The fifo will hold 512 256-bit words.  That's because it needs to be
> that wide and dual-ported, so the FPGA SRAMs dictate the depth.  In
> effect, you'll be able to read an entire scanline into the fifo.
> Video has the highest priority, so other things will have to wait, so
> no glitches on the screen.

I missed something here.  Is the memory 64 bits wide -- 4 @ 16 bit wide
chips?  A two-read burst gets you 4 pixels.  But you have an 8-pixel-wide
FIFO.

There are 8 chips, 16 bits wide, for a 128-bit bus.  But that bus is
running at 400MHz.  Plus, we always have to do bursts of length 2, so
internally, we just carry around 256-bit data at 200MHz (in the memory
controller--other things produce/consume it more slowly via fifos).
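
The widths work out as follows. This is just the arithmetic from the paragraph above; the 32 bits per pixel is an assumption (it is what makes a 256-bit word hold 8 pixels), and the function name is made up.

```python
def pixels_per_burst(chips, bits_per_chip, burst_length, bits_per_pixel=32):
    """Pixels delivered by one burst; bits_per_pixel=32 is an assumption."""
    bus_bits = chips * bits_per_chip      # width of the external data bus
    word_bits = bus_bits * burst_length   # data carried internally per burst
    return word_bits // bits_per_pixel


# 8 chips x 16 bits = 128-bit bus; burst of 2 = 256 bits.
print(pixels_per_burst(chips=8, bits_per_chip=16, burst_length=2))

# The 4-chip (64-bit) reading of the spec, for comparison.
print(pixels_per_burst(chips=4, bits_per_chip=16, burst_length=2))
```

The first case gives 8 pixels per burst and the second gives 4, which accounts for the two different figures in this thread.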

{This should probably be in the spec since it is something in the specs
of most graphics boards.}

I don't see the advantage of a large FIFO.  You only need one large
enough to deal with the non-deterministic read time of the memory plus a
large safety factor (1.5 to 2x).  SRAM is still expensive in terms of
chip real estate.

There are 96 of them on the 3S4000, and we haven't even scratched the
surface.  If we find that we've overallocated them, we'll change
things around, but for the moment, a 256x512 fifo seems appropriate.
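
To put the fifo size in perspective, here is a rough capacity check. The 512 x 256-bit geometry and the count of 96 block RAMs come from this thread. The 32 bits per pixel and the 512 x 32 data organisation of each block RAM are assumptions for the example.

```python
FIFO_DEPTH = 512        # words, from the thread
FIFO_WIDTH_BITS = 256   # bits per word, from the thread
BITS_PER_PIXEL = 32     # assumption
BRAM_DATA_WIDTH = 32    # assumption: each block RAM used as 512 x 32 data bits
BRAMS_ON_CHIP = 96      # from the thread

fifo_bytes = FIFO_DEPTH * FIFO_WIDTH_BITS // 8
fifo_pixels = FIFO_DEPTH * FIFO_WIDTH_BITS // BITS_PER_PIXEL
brams_used = FIFO_WIDTH_BITS // BRAM_DATA_WIDTH

print(fifo_bytes, fifo_pixels, brams_used, BRAMS_ON_CHIP - brams_used)
```

Under those assumptions the fifo is 16 KiB, holds 4096 pixels (more than a full scanline in common modes), and takes 8 of the 96 block RAMs. The depth of 512 comes for free once the RAMs are ganged side by side to reach the 256-bit width.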

I didn't ask but (perhaps incorrectly) assumed a burst length of 2 for
memory.  Is that correct?  That is what would give 4 pixels per burst.

Yes, but it's 8 pixels, because the bus is 128 bits.