Timothy Miller wrote:
On 7/19/06, James Richard Tyrer <[EMAIL PROTECTED]> wrote:
Timothy Miller wrote:
> I suspect you're referring to the fact that you can transfer data on
> one bank while another is being precharged or activated.  I did some
> calculations on that (years ago), and especially for graphics, the
> performance advantage is trivial.  It's not worth the extra logic.

IIUC, we are *not* going to use page/row mode read/write access to the DRAM.

I'm not sure what you mean.  Can you tell me how that is different
from some other mode?

There is page/row mode and there is random.

We'll open different rows on each bank.  This means we can do
back-to-back accesses to different objects in different parts of
memory without row misses.

So, we will need to have 4 row registers and comparators to determine a row hit. This means that the state machine is going to need to deal with row misses. Not really complicated unless we want more than one thing happening at the same time.
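
A minimal behavioral sketch of the row-hit check described above, assuming one open-row register per bank and a plain equality comparator. The class and method names are hypothetical, and this is software standing in for the real logic, not a proposed implementation:

```python
from typing import List, Optional

NUM_BANKS = 4  # from the thread: 4 row registers, one per bank


class RowTracker:
    """Models the per-bank open-row registers and their comparators."""

    def __init__(self) -> None:
        # None means the bank has no row open yet (precharged).
        self.open_row: List[Optional[int]] = [None] * NUM_BANKS

    def access(self, bank: int, row: int) -> str:
        """Classify an access and update the row register for that bank."""
        current = self.open_row[bank]
        if current == row:
            return "hit"            # row already open: go straight to CAS
        self.open_row[bank] = row
        if current is None:
            return "activate"       # bank idle: activate only
        return "miss"               # wrong row open: precharge, then activate
```

The state machine would only need the extra precharge/activate states on the "miss" and "activate" results; a "hit" goes directly to the column access.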

I tend to agree that the logic gets messy and there is little gain.

I do have some questions about this.

Do we have to do refresh?  Note that you don't on a video card with 4 or
8 MB of memory; you just set up the address lines so that each row gets
read once per scan line.  I presume that this isn't going to work with a
lot of memory, since there are more than 512 rows.

I've considered that, but it seems to me that there's little point in
my logic to have a counter when the RAM chip already has one.  The
only way to gain anything would be to keep track of how long ago any
one row was last accessed, but that's a lot of logic for practically
no gain.

Actually, I said the wrong thing. The problem is the 4 banks. If all of the screen memory is going to be in one of them, it won't work. For it to work with no refresh, you would have to have 1/4 of the screen memory in each bank. You have 8K lines in each bank for a total of 32K lines, and to have it work with 480i you would need to access 16 lines for each scan line. This doesn't work because you can only figure on 512 lines for the picture. You could do it differently. Hmmm? But that would mean 64 or 128 different memory lines per scan line. It isn't going to work very well.

Are we going to use evenly distributed refresh, or refresh enable
synchronized with the horizontal scan (i.e. refresh on horizontal sync)?

I'd refresh one row so that the whole memory gets refreshed in its max
refresh period.  Doing one refresh or a couple back to back doesn't
make much difference.  They're not very frequent.
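
As a rough worked figure for "one row at a time within the max refresh period": the 8K rows per bank and the 200 MHz clock come from this thread, but the 64 ms refresh period is an assumption (a typical value for this class of part, not something stated here), as is the idea that one refresh command covers the same row in all banks.

```python
ROWS_PER_BANK = 8 * 1024     # "8K lines in each bank" (from the thread)
CLOCK_HZ = 200_000_000       # 200 MHz clock (from the thread)
REFRESH_PERIOD_MS = 64       # ASSUMPTION: max refresh period, not stated here


def refresh_interval_cycles(rows: int = ROWS_PER_BANK,
                            period_ms: int = REFRESH_PERIOD_MS,
                            clock_hz: int = CLOCK_HZ) -> int:
    """Clock cycles between evenly distributed refresh commands."""
    cycles_per_period = clock_hz * period_ms // 1000
    return cycles_per_period // rows


def refresh_interval_us(rows: int = ROWS_PER_BANK,
                        period_ms: int = REFRESH_PERIOD_MS) -> float:
    """The same interval expressed in microseconds."""
    return period_ms * 1000 / rows
```

Under those assumptions a refresh is due roughly every 1562 clocks (about 7.8 us), which fits the remark that refreshes are not very frequent.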

The only issue is that you don't want refresh of memory to conflict with refresh of the screen.

Either way, I presume that we will need a refresh posted counter since
refresh has to wait for the controller to be (forced?) into the idle state.

Yeah.  We'll have a separate counter for that.
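
One way such a posted-refresh counter could behave, sketched as a cycle-by-cycle model. The names, the `max_posted` threshold, and the "urgent" flag (standing in for forcing the controller idle) are all hypothetical; the thread does not specify any of them:

```python
class PostedRefresh:
    """Counts refreshes that are due but not yet issued."""

    def __init__(self, interval_cycles: int, max_posted: int = 8) -> None:
        self.interval = interval_cycles  # clocks between refresh requests
        self.max_posted = max_posted     # ASSUMPTION: backlog that triggers urgency
        self.timer = 0
        self.posted = 0

    @property
    def urgent(self) -> bool:
        """True when the backlog is large enough to force the controller idle."""
        return self.posted >= self.max_posted

    def tick(self, controller_idle: bool) -> bool:
        """Advance one clock; return True if a refresh is issued this cycle."""
        self.timer += 1
        if self.timer >= self.interval:
            self.timer = 0
            self.posted += 1             # another refresh is now owed
        if controller_idle and self.posted > 0:
            self.posted -= 1             # pay one back while the bus is free
            return True
        return False
```

The point of the counter is that a refresh request never has to interrupt a transfer in progress; it just waits until the controller is idle, or until the backlog becomes urgent.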

The spec says 400 MHz DDR memory (DDR400B, 200 MHz clock) so I presume
that we don't need to be able to adjust CAS latency (it will always be 3).

I would design it to be adjustable, just in case we can get faster CAS
latency for a high-perf model or something.  Besides, I want a
generalized design that I can license to others for other designs.

OK, so we want a register to hold a 3-bit code for the number of half cycles for t(CAS). This will be used to determine the number of NOP commands in the memory cycle after CAS and also when DQS becomes valid.
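
A small sketch of how the 3-bit half-cycle code could be encoded and decoded. The function names are hypothetical and the mapping (code = latency in half cycles) is just my reading of the sentence above, not an agreed register format:

```python
from typing import Tuple


def encode_cas(latency_cycles: float) -> int:
    """Convert a CAS latency in clocks (e.g. 2, 2.5, 3) to a 3-bit code."""
    half_cycles = latency_cycles * 2
    if half_cycles != int(half_cycles) or not 0 <= half_cycles <= 7:
        raise ValueError("latency must be a multiple of 0.5 between 0 and 3.5")
    return int(half_cycles)


def decode_cas(code: int) -> Tuple[int, bool]:
    """Split a 3-bit code into (whole clocks, extra half clock)."""
    if not 0 <= code <= 7:
        raise ValueError("code must fit in 3 bits")
    whole, half = divmod(code, 2)
    return whole, bool(half)
```

With this encoding CL 3 is stored as 6 and CL 2.5 as 5, so 3 bits are enough for anything up to 3.5 clocks.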

IIUC, screen refresh will read 4 pixels at a time.  Are we going to use
any other methods to accelerate reading screen refresh?  Specifically,
row/page mode?  That is, will screen refresh read be just another read,
except that it has the highest priority in bus arbitration?

Basically.  We'll read video data in bulk into a fifo.

What option is there besides page/row mode?  I'm confused about that.

I guess that with this type of memory that is the only option.

I presume that there will be a separate 16 byte read buffer for screen
refresh, but that isn't part of the memory controller but it might need
to be synced with the refresh.  I would probably use two to make a short
FIFO since memory read isn't going to be totally deterministic and you
have to read with a burst of 2.

The fifo will hold 512 256-bit words.  That's because it needs to be
that wide and dual-ported, so the FPGA SRAMs dictate the depth.  In
effect, you'll be able to read an entire scanline into the fifo.
Video has the highest priority, so other things will have to wait, so
no glitches on the screen.

I missed something here. Is the memory 64 bits wide, i.e. four 16-bit-wide chips? A burst of two reads gets you 4 pixels. But you have an 8-pixel-wide FIFO.

{This should probably be in the spec since it is something in the specs of most graphics boards.}

I don't see the advantage of a large FIFO. You only need one large enough to deal with the non-deterministic read time of the memory plus a large safety factor (1.5 to 2x). SRAM is still expensive in terms of chip real estate.
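
To make the sizing rule concrete, here is a rough sketch of the calculation. The function name and the example numbers in the note below are hypothetical placeholders, not figures from the spec:

```python
import math


def min_fifo_words(worst_case_stall_ns: float,
                   pixel_clock_mhz: float,
                   pixels_per_word: int,
                   safety: float = 2.0) -> int:
    """FIFO words needed to keep the display fed while memory is stalled.

    worst_case_stall_ns: longest time the FIFO may go without a refill
    pixel_clock_mhz:     rate at which the display drains pixels
    pixels_per_word:     pixels held in one FIFO word
    safety:              margin factor (1.5 to 2x was suggested above)
    """
    pixels_drained = worst_case_stall_ns * pixel_clock_mhz / 1000.0
    return math.ceil(pixels_drained * safety / pixels_per_word)
```

For example, with a made-up 500 ns worst-case stall, a 100 MHz pixel clock, and 8 pixels per word, this gives 13 words at a 2x margin.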

I didn't ask but (perhaps incorrectly) assumed a burst length of 2 for memory. Is that correct? That is what would give 4 pixels per burst.
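
The arithmetic behind these questions, written out. The 64-bit bus and burst length of 2 are my assumptions as asked above, and 32 bits per pixel is inferred from "16 bytes = 4 pixels"; only the 512 x 256-bit FIFO figure comes from Timothy's reply:

```python
BUS_WIDTH_BITS = 64    # ASSUMPTION: four 16-bit chips, not confirmed
BURST_LENGTH = 2       # ASSUMPTION: burst of 2, not confirmed
BITS_PER_PIXEL = 32    # ASSUMPTION: inferred from 16 bytes = 4 pixels
FIFO_WORD_BITS = 256   # from the thread
FIFO_DEPTH = 512       # from the thread

bits_per_burst = BUS_WIDTH_BITS * BURST_LENGTH            # 128 bits
pixels_per_burst = bits_per_burst // BITS_PER_PIXEL       # 4 pixels
pixels_per_fifo_word = FIFO_WORD_BITS // BITS_PER_PIXEL   # 8 pixels
bursts_per_fifo_word = FIFO_WORD_BITS // bits_per_burst   # 2 bursts
fifo_capacity_pixels = FIFO_DEPTH * pixels_per_fifo_word  # 4096 pixels
fifo_capacity_bytes = FIFO_DEPTH * FIFO_WORD_BITS // 8    # 16384 bytes
```

So under these assumptions one FIFO word takes two bursts to fill, and the whole FIFO holds 4096 pixels (16 KB), which is consistent with "an entire scanline".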

--
JRT

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
