Re: [Open-graphics] Mad dash to finish VGA by Jan 7 -- who's with me?

Michael Meeuwisse Sun, 09 Dec 2007 13:03:09 -0800


On 9 Dec 2007, at 20:49, Timothy Normand Miller wrote:

On Dec 7, 2007 7:50 PM, Michael Meeuwisse <[EMAIL PROTECTED]> wrote:

Good. But for the record, what is exactly your definition of a fifo
interface? In the 'RFC pipeline' you also mentioned it but I'm not
too sure what a fifo has and what it doesn't have. I get that it's
based on dual port block ram so it can do the whole clock domain
transition.


Click on the "FIFO" button on this page:
http://www.traversaltech.com/download.phtml

This describes an interface for moving data through a chip, with flow
control.  We use it as the interface for dealing with fifos and
computation pipelines.
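
As a rough illustration of what "an interface with flow control" means here, the following Python toy models a bounded queue whose producer is refused when the queue is full and whose consumer gets nothing when it is empty. The class and method names (`Fifo`, `enqueue`, `dequeue`, `full`, `empty`) are hypothetical and are not taken from the linked FIFO document; the real interface is a set of hardware signals.

```python
from collections import deque


class Fifo:
    """Toy software model of a bounded queue with flow control.

    All names are illustrative assumptions, not the signal names
    of the actual hardware interface.
    """

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def full(self):
        return len(self.items) >= self.depth

    @property
    def empty(self):
        return not self.items

    def enqueue(self, word):
        """Accept a word only when there is room; otherwise the producer must retry."""
        if self.full:
            return False
        self.items.append(word)
        return True

    def dequeue(self):
        """Return the oldest word, or None when nothing is available."""
        if self.empty:
            return None
        return self.items.popleft()
```

The point of the model is only that both sides can stall: a full queue pushes back on the producer, and an empty queue holds up the consumer.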

Great. I'm missing descriptions of some of the signals, but the rest
makes sense.

This is somewhat in the spirit of the Wishbone interface used by all
the opencores.org blocks.  I just like this better, because it seems
simpler, is designed for high throughput with short combinatorial
delays, and doesn't incur any latency to start or stop "bursts".

So instead of pushing
addresses into a request fifo (or four), we have a small fifo that
accepts an address and count to transport the request from the video
clock domain into the memory clock domain.  Then these counters appear
as read request agents.  Each memory controller spills its return data
into the correct return fifo, and the video controller dequeues them
at the right time.  (This also implies that the arbiter has four
schedulers.  Oh, and we can't forget the "memory refresh" agent that
is the timer for DRAM refresh.)
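
A minimal Python sketch of the "address and count" idea: one compact request crosses the clock domain, and a counter on the memory side turns it into individual word requests. The number of controllers and the low-order interleaving used to pick a controller are assumptions made for illustration; the thread does not say how addresses map to controllers.

```python
NUM_CONTROLLERS = 4  # assumption: four memory controllers, as "(or four)" hints


def expand_request(start_address, count):
    """Turn one (address, count) request into per-word requests.

    Yields (controller, address) pairs. Picking the controller from the
    low address bits is an assumed interleaving, not the project's scheme.
    """
    for offset in range(count):
        address = start_address + offset
        controller = address % NUM_CONTROLLERS
        yield controller, address
```

For example, `list(expand_request(6, 4))` walks addresses 6 through 9 and spreads them over the four assumed controllers.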


Wouldn't it make sense to let the arbiter know we want a chunk of
memory starting at an address, and let that be translated to the
correct controllers? Same for the memory refresh, a don't-care
problem from the video fifo perspective. Just keep an eye on the
output-valid line from the arbiter.


When it comes to reads, it may or may not help to have that batch-size
metadata.  At SOME point, we have to break things into individual word
requests because that's how the memories and memory controllers work.

Yes. I'm arguing that it's better to do this in the arbiter than in
the agent, because it's easier at a later point to act 'smart'. I
think that figuring out that a dozen addresses can be grouped
together in a single read on a memory controller is much more
expensive than deciding to ungroup a requested block into multiple
reads if it needs to.

If, for some reason, all requests from all agents hit the same memory
row for each access, we could schedule long runs of reads and long
runs of writes, interleaved between different agents, with no penalty.
 There are small penalties for switching between reading and writing.
And there's a rather large penalty for having to change memory rows.
Batching or not, a batch could cross a row boundary, and we might want
the scheduler to do something smart when that happens, say switching
to another agent that happens to be wanting to access the exact same
row.  (We don't have any data on how likely that is, in order to
determine how much logic it's worth spending on it.)
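
To make the row-boundary point concrete, here is a small Python sketch that splits one batched request into runs that each stay inside a single memory row. The function name and the `row_words` parameter are hypothetical; the actual row size depends on the DRAM parts used, which the thread does not state.

```python
def split_at_row_boundaries(start, count, row_words):
    """Split a batch of `count` words starting at `start` into per-row runs.

    Returns a list of (address, length) pairs, none of which crosses a
    row boundary. `row_words` (words per row) is an assumed parameter.
    """
    runs = []
    address = start
    remaining = count
    while remaining > 0:
        room_in_row = row_words - (address % row_words)
        length = min(room_in_row, remaining)
        runs.append((address, length))
        address += length
        remaining -= length
    return runs
```

A scheduler could use each break between runs as the moment to consider switching to another agent, since a row change is about to be paid for anyway.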


For video, due to latencies, we need to make requests well in advance.
 Then the data has to be sitting there in a queue, waiting for us to
pull it out at EXACTLY the right time.

My idea was to send out a request for one fifo the moment it runs out
of data and another fifo starts supplying data. The arbiter will have
time for as long as the other fifo can provide data. We can put the
address we want (and the block size somehow, say, another queue) in a
queue from the arbiter, and the arbiter can write data back to us as
if we were a fifo. Internally, we'd pass it on to the correct fifo
(this is all in the arbiter's clock domain).

The tricky part is that the fifos will not be very big. There's only
216KB of block ram available, so say that we take for each fifo two
blocks of 18Kbit. In our highest target resolution (2048 * 1600 * 24,
60Hz) the raster scanner will work through 160,000 full fifos per
second. To get these all filled in time will become quite a strain on
the arbiter.
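
A back-of-the-envelope check of that refill rate, as a Python sketch. It assumes all 18 Kbit of each block RAM hold pixel data, that 1 Kbit is 1024 bits, and that blanking intervals are ignored; none of those assumptions come from the thread, so the result should only be read as an order of magnitude.

```python
WIDTH, HEIGHT = 2048, 1600      # highest target resolution
BITS_PER_PIXEL = 24
REFRESH_HZ = 60
FIFO_BITS = 2 * 18 * 1024       # two 18 Kbit block RAMs per fifo (assumed fully usable)

# Raw pixel bandwidth, ignoring horizontal and vertical blanking.
bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * REFRESH_HZ

# How many times per second a fifo of that size is drained completely.
fills_per_second = bits_per_second / FIFO_BITS

print(f"{bits_per_second / 1e9:.2f} Gbit/s, {fills_per_second:,.0f} fifo fills per second")
```

Under these assumptions the rate comes out at well over a hundred thousand fills per second, the same order of magnitude as the figure quoted above; the exact number shifts with how much of each block RAM is usable for data.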

So the idea is roughly like so;

[snip your description of how video works]

You have the general idea.  I'll describe our specifics.

We have a memory controller whose function you can look up.  What
matters here is that, for a given scanline on the display, it makes
the memory request for the corresponding data one scanline in advance.
 That request is broken down into individual word requests and handed
to the memory controllers.  When the data comes back, it's put into a
queue.  At the time when the video rasterscan needs the data, it's
pulled from the queue.

I'm not sure how (if at all) this differs from my description. The
only point I'm making is that the queue the data sits in is in fact
part of the agent. For the addresses going to the memory controllers;
this is all arbiter talk, which sits between us agents and the
controllers. When the data comes back the arbiter kept track of what
address was associated with this data and passes it on to the relevant
agent. Interesting, but not relevant for the video fifo. :)
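
One way to picture the bookkeeping described here is the Python sketch below: the arbiter remembers which agent issued each request and uses that to route returning data. It assumes the memory returns data in the order requests were issued, which the thread does not confirm; the class and method names are hypothetical.

```python
from collections import deque


class Arbiter:
    """Toy model: route returned words back to the agent that asked for them.

    Assumes in-order return of data from memory (an assumption).
    """

    def __init__(self):
        self.pending = deque()   # agent names, in request issue order
        self.delivered = {}      # agent name -> list of words received

    def request(self, agent, address):
        """Record who asked, then forward the address to the memory side."""
        self.pending.append(agent)
        return address

    def data_back(self, word):
        """Hand a returned word to the agent at the head of the pending list."""
        agent = self.pending.popleft()
        self.delivered.setdefault(agent, []).append(word)
        return agent
```

From the video fifo's point of view none of this is visible: it issues requests and later finds its words in its own queue.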

Our video controller is a programmable state machine that can be
programmed to do a wide variety of different video modes.  The same
download page I mentioned earlier
(http://www.traversaltech.com/download.phtml) has a link to its
documentation.  Likewise, the queues I describe are linked to from
there.


I'm going to look into it.

Read requests are, effectively or literally, made by putting addresses
into one queue.  The data comes back through another.

Agreed. Does this queue have data relating to the number of bits we
want from that address? Or will we make another queue for that? Or is
it predefined (which is nasty, as I tried to explain earlier).

Note that there are no tristates inside of an FPGA.  (Well, there
could hypothetically be, but we never use them.)


You mean my inout usage?


--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

A final thing to add: I mentioned sending a signal a cycle early. I
essentially meant the 'empty' from the fifo, only a clock early. This
way we can switch between the fifos driving the bus to the raster
scanner without the raster scanner ever knowing.
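
A cycle-by-cycle Python sketch of why the early signal matters. It models a bus select that can only change on the next cycle: with an ordinary 'empty' the switch is noticed one cycle too late and the bus carries a gap, while an 'empty' raised together with the last word lets the other fifo take over with no gap. The function name and the one-cycle select delay are modelling assumptions, not a description of the actual design.

```python
from collections import deque


def drive_bus(words_a, words_b, early_empty):
    """Return what the raster scanner sees on the bus, one entry per cycle.

    None marks a cycle with no valid data (a bubble).
    """
    fifos = [deque(words_a), deque(words_b)]
    select = 0
    bus = []
    for _ in range(len(words_a) + len(words_b) + 1):
        source = fifos[select]
        was_empty = not source
        bus.append(None if was_empty else source.popleft())
        if early_empty:
            # Asserted in the same cycle the last word leaves the fifo.
            switch = (not was_empty) and (not source)
        else:
            # Ordinary 'empty': only seen once a read finds nothing.
            switch = was_empty
        if switch and select == 0:
            select = 1  # takes effect on the next cycle
    return bus
```

With two words in each fifo, the early variant delivers the four words on four consecutive cycles, whereas the ordinary variant leaves a one-cycle hole between the second and third word.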


Mike
www.projectvga.org



_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
