Timothy Normand Miller wrote:
As I'm sure most of you are aware, we're testing OGD1 by putting a
semi-complete design into it with PCI, video, memory controller, etc.

We've run into a challenge with video, and we could use some
brain-storming help to solve it.

The problem has to do with async fifos.  Check out the existing designs:

https://svn.suug.ch/repos/opengraphics/main/trunk/rtl/fifos/

The fifos of interest are the "async" fifos, which have head and tail
ends at different clock rates, and fifo_DxW.v, which uses one clock
domain and can be mapped to one or more of the large block RAMs on the
chip.

I can give more specific detail later, but the bottom line is that we
need a fast 512-entry async fifo.  That is, the head end needs to run
at 200MHz.  The problem is that we can't both compare two 9-bit
addresses and use that as a control to increment a 9-bit address in
5ns.

So, I'm looking for more novel approaches.

For a frame of reference, the way the 16-entry async fifo works is as
follows:  There are gray-code head and tail pointers.  When something
is enqueued, the tail pointer is "advanced".  We can determine if the
fifo has entries in it by retiming the tail pointer into the head
clock domain and comparing them.  If they differ, the fifo contains
something, and we can dequeue.
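
The gray-code comparison described above can be sketched in software as follows. This is only a behavioural illustration in Python, not the RTL in the repository; the function names and the 16-entry depth constant are assumptions made for the example. Gray code is used because exactly one bit changes per increment, so a pointer sampled in the other clock domain is either the old or the new value.

```python
DEPTH = 16  # assumed: matches the 16-entry fifo mentioned above


def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a reflected gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def fifo_not_empty(head_gray: int, tail_gray_retimed: int) -> bool:
    """Head-side check: any difference means at least one entry is queued."""
    return head_gray != tail_gray_retimed


# Successive pointer values differ in one bit, including the wrap-around.
codes = [bin_to_gray(i) for i in range(DEPTH)]
single_bit_steps = all(
    bits_changed(codes[i], codes[(i + 1) % DEPTH]) == 1 for i in range(DEPTH)
)
```

Note that the comparison is a full-width equality test, which is the part that becomes expensive when the pointers grow from 4 bits to 9 bits.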

So, what we need are two independent head and tail pointers, each in
its own clock domain.  On the write end, we need to know if the fifo
is full or not.  On the read end, we need to know if an entry in the
RAM is new data (valid) or old data (the fifo is empty).  One idea
I've thought of is to encode validity info into the fifo data itself,
but it's not fully fleshed out.

Thoughts?

512 words seems like a lot.  Is there a reason we need this much?

In any case, a real FIFO buffer (a shift register) is always the fastest as far as transfer rate is concerned. The problem with a real FIFO that long is its latency of 512 clocks.

You can shorten the latency of a real FIFO by using multiple shift registers in parallel, e.g. 4 of 128 entries each. They can be striped or used in sequence. That is still a latency of 128 clocks, which might not be acceptable when the registers are empty. You can keep using more, smaller shift registers until you get to 512 of 1 entry each, which is just registers/memory, and at that point there is no difference between striped and sequential use.
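
To put numbers on the trade-off above, here is a small Python sketch. It assumes each shift register advances one stage per clock, so a word entering an empty register needs as many clocks as the register has stages; the function name is made up for the example.

```python
TOTAL_WORDS = 512  # total capacity discussed in this thread


def fall_through_latency(total_words: int, lanes: int) -> int:
    """Clocks for a word to pass through one of `lanes` equal shift registers.

    Assumption: one stage per clock, capacity split evenly across lanes.
    """
    if total_words % lanes != 0:
        raise ValueError("lanes must divide the total capacity evenly")
    return total_words // lanes


for lanes in (1, 4, 64, 512):
    print(lanes, "x", TOTAL_WORDS // lanes, "->",
          fall_through_latency(TOTAL_WORDS, lanes), "clocks")
```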

This might work.

A real FIFO uses an extra bit which is set when data is written and cleared when it is read. Perhaps this would work with a memory-based FIFO. If you had a "dirty" bit, then you wouldn't need to compare addresses to determine if the FIFO was empty. If the memory location at the tail pointer had the dirty bit set, then it would read out that memory location, clear the dirty bit and increment the counter. The head pointer would avoid overrun by not writing to a memory location until the dirty bit was clear.

In theory, the above will work if at reset all dirty bits are cleared and the head and tail pointers are both set to the same value (normally 0). There is an issue of what would happen if this manages to get out of sync due to a stray cosmic ray, which might need to be addressed. If the FIFO empties, this is simple: when all dirty bits are clear, the two counters are reset to 0.
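
Here is a rough behavioural model of the dirty-bit idea in Python, just to check the logic. The class and method names are invented for the sketch, and it uses neutral "write pointer" and "read pointer" names rather than head and tail. It is single-threaded and says nothing about how a bit that is set in one clock domain and cleared in the other would actually be built or retimed in hardware, which is likely the hard part.

```python
class DirtyBitFifo:
    """Software model only: no clocks, no metastability, no block RAM."""

    def __init__(self, depth: int = 512):
        self.depth = depth
        self.data = [0] * depth
        self.dirty = [False] * depth  # all dirty bits cleared at reset
        self.wr = 0                   # write pointer, reset to 0
        self.rd = 0                   # read pointer, reset to 0

    def write(self, value: int) -> bool:
        """Store value unless the slot still holds unread data (fifo full)."""
        if self.dirty[self.wr]:
            return False
        self.data[self.wr] = value
        self.dirty[self.wr] = True
        self.wr = (self.wr + 1) % self.depth
        return True

    def read(self):
        """Return (True, value) if the slot is dirty, else (False, None)."""
        if not self.dirty[self.rd]:
            return (False, None)
        value = self.data[self.rd]
        self.dirty[self.rd] = False
        self.rd = (self.rd + 1) % self.depth
        return (True, value)

    def resync_if_empty(self) -> bool:
        """Recovery step from the text: with no dirty bits, zero both pointers."""
        if any(self.dirty):
            return False
        self.wr = 0
        self.rd = 0
        return True


fifo = DirtyBitFifo(depth=4)
accepted = [fifo.write(v) for v in (10, 11, 12, 13, 14)]  # fifth write is refused
drained = [fifo.read() for _ in range(5)]                 # fifth read finds it empty
```

Neither side compares pointers here: each one only looks at the dirty bit of the slot its own pointer addresses.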

--
JRT
