Hi All,

About me: I work in Cambridge, UK, for a small firm which designs DSPs. I have a fair amount of RTL experience, especially with asynchronous memory interface design.
I have been lurking with intent on this list for a while ;-) I have little free time to help develop, but perhaps I can offer advice now and then.

Anyway, an excellent resource for any ASIC engineer lives at:

http://www.sunburst-design.com/papers/

All you ever needed to know about async FIFO design is in the paper "Simulation and Synthesis Techniques for Asynchronous FIFO Design". There is also a follow-up paper which discusses using direct (unsynchronised) gray-coded pointer comparison. (I have not really looked at this one, but apparently a benefit of the style it discusses is faster FPGA performance...)

OK, let me offer my two pennies on the metastability/cross-clock-domain stuff.

In my experience, metastability is not really an issue. Sure, if the data into your flip-flop transitions during the setup/hold window then the FF can become metastable. But what happens then? Well, the output will be indeterminate for a time, but in today's process geometries (180nm and smaller) this 'metastable time' is exceptionally short. (OK, so that is not *quite* true: the time taken for the output of the flop to become stable again is of course not fixed. It varies, with a certain probability of being stable at any given time after the clock edge, but in reality it is going to be very short.) I have seen a good paper on this, in which the author actually measures the time taken for the FF outputs to stabilise (on a .18u process), but frustratingly I can't remember where to find it!

To sum up, the age-old solution of double-registering any input signal that comes from another clock domain before using it will be ample to stop metastability propagating to the rest of the system.

The big issue with multiple clock domains is not metastability, but passing signals across clock domains, and how to do it without causing "synchronisation hazards". A synchronisation hazard is best illustrated with an example.
Let's say we have two clock domains, clk1 and clk2, and we want to pass some kind of information from one to the other. Say we have signals A and B which are generated in domain clk1: A carries the information for kicking off a state machine in domain 2, and B carries some kind of mode-state info for domain 2. We assume some handshaking goes on so that the signal start_machine is only kept high until the clock domain 2 state machine has started.

    reg A;
    reg B;
    reg A_meta;
    reg B_meta;
    reg A_sync;
    reg B_sync;
    .
    .
    .
    always @(posedge clk1 or negedge reset_n) begin
      if (!reset_n) begin
        // async reset of registers...
        A <= 0;
        B <= 0;
      end else begin
        A <= start_machine;
        B <= start_machine & machine_mode;
      end
    end

    // re-sync
    always @(posedge clk2 or negedge reset_n) begin
      if (!reset_n) begin
        // async reset of registers...
        A_meta <= 0;
        B_meta <= 0;
        A_sync <= 0;
        B_sync <= 0;
      end else begin
        A_sync <= A_meta;
        A_meta <= A;
        B_sync <= B_meta;
        B_meta <= B;
      end
    end

    // State machine block...
    reg next_state;  // NB: size this to match this_state
    always @(this_state or A_sync or B_sync or ...) begin
      next_state = this_state;  // default case
      if (this_state == `cIDLE)
        if (A_sync) begin
          if (B_sync) begin
            next_state = `cSTART_MACHINE_MODE1;
          end else begin
            next_state = `cSTART_MACHINE_MODE0;
          end
        end else ...
      .
      .
      .
    end

So the "synchronisation hazard" with this code is that if A and B change simultaneously in domain 1 during the setup/hold window of the re-sync FFs A_meta and B_meta, then either A_meta or B_meta may propagate its signal a cycle 'late'. This is not an issue if A_sync propagates late, as the state machine start will just be delayed by one cycle. However, if B_sync propagates late, the state machine will see A_sync high while B_sync is still low, and will go to the wrong state. The way to work around this problem is either to accept some latency and give A an extra synchronising FF so that A_sync always changes after B_sync, or to pass a gray-coded signal across the domain instead.
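A minimal sketch of the first workaround (the extra flop on A) might look like the following. It assumes the same A, B, clk2 and reset_n as in the example; the module name and the A_sync_late output are made up purely for illustration.

```verilog
// Sketch only: re-sync block with one extra flop on A, so that the
// state machine never acts on A before B has had time to arrive.
// The module name and A_sync_late are hypothetical names, not part
// of the original example.
module resync_a_after_b (
    input  wire clk2,
    input  wire reset_n,
    input  wire A,            // generated in domain clk1
    input  wire B,            // generated in domain clk1
    output reg  A_sync_late,  // test this instead of A_sync
    output reg  B_sync
);

  reg A_meta;
  reg B_meta;
  reg A_sync;

  always @(posedge clk2 or negedge reset_n) begin
    if (!reset_n) begin
      // async reset of registers
      A_meta      <= 1'b0;
      A_sync      <= 1'b0;
      A_sync_late <= 1'b0;
      B_meta      <= 1'b0;
      B_sync      <= 1'b0;
    end else begin
      // B: the usual two-flop synchroniser
      B_meta <= B;
      B_sync <= B_meta;

      // A: two-flop synchroniser plus one extra stage of delay
      A_meta      <= A;
      A_sync      <= A_meta;
      A_sync_late <= A_sync;
    end
  end

endmodule
```

The state machine would then test A_sync_late rather than A_sync. Even if B is captured a cycle late and A is captured on time, B_sync has settled by the time A_sync_late rises, at the cost of one more clk2 cycle of latency. This only holds if B stays stable for as long as the state machine can still sample A_sync_late, which is what the handshaking on start_machine is assumed to guarantee.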
Unfortunately, most of the time in practice (in an FPGA or ASIC) this code will likely work (depending on how it has been synthesized). And unfortunately, in simulation it will *always* work, making these kinds of bugs a real problem to find.

Perhaps that wasn't the simplest example, but it does illustrate what I am getting at. One must be *very* careful about _when_ signals are passed across, given how they will be interpreted at the receiving end.

NOTE: a good way to handshake information back and forth across clock domains is to use a 'toggle' system. Basically the 'sender' and 'receiver' domains both keep a flag register. These registers are normally equal (either both zero or both one). When the sender wants to signal an event to the receiver, it flips the state of its flag. The receiver sees this flip (it sees the XOR of its copy of the flag and the re-synchronised version of the sender's flag), and to acknowledge that it has received the event it toggles its own copy of the flag. The sender 'receives' the acknowledge when it sees that both copies of the flag are once again equal.

Phew, that took a few more words than I expected! I hope it made some vague sense, and was useful to some (I'm sure it was probably old news to others who have worked on multi-clock-domain designs before).

Good luck with this project - count me in for a board when they get to production!

And just for the record, I completely agree with Tim on the business side of things - the project must make money, and the RTL cannot be released until such time as it becomes economically sensible (e.g. after version #2 is out etc.)

James.

On 6/11/05, Timothy Miller <[EMAIL PROTECTED]> wrote:
> On 6/11/05, Attila Kinali <[EMAIL PROTECTED]> wrote:
> > On Sat, 11 Jun 2005 09:53:10 -0400
> > Timothy Miller <[EMAIL PROTECTED]> wrote:
> >
> > > You're right that this is an issue, but the only things ever passed
> > > across are fifo heads and tails.
> > > This does cause latency, but the
> > > only one that we could improve is the first write to the fifo. All
> > > subsequent ones are going to have 'gaps' in them where the data seems
> > > to change in jumps.
> >
> > > > How about using a req/ack system?
> > > [...]
> > > This is kinda like what I posted, except that it adjusts timing based
> > > on input data. I didn't bother, because I consider the input data
> > > change to be random also. Perhaps we could detect non-change and halt
> > > the ring until we detect change.
> > [...]
> > > There's always room for improvement. Here's the real problem: Our
> > > usual fifo is only 16 entries. The latency of the sync block can be
> > > quite high. Let's say it's nine. That means we fill an empty fifo to
> > > 16 entries, then wait a lot for the tail to propagate to the receiver.
> > > The receiver then starts draining, then we have to wait another nine
> > > cycles for the head to get back to the sender. That makes a 16-entry
> > > fifo very inefficient. Two solutions include finding a better sync
> > > block and increasing the fifo size to 32 entries. Your solution of
> > > controlling the ring based on change only helps for the first write to
> > > the fifo.
> >
> > Hmm.. I'm lacking some information here.
> > What do you want to pass here? And between which clock domains?
> > And which FIFO are we talking about?
> > (alternatively you can point me to some discussion I've missed)
>
> Ok, we want to communicate data between clock domains. In order to
> streamline things, we like to use fifos for buffering, etc. It turns
> out that everything we want to pass between clock domains is exactly
> the sort of thing we'd like to buffer in fifos. Communicating a
> single piece of data between domains is problematic because of
> metastability issues, so we need to do things like hold data constant
> in one domain before clocking it in the other, which means we cannot
> pass data quickly or stream it without data loss.
> The solution to this is to queue data in a RAM and just sync up the
> head and tail pointers. Since there's a delay on those numbers, the
> writer always thinks there's less free space and the reader always
> thinks there are fewer available entries than there really are, which
> adds latency but does not suffer from any data loss in the actual data
> being streamed.
>
> The sorts of things that need to cross clock domains include things
> like stuff going to/from the PCI controller, or between the video
> controller and memory, etc. Our four clock domains are the host
> interface, memory, GPU engine, and video. All of the stuff going back
> and forth between them is streaming data that goes into fifos anyhow.
>
> Oh, that reminds me... I'm not going to try to fix the metastability
> problem for engine register readbacks. Those readbacks are for
> debugging only. Either wait until it's idle, or single-step the
> engine (assuming I implement that capability).
>
> Sorry for the rambling. Long day, need nap. :)
>
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
