Re: [Open-graphics] Safely passing data between clock domains

James Adams Sun, 12 Jun 2005 06:57:45 -0700

Hi All,

About me: 
I work in Cambridge, UK, for a small firm which designs DSPs. I have a
fair amount of RTL experience - especially with asynchronous memory
interface design.

I have been lurking with intent on this list for a while ;-) I have
little free time to help develop but perhaps can offer advice now and
then.

Anyway, an excellent resource for any ASIC engineer lives at: 
http://www.sunburst-design.com/papers/
All you ever needed to know about async. FIFO design is in the paper:  
"Simulation and Synthesis Techniques for Asynchronous FIFO Design"
There is also a follow-up paper which discusses using direct
(unsynchronised) gray coded pointer comparison (I have not really
looked at this one; but apparently a benefit of the discussed style is
faster FPGA performance...)
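
For anyone who has not met gray-coded pointers before, here is a minimal sketch of the idea (my own illustration, not code from the papers; the module name gray_ptr, its ports and the WIDTH parameter are all made up). The pointer counts in binary locally, and a registered gray-coded copy is what gets double-registered into the other domain. Because only one bit changes per increment, a sample taken mid-change yields either the old or the new pointer value, never a garbage one.

```verilog
// Sketch only: names and parameter are hypothetical, not from the papers.
module gray_ptr #(
  parameter WIDTH = 5              // e.g. 4 address bits plus a wrap bit
) (
  input  wire             clk,
  input  wire             reset_n,
  input  wire             inc,     // advance the pointer by one
  output reg  [WIDTH-1:0] bin,     // binary count, used locally to address the RAM
  output reg  [WIDTH-1:0] gray     // gray-coded copy, the only thing that crosses domains
);
  // Next binary value, and its gray-coded equivalent
  wire [WIDTH-1:0] bin_next  = bin + {{(WIDTH-1){1'b0}}, inc};
  wire [WIDTH-1:0] gray_next = bin_next ^ (bin_next >> 1);

  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) begin
      bin  <= {WIDTH{1'b0}};
      gray <= {WIDTH{1'b0}};
    end else begin
      bin  <= bin_next;
      gray <= gray_next;   // registered, so it changes glitch-free, one bit at a time
    end
  end
endmodule
```

The gray output is registered on purpose: if it were decoded combinationally from bin, glitches could reach the re-sync flops in the other domain.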

Ok, let me offer my two pennies on the metastability/cross clock domain stuff.

In my experience, metastability itself is not really an issue.
Sure, if the data into your flip-flop transitions during the
setup/hold window then the FF can become metastable. But what happens
then? Well, the output will be indeterminate for a time; but in
today's process geometries (180nm and smaller) this 'metastable time'
is exceptionally short. (OK, so that is not *quite* true: the time
taken for the output of the flop to become stable again is of course
not fixed. There is some probability that it is still unresolved at
any given time after the clock edge, but that probability falls off
very quickly, so in reality the time is going to be very short.) I
have seen a good paper on this, in which the author actually measures
the time taken for the FF outputs to stabilise (on a .18u process),
but frustratingly I can't remember where to find it!
To sum up, the age-old solution of double-registering any input signal
that comes from another clock domain before using it will be ample to
stop metastability propagating to the rest of the system.

The big issue with multiple clock domains is not metastability, but
passing signals across clock domains, and how to do it without causing
"synchronisation hazards". A synchronisation hazard is best
illustrated with an example.

Let's say we have two clock domains, clk1 and clk2, and we want to pass
some information across the boundary. Say we have signals A and B,
both generated in domain clk1: A kicks off a state machine in domain
clk2, and B carries mode information for that state machine.
We assume some handshaking goes on so that the signal start_machine is
only kept high until the state machine in clock domain 2 has started.

reg A;
reg B;
reg A_meta;
reg B_meta;
reg A_sync;
reg B_sync;
.
.
.
always @(posedge clk1 or negedge reset_n) begin
  if(!reset_n) begin
    // async reset of registers...    
    A <= 0;
    B <= 0;
  end else begin
    A <= start_machine;
    B <= start_machine & machine_mode;
  end
end

// re-sync
always @(posedge clk2 or negedge reset_n) begin
  if(!reset_n) begin
    // async reset of registers...
    A_meta <= 0;
    B_meta <= 0;
    A_sync <= 0;
    B_sync <= 0;    
  end else begin
    A_sync <= A_meta;
    A_meta <= A;
    B_sync <= B_meta;
    B_meta <= B;
  end
end

// State machine block...
reg next_state; // NB: in a real design this must be as wide as this_state
always @(this_state or A_sync or B_sync or ...) begin
  next_state = this_state; // default case
  if (this_state == `cIDLE) begin
    if (A_sync) begin
      // A_sync means "start"; B_sync selects the mode
      if (B_sync) begin
        next_state = `cSTART_MACHINE_MODE1;
      end else begin
        next_state = `cSTART_MACHINE_MODE0;
      end
    end
  end else ...
  .
  . 
  .
end

So the "synchronisation hazard" with this code is that if A and B
change simultaneously in domain 1 during the setup/hold window of the
re-sync FFs A_meta and B_meta, then either A_meta or B_meta may
capture the change a cycle 'late'. This is not an issue if A_sync is
the late one, as the state machine will just start one cycle later.
However, if B_sync is the late one, then A_sync goes high a cycle
before B_sync and the state machine will go to the wrong state
(MODE0 instead of MODE1).

The way to work around this problem is either to accept some latency
and add an extra FF to the A synchroniser, so that A_sync always
changes after B_sync, or to pass a gray-coded signal across the domain
instead (so that only one bit changes at a time).
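
A rough sketch of the first option, written as a stand-alone module so it is complete (the module name resync_ab and the output name A_late are made up; A, B, the _meta/_sync registers, clk2 and reset_n are as in the example above). It assumes A and B are launched on the same clk1 edge and that any routing skew between them is well under one clk2 period.

```verilog
// Sketch only: resync_ab and A_late are hypothetical names.
module resync_ab (
  input  wire clk2,
  input  wire reset_n,
  input  wire A,        // from the clk1 domain
  input  wire B,        // from the clk1 domain
  output reg  A_late,   // A after three flops: use this instead of A_sync
  output reg  B_sync    // B after the usual two flops
);
  reg A_meta, A_sync, B_meta;

  always @(posedge clk2 or negedge reset_n) begin
    if (!reset_n) begin
      A_meta <= 0;
      A_sync <= 0;
      A_late <= 0;
      B_meta <= 0;
      B_sync <= 0;
    end else begin
      A_meta <= A;
      A_sync <= A_meta;
      A_late <= A_sync;   // extra stage: costs one clk2 cycle of latency
      B_meta <= B;
      B_sync <= B_meta;
    end
  end
endmodule
```

Even if B is captured a cycle late and A is not, B_sync is valid no later than the clk2 edge on which A_late rises, so the state machine tests A_late and B_sync together and always sees the right mode. The falling edges can still arrive in either order, which is harmless only because of the handshaking assumption that the machine has already left idle by then.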

Unfortunately, most of the time in practice (in an FPGA or ASIC) this
code will likely work (depending on how it has been synthesized). And
unfortunately, in simulation it will *always* work, which makes these
kinds of bugs a real problem to find.

Perhaps that wasn't the simplest example - but it does illustrate what
I am getting at. One must be *very* careful about _when_ signals are
passed across, given how they will be interpreted at the receiving
end.
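
The gray-coded alternative mentioned above could look something like the sketch below (again just an illustration; the module names, cmd and the start_mode0/start_mode1 outputs are made up). Strictly it is a "one bit changes per legal transition" encoding rather than a counting gray code: idle is 00, "start in mode 0" is 01 and "start in mode 1" is 10. It assumes machine_mode is held stable while start_machine is high, otherwise 01 -> 10 would change two bits at once.

```verilog
// Sketch only: all names except clk1, clk2, reset_n, start_machine
// and machine_mode are hypothetical.
module start_cmd_encode (
  input  wire       clk1,
  input  wire       reset_n,
  input  wire       start_machine,
  input  wire       machine_mode,   // assumed stable while start_machine is high
  output reg  [1:0] cmd             // 00 idle, 01 start mode 0, 10 start mode 1
);
  always @(posedge clk1 or negedge reset_n) begin
    if (!reset_n)
      cmd <= 2'b00;
    else
      cmd <= {start_machine &  machine_mode,
              start_machine & ~machine_mode};
  end
endmodule

module start_cmd_decode (
  input  wire       clk2,
  input  wire       reset_n,
  input  wire [1:0] cmd,            // from the clk1 domain
  output wire       start_mode0,
  output wire       start_mode1
);
  reg [1:0] cmd_meta, cmd_sync;

  // Ordinary two-flop re-sync on each bit
  always @(posedge clk2 or negedge reset_n) begin
    if (!reset_n) begin
      cmd_meta <= 2'b00;
      cmd_sync <= 2'b00;
    end else begin
      cmd_meta <= cmd;
      cmd_sync <= cmd_meta;
    end
  end

  assign start_mode0 = cmd_sync[0];
  assign start_mode1 = cmd_sync[1];
endmodule
```

Since only one bit moves on any legal transition, a flop that captures late can only delay the start by a cycle; it cannot make the receiver see the wrong mode.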

NOTE: a good way to handshake information back and forth across clock
domains is to use a 'toggle' system. Basically the 'sender' and
'receiver' domains both keep a flag register. These registers are
normally equal (either both zero or both one). When the sender wants
to signal an event to the receiver it flips the state of its flag. The
receiver sees this flip (it sees the XOR of its copy of the flag and
the re-synchronised version of the sender's flag) and, to acknowledge
that it has received the event, it toggles its own copy of the flag.
The sender 'receives' the acknowledge when it sees (via its own
re-synchronised copy of the receiver's flag) that both copies of the
flag are once again equal.
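
In Verilog the toggle scheme might be sketched roughly as follows (all names here, toggle_handshake, clk_s, clk_r, send, busy and event_r, are invented for the illustration; a real design would hang its own logic off event_r and busy). This version assumes the receiver acknowledges as soon as it sees the event.

```verilog
// Sketch only: every name in this module is hypothetical.
module toggle_handshake (
  input  wire clk_s,      // sender clock
  input  wire clk_r,      // receiver clock
  input  wire reset_n,
  input  wire send,       // sender requests an event (ignored while busy)
  output wire busy,       // sender side: event sent, not yet acknowledged
  output wire event_r     // receiver side: high for one clk_r cycle per event
);
  reg flag_s;             // sender's flag
  reg flag_r;             // receiver's flag
  reg s_meta, s_sync;     // flag_s re-synchronised into clk_r
  reg r_meta, r_sync;     // flag_r re-synchronised into clk_s

  // Sender domain
  always @(posedge clk_s or negedge reset_n) begin
    if (!reset_n) begin
      flag_s <= 0;
      r_meta <= 0;
      r_sync <= 0;
    end else begin
      r_meta <= flag_r;
      r_sync <= r_meta;
      if (send && !busy)
        flag_s <= ~flag_s;        // flip the flag to signal an event
    end
  end
  assign busy = flag_s ^ r_sync;  // flags differ until the ack comes back

  // Receiver domain
  always @(posedge clk_r or negedge reset_n) begin
    if (!reset_n) begin
      flag_r <= 0;
      s_meta <= 0;
      s_sync <= 0;
    end else begin
      s_meta <= flag_s;
      s_sync <= s_meta;
      if (event_r)
        flag_r <= ~flag_r;        // toggle our copy to acknowledge
    end
  end
  assign event_r = flag_r ^ s_sync;  // flags differ: a new event has arrived
endmodule
```

Only a single bit crosses in each direction, so there is nothing to get out of step, and the sender cannot issue a second event until the first one has been seen and acknowledged.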

Phew, that took a few more words than I expected! I hope it made some
vague sense, and was useful to some (I'm sure it was probably old news
to others who have worked on multi-clock-domain designs before).

Good luck with this project - count me in for a board when they get to
production! And just for the record I completely agree with Tim on the
business side of things - the project must make money, and the RTL
cannot be released until such time as it becomes economically sensible
(e.g. after version #2 is out etc.)

James.

On 6/11/05, Timothy Miller <[EMAIL PROTECTED]> wrote:
> On 6/11/05, Attila Kinali <[EMAIL PROTECTED]> wrote:
> > On Sat, 11 Jun 2005 09:53:10 -0400
> > Timothy Miller <[EMAIL PROTECTED]> wrote:
> >
> >
> > > You're right that this is an issue, but the only things ever passed
> > > across are fifo heads and tails.  This does cause latency, but the
> > > only one that we could improve is the first write to the fifo.  All
> > > subsequent ones are going to have 'gaps' in them where the data seems
> > > to change in jumps.
> >
> > > > How about using an req/ack system ?
> > > [...]
> > > This is kinda like what I posted, except that it adjusts timing based
> > > on input data.  I didn't bother, because I consider the input data
> > > change to be random also.  Perhaps we could detect non-change and halt
> > > the ring until we detect change.
> > [...]
> > > There's always room for improvement.  Here's the real problem:  Our
> > > usual fifo is only 16 entries.  The latency of the sync block can be
> > > quite high.  Let's say it's nine.  That means we fill an empty fifo to
> > > 16 entries, then wait a lot for the tail to propagate to the receiver.
> > >  The receiver then starts draining, then we have to wait another nine
> > > cycles for the head to get back to the sender.  That makes a 16-entry
> > > fifo very inefficient.  Two solutions include finding a better sync
> > > block and increasing the fifo size to 32 entries.  Your solution of
> > > controlling the ring based on change only helps for the first write to
> > > the fifo.
> >
> > Hmm.. i'm lacking some information here.
> > What do you want to pass here ? And between which clock domains ?
> > And which FIFO are we talking about?
> > (alternatively you can point me to some discussion i've missed)
> 
> Ok, we want to communicate data between clock domains.  In order to
> streamline things, we like to use fifos for buffering, etc.  It turns
> out that everything we want to pass between clock domains is exactly
> the sort of thing we'd like to buffer in fifos.  Communicating a
> single piece of data between domains is problematic because of
> metastability issues, so we need to do things like hold data constant
> in one domain before clocking it in the other, which means we cannot
> pass data quickly or stream it without data loss.  The solution to
> this is to queue data in a RAM and just sync up the head and tail
> pointers.  Since there's a delay on those numbers, the writer always
> thinks there's less free space and the reader always thinks there are
> fewer available entries than there really are, which adds latency but
> does not suffer from any data loss in the actual data being streamed.
> 
> The sorts of things that need to cross clock domains include things
> like stuff going to/from the PCI controller, or between the video
> controller and memory, etc.  Our four clock domains are the host
> interface, memory, GPU engine, and video.  All of the stuff going back
> and forth between them is streaming data that goes into fifos anyhow.
> 
> Oh, that reminds me... I'm not going to try to fix the metastability
> problem for engine register readsbacks.  Those readbacks are for
> debugging only.  Either wait until it's idle, or single-step the
> engine (assuming I implement that capability).
> 
> Sorry for the rambling.  Long day, need nap.  :)
> 
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
>
