The PCI core for OGP is a Moore state machine, which means that the outputs (PCI control signals, etc) are a function of the state of the machine. There is no direct connection between the inputs and the outputs. However, the NEXT state of the machine is affected by the inputs, and those inputs come in three varieties:
- The previous state
- Inputs from other logic inside of the chip
- Inputs from OUTSIDE of the chip

It's the last set that's a serious problem. While the first two are "instantly" available from registers in the logic of the chip, the last group suffers LONG propagation delays from the logic in another device on the bus, the output driver in that device, the bus wires, and finally, the input buffers in our chip. This can take many precious nanoseconds away from the time we have to combine all the inputs in our logic to compute the next state.

To still meet timing, you need to move the logic that uses the slow inputs as close to the end of the combinatorial logic as possible. Unfortunately, logic synthesizers aren't always so smart about rearranging your logic to take into account the added delay on those signals. Some are better than others, but the solution I found that works well is to carefully construct the logic so that the synthesizer has no choice about it.

Consider the PCI target. For that, there are really only two slow inputs that matter: frame and irdy. Given those two signals, there are four possible combinations:

  irdy=0, frame=0
  irdy=0, frame=1
  irdy=1, frame=0
  irdy=1, frame=1

So in my state machine, what I decided to do was generate four possible next states, one for each combination. If you look through older revisions of what I checked into SVN, you'll see variables like "next_state_f0i0". The structure of the logic computes all four (which does increase the size a bit), and then finally at the end muxes them together based on the slow inputs. This results in plenty of slack on the inputs.

Sounds great, right? It was, until I decided to integrate the Master logic. Now, in addition to irdy and frame, I have to pay attention to four more input signals: gnt, trdy, stop, devsel. Now we have a problem. With six slow inputs, continuing the theme would mean generating 2^6 = 64 different possible next states for each state before muxing them at the end.
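
To make the precompute-then-mux structure concrete, here is a minimal behavioral sketch in Python (not HDL). The three states and their transitions are a made-up toy, not the real OGP target states; only the "f0i0"-style naming follows the variables mentioned above. It checks that muxing precomputed candidates gives the same answer as using frame and irdy directly, and shows how the candidate count grows with the number of slow inputs.

```python
from itertools import product

# Hypothetical toy target FSM. State names and transitions are made up for
# illustration; they are not the OGP core's real states.
IDLE, DATA, TURNAROUND = "IDLE", "DATA", "TURNAROUND"
STATES = (IDLE, DATA, TURNAROUND)

def next_state_direct(state, frame, irdy):
    """Reference version: the slow inputs feed the next-state logic directly."""
    if state == IDLE:
        return DATA if frame == 0 else IDLE
    if state == DATA:
        if frame == 1:
            return TURNAROUND if irdy == 0 else IDLE
        return DATA
    return IDLE  # TURNAROUND always returns to IDLE

def candidates(state):
    """Fast part: uses only the registered state, never frame or irdy."""
    if state == IDLE:
        return {"f0i0": DATA, "f0i1": DATA, "f1i0": IDLE, "f1i1": IDLE}
    if state == DATA:
        return {"f0i0": DATA, "f0i1": DATA, "f1i0": TURNAROUND, "f1i1": IDLE}
    return {"f0i0": IDLE, "f0i1": IDLE, "f1i0": IDLE, "f1i1": IDLE}

def final_mux(cands, frame, irdy):
    """Slow part: the late-arriving inputs only select among the candidates."""
    return cands[f"f{frame}i{irdy}"]

def candidate_count(n_slow_inputs):
    """Candidates needed per state if every input combination gets its own."""
    return 2 ** n_slow_inputs

# The restructured logic must agree with the direct version everywhere.
for s, f, i in product(STATES, (0, 1), (0, 1)):
    assert final_mux(candidates(s), f, i) == next_state_direct(s, f, i)

print(candidate_count(2), candidate_count(6))  # 4 64
```

In hardware terms, candidates() stands for the deep logic fed only by registers, and final_mux() is the single shallow stage that the slow signals pass through.
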
Not only would that blow up the required amount of logic, but it would require so much repetitiveness in the logic that no one would be able to write it or debug it in any reasonable amount of time. Something like 18 different states, times 64 possible next states for each, results in a hell of a lot of lines of code. As a result, I've dragged my feet a bit since the last time I worked on it.

One solution I came up with was to generate a separate set of target and master states. This reduced the number of combinations from 64 down to 20 (4 for the target's two signals plus 16 for the master's four) and even saved me from having to write actual code for the set that wasn't going to be considered. But that's still too many for me to want to deal with.

So, here's what dawned on me tonight: Of those 64 combinations, only a handful are meaningful. For instance, any time you're in a master state and gnt=1 (deasserted), no other signals matter. gnt=1 means that this is your last cycle as a master. As such, 32 of your 64 combinations (or 8 of your 16) are all the same thing and there's absolutely no reason to bother with any of them separately.

So, here's the solution: Of all 64 combinations, go through and identify which ones are meaningful and which ones are meaningless or redundant. Add a thin layer of logic that reduces those combinations to a much smaller number. Now, the state machine only has to compute that number of possible "next states". (Whether those candidates are numbered in binary or one-hot is something to be determined as a result of trying to synthesize it, but I'm going to start with binary.)

For completeness, since there's a dissociation between target and master states, there are two more inputs for selecting this number: idle and master-state. A target state is when neither is true.

Hopefully, partitioning the logic this way will keep the distance between the slow inputs and the state registers to a minimum.
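
As a rough illustration of the "thin layer" idea, here is a Python sketch of a reducer that collapses the six slow inputs (plus the idle and master flags) into a small binary selector. The selector numbering is hypothetical. Two things are assumptions of mine rather than decisions: that master states ignore frame and irdy, and that idle is lumped in with the target group just to keep the sketch short. The only rule taken straight from the text is that gnt=1 in a master state makes every other signal irrelevant.

```python
from itertools import product

def reduce_inputs(idle, master, gnt, frame, irdy, trdy, stop, devsel):
    """Thin layer: collapse the slow inputs into a small binary selector.

    The numbering is hypothetical. Signal values are written as in the
    text (1 = deasserted).
    """
    assert not (idle and master)
    if master:
        if gnt == 1:
            return 0  # last cycle as master; no other signal matters
        # Assumption: frame and irdy are ignored in master states.
        return 1 + (trdy << 2 | stop << 1 | devsel)  # selectors 1..8
    # Target state (neither flag set). Assumption: idle is lumped in here
    # only to keep the sketch short.
    return 9 + (frame << 1 | irdy)  # selectors 9..12

# Every combination of (gnt, frame, irdy, trdy, stop, devsel).
ALL = list(product((0, 1), repeat=6))

master_sel = {reduce_inputs(False, True, *c) for c in ALL}
target_sel = {reduce_inputs(False, False, *c) for c in ALL}
gnt_lost = {reduce_inputs(False, True, *c) for c in ALL if c[0] == 1}

print(len(master_sel), len(target_sel), gnt_lost)  # 9 4 {0}
```

Under these assumptions the state machine would need 13 candidate next states instead of 64, and the 8 remaining master cases are probably not all distinct either, so a pass over which of them actually differ could shrink the number further.
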
Not partitioning, where inputs are considered directly in the next-state computation, would leave the slack time for slow inputs uncontrolled and severely limit the speed.
