On 14 Aug 2007, at 18:27, Timothy Normand Miller wrote:

> Before I do that, let's discuss it. Also, this is really the sort of thing that should be discussed on the list, because others need to read it too. Please bring this discussion back to the list.
Done. Sorry. :)

> One of the design rules of thumb is to register all of your outputs. That is, to the maximum extent possible, outputs from blocks should have zero or maybe one level of logic to the port. This is done mostly as an aid to the designer -- since most of your combinatorial logic is from inputs to registers, you can more easily keep track of the prop delays you'll be dealing with.
There's some terminology here which I hope I understand correctly: by "register" you mean x_o and y_o, and by "port" the actual output of stage2 to the next stage? And because I glued the multiplier to those registers, I'm adding this logic *after* the result has been stored in the register?
    input -(a)--> x_o -(b)--> output

I want to connect the multiplier in phase a, not b, but I did b anyhow? What should I change to connect to a?
> Another rule of thumb, particularly for a pipeline, is to have the logic for a given stage sit in the stage it belongs to. You put oga1hq_multiplier in the stage 2 module, but it's really happening in stage 3. Architecturally, it's just a big chunk of combinatorial logic on the output of a module, which is something we want to avoid.
Rule of thumb != law. But I get your point.

> So, what you've done in your design, if I understand it correctly, is put a combinatorial multiplier in stage 2 connected straight to some adders you put in stage 3. But there's nothing magical about the module hierarchy. It's just how we organized it. And the hierarchy doesn't insert registers; you have to do that yourself. So you're not actually pipelining anything. You still have one unbroken combinatorial chain through the multiplier and the adders. This would be no faster than the earlier version.
My idea was to not have a straight connection, but to have registers in between. I thought I did that when I defined m_o, but apparently not? So an 'input' or 'output' doesn't map directly to a register in Verilog?
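As a minimal sketch of that last question: in Verilog a port is only a connection point, and a register appears only where a signal is assigned on a clock edge. The module name, signal names, and bit widths below are hypothetical and are not taken from the OGA sources.

```verilog
// Hypothetical module; names and widths are illustrative only.
module stage_example (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output wire [31:0] p_comb,  // plain wire: no storage is implied by the port
    output reg  [31:0] p_reg    // becomes a flip-flop only because of the clocked block below
);
    // Purely combinational: p_comb follows a and b after the multiplier delay.
    assign p_comb = a * b;

    // A register is inferred here, because p_reg is assigned on a clock edge.
    always @(posedge clk) begin
        p_reg <= a * b;
    end
endmodule
```

Any logic a later module hangs on p_comb extends the same combinational path across the module boundary, while logic fed from p_reg starts a fresh path at the clock edge.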
> If you're thinking about inserting something between the "register file" and the registers on the output of stage 2, don't. The registers on the output of the file are actually built into the slices that implement the file, which is part of our efficiency there. You'd actually be switching from a synchronous RAM to an asynchronous one, and there's no way we could get that to run at speed.
So an input maps directly onto an output, with no buffer in between at all? Odd, I thought VHDL did that all the time. But I'm lost here, what do you mean by "register file"? :)

> The original suggestion was to have registered multipliers in stage 3 and then do the adds in stage 4. The output of stage 4 would mux them all together on the way into write-back.
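The quoted suggestion could look roughly like the sketch below, assuming unsigned 16-bit operands and a single multiply-add of the form a*b + c*d. The module, its signals, and the operation itself are hypothetical stand-ins for the real stage 3 and stage 4 logic, and the final mux into write-back is left out.

```verilog
// Hypothetical two-stage multiply-add; not the actual OGA stage modules.
module mul_add_pipe (
    input  wire        clk,
    input  wire [15:0] a, b, c, d,
    output reg  [32:0] sum      // a*b + c*d, valid two clock edges after the inputs
);
    reg [31:0] prod_ab, prod_cd;   // pipeline registers between the two stages

    // "Stage 3": multiply and register the products.
    always @(posedge clk) begin
        prod_ab <= a * b;
        prod_cd <= c * d;
    end

    // "Stage 4": add the registered products and register the result.
    always @(posedge clk) begin
        sum <= prod_ab + prod_cd;
    end
endmodule
```

With the products registered, the longest path in one clock period is either a multiplier or an adder, never both in series. Whether the two always blocks sit in one module or in two makes no difference to timing; only the placement of the registers does.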
How is this different from your earlier rule of thumb? From my perspective, the mul and add together make up the operation. Whether we spread it out across stages 2 and 3, or 3 and 4, makes no difference, right?

Could you try synthesizing the attached version? I'm just playing around and don't see what's wrong with my solution (besides that this might not help at all in the Lattice due to the dedicated-multiplier problem).
I'll try to get my own Webpack running shortly, but am curious about timing for this file. Hopefully I didn't introduce any compile errors.
Mike www.wacco.mveas.com
> --
> Timothy Normand Miller
> Open Graphics Project