Speaking of multipliers, I was wondering what speed we are targeting for floatmult25, or for the FPGA design as a whole? I have managed to get a 3-stage version working at under 9 ns according to the tools (targeting the 3S1500), but I'm not sure how accurate the auto-generated timing constraints are. The tools seem a bit quirky to me: they don't respond to my changes the way I expect (for example, why would changes I make in stage 1 affect the critical path, which is in stage 2?). I'm used to working on full-custom ASICs, where I control everything, so I don't find the synthesizer very intuitive :\
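For reference, one way to spread a multiplier over three stages is to form partial products in one stage, sum them in the next, and normalize in the last. The Python model below is a sketch of that split, assuming unsigned 24-bit mantissas with the leading bit set and truncation instead of rounding. The width, the stage boundaries and the function names are illustrative assumptions, not a description of how floatmult25 is actually built. A bit-accurate model like this can also serve as a reference when checking the HDL in simulation.

```python
# Bit-accurate model of splitting an unsigned W x W mantissa multiply into
# three pipeline stages. W = 24 is an illustrative assumption, not
# floatmult25's actual mantissa width. Rounding is ignored (truncation).
W = 24
H = W // 2                # each operand is split into two H-bit halves
MASK_H = (1 << H) - 1

def stage1(a: int, b: int) -> tuple[int, int, int, int]:
    """Stage 1: form four half-width partial products in parallel."""
    a_hi, a_lo = a >> H, a & MASK_H
    b_hi, b_lo = b >> H, b & MASK_H
    return a_hi * b_hi, a_hi * b_lo, a_lo * b_hi, a_lo * b_lo

def stage2(pp: tuple[int, int, int, int]) -> int:
    """Stage 2: shift and sum the partial products into the 2W-bit product."""
    hh, hl, lh, ll = pp
    return (hh << (2 * H)) + ((hl + lh) << H) + ll

def stage3(product: int) -> tuple[int, int]:
    """Stage 3: keep the top W bits and report the exponent adjustment.

    With both inputs normalized (top bit set), the leading one of the
    product is at bit 2W-1 or 2W-2, so at most a 1-bit shift is needed.
    """
    if product >> (2 * W - 1):
        return product >> W, 1
    return product >> (W - 1), 0

# 1.5 * 1.5, with the binary point just after the leading mantissa bit.
mant, shift = stage3(stage2(stage1(0xC00000, 0xC00000)))
print(hex(mant), shift)  # 0x900000 1  (1.125 * 2**1 = 2.25)
```

Putting the register boundary between partial-product formation and summation keeps each stage to either a small multiply or an adder tree, at the cost of a wider set of pipeline registers between stages 1 and 2.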
Back to the XP10: if it doesn't have hard multipliers, we can make our own :) But again, I'm not sure how well that works out on FPGAs.

On Mon, August 13, 2007 5:13 pm, Timothy Normand Miller said:
> On 8/13/07, Mark <[EMAIL PROTECTED]> wrote:
>> Timothy Normand Miller wrote:
>>> instructions. Also, in response to his question, I'm targeting the
>>> 3S4000 because it's convenient. In the real design, we'll target the
>>> XP10, which is a little slower. Either way, this tells us basically
>>> what we need to know.
>>>
>> Doesn't that make this whole discussion moot? The XP10 doesn't have
>> hard multipliers, as near as I can tell. Regardless, the architecture
>> and timing aren't necessarily even remotely similar (well, sure,
>> they're both island-style FPGAs using 4-LUTs... but that's still a big
>> design space).
>
> Ugh. You're right. I thought it had dedicated multipliers, but Howard
> can't find any reference to that in the spec.
>
> There's no sense in trying to move the nanocontroller into the Xilinx,
> because the nanocontroller's also responsible for controlling DMA.
>
>>
>> I'd be wary of putting too much stock in XST's timing estimates, anyhow.
>> Until you've got post-PAR timing, don't bank on it.
>
> The timing numbers I provided are post-PAR, but as you say, regarding the
> multipliers, the point is moot. We need to rethink that whole thing.
>
> Should we do some early-SPARC-style multiplier stepping instructions? I'm
> not sure we can without 4-operand instructions.
>
> Another option would be to switch to the out-of-band approach. Write
> operands to the multiplier via the I/O space, and X clock cycles later, you
> can grab the product.
>
>> Finally, I don't think you mentioned the speed grade and package you're
>> using for the Spartan or the Lattice part. (Both are on the board,
>> right? I'm going off
>> http://wiki.duskglow.com/tiki-index.php?page=OGD1+components+guide.)
>
> For Xilinx, it's -5. I think we also picked the fastest XP10.
>
>> Your timing numbers will depend on that, too (even in XST's output, I
>> believe). Have those aspects been specified yet? I mean, at least the
>> package is presumably known.
>
> Well, I wanted a ballpark sense of what was worst in the design, and for
> that I doubt it'll make a lot of difference which device we target. When
> it comes down to shaving off the last few nanoseconds, then it'll matter a
> lot. What we want is a controller that's reasonably efficient across
> multiple architectures anyhow.
>
> --
> Timothy Normand Miller
> http://www.cse.ohio-state.edu/~millerti
> Open Graphics Project
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)
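On making our own multipliers for the XP10, and the multiply-step idea quoted above: both come down to a shift-and-add multiply, done either one bit per instruction or one bit per clock in a small iterative unit. The Python sketch below models a generic radix-2 unsigned multiply step. The names `mul_step` and `multiply` and the 16-bit width are hypothetical, and this is not an attempt to reproduce the exact semantics of SPARC's multiply-step instruction.

```python
# Generic radix-2 shift-and-add multiply step (unsigned).
# mul_step, multiply and WIDTH are hypothetical names for illustration;
# this is not the exact behavior of SPARC's multiply-step instruction.
WIDTH = 16
MASK = (1 << WIDTH) - 1

def mul_step(acc: int, mplier: int, mcand: int) -> tuple[int, int]:
    """One step: conditionally add the multiplicand, then shift right.

    acc holds the high half of the running product and mplier the low half
    (the not-yet-consumed multiplier bits shift out of its bottom).
    """
    if mplier & 1:
        acc += mcand                 # may carry into bit WIDTH
    # Shift the (WIDTH+1)-bit acc and WIDTH-bit mplier right as one register.
    mplier = (mplier >> 1) | ((acc & 1) << (WIDTH - 1))
    acc >>= 1
    return acc, mplier

def multiply(a: int, b: int) -> int:
    """Run WIDTH steps; the 2*WIDTH-bit product ends up in {acc, mplier}."""
    acc, mplier = 0, b & MASK
    for _ in range(WIDTH):           # one "instruction" per multiplier bit
        acc, mplier = mul_step(acc, mplier, a & MASK)
    return (acc << WIDTH) | mplier

print(hex(multiply(0xFFFF, 0xFFFF)))  # 0xfffe0001
```

In this formulation each step reads three values (`acc`, `mplier`, `mcand`) and writes two, which is one way to see why it is awkward to express with instructions that have fewer operands.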
