You might want to configure your mailer to automatically break lines
at column 75.

On 8/14/07, Farhan Mohamed Ali <[EMAIL PROTECTED]> wrote:
> Speaking of multipliers, i was wondering what speed we are targeting for the 
> floatmult25 or the entire FPGA in general? I have managed to get a 3 stage 
> version working at <9ns according to the tools (targeted for 3S1500 but i'm 
> not sure how accurate the auto generated timing constraints are. Seems a bit 
> quirky to me, it does not respond as i expect to changes i make (as in, why 
> do changes i make in stage1 affect the critical path which is in stage2, and 
> weird stuff like that). I'm used to working on full custom ASICs where i 
> control everything, so i don't find the synthesizer to be very intuitive :\
>

The tools are doing the place and route (P&R) automatically, and they
use simulated annealing to do it.  It's an optimization problem solved
with randomization, so to begin with, what you get isn't
deterministic.  And since there's competition for resources, changing
something in one place can affect everything else.  It can be
frustrating sometimes.  We find that when we're on the edge of being
able to meet timing, we have to run P&R several times before it gives
us what we want.
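
As a toy illustration of why that happens, here's a minimal
simulated-annealing placer in Python.  It is only a sketch and not how
the vendor tools actually work: the 1-D row of slots, the wirelength
cost, the cooling schedule, and the names wirelength() and anneal()
are all assumptions made up for the example.

```python
import math
import random

def wirelength(placement, nets):
    # placement[i] is the slot assigned to block i; a net joins two blocks.
    return sum(abs(placement[a] - placement[b]) for a, b in nets)

def anneal(n_blocks, nets, seed, steps=2000, t0=5.0):
    rng = random.Random(seed)          # the seed is the only source of randomness
    placement = list(range(n_blocks))
    rng.shuffle(placement)             # random starting placement
    cost = wirelength(placement, nets)
    best, best_cost = placement[:], cost
    for i in range(steps):
        temp = t0 * (1.0 - i / steps) + 1e-9   # linear cooling (assumed schedule)
        a, b = rng.sample(range(n_blocks), 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = wirelength(placement, nets)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = placement[:], cost
        else:
            # Rejected: undo the swap.
            placement[a], placement[b] = placement[b], placement[a]
    return best, best_cost

# Eight blocks wired in a chain: 0-1, 1-2, ..., 6-7.
nets = [(i, i + 1) for i in range(7)]
for seed in (1, 2, 3):
    print(seed, anneal(8, nets, seed))
```

Different seeds can land on different placements and costs, and
swapping two blocks changes the cost of every net attached to them.
That is the same flavor of "I touched stage1 and stage2 moved" you're
seeing.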

> Back to the XP10, if it doesn't have hard multipliers, we can make our own :) 
> But again i'm not sure how well that works out for FPGAs.
>

Yeah.  It's not worth the extra logic to fully pipeline it (nor could
we keep a 32-stage multiplier pipeline fully fed).  We could have
separate logic that would run in parallel.  Or we could have special
mult-stepping instructions.  With the latter, partial multiplies can
be optimized to take fewer cycles.

Here's what I'm thinking....

If we had a stand-alone multstep instruction, it would need four
operands:  (1) an accumulator, (2) the multiplicand, (3) one bit from
the multiplier to determine whether or not the multiplicand is added
to the accumulator, and (4) a loop counter from which to compute the
multiplicand left-shift and which bit to take from the multiplier.
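
To make the four-operand version concrete, here's a rough software
model in Python.  It's a sketch only: the 32-bit width, the names
multstep(), multiply() and MASK32, and the choice to pass the whole
multiplier and let the counter pick the bit are assumptions for
illustration, not a proposed encoding.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath

def multstep(acc, multiplicand, multiplier, counter):
    # The counter selects which multiplier bit to test and how far
    # to left-shift the multiplicand before the conditional add.
    bit = (multiplier >> counter) & 1
    if bit:
        acc = (acc + (multiplicand << counter)) & MASK32
    return acc

def multiply(a, b, bits=32):
    # One multstep per multiplier bit; no state is kept between steps
    # other than what lives in ordinary registers.
    acc = 0
    for counter in range(bits):
        acc = multstep(acc, a, b, counter)
    return acc

print(multiply(6, 7))
print(multiply(1000, 200, bits=8))  # multiplier fits in 8 bits: 8 steps
```

The second call shows the partial-multiply case: if the multiplier is
known to fit in fewer bits, the loop just runs fewer steps.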

Now, I don't like the idea of adding extra state.  What if we want to
add the ability to handle interrupts?  But we can tinker with the
idea:  Have one special instruction whose job is to load the counter
and the multiplier.  The step instruction would have the accumulator
(as a source and the target) and the multiplicand.  Each step would
step the counter, shift the multiplier, and add (or not, depending on
the bit from the multiplier) the shifted multiplicand to the
accumulator.  That puts a shifter in line with an adder, though, so
maybe we want to load the multiplier and multiplicand in the first
instruction (so they're shifted 1 each cycle) and then specify the
counter and accumulator in the step?  We'll have to work out the
permutations.
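
One of those permutations, modeled in Python as a sketch: a load
instruction latches the multiplier and multiplicand into hidden
registers, and each step names only the counter and accumulator.  The
class MultUnit, its method names, the 32-bit width, and the
count-down convention are all assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath

class MultUnit:
    """Hidden state: this is what an interrupt handler would have to save."""

    def __init__(self):
        self.multiplicand = 0
        self.multiplier = 0

    def load(self, multiplicand, multiplier):
        self.multiplicand = multiplicand & MASK32
        self.multiplier = multiplier & MASK32

    def step(self, acc, counter):
        # Conditional add uses the registers as they stand, so no
        # variable shifter sits in front of the adder.
        if self.multiplier & 1:
            acc = (acc + self.multiplicand) & MASK32
        # Both hidden registers shift by exactly one position per step.
        self.multiplicand = (self.multiplicand << 1) & MASK32
        self.multiplier >>= 1
        return acc, counter - 1

def multiply_stateful(a, b, bits=32):
    unit = MultUnit()
    unit.load(a, b)
    acc, counter = 0, bits
    while counter > 0:
        acc, counter = unit.step(acc, counter)
    return acc

print(multiply_stateful(6, 7))
```

The fixed shift-by-one keeps the shifter out of the adder's path, at
the cost of the two hidden registers that I was complaining about
above.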

BTW, the only reason to do this is that without it, a multiply would
take at least 4 times longer due to the overhead of explicit shifts
and branches.  Do we care?

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
