Hello Peter,

First of all, thanks for taking the time to help me with the multiplier.


> Ok.  Suppose you just have a single 32x32 to 64-bit multiplier.
> 
> To do l.mul, send both 32-bit values to the multiplier as is (not
> sign- or zero-extended).
> Then the bottom 32 bits get written to the register file.
> Then look at the top 32 bits.   If the top bits are all zeros *or* the
> top bits are all ones, clear the OV flag.  Otherwise set it.
> If *any* of the top 32 bits are ones, set the CY flag.
> 
> To do l.mulu, again, send both 32-bit values to the multiplier without
> extending. Write the bottom 32 bits to the register file.
> Clear the OV flag. If *any* of the top 32 bits are ones, set the CY flag.


When you said that the multipliers for signed and unsigned are the same, you 
probably meant that only the lower half of the result is the same, didn't you? 
I just tested it with Verilator and got the following results:

0xfffffffe * 0xfffffffd  (-2 * -3 when read as signed)
  0x00000000:0x00000006  (signed multiplication)
  0xfffffffb:0x00000006  (unsigned multiplication)
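
The same comparison can be reproduced without a simulator. This is only a 
minimal Python sketch of the arithmetic, not the Verilog I tested; the helper 
names (to_signed32, mul64) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul64(a, b, signed):
    """Return the (high, low) 32-bit halves of the 64-bit product."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    p = (a * b) & MASK64          # wrap the product to 64 bits
    return p >> 32, p & MASK32

s_hi, s_lo = mul64(0xFFFFFFFE, 0xFFFFFFFD, signed=True)
u_hi, u_lo = mul64(0xFFFFFFFE, 0xFFFFFFFD, signed=False)
print(f"signed:   0x{s_hi:08x}:0x{s_lo:08x}")
print(f"unsigned: 0x{u_hi:08x}:0x{u_lo:08x}")
```

The low halves always agree; only the high halves depend on the signedness.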

If I want to keep the full 64-bit results, I would need 2 multipliers (signed 
and unsigned). That's why I wanted to extend both operands to 33 bits 
(sign-extended for signed, zero-extended for unsigned) and do a 33x33->66-bit 
signed multiplication. That way, I would still use the FPGA's built-in hardware 
multipliers, but only one group of them, as a single signed multiplier could do 
both signed and unsigned multiplications.
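
To illustrate the idea, here is a rough Python model of one signed 33x33 
multiplier serving both cases. It is a sketch under my own assumptions: the 
names extend33 and mul_via_33 are hypothetical, and Python integers stand in 
for the hardware multiplier.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def extend33(x, signed):
    """Extend a 32-bit pattern to a 33-bit signed value.

    Signed operands are sign-extended; unsigned operands are zero-extended,
    so bit 32 is 0 and the 33-bit signed value is never negative.
    """
    x &= MASK32
    if signed and (x & 0x80000000):
        return x - (1 << 32)
    return x

def mul_via_33(a, b, signed):
    """Use one signed multiply for both signed and unsigned operands."""
    p = extend33(a, signed) * extend33(b, signed)   # the single signed multiplier
    p64 = p & MASK64                                # keep the low 64 of 66 bits
    return p64 >> 32, p64 & MASK32

print([hex(v) for v in mul_via_33(0xFFFFFFFE, 0xFFFFFFFD, signed=True)])
print([hex(v) for v in mul_via_33(0xFFFFFFFE, 0xFFFFFFFD, signed=False)])
```

Only the extension step differs between the two modes; the multiplier itself 
is shared.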

The only thing that's keeping me from using this solution at the moment is the 
Carry flag for the signed multiplication instructions (l.mul, l.muli). If we 
were to drop that flag, which as discussed does not make sense for those 
instructions, I think that solution would be optimal.

Dropping the Carry flag for l.mul and l.muli (but not for l.mulu) shouldn't be 
a big issue, as all OpenRISC implementations I know of either don't implement 
the Carry and Overflow flags, or implement them incorrectly (or at least 
differently from or1ksim). But it does mean patching or1ksim and modifying the 
test suite.

As I said before, I could use an unsigned 32x32->64 multiplier instead of a 
signed one for both cases, but then I would have to manage the signs manually 
for the signed instructions, which means doing a 2's complement conversion up 
to 3 times (2 operands and 1 result). Each conversion involves a "+ 1" addition 
(or a similar operation), which has to propagate through all the bits and 
consumes extra time and extra FPGA resources. I hope you understand what I'm 
trying to achieve, or maybe I got it wrong and missed some obvious optimisation 
somewhere.
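
For comparison, the alternative I would like to avoid looks roughly like this. 
Again, this is only a Python sketch with made-up names (negate, 
signed_mul_via_unsigned); it just makes the three possible negations visible.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN32 = 0x80000000

def negate(x, mask):
    """Two's complement negation: invert all bits, then add 1."""
    return (~x + 1) & mask

def signed_mul_via_unsigned(a, b):
    """Signed 32x32->64 product using only an unsigned multiplier."""
    a &= MASK32
    b &= MASK32
    neg_a = bool(a & SIGN32)
    neg_b = bool(b & SIGN32)
    mag_a = negate(a, MASK32) if neg_a else a   # conversion 1 (operand)
    mag_b = negate(b, MASK32) if neg_b else b   # conversion 2 (operand)
    p = mag_a * mag_b                           # unsigned 32x32->64 multiply
    if neg_a != neg_b:
        p = negate(p, MASK64)                   # conversion 3 (result)
    return p >> 32, p & MASK32

print([hex(v) for v in signed_mul_via_unsigned(0xFFFFFFFE, 0xFFFFFFFD)])
print([hex(v) for v in signed_mul_via_unsigned(0xFFFFFFFE, 0x00000003)])
```

Each call to negate corresponds to one of the carry-propagating "+ 1" steps 
mentioned above.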


> My intent was to use the same datapath for both regular multiply and
> for MAC.  If you reuse the MACHI and MACLO registers for this purpose,
> you don't add (much) additional hardware.  If you use a 32x32->64-bit
> multiplier you won't need any other resources.

The 2 MAC instructions do a signed multiplication, but I would still like to 
reuse the single multiplier for unsigned integers and get the full unsigned 
64-bit results.


Thanks again,
  rdiez

_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc
