The pipelined nature of FPU-100 has been restored with a fresh commit to the withfpu branch: https://github.com/openrisc/mor1kx/tree/withfpu

Strictly speaking, it is not fully pipelined, because it does not implement:
- pipelined division,
- intermediate registers for PC and destination identifier,
- etc.

In fact, in the cappuccino-pipe environment the FPU operates in non-pipelined mode (the pipeline stalls until the FPU raises its ready flag). So this is just a starting point for further development of both the FPU itself and a new, more efficient pipeline (perhaps similar to BA25).

Unlike the non-pipelined version, the pipelined variant includes a two-stage multiplier for the fractional parts. The 24x24-bit multiplier is split into four 13x13 multipliers (1st stage) and an adder (2nd stage). This allows the module to be synthesized using the built-in FPGA DSP cells. Multiplication now takes 6 clocks (the original FPU100 takes 12 or 35 clocks for the parallel or serial implementation, respectively).
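To illustrate the idea of the two-stage multiplier, here is a small behavioural sketch in Python (not the actual RTL). It assumes an even 12/12 split of each 24-bit operand; how exactly the real module maps the halves onto 13x13 multipliers is not described in this message, and the name split_mul24 is made up for the example.

```python
MASK12 = (1 << 12) - 1  # low 12 bits of an operand (assumed split point)


def split_mul24(a, b):
    """Multiply two unsigned 24-bit values using four narrow multiplies."""
    assert 0 <= a < (1 << 24) and 0 <= b < (1 << 24)

    a_hi, a_lo = a >> 12, a & MASK12
    b_hi, b_lo = b >> 12, b & MASK12

    # Stage 1: four independent partial products (each fits a small DSP multiplier).
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Stage 2: align the partial products by their weights and add them.
    return (p_hh << 24) + ((p_lh + p_hl) << 12) + p_ll
```

In hardware the four products of stage 1 would be registered before the adder of stage 2, which is what lets each multiplier stay small enough for a DSP cell.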

Additionally, instead of using a counter to signal operation completion, the ready signal is propagated directly through the pipeline. This approach removes the extra delays present in the non-pipelined variant of the FPU (a legacy of the OpenRISC-1200 design).
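The ready propagation can be pictured with a toy Python model (an illustration only, not the mor1kx code): a one-bit "valid" flag is shifted through the same number of stages as the data, so the ready flag appears exactly when the result leaves the last stage. The class name ReadyPipe and the depth of 6 (taken from the multiply latency mentioned above) are assumptions for the example.

```python
class ReadyPipe:
    """Toy model: a valid bit travels through `stages` pipeline registers."""

    def __init__(self, stages):
        self.valid = [False] * stages

    def clock(self, start):
        """Advance one clock; return the ready flag seen after this edge."""
        # Shift the valid bits by one stage, inserting `start` at the front.
        self.valid = [start] + self.valid[:-1]
        return self.valid[-1]


pipe = ReadyPipe(stages=6)
# Issue one operation on the first clock, then idle.
ready_trace = [pipe.clock(start=(cycle == 0)) for cycle in range(8)]
```

Here ready_trace is True only on the 6th clock, with no separate counter to load or compare.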

Intermediate benchmark results compared with the previous variant.

The previous variant (repeated here for reference):

case #2: -mhard-float, fpu32_v1.0:
         Single Precision C/C++ Whetstone Benchmark

Loop content                  MFLOPS   MOPS   Seconds

N1 floating point              2.400            0.008
N2 floating point              2.240            0.060
N3 if then else                       3.450     0.030
N4 fixed point                        3.938     0.080
N5 sin,cos etc.                       0.019     4.300
N6 floating point              1.199            0.450
N7 assignments                        1.680     0.110
N8 exp,sqrt etc.                      0.009     4.300

MWIPS                          1.071            9.338


The new one:

         Single Precision C/C++ Whetstone Benchmark

Loop content                  MFLOPS   MOPS   Seconds

N1 floating point              4.800            0.004
N2 floating point              3.360            0.040
N3 if then else                       3.450     0.030
N4 fixed point                        4.500     0.070
N5 sin,cos etc.                       0.019     4.300
N6 floating point              1.635            0.330
N7 assignments                        1.680     0.110
N8 exp,sqrt etc.                      0.009     4.300

MWIPS                          1.089            9.184
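For a quick comparison of the two runs, the ratios can be computed with a few lines of Python. The numbers are copied from the two tables above; only the rows that changed (plus MWIPS) are included, and the variable names are just for this example.

```python
# MFLOPS / MWIPS figures copied from the two Whetstone tables above.
old = {"N1": 2.400, "N2": 2.240, "N6": 1.199, "MWIPS": 1.071}
new = {"N1": 4.800, "N2": 3.360, "N6": 1.635, "MWIPS": 1.089}

# Ratio new/old for every row present in both runs.
speedup = {name: new[name] / old[name] for name in old}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

N1 doubles, N2 improves by 1.5x and N6 by about 1.36x, while MWIPS rises by only about 2%. That is expected from the tables: N5 and N8 take 4.300 s each in both runs, so they dominate the total time.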


To activate the pipelined FPU:
   - add the following lines to the parameter list of the mor1kx unit instance:
           .FEATURE_FPU("ENABLED"),
           .FEATURE_PIPELINED_FPU("ENABLED") // makes sense only if FEATURE_FPU=="ENABLED"
   - add all files from the "pfpu32" folder to the project

Andrey
_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc
