[OpenRISC] Pipelined FPU

BAndViG Sat, 25 Oct 2014 05:22:29 -0700

The pipelined nature of FPU-100 is restored with a fresh commit to https://github.com/openrisc/mor1kx/tree/withfpu .


Strictly speaking, it is not fully pipelined, because it does not implement:
- pipelined division,
- intermediate registers for PC and destination identifier,
- etc.

In fact, in the cappuccino pipeline environment the FPU operates in non-pipelined mode (the pipeline stalls until the FPU raises its ready flag). So it is just a starting point for further development of both the FPU itself and a new, more efficient pipeline (perhaps similar to BA25).

Unlike the non-pipelined version, the pipelined variant includes a two-stage multiplier for the fractional parts. The 24x24-bit multiplier is split into four 13x13 multipliers (1st stage) and an adder (2nd stage). That allows the module to be synthesized using built-in FPGA DSP cells. Multiplication now takes 6 clocks (the original FPU100 takes 12 or 35 clocks for the parallel or serial implementation, respectively).
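
To illustrate the split, here is a minimal behavioral sketch in Python (not the RTL from the repository). It assumes each 24-bit fraction is cut into two 12-bit halves, so every partial product fits a 13x13 multiplier; the actual partitioning in pfpu32 may differ, and all names below are made up for illustration.

```python
# Behavioral model only; the 12-bit split is an assumption, not taken from the RTL.
HALF = 12
MASK = (1 << HALF) - 1

def stage1_partial_products(a, b):
    """Stage 1: four small multiplications (each operand fits in 13 bits)."""
    a_hi, a_lo = a >> HALF, a & MASK
    b_hi, b_lo = b >> HALF, b & MASK
    return a_hi * b_hi, a_hi * b_lo, a_lo * b_hi, a_lo * b_lo

def stage2_sum(hh, hl, lh, ll):
    """Stage 2: shift the partial products into place and add them."""
    return (hh << (2 * HALF)) + ((hl + lh) << HALF) + ll

def mul24(a, b):
    """Multiply two 24-bit unsigned fractions via the two stages."""
    assert 0 <= a < (1 << 24) and 0 <= b < (1 << 24)
    return stage2_sum(*stage1_partial_products(a, b))
```

In hardware, a register between the two functions would form the pipeline boundary, so a new pair of operands could enter stage 1 while the previous one is being summed.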

Additionally, instead of using a counter as the operation-complete flag, the ready signal is propagated directly through the pipeline. This approach removes the extra delays present in the non-pipelined variant of the FPU (a legacy of the OpenRISC-1200 design).
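
The idea of carrying the ready flag along with the data can be sketched as a shift register of valid bits. This is a hypothetical Python model, not the actual Verilog; the class name and the depth of 6 (borrowed from the multiplier latency mentioned above) are assumptions for illustration.

```python
class ReadyPipeline:
    """Valid bit that travels through the pipeline stages with the data."""

    def __init__(self, depth):
        self.stages = [False] * depth

    def clock(self, start):
        # On each clock the valid bit moves one stage forward;
        # the bit leaving the last stage is the ready flag.
        self.stages = [start] + self.stages[:-1]
        return self.stages[-1]

pipe = ReadyPipeline(depth=6)
# Issue one operation on the first clock, then idle.
ready_trace = [pipe.clock(start=(cycle == 0)) for cycle in range(8)]
```

Here ready is raised exactly once, on the sixth clock, with no separate counter to load and compare.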


Intermediate benchmarking against the previous variant.

The previous variant (repeated here for reference):

case #2: -mhard-float, fpu32_v1.0:
         Single Precision C/C++ Whetstone Benchmark

Loop content          MFLOPS     MOPS   Seconds

N1 floating point      2.400              0.008
N2 floating point      2.240              0.060
N3 if then else                  3.450    0.030
N4 fixed point                   3.938    0.080
N5 sin,cos etc.                  0.019    4.300
N6 floating point      1.199              0.450
N7 assignments                   1.680    0.110
N8 exp,sqrt etc.                 0.009    4.300

MWIPS                  1.071              9.338


The new one:

         Single Precision C/C++ Whetstone Benchmark

Loop content          MFLOPS     MOPS   Seconds

N1 floating point      4.800              0.004
N2 floating point      3.360              0.040
N3 if then else                  3.450    0.030
N4 fixed point                   4.500    0.070
N5 sin,cos etc.                  0.019    4.300
N6 floating point      1.635              0.330
N7 assignments                   1.680    0.110
N8 exp,sqrt etc.                 0.009    4.300

MWIPS                  1.089              9.184
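
For a quick comparison, the floating-point rows and the MWIPS totals from the two runs can be turned into speedup ratios. The small Python sketch below only restates the numbers from the tables; the variable names are arbitrary.

```python
# MFLOPS / MWIPS figures copied from the two Whetstone runs in this post.
previous = {"N1": 2.400, "N2": 2.240, "N6": 1.199, "MWIPS": 1.071}
pipelined = {"N1": 4.800, "N2": 3.360, "N6": 1.635, "MWIPS": 1.089}

# Ratio of new to old throughput for each row.
speedup = {name: pipelined[name] / previous[name] for name in previous}

for name, ratio in speedup.items():
    print(f"{name:6s} {ratio:.2f}x")
```

N1 doubles and N2 gains 1.5x, while the overall MWIPS figure barely moves because the N5 and N8 loops (sin, cos, exp, sqrt) dominate the run time and are unchanged.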


To activate the pipelined FPU:
   - add the following lines to the parameter list of the mor1kx unit instance:
           .FEATURE_FPU("ENABLED")
           .FEATURE_PIPELINED_FPU("ENABLED") // makes sense only if FEATURE_FPU==ENABLED
   - add all files from the "pfpu32" folder to the project

Andrey

