I didn't synthesize the FPU separately. I think standalone design figures are useful only if we are going to build a macro-cell. Otherwise, most of the data delay comes not from logic but from routing. For example, the longest FPU-related path reported by post-PAR static timing analysis is:

Source:            clkgen0/wb_rst_shr_15 (FF)
Destination:       mor1kx0/mor1kx_cpu/cappuccino.mor1kx_cpu/mor1kx_execute_alu/fpu_enabled_in_execute_alu.fpu_arith/fpu_post_norm_mul/s_frac2a_34 (FF)
Requirement:       20.000ns
Data Path Delay:   18.718ns (Levels of Logic = 1)
...
Total              18.718ns (1.126ns logic, 17.592ns route)
                   (6.0% logic, 94.0% route)

94% is routing! But the constraint (50 MHz) is satisfied, so the router doesn't have to do anything more.
By the way, the 50 MHz wasn't set by me. It is the default value for the Atlys SoC.
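
As a quick check of the numbers in the report above, here is a small Python sketch. It uses only the figures quoted in the report; the "path-limited clock" is my own rough estimate that assumes the data path delay is the whole story (it ignores clock skew and uncertainty, which the elided part of the report may include).

```python
# Figures copied from the timing report quoted above.
requirement_ns = 20.000   # 50 MHz constraint
logic_ns = 1.126
route_ns = 17.592

total_ns = logic_ns + route_ns         # data path delay
slack_ns = requirement_ns - total_ns   # margin left against the constraint
route_share = route_ns / total_ns      # fraction of the delay spent in routing

# Assumption: only the data path delay limits the clock (no skew/uncertainty).
path_limited_mhz = 1000.0 / total_ns

print(f"total {total_ns:.3f} ns, slack {slack_ns:.3f} ns")
print(f"routing share {route_share:.1%}, path-limited clock {path_limited_mhz:.1f} MHz")
```

So the path meets the 20 ns requirement with a bit more than 1 ns to spare, which is consistent with the router stopping there.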

About cycle counts. As the FPU is just a VHDL-to-Verilog conversion of FPU100 (http://opencores.org/project,fpu100), the cycle counts are the same as in the original design:
 Add/Sub: 7
 Mul with serial implementation: 35 (now implemented for OR1200 and mor1kx)
 Mul with 'parallel' implementation: 12 (implemented for original project only)
 Div: 35 (serial implementation only)
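
To put these cycle counts into time units, a minimal Python sketch follows. It assumes the 50 MHz Atlys default mentioned above and that an operation occupies exactly the listed number of cycles, with no extra pipeline or handshake overhead (that part is my assumption, not something measured).

```python
CLOCK_MHZ = 50.0  # Atlys SoC default clock mentioned above

# Cycle counts from the list above.
cycles = {
    "add/sub": 7,
    "mul (serial)": 35,
    "mul (parallel)": 12,
    "div (serial)": 35,
}

period_ns = 1000.0 / CLOCK_MHZ  # 20 ns per cycle at 50 MHz

# Assumption: latency is simply cycles * clock period, nothing else added.
latency_ns = {op: n * period_ns for op, n in cycles.items()}

for op, ns in latency_ns.items():
    print(f"{op:15s} {cycles[op]:2d} cycles = {ns:.0f} ns")
```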

By the way, the original design was able to run at 100 MHz when synthesized alone for a Cyclone I EP1C6Q240C.
Altera Quartus II v.5 reported the following numbers of logic elements:
 Addition unit:         684
 Multiplication unit:  1530 (!!! parallel !!! The serial one implemented for OR1K should be smaller.)
 Division unit:         928
 Square-root unit:      919 (not ported to OR1K)
 Top unit:              326
 _______________________________
 Total:                4387

I don't see any reason why the converted FPU itself should be slower.

Andrey


-----Original Message----- From: Sébastien Bourdeauducq
Sent: Sunday, September 14, 2014 6:57 PM
To: [email protected]
Subject: Re: [OpenRISC] Porting FPU from OpenRISC-1200 to mor1kx-cappuccino pipeline

Cool! What are the area and frequency? How many cycles do the floating
point operations take?

Sébastien


On 09/14/2014 06:35 PM, BAndViG wrote:
Almost all bugs are fixed. The only remaining difference between
'testfloat' and the hardware is the generation of the underflow flag.
The latest upstream commit (4f9d080 of 14-sep-2014, openrisc/mor1kx)
is merged into my FPU branch
(https://github.com/bandvig/mor1kx/tree/withfpu). The resulting state
(commit 1315933) has been tagged 'fpu32_v1.0'.

WBR
Andrey

_______________________________________________
OpenRISC mailing list
[email protected]
http://lists.openrisc.net/listinfo/openrisc
