Hi Alessandro and all,

Compiler optimizations can be helpful when tuning code for good 
performance, but I am surprised that you see anything like a 2x 
improvement in decoding speed.

The following table shows the results of a series of tests I made today 
for the decoder (the executable program jt9) running on files like the 
ones you created (01.wav, 02.wav, ... 10.wav, all copies of the example 
file 130610_2343.wav).  The first column lists the Fortran compiler 
flags used; the numerical column gives the total execution time (wall 
clock) for processing the ten files.

FFLAGS                                   Time
------------------------------------------------
-O0 -fbounds-check                       42.4
-O1 -fbounds-check                       22.8
-Os -fbounds-check                       22.8
-O2 -fbounds-check -funroll-all-loops    20.4 *
-O2 -fbounds-check                       20.2
-O3 -fbounds-check                       19.8
-Ofast -fbounds-check                    18.9
-O2                                      18.4
-O2 -mtune=native                        18.4
-O2 -funroll-all-loops                   18.2
-O3                                      18.0
-Ofast                                   17.8
------------------------------------------------
* Used in the release builds of WSJT-X

As you can see, "-mtune=native" made essentially no difference.  The 
biggest improvement in execution performance (over the default Release 
build) is gained by turning off bounds-checking.  A slight additional 
improvement is obtained by using -O3 or -Ofast rather than -O2. 
However, the total available improvement is less than 15%.

Obviously, such tests will give different results on different machines.
Those described above were done on a machine with a Core2 Duo E6750
CPU, 2.66 GHz.  Here is a similar set of results for a Windows machine
(Core i5-2500, 3.3 GHz):

FFLAGS                                   Time
------------------------------------------------
-O0 -fbounds-check                       28.5
-O1 -fbounds-check                       18.2
-O2 -fbounds-check -funroll-all-loops    16.6 *
-O2 -fbounds-check                       16.2
-O3 -fbounds-check                       16.2
-Ofast                                   15.7
-O3 -m32 -msse -funroll-all-loops        15.4
-O3 -mtune=core2                         15.1
-O3 -m32 -msse                           15.0
-O3 -mtune=native                        15.0
------------------------------------------------
* Used in the release builds of WSJT-X for Windows

The flags we're currently using for Windows Release builds give results 
within about 10% of the best one listed.
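
For anyone who wants to check the percentages quoted for the two tables,
here is a minimal Python sketch.  The times are copied from the tables
above; the helper name saving_percent and the dictionary names are only
illustrative and are not part of WSJT-X or its build system.

```python
# Wall-clock times copied from the two tables above.
core2 = {
    "-O2 -fbounds-check -funroll-all-loops": 20.4,  # current release flags
    "-O2": 18.4,
    "-Ofast": 17.8,                                 # fastest entry in the table
}
i5_windows = {
    "-O2 -fbounds-check -funroll-all-loops": 16.6,  # current release flags
    "-O3 -mtune=native": 15.0,                      # fastest entry in the table
}


def saving_percent(baseline, candidate):
    """Time saved by `candidate`, as a percentage of the `baseline` time."""
    return 100.0 * (baseline - candidate) / baseline


release_flags = "-O2 -fbounds-check -funroll-all-loops"
for name, table in (("Core2 Duo", core2), ("Core i5 (Windows)", i5_windows)):
    baseline = table[release_flags]
    for flags, seconds in table.items():
        if flags != release_flags:
            saving = saving_percent(baseline, seconds)
            print(f"{name}: {flags:<20} saves {saving:.1f}% over release flags")
```

On the Core2 numbers this gives a saving of about 9.8% for plain -O2
and about 12.7% for -Ofast; on the Windows numbers the best entry saves
about 9.6%.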

One way to look at all of this is that the most important optimizations 
are those already done by the programmer.  These include making the best 
possible choices of data structures, algorithms, loop ordering, and so 
on.

        -- 73, Joe, K1JT
