Re: [Bug target/33431] New: [SH4] performance regression between 3.4.6 and 4.x

2007-09-17 Thread Andrew STUBBS

nbkolchin at gmail dot com wrote:

> Our target hardware has an SH7750 processor running in little-endian mode under
> RTEMS. Unfortunately, there is no way to boot Linux on it.
>
> After digging through the backend sources, I found that m4 has several variants in
> GCC 4.x: m4-100, m4-200, etc. I tried to compile these tests with the m4-200
> switch, but it looks like m4-200 enforces big-endian.


The 7750 has direct mapped caches and so is not the best platform for 
benchmarking. A slight code perturbation can give a large change in 
performance. :(
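
To make the point about direct-mapped caches concrete, here is a minimal sketch of how two pieces of code can evict each other. The cache geometry (8 KB, 32-byte lines) and the addresses are assumptions chosen for illustration, not measurements from the reporter's system, and the function names are hypothetical.

```python
# Sketch: why a small change in code placement matters on a direct-mapped cache.
# Geometry below is an ASSUMPTION for illustration (8 KB cache, 32-byte lines).
CACHE_SIZE = 8 * 1024
LINE_SIZE = 32
NUM_LINES = CACHE_SIZE // LINE_SIZE


def cache_line(addr):
    """Index of the single cache line an address can occupy."""
    return (addr // LINE_SIZE) % NUM_LINES


def conflicts(a, b):
    """True if two addresses in different memory lines compete for one cache line."""
    return cache_line(a) == cache_line(b) and a // LINE_SIZE != b // LINE_SIZE


# Hypothetical addresses: a hot loop and a helper it calls on every iteration.
loop_body = 0x8C001000
helper = 0x8C003000            # exactly one cache size away: same index
helper_shifted = helper + 0x40  # the same helper moved by 64 bytes

print(conflicts(loop_body, helper))
print(conflicts(loop_body, helper_shifted))
```

In the first case the loop and the helper keep evicting each other; moving the helper by two lines removes the conflict. A different compiler version can shift code by this much without any change in the quality of the generated instructions.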


The m4-200 option is NOT suitable for that target. The 7750 is a 100 
series core (not that that was a nomenclature that existed when it came 
out). As far as I know, anybody that has a 200 series or above has an 
official ST toolset to go with it (GCC of course).


Andrew


[Bug target/33431] New: [SH4] performance regression between 3.4.6 and 4.x

2007-09-14 Thread nbkolchin at gmail dot com
I've found a serious performance regression between GCC version 3.4.6 and
4.2/4.3.

SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark

           GCC:  3.4.6   4.2.1   4.3.0 (20070907)
     Composite:   6.05    5.01    4.82
           FFT:   4.90    4.15    4.21
           SOR:  10.10    8.36    7.64
    MonteCarlo:   3.68    3.06    3.04
Sparse matmult:   5.45    4.45    4.03
            LU:   6.10    5.03    5.18
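
As a rough way to read the SciMark2 figures above, the change relative to 3.4.6 can be computed as in the sketch below. It assumes higher scores are better; the variable and function names are illustrative only.

```python
# Sketch: percentage change of each SciMark2 score relative to GCC 3.4.6.
# Scores are copied from the table above; higher is ASSUMED to be better.
scores = {
    #                  3.4.6  4.2.1  4.3.0
    "Composite":      (6.05,  5.01,  4.82),
    "FFT":            (4.90,  4.15,  4.21),
    "SOR":            (10.10, 8.36,  7.64),
    "MonteCarlo":     (3.68,  3.06,  3.04),
    "Sparse matmult": (5.45,  4.45,  4.03),
    "LU":             (6.10,  5.03,  5.18),
}


def percent_change(baseline, value):
    """Signed change in percent; negative means slower than the baseline."""
    return (value - baseline) / baseline * 100.0


for name, (v346, v421, v430) in scores.items():
    print(f"{name:>14}: 4.2.1 {percent_change(v346, v421):+6.1f}%"
          f"   4.3.0 {percent_change(v346, v430):+6.1f}%")
```

Every row comes out negative for both 4.x compilers, which is the regression being reported.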


BYTEmark* Native Mode Benchmark ver. 2 (10/95)

             GCC:       3.4.6       4.2.1       4.3.0 (20070907)
    NUMERIC SORT:      35.459        32.2      29.327
     STRING SORT:      0.5943     0.57604      0.8603
        BITFIELD:  1.0585e+07   9.269e+06  9.4138e+06
    FP EMULATION:      4.4944      4.6012       5.364
         FOURIER:      272.28      241.34      259.12
      ASSIGNMENT:     0.35997     0.38373     0.39683
            IDEA:      124.11      95.057      100.07
         HUFFMAN:      45.593      52.083      56.391
      NEURAL NET:     0.36153     0.30922     0.31348
LU DECOMPOSITION:      11.331      9.4938       8.255


The real-world application shows a 20%-200% performance regression with GCC 4.x.

All tests were compiled with these arguments:
 -O3 -ffast-math -fomit-frame-pointer -funroll-loops -ftracer
 -funit-at-a-time
 -m4 -ml

These arguments were tuned for the best results under 3.4.6. I've played with
various settings under 4.x, but could not achieve any performance improvement.

I can rerun them with any combination of options you want.

These tests, which compile under Linux, can be downloaded from:
- scimark: http://oktetlabs.ru/~snob/scimark.tgz
- nbench: http://oktetlabs.ru/~snob/nbench.tgz

I can attach these files to the bug report if that is acceptable and will not
pollute Bugzilla.

Our target hardware has an SH7750 processor running in little-endian mode under
RTEMS. Unfortunately, there is no way to boot Linux on it.

Could I ask you to run these tests under linux-sh? At least the scimark one.

After digging through the backend sources, I found that m4 has several variants in
GCC 4.x: m4-100, m4-200, etc. I tried to compile these tests with the m4-200
switch, but it looks like m4-200 enforces big-endian.

The backend sources show that a lot of work is going on in the SH4 part of GCC.

I also wrote some simple tests to compare code generation between different
compiler versions (I can mail or attach them, but they are really trivial) to
understand what could cause such a performance regression. But the generated
assembly is very different across versions. I could find only two obvious things:
- GCC 4 has much more aggressive inlining and loop unrolling. (Dropping
  -funroll-loops from the compiler arguments gave no improvement.)
- GCC 4 has different instruction scheduling, which probably leads to the
  performance regression.


-- 
           Summary: [SH4] performance regression between 3.4.6 and 4.x
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: nbkolchin at gmail dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: sh-unknown-rtemself


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431