[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #12 from PeteVine  ---
Nice, PR68664 patch has fixed the issue.

FWIW, unlike previously, running on a Cortex-A53 showed perfect alignment with
core type (-mfpu=vfpv3) on the first run:

Cortex-A8
Rendering took: 1 seconds (1801 milliseconds)

Cortex-A5
Rendering took: 1 seconds (1708 milliseconds)

Cortex-A7
Rendering took: 1 seconds (1699 milliseconds)

Cortex-A9
Rendering took: 1 seconds (1644 milliseconds)

Cortex-A15
Rendering took: 1 seconds (1637 milliseconds)

whereas using -mfpu=vfpv4 favours Cortex-A5 code's execution:

Cortex-A8
Rendering took: 1 seconds (1803 milliseconds)

Cortex-A5
Rendering took: 1 seconds (1506 milliseconds)

Cortex-A7
Rendering took: 1 seconds (1636 milliseconds)

Cortex-A9
Rendering took: 1 seconds (1645 milliseconds)

Cortex-A15
Rendering took: 1 seconds (1643 milliseconds)

but that's probably expected. Not sure about A8's codegen performance though.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #11 from PeteVine  ---
Super cool, thanks! That makes the OP a true prophet before his time ;)

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-30 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #10 from James Greenhalgh  ---
(In reply to PeteVine from comment #9)
> @jgreenhalgh Please have a look at the profiled assembly for both fast and
> slow codegen. (attached)
> 
> According to @aldyh's bisection in #68664 this probably isn't the same issue.

In the attached code I once again see the vdiv moved before the branch in the
slow case.

Looking at the bisection is one way to triage a bug, but it points to a change
in the scheduling model for Cortex-A53, and the analysis in this report
indicates that the same bad scheduling decision is made with the Cortex-A9 and
Cortex-A15 scheduling models. If the scheduler is making bad decisions across a
range of models, it is (in my opinion) more instructive to look for the pattern
shared across those models and fix the scheduler than it is to tweak each
scheduling model individually to avoid the abnormal case here.
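
To check a build for the pattern described here (a vdiv scheduled ahead of a
conditional branch), one option is to dump the assembly and list the two kinds
of instruction in order. The following is only a sketch: `list_vdiv` is a
made-up helper name, the regular expression covers just the common conditional
branch mnemonics, and it says nothing about which branch guards which division.

```shell
#!/bin/sh
# Hypothetical helper: list_vdiv FILE.s
# Prints vdiv instructions and conditional branches with their line numbers,
# so their relative order in each build can be compared by eye.
list_vdiv() {
    grep -nE '^[[:space:]]*(vdiv\.f(32|64)|b(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le))[[:space:]]' "$1"
}

# Example (needs an ARM-targeting gcc; apart from -S, the flags are the ones
# quoted elsewhere in this thread):
#   for i in 5 9; do
#       gcc -marm -Ofast -S -mcpu=cortex-a$i -o c-ray-a$i.s c-ray-mt.c
#       echo "== cortex-a$i"; list_vdiv c-ray-a$i.s
#   done
```

A vdiv whose line number is lower than that of the branch it used to follow
would be a candidate for the hoisting discussed above; the annotated profile
is still needed to confirm it.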

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #9 from PeteVine  ---
@jgreenhalgh Please have a look at the profiled assembly for both fast and slow
codegen. (attached)

According to @aldyh's bisection in #68664 this probably isn't the same issue.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-25 Thread siarhei.siamashka at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #8 from Siarhei Siamashka  ---
Since my report predates bug 68664 by several years, shouldn't bug 68664 be a
duplicate? In addition, my report was much more detailed, since it also
provided a practical use case, showcasing the importance of this problem.

Also, if I understand correctly, the issue has still not been fixed, so
closing it seems a bit premature. I'll keep watching bug 68664 and will be
sure to reopen my bug report if the fix does not help on ARM Cortex-A9.

Thanks for generating some sort of activity anyway. It's surely better than
nothing.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-25 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

James Greenhalgh  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #7 from James Greenhalgh  ---


*** This bug has been marked as a duplicate of bug 68664 ***

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2016-09-24 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #6 from PeteVine  ---
Testing different 32-bit codegen options in aarch32 mode on a Cortex-A53 shows
that A15 is probably also affected. Full comparison below:

$ for i in 8 5 7 9 15 ; do gcc -marm -Ofast -o c-ray-a$i c-ray-mt.c -lm -lpthread -mcpu=cortex-a$i; done
$ for i in 8 5 7 9 15 ; do echo Cortex-A$i ; ./c-ray-a$i -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm ; done

Cortex-A8
c-ray-mt v1.1
Rendering took: 1 seconds (1660 milliseconds)
Cortex-A5
c-ray-mt v1.1
Rendering took: 1 seconds (1638 milliseconds)
Cortex-A7
c-ray-mt v1.1
Rendering took: 1 seconds (1645 milliseconds)
Cortex-A9
c-ray-mt v1.1
Rendering took: 2 seconds (2027 milliseconds)
Cortex-A15
c-ray-mt v1.1
Rendering took: 1 seconds (1922 milliseconds)
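
A rough way to put these timings side by side is to normalise each one against
the fastest build. The sketch below is only an illustration: the `summarize`
function name and the `results.txt` file are assumptions, and it expects the
loop output above (a `Cortex-A<n>` line followed by a `Rendering took` line) on
standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads benchmark output on stdin and prints, for each
# -mcpu target, the time in ms and the slowdown relative to the fastest build.
summarize() {
    awk '
        /^Cortex-A[0-9]+/ { cpu = $1 }              # remember current target
        /^Rendering took:/ {
            ms = $0
            sub(/^.*\(/, "", ms)                    # drop text up to "("
            sub(/ milliseconds\).*$/, "", ms)       # keep the number only
            t[cpu] = ms + 0
            order[++n] = cpu
            if (min == 0 || t[cpu] < min) min = t[cpu]
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%-11s %5d ms  %.2fx\n", order[i], t[order[i]], t[order[i]] / min
        }
    '
}

# Usage (results.txt is an assumed capture of the second loop's output):
#   summarize < results.txt
```

Fed the numbers above, this puts the Cortex-A9 build at roughly 1.24x and the
Cortex-A15 build at roughly 1.17x the fastest time (Cortex-A5).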

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2016-09-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #5 from PeteVine  ---
Created attachment 39649
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39649&action=edit
Annotated ARMv7 assembly

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2016-09-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #4 from PeteVine  ---
I've just done the obvious and run the resulting ARMv7 binaries on a Cortex-A53
in aarch32 mode, and the difference is there (GCC 6.2.1 and 7.0.0), so I can
confirm the issue is present to this day.

Cortex-A5 vs Cortex-A9 codegen yields a 0.81x performance ratio.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2016-09-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #3 from PeteVine  ---
Curiously, up to gcc 6, targeting Cortex-A5 made virtually no difference, but
in gcc 7, generic codegen takes an 8% hit while -mcpu=cortex-a5 produces
roughly the same performance as before. (but that's a different issue so FWIW)

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2016-09-03 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tulipawn at gmail dot com

--- Comment #2 from PeteVine  ---
Even though I've tested this on a Cortex-A5, the 18% difference does reproduce
on gcc 6.1.1 (2694 vs 3304 ms):

First, the slower profile for A9 codegen:

CPU: ARM Cortex-A5, speed 1.728e+06 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 90
samples  %        linenr info                  image name    symbol name
14460    65.7422  c-ray-mt.c:377               c-ray-mt      shade
3901     17.7358  c-ray-mt.c:336               c-ray-mt      trace
3181     14.4624  c-ray-mt.c:308               c-ray-mt      render_scanline
186      0.8456   e_pow.c:70                   libm-2.19.so  __pow_finite
68       0.3092   e_exp.c:240                  libm-2.19.so  __exp1
55       0.2501   (no location information)    no-vmlinux    /no-vmlinux
40       0.1819   c-ray-mt.c:454               c-ray-mt      get_primary_ray
38       0.1728   c-ray-mt.c:497               c-ray-mt      get_sample_pos
17       0.0773   fraiseexcpt.c:27             libm-2.19.so  feraiseexcept
13       0.0591   e_pow.c:430                  libm-2.19.so  checkint
6        0.0273   fesetround.c:31              libm-2.19.so  fesetround
5        0.0227   fputc.c:37                   libc-2.19.so  fputc
5        0.0227   feupdateenv.c:27             libm-2.19.so  feupdateenv@@GLIBC_2.4
4        0.0182   feholdexcpt.c:32             libm-2.19.so  feholdexcept
4        0.0182   fesetenv.c:31                libm-2.19.so  fesetenv@@GLIBC_2.4
3        0.0136   mpa.c:767                    libm-2.19.so  __sqr
2        0.0091   strtod_l.c:483               libc-2.19.so  strtod_l_internal
1        0.0045   c-ray-mt.c:170               c-ray-mt      main
1        0.0045   dl-tls.c:770                 ld-2.19.so    __tls_get_addr
1        0.0045   dl-reloc.c:154               ld-2.19.so    _dl_relocate_object
1        0.0045   (no location information)    libc-2.19.so  .udivsi3_skip_div0_test
1        0.0045   malloc.c:3302                libc-2.19.so  _int_malloc
1        0.0045   random_r.c:366               libc-2.19.so  random_r
1        0.0045   strtod_l.c:201               libc-2.19.so  round_and_return

compared to the default codegen:

samples  %        linenr info                  image name    symbol name
11657    64.6211  c-ray-mt.c:377               c-ray-mt      shade
3396     18.8259  c-ray-mt.c:336               c-ray-mt      trace
2586     14.3356  c-ray-mt.c:308               c-ray-mt      render_scanline
172      0.9535   e_pow.c:70                   libm-2.19.so  __pow_finite
49       0.2716   (no location information)    no-vmlinux    /no-vmlinux
47       0.2605   e_exp.c:240                  libm-2.19.so  __exp1
41       0.2273   c-ray-mt.c:454               c-ray-mt      get_primary_ray
39       0.2162   c-ray-mt.c:497               c-ray-mt      get_sample_pos
16       0.0887   e_pow.c:430                  libm-2.19.so  checkint
12       0.0665   fraiseexcpt.c:27             libm-2.19.so  feraiseexcept
7        0.0388   fputc.c:37                   libc-2.19.so  fputc
2        0.0111   c-ray-mt.c:170               c-ray-mt      main
2        0.0111   strtod_l.c:483               libc-2.19.so  strtod_l_internal
2        0.0111   mpa.c:767                    libm-2.19.so  __sqr
2        0.0111   feholdexcpt.c:32             libm-2.19.so  feholdexcept
2        0.0111   fesetround.c:31              libm-2.19.so  fesetround
1        0.0055   cxa_thread_atexit_impl.c:83  libc-2.19.so  __call_tls_dtors
1        0.0055   memchr.S:58                  libc-2.19.so  memchr
1        0.0055   random_r.c:366               libc-2.19.so  random_r
1        0.0055   strtok.c:38                  libc-2.19.so  strtok
1        0.0055   mpa.c:614                    libm-2.19.so  __mul
1        0.0055   fesetenv.c:31                libm-2.19.so  fesetenv@@GLIBC_2.4
1        0.0055   feupdateenv.c:27             libm-2.19.so  feupdateenv@@GLIBC_2.4
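
To see where the extra samples land, the two reports can be joined on the
symbol name. This is a sketch under assumptions: `compare_profiles`,
`slow.txt` and `fast.txt` are hypothetical names, and each report line is
assumed to start with the sample count and end with the symbol, as in
whitespace-separated opreport output with one entry per line.

```shell
#!/bin/sh
# Hypothetical helper: compare_profiles SLOW FAST
# Prints symbol, samples in SLOW, samples in FAST, and the difference for
# every symbol that appears in both reports.
compare_profiles() {
    awk '
        $1 !~ /^[0-9]+$/ { next }                  # skip headers and notes
        NR == FNR { slow[$NF] = $1; next }         # first file: slow build
        ($NF in slow) {
            printf "%-18s %6d %6d %+6d\n", $NF, slow[$NF], $1, slow[$NF] - $1
        }
    ' "$1" "$2"
}

# Usage (file names are assumptions):
#   compare_profiles slow.txt fast.txt
```

For the two reports in this comment, most of the additional samples fall in
shade (14460 vs 11657), with trace and render_scanline contributing the rest.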

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2012-07-12 Thread ramana at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-07-12
                 CC|                            |ramana at gcc dot gnu.org
     Ever Confirmed|0                           |1


[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2012-07-10 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2012-07-10 08:48:24 UTC ---
What platform are you running on (GCC configuration)?

Please can you do some profiling and try to identify where the slowdown is
coming from.  We need more information if we are to progress this.