[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #12 from PeteVine ---
Nice, PR68664 patch has fixed the issue.

FWIW, unlike previously, running on a Cortex-A53 showed perfect alignment with core type (-mfpu=vfpv3) on the first run:

Cortex-A8
Rendering took: 1 seconds (1801 milliseconds)
Cortex-A5
Rendering took: 1 seconds (1708 milliseconds)
Cortex-A7
Rendering took: 1 seconds (1699 milliseconds)
Cortex-A9
Rendering took: 1 seconds (1644 milliseconds)
Cortex-A15
Rendering took: 1 seconds (1637 milliseconds)

whereas using -mfpu=vfpv4 favours Cortex-A5 code's execution:

Cortex-A8
Rendering took: 1 seconds (1803 milliseconds)
Cortex-A5
Rendering took: 1 seconds (1506 milliseconds)
Cortex-A7
Rendering took: 1 seconds (1636 milliseconds)
Cortex-A9
Rendering took: 1 seconds (1645 milliseconds)
Cortex-A15
Rendering took: 1 seconds (1643 milliseconds)

but that's probably expected. Not sure about A8's codegen performance though.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #11 from PeteVine ---
Super cool, thanks! That makes the OP a true prophet before his time ;)
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #10 from James Greenhalgh ---
(In reply to PeteVine from comment #9)
> @jgreenhalgh Please have a look at the profiled assembly for both fast and
> slow codegen. (attached)
>
> According to @aldyh's bisection in #68664 this probably isn't the same issue.

In the attached code I once again see the vdiv moved before the branch in the slow case.

Looking at the bisection is one way to triage a bug, but it points to a change in scheduling model for Cortex-A53, and the analysis in this report indicates that the same bad scheduling decision is made with the Cortex-A9 and Cortex-A15 scheduling models.

If the scheduler is making bad decisions across a range of models, it is (in my opinion) more instructive to look for the pattern shared across those models and fix the scheduler than it is to tweak each scheduling model individually to avoid the abnormal case here.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #9 from PeteVine ---
@jgreenhalgh Please have a look at the profiled assembly for both fast and slow codegen. (attached)

According to @aldyh's bisection in #68664 this probably isn't the same issue.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #8 from Siarhei Siamashka ---
Since my report predates bug 68664 by several years, shouldn't bug 68664 be a duplicate? In addition, my report was much more detailed, since it also provided a practical use case, showcasing the importance of this problem.

Also, if I understand it correctly, you have still not fixed the issue. So closing it seems to be a bit premature. I'll keep a watch on bug 68664 and will be sure to reopen my bug report in case the fix does not help on ARM Cortex-A9.

Thanks for generating some sort of activity anyway. It's surely better than nothing.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #7 from James Greenhalgh ---
*** This bug has been marked as a duplicate of bug 68664 ***
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #6 from PeteVine ---
Testing different 32-bit codegen options in aarch32 mode on a Cortex-A53 shows A15 is probably also affected. Full comparison below:

$ for i in 8 5 7 9 15 ; do gcc -marm -Ofast -o c-ray-a$i c-ray-mt.c -lm -lpthread -mcpu=cortex-a$i; done

$ for i in 8 5 7 9 15 ; do echo Cortex-A$i ; ./c-ray-a$i -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm ; done

Cortex-A8
c-ray-mt v1.1
Rendering took: 1 seconds (1660 milliseconds)
Cortex-A5
c-ray-mt v1.1
Rendering took: 1 seconds (1638 milliseconds)
Cortex-A7
c-ray-mt v1.1
Rendering took: 1 seconds (1645 milliseconds)
Cortex-A9
c-ray-mt v1.1
Rendering took: 2 seconds (2027 milliseconds)
Cortex-A15
c-ray-mt v1.1
Rendering took: 1 seconds (1922 milliseconds)
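
As a rough aid for reading the timings in comment #6, the sketch below normalizes each build against the fastest one. The millisecond values are copied from that comment; the `timings_ms` dictionary and the `relative_slowdown` helper are illustrative names, not part of c-ray or GCC.

```python
# Rendering times in milliseconds, copied from comment #6.
# The dictionary and helper names are hypothetical, for illustration only.
timings_ms = {
    "cortex-a8": 1660,
    "cortex-a5": 1638,
    "cortex-a7": 1645,
    "cortex-a9": 2027,
    "cortex-a15": 1922,
}


def relative_slowdown(timings):
    """Return each build's time divided by the fastest build's time."""
    best = min(timings.values())
    return {cpu: ms / best for cpu, ms in timings.items()}


if __name__ == "__main__":
    ratios = relative_slowdown(timings_ms)
    # Print the builds from fastest to slowest.
    for cpu, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"-mcpu={cpu:<11} {ratio:.3f}x")
```

On these numbers the -mcpu=cortex-a9 build takes roughly 24% longer than the -mcpu=cortex-a5 build and the -mcpu=cortex-a15 build roughly 17% longer, while the A8, A5 and A7 builds sit within about 1.5% of each other.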
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #5 from PeteVine ---
Created attachment 39649
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39649&action=edit
Annotated ARMv7 assembly
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #4 from PeteVine ---
I've just done the obvious and run the resulting ARMv7 binaries on a Cortex A53 in aarch32 mode and the difference is there (GCC 6.2.1 and 7.0.0) so I can confirm the issue is present to this day.

Cortex-A5 vs Cortex-A9 codegen yields a 0.81x performance ratio.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #3 from PeteVine ---
Curiously, up to gcc 6, targeting Cortex-A5 made virtually no difference, but in gcc 7, generic codegen takes an 8% hit while -mcpu=cortex-a5 produces roughly the same performance as before. (but that's a different issue so FWIW)
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tulipawn at gmail dot com

--- Comment #2 from PeteVine ---
Even though I've tested this on a Cortex-A5, the 18% difference does reproduce on gcc 6.1.1 (2694 vs 3304 ms):

First, the slower profile for A9 codegen:

CPU: ARM Cortex-A5, speed 1.728e+06 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 90
samples  %        linenr info                 image name    symbol name
14460    65.7422  c-ray-mt.c:377              c-ray-mt      shade
3901     17.7358  c-ray-mt.c:336              c-ray-mt      trace
3181     14.4624  c-ray-mt.c:308              c-ray-mt      render_scanline
186      0.8456   e_pow.c:70                  libm-2.19.so  __pow_finite
68       0.3092   e_exp.c:240                 libm-2.19.so  __exp1
55       0.2501   (no location information)   no-vmlinux    /no-vmlinux
40       0.1819   c-ray-mt.c:454              c-ray-mt      get_primary_ray
38       0.1728   c-ray-mt.c:497              c-ray-mt      get_sample_pos
17       0.0773   fraiseexcpt.c:27            libm-2.19.so  feraiseexcept
13       0.0591   e_pow.c:430                 libm-2.19.so  checkint
6        0.0273   fesetround.c:31             libm-2.19.so  fesetround
5        0.0227   fputc.c:37                  libc-2.19.so  fputc
5        0.0227   feupdateenv.c:27            libm-2.19.so  feupdateenv@@GLIBC_2.4
4        0.0182   feholdexcpt.c:32            libm-2.19.so  feholdexcept
4        0.0182   fesetenv.c:31               libm-2.19.so  fesetenv@@GLIBC_2.4
3        0.0136   mpa.c:767                   libm-2.19.so  __sqr
2        0.0091   strtod_l.c:483              libc-2.19.so  strtod_l_internal
1        0.0045   c-ray-mt.c:170              c-ray-mt      main
1        0.0045   dl-tls.c:770                ld-2.19.so    __tls_get_addr
1        0.0045   dl-reloc.c:154              ld-2.19.so    _dl_relocate_object
1        0.0045   (no location information)   libc-2.19.so  .udivsi3_skip_div0_test
1        0.0045   malloc.c:3302               libc-2.19.so  _int_malloc
1        0.0045   random_r.c:366              libc-2.19.so  random_r
1        0.0045   strtod_l.c:201              libc-2.19.so  round_and_return

compared to the default codegen:

samples  %        linenr info                 image name    symbol name
11657    64.6211  c-ray-mt.c:377              c-ray-mt      shade
3396     18.8259  c-ray-mt.c:336              c-ray-mt      trace
2586     14.3356  c-ray-mt.c:308              c-ray-mt      render_scanline
172      0.9535   e_pow.c:70                  libm-2.19.so  __pow_finite
49       0.2716   (no location information)   no-vmlinux    /no-vmlinux
47       0.2605   e_exp.c:240                 libm-2.19.so  __exp1
41       0.2273   c-ray-mt.c:454              c-ray-mt      get_primary_ray
39       0.2162   c-ray-mt.c:497              c-ray-mt      get_sample_pos
16       0.0887   e_pow.c:430                 libm-2.19.so  checkint
12       0.0665   fraiseexcpt.c:27            libm-2.19.so  feraiseexcept
7        0.0388   fputc.c:37                  libc-2.19.so  fputc
2        0.0111   c-ray-mt.c:170              c-ray-mt      main
2        0.0111   strtod_l.c:483              libc-2.19.so  strtod_l_internal
2        0.0111   mpa.c:767                   libm-2.19.so  __sqr
2        0.0111   feholdexcpt.c:32            libm-2.19.so  feholdexcept
2        0.0111   fesetround.c:31             libm-2.19.so  fesetround
1        0.0055   cxa_thread_atexit_impl.c:83 libc-2.19.so  __call_tls_dtors
1        0.0055   memchr.S:58                 libc-2.19.so  memchr
1        0.0055   random_r.c:366              libc-2.19.so  random_r
1        0.0055   strtok.c:38                 libc-2.19.so  strtok
1        0.0055   mpa.c:614                   libm-2.19.so  __mul
1        0.0055   fesetenv.c:31               libm-2.19.so  fesetenv@@GLIBC_2.4
1        0.0055   feupdateenv.c:27            libm-2.19.so  feupdateenv@@GLIBC_2.4
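
The 18% figure in comment #2 can be cross-checked against both the wall-clock times and the shade() sample counts in the two profiles. This is a minimal sketch; it assumes that 2694 ms belongs to the default codegen and 3304 ms to the -mcpu=cortex-a9 build (the comment does not label them explicitly), and the variable and function names are illustrative.

```python
# Figures copied from comment #2. Assumption: the smaller time is the default
# codegen and the larger one is the -mcpu=cortex-a9 build.
fast_ms, slow_ms = 2694, 3304
# CPU_CYCLES samples attributed to shade() in each profile.
shade_fast, shade_slow = 11657, 14460


def fraction_saved(fast, slow):
    """Share of the slower figure that the faster one avoids."""
    return (slow - fast) / slow


if __name__ == "__main__":
    print(f"wall clock: {fraction_saved(fast_ms, slow_ms):.1%} less time")
    print(f"shade():    {fraction_saved(shade_fast, shade_slow):.1%} fewer samples")
```

Both measures land in the 18% to 20% range, which suggests that most of the extra time in the A9 codegen is spent inside shade() rather than in trace() or the libm calls.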
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-07-12
                 CC|                            |ramana at gcc dot gnu.org
     Ever Confirmed|0                           |1
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

Richard Earnshaw rearnsha at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |arm

--- Comment #1 from Richard Earnshaw rearnsha at gcc dot gnu.org 2012-07-10 08:48:24 UTC ---
What platform are you running on (GCC configuration)?

Please can you do some profiling and try to identify where the slowdown is coming from. We need more information if we are to progress this.