https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111612
            Bug ID: 111612
           Summary: GCC twice as slow as Clang for minisweep (SPEC HPC 2021)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

The discussion came up at this year's GNU Tools Cauldron during the
OpenMP/OpenACC/offloading talks, i.e.
https://gcc.gnu.org/wiki/cauldron2023#cauldron2023talks.openacc_openmp_offloading_and_gcc

In that talk, using MPI with 8 ranks gave the following
(--define model=mpi --ranks 8):

  3855 s (~1.071 h) - Nvidia HPC SDK 23.5 (May 2023)
  4076 s (~1.132 h) - LLVM 17 (pre) commit 34cf263e6 (2023-08-07)
  4900 s (~1.361 h) or/up to 6624 s (~1.840 h) - GCC og13 commit b003e6511 (2023-07-19)

* * *

I just tried it myself as follows, using the non SPEC-ified version and a
modified input from the how-to-run readme. I have not checked whether there
are any gotchas, but it should be identical and without OpenMP, MPI or similar.

Namely:

  git clone https://github.com/wdj/minisweep.git
  cmake -DCMAKE_C_FLAGS=-O2 -DCMAKE_C_COMPILER=/usr/bin/clang-14 ../..

And likewise for GCC mainline, also with -O2. Then running:

  time ./sweep --ncell_x 4 --ncell_y 8 --ncell_z 32

GCC mainline:

  Normsq result: 2.82234163e+12  diff: 0.000e+00  PASS  time: 7.817  GF/s: 0.315
  real 0m8,124s / user 0m7,943s / sys 0m0,180s

Clang/LLVM-14:

  Normsq result: 2.82234163e+12  diff: 0.000e+00  PASS  time: 3.036  GF/s: 0.812
  real 0m3,223s / user 0m3,085s / sys 0m0,137s

Using -O3 -flto, I get: 2.070s (GCC) vs. 1.053s (Clang/LLVM)
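
For reference, the slowdown implied by the local runs above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from this report, the -O2 values are the "time:" field printed by ./sweep, and the names `timings` and `slowdown` are made up for illustration.

```python
# Timings in seconds, copied from the report above.
# -O2 values are the "time:" field printed by ./sweep; the -O3 -flto
# values are the figures quoted in the last line of the report.
timings = {
    "-O2": {"gcc": 7.817, "clang": 3.036},
    "-O3 -flto": {"gcc": 2.070, "clang": 1.053},
}


def slowdown(gcc_seconds, clang_seconds):
    """Return how many times longer the GCC build ran than the Clang build."""
    return gcc_seconds / clang_seconds


ratios = {flags: slowdown(t["gcc"], t["clang"]) for flags, t in timings.items()}

for flags, ratio in ratios.items():
    print(f"{flags}: GCC takes {ratio:.2f}x the Clang time")
```

With these inputs the ratio is about 2.57 at -O2 and about 1.97 at -O3 -flto, which is where the "twice as slow" in the summary comes from.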