https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Jan Hubicka changed:
           What    |Removed                     |Added
            Summary|[14 Regression] 10%         |[14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |since                       |since
                   |r14-2369-g3a61ca1b925653    |r14-2369-g3a61ca1b925653
                   |                            |(bad LRA)
--- Comment #4 from Jan Hubicka ---
Aha, sphinx3 is indeed the same patch.
The patch corrects the profile here. It is the LRA/scheduler interaction that
causes the difference.
With older trunk I get:
 Performance counter stats for './b.out':

         28,536.75 msec task-clock:u               #    1.000 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               138      page-faults:u              #    4.836 /sec
   134,747,380,473      cycles:u                   #    4.722 GHz                      (83.33%)
       714,193,718      stalled-cycles-frontend:u  #    0.53% frontend cycles idle     (83.33%)
         3,510,378      stalled-cycles-backend:u   #    0.00% backend cycles idle      (83.33%)
   243,176,910,654      instructions:u             #    1.80  insn per cycle
                                                   #    0.00  stalled cycles per insn  (83.33%)
    13,541,807,472      branches:u                 #  474.539 M/sec                    (83.33%)
        13,829,858      branch-misses:u            #    0.10% of all branches          (83.33%)

      28.537620889 seconds time elapsed

      28.536941000 seconds user
       0.0 seconds sys
and with current trunk:
 Performance counter stats for './a.out':

          31933.51 msec task-clock:u               #    1.000 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               138      page-faults:u              #    4.321 /sec
      150448312691      cycles:u                   #    4.711 GHz                      (83.33%)
         760763745      stalled-cycles-frontend:u  #    0.51% frontend cycles idle     (83.33%)
           1918238      stalled-cycles-backend:u   #    0.00% backend cycles idle      (83.33%)
      242823668283      instructions:u             #    1.61  insn per cycle
                                                   #    0.00  stalled cycles per insn  (83.34%)
       13541981288      branches:u                 #  424.068 M/sec                    (83.34%)
          14583703      branch-misses:u            #    0.11% of all branches          (83.33%)

      31.933986770 seconds time elapsed

      31.933701000 seconds user
       0.0 seconds sys
So the instruction and branch counts are the same, but they execute more
slowly: IPC goes down from 1.8 to 1.6. Perf attributes the difference to
__perdida_m_MOD_generalized_hookes_law.constprop.0.
  27.45%  b.out  b.out      [.] MAIN__
  27.07%  a.out  a.out      [.] MAIN__
  21.72%  a.out  a.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
  16.60%  b.out  b.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
   2.22%  a.out  a.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
   1.64%  b.out  b.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
   1.55%  b.out  libc.so.6  [.] __memset_avx2_unaligned_erms
   1.54%  a.out  libc.so.6  [.] __memset_avx2_unaligned_erms
   0.06%  a.out  libm.so.6  [.] __sincos_fma
   0.04%  b.out  libm.so.6  [.] __sincos_fma
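As a rough cross-check of the numbers above, the IPC figures can be recomputed from the raw perf stat counters, and the per-symbol percentages can be scaled by elapsed time to estimate where the extra seconds go. This is only a sketch: the counter values and percentages are copied from the output above, while the helper names (`ipc`, `estimated_seconds`, `slowdown_share`) are made up for illustration, and the per-symbol estimate assumes that a symbol's share of samples is close to its share of elapsed time.

```python
# Counters copied from the two `perf stat` runs above.
RUNS = {
    # older trunk (before the patch)
    "b.out": {"cycles": 134_747_380_473,
              "instructions": 243_176_910_654,
              "elapsed_s": 28.537620889},
    # current trunk (after the patch)
    "a.out": {"cycles": 150_448_312_691,
              "instructions": 242_823_668_283,
              "elapsed_s": 31.933986770},
}

# Sample shares copied from the perf report above, as fractions.
HOT = "__perdida_m_MOD_generalized_hookes_law.constprop.0."
SHARE = {
    "b.out": {"MAIN__": 0.2745, HOT: 0.1660},
    "a.out": {"MAIN__": 0.2707, HOT: 0.2172},
}


def ipc(run):
    """Instructions retired per cycle for one perf stat run."""
    return run["instructions"] / run["cycles"]


def estimated_seconds(binary, symbol):
    """Approximate time spent in a symbol.

    Assumption: the fraction of samples attributed to the symbol is
    close to the fraction of elapsed time spent in it.
    """
    return SHARE[binary][symbol] * RUNS[binary]["elapsed_s"]


def slowdown_share(symbol):
    """Fraction of the total slowdown attributable to one symbol."""
    total = RUNS["a.out"]["elapsed_s"] - RUNS["b.out"]["elapsed_s"]
    delta = estimated_seconds("a.out", symbol) - estimated_seconds("b.out", symbol)
    return delta / total


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: IPC = {ipc(run):.2f}")
    for symbol in ("MAIN__", HOT):
        print(f"{symbol}: {slowdown_share(symbol):.0%} of the slowdown")
```

Under that assumption, most of the added time lands in the constprop.0 clone and a smaller part in MAIN__, which is consistent with the report above.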
b.out is before the patch and a.out is after. The difference seems to be a
relocated load. Before patch:
Percent│     00401860 <__perdida_m_MOD_generalized_hookes_
       │     __perdida_m_MOD_generalized_hookes_law.constprop.0.is
  0.10 │       push     %rbp
  0.02 │       mov      %r8,%rax
       │       vmovddup %xmm0,%xmm5
       │       mov      %rsp,%rbp
  1.22 │       push     %r15
  0.04 │       push     %r14
  0.03 │       push     %r13
  0.09 │       push     %r12
  0.05 │       push     %rbx
  0.03 │       not