[Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA)

2024-03-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

[Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA)

2023-07-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] 10%         |[14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |since                       |since
                   |r14-2369-g3a61ca1b925653    |r14-2369-g3a61ca1b925653
                   |                            |(bad LRA)

--- Comment #4 from Jan Hubicka  ---
Aha, sphinx3 is indeed the same patch.
The patch corrects the profile here. It is the LRA/scheduler interaction that
causes the difference.

With older trunk I get:
 Performance counter stats for './b.out':

         28,536.75 msec task-clock:u               #    1.000 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               138      page-faults:u              #    4.836 /sec
   134,747,380,473      cycles:u                   #    4.722 GHz                      (83.33%)
       714,193,718      stalled-cycles-frontend:u  #    0.53% frontend cycles idle     (83.33%)
         3,510,378      stalled-cycles-backend:u   #    0.00% backend cycles idle      (83.33%)
   243,176,910,654      instructions:u             #    1.80  insn per cycle
                                                   #    0.00  stalled cycles per insn  (83.33%)
    13,541,807,472      branches:u                 #  474.539 M/sec                    (83.33%)
        13,829,858      branch-misses:u            #    0.10% of all branches          (83.33%)

      28.537620889 seconds time elapsed

      28.536941000 seconds user
       0.0 seconds sys

and with current trunk:
 Performance counter stats for './a.out':

          31933.51 msec task-clock:u               #    1.000 CPUs utilized
                 0      context-switches:u         #    0.000 /sec
                 0      cpu-migrations:u           #    0.000 /sec
               138      page-faults:u              #    4.321 /sec
      150448312691      cycles:u                   #    4.711 GHz                      (83.33%)
         760763745      stalled-cycles-frontend:u  #    0.51% frontend cycles idle     (83.33%)
           1918238      stalled-cycles-backend:u   #    0.00% backend cycles idle      (83.33%)
      242823668283      instructions:u             #    1.61  insn per cycle
                                                   #    0.00  stalled cycles per insn  (83.34%)
       13541981288      branches:u                 #  424.068 M/sec                    (83.34%)
          14583703      branch-misses:u            #    0.11% of all branches          (83.33%)

      31.933986770 seconds time elapsed

      31.933701000 seconds user
       0.0 seconds sys

So the instruction and branch counts are the same, but they execute slower:
IPC goes down from 1.80 to 1.61. Perf thinks the difference is in
__perdida_m_MOD_generalized_hookes_law.constprop.0.
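
As a cross-check of the IPC figures, the ratios can be recomputed from the raw counters in the two perf stat runs quoted above. This is only a sketch: the dictionary layout and variable names are made up for illustration, and the numbers are copied by hand from the output.

```python
# Counter values copied from the two `perf stat` runs quoted above.
# b.out = older trunk (before the patch), a.out = current trunk (after).
runs = {
    "b.out": {
        "cycles": 134_747_380_473,
        "instructions": 243_176_910_654,
        "seconds": 28.537620889,
    },
    "a.out": {
        "cycles": 150_448_312_691,
        "instructions": 242_823_668_283,
        "seconds": 31.933986770,
    },
}


def ipc(run):
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


old, new = runs["b.out"], runs["a.out"]
ipc_old = ipc(old)
ipc_new = ipc(new)

# Ratios of after/before; a value above 1.0 means the new binary uses more.
insn_ratio = new["instructions"] / old["instructions"]
cycle_ratio = new["cycles"] / old["cycles"]
time_ratio = new["seconds"] / old["seconds"]

print(f"IPC before: {ipc_old:.2f}, after: {ipc_new:.2f}")
print(f"instructions x{insn_ratio:.3f}, cycles x{cycle_ratio:.3f}, "
      f"elapsed x{time_ratio:.3f}")
```

The instruction ratio stays within a fraction of a percent of 1.0 while cycles and elapsed time both grow by more than 10%, which is what points at code placement or scheduling rather than extra work.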

  27.45%  b.out  b.out      [.] MAIN__
  27.07%  a.out  a.out      [.] MAIN__
  21.72%  a.out  a.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
  16.60%  b.out  b.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
   2.22%  a.out  a.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
   1.64%  b.out  b.out      [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
   1.55%  b.out  libc.so.6  [.] __memset_avx2_unaligned_erms
   1.54%  a.out  libc.so.6  [.] __memset_avx2_unaligned_erms
   0.06%  a.out  libm.so.6  [.] __sincos_fma
   0.04%  b.out  libm.so.6  [.] __sincos_fma
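
The per-symbol shares in the report above can be compared directly. The sketch below assumes the percentages are shares of a single combined profile covering both binaries (they add up to roughly 100% across a.out and b.out, which suggests this, but the report does not say so). The row data is copied by hand and the variable names are illustrative.

```python
from collections import defaultdict

# (percent, binary, symbol) rows copied from the perf report above.
# Symbol names are kept exactly as printed there, including truncation.
rows = [
    (27.45, "b.out", "MAIN__"),
    (27.07, "a.out", "MAIN__"),
    (21.72, "a.out", "__perdida_m_MOD_generalized_hookes_law.constprop.0."),
    (16.60, "b.out", "__perdida_m_MOD_generalized_hookes_law.constprop.0."),
    (2.22, "a.out", "__perdida_m_MOD_generalized_hookes_law.constprop.1."),
    (1.64, "b.out", "__perdida_m_MOD_generalized_hookes_law.constprop.1."),
    (1.55, "b.out", "__memset_avx2_unaligned_erms"),
    (1.54, "a.out", "__memset_avx2_unaligned_erms"),
    (0.06, "a.out", "__sincos_fma"),
    (0.04, "b.out", "__sincos_fma"),
]

totals = defaultdict(float)   # share of all samples per binary
by_symbol = defaultdict(dict)  # symbol -> {binary: share}
for percent, binary, symbol in rows:
    totals[binary] += percent
    by_symbol[symbol][binary] = percent

# Positive delta: the symbol takes a larger share after the patch (a.out).
deltas = {
    symbol: shares["a.out"] - shares["b.out"]
    for symbol, shares in by_symbol.items()
}
worst = max(deltas, key=deltas.get)

print(f"a.out total {totals['a.out']:.2f}%, b.out total {totals['b.out']:.2f}%")
for symbol, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{delta:+6.2f} points  {symbol}")
```

Under that assumption, nearly all of the gap between the two binaries comes from the constprop.0 clone, while MAIN__ is essentially unchanged.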

b.out is before the patch and a.out is after. The difference seems to be a
relocated load.  Before the patch:

Percent │ 00401860 <__perdida_m_MOD_generalized_hookes_
        │ __perdida_m_MOD_generalized_hookes_law.constprop.0.is
   0.10 │   push     %rbp
   0.02 │   mov      %r8,%rax
        │   vmovddup %xmm0,%xmm5
        │   mov      %rsp,%rbp
   1.22 │   push     %r15
   0.04 │   push     %r14
   0.03 │   push     %r13
   0.09 │   push     %r12
   0.05 │   push     %rbx
   0.03 │   not

[Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653

2023-07-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586

--- Comment #3 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #2)
> Do we have other PRs reducing to this change?
> 

I thought the recent sphinx regression was also because of this?  But if I am
wrong, there may be none.

[Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653

2023-07-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-17

--- Comment #2 from Jan Hubicka  ---
Do we have other PRs reducing to this change?

The patch makes cunroll scale down previously incoherent profiles when a loop
that does not loop is predicted to loop.
A common source of these loops is vectorized epilogues, which I fixed yesterday.
With some luck this may fix fatigue.

[Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653

2023-07-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
            Summary|[13/14 Regression] 10%      |[14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |since                       |since
                   |r14-2369-g3a61ca1b925653    |r14-2369-g3a61ca1b925653
   Target Milestone|13.2                        |14.0
            Version|13.1.0                      |14.0