[Bug rtl-optimization/90813] [10 regression] gfortran.dg/proc_ptr_51.f90 fails (SIGSEGV) after 272084

2019-07-05 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90813

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at linux dot ibm.com,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #22 from Pat Haugen  ---
So the problem appears to be alias.c:true_dependence() telling
sched-deps.c:sched_analyze_2() that the following two instructions' memory
references don't alias.

Breakpoint 11, sched_analyze_2 (deps=0x7fffdbf8, x=0x759a1a40, insn=0x757f16c0)
    at /home/pthaugen/src/gcc/temp/gcc/gcc/sched-deps.c:2671
2671          if (true_dependence (pending_mem->element (), VOIDmode, t)
(gdb) pr pending->insn()
(insn 37 36 38 4 (set (mem/f/c:DI (unspec:DI [
                    (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
                    (reg:DI 2 2)
                ] UNSPEC_TOCREL) [8 c_+0 S8 A64])
        (reg/f:DI 141)) "proc_ptr_51.f90":28:0 609 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/f:DI 141)
        (nil)))

(gdb) pr insn
(insn 39 38 40 4 (set (reg/f:DI 143 [ c_ ])
        (mem/f/c:DI (reg/f:DI 142) [8 c_+0 S8 A64])) "proc_ptr_51.f90":37:0 609 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/f:DI 142)
        (expr_list:REG_EQUAL (mem/f/c:DI (symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] ) [8 c_+0 S8 A64])
            (nil))))

Which then lets the scheduler move the load above the store. Since they really
are referring to the same location, we load up garbage (null) and branch to it.
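To make the hazard concrete, here is a minimal sketch of what reordering does to a store followed by a load of the same slot. It is a toy memory model, not GCC code; the value name "proc_target" and the run() helper are made up for illustration, and the slot is assumed to start out null as described above.

```python
def run(schedule):
    """Execute ("store", addr, value) / ("load", addr) ops in the given order
    and return the value seen by the last load."""
    memory = {"c_": None}  # assumption: the slot is null before the store
    loaded = None
    for op in schedule:
        if op[0] == "store":
            _, addr, value = op
            memory[addr] = value
        else:
            _, addr = op
            loaded = memory[addr]
    return loaded


store = ("store", "c_", "proc_target")  # stands in for insn 37 (hypothetical value)
load = ("load", "c_")                   # stands in for insn 39

in_order = run([store, load])    # original program order
reordered = run([load, store])   # load hoisted above the store
print("in order: ", in_order)
print("reordered:", reordered)
```

In program order the load sees the stored pointer; once the load is hoisted it sees the stale null, which matches the branch-to-null symptom.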

Including some additional detail from a couple of spots in the debug chain.
Hoping someone with more alias.c knowledge can chime in.


Breakpoint 13, true_dependence_1 (mem=0x759a4878, mem_mode=E_VOIDmode, mem_addr=0x0, x=0x759a8e08,
    x_addr=0x0, mem_canonicalized=false) at /home/pthaugen/src/gcc/temp/gcc/gcc/alias.c:2902
2902    {
(gdb) pr mem
(mem/f/c:DI (unspec:DI [
            (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
            (reg:DI 2 2)
        ] UNSPEC_TOCREL) [8 c_+0 S8 A64])
(gdb) pr x
(mem/f/c:DI (symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] ) [8 c_+0 S8 A64])

...

(gdb) pr x_addr
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr true_mem_addr
(unspec:DI [
        (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
        (reg:DI 2 2)
    ] UNSPEC_TOCREL)
(gdb) pr mem_base
(symbol_ref:DI ("*.LANCHOR1") [flags 0x182])

2958      if (! base_alias_check (x_addr, base, true_mem_addr, mem_base,
(gdb) p base_alias_check (x_addr, base, true_mem_addr, mem_base, GET_MODE (x), mem_mode)
$18 = 0

(gdb) s
base_alias_check (x=0x75992990, x_base=0x75992990, y=0x759a4890, y_base=0x75990ac8,
    x_mode=E_DImode, y_mode=E_DImode) at /home/pthaugen/src/gcc/temp/gcc/gcc/alias.c:2174
2174      if (x_base == 0)
(gdb) pr x
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr x_base
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr y
(unspec:DI [
        (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
        (reg:DI 2 2)
    ] UNSPEC_TOCREL)
(gdb) pr y_base
(symbol_ref:DI ("*.LANCHOR1") [flags 0x182])

2221      return compare_base_symbol_refs (x_base, y_base) != 0;
(gdb) p compare_base_symbol_refs (x_base, y_base)
$19 = 0

2136      if (!x_node->definition)
(gdb) n
2137        return 0;
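The chain of decisions in the trace can be sketched as below. This models only the branches visible above (the early exit at alias.c:2136-2137 and the `!= 0` test at alias.c:2221); everything else in alias.c is left out, the function names are hypothetical, and which symbol x_node refers to is not shown in the trace.

```python
def compare_early_exit(x_node_has_definition):
    """Only the branch seen at alias.c:2136-2137: no definition -> return 0.
    Other paths are not shown in the trace, so they are not modeled here."""
    if not x_node_has_definition:
        return 0
    return None  # not modeled


def base_alias_check_tail(compare_result):
    """Mirrors 'return compare_base_symbol_refs (x_base, y_base) != 0;'
    at alias.c:2221."""
    return compare_result != 0


def scheduler_may_hoist_load(bases_may_alias):
    """If no alias is reported, no true dependence is recorded and the
    scheduler is free to move the load above the store."""
    return not bases_may_alias


compare_result = compare_early_exit(x_node_has_definition=False)
may_alias = base_alias_check_tail(compare_result)
print(compare_result, may_alias, scheduler_may_hoist_load(may_alias))
```

So a 0 from the early exit propagates straight through to "no alias", even though both references carry the same MEM_EXPR (c_+0).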

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #10 from Pat Haugen  ---
(In reply to Richard Biener from comment #3)
> Created attachment 46250 [details]
> run_fast_dce also for LRA
> 
> Sth like this could fix it.

Yes, that restored the performance.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #7 from Pat Haugen  ---
Overall 'perf' cycle counts and hot functions.

r270483
-------

# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ....................
#
    91.17%   721643  gcc_base.gcc_hu  gcc_base.gcc_hunt_64
     8.82%    69840  gcc_base.gcc_hu  libc-2.17.so

# Overhead  Samples  Command          Shared Object         Symbol
# ........  .......  ...............  ....................  ......................................
#
     6.22%    49295  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] reg_is_remote_constant_p.isra.0.part.0
     6.18%    48897  gcc_base.gcc_hu  libc-2.17.so          [.] __memset_power8
     6.05%    47651  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_operation
     5.92%    46695  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] htab_traverse
     3.66%    28957  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] canon_rtx
     3.59%    28440  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_transp
     3.35%    26372  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_element_allocate
     2.18%    17151  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_dominance_frontiers_1
     2.00%    15841  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_set_mark
     1.77%    13974  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] fixup_var_refs_1
     1.69%    13391  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_mark_rtx_children_1
     1.54%    12236  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] single_set_2.part.0


r270484
-------

# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ................
#
    92.08%   814297  gcc_base.base_6  gcc_base.base_64
     7.91%    70063  gcc_base.base_6  libc-2.17.so

# Overhead  Samples  Command          Shared Object     Symbol
# ........  .......  ...............  ................  ......................................
#
     8.14%    71642  gcc_base.base_6  gcc_base.base_64  [.] bitmap_operation
     6.92%    60863  gcc_base.base_6  gcc_base.base_64  [.] bitmap_element_allocate
     6.01%    53281  gcc_base.base_6  gcc_base.base_64  [.] reg_is_remote_constant_p.isra.0.part.0
     5.68%    50081  gcc_base.base_6  gcc_base.base_64  [.] htab_traverse
     5.53%    48967  gcc_base.base_6  libc-2.17.so      [.] __memset_power8
     3.82%    33850  gcc_base.base_6  gcc_base.base_64  [.] compute_transp
     3.30%    29142  gcc_base.base_6  gcc_base.base_64  [.] canon_rtx
     1.95%    17155  gcc_base.base_6  gcc_base.base_64  [.] compute_dominance_frontiers_1
     1.81%    16023  gcc_base.base_6  gcc_base.base_64  [.] ggc_set_mark
     1.69%    14989  gcc_base.base_6  gcc_base.base_64  [.] ggc_mark_rtx_children_1
     1.57%    13832  gcc_base.base_6  gcc_base.base_64  [.] fixup_var_refs_1
     1.48%    13144  gcc_base.base_6  gcc_base.base_64  [.] single_set_2.part.0
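For anyone wanting to see where the extra samples went, a quick sketch that diffs the per-symbol sample counts from the two profiles above. The numbers are copied from the listings; the dictionary names and the ranking are just for illustration, and only the symbols shown here are covered.

```python
# Sample counts per symbol, copied from the r270483 and r270484 listings.
before = {
    "reg_is_remote_constant_p.isra.0.part.0": 49295,
    "__memset_power8": 48897,
    "bitmap_operation": 47651,
    "htab_traverse": 46695,
    "canon_rtx": 28957,
    "compute_transp": 28440,
    "bitmap_element_allocate": 26372,
    "compute_dominance_frontiers_1": 17151,
    "ggc_set_mark": 15841,
    "fixup_var_refs_1": 13974,
    "ggc_mark_rtx_children_1": 13391,
    "single_set_2.part.0": 12236,
}
after = {
    "reg_is_remote_constant_p.isra.0.part.0": 53281,
    "__memset_power8": 48967,
    "bitmap_operation": 71642,
    "htab_traverse": 50081,
    "canon_rtx": 29142,
    "compute_transp": 33850,
    "bitmap_element_allocate": 60863,
    "compute_dominance_frontiers_1": 17155,
    "ggc_set_mark": 16023,
    "fixup_var_refs_1": 13832,
    "ggc_mark_rtx_children_1": 14989,
    "single_set_2.part.0": 13144,
}

# Change in samples per symbol, largest increase first.
deltas = {sym: after[sym] - before[sym] for sym in before}
ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

for sym, delta in ranked:
    print(f"{sym:40s} {delta:+7d}")
```

Among the listed symbols, the two bitmap routines (bitmap_element_allocate and bitmap_operation) account for most of the increase.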

[Bug other/90257] New: 8% degradation on cpu2006 403.gcc starting with revision 270484

2019-04-25 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

            Bug ID: 90257
           Summary: 8% degradation on cpu2006 403.gcc starting with
                    revision 270484
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at linux dot ibm.com
                CC: dje at gcc dot gnu.org, hjl at gcc dot gnu.org,
                    segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

Will add more detail as I discover it.

[Bug target/84369] test case gcc.dg/sms-10.c fails on power9

2019-04-16 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Pat Haugen  ---
Fixed.

[Bug ipa/89584] New: CPU2000 degradations with r268448 (172.mgrid -22%, 252.eon -8%)

2019-03-04 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89584

            Bug ID: 89584
           Summary: CPU2000 degradations with r268448 (172.mgrid -22%,
                    252.eon -8%)
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at linux dot ibm.com
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, rguenth at gcc dot gnu.org,
                    segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

Revision 268448 introduced the noted degradations. Compile flags are -m64 -O3
-mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=all.

I dug into the mgrid degradation further to get some more detail. The main
difference appears to be that the last call to RESID() in the main function is
now inlined. RESID() is actually cloned, and this call is to the clone,
resid_.constprop.0. I'm not sure if this is another instance of losing RESTRICT
on the parameters, as seen in prior PRs (54497/55334 and 84737), or just that
inlining that specific call into an inner loop now creates too much register
pressure and we spill too much (I suspect the latter). Following is a simple
static instruction count comparison of the vectorized loop from
resid_.constprop.0() and the same loop after inlining; note the obvious
increase in load/store insns.

Old = constprop.s
New = constprop_inline.s

INSTR        Old   New  Change
----------  ----  ----  ------
addi           1    29      28
bc             1     1       0
cmpl           1     1       0
ld             0    17      17
lxvd2x        19    33      14
ori            0     5       5
stxvd2x        1    15      14
xvadddp       17    17       0
xvnmsubadp     1     1       0
xvnmsubmdp     3     3       0
xxlor          3     2      -1
----------  ----  ----  ------
load          19    50      31
store          1    15      14
total         47   124      77
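As a sanity check on the table, a small sketch that recomputes the Change column and the load/store/total rows from the per-instruction counts. The counts are copied from the table; the variable names and the grouping of ld/lxvd2x as loads and stxvd2x as stores are spelled out here rather than taken from whatever script produced the original numbers.

```python
# (old, new) static instruction counts for the vectorized loop.
counts = {
    "addi": (1, 29),
    "bc": (1, 1),
    "cmpl": (1, 1),
    "ld": (0, 17),
    "lxvd2x": (19, 33),
    "ori": (0, 5),
    "stxvd2x": (1, 15),
    "xvadddp": (17, 17),
    "xvnmsubadp": (1, 1),
    "xvnmsubmdp": (3, 3),
    "xxlor": (3, 2),
}
LOADS = {"ld", "lxvd2x"}   # scalar and VSX vector loads
STORES = {"stxvd2x"}       # VSX vector store


def summarize(names):
    """Return (old, new, change) summed over the given instruction names."""
    old = sum(counts[n][0] for n in names)
    new = sum(counts[n][1] for n in names)
    return old, new, new - old


load_row = summarize(LOADS)
store_row = summarize(STORES)
total_row = summarize(counts)

print("load ", load_row)
print("store", store_row)
print("total", total_row)
```

The added ld and lxvd2x instructions (31) plus the added stxvd2x instructions (14) make up 45 of the 77 extra instructions, which is consistent with the spill suspicion above.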