[Bug rtl-optimization/90813] [10 regression] gfortran.dg/proc_ptr_51.f90 fails (SIGSEGV) after 272084
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90813

Pat Haugen changed:

           What    |Removed                     |Added
                 CC|                            |pthaugen at linux dot ibm.com,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #22 from Pat Haugen ---
So the problem appears to be alias.c:true_dependence() telling
sched-deps.c:sched_analyze_2() that the following two instructions' memory
references don't alias.

Breakpoint 11, sched_analyze_2 (deps=0x7fffdbf8, x=0x759a1a40, insn=0x757f16c0)
    at /home/pthaugen/src/gcc/temp/gcc/gcc/sched-deps.c:2671
2671    if (true_dependence (pending_mem->element (), VOIDmode, t)
(gdb) pr pending->insn()
(insn 37 36 38 4 (set (mem/f/c:DI (unspec:DI [
                    (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
                    (reg:DI 2 2)
                ] UNSPEC_TOCREL) [8 c_+0 S8 A64])
        (reg/f:DI 141)) "proc_ptr_51.f90":28:0 609 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/f:DI 141)
        (nil)))
(gdb) pr insn
(insn 39 38 40 4 (set (reg/f:DI 143 [ c_ ])
        (mem/f/c:DI (reg/f:DI 142) [8 c_+0 S8 A64])) "proc_ptr_51.f90":37:0 609 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/f:DI 142)
        (expr_list:REG_EQUAL (mem/f/c:DI (symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] ) [8 c_+0 S8 A64])
            (nil

Which then lets the scheduler move the load above the store. Since they really
are referring to the same location, we load up garbage (null) and branch to it.

Including some additional detail from a couple various spots in the debug
chain. Hoping someone with more alias.c knowledge can chime in.

Breakpoint 13, true_dependence_1 (mem=0x759a4878, mem_mode=E_VOIDmode, mem_addr=0x0, x=0x759a8e08, x_addr=0x0, mem_canonicalized=false)
    at /home/pthaugen/src/gcc/temp/gcc/gcc/alias.c:2902
2902    {
(gdb) pr mem
(mem/f/c:DI (unspec:DI [
            (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
            (reg:DI 2 2)
        ] UNSPEC_TOCREL) [8 c_+0 S8 A64])
(gdb) pr x
(mem/f/c:DI (symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] ) [8 c_+0 S8 A64])
...
(gdb) pr x_addr
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr true_mem_addr
(unspec:DI [
        (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
        (reg:DI 2 2)
    ] UNSPEC_TOCREL)
(gdb) pr mem_base
(symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
2958      if (! base_alias_check (x_addr, base, true_mem_addr, mem_base,
(gdb) p base_alias_check (x_addr, base, true_mem_addr, mem_base, GET_MODE (x), mem_mode)
$18 = 0
(gdb) s
base_alias_check (x=0x75992990, x_base=0x75992990, y=0x759a4890, y_base=0x75990ac8, x_mode=E_DImode, y_mode=E_DImode)
    at /home/pthaugen/src/gcc/temp/gcc/gcc/alias.c:2174
2174      if (x_base == 0)
(gdb) pr x
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr x_base
(symbol_ref:DI ("__f_MOD_c_") [flags 0xc0] )
(gdb) pr y
(unspec:DI [
        (symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
        (reg:DI 2 2)
    ] UNSPEC_TOCREL)
(gdb) pr y_base
(symbol_ref:DI ("*.LANCHOR1") [flags 0x182])
2221      return compare_base_symbol_refs (x_base, y_base) != 0;
(gdb) p compare_base_symbol_refs (x_base, y_base)
$19 = 0
2136      if (!x_node->definition)
(gdb) n
2137        return 0;
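
To make the failure mode in the comment above concrete, here is a toy model (not GCC code) of why a wrong "no alias" answer is harmful: the scheduler is free to hoist the load above the store, and the load then reads the location before it has been written. Every name below (`schedule`, `execute`, `resolve`, `by_name`, `by_location`) is hypothetical, and comparing address spellings merely stands in for whatever led the real check to return 0.

```python
# Toy model only: this is NOT GCC's scheduler or alias.c. All names are hypothetical.

def schedule(store, load, may_alias):
    """Return the two memory operations in the order a scheduler may emit them.

    The load is hoisted above the store only when the alias query claims
    the two references cannot overlap.
    """
    if may_alias(store["addr"], load["addr"]):
        return [store, load]   # dependence kept: program order preserved
    return [load, store]       # no dependence recorded: the load may move up


def execute(ops, resolve):
    """Run the operations against a tiny memory and return the loaded value."""
    memory = {}                # unwritten locations read back as None (null)
    loaded = None
    for op in ops:
        location = resolve(op["addr"])
        if op["kind"] == "store":
            memory[location] = op["value"]
        else:
            loaded = memory.get(location)
    return loaded


def resolve(addr):
    """Map two different address spellings to the same location, c_."""
    return {"*.LANCHOR1+toc": "c_", "__f_MOD_c_": "c_"}[addr]


def by_name(a, b):
    """Alias query that only compares how the addresses are spelled."""
    return a == b


def by_location(a, b):
    """Alias query that compares the locations the addresses resolve to."""
    return resolve(a) == resolve(b)


store = {"kind": "store", "addr": "*.LANCHOR1+toc", "value": "proc_address"}
load = {"kind": "load", "addr": "__f_MOD_c_"}

print(execute(schedule(store, load, by_name), resolve))      # None
print(execute(schedule(store, load, by_location), resolve))  # proc_address
```

With the spelling-based query the load runs first and sees null, which mirrors the "load up garbage (null) and branch to it" symptom; with the location-based query the store/load order is preserved.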
[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #10 from Pat Haugen ---
(In reply to Richard Biener from comment #3)
> Created attachment 46250 [details]
> run_fast_dce also for LRA
>
> Sth like this could fix it.

Yes, that restored the performance.
[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #7 from Pat Haugen ---
Overall 'perf' cycle counts and hot functions.

r270483
-------
# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ....................
#
    91.17%   721643  gcc_base.gcc_hu  gcc_base.gcc_hunt_64
     8.82%    69840  gcc_base.gcc_hu  libc-2.17.so

# Overhead  Samples  Command          Shared Object         Symbol
# ........  .......  ...............  ....................  ......
#
     6.22%    49295  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] reg_is_remote_constant_p.isra.0.part.0
     6.18%    48897  gcc_base.gcc_hu  libc-2.17.so          [.] __memset_power8
     6.05%    47651  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_operation
     5.92%    46695  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] htab_traverse
     3.66%    28957  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] canon_rtx
     3.59%    28440  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_transp
     3.35%    26372  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_element_allocate
     2.18%    17151  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_dominance_frontiers_1
     2.00%    15841  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_set_mark
     1.77%    13974  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] fixup_var_refs_1
     1.69%    13391  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_mark_rtx_children_1
     1.54%    12236  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] single_set_2.part.0

r270484
-------
# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ................
#
    92.08%   814297  gcc_base.base_6  gcc_base.base_64
     7.91%    70063  gcc_base.base_6  libc-2.17.so

# Overhead  Samples  Command          Shared Object     Symbol
# ........  .......  ...............  ................  ......
#
     8.14%    71642  gcc_base.base_6  gcc_base.base_64  [.] bitmap_operation
     6.92%    60863  gcc_base.base_6  gcc_base.base_64  [.] bitmap_element_allocate
     6.01%    53281  gcc_base.base_6  gcc_base.base_64  [.] reg_is_remote_constant_p.isra.0.part.0
     5.68%    50081  gcc_base.base_6  gcc_base.base_64  [.] htab_traverse
     5.53%    48967  gcc_base.base_6  libc-2.17.so      [.] __memset_power8
     3.82%    33850  gcc_base.base_6  gcc_base.base_64  [.] compute_transp
     3.30%    29142  gcc_base.base_6  gcc_base.base_64  [.] canon_rtx
     1.95%    17155  gcc_base.base_6  gcc_base.base_64  [.] compute_dominance_frontiers_1
     1.81%    16023  gcc_base.base_6  gcc_base.base_64  [.] ggc_set_mark
     1.69%    14989  gcc_base.base_6  gcc_base.base_64  [.] ggc_mark_rtx_children_1
     1.57%    13832  gcc_base.base_6  gcc_base.base_64  [.] fixup_var_refs_1
     1.48%    13144  gcc_base.base_6  gcc_base.base_64  [.] single_set_2.part.0
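
The two profiles are easier to compare as per-symbol sample deltas. The sketch below uses only the sample counts quoted in the reports above; the helper name `sample_deltas` is made up for illustration, and raw sample counts are not the same thing as the benchmark score, so this only suggests where the extra cycles went.

```python
# Sample counts copied from the r270483 and r270484 perf reports above.
BEFORE = {
    "reg_is_remote_constant_p.isra.0.part.0": 49295,
    "__memset_power8": 48897,
    "bitmap_operation": 47651,
    "htab_traverse": 46695,
    "canon_rtx": 28957,
    "compute_transp": 28440,
    "bitmap_element_allocate": 26372,
    "compute_dominance_frontiers_1": 17151,
    "ggc_set_mark": 15841,
    "fixup_var_refs_1": 13974,
    "ggc_mark_rtx_children_1": 13391,
    "single_set_2.part.0": 12236,
}
AFTER = {
    "bitmap_operation": 71642,
    "bitmap_element_allocate": 60863,
    "reg_is_remote_constant_p.isra.0.part.0": 53281,
    "htab_traverse": 50081,
    "__memset_power8": 48967,
    "compute_transp": 33850,
    "canon_rtx": 29142,
    "compute_dominance_frontiers_1": 17155,
    "ggc_set_mark": 16023,
    "ggc_mark_rtx_children_1": 14989,
    "fixup_var_refs_1": 13832,
    "single_set_2.part.0": 13144,
}


def sample_deltas(before, after):
    """Return (symbol, before, after, delta) rows, largest increase first."""
    rows = [
        (sym, before[sym], after[sym], after[sym] - before[sym])
        for sym in before
        if sym in after
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for sym, old, new, delta in sample_deltas(BEFORE, AFTER):
        pct = 100.0 * delta / old
        print(f"{sym:42s} {old:7d} {new:7d} {delta:+7d} {pct:+7.1f}%")
```

On these numbers the two bitmap routines account for the largest increases (bitmap_element_allocate +34491 samples, bitmap_operation +23991), while most other listed symbols move by a few thousand samples or less.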
[Bug other/90257] New: 8% degradation on cpu2006 403.gcc starting with revision 270484
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

            Bug ID: 90257
           Summary: 8% degradation on cpu2006 403.gcc starting with revision
                    270484
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at linux dot ibm.com
                CC: dje at gcc dot gnu.org, hjl at gcc dot gnu.org,
                    segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

Will add more detail as I discover it.
[Bug target/84369] test case gcc.dg/sms-10.c fails on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Pat Haugen changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Pat Haugen ---
Fixed.
[Bug ipa/89584] New: CPU2000 degradations with r268448 (172.mgrid -22%, 252.eon -8%)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89584

            Bug ID: 89584
           Summary: CPU2000 degradations with r268448 (172.mgrid -22%,
                    252.eon -8%)
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at linux dot ibm.com
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, rguenth at gcc dot gnu.org,
                    segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

Revision 268448 introduced the noted degradations. Compile flags are -m64 -O3
-mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=all.

I dug into the mgrid degradation further to have some more detail. The main
difference appears to be that the last call to RESID() in the main function is
now inlined. RESID() is actually cloned, and this call is to the clone,
resid_.constprop.0. I'm not sure if this is another instance of losing RESTRICT
on the parameters, as seen in prior PRs (54497/55334 and 84737), or just that
inlining that specific call into an inner loop now creates too much register
pressure and we spill too much (I suspect the latter).

Following is a simple static instruction count comparison of the vectorized
loop from resid_.constprop.0() and the same loop after inlining. Note the
obvious increase in load/store insns.

Old = constprop.s
New = constprop_inline.s

INSTR        Old   New  Change
----------   ---   ---  ------
addi           1    29      28
bc             1     1       0
cmpl           1     1       0
ld             0    17      17
lxvd2x        19    33      14
ori            0     5       5
stxvd2x        1    15      14
xvadddp       17    17       0
xvnmsubadp     1     1       0
xvnmsubmdp     3     3       0
xxlor          3     2      -1
----------   ---   ---  ------
load          19    50      31
store          1    15      14
total         47   124      77
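
A static instruction count comparison like the one above can be produced with a few lines of scripting. The following is a minimal sketch, not the script used for the report: the helper names `count_mnemonics` and `compare` are made up, the two assembly snippets are invented stand-ins rather than excerpts of constprop.s or constprop_inline.s, and it assumes `#` starts a comment and that directives and labels start with `.` or end with `:`.

```python
from collections import Counter


def count_mnemonics(asm_text):
    """Count instruction mnemonics in an assembly listing."""
    counts = Counter()
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith(".") or line.endswith(":"):
            continue                          # skip blanks, directives, labels
        counts[line.split()[0]] += 1
    return counts


def compare(old, new):
    """Return (mnemonic, old, new, change) rows for every mnemonic seen."""
    return [
        (name, old[name], new[name], new[name] - old[name])
        for name in sorted(set(old) | set(new))
    ]


# Invented example loops, only to exercise the helpers.
old_asm = """
.L5:
    lxvd2x 0,9,10
    xvadddp 0,0,12
    stxvd2x 0,9,8
    bdnz .L5
"""

new_asm = """
.L5:
    ld 9,120(1)        # reload of a spilled value
    lxvd2x 0,9,10
    lxvd2x 11,9,7
    xvadddp 0,0,11
    stxvd2x 0,9,8
    bdnz .L5
"""

print(f"{'INSTR':10s} {'Old':>4s} {'New':>4s} {'Change':>7s}")
for name, old, new, change in compare(count_mnemonics(old_asm),
                                      count_mnemonics(new_asm)):
    print(f"{name:10s} {old:4d} {new:4d} {change:7d}")
```

To apply it to real compiler output, the text of the loop body would be extracted from each `.s` file first; the counts are static, so they say nothing about how often each instruction executes.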