[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Andre Vieira changed: What|Removed |Added CC| |andre.simoesdiasvieira@arm.com --- Comment #59 from Andre Vieira --- I believe PR70164 is related to this.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67295, which changed state. Bug 67295 Summary: [ARM][6 Regression] FAIL: gcc.target/arm/builtin-bswap-1.c scan-assembler-times revshne\\t 1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67295 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Dominik Vogt changed: What|Removed |Added CC||vogt at linux dot vnet.ibm.com --- Comment #58 from Dominik Vogt --- The patch in comment 35 causes a performance regression on s390. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #57 from Alexandre Oliva --- Author: aoliva Date: Thu Nov 26 21:57:40 2015 New Revision: 230985 URL: https://gcc.gnu.org/viewcvs?rev=230985&root=gcc&view=rev Log: [PR67753] adjust for padding when bypassing memory in assign_parm_setup_block Storing a register in memory as a full word and then accessing the same memory address under a smaller-than-word mode amounts to right-shifting of the register word on big endian machines. So, if BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and we're copying from the entry_parm REG directly to a pseudo, bypassing any stack slot, perform the shifting explicitly. This fixes the miscompile of function_return_val_10 in gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf introduced in the first patch for 67753. for gcc/ChangeLog PR rtl-optimization/67753 PR rtl-optimization/64164 * function.c (assign_parm_setup_block): Right-shift upward-padded big-endian args when bypassing the stack slot. Modified: trunk/gcc/ChangeLog trunk/gcc/function.c
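The equivalence described in comment #57 (a full-word store followed by a narrower load at the same address behaves like a right shift on a big-endian machine) can be checked with a small host-independent model. This is only an illustrative sketch, not GCC code: the 8-byte word size and the helper names store_word_be, load_narrow_be and shift_instead_of_memory are assumptions made for the example, and big-endian memory is simulated with a byte array so the result does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed word size for this sketch; real targets vary. */
enum { WORD_BYTES = 8 };

/* Store WORD into BUF in big-endian order: the most significant
   byte goes to the lowest address. */
static void store_word_be(unsigned char buf[WORD_BYTES], uint64_t word)
{
    for (size_t i = 0; i < WORD_BYTES; i++)
        buf[i] = (unsigned char)(word >> (8 * (WORD_BYTES - 1 - i)));
}

/* Load an NBYTES-wide big-endian value starting at the same address
   (BUF[0]) that the full word was stored to. */
static uint64_t load_narrow_be(const unsigned char *buf, size_t nbytes)
{
    assert(nbytes >= 1 && nbytes <= WORD_BYTES);
    uint64_t value = 0;
    for (size_t i = 0; i < nbytes; i++)
        value = (value << 8) | buf[i];
    return value;
}

/* The same result computed without touching memory: shift the
   register word right by the number of bits that are dropped. */
static uint64_t shift_instead_of_memory(uint64_t word, size_t nbytes)
{
    assert(nbytes >= 1 && nbytes <= WORD_BYTES);
    return word >> (8 * (WORD_BYTES - nbytes));
}
```

In this model, storing 0x1122334455667788 and reading back 3 bytes gives 0x112233, which is the word shifted right by 40 bits. That is the shift that has to be made explicit once the stack slot is bypassed.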
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #56 from Alexandre Oliva --- Author: aoliva Date: Fri Nov 6 10:34:13 2015 New Revision: 229840 URL: https://gcc.gnu.org/viewcvs?rev=229840&root=gcc&view=rev Log: [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg In assign_parm_setup_block, the copy of args in PARALLELs from entry_parm to stack_parm is deferred to the parm conversion insn seq, but the copy from stack_parm to target_reg was inserted in the normal copy seq, that is executed before the conversion insn seq. Oops. We could do away with the need for an actual stack_parm in general, which would have avoided the need for emitting the copy to target_reg in the conversion seq, but at least on pa, due to the need for stack to copy between SI and SF modes, it seems like using the reserved stack slot is beneficial, so I put in logic to use a pre-reserved stack slot when there is one, and emit the copy to target_reg in the conversion seq if stack_parm was set up there. for gcc/ChangeLog PR rtl-optimization/67753 PR rtl-optimization/64164 * function.c (assign_parm_setup_block): Avoid allocating a stack slot if we don't have an ABI-reserved one. Emit the copy to target_reg in the conversion seq if the copy from entry_parm is in it too. Don't use the conversion seq to copy a PARALLEL to a REG or a CONCAT. Modified: trunk/gcc/ChangeLog trunk/gcc/function.c
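The ordering problem described in comment #56 can be pictured with a toy model. This is a hedged sketch only: struct parm_model, SLOT_UNSET, run_buggy_order and run_fixed_order are hypothetical names invented for the illustration, plain ints stand in for RTL, and the two statement groups stand in for the normal copy sequence and the parm conversion sequence, which runs second.

```c
#include <assert.h>

/* Sentinel meaning "not written yet"; illustrative only. */
enum { SLOT_UNSET = -1 };

/* Hypothetical stand-ins for entry_parm, stack_parm and target_reg. */
struct parm_model {
    int entry_parm;
    int stack_parm;
    int target_reg;
};

/* Buggy order: the stack_parm -> target_reg copy sits in the normal
   copy sequence, which runs before the conversion sequence has
   filled stack_parm from entry_parm. */
static int run_buggy_order(int incoming)
{
    assert(incoming != SLOT_UNSET);
    struct parm_model m = { incoming, SLOT_UNSET, SLOT_UNSET };
    m.target_reg = m.stack_parm;   /* normal copy seq */
    m.stack_parm = m.entry_parm;   /* conversion seq */
    return m.target_reg;
}

/* Fixed order: both copies live in the conversion sequence, so
   target_reg is read only after stack_parm has been written. */
static int run_fixed_order(int incoming)
{
    assert(incoming != SLOT_UNSET);
    struct parm_model m = { incoming, SLOT_UNSET, SLOT_UNSET };
    /* normal copy seq: nothing emitted for this parm */
    m.stack_parm = m.entry_parm;   /* conversion seq, first copy */
    m.target_reg = m.stack_parm;   /* conversion seq, second copy */
    return m.target_reg;
}
```

With an incoming value of 42, the buggy order leaves target_reg holding the unset sentinel, while the fixed order yields 42.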
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67753, which changed state. Bug 67753 Summary: [6 Regression] FAIL: cxg1005, cxg2002, cxg2006, cxg2007, cxg2008, cxg2018, cxg2019 and cxg2020 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67753 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67766, which changed state. Bug 67766 Summary: [6 Regression]: Bootstrap failure on alpha-linux-gnu: ICE in simplify_subreg, at simplify-rtx.c https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67766 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67891, which changed state. Bug 67891 Summary: [6 Regression] FAIL: gcc.dg/pr43300.c (internal compiler error) on alpha-linux-gnu https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67891 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67597, which changed state. Bug 67597 Summary: [6 Regression] profiledbootstrap failure on ppc64le https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67597 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67312, which changed state. Bug 67312 Summary: [6 Regression] ICE: SIGSEGV in expand_expr_real_1 (expr.c:9561) with -ftree-coalesce-vars https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67312 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67490, which changed state. Bug 67490 Summary: [6 regression] FAIL: gcc.target/powerpc/pr16458-1.c scan-assembler-not cmpw https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67490 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67340, which changed state. Bug 67340 Summary: [6 Regression] ICE: in convert_move, at expr.c:279 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67340 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #55 from Alexandre Oliva --- Author: aoliva Date: Sun Sep 27 09:02:00 2015 New Revision: 228175 URL: https://gcc.gnu.org/viewcvs?rev=228175&root=gcc&view=rev Log: revert to assign_parms assignments using default defs Revert the fragile and complicated changes to assign_parms designed to enable it to use RTL assignments chosen by cfgexpand, and instead have cfgexpand use the RTL assignments by assign_parms, keying them off of the default defs that are now necessarily introduced for each parm and result. The possible lack of a default def was already a problem, and the fallbacks in place were not enough, as shown by PR67312. We now have checking asserts in set_rtl that verify that we're assigning to each var a piece of RTL that matches the expectations set forth by use_register_for_decl. for gcc/ChangeLog PR rtl-optimization/64164 PR tree-optimization/67312 PR middle-end/67340 PR middle-end/67490 PR bootstrap/67597 * cfgexpand.c (parm_in_stack_slot_p): Remove. (ssa_default_def_partition): Remove. (get_rtl_for_parm_ssa_default_def): Remove. (set_rtl): Check that RTL assignments match expectations. Loop on SUBREGs, CONCATs and PARALLELs subexprs. Set only the default def location for params and results. Record SSA names or types in REG and MEM attrs, respectively. (set_parm_rtl): New. (expand_one_ssa_partition): Drop logic that assigned MEMs with unassigned addresses. (adjust_one_expanded_partition_var): Don't accept NULL RTL on deferred stack alloc vars. (expand_used_vars): Skip partitions holding parm default defs. Move adjust_one_expanded_partition_var loop... (pass_expand::execute): ... here. Drop redundant assert. Adjust comments before the final loop over all ssa names. Require assigned rtl of parms and results to match exactly. Reset its attributes to match them, not any other variables in the same partition. (expand_debug_expr): Use entry value for PARM's default defs only iff they have zero nondebug uses. 
* cfgexpand.h (parm_in_stack_slot_p): Remove. (get_rtl_for_parm_ssa_default_def): Remove. (set_parm_rtl): Declare. * doc/invoke.texi: Improve wording. * explow.c (promote_decl_mode): Fix promote_function_mode for result decls not by reference. (promote_ssa_mode): Disregard BLKmode from promote_decl, and bypass TYPE_MODE to get the actual vector mode. * function.c: Include tree-dfa.h. Revert 2015-08-14's and 2015-08-19's changes as follows. Drop include of basic-block.h and df.h. (rtl_for_parm): Remove. (maybe_reset_rtl_for_parm): Remove. (parm_in_unassigned_mem_p): Remove. (use_register_for_decl): Add logic for RESULT_DECLs matching assign_parms' behavior. (split_complex_args): Revert. (assign_parms_augmented_arg_list): Revert. Add comment referencing the logic above. (assign_parm_adjust_stack_rtl): Revert. (assign_parm_setup_block): Revert. Use set_parm_rtl instead of SET_DECL_RTL. Set up a REG if the parm demands so. (assign_parm_setup_reg): Revert. Consolidated SET_DECL_RTL calls into a single set_parm_rtl. Set up a temporary RTL temporarily for expand_assignment. (assign_parm_setup_stack): Revert. Use set_parm_rtl. (assign_parms_unsplit_complex): Revert. Use set_parm_rtl. (assign_bounds): Revert. (assign_parms): Revert. Use set_parm_rtl. (allocate_struct_function): Relayout result and parms of non-abstract functions. (expand_function_start): Revert. Use set_parm_rtl. If the result is not a hard reg, create a pseudo from the promoted mode of the default def. Promote static chain mode. * tree-outof-ssa.c (remove_ssa_form): Drop unused partition_has_default_def. Set up partitions_for_parm_default_defs. (finish_out_of_ssa): Remove partition_has_default_def. Release partitions_for_parm_default_defs. * tree-outof-ssa.h (struct ssaexpand): Remove partition_has_default_def. Add partitions_for_parm_default_defs. * tree-ssa-coalesce.c: Include tree-dfa.h, tm_p.h and stor-layout.h. 
(build_ssa_conflict_graph): Fix conflict-detection of default defs of even unused default defs of params and results. (for_all_parms): New. (create_default_def): New. (register_default_def): New. (coalesce_with_default): New. (create_outofssa_var_map): Create default defs for all parms and results, and register their partitions. Add GIMPLE_RETURN operands as coalesce candidates with results. Add default defs of each parm
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Alexandre Oliva changed: What|Removed |Added Depends on||67597, 67490 --- Comment #54 from Alexandre Oliva --- New patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01793.html Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67490 [Bug 67490] [6 regression] FAIL: gcc.target/powerpc/pr16458-1.c scan-assembler-not cmpw https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67597 [Bug 67597] [6 Regression] profiledbootstrap failure on ppc64le
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67227, which changed state. Bug 67227 Summary: [6 regression] comparison failure in ada/par.o https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67227 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #53 from Alexandre Oliva aoliva at gcc dot gnu.org --- Author: aoliva Date: Fri Aug 21 20:03:14 2015 New Revision: 227085 URL: https://gcc.gnu.org/viewcvs?rev=227085&root=gcc&view=rev Log: fix sched compare regression for gcc/ChangeLog PR rtl-optimization/64164 PR rtl-optimization/67227 * alias.c (memrefs_conflict_p): Handle VALUEs in PLUS better. (nonoverlapping_memrefs_p): Test offsets and sizes when given identical gimple_reg exprs. Modified: trunk/gcc/ChangeLog trunk/gcc/alias.c
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #52 from Alexandre Oliva aoliva at gcc dot gnu.org --- Author: aoliva Date: Wed Aug 19 17:00:32 2015 New Revision: 227015 URL: https://gcc.gnu.org/viewcvs?rev=227015&root=gcc&view=rev Log: [PR64164] fix regressions reported on m68k and armeb Defer stack slot address assignment for all parms that can't live in pseudos, and accept pseudos assignments in assign_parm_setup_block. for gcc/ChangeLog PR rtl-optimization/64164 * cfgexpand.c (parm_maybe_byref_p): Renamed to... (parm_in_stack_slot_p): ... this. Disregard mode, what matters is whether the parm will live in a pseudo or a stack slot. (expand_one_ssa_partition): Deal with params without a default def. Disregard mode. * cfgexpand.h: Renamed function declaration. * tree-ssa-coalesce.c: Adjust. * function.c (split_complex_args): Allocate stack slot for unassigned parms before splitting. (parm_in_unassigned_mem_p): New. Use it instead of parm_maybe_byref_p throughout this file. (assign_parm_setup_block): Use it. Accept pseudos in the expand-assigned rtl. (assign_parm_setup_reg): Drop BLKmode requirement. (assign_parm_setup_stack): Allocate and fill in the address of unassigned MEM parms. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/cfgexpand.h trunk/gcc/function.c trunk/gcc/tree-ssa-coalesce.c
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #50 from Alexandre Oliva aoliva at gcc dot gnu.org --- Author: aoliva Date: Fri Aug 14 18:51:50 2015 New Revision: 226901 URL: https://gcc.gnu.org/viewcvs?rev=226901&root=gcc&view=rev Log: [PR64164] Drop copyrename, use coalescible partition as base when optimizing. for gcc/ChangeLog PR rtl-optimization/64164 PR bootstrap/66978 PR middle-end/66983 PR rtl-optimization/67000 PR middle-end/67034 PR middle-end/67035 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Do not coalesce maybe-byref parms with SSA_NAMEs of other variables, or anonymous SSA_NAMEs. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h and cfgexpand.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point. (dump_part_var_map): New. (compute_optimized_partition_bases): New, called by... (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust. 
* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs. * cfgexpand.c (leader_merge, parm_maybe_byref_p): New. (ssa_default_def_partition): New. (get_rtl_for_parm_ssa_default_def): New. (align_local_variable, add_stack_var): Support anonymous SSA names. (defer_stack_allocation): Likewise. Declare earlier. (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too. Do not record deferred-allocation marker in SA.partition_to_pseudo. (expand_stack_vars): Adjust check for the marker in it. (expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting. (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from... (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions. (record_alignment_for_reg_var): New, factored out of... (expand_one_var): ... this. (expand_one_ssa_partition): New. (adjust_one_expanded_partition_var): New. (expand_one_register_var): Check and skip already expanded SSA partitions. (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names. (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL. * cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare. * emit-rtl.c: Include stor-layout.h. (set_reg_attrs_for_parm): Handle NULL decl. (set_reg_attrs_for_decl_rtl): Take mode from expression if it's not a DECL. * stmt.c (emit_case_decision_tree): Pass it the SSA_NAME rather than its possibly-NULL DECL. * explow.c (promote_ssa_mode): New. * explow.h (promote_ssa_mode): Declare. * expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs. (read_complex_part): Export. * expr.h (read_complex_part): Declare. 
* cfgexpand.h (parm_maybe_byref_p): Declare. * function.c: Include cfgexpand.h. (use_register_for_decl): Handle SSA_NAMEs, anonymous or not. (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr. (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def. (split_complex_args): Take assign_parm_data_all argument. Pass it to
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 66983, which changed state. Bug 66983 Summary: [6 Regression] Many testsuite regressions https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66983 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67034, which changed state. Bug 67034 Summary: [6 Regression] FAIL: gcc.c-torture/compile/pr39928-1.c https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67034 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 66978, which changed state. Bug 66978 Summary: [6 Regression] bootstrap failure with --with-multilib-list=m32,m64,mx32 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66978 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67000, which changed state. Bug 67000 Summary: [6 Regression] ICE in split_complex_args, at function.c:2325 on ppc64le https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67000 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Bug 64164 depends on bug 67035, which changed state. Bug 67035 Summary: [6 Regression] FAIL: gcc.c-torture/compile/pr54713-3.c https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67035 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Alexandre Oliva aoliva at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #51 from Alexandre Oliva aoliva at gcc dot gnu.org --- Fixed for the next major release. Not planning a backport.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #49 from Gary Funck gary at intrepid dot com --- (In reply to Alexandre Oliva from comment #48) The errors reported in comments 44, 45, 46, and 47 are fixed in the git branch aoliva/pr64164. I'm giving it all some more testing before posting an updated, consolidated patch. I applied your patch (commit 9357ff1, 8/2/15) to our GUPC branch, based off trunk version 226386 on gcc112 (PPC64LE). It bootstrapped fine and passed the tests with -O0 --enable-checking and -O3 --disable-checking.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #48 from Alexandre Oliva aoliva at gcc dot gnu.org --- The errors reported in comments 44, 45, 46, and 47 are fixed in the git branch aoliva/pr64164. I'm giving it all some more testing before posting an updated, consolidated patch.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Rainer Orth ro at gcc dot gnu.org changed: What|Removed |Added CC||ro at gcc dot gnu.org --- Comment #47 from Rainer Orth ro at gcc dot gnu.org --- (In reply to Andreas Schwab from comment #44) Same on sparc.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #44 from Andreas Schwab sch...@linux-m68k.org --- This breaks gcc.dg/pr43300.c on aarch64. $ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr43300.c ../gcc/testsuite/gcc.dg/pr43300.c: In function ‘foo’: ../gcc/testsuite/gcc.dg/pr43300.c:8:1: internal compiler error: in emit_move_insn, at expr.c:3552 foo (int x, V2SF a) ^ 0x7ee783 emit_move_insn(rtx_def*, rtx_def*) ../../gcc/expr.c:3551 0x84a80b assign_parm_setup_reg ../../gcc/function.c:3322 0x84c4ff assign_parms ../../gcc/function.c:3766 0x84f353 expand_function_start(tree_node*) ../../gcc/function.c:5192 0x6f919f execute ../../gcc/cfgexpand.c:6105
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #45 from Andreas Schwab sch...@linux-m68k.org --- It also breaks a lot of tests on m68k, eg: $ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr17957.c ../gcc/testsuite/gcc.dg/pr17957.c: In function ‘vadd’: ../gcc/testsuite/gcc.dg/pr17957.c:6:1: internal compiler error: in expand_one_stack_var_1, at cfgexpand.c:1221 vadd (void) ^ 0x662055 expand_one_stack_var_1 ../../gcc/cfgexpand.c:1221 0x670685 expand_one_ssa_partition ../../gcc/cfgexpand.c:1295 0x670685 expand_used_vars ../../gcc/cfgexpand.c:1940 0x671e20 execute ../../gcc/cfgexpand.c:6084
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Steve Ellcey sje at gcc dot gnu.org changed: What|Removed |Added CC||sje at gcc dot gnu.org --- Comment #46 from Steve Ellcey sje at gcc dot gnu.org --- I see the same failures on MIPS too.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #43 from Alexandre Oliva aoliva at gcc dot gnu.org --- Author: aoliva Date: Thu Jul 23 15:34:49 2015 New Revision: 226113 URL: https://gcc.gnu.org/viewcvs?rev=226113&root=gcc&view=rev Log: [PR64164] Drop copyrename, use coalescible partition as base when optimizing. for gcc/ChangeLog PR rtl-optimization/64164 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point. (dump_part_var_map): New. (compute_optimized_partition_bases): New, called by... (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust. * alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs. * cfgexpand.c (leader_merge): New. (get_rtl_for_parm_ssa_default_def): New. (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too. 
(expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting. (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from... (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions. (record_alignment_for_reg_var): New, factored out of... (expand_one_var): ... this. (expand_one_ssa_partition): New. (adjust_one_expanded_partition_var): New. (expand_one_register_var): Check and skip already expanded SSA partitions. (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names. (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL. * cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare. * emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl. * explow.c (promote_ssa_mode): New. * explow.h (promote_ssa_mode): Declare. * expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs. * function.c: Include cfgexpand.h. (use_register_for_decl): Handle SSA_NAMEs, anonymous or not. (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr. (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def. (split_complex_args): Take assign_parm_data_all argument. Pass it to rtl_for_parm. Set up rtl and context for split args. (assign_parms_augmented_arg_list): Adjust. (maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with multiple locations. Recognize split complex args. (assign_parm_adjust_stack_rtl): Add all and parm arguments, for rtl_for_parm. For SSA-assigned parms, zero stack_parm. (assign_parm_setup_block): Prefer SSA-assigned location. (assign_parm_setup_reg): Likewise. Use entry_parm for equiv if stack_parm is NULL. (assign_parm_setup_stack): Prefer SSA-assigned location. 
(assign_parms): Maybe reset DECL_RTL of params. Adjust stack rtl before testing for pointer bounds. Special-case result_ptr. (expand_function_start): Maybe reset DECL_RTL of result. Prefer SSA-assigned location for result and static chain. Factor out DECL_RESULT and SET_DECL_RTL. * tree-outof-ssa.c (insert_value_copy_on_edge): Handle
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Christophe Lyon clyon at gcc dot gnu.org changed: What|Removed |Added CC||clyon at gcc dot gnu.org --- Comment #40 from Christophe Lyon clyon at gcc dot gnu.org --- Created attachment 35740 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35740&action=edit ARM testcase This is the testcase that breaks on ARM, when compiled with optimizations: -O0 is OK, -O1, -O2, -O3 crash with: /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c: In function '__gnu_addqq3': /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c:59:1: internal compiler error: RTL flag check: MEM_VOLATILE_P used with unexpected rtx code 'reg' in set_mem_attributes_minus_bitpos, at emit-rtl.c:1787 FIXED_ADD (FIXED_C_TYPE a, FIXED_C_TYPE b) ^ 0xa6eb52 rtl_check_failed_flag(char const*, rtx_def const*, char const*, int, char const*) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/rtl.c:800 0x771fc7 set_mem_attributes_minus_bitpos(rtx_def*, tree_node*, int, long) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/emit-rtl.c:1787 0x805294 assign_parm_setup_block /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:2977 0x80b65c assign_parms /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:3775 0x80e087 expand_function_start(tree_node*) /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:5215 0x6a77ed execute /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6127 Please submit a full bug report,
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #41 from David Edelsohn dje at gcc dot gnu.org --- Created attachment 35742 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35742&action=edit AIX PowerPC testcase
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 --- Comment #42 from David Edelsohn dje at gcc dot gnu.org --- Created attachment 35743 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35743&action=edit PPC64LE Linux Testcase
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164 Alexandre Oliva aoliva at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|ASSIGNED Resolution|FIXED |--- Summary|[4.9/5 Regression] one more stack slot used due to one less inlining level |[4.9/5/6 Regression] one more stack slot used due to one less inlining level --- Comment #39 from Alexandre Oliva aoliva at gcc dot gnu.org --- At least the sparc regression was caused by the change richi requested to disregard the underlying decl in promote_ssa_mode. I didn't realize this could cause a mismatch between the mode of the partition created for the QImode parm default def and the promoted mode for the parm decl expected by the parm-assignment code in function.c. This will likely take some time to sort out, so I'm reverting the patch for now.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #35 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Author: aoliva
Date: Tue Jun 9 05:05:34 2015
New Revision: 224262

URL: https://gcc.gnu.org/viewcvs?rev=224262&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for gcc/ChangeLog

PR rtl-optimization/64164
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
  (ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Move declaration
* tree-ssa-coalesce.h: ... here.
* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Moved to tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c.
  (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases.
  (partition_view_normal): Drop want_bases parameter.
  (partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h.
  (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point.
  (dump_part_var_map): New.
  (compute_optimized_partition_bases): New, called by...
  (coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead of compute_samebase_partition_bases. Adjust.
* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
* cfgexpand.c (leader_merge): New.
  (get_rtl_for_parm_ssa_default_def): New.
  (set_rtl): Merge exprs and attrs, even for MEMs and non-SSA vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
  (expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop redundant MEM attr setting.
  (expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed from...
  (expand_one_stack_var): ... this. New wrapper to check and skip already expanded SSA partitions.
  (record_alignment_for_reg_var): New, factored out of...
  (expand_one_var): ... this.
  (expand_one_ssa_partition): New.
  (adjust_one_expanded_partition_var): New.
  (expand_one_register_var): Check and skip already expanded SSA partitions.
  (expand_used_vars): Don't create DECLs for anonymous SSA names. Expand all SSA partitions, then adjust all SSA names.
  (pass::execute): Replace the loops that set SA.partition_to_pseudo from partition leaders and cleared DECL_RTL for multi-location variables, and that which used to rename vars and set attrs, with one that clears DECL_RTL and checks that PARMs and RESULTs default_defs match DECL_RTL.
* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl.
* explow.c (promote_ssa_mode): New.
* explow.h (promote_ssa_mode): Declare.
* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
* function.c: Include cfgexpand.h.
  (use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
  (use_register_for_parm_decl): Wrapper for the above to special-case the result_ptr.
  (rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
  (maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with multiple locations.
  (assign_parm_adjust_stack_rtl): Add all and parm arguments, for rtl_for_parm. For SSA-assigned parms, zero stack_parm.
  (assign_parm_setup_block): Prefer SSA-assigned location.
  (assign_parm_setup_reg): Likewise. Use entry_parm for equiv if stack_parm is NULL.
  (assign_parm_setup_stack): Prefer SSA-assigned location.
  (assign_parms): Maybe reset DECL_RTL of params. Adjust stack rtl before testing for pointer bounds. Special-case result_ptr.
  (expand_function_start): Maybe reset DECL_RTL of result. Prefer SSA-assigned location for result and static chain. Factor out DECL_RESULT and SET_DECL_RTL.
* tree-outof-ssa.c (insert_value_copy_on_edge): Handle anonymous SSA names. Use promote_ssa_mode.
  (get_temp_reg): Likewise.
  (remove_ssa_form): Adjust.
* var-tracking.c (dataflow_set_clear_at_call): Take call_insn and get its reg_usage for reg
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #34 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #33)
On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
--- Comment #30 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #28)

So I've been thinking about how to integrate life/conflict analysis into the uncprop code and it may not be that bad, both from an implementation and computation standpoint.

Most importantly, we don't have to compute full life information. We really just need to compute the life of the equivalence. Given the life of the equivalence, if the equivalence is live in any block that contains the defining statement for an SSA_NAME appearing in the target PHI, then the equivalence conflicts and we don't want to unpropagate it.

Computing the life of the equivalence is pretty easy and should be reasonably quick. This is a cost we'd have to pay regardless of whether or not we integrate uncprop with out-of-ssa since we won't have life information for the expression.

Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't test for conflicts multiple times if an SSA_NAME shows up in multiple PHI alternatives would help keep the cost down as well.

Ultimately I don't think we need to integrate uncprop and out-of-ssa to avoid the unprofitable transformation during uncprop.

Also see Boissinot et al., Fast Liveness Checking for SSA-Form Programs (CGO 08). They describe a way to do fast liveness queries without actually doing a (memory) expensive data-flow analysis but using SSA immediate-uses and dominance checks. Something we could use in SSA coalescing as well to avoid both the liveness bitmaps and the conflict graph.

Yea, it looks reasonably interesting and there's probably benefit in experimenting with that approach. However, be aware that its memory consumption can be problematic. According to their summary, it's quadratic.

Yes, but that's only if you store the liveness info. We don't need to do that; we only need to compute whether two partitions conflict for each coalescing candidate (which means a few SSA conflict checks dependent on partition size). Our current algorithm is already quadratic in memory use because we do store SSA liveness and the partition conflict graph. What I'm not sure about is whether doing the SSA-based liveness check is going to be slower compile-time wise.

Though presumably we could drop back to the tried and true approach if we have too many BBs.

That definitely is stage1 material.

Indeed.

Jeff
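The idea of deciding partition conflicts on demand, without storing liveness sets or a conflict graph, can be illustrated with a small sketch. It is not GCC code and not the algorithm from the cited paper: a plain backward walk over the CFG stands in for the fast liveness query, and everything here (the tuple encoding of an SSA name as `(def_block, use_blocks)`, the function names, the example CFG) is a hypothetical construction for illustration. The check works at block granularity, so it may report conflicts that a statement-level check would not.

```python
from itertools import product

def live_in_block(name, block, preds):
    """On-demand query: is the SSA value live somewhere in `block`?

    `name` is a hypothetical (def_block, use_blocks) pair. The walk goes
    backwards from each use and never climbs above the definition. Only a
    temporary `seen` set is kept; nothing is stored between queries.
    """
    def_block, use_blocks = name
    seen, stack = set(), list(use_blocks)
    while stack:
        b = stack.pop()
        if b == block:
            return True
        if b in seen:
            continue
        seen.add(b)
        if b != def_block:
            stack.extend(preds.get(b, ()))
    return False

def names_interfere(a, b, preds):
    """Block-level approximation of SSA interference: one value is live
    in the block that defines the other."""
    return live_in_block(a, b[0], preds) or live_in_block(b, a[0], preds)

def partitions_conflict(p1, p2, preds):
    """Two partitions conflict if any cross pair of members interferes.
    The number of queries is len(p1) * len(p2)."""
    return any(names_interfere(a, b, preds) for a, b in product(p1, p2))

# Hypothetical straight-line CFG: B0 -> B1 -> B2 (block -> predecessors).
LINEAR_PREDS = {"B1": ["B0"], "B2": ["B1"]}
X = ("B0", ["B1"])  # defined in B0, last used in B1
Y = ("B2", ["B2"])  # defined and used in B2
Z = ("B1", ["B2"])  # defined in B1, used in B2

print(partitions_conflict([X], [Y], LINEAR_PREDS))     # X is dead before Y exists
print(partitions_conflict([X], [Y, Z], LINEAR_PREDS))  # X and Z overlap in B1
```

The memory cost per query is bounded by the number of blocks visited, which is the trade-off discussed above: no quadratic storage, at the price of repeating CFG walks for each coalescing candidate.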
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
   Target Milestone|4.9.3                       |6.0
            Summary|[4.9/5 Regression] one more |[4.9/5/6 Regression] one
                   |stack slot used due to one  |more stack slot used due to
                   |less inlining level         |one less inlining level
      Known to fail|                            |5.0

--- Comment #31 from Richard Biener rguenth at gcc dot gnu.org ---
I'd say we push this back to GCC 6.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #32 from Jakub Jelinek jakub at gcc dot gnu.org ---
Agreed, though ideally it should be fixed early in stage1.
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #33 from Jeffrey A. Law law at redhat dot com ---
On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
--- Comment #30 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #28)

So I've been thinking about how to integrate life/conflict analysis into the uncprop code and it may not be that bad, both from an implementation and computation standpoint.

Most importantly, we don't have to compute full life information. We really just need to compute the life of the equivalence. Given the life of the equivalence, if the equivalence is live in any block that contains the defining statement for an SSA_NAME appearing in the target PHI, then the equivalence conflicts and we don't want to unpropagate it.

Computing the life of the equivalence is pretty easy and should be reasonably quick. This is a cost we'd have to pay regardless of whether or not we integrate uncprop with out-of-ssa since we won't have life information for the expression.

Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't test for conflicts multiple times if an SSA_NAME shows up in multiple PHI alternatives would help keep the cost down as well.

Ultimately I don't think we need to integrate uncprop and out-of-ssa to avoid the unprofitable transformation during uncprop.

Also see Boissinot et al., Fast Liveness Checking for SSA-Form Programs (CGO 08). They describe a way to do fast liveness queries without actually doing a (memory) expensive data-flow analysis but using SSA immediate-uses and dominance checks. Something we could use in SSA coalescing as well to avoid both the liveness bitmaps and the conflict graph.

Yea, it looks reasonably interesting and there's probably benefit in experimenting with that approach. However, be aware that its memory consumption can be problematic. According to their summary, it's quadratic.

Though presumably we could drop back to the tried and true approach if we have too many BBs.

That definitely is stage1 material.

Jeff
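The conflict test described in this comment (compute the life of the equivalence only, then see whether it is live in any block that defines an SSA_NAME appearing in the target PHI) can be sketched roughly as follows. This is an illustration under stated assumptions, not the uncprop implementation: blocks are plain strings, the CFG is a hypothetical predecessor map, uses are given as the blocks in which the value must still be live, and all function names are invented for the example.

```python
from collections import deque

def live_blocks(def_block, use_blocks, preds):
    """Blocks in which one SSA value is live, found by walking backwards
    from each use until the defining block is reached."""
    live = set()
    work = deque(use_blocks)
    while work:
        b = work.popleft()
        if b in live:
            continue
        live.add(b)
        if b == def_block:
            continue  # the value does not exist above its definition
        work.extend(preds.get(b, ()))
    return live

def equivalence_conflicts(equiv_def, equiv_uses, phi_arg_def_blocks, preds):
    """True if the equivalence is live in a block that defines any PHI
    argument. The defining blocks are collected into a set first, so an
    SSA_NAME appearing in several PHI alternatives is tested only once."""
    live = live_blocks(equiv_def, equiv_uses, preds)
    return any(b in live for b in set(phi_arg_def_blocks))

# Hypothetical diamond CFG: B0 -> {B1, B2} -> B3 (block -> predecessors).
DIAMOND_PREDS = {"B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}

# Equivalence defined in B0 and still needed in B3: it is live across B1,
# so a PHI argument defined in B1 conflicts and should not be unpropagated.
print(equivalence_conflicts("B0", ["B3"], ["B1", "B1"], DIAMOND_PREDS))

# Equivalence confined to B2 does not reach B1, so there is no conflict.
print(equivalence_conflicts("B2", ["B2"], ["B1"], DIAMOND_PREDS))
```

Only the life of the single equivalence is computed, which matches the point that full life information is not required for this decision.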