[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2016-03-10 Thread andre.simoesdiasvieira at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.com

--- Comment #59 from Andre Vieira  ---
I believe PR70164 is related to this.

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2016-01-21 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67295, which changed state.

Bug 67295 Summary: [ARM][6 Regression] FAIL: gcc.target/arm/builtin-bswap-1.c 
scan-assembler-times revshne\\t 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67295

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-12-04 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Dominik Vogt  changed:

   What|Removed |Added

 CC||vogt at linux dot vnet.ibm.com

--- Comment #58 from Dominik Vogt  ---
The patch in comment 35 causes a performance regression on s390.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-11-26 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #57 from Alexandre Oliva  ---
Author: aoliva
Date: Thu Nov 26 21:57:40 2015
New Revision: 230985

URL: https://gcc.gnu.org/viewcvs?rev=230985&root=gcc&view=rev
Log:
[PR67753] adjust for padding when bypassing memory in assign_parm_setup_block

Storing a register in memory as a full word and then accessing the
same memory address under a smaller-than-word mode amounts to
right-shifting of the register word on big endian machines.  So, if
BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
we're copying from the entry_parm REG directly to a pseudo, bypassing
any stack slot, perform the shifting explicitly.

This fixes the miscompile of function_return_val_10 in
gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
introduced in the first patch for 67753.

for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Right-shift
upward-padded big-endian args when bypassing the stack slot.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/function.c
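
Note on the big-endian padding fix logged above: the effect it describes can be reproduced outside the compiler. The sketch below is a minimal Python model, not GCC code. The 8-byte word size and the helper names (store_word, load_narrow, load_via_memory, explicit_shift) are assumptions made purely for illustration. It shows that, on a big-endian layout, storing a full word and reloading a narrower value from the same address gives the same result as right-shifting the word, which is presumably why the shift has to be performed explicitly once the stack slot is bypassed.

```python
# Illustrative model only; none of these names exist in GCC.
WORD_BYTES = 8  # assumed word size for the example


def store_word(value: int, byteorder: str) -> bytes:
    """Lay out a full word in memory using the given byte order."""
    return value.to_bytes(WORD_BYTES, byteorder)


def load_narrow(mem: bytes, nbytes: int, byteorder: str) -> int:
    """Read nbytes starting at the same (lowest) address as the word."""
    return int.from_bytes(mem[:nbytes], byteorder)


def load_via_memory(reg: int, nbytes: int, byteorder: str) -> int:
    """Store the register as a full word, then reload it in a narrower mode."""
    return load_narrow(store_word(reg, byteorder), nbytes, byteorder)


def explicit_shift(reg: int, nbytes: int) -> int:
    """What a register-to-register copy must do to match big-endian memory."""
    return reg >> (8 * (WORD_BYTES - nbytes))


if __name__ == "__main__":
    reg = 0x1122334455667788
    # Big endian: the narrow load sees the most significant bytes.
    print(hex(load_via_memory(reg, 4, "big")), hex(explicit_shift(reg, 4)))
    # Little endian: the narrow load sees the least significant bytes.
    print(hex(load_via_memory(reg, 4, "little")))
```

In this model a plain register copy on a big-endian target would keep the low-order bytes, whereas the memory round trip keeps the high-order ones; the explicit shift removes that difference.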

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-11-06 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #56 from Alexandre Oliva  ---
Author: aoliva
Date: Fri Nov  6 10:34:13 2015
New Revision: 229840

URL: https://gcc.gnu.org/viewcvs?rev=229840&root=gcc&view=rev
Log:
[PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg

In assign_parm_setup_block, the copy of args in PARALLELs from
entry_parm to stack_parm is deferred to the parm conversion insn seq,
but the copy from stack_parm to target_reg was inserted in the normal
copy seq, that is executed before the conversion insn seq.  Oops.

We could do away with the need for an actual stack_parm in general,
which would have avoided the need for emitting the copy to target_reg
in the conversion seq, but at least on pa, due to the need for stack
to copy between SI and SF modes, it seems like using the reserved
stack slot is beneficial, so I put in logic to use a pre-reserved
stack slot when there is one, and emit the copy to target_reg in the
conversion seq if stack_parm was set up there.

for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Avoid allocating a
stack slot if we don't have an ABI-reserved one.  Emit the
copy to target_reg in the conversion seq if the copy from
entry_parm is in it too.  Don't use the conversion seq to copy
a PARALLEL to a REG or a CONCAT.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/function.c
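
Note on the ordering problem described in the log above: it can be pictured with a toy model of two instruction sequences that run one after the other. This is a hedged sketch in Python, not the GCC implementation. The run function, the tuple encoding of copies, and the value 42 are all hypothetical; only the names entry_parm, stack_parm and target_reg come from the log.

```python
# Toy model: each sequence is a list of (destination, source) copies.
# The normal copy sequence runs first, then the conversion sequence.
def run(normal_seq, conversion_seq):
    state = {"entry_parm": 42, "stack_parm": None, "target_reg": None}
    for dst, src in normal_seq + conversion_seq:
        state[dst] = state[src]
    return state["target_reg"]


# Before the fix: target_reg is read from stack_parm in the normal
# sequence, but stack_parm is only filled in by the later conversion
# sequence, so target_reg receives an uninitialized value.
buggy = run(
    normal_seq=[("target_reg", "stack_parm")],
    conversion_seq=[("stack_parm", "entry_parm")],
)

# After the fix: both copies live in the conversion sequence, in order.
fixed = run(
    normal_seq=[],
    conversion_seq=[("stack_parm", "entry_parm"), ("target_reg", "stack_parm")],
)

if __name__ == "__main__":
    print(buggy, fixed)
```

The model only captures the ordering dependency; it says nothing about when a stack slot is actually needed, which the log ties to the target ABI.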

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-11-06 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67753, which changed state.

Bug 67753 Summary: [6 Regression] FAIL: cxg1005, cxg2002, cxg2006, cxg2007, 
cxg2008, cxg2018, cxg2019 and cxg2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67753

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-10-09 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67766, which changed state.

Bug 67766 Summary: [6 Regression]: Bootstrap failure on alpha-linux-gnu: ICE in 
simplify_subreg, at simplify-rtx.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67766

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-10-09 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67891, which changed state.

Bug 67891 Summary: [6 Regression] FAIL: gcc.dg/pr43300.c (internal compiler 
error) on alpha-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67891

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-27 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67597, which changed state.

Bug 67597 Summary: [6 Regression] profiledbootstrap failure on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67597

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-27 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67312, which changed state.

Bug 67312 Summary: [6 Regression] ICE: SIGSEGV in expand_expr_real_1 
(expr.c:9561) with -ftree-coalesce-vars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67312

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-27 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67490, which changed state.

Bug 67490 Summary: [6 regression] FAIL: gcc.target/powerpc/pr16458-1.c 
scan-assembler-not cmpw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67490

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-27 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67340, which changed state.

Bug 67340 Summary: [6 Regression] ICE:  in convert_move, at expr.c:279
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67340

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-27 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #55 from Alexandre Oliva  ---
Author: aoliva
Date: Sun Sep 27 09:02:00 2015
New Revision: 228175

URL: https://gcc.gnu.org/viewcvs?rev=228175&root=gcc&view=rev
Log:
revert to assign_parms assignments using default defs

Revert the fragile and complicated changes to assign_parms designed to
enable it to use RTL assignments chosen by cfgexpand, and instead have
cfgexpand use the RTL assignments by assign_parms, keying them off of
the default defs that are now necessarily introduced for each parm and
result.  The possible lack of a default def was already a problem, and
the fallbacks in place were not enough, as shown by PR67312.  We now
have checking asserts in set_rtl that verify that we're assigning to
each var a piece of RTL that matches the expectations set forth by
use_register_for_decl.

for  gcc/ChangeLog

PR rtl-optimization/64164
PR tree-optimization/67312
PR middle-end/67340
PR middle-end/67490
PR bootstrap/67597
* cfgexpand.c (parm_in_stack_slot_p): Remove.
(ssa_default_def_partition): Remove.
(get_rtl_for_parm_ssa_default_def): Remove.
(set_rtl): Check that RTL assignments match expectations.
Loop on SUBREGs, CONCATs and PARALLELs subexprs.  Set only the
default def location for params and results.  Record SSA names
or types in REG and MEM attrs, respectively.
(set_parm_rtl): New.
(expand_one_ssa_partition): Drop logic that assigned MEMs with
unassigned addresses.
(adjust_one_expanded_partition_var): Don't accept NULL RTL on
deferred stack alloc vars.
(expand_used_vars): Skip partitions holding parm default defs.
Move adjust_one_expanded_partition_var loop...
(pass_expand::execute): ... here.  Drop redundant assert.
Adjust comments before the final loop over all ssa names.
Require assigned rtl of parms and results to match exactly.
Reset its attributes to match them, not any other variables in
the same partition.
(expand_debug_expr): Use entry value for PARM's default defs
only iff they have zero nondebug uses.
* cfgexpand.h (parm_in_stack_slot_p): Remove.
(get_rtl_for_parm_ssa_default_def): Remove.
(set_parm_rtl): Declare.
* doc/invoke.texi: Improve wording.
* explow.c (promote_decl_mode): Fix promote_function_mode for
result decls not by reference.
(promote_ssa_mode): Disregard BLKmode from promote_decl, and
bypass TYPE_MODE to get the actual vector mode.
* function.c: Include tree-dfa.h.  Revert 2015-08-14's and
2015-08-19's changes as follows.  Drop include of
basic-block.h and df.h.
(rtl_for_parm): Remove.
(maybe_reset_rtl_for_parm): Remove.
(parm_in_unassigned_mem_p): Remove.
(use_register_for_decl): Add logic for RESULT_DECLs matching
assign_parms' behavior.
(split_complex_args): Revert.
(assign_parms_augmented_arg_list): Revert.  Add comment
referencing the logic above.
(assign_parm_adjust_stack_rtl): Revert.
(assign_parm_setup_block): Revert.  Use set_parm_rtl instead
of SET_DECL_RTL.  Set up a REG if the parm demands so.
(assign_parm_setup_reg): Revert.  Consolidated SET_DECL_RTL
calls into a single set_parm_rtl.  Set up a temporary RTL
temporarily for expand_assignment.
(assign_parm_setup_stack): Revert.  Use set_parm_rtl.
(assign_parms_unsplit_complex): Revert.  Use set_parm_rtl.
(assign_bounds): Revert.
(assign_parms): Revert.  Use set_parm_rtl.
(allocate_struct_function): Relayout result and parms of
non-abstract functions.
(expand_function_start): Revert.  Use set_parm_rtl.  If the
result is not a hard reg, create a pseudo from the promoted
mode of the default def.  Promote static chain mode.
* tree-outof-ssa.c (remove_ssa_form): Drop unused
partition_has_default_def.  Set up
partitions_for_parm_default_defs.
(finish_out_of_ssa): Remove partition_has_default_def.
Release partitions_for_parm_default_defs.
* tree-outof-ssa.h (struct ssaexpand): Remove
partition_has_default_def.  Add
partitions_for_parm_default_defs.
* tree-ssa-coalesce.c: Include tree-dfa.h, tm_p.h and
stor-layout.h.
(build_ssa_conflict_graph): Fix conflict-detection of default
defs of even unused default defs of params and results.
(for_all_parms): New.
(create_default_def): New.
(register_default_def): New.
(coalesce_with_default): New.
(create_outofssa_var_map): Create default defs for all parms
and results, and register their partitions.  Add GIMPLE_RETURN
operands as coalesce candidates with results.  Add default
defs of each parm 

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-09-23 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Alexandre Oliva  changed:

   What|Removed |Added

 Depends on||67597, 67490

--- Comment #54 from Alexandre Oliva  ---
New patch posted at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01793.html


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67490
[Bug 67490] [6 regression] FAIL: gcc.target/powerpc/pr16458-1.c
scan-assembler-not cmpw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67597
[Bug 67597] [6 Regression] profiledbootstrap failure on ppc64le


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-21 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67227, which changed state.

Bug 67227 Summary: [6 regression] comparison failure in ada/par.o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67227

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-21 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #53 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Author: aoliva
Date: Fri Aug 21 20:03:14 2015
New Revision: 227085

URL: https://gcc.gnu.org/viewcvs?rev=227085&root=gcc&view=rev
Log:
fix sched compare regression

for  gcc/ChangeLog

PR rtl-optimization/64164
PR rtl-optimization/67227
* alias.c (memrefs_conflict_p): Handle VALUEs in PLUS better.
(nonoverlapping_memrefs_p): Test offsets and sizes when given
identical gimple_reg exprs.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/alias.c
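
Note on "Test offsets and sizes" in the ChangeLog above: when two memory references share the same base expression, whether they can overlap comes down to comparing byte ranges. The following is a generic interval check written in Python as an illustration. It is not the logic of nonoverlapping_memrefs_p, and the function name and parameters are assumptions.

```python
def ranges_overlap(offset_a, size_a, offset_b, size_b):
    """Return True if the half-open byte ranges
    [offset_a, offset_a + size_a) and [offset_b, offset_b + size_b)
    share at least one byte."""
    return offset_a < offset_b + size_b and offset_b < offset_a + size_a


if __name__ == "__main__":
    # Two adjacent 4-byte accesses to the same object do not overlap.
    print(ranges_overlap(0, 4, 4, 4))
    # An 8-byte access covers a 4-byte access at offset 4.
    print(ranges_overlap(0, 8, 4, 4))
```

Two references to the same base can be treated as non-conflicting only when this kind of check reports no shared bytes; identical expressions alone do not settle the question.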


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-19 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #52 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Author: aoliva
Date: Wed Aug 19 17:00:32 2015
New Revision: 227015

URL: https://gcc.gnu.org/viewcvs?rev=227015&root=gcc&view=rev
Log:
[PR64164] fix regressions reported on m68k and armeb

Defer stack slot address assignment for all parms that can't live in
pseudos, and accept pseudo assignments in assign_parm_setup_block.

for  gcc/ChangeLog

PR rtl-optimization/64164
* cfgexpand.c (parm_maybe_byref_p): Renamed to...
(parm_in_stack_slot_p): ... this.  Disregard mode, what
matters is whether the parm will live in a pseudo or a stack
slot.
(expand_one_ssa_partition): Deal with params without a default
def.  Disregard mode.
* cfgexpand.h: Renamed function declaration.
* tree-ssa-coalesce.c: Adjust.
* function.c (split_complex_args): Allocate stack slot for
unassigned parms before splitting.
(parm_in_unassigned_mem_p): New.  Use it instead of
parm_maybe_byref_p throughout this file.
(assign_parm_setup_block): Use it.  Accept pseudos in the
expand-assigned rtl.
(assign_parm_setup_reg): Drop BLKmode requirement.
(assign_parm_setup_stack): Allocate and fill in the address of
unassigned MEM parms.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
trunk/gcc/cfgexpand.h
trunk/gcc/function.c
trunk/gcc/tree-ssa-coalesce.c


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #50 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Author: aoliva
Date: Fri Aug 14 18:51:50 2015
New Revision: 226901

URL: https://gcc.gnu.org/viewcvs?rev=226901&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

PR rtl-optimization/64164
PR bootstrap/66978
PR middle-end/66983
PR rtl-optimization/67000
PR middle-end/67034
PR middle-end/67035
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename.  Add
-ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
(ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Move declaration
* tree-ssa-coalesce.h: ... here.
* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
headers required by it.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
across variables when flag_tree_coalesce_vars.  Check register
use and promoted modes to allow coalescing.  Do not coalesce
maybe-byref parms with SSA_NAMEs of other variables, or
anonymous SSA_NAMEs.  Moved to tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along
with its member functions to tree-ssa-coalesce.c.
(var_map_base_init): Likewise.  Renamed to
compute_samebase_partition_bases.
(partition_view_normal): Drop want_bases parameter.
(partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h and cfgexpand.h.
(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
default defs at the entry point.
(dump_part_var_map): New.
(compute_optimized_partition_bases): New, called by...
(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
of compute_samebase_partition_bases.  Adjust.
* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
* cfgexpand.c (leader_merge, parm_maybe_byref_p): New.
(ssa_default_def_partition): New.
(get_rtl_for_parm_ssa_default_def): New.
(align_local_variable, add_stack_var): Support anonymous SSA
names.
(defer_stack_allocation): Likewise.  Declare earlier.
(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
Do not record deferred-allocation marker in
SA.partition_to_pseudo.
(expand_stack_vars): Adjust check for the marker in it.
(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
redundant MEM attr setting.
(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
from...
(expand_one_stack_var): ... this.  New wrapper to check and
skip already expanded SSA partitions.
(record_alignment_for_reg_var): New, factored out of...
(expand_one_var): ... this.
(expand_one_ssa_partition): New.
(adjust_one_expanded_partition_var): New.
(expand_one_register_var): Check and skip already expanded SSA
partitions.
(expand_used_vars): Don't create DECLs for anonymous SSA
names.  Expand all SSA partitions, then adjust all SSA names.
(pass::execute): Replace the loops that set
SA.partition_to_pseudo from partition leaders and cleared
DECL_RTL for multi-location variables, and that which used to
rename vars and set attrs, with one that clears DECL_RTL and
checks that PARMs and RESULTs default_defs match DECL_RTL.
* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
* emit-rtl.c: Include stor-layout.h.
(set_reg_attrs_for_parm): Handle NULL decl.
(set_reg_attrs_for_decl_rtl): Take mode from expression if
it's not a DECL.
* stmt.c (emit_case_decision_tree): Pass it the SSA_NAME
rather than its possibly-NULL DECL.
* explow.c (promote_ssa_mode): New.
* explow.h (promote_ssa_mode): Declare.
* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
(read_complex_part): Export.
* expr.h (read_complex_part): Declare.
* cfgexpand.h (parm_maybe_byref_p): Declare.
* function.c: Include cfgexpand.h.
(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
(use_register_for_parm_decl): Wrapper for the above to
special-case the result_ptr.
(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
(split_complex_args): Take assign_parm_data_all argument.
Pass it to 

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 66983, which changed state.

Bug 66983 Summary: [6 Regression] Many testsuite regressions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66983

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67034, which changed state.

Bug 67034 Summary: [6 Regression] FAIL: gcc.c-torture/compile/pr39928-1.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67034

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 66978, which changed state.

Bug 66978 Summary: [6 Regression] bootstrap failure with 
--with-multilib-list=m32,m64,mx32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66978

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67000, which changed state.

Bug 67000 Summary: [6 Regression] ICE in split_complex_args, at function.c:2325 
on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67000

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
Bug 64164 depends on bug 67035, which changed state.

Bug 67035 Summary: [6 Regression] FAIL: gcc.c-torture/compile/pr54713-3.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67035

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-14 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Alexandre Oliva aoliva at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #51 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Fixed for the next major release.  Not planning a backport.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-08-02 Thread gary at intrepid dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #49 from Gary Funck gary at intrepid dot com ---
(In reply to Alexandre Oliva from comment #48)
> The errors reported in comments 44, 45, 46, and 47 are fixed in the git
> branch aoliva/pr64164.  I'm giving it all some more testing before posting
> an updated, consolidated patch.

I applied your patch (commit 9357ff1, 8/2/15) to our GUPC branch, based off
trunk version 226386 on gcc112 (PPC64LE).  It bootstrapped fine and passed the
tests with -O0 --enable-checking and -O3 --disable-checking.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-30 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #48 from Alexandre Oliva aoliva at gcc dot gnu.org ---
The errors reported in comments 44, 45, 46, and 47 are fixed in the git branch
aoliva/pr64164.  I'm giving it all some more testing before posting an updated,
consolidated patch.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-27 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Rainer Orth ro at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ro at gcc dot gnu.org

--- Comment #47 from Rainer Orth ro at gcc dot gnu.org ---
(In reply to Andreas Schwab from comment #44)

Same on sparc.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-24 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #44 from Andreas Schwab sch...@linux-m68k.org ---
This breaks gcc.dg/pr43300.c on aarch64.

$ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr43300.c
../gcc/testsuite/gcc.dg/pr43300.c: In function ‘foo’:
../gcc/testsuite/gcc.dg/pr43300.c:8:1: internal compiler error: in
emit_move_insn, at expr.c:3552
 foo (int x, V2SF a)
 ^
0x7ee783 emit_move_insn(rtx_def*, rtx_def*)
../../gcc/expr.c:3551
0x84a80b assign_parm_setup_reg
../../gcc/function.c:3322
0x84c4ff assign_parms
../../gcc/function.c:3766
0x84f353 expand_function_start(tree_node*)
../../gcc/function.c:5192
0x6f919f execute
../../gcc/cfgexpand.c:6105

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-24 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #45 from Andreas Schwab sch...@linux-m68k.org ---
It also breaks a lot of tests on m68k, eg:

$ gcc/xgcc -B gcc/ ../gcc/testsuite/gcc.dg/pr17957.c 
../gcc/testsuite/gcc.dg/pr17957.c: In function ‘vadd’:
../gcc/testsuite/gcc.dg/pr17957.c:6:1: internal compiler error: in
expand_one_stack_var_1, at cfgexpand.c:1221
 vadd (void)
 ^
0x662055 expand_one_stack_var_1
../../gcc/cfgexpand.c:1221
0x670685 expand_one_ssa_partition
../../gcc/cfgexpand.c:1295
0x670685 expand_used_vars
../../gcc/cfgexpand.c:1940
0x671e20 execute
../../gcc/cfgexpand.c:6084

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-24 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Steve Ellcey sje at gcc dot gnu.org changed:

   What|Removed |Added

 CC||sje at gcc dot gnu.org

--- Comment #46 from Steve Ellcey sje at gcc dot gnu.org ---
I see the same failures on MIPS too.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-07-23 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #43 from Alexandre Oliva aoliva at gcc dot gnu.org ---
Author: aoliva
Date: Thu Jul 23 15:34:49 2015
New Revision: 226113

URL: https://gcc.gnu.org/viewcvs?rev=226113&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

PR rtl-optimization/64164
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename.  Add
-ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
(ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Move declaration
* tree-ssa-coalesce.h: ... here.
* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
headers required by it.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
across variables when flag_tree_coalesce_vars.  Check register
use and promoted modes to allow coalescing.  Moved to
tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along
with its member functions to tree-ssa-coalesce.c.
(var_map_base_init): Likewise.  Renamed to
compute_samebase_partition_bases.
(partition_view_normal): Drop want_bases parameter.
(partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h.
(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
default defs at the entry point.
(dump_part_var_map): New.
(compute_optimized_partition_bases): New, called by...
(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
of compute_samebase_partition_bases.  Adjust.
* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
* cfgexpand.c (leader_merge): New.
(get_rtl_for_parm_ssa_default_def): New.
(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
redundant MEM attr setting.
(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
from...
(expand_one_stack_var): ... this.  New wrapper to check and
skip already expanded SSA partitions.
(record_alignment_for_reg_var): New, factored out of...
(expand_one_var): ... this.
(expand_one_ssa_partition): New.
(adjust_one_expanded_partition_var): New.
(expand_one_register_var): Check and skip already expanded SSA
partitions.
(expand_used_vars): Don't create DECLs for anonymous SSA
names.  Expand all SSA partitions, then adjust all SSA names.
(pass::execute): Replace the loops that set
SA.partition_to_pseudo from partition leaders and cleared
DECL_RTL for multi-location variables, and that which used to
rename vars and set attrs, with one that clears DECL_RTL and
checks that PARMs and RESULTs default_defs match DECL_RTL.
* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl.
* explow.c (promote_ssa_mode): New.
* explow.h (promote_ssa_mode): Declare.
* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
* function.c: Include cfgexpand.h.
(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
(use_register_for_parm_decl): Wrapper for the above to
special-case the result_ptr.
(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
(split_complex_args): Take assign_parm_data_all argument.
Pass it to rtl_for_parm.  Set up rtl and context for split
args.
(assign_parms_augmented_arg_list): Adjust.
(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
multiple locations.  Recognize split complex args.
(assign_parm_adjust_stack_rtl): Add all and parm arguments,
for rtl_for_parm.  For SSA-assigned parms, zero stack_parm.
(assign_parm_setup_block): Prefer SSA-assigned location.
(assign_parm_setup_reg): Likewise.  Use entry_parm for equiv
if stack_parm is NULL.
(assign_parm_setup_stack): Prefer SSA-assigned location.
(assign_parms): Maybe reset DECL_RTL of params.  Adjust stack
rtl before testing for pointer bounds.  Special-case result_ptr.
(expand_function_start): Maybe reset DECL_RTL of result.
Prefer SSA-assigned location for result and static chain.
Factor out DECL_RESULT and SET_DECL_RTL.
* tree-outof-ssa.c (insert_value_copy_on_edge): Handle

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-06-10 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Christophe Lyon clyon at gcc dot gnu.org changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #40 from Christophe Lyon clyon at gcc dot gnu.org ---
Created attachment 35740
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35740&action=edit
ARM testcase

This is the testcase that breaks on ARM, when compiled with optimizations: -O0
is OK, -O1, -O2, -O3 crash with:
/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c: In function '__gnu_addqq3':
/media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/libgcc/fixed-bit.c:59:1: internal compiler error: RTL flag check: MEM_VOLATILE_P used with unexpected rtx code 'reg' in set_mem_attributes_minus_bitpos, at emit-rtl.c:1787
 FIXED_ADD (FIXED_C_TYPE a, FIXED_C_TYPE b)
 ^
0xa6eb52 rtl_check_failed_flag(char const*, rtx_def const*, char const*, int, char const*)
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/rtl.c:800
0x771fc7 set_mem_attributes_minus_bitpos(rtx_def*, tree_node*, int, long)
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/emit-rtl.c:1787
0x805294 assign_parm_setup_block
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:2977
0x80b65c assign_parms
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:3775
0x80e087 expand_function_start(tree_node*)
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/function.c:5215
0x6a77ed execute
        /media/lyon/9be1a707-5b7f-46da-9106-e084a5dbb011/ssd/src/GCC/sources/gcc-fsf/trunk/gcc/cfgexpand.c:6127
Please submit a full bug report,
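
The backtrace shows a checked flag accessor rejecting its operand: MEM_VOLATILE_P was applied to an rtx whose code is 'reg', reached from assign_parm_setup_block through set_mem_attributes_minus_bitpos. The following is only a toy model of that kind of check, written in Python for brevity. The class and function names are hypothetical and are not GCC's implementation; the real accessor is a C macro and accepts more rtx codes than the single "mem" allowed here.

```python
class RtlFlagCheckError(Exception):
    """Raised when a flag accessor is used on an rtx of the wrong code (toy model)."""


class Rtx:
    """Hypothetical stand-in for an RTL expression: a code plus one flag bit."""

    def __init__(self, code, volatil=False):
        self.code = code        # e.g. "reg" or "mem"
        self.volatil = volatil  # flag bit whose meaning depends on the code


def mem_volatile_p(x):
    """Return the volatile flag, but only for a "mem" rtx (simplifying assumption)."""
    if x.code != "mem":
        raise RtlFlagCheckError(
            f"MEM_VOLATILE_P used with unexpected rtx code '{x.code}'"
        )
    return x.volatil


# A MEM passes the check; a REG handed to MEM-only code trips it.
print(mem_volatile_p(Rtx("mem", volatil=True)))
try:
    mem_volatile_p(Rtx("reg"))
except RtlFlagCheckError as err:
    print("check failed:", err)
```

In this model the failure means the caller expected a memory location for the parameter but received a register, which matches the shape of the reported ICE.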


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-06-10 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #41 from David Edelsohn <dje at gcc dot gnu.org> ---
Created attachment 35742
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35742&action=edit
AIX PowerPC testcase


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-06-10 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #42 from David Edelsohn <dje at gcc dot gnu.org> ---
Created attachment 35743
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35743&action=edit
PPC64LE Linux Testcase


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-06-09 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Alexandre Oliva <aoliva at gcc dot gnu.org> changed:

           What    |Removed                     |Added

             Status|RESOLVED                    |ASSIGNED
         Resolution|FIXED                       |---
            Summary|[4.9/5 Regression] one more |[4.9/5/6 Regression] one
                   |stack slot used due to one  |more stack slot used due to
                   |less inlining level         |one less inlining level

--- Comment #39 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
At least the sparc regression was caused by the change richi requested to
disregard the underlying decl in promote_ssa_mode.  I didn't realize this could
cause a mismatch between the mode of the partition created for the QImode parm
default def and the promoted mode for the parm decl expected by the
parm-assignment code in function.c.  This will likely take some time to sort
out, so I'm reverting the patch for now.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-06-08 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #35 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
Author: aoliva
Date: Tue Jun  9 05:05:34 2015
New Revision: 224262

URL: https://gcc.gnu.org/viewcvs?rev=224262&root=gcc&view=rev
Log:
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

for  gcc/ChangeLog

PR rtl-optimization/64164
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename.  Add
-ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
(ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Move declaration
* tree-ssa-coalesce.h: ... here.
* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
headers required by it.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
across variables when flag_tree_coalesce_vars.  Check register
use and promoted modes to allow coalescing.  Moved to
tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along
with its member functions to tree-ssa-coalesce.c.
(var_map_base_init): Likewise.  Renamed to
compute_samebase_partition_bases.
(partition_view_normal): Drop want_bases parameter.
(partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h.
(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
default defs at the entry point.
(dump_part_var_map): New.
(compute_optimized_partition_bases): New, called by...
(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
of compute_samebase_partition_bases.  Adjust.
* alias.c (nonoverlapping_memrefs_p): Disregard gimple-regs.
* cfgexpand.c (leader_merge): New.
(get_rtl_for_parm_ssa_default_def): New.
(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
vars.  Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
(expand_one_stack_var_at): Handle anonymous SSA_NAMEs.  Drop
redundant MEM attr setting.
(expand_one_stack_var_1): Handle anonymous SSA_NAMEs.  Renamed
from...
(expand_one_stack_var): ... this.  New wrapper to check and
skip already expanded SSA partitions.
(record_alignment_for_reg_var): New, factored out of...
(expand_one_var): ... this.
(expand_one_ssa_partition): New.
(adjust_one_expanded_partition_var): New.
(expand_one_register_var): Check and skip already expanded SSA
partitions.
(expand_used_vars): Don't create DECLs for anonymous SSA
names.  Expand all SSA partitions, then adjust all SSA names.
(pass::execute): Replace the loops that set
SA.partition_to_pseudo from partition leaders and cleared
DECL_RTL for multi-location variables, and that which used to
rename vars and set attrs, with one that clears DECL_RTL and
checks that PARMs and RESULTs default_defs match DECL_RTL.
* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL decl.
* explow.c (promote_ssa_mode): New.
* explow.h (promote_ssa_mode): Declare.
* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
* function.c: Include cfgexpand.h.
(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
(use_register_for_parm_decl): Wrapper for the above to
special-case the result_ptr.
(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
multiple locations.
(assign_parm_adjust_stack_rtl): Add all and parm arguments,
for rtl_for_parm.  For SSA-assigned parms, zero stack_parm.
(assign_parm_setup_block): Prefer SSA-assigned location.
(assign_parm_setup_reg): Likewise.  Use entry_parm for equiv
if stack_parm is NULL.
(assign_parm_setup_stack): Prefer SSA-assigned location.
(assign_parms): Maybe reset DECL_RTL of params.  Adjust stack
rtl before testing for pointer bounds.  Special-case result_ptr.
(expand_function_start): Maybe reset DECL_RTL of result.
Prefer SSA-assigned location for result and static chain.
Factor out DECL_RESULT and SET_DECL_RTL.
* tree-outof-ssa.c (insert_value_copy_on_edge): Handle
anonymous SSA names.  Use promote_ssa_mode.
(get_temp_reg): Likewise.
(remove_ssa_form): Adjust.
* var-tracking.c (dataflow_set_clear_at_call): Take call_insn
and get its reg_usage for reg 

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-04-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #34 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #33)
> On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
> >
> > --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > (In reply to Jeffrey A. Law from comment #28)
> >> So I've been thinking about how to integrate life/conflict analysis
> >> into the uncprop code and it may not be that bad, both from an
> >> implementation and computation standpoint.
> >>
> >> Most importantly, we don't have to compute full life information.  We
> >> really just need to compute the life of the equivalence.  Given the
> >> life of the equivalence, if the equivalence is live in any block that
> >> contains the defining statement for an SSA_NAME appearing in the target
> >> PHI, then the equivalence conflicts and we don't want to unpropagate it.
> >>
> >> Computing the life of the equivalence is pretty easy and should be
> >> reasonably quick.  This is a cost we'd have to pay regardless of whether
> >> or not we integrate uncprop with out-of-ssa since we won't have life
> >> information for the expression.
> >>
> >> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we
> >> don't test for conflicts multiple times if an SSA_NAME shows up in
> >> multiple PHI alternatives would help keep the cost down as well.
> >>
> >> Ultimately I don't think we need to integrate uncprop and out-of-ssa to
> >> avoid the unprofitable transformation during uncprop.
> >
> > Also see Boissinot et al., Fast Liveness Checking for SSA-Form Programs
> > (CGO 08).  They describe a way to do fast liveness queries without
> > actually doing a (memory) expensive data-flow analysis but using SSA
> > immediate-uses and dominance checks.  Sth we could use in SSA coalescing
> > as well to avoid both the liveness bitmaps and the conflict graph.
> Yea, it looks reasonably interesting and there's probably benefit in
> experimenting with that approach.  However, be aware that its memory
> consumption can be problematical.   According to their summary, it's
> quadratic.

Yes, but that's only if you store the liveness info.  We don't need to do
that; we only need to compute whether two partitions conflict for each
coalescing candidate (which means a few SSA conflict checks, depending on
partition size).

Our current algorithm is already quadratic in memory use because we do
store SSA liveness and the partition conflict graph.

What I'm not sure about is whether doing the SSA-based liveness check is
going to be slower in terms of compile time.

> Though presumably we could drop back to the tried and true
> approach if we have too many BBs.
>
> That definitely is stage1 material.

Indeed.

> Jeff
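
The idea of checking partition conflicts on demand, instead of storing liveness and a conflict graph, can be sketched roughly as below. This is a minimal illustration in Python, not GCC code: `partitions_conflict` and `make_interval_oracle` are hypothetical names, and the interval-overlap oracle is a deliberately simple stand-in for a real SSA liveness query (it is not the algorithm from the Boissinot et al. paper).

```python
from itertools import product


def partitions_conflict(part_a, part_b, names_interfere):
    """Return True if any SSA name in part_a interferes with any name in part_b.

    Nothing is stored between calls; the number of liveness queries is at
    most len(part_a) * len(part_b), so it depends on partition size.
    """
    return any(names_interfere(x, y) for x, y in product(part_a, part_b))


def make_interval_oracle(ranges):
    """Build a toy interference test from (def_pos, last_use_pos) pairs.

    Assumption: straight-line code, so each name's live range is one
    half-open interval and two names interfere when the intervals overlap.
    """
    def names_interfere(x, y):
        (def_x, end_x), (def_y, end_y) = ranges[x], ranges[y]
        return def_x < end_y and def_y < end_x
    return names_interfere


# Hypothetical live ranges: b_2 is defined exactly where a_1 dies.
oracle = make_interval_oracle({"a_1": (0, 3), "b_2": (3, 6), "c_3": (2, 5)})
print(partitions_conflict({"a_1"}, {"b_2"}, oracle))          # coalescible
print(partitions_conflict({"a_1"}, {"b_2", "c_3"}, oracle))   # c_3 overlaps a_1
```

The trade-off matches the open question in the comment: memory stays small because nothing is cached, but each coalescing candidate pays for its own queries, so compile time depends on how cheap a single liveness query is.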


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added

   Target Milestone|4.9.3                       |6.0
            Summary|[4.9/5 Regression] one more |[4.9/5/6 Regression] one
                   |stack slot used due to one  |more stack slot used due to
                   |less inlining level         |one less inlining level
      Known to fail|                            |5.0

--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'd say we push this back to GCC 6.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added

                 CC|                            |jakub at gcc dot gnu.org

--- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Agreed, though ideally it should be fixed early in stage1.


[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2015-03-31 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

--- Comment #33 from Jeffrey A. Law <law at redhat dot com> ---
On 03/31/2015 05:25 AM, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164
>
> --- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Jeffrey A. Law from comment #28)
>> So I've been thinking about how to integrate life/conflict analysis into
>> the uncprop code and it may not be that bad, both from an implementation
>> and computation standpoint.
>>
>> Most importantly, we don't have to compute full life information.  We
>> really just need to compute the life of the equivalence.  Given the life
>> of the equivalence, if the equivalence is live in any block that contains
>> the defining statement for an SSA_NAME appearing in the target PHI, then
>> the equivalence conflicts and we don't want to unpropagate it.
>>
>> Computing the life of the equivalence is pretty easy and should be
>> reasonably quick.  This is a cost we'd have to pay regardless of whether
>> or not we integrate uncprop with out-of-ssa since we won't have life
>> information for the expression.
>>
>> Collecting the SSA_NAMEs appearing on the RHS of the PHI so that we don't
>> test for conflicts multiple times if an SSA_NAME shows up in multiple PHI
>> alternatives would help keep the cost down as well.
>>
>> Ultimately I don't think we need to integrate uncprop and out-of-ssa to
>> avoid the unprofitable transformation during uncprop.
>
> Also see Boissinot et al., Fast Liveness Checking for SSA-Form Programs
> (CGO 08).  They describe a way to do fast liveness queries without actually
> doing a (memory) expensive data-flow analysis but using SSA immediate-uses
> and dominance checks.  Sth we could use in SSA coalescing as well to avoid
> both the liveness bitmaps and the conflict graph.
Yea, it looks reasonably interesting and there's probably benefit in
experimenting with that approach.  However, be aware that its memory
consumption can be problematical.   According to their summary, it's
quadratic.  Though presumably we could drop back to the tried and true
approach if we have too many BBs.

That definitely is stage1 material.

Jeff
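
The conflict test quoted from comment #28 can be sketched as follows. This is a minimal Python illustration rather than uncprop code: the function name and data shapes are assumptions, and the set of blocks where the equivalence is live is taken as already computed.

```python
def equivalence_conflicts(live_blocks, phi_args, def_block):
    """Return True if the equivalence should NOT be unpropagated.

    live_blocks: set of basic-block ids in which the equivalence is live
                 (assumed to be computed elsewhere)
    phi_args:    SSA names appearing on the right-hand side of the target PHI
    def_block:   mapping from SSA name to the block holding its definition
    """
    # Deduplicate first, so a name that shows up in several PHI
    # alternatives is only tested once.
    for name in set(phi_args):
        if def_block[name] in live_blocks:
            return True
    return False


# Hypothetical data: b_2 is defined in block 3, where the equivalence is live.
def_block = {"a_1": 2, "b_2": 3, "c_3": 5}
print(equivalence_conflicts({3, 4}, ["a_1", "b_2", "a_1"], def_block))
print(equivalence_conflicts({4}, ["a_1", "b_2", "a_1"], def_block))
```

Only the life of the one equivalence is needed here, which is why the comment expects the cost to stay small compared with full liveness information.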