[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #10 from Alexandre Oliva aoliva at gcc dot gnu.org 2011-10-19 15:50:04 UTC ---
Author: aoliva
Date: Wed Oct 19 15:50:00 2011
New Revision: 180194

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180194
Log:
PR debug/49310
* var-tracking.c (loc_exp_dep, onepart_aux): New structs.
(variable_part): Replace offset with union.
(enum onepart_enum, onepart_enum_t): New.
(variable_def): Drop cur_loc_changed, add onepart.
(value_chain_def, const_value_chain): Remove.
(VAR_PART_OFFSET, VAR_LOC_1PAUX): New macros, with checking.
(VAR_LOC_DEP_LST, VAR_LOC_DEP_LSTP): New macros.
(VAR_LOC_FROM, VAR_LOC_DEPTH, VAR_LOC_DEP_VEC): Likewise.
(value_chain_pool, value_chains): Remove.
(dropped_values): New.
(struct parm_reg): Only if HAVE_window_save.
(vt_stack_adjustments): Don't record register arguments.
(dv_as_rtx): New.
(dv_onepart_p): Return a onepart_enum_t.
(onepart_pool): New.
(dv_pool): Remove.
(dv_from_rtx): New.
(variable_htab_free): Release onepart aux data. Reset flags.
(value_chain_htab_hash, value_chain_htab_eq): Remove.
(unshare_variable): Use onepart field. Propagate onepart aux
data or offset. Drop cur_loc_changed.
(val_store): Cope with NULL insn. Rephrase dump output. Check
for unsuitable locs. Add FIXME on using cselib locs.
(val_reset): Remove FIXME of unfounded concerns.
(val_resolve): Check for unsuitable locs. Add FIXME on using
cselib locs.
(variable_union): Use onepart field, adjust access to offset.
(NO_LOC_P): New.
(VALUE_CHANGED, DECL_CHANGED): Update doc.
(set_dv_changed): Clear NO_LOC_P when changed.
(find_loc_in_1pdv): Use onepart field.
(intersect_loc_chains): Likewise.
(unsuitable_loc): New.
(loc_cmp): Keep ENTRY_VALUEs at the end of the loc list.
(add_value_chain, add_value_chains): Remove.
(add_cselib_value_chains, remove_value_chain): Likewise.
(remove_value_chains, remove_cselib_value_chains): Likewise.
(canonicalize_loc_order_check): Use onepart. Drop cur_loc_changed.
(canonicalize_values_star, canonicalize_vars_star): Use onepart.
(variable_merge_over_cur): Likewise. Adjust access to offset.
Drop cur_loc_changed.
(variable_merge_over_src): Use onepart field.
(remove_duplicate_values): Likewise.
(variable_post_merge_new_vals): Likewise.
(find_mem_expr_in_1pdv): Likewise.
(dataflow_set_preserve_mem_locs): Likewise. Drop cur_loc_changed
and value chains.
(dataflow_set_remove_mem_locs): Likewise. Use VAR_LOC_FROM.
(variable_different_p): Use onepart field. Move onepart test out
of the loop.
(argument_reg_set): Drop.
(add_uses, add_stores): Preserve but do not record in dynamic
tables equivalences for ENTRY_VALUEs and CFA_based addresses.
Avoid unsuitable address expressions.
(EXPR_DEPTH): Unlimit.
(EXPR_USE_DEPTH): Repurpose PARAM_MAX_VARTRACK_EXPR_DEPTH.
(prepare_call_arguments): Use DECL_RTL_IF_SET.
(dump_var): Adjust access to offset.
(variable_from_dropped, recover_dropped_1paux): New.
(variable_was_changed): Drop cur_loc_changed. Use onepart.
Preserve onepart aux in empty_var. Recover empty_var and onepart
aux from dropped_values.
(find_variable_location_part): Special-case onepart. Adjust
access to offset.
(set_slot_part): Use onepart. Drop cur_loc_changed. Adjust
access to offset. Initialize onepaux. Drop value chains.
(delete_slot_part): Drop value chains. Use VAR_LOC_FROM.
(VEC (variable, heap), VEC (rtx, stack)): Define.
(expand_loc_callback_data): Drop dummy, cur_loc_changed,
ignore_cur_loc. Add expanding, pending, depth.
(loc_exp_dep_alloc, loc_exp_dep_clear): New.
(loc_exp_dep_insert, loc_exp_dep_set): New.
(notify_dependents_of_resolved_value): New.
(update_depth, vt_expand_var_loc_chain): New.
(vt_expand_loc_callback): Revamped.
(resolve_expansions_pending_recursion): New.
(INIT_ELCD, FINI_ELCD): New.
(vt_expand_loc): Use the new macros above. Drop ignore_cur_loc
parameter, adjust all callers.
(vt_expand_loc_dummy): Drop.
(vt_expand_1pvar): New.
(emit_note_insn_var_location): Operate on non-debug decls only.
Revamp multi-part cur_loc recomputation and one-part expansion.
Drop cur_loc_changed. Adjust access to offset.
(VEC (variable, heap)): Drop.
(changed_variables_stack, changed_values_stack): Drop.
(check_changed_vars_0, check_changed_vars_1): Remove.
(check_changed_vars_2, check_changed_vars_3): Remove.
(values_to_stack, remove_value_from_changed_variables): New.
(notify_dependents_of_changed_value, process_changed_values): New.
(emit_notes_for_changes): Revamp onepart updates.
(emit_notes_for_differences_1): Use onepart. Drop cur_loc_changed
and value chains. Propagate onepaux. Recover empty_var and
onepaux from dropped_values.
(emit_notes_for_differences_2): Drop value chains.
(emit_notes_in_bb): Adjust.
(vt_emit_notes): Drop value chains, changed_variables_stack.
Initialize and release dropped_values.
(create_entry_value): Revamp.
(vt_add_function_parameter): Use new interface.
(note_register_arguments): Remove.
(vt_initialize): Drop value chains and register arguments.
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2011-07-22 09:52:45 UTC ---
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176538
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #8 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-07-04 18:57:45 UTC ---
patch ;-)

Index: gcc/params.def
===================================================================
--- gcc/params.def (revision 175820)
+++ gcc/params.def (working copy)
@@ -845,7 +845,7 @@
 DEFPARAM (PARAM_MAX_VARTRACK_EXPR_DEPTH,
           "max-vartrack-expr-depth",
           "Max. recursion depth for expanding var tracking expressions",
-          20, 0, 0)
+          12, 0, 0)
 
 /* Set minimum insn uid for non-debug insns. */
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #7 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-06-09 06:54:33 UTC ---
two more datapoints (depth=30 is still running):

max-vartrack-expr-depth=22: var-tracking emit      : 5459.44 (99%) usr
max-vartrack-expr-depth=25: var-tracking emit      :42078.07 (100%) usr

these are the timings for the various -Ox

'-g -O0 -fbounds-check' :   14s
'-g -O1 -fbounds-check' : 2631s
'   -O1 -fbounds-check' :   44s
'-g -O2 -fbounds-check' :   43s

from this point of view, something at -O2 seems to be very good at cleaning up
these very long expressions very cheaply. Would it make sense to run that pass
also at -O1 (maybe only when these long expressions are observed) ?
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #2 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-06-08 07:16:06 UTC ---
the testcase from http://gcc.gnu.org/bugzilla/attachment.cgi?id=20290 can be
used more conveniently. It runs in 1.4s and still spends 50% of time in
var-tracking emit. Using callgrind, most of the time is in
emit_notes_for_changes, calling htab_traverse.
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org 2011-06-08 09:54:29 UTC ---
Using -g -O2 -fbounds-check instead of -g -O1 -fbounds-check cures it, or
e.g. -g -O1 -fbounds-check --param max-vartrack-expr-depth=5 speeds it up.
The programming style is very weird, and combined with -fbounds-check which
results in huge number of bbs doesn't help it, plus the expression chains for
the debug vars really seem to be very long (and at the points where bounds
checking failures are reported the relevant registers holding the expressions
are reused for something else).
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #4 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-06-08 13:23:00 UTC ---
(In reply to comment #3)
> Using -g -O2 -fbounds-check instead of -g -O1 -fbounds-check cures it, or
> e.g. -g -O1 -fbounds-check --param max-vartrack-expr-depth=5 speeds it up.
> The programming style is very weird, and combined with -fbounds-check which
> results in huge number of bbs doesn't help it, plus the expression chains for
> the debug vars really seem to be very long (and at the points where bounds
> checking failures are reported the relevant registers holding the expressions
> are reused for something else).

Not so sure if I agree with your statement about my programming style ;-).

sure timings explode with increasing max-vartrack-expr-depth, maybe the table
below can help to pick a good default ?

max-vartrack-expr-depth=2:  var-tracking emit      :  32.66 (33%) usr
max-vartrack-expr-depth=3:  var-tracking emit      :  33.03 (34%) usr
max-vartrack-expr-depth=4:  var-tracking emit      :  33.66 (34%) usr
max-vartrack-expr-depth=5:  var-tracking emit      :  33.64 (34%) usr
max-vartrack-expr-depth=6:  var-tracking emit      :  34.34 (35%) usr
max-vartrack-expr-depth=7:  var-tracking emit      :  35.98 (35%) usr
max-vartrack-expr-depth=8:  var-tracking emit      :  42.52 (37%) usr
max-vartrack-expr-depth=9:  var-tracking emit      :  48.79 (39%) usr
max-vartrack-expr-depth=10: var-tracking emit      :  53.09 (42%) usr
max-vartrack-expr-depth=12: var-tracking emit      :  74.52 (46%) usr
max-vartrack-expr-depth=14: var-tracking emit      : 118.90 (63%) usr
max-vartrack-expr-depth=16: var-tracking emit      : 313.50 (81%) usr
max-vartrack-expr-depth=18: var-tracking emit      : 833.84 (91%) usr
max-vartrack-expr-depth=20: var-tracking emit      :2527.38 (97%) usr
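
To make the growth in the table above easier to read, the timings can be turned into a per-level growth factor. The following is only a rough sketch: the numbers are copied from comment #4, while the function name `growth_per_level` and the choice of a geometric mean between neighbouring rows are assumptions made for illustration, not part of GCC or of the original report.

```python
# "var-tracking emit" user time in seconds, keyed by max-vartrack-expr-depth.
# Values are copied from the table in comment #4.
TIMINGS = {
    2: 32.66, 3: 33.03, 4: 33.66, 5: 33.64, 6: 34.34, 7: 35.98,
    8: 42.52, 9: 48.79, 10: 53.09, 12: 74.52, 14: 118.90,
    16: 313.50, 18: 833.84, 20: 2527.38,
}


def growth_per_level(timings):
    """Return (lower_depth, upper_depth, factor) for neighbouring rows.

    `factor` is the geometric-mean slowdown per extra depth level between
    the two rows, so rows that are two levels apart stay comparable with
    rows that are one level apart.
    """
    depths = sorted(timings)
    result = []
    for lower, upper in zip(depths, depths[1:]):
        ratio = timings[upper] / timings[lower]
        result.append((lower, upper, ratio ** (1.0 / (upper - lower))))
    return result


if __name__ == "__main__":
    for lower, upper, factor in growth_per_level(TIMINGS):
        print(f"depth {lower:2d} -> {upper:2d}: x{factor:.2f} per level")
```

On this data the factor stays close to 1 up to depth 7 and exceeds 1.6 per level from depth 14 onwards, which is consistent with the suggestion to pick a default well below 20.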
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-06-08 13:34:42 UTC ---
10 was the minimal value to get reasonable debug info in some cases (e.g.
gcc.dg/guality/), so perhaps 20 is too much and we should go down to the
default of 12-15.
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2011-06-08 13:38:51 UTC ---
Or alternatively make it more dynamic, like if in one function the maximum
level is reached or almost reached (so it could be checked only in
vt_expand_loc_callback) more than some parameter times (like several millions
or so), it would temporarily drop down the limit to a lower value. It would
probably need to recheck all var locations at that spot though, because dummy
and real expansion should match.
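
The dynamic limit described in comment #6 could be modelled roughly as below. This is a toy sketch of the idea only, not code from var-tracking.c: the class `AdaptiveDepthLimit`, its parameters (`reduced_limit`, `max_hits`, `slack`) and their default values are all hypothetical, and the recheck of existing variable locations that the comment says would be needed is only signalled to the caller, not implemented.

```python
class AdaptiveDepthLimit:
    """Toy model of a per-function expression depth limit that lowers
    itself once the limit has been reached (or nearly reached) too often.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, limit=20, reduced_limit=10, max_hits=1_000_000, slack=1):
        self.initial_limit = limit
        self.reduced_limit = reduced_limit
        self.max_hits = max_hits   # hits tolerated before lowering the limit
        self.slack = slack         # how close to the limit counts as a hit
        self.start_function()

    def start_function(self):
        """Reset state; the lowered limit is only temporary, per function."""
        self.limit = self.initial_limit
        self.hits = 0

    def note_depth(self, depth):
        """Record the depth of one expansion.

        Returns True exactly when this call lowered the limit, which is
        the point where a caller would have to recheck the locations it
        has already expanded under the old limit.
        """
        if depth >= self.limit - self.slack:
            self.hits += 1
        if self.hits > self.max_hits and self.limit > self.reduced_limit:
            self.limit = self.reduced_limit
            return True
        return False


if __name__ == "__main__":
    limiter = AdaptiveDepthLimit(max_hits=3)
    for depth in (20, 20, 19, 20):
        if limiter.note_depth(depth):
            print("limit lowered to", limiter.limit)
```

The counter is kept per function so that one pathological function does not degrade debug info for the rest of the translation unit.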
[Bug middle-end/49310] [4.7 Regression] Compile time hog in var-tracking emit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49310

Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.7 Regression] Compile    |[4.7 Regression] Compile
                   |time hog                    |time hog in var-tracking
                   |                            |emit

--- Comment #1 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-06-07 15:10:48 UTC ---
The time report is pretty clear:

 var-tracking emit           :2565.20 (97%) usr   0.08 ( 9%) sys2565.58 (97%) wall   65881 kB ( 8%) ggc
 TOTAL                       :2631.33             0.85          2632.52             788209 kB

For completeness the full report is below

Execution times (seconds)
 phase setup                 :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall     261 kB ( 0%) ggc
 phase parsing               :   1.12 ( 0%) usr   0.06 ( 7%) sys   1.18 ( 0%) wall   45507 kB ( 6%) ggc
 phase generate              :2630.17 (100%) usr   0.78 (92%) sys2631.29 (100%) wall  742440 kB (94%) ggc
 phase finalize              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 garbage collection          :   3.46 ( 0%) usr   0.01 ( 1%) sys   3.47 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    9747 kB ( 1%) ggc
 callgraph optimization      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     182 kB ( 0%) ggc
 ipa reference               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const              :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction            :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     140 kB ( 0%) ggc
 cfg cleanup                 :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       9 kB ( 0%) ggc
 CFG verifier                :   0.73 ( 0%) usr   0.01 ( 1%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code         :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns               :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall      14 kB ( 0%) ggc
 df multiple defs            :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs            :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 df live regs                :   0.76 ( 0%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs    :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes    :   0.58 ( 0%) usr   0.01 ( 1%) sys   0.66 ( 0%) wall    8709 kB ( 1%) ggc
 register information        :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis              :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall   10901 kB ( 1%) ggc
 alias stmt walking          :   1.28 ( 0%) usr   0.04 ( 5%) sys   1.22 ( 0%) wall     555 kB ( 0%) ggc
 register scan               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)             :   1.12 ( 0%) usr   0.06 ( 7%) sys   1.18 ( 0%) wall   45506 kB ( 6%) ggc
 inline heuristics           :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall      86 kB ( 0%) ggc
 tree gimplify               :   0.46 ( 0%) usr   0.03 ( 4%) sys   0.49 ( 0%) wall   59986 kB ( 8%) ggc
 tree eh                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    9046 kB ( 1%) ggc
 tree CFG cleanup            :   0.22 ( 0%) usr   0.01 ( 1%) sys   0.24 ( 0%) wall      35 kB ( 0%) ggc
 tree copy propagation       :   0.23 ( 0%) usr   0.02 ( 2%) sys   0.21 ( 0%) wall    2267 kB ( 0%) ggc
 tree find ref. vars         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    5044 kB ( 1%) ggc
 tree PTA                    :   0.62 ( 0%) usr   0.05 ( 6%) sys   0.70 ( 0%) wall    1936 kB ( 0%) ggc
 tree PHI insertion          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     310 kB ( 0%) ggc
 tree SSA rewrite            :   0.17 ( 0%) usr   0.02 ( 2%) sys   0.22 ( 0%) wall   21049 kB ( 3%) ggc
 tree SSA other              :   0.05 ( 0%) usr   0.02 ( 2%) sys   0.12 ( 0%) wall      22 kB ( 0%) ggc
 tree SSA incremental        :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall     817 kB ( 0%) ggc
 tree operand scan           :   0.17 ( 0%) usr   0.10 (12%) sys   0.23 ( 0%) wall   19454 kB ( 2%) ggc
 dominator optimization      :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall    5073 kB ( 1%) ggc
 tree SRA                    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP                    :   2.13 ( 0%) usr   0.00 ( 0%) sys   2.13 ( 0%) wall    5999 kB ( 1%) ggc
 tree PHI const/copy prop: