Re: [PATCH 3/4] New IPA-SRA implementation
Hi, sorry for replying so late, I still haven't recovered from two weeks of traveling and conferences. On Sat, Sep 21 2019, Richard Sandiford wrote: > Hi, > > Thanks for doing this. > > Martin Jambor writes: >> +/* Analyze function body scan results stored in param_accesses and >> + param_accesses, detect possible transformations and store information of >> + those in function summary. NODE, FUN and IFS are all various structures >> + describing the currently analyzed function. */ >> + >> +static void >> +process_scan_results (cgraph_node *node, struct function *fun, >> + isra_func_summary *ifs, >> + vec *param_descriptions) >> +{ >> + bool check_pass_throughs = false; >> + bool dereferences_propagated = false; >> + tree parm = DECL_ARGUMENTS (node->decl); >> + unsigned param_count = param_descriptions->length(); >> + >> + for (unsigned desc_index = 0; >> + desc_index < param_count; >> + desc_index++, parm = DECL_CHAIN (parm)) >> +{ >> + gensum_param_desc *desc = &(*param_descriptions)[desc_index]; >> + if (!desc->locally_unused && !desc->split_candidate) >> +continue; > > I'm jumping in the middle without working through the whole pass, > so this is probably a daft question sorry, but: what is this loop > required to do when: > > !desc->split_candidate && desc->locally_unused You have figured out correctly that this is a thinko. I meant not to continue for non-register-types which might not be used locally but their locally_unused flag is only set a few lines below... > > ? AFAICT... > >> + >> + if (flag_checking) >> +isra_verify_access_tree (desc->accesses); >> + >> + if (!dereferences_propagated >> + && desc->by_ref >> + && desc->accesses) >> +{ >> + propagate_dereference_distances (fun); >> + dereferences_propagated = true; >> +} >> + >> + HOST_WIDE_INT nonarg_acc_size = 0; >> + bool only_calls = true; >> + bool check_failed = false; >> + >> + int entry_bb_index = ENTRY_BLOCK_PTR_FOR_FN (fun)->index; >> + for (gensum_param_access *acc = desc->accesses; >> + acc; >> + acc = acc->next_sibling) >> +if (check_gensum_access (parm, desc, acc, _acc_size, _calls, >> + entry_bb_index)) >> + { >> +check_failed = true; >> +break; >> + } >> + if (check_failed) >> +continue; >> + >> + if (only_calls) >> +desc->locally_unused = true; ...specifically here. >> + >> + HOST_WIDE_INT cur_param_size >> += tree_to_uhwi (TYPE_SIZE (TREE_TYPE (parm))); >> + HOST_WIDE_INT param_size_limit; >> + if (!desc->by_ref || optimize_function_for_size_p (fun)) >> +param_size_limit = cur_param_size; >> + else >> +param_size_limit = (PARAM_VALUE (PARAM_IPA_SRA_PTR_GROWTH_FACTOR) >> + * cur_param_size); >> + if (nonarg_acc_size > param_size_limit >> + || (!desc->by_ref && nonarg_acc_size == param_size_limit)) >> +{ >> + disqualify_split_candidate (desc, "Would result into a too big set of" >> + "replacements."); >> +} >> + else >> +{ >> + /* create_parameter_descriptors makes sure unit sizes of all >> + candidate parameters fit unsigned integers restricted to >> + ISRA_ARG_SIZE_LIMIT. */ >> + desc->param_size_limit = param_size_limit / BITS_PER_UNIT; >> + desc->nonarg_acc_size = nonarg_acc_size / BITS_PER_UNIT; >> + if (desc->split_candidate && desc->ptr_pt_count) >> +{ >> + gcc_assert (desc->by_ref); >> + check_pass_throughs = true; >> +} >> +} >> +} > > ...disqualify_split_candidate should be a no-op in that case, > because we've already disqualified the parameter for a different reason. > So it looks like the main effect is instead to set up param_size_limit > and nonarg_acc_size, the latter of which I assume is 0 when > desc->locally_unused. This is the only bit where you are wrong, param_size_limit is the type size for aggregates and twice pointer size for pointers (well, actually PARAM_IPA_SRA_PTR_GROWTH_FACTOR times the size of a pointer). Even for locally unused parameters because we might "pull" some of them from callees, when there are some. But it is not really relevant for the problem you are facing. > > The reason for asking is that the final "else" says that we've already > checked that param_size_limit is in range, but that's only true if > desc->split_candidate. In particular: > > if (is_gimple_reg (parm) > && !isra_track_scalar_param_local_uses (fun, node, parm, num, > _call_uses)) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, " is a scalar with only %i call uses\n", >scalar_call_uses); > > desc->locally_unused = true; > desc->call_uses
Re: [PATCH 3/4] New IPA-SRA implementation
Hi, Thanks for doing this. Martin Jambor writes: > +/* Analyze function body scan results stored in param_accesses and > + param_accesses, detect possible transformations and store information of > + those in function summary. NODE, FUN and IFS are all various structures > + describing the currently analyzed function. */ > + > +static void > +process_scan_results (cgraph_node *node, struct function *fun, > + isra_func_summary *ifs, > + vec *param_descriptions) > +{ > + bool check_pass_throughs = false; > + bool dereferences_propagated = false; > + tree parm = DECL_ARGUMENTS (node->decl); > + unsigned param_count = param_descriptions->length(); > + > + for (unsigned desc_index = 0; > + desc_index < param_count; > + desc_index++, parm = DECL_CHAIN (parm)) > +{ > + gensum_param_desc *desc = &(*param_descriptions)[desc_index]; > + if (!desc->locally_unused && !desc->split_candidate) > + continue; I'm jumping in the middle without working through the whole pass, so this is probably a daft question sorry, but: what is this loop required to do when: !desc->split_candidate && desc->locally_unused ? AFAICT... > + > + if (flag_checking) > + isra_verify_access_tree (desc->accesses); > + > + if (!dereferences_propagated > + && desc->by_ref > + && desc->accesses) > + { > + propagate_dereference_distances (fun); > + dereferences_propagated = true; > + } > + > + HOST_WIDE_INT nonarg_acc_size = 0; > + bool only_calls = true; > + bool check_failed = false; > + > + int entry_bb_index = ENTRY_BLOCK_PTR_FOR_FN (fun)->index; > + for (gensum_param_access *acc = desc->accesses; > +acc; > +acc = acc->next_sibling) > + if (check_gensum_access (parm, desc, acc, _acc_size, _calls, > + entry_bb_index)) > + { > + check_failed = true; > + break; > + } > + if (check_failed) > + continue; > + > + if (only_calls) > + desc->locally_unused = true; > + > + HOST_WIDE_INT cur_param_size > + = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (parm))); > + HOST_WIDE_INT param_size_limit; > + if (!desc->by_ref || optimize_function_for_size_p (fun)) > + param_size_limit = cur_param_size; > + else > + param_size_limit = (PARAM_VALUE (PARAM_IPA_SRA_PTR_GROWTH_FACTOR) > +* cur_param_size); > + if (nonarg_acc_size > param_size_limit > + || (!desc->by_ref && nonarg_acc_size == param_size_limit)) > + { > + disqualify_split_candidate (desc, "Would result into a too big set of" > + "replacements."); > + } > + else > + { > + /* create_parameter_descriptors makes sure unit sizes of all > + candidate parameters fit unsigned integers restricted to > + ISRA_ARG_SIZE_LIMIT. */ > + desc->param_size_limit = param_size_limit / BITS_PER_UNIT; > + desc->nonarg_acc_size = nonarg_acc_size / BITS_PER_UNIT; > + if (desc->split_candidate && desc->ptr_pt_count) > + { > + gcc_assert (desc->by_ref); > + check_pass_throughs = true; > + } > + } > +} ...disqualify_split_candidate should be a no-op in that case, because we've already disqualified the parameter for a different reason. So it looks like the main effect is instead to set up param_size_limit and nonarg_acc_size, the latter of which I assume is 0 when desc->locally_unused. The reason for asking is that the final "else" says that we've already checked that param_size_limit is in range, but that's only true if desc->split_candidate. In particular: if (is_gimple_reg (parm) && !isra_track_scalar_param_local_uses (fun, node, parm, num, _call_uses)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " is a scalar with only %i call uses\n", scalar_call_uses); desc->locally_unused = true; desc->call_uses = scalar_call_uses; } happens before: if (type_size == 0 || type_size >= ISRA_ARG_SIZE_LIMIT) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " not a candidate, has zero or huge size\n"); continue; } An uninteresting case in which we truncate the value is: typedef unsigned int big_vec __attribute__((vector_size (0x2))); void foo (big_vec x) {} and going further triggers an assert in the tree_to_uhwi (...): typedef unsigned int omg_vec __attribute__((vector_size (1ULL << 61))); void foo (omg_vec x) {} but as you can guess, SVE is the real reason I'm asking. ;-) FWIW, I tried changing it to: gensum_param_desc *desc = &(*param_descriptions)[desc_index]; if (!desc->split_candidate) continue;
Re: [PATCH 3/4] New IPA-SRA implementation
Hi, On Fri, Sep 13 2019, Jan Hubicka wrote: >> This patch actually adds the analysis bits of IPA-SRA - both the >> function summary generation and the interprocedural analysis and >> decision stage. The transformation itself then happens in the call >> graph cloning infrastructure changes which are in the previous patch. >> Please see the cover letter of the whole patch-set for more >> information. >> >> This is mostly only a rebase on the current trunk of the earlier >> submission, the only functional change is that the pass does not clone >> when all the work (unused parameter removal) has already been done by >> IPA-CP. >> >> Martin >> >> 2019-08-20 Martin Jambor >> >> * Makefile.in (GTFILES): Added ipa-sra.c. >> (OBJS): Added ipa-sra.o. >> * dbgcnt.def: Added ipa_sra_params and ipa_sra_retvalues. >> * doc/invoke.texi (ipa-sra-max-replacements): New. >> * ipa-sra.c: New file. >> * lto-section-in.c (lto_section_name): Added ipa-sra section. >> * lto-streamer.h (lto_section_type): Likewise. >> * params.def (PARAM_IPA_SRA_MAX_REPLACEMENTS): New. >> * passes.def: Add new IPA-SRA. >> * tree-pass.h (make_pass_ipa_sra): Declare. > OK Thanks! >> +#define IPA_SRA_MAX_PARAM_FLOW_LEN 7 > I would move this to the place ISRA_ARG_SIZE_LIMIT_BITS is defined so > they appaer at same place. Perhaps C++ way would be to use constant > member variable? OK ... >> + >> +/* Quick mapping from a decl to its param descriptor. */ >> + >> +static hash_map *decl2desc; >> + >> +/* Countdown of allowe Alias analysis steps during summary building. */ >> + >> +static int aa_walking_limit; >> + >> +/* This is a table in which for each basic block and parameter there is a >> + distance (offset + size) in that parameter which is dereferenced and >> + accessed in that BB. */ >> +static HOST_WIDE_INT *bb_dereferences = NULL; >> +/* How many by-reference parameters there are in the current function. */ >> +static int by_ref_count; >> + >> +/* Bitmap of BBs that can cause the function to "stop" progressing by >> + returning, throwing externally, looping infinitely or calling a function >> + which might abort etc.. */ >> +static bitmap final_bbs; >> + >> +/* Obstack to allocate various small structures required only when >> generating >> + summary for a function. */ >> +static struct obstack gensum_obstack; > > Perhaps place the static vars at the beginig of the namespace. > I think "static" should not be used when declaring vars in a namespace. OK, I moved them to the beginning of the namespace and removed the static modifier from them. >> +/* Scan body function described by NODE and FUN and create access trees for >> + parameters. */ >> + >> +static void >> +scan_function (cgraph_node *node, struct function *fun) >> +{ >> + basic_block bb; >> + >> + FOR_EACH_BB_FN (bb, fun) >> +{ >> + gimple_stmt_iterator gsi; >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ()) >> +{ >> + gimple *stmt = gsi_stmt (gsi); >> + >> + if (stmt_can_throw_external (fun, stmt)) >> +bitmap_set_bit (final_bbs, bb->index); >> + switch (gimple_code (stmt)) >> +{ >> +case GIMPLE_RETURN: >> + { >> +tree t = gimple_return_retval (as_a (stmt)); >> +if (t != NULL_TREE) >> + scan_expr_access (t, stmt, ISRA_CTX_LOAD, bb); >> +bitmap_set_bit (final_bbs, bb->index); >> + } >> + break; >> + >> +case GIMPLE_ASSIGN: >> + if (gimple_assign_single_p (stmt) >> + && !gimple_clobber_p (stmt)) >> +{ >> + tree rhs = gimple_assign_rhs1 (stmt); >> + scan_expr_access (rhs, stmt, ISRA_CTX_LOAD, bb); >> + tree lhs = gimple_assign_lhs (stmt); >> + scan_expr_access (lhs, stmt, ISRA_CTX_STORE, bb); >> +} >> + break; >> + >> +case GIMPLE_CALL: >> + { >> +unsigned argument_count = gimple_call_num_args (stmt); >> +scan_call_info call_info; >> +call_info.cs = node->get_edge (stmt); >> +call_info.argument_count = argument_count; >> + >> +for (unsigned i = 0; i < argument_count; i++) >> + { >> +call_info.arg_idx = i; >> +scan_expr_access (gimple_call_arg (stmt, i), stmt, >> + ISRA_CTX_ARG, bb, _info); >> + } >> + >> +tree lhs = gimple_call_lhs (stmt); >> +if (lhs) >> + scan_expr_access (lhs, stmt, ISRA_CTX_STORE, bb); >> +int flags = gimple_call_flags (stmt); >> +if ((flags & (ECF_CONST | ECF_PURE)) == 0) >> + bitmap_set_bit (final_bbs, bb->index); >> + } >> + break; >> + >> +case GIMPLE_ASM: >> + { >> +gasm *asm_stmt = as_a (stmt); >> +
Re: [PATCH 3/4] New IPA-SRA implementation
> This patch actually adds the analysis bits of IPA-SRA - both the > function summary generation and the interprocedural analysis and > decision stage. The transformation itself then happens in the call > graph cloning infrastructure changes which are in the previous patch. > Please see the cover letter of the whole patch-set for more > information. > > This is mostly only a rebase on the current trunk of the earlier > submission, the only functional change is that the pass does not clone > when all the work (unused parameter removal) has already been done by > IPA-CP. > > Martin > > 2019-08-20 Martin Jambor > > * Makefile.in (GTFILES): Added ipa-sra.c. > (OBJS): Added ipa-sra.o. > * dbgcnt.def: Added ipa_sra_params and ipa_sra_retvalues. > * doc/invoke.texi (ipa-sra-max-replacements): New. > * ipa-sra.c: New file. > * lto-section-in.c (lto_section_name): Added ipa-sra section. > * lto-streamer.h (lto_section_type): Likewise. > * params.def (PARAM_IPA_SRA_MAX_REPLACEMENTS): New. > * passes.def: Add new IPA-SRA. > * tree-pass.h (make_pass_ipa_sra): Declare. OK > +#define IPA_SRA_MAX_PARAM_FLOW_LEN 7 I would move this to the place ISRA_ARG_SIZE_LIMIT_BITS is defined so they appaer at same place. Perhaps C++ way would be to use constant member variable? > + > +/* Structure to describe which formal parameters feed into a particular > actual > + arguments. */ > + > +struct isra_param_flow > +{ > + /* Number of elements in array inputs that contain valid data. */ > + char length; > + /* Indices of formal parameters that feed into the described actual > argument. > + If aggregate_pass_through or pointer_pass_through below are true, it > must > + contain exactly one element which is passed through from a formal > + parameter if the given number. Otherwise, the array contains indices of > + collee's formal parameters which are used to calculate value of this > + actual argument. */ > + unsigned char inputs[IPA_SRA_MAX_PARAM_FLOW_LEN]; > + > + /* Offset within the formal parameter. */ > + unsigned unit_offset; > + /* Size of the portion of the formal parameter that is being passed. */ > + unsigned unit_size : ISRA_ARG_SIZE_LIMIT_BITS; > + > + /* True when the value of this actual copy is a portion of a formal > + parameter. */ > + unsigned aggregate_pass_through : 1; > + /* True when the value of this actual copy is a verbatim pass through of an > + obtained pointer. */ > + unsigned pointer_pass_through : 1; > + /* True when it is safe to copy access candidates here from the callee, > which > + would mean introducing dereferences into callers of the caller. */ > + unsigned safe_to_import_accesses : 1; > +}; > + > +/* Strucutre used to convey information about calls from the intra-procedurwl > + analysis stage to inter-procedural one. */ > + > +class isra_call_summary > +{ > +public: > + isra_call_summary () > +: m_arg_flow (), m_return_ignored (false), m_return_returned (false), > + m_bit_aligned_arg (false) > + {} > + > + void init_inputs (unsigned arg_count); > + void dump (FILE *f); > + > + /* Information about what formal parameters of the caller are used to > compute > + indivisual actual arguments of this call. */ > + auto_vec m_arg_flow; > + > + /* Set to true if the call statement does not have a LHS. */ > + unsigned m_return_ignored : 1; > + > + /* Set to true if the LHS of call statement is only used to construct the > + return value of the caller. */ > + unsigned m_return_returned : 1; > + > + /* Set when any of the call arguments are not byte-aligned. */ > + unsigned m_bit_aligned_arg : 1; > +}; > + > +/* Class to manage function summaries. */ > + > +class GTY((user)) ipa_sra_function_summaries > + : public function_summary > +{ > +public: > + ipa_sra_function_summaries (symbol_table *table, bool ggc): > +function_summary (table, ggc) { } > + > + virtual void duplicate (cgraph_node *, cgraph_node *, > + isra_func_summary *old_sum, > + isra_func_summary *new_sum); > +}; > + > +/* Hook that is called by summary when a node is duplicated. */ > + > +void > +ipa_sra_function_summaries::duplicate (cgraph_node *, cgraph_node *, > +isra_func_summary *old_sum, > +isra_func_summary *new_sum) > +{ > + /* TODO: Somehow stop copying when ISRA is doing the cloning, it is > + useless. */ > + new_sum->m_candidate = old_sum->m_candidate; > + new_sum->m_returns_value = old_sum->m_returns_value; > + new_sum->m_return_ignored = old_sum->m_return_ignored; > + gcc_assert (!old_sum->m_queued); > + new_sum->m_queued = false; > + > + unsigned param_count = vec_safe_length (old_sum->m_parameters); > + if (!param_count) > +return; > + vec_safe_reserve_exact (new_sum->m_parameters, param_count); > +