Re: [PATCH 3/4] New IPA-SRA implementation

2019-09-24 Thread Martin Jambor
Hi,

sorry for replying so late, I still haven't recovered from two weeks of
traveling and conferences.

On Sat, Sep 21 2019, Richard Sandiford wrote:
> Hi,
>
> Thanks for doing this.
>
> Martin Jambor  writes:
>> +/* Analyze function body scan results stored in param_accesses and
>> +   param_accesses, detect possible transformations and store information of
>> +   those in function summary.  NODE, FUN and IFS are all various structures
>> +   describing the currently analyzed function.  */
>> +
>> +static void
>> +process_scan_results (cgraph_node *node, struct function *fun,
>> +  isra_func_summary *ifs,
>> +  vec<gensum_param_desc> *param_descriptions)
>> +{
>> +  bool check_pass_throughs = false;
>> +  bool dereferences_propagated = false;
>> +  tree parm = DECL_ARGUMENTS (node->decl);
>> +  unsigned param_count = param_descriptions->length();
>> +
>> +  for (unsigned desc_index = 0;
>> +   desc_index < param_count;
>> +   desc_index++, parm = DECL_CHAIN (parm))
>> +{
>> +  gensum_param_desc *desc = &(*param_descriptions)[desc_index];
>> +  if (!desc->locally_unused && !desc->split_candidate)
>> +continue;
>
> I'm jumping in the middle without working through the whole pass,
> so this is probably a daft question sorry, but: what is this loop
> required to do when:
>
>   !desc->split_candidate && desc->locally_unused

You have figured out correctly that this is a thinko.  I meant not to
"continue" past non-register types, which might not be used locally but
whose locally_unused flag is only set a few lines below...
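
Just to illustrate the intent (a hypothetical sketch only, not the fix I
am going to commit), the guard should have been something along the
lines of:

  /* Scalars have already had their local uses analyzed by
     isra_track_scalar_param_local_uses, so they can be skipped when they
     are not split candidates; aggregates must not be skipped because the
     only_calls check below may still mark them locally_unused.  */
  if (!desc->split_candidate
      && (desc->locally_unused || is_gimple_reg (parm)))
    continue;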

>
> ?  AFAICT...
>
>> +
>> +  if (flag_checking)
>> +isra_verify_access_tree (desc->accesses);
>> +
>> +  if (!dereferences_propagated
>> +  && desc->by_ref
>> +  && desc->accesses)
>> +{
>> +  propagate_dereference_distances (fun);
>> +  dereferences_propagated = true;
>> +}
>> +
>> +  HOST_WIDE_INT nonarg_acc_size = 0;
>> +  bool only_calls = true;
>> +  bool check_failed = false;
>> +
>> +  int entry_bb_index = ENTRY_BLOCK_PTR_FOR_FN (fun)->index;
>> +  for (gensum_param_access *acc = desc->accesses;
>> +   acc;
>> +   acc = acc->next_sibling)
>> +if (check_gensum_access (parm, desc, acc, &nonarg_acc_size, &only_calls,
>> + entry_bb_index))
>> +  {
>> +check_failed = true;
>> +break;
>> +  }
>> +  if (check_failed)
>> +continue;
>> +
>> +  if (only_calls)
>> +desc->locally_unused = true;

...specifically here.

>> +
>> +  HOST_WIDE_INT cur_param_size
>> += tree_to_uhwi (TYPE_SIZE (TREE_TYPE (parm)));
>> +  HOST_WIDE_INT param_size_limit;
>> +  if (!desc->by_ref || optimize_function_for_size_p (fun))
>> +param_size_limit = cur_param_size;
>> +  else
>> +param_size_limit = (PARAM_VALUE (PARAM_IPA_SRA_PTR_GROWTH_FACTOR)
>> +   * cur_param_size);
>> +  if (nonarg_acc_size > param_size_limit
>> +  || (!desc->by_ref && nonarg_acc_size == param_size_limit))
>> +{
>> +  disqualify_split_candidate (desc, "Would result into a too big set of"
>> +  "replacements.");
>> +}
>> +  else
>> +{
>> +  /* create_parameter_descriptors makes sure unit sizes of all
>> + candidate parameters fit unsigned integers restricted to
>> + ISRA_ARG_SIZE_LIMIT.  */
>> +  desc->param_size_limit = param_size_limit / BITS_PER_UNIT;
>> +  desc->nonarg_acc_size = nonarg_acc_size / BITS_PER_UNIT;
>> +  if (desc->split_candidate && desc->ptr_pt_count)
>> +{
>> +  gcc_assert (desc->by_ref);
>> +  check_pass_throughs = true;
>> +}
>> +}
>> +}
>
> ...disqualify_split_candidate should be a no-op in that case,
> because we've already disqualified the parameter for a different reason.
> So it looks like the main effect is instead to set up param_size_limit
> and nonarg_acc_size, the latter of which I assume is 0 when
> desc->locally_unused.

This is the only bit where you are wrong: param_size_limit is the type
size for aggregates and twice the pointer size for pointers (well,
actually PARAM_IPA_SRA_PTR_GROWTH_FACTOR times the size of a pointer).
It is set even for locally unused parameters, because we might "pull"
some accesses from callees into them, when there are any.  But it is not
really relevant to the problem you are facing.
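
To put numbers on it (illustrative only, assuming a 64-bit target and
the default --param ipa-sra-ptr-growth-factor=2):

  struct big { long a, b; };    /* TYPE_SIZE is 128 bits.  */

  void by_val (struct big s);   /* Not by_ref:
                                   param_size_limit = 128 bits.  */
  void by_ref (struct big *p);  /* by_ref: cur_param_size is the size of
                                   the 64-bit pointer, so
                                   param_size_limit = 2 * 64 = 128 bits.  */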

>
> The reason for asking is that the final "else" says that we've already
> checked that param_size_limit is in range, but that's only true if
> desc->split_candidate.  In particular:
>
>   if (is_gimple_reg (parm)
> && !isra_track_scalar_param_local_uses (fun, node, parm, num,
> &scalar_call_uses))
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file, " is a scalar with only %i call uses\n",
>scalar_call_uses);
>
> desc->locally_unused = true;
> desc->call_uses 

Re: [PATCH 3/4] New IPA-SRA implementation

2019-09-21 Thread Richard Sandiford
Hi,

Thanks for doing this.

Martin Jambor  writes:
> +/* Analyze function body scan results stored in param_accesses and
> +   param_accesses, detect possible transformations and store information of
> +   those in function summary.  NODE, FUN and IFS are all various structures
> +   describing the currently analyzed function.  */
> +
> +static void
> +process_scan_results (cgraph_node *node, struct function *fun,
> +   isra_func_summary *ifs,
> +   vec<gensum_param_desc> *param_descriptions)
> +{
> +  bool check_pass_throughs = false;
> +  bool dereferences_propagated = false;
> +  tree parm = DECL_ARGUMENTS (node->decl);
> +  unsigned param_count = param_descriptions->length();
> +
> +  for (unsigned desc_index = 0;
> +   desc_index < param_count;
> +   desc_index++, parm = DECL_CHAIN (parm))
> +{
> +  gensum_param_desc *desc = &(*param_descriptions)[desc_index];
> +  if (!desc->locally_unused && !desc->split_candidate)
> + continue;

I'm jumping in the middle without working through the whole pass,
so this is probably a daft question sorry, but: what is this loop
required to do when:

  !desc->split_candidate && desc->locally_unused

?  AFAICT...

> +
> +  if (flag_checking)
> + isra_verify_access_tree (desc->accesses);
> +
> +  if (!dereferences_propagated
> +   && desc->by_ref
> +   && desc->accesses)
> + {
> +   propagate_dereference_distances (fun);
> +   dereferences_propagated = true;
> + }
> +
> +  HOST_WIDE_INT nonarg_acc_size = 0;
> +  bool only_calls = true;
> +  bool check_failed = false;
> +
> +  int entry_bb_index = ENTRY_BLOCK_PTR_FOR_FN (fun)->index;
> +  for (gensum_param_access *acc = desc->accesses;
> +acc;
> +acc = acc->next_sibling)
> + if (check_gensum_access (parm, desc, acc, &nonarg_acc_size, &only_calls,
> +  entry_bb_index))
> +   {
> + check_failed = true;
> + break;
> +   }
> +  if (check_failed)
> + continue;
> +
> +  if (only_calls)
> + desc->locally_unused = true;
> +
> +  HOST_WIDE_INT cur_param_size
> + = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (parm)));
> +  HOST_WIDE_INT param_size_limit;
> +  if (!desc->by_ref || optimize_function_for_size_p (fun))
> + param_size_limit = cur_param_size;
> +  else
> + param_size_limit = (PARAM_VALUE (PARAM_IPA_SRA_PTR_GROWTH_FACTOR)
> +* cur_param_size);
> +  if (nonarg_acc_size > param_size_limit
> +   || (!desc->by_ref && nonarg_acc_size == param_size_limit))
> + {
> +   disqualify_split_candidate (desc, "Would result into a too big set of"
> +   "replacements.");
> + }
> +  else
> + {
> +   /* create_parameter_descriptors makes sure unit sizes of all
> +  candidate parameters fit unsigned integers restricted to
> +  ISRA_ARG_SIZE_LIMIT.  */
> +   desc->param_size_limit = param_size_limit / BITS_PER_UNIT;
> +   desc->nonarg_acc_size = nonarg_acc_size / BITS_PER_UNIT;
> +   if (desc->split_candidate && desc->ptr_pt_count)
> + {
> +   gcc_assert (desc->by_ref);
> +   check_pass_throughs = true;
> + }
> + }
> +}

...disqualify_split_candidate should be a no-op in that case,
because we've already disqualified the parameter for a different reason.
So it looks like the main effect is instead to set up param_size_limit
and nonarg_acc_size, the latter of which I assume is 0 when
desc->locally_unused.

The reason for asking is that the final "else" says that we've already
checked that param_size_limit is in range, but that's only true if
desc->split_candidate.  In particular:

  if (is_gimple_reg (parm)
  && !isra_track_scalar_param_local_uses (fun, node, parm, num,
  &scalar_call_uses))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " is a scalar with only %i call uses\n",
 scalar_call_uses);

  desc->locally_unused = true;
  desc->call_uses = scalar_call_uses;
}

happens before:

  if (type_size == 0
  || type_size >= ISRA_ARG_SIZE_LIMIT)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " not a candidate, has zero or huge size\n");
  continue;
}

An uninteresting case in which we truncate the value is:

typedef unsigned int big_vec __attribute__((vector_size (0x2)));
void foo (big_vec x) {}

and going further triggers an assert in the tree_to_uhwi (...):

typedef unsigned int omg_vec __attribute__((vector_size (1ULL << 61)));
void foo (omg_vec x) {}

but as you can guess, SVE is the real reason I'm asking. ;-)

FWIW, I tried changing it to:

  gensum_param_desc *desc = &(*param_descriptions)[desc_index];
  if (!desc->split_candidate)
continue;


Re: [PATCH 3/4] New IPA-SRA implementation

2019-09-19 Thread Martin Jambor
Hi,

On Fri, Sep 13 2019, Jan Hubicka wrote:
>> This patch actually adds the analysis bits of IPA-SRA - both the
>> function summary generation and the interprocedural analysis and
>> decision stage.  The transformation itself then happens in the call
>> graph cloning infrastructure changes which are in the previous patch.
>> Please see the cover letter of the whole patch-set for more
>> information.
>> 
>> This is mostly only a rebase on the current trunk of the earlier
>> submission, the only functional change is that the pass does not clone
>> when all the work (unused parameter removal) has already been done by
>> IPA-CP.
>> 
>> Martin
>> 
>> 2019-08-20  Martin Jambor  
>> 
>> * Makefile.in (GTFILES): Added ipa-sra.c.
>> (OBJS): Added ipa-sra.o.
>> * dbgcnt.def: Added ipa_sra_params and ipa_sra_retvalues.
>> * doc/invoke.texi (ipa-sra-max-replacements): New.
>> * ipa-sra.c: New file.
>> * lto-section-in.c (lto_section_name): Added ipa-sra section.
>> * lto-streamer.h (lto_section_type): Likewise.
>> * params.def (PARAM_IPA_SRA_MAX_REPLACEMENTS): New.
>> * passes.def: Add new IPA-SRA.
>> * tree-pass.h (make_pass_ipa_sra): Declare.
> OK

Thanks!

>> +#define IPA_SRA_MAX_PARAM_FLOW_LEN 7
> I would move this to the place ISRA_ARG_SIZE_LIMIT_BITS is defined so
> they appear at the same place.  Perhaps the C++ way would be to use a
> constant member variable?

OK
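
(If I go the C++ way, it would look roughly like the following sketch;
illustrative only, the posted patch keeps the macro:)

  struct isra_param_flow
  {
    /* Hypothetical replacement for the IPA_SRA_MAX_PARAM_FLOW_LEN
       macro.  */
    static const unsigned char max_param_flow_len = 7;

    /* Number of elements in array inputs that contain valid data.  */
    char length;
    unsigned char inputs[max_param_flow_len];
    /* ... the rest unchanged ...  */
  };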

...

>> +
>> +/* Quick mapping from a decl to its param descriptor.  */
>> +
>> +static hash_map<tree, gensum_param_desc *> *decl2desc;
>> +
>> +/* Countdown of allowed alias analysis steps during summary building.  */
>> +
>> +static int aa_walking_limit;
>> +
>> +/* This is a table in which for each basic block and parameter there is a
>> +   distance (offset + size) in that parameter which is dereferenced and
>> +   accessed in that BB.  */
>> +static HOST_WIDE_INT *bb_dereferences = NULL;
>> +/* How many by-reference parameters there are in the current function.  */
>> +static int by_ref_count;
>> +
>> +/* Bitmap of BBs that can cause the function to "stop" progressing by
>> +   returning, throwing externally, looping infinitely or calling a function
>> +   which might abort etc.. */
>> +static bitmap final_bbs;
>> +
>> +/* Obstack to allocate various small structures required only when
>> +   generating summary for a function.  */
>> +static struct obstack gensum_obstack;
>
> Perhaps place the static vars at the beginning of the namespace.
> I think "static" should not be used when declaring vars in a namespace.


OK, I moved them to the beginning of the namespace and removed the
static modifier from them.
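
So the top of the namespace now looks roughly like this (a sketch of the
shape of the change, assuming the file's anonymous namespace, not the
exact hunk):

  namespace {

  /* Quick mapping from a decl to its param descriptor.  */
  hash_map<tree, gensum_param_desc *> *decl2desc;

  /* Countdown of allowed alias analysis steps during summary building.  */
  int aa_walking_limit;

  /* ... the remaining formerly static file-scope variables follow ...  */

  } // anon namespace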



>> +/* Scan body function described by NODE and FUN and create access trees for
>> +   parameters.  */
>> +
>> +static void
>> +scan_function (cgraph_node *node, struct function *fun)
>> +{
>> +  basic_block bb;
>> +
>> +  FOR_EACH_BB_FN (bb, fun)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple *stmt = gsi_stmt (gsi);
>> +
>> +  if (stmt_can_throw_external (fun, stmt))
>> +bitmap_set_bit (final_bbs, bb->index);
>> +  switch (gimple_code (stmt))
>> +{
>> +case GIMPLE_RETURN:
>> +  {
>> +tree t = gimple_return_retval (as_a <greturn *> (stmt));
>> +if (t != NULL_TREE)
>> +  scan_expr_access (t, stmt, ISRA_CTX_LOAD, bb);
>> +bitmap_set_bit (final_bbs, bb->index);
>> +  }
>> +  break;
>> +
>> +case GIMPLE_ASSIGN:
>> +  if (gimple_assign_single_p (stmt)
>> +  && !gimple_clobber_p (stmt))
>> +{
>> +  tree rhs = gimple_assign_rhs1 (stmt);
>> +  scan_expr_access (rhs, stmt, ISRA_CTX_LOAD, bb);
>> +  tree lhs = gimple_assign_lhs (stmt);
>> +  scan_expr_access (lhs, stmt, ISRA_CTX_STORE, bb);
>> +}
>> +  break;
>> +
>> +case GIMPLE_CALL:
>> +  {
>> +unsigned argument_count = gimple_call_num_args (stmt);
>> +scan_call_info call_info;
>> +call_info.cs = node->get_edge (stmt);
>> +call_info.argument_count = argument_count;
>> +
>> +for (unsigned i = 0; i < argument_count; i++)
>> +  {
>> +call_info.arg_idx = i;
>> +scan_expr_access (gimple_call_arg (stmt, i), stmt,
>> +  ISRA_CTX_ARG, bb, &call_info);
>> +  }
>> +
>> +tree lhs = gimple_call_lhs (stmt);
>> +if (lhs)
>> +  scan_expr_access (lhs, stmt, ISRA_CTX_STORE, bb);
>> +int flags = gimple_call_flags (stmt);
>> +if ((flags & (ECF_CONST | ECF_PURE)) == 0)
>> +  bitmap_set_bit (final_bbs, bb->index);
>> +  }
>> +  break;
>> +
>> +case GIMPLE_ASM:
>> +  {
>> +gasm *asm_stmt = as_a <gasm *> (stmt);
>> +

Re: [PATCH 3/4] New IPA-SRA implementation

2019-09-13 Thread Jan Hubicka
> This patch actually adds the analysis bits of IPA-SRA - both the
> function summary generation and the interprocedural analysis and
> decision stage.  The transformation itself then happens in the call
> graph cloning infrastructure changes which are in the previous patch.
> Please see the cover letter of the whole patch-set for more
> information.
> 
> This is mostly only a rebase on the current trunk of the earlier
> submission, the only functional change is that the pass does not clone
> when all the work (unused parameter removal) has already been done by
> IPA-CP.
> 
> Martin
> 
> 2019-08-20  Martin Jambor  
> 
> * Makefile.in (GTFILES): Added ipa-sra.c.
> (OBJS): Added ipa-sra.o.
> * dbgcnt.def: Added ipa_sra_params and ipa_sra_retvalues.
> * doc/invoke.texi (ipa-sra-max-replacements): New.
> * ipa-sra.c: New file.
> * lto-section-in.c (lto_section_name): Added ipa-sra section.
> * lto-streamer.h (lto_section_type): Likewise.
> * params.def (PARAM_IPA_SRA_MAX_REPLACEMENTS): New.
> * passes.def: Add new IPA-SRA.
> * tree-pass.h (make_pass_ipa_sra): Declare.
OK
> +#define IPA_SRA_MAX_PARAM_FLOW_LEN 7
I would move this to the place ISRA_ARG_SIZE_LIMIT_BITS is defined so
they appear at the same place.  Perhaps the C++ way would be to use a
constant member variable?
> +
> +/* Structure to describe which formal parameters feed into a particular
> +   actual argument.  */
> +
> +struct isra_param_flow
> +{
> +  /* Number of elements in array inputs that contain valid data.  */
> +  char length;
> +  /* Indices of formal parameters that feed into the described actual
> + argument.  If aggregate_pass_through or pointer_pass_through below are
> + true, it must contain exactly one element which is passed through from
> + a formal parameter of the given number.  Otherwise, the array contains
> + indices of caller's formal parameters which are used to calculate the
> + value of this actual argument. */
> +  unsigned char inputs[IPA_SRA_MAX_PARAM_FLOW_LEN];
> +
> +  /* Offset within the formal parameter.  */
> +  unsigned unit_offset;
> +  /* Size of the portion of the formal parameter that is being passed.  */
> +  unsigned unit_size : ISRA_ARG_SIZE_LIMIT_BITS;

> +
> +  /* True when the value of this actual copy is a portion of a formal
> + parameter.  */
> +  unsigned aggregate_pass_through : 1;
> +  /* True when the value of this actual copy is a verbatim pass through of an
> + obtained pointer.  */
> +  unsigned pointer_pass_through : 1;
> +  /* True when it is safe to copy access candidates here from the callee,
> + which would mean introducing dereferences into callers of the caller.  */
> +  unsigned safe_to_import_accesses : 1;
> +};
> +
> +/* Structure used to convey information about calls from the intra-procedural
> +   analysis stage to the inter-procedural one.  */
> +
> +class isra_call_summary
> +{
> +public:
> +  isra_call_summary ()
> +: m_arg_flow (), m_return_ignored (false), m_return_returned (false),
> +  m_bit_aligned_arg (false)
> +  {}
> +
> +  void init_inputs (unsigned arg_count);
> +  void dump (FILE *f);
> +
> +  /* Information about what formal parameters of the caller are used to
> + compute individual actual arguments of this call.  */
> +  auto_vec <isra_param_flow> m_arg_flow;
> +
> +  /* Set to true if the call statement does not have a LHS.  */
> +  unsigned m_return_ignored : 1;
> +
> +  /* Set to true if the LHS of call statement is only used to construct the
> + return value of the caller.  */
> +  unsigned m_return_returned : 1;
> +
> +  /* Set when any of the call arguments are not byte-aligned.  */
> +  unsigned m_bit_aligned_arg : 1;
> +};
> +
> +/* Class to manage function summaries.  */
> +
> +class GTY((user)) ipa_sra_function_summaries
> +  : public function_summary <isra_func_summary *>
> +{
> +public:
> +  ipa_sra_function_summaries (symbol_table *table, bool ggc):
> +function_summary <isra_func_summary *> (table, ggc) { }
> +
> +  virtual void duplicate (cgraph_node *, cgraph_node *,
> +   isra_func_summary *old_sum,
> +   isra_func_summary *new_sum);
> +};
> +
> +/* Hook that is called by summary when a node is duplicated.  */
> +
> +void
> +ipa_sra_function_summaries::duplicate (cgraph_node *, cgraph_node *,
> +isra_func_summary *old_sum,
> +isra_func_summary *new_sum)
> +{
> +  /* TODO: Somehow stop copying when ISRA is doing the cloning, it is
> + useless.  */
> +  new_sum->m_candidate  = old_sum->m_candidate;
> +  new_sum->m_returns_value = old_sum->m_returns_value;
> +  new_sum->m_return_ignored = old_sum->m_return_ignored;
> +  gcc_assert (!old_sum->m_queued);
> +  new_sum->m_queued = false;
> +
> +  unsigned param_count = vec_safe_length (old_sum->m_parameters);
> +  if (!param_count)
> +return;
> +  vec_safe_reserve_exact (new_sum->m_parameters, param_count);
> +