Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-06 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

General remark upfront:  Please add function-level comments for all
functions.  This makes reading and reviewing much easier.  I had to sweep
back and forth quite a bit.

> +
> +static int
> +get_last_live_range (const vec<var_live_range> &live_ranges, tree var)
> +{
> +  unsigned int ix;
> +  var_live_range *live_range;
> +  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
> +    if (live_range->var == var)
> +      return ix;
> +  return -1;
> +}

From reading the usage site of this function it looks like we could benefit
from having the live ranges be a hash_map as well?  That way we wouldn't
need to scan through the list every time.  Something like
a hash_map keyed on the variable.  It looks like we only consider the range
end anyway.

> +   int index = get_last_live_range (live_ranges, var);

That way we could avoid some worst-case behavior here for pathological
inputs.

> +   if (index == -1)
> + {
> +   var_live_range range = {var, 0, point};
> +   live_ranges.safe_push (range);
> + }

Please add a comment that we assume the variable is live from the start
of this block. 

> +   else
> + live_ranges[index].end = point;

And here a comment that we will grow the live range for each use.
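
A minimal sketch of the suggestions above taken together (keyed lookup instead of a reverse scan, plus the two requested comments). It is standalone C++ rather than GCC code: `std::unordered_map` stands in for `hash_map`, a `std::string` stands in for the `tree` variable, and `record_use` is a hypothetical helper name that does not appear in the patch.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for var_live_range; the map key (a std::string here) plays
// the role of the `tree` variable in the patch.
struct live_range
{
  unsigned start;
  unsigned end;
};

using live_range_map = std::unordered_map<std::string, live_range>;

// Hypothetical helper: record a use of VAR at program point POINT.
static void
record_use (live_range_map &ranges, const std::string &var, unsigned point)
{
  auto it = ranges.find (var);
  if (it == ranges.end ())
    // First sighting: assume VAR is live from the start of this block.
    ranges.emplace (var, live_range{0, point});
  else
    // Every further use grows the live range up to that use.
    it->second.end = point;
}
```

The lookup is constant time on average, so a block with many uses of many variables no longer pays for a linear scan on every use.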

> +static bool
> +live_range_conflict_p (const var_live_range &live_range1,
> +                       const var_live_range &live_range2)
> +{
> +  if (live_range1.start >= live_range2.end)
> +    return false;
> +  if (live_range1.end <= live_range2.start)
> +    return false;
> +  if (live_range2.start >= live_range1.end)
> +    return false;
> +  if (live_range2.end <= live_range1.start)
> +    return false;
> +  return true;
> +}

Rename to live_range_overlap_p and simplify to
 return a.start < b.end && b.start < a.end;
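
For reference, the third and fourth checks in the quoted function restate the first two with the operands swapped, so two comparisons are enough. A small standalone sketch, assuming a stand-in `live_range` struct with only the fields the predicate reads, and keeping the quoted semantics that ranges which merely touch do not conflict:

```cpp
// Stand-in for var_live_range with only the fields the predicate needs.
struct live_range
{
  int start;
  int end;
};

// Two ranges overlap unless one ends at or before the point where the
// other starts.  Touching endpoints do not count as an overlap, which
// matches the comparisons in the quoted function.
static bool
live_range_overlap_p (const live_range &a, const live_range &b)
{
  return a.start < b.end && b.start < a.end;
}
```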

> +
> +static unsigned int
> +max_number_of_live_regs (const basic_block bb,
> +                         const vec<var_live_range> &live_ranges,
> +                         machine_mode biggest_mode, int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i, j, k;
> +  unsigned int live_point = 0;
> +  for (i = 0; i < live_ranges.length (); i++)
> +    {
> +      auto_vec<var_live_range> conflict_live_ranges;
> +      var_live_range live_range = live_ranges[i];
> +      conflict_live_ranges.safe_push (live_range);
> +      unsigned int min_point = live_range.start;
> +      unsigned int max_point = live_range.end;
> +      for (j = 0; j < live_ranges.length (); j++)
> +        {
> +          if (j == i)
> +            continue;
> +          if (live_range_conflict_p (live_range, live_ranges[j]))
> +            {
> +              conflict_live_ranges.safe_push (live_ranges[j]);
> +              min_point
> +                = std::min (min_point, (unsigned int) live_ranges[j].start);
> +              max_point
> +                = std::max (max_point, (unsigned int) live_ranges[j].end);
> +            }
> +        }
> +      for (j = min_point; j <= max_point; j++)
> +        {
> +          unsigned int nregs = 0;
> +          for (k = 0; k < conflict_live_ranges.length (); k++)
> +            {
> +              if (j >= (unsigned int) conflict_live_ranges[k].start
> +                  && j <= (unsigned int) conflict_live_ranges[k].end)
> +                {
> +                  machine_mode mode
> +                    = TYPE_MODE (TREE_TYPE (conflict_live_ranges[k].var));
> +                  nregs += compute_nregs_for_mode (mode, biggest_mode, lmul);
> +                }
> +            }
> +          if (nregs > max_nregs)
> +            {
> +              max_nregs = nregs;
> +              live_point = j;
> +            }
> +        }
> +    }

This looks pretty quadratic in the number of live ranges (or even cubic?).
Can't it be done more efficiently using a sliding-window approach by sorting
the live ranges according to their start point before?
Also std::min/max -> MIN/MAX.
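
One possible reading of the sorting idea, sketched as an event sweep rather than a literal sliding window. It is standalone C++ under stated assumptions: each range already carries its register count (standing in for the result of compute_nregs_for_mode), end points are inclusive as in the quoted inner loop, and the names `live_range` and `max_live_regs` are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Stand-in for var_live_range.  NREGS stands for the value that
// compute_nregs_for_mode would return for the variable's mode.
struct live_range
{
  unsigned start;
  unsigned end; // inclusive, as in the quoted loop
  unsigned nregs;
};

// Return {maximum live registers, first program point where it occurs}.
static std::pair<unsigned, unsigned>
max_live_regs (const std::vector<live_range> &ranges)
{
  // One event per range boundary: +nregs at start, -nregs just past end.
  std::vector<std::pair<unsigned, long>> events;
  for (const live_range &r : ranges)
    {
      events.push_back ({r.start, (long) r.nregs});
      events.push_back ({r.end + 1, -(long) r.nregs});
    }
  // Pairs sort by point first, then by delta, so releases at a point are
  // applied before acquisitions at the same point.
  std::sort (events.begin (), events.end ());

  long live = 0;
  unsigned max_nregs = 0, live_point = 0;
  for (const auto &e : events)
    {
      live += e.second;
      if (live > (long) max_nregs)
        {
          max_nregs = (unsigned) live;
          live_point = e.first;
        }
    }
  return {max_nregs, live_point};
}
```

Sorting dominates the cost, so this is O(n log n) in the number of live ranges instead of re-scanning all ranges for every range and every program point.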

> +
> +  /* Collect user explicit RVV type.  */
> +  hash_set<basic_block> all_preds = get_all_predecessors (bb);
> +  hash_set<basic_block> all_succs = get_all_successors (bb);

As mentioned before, maybe dominator info could help here?

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +{
> +  tree t = ssa_name (i);
> +  if (!t)
> + continue;
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (t));
> +  if (!lookup_vector_type_attribute (TREE_TYPE (t))
> +   && !riscv_v_ext_vls_mode_p (mode))
> + continue;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (t);
> +  if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
> + continue;
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> + {
> +   if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
> + {
> +   if (all_succs.contains (gimple_bb (USE_STMT (ptr))))
> + {

Reverse the conditions and continue, i.e. if (!USE_STMT || is_gimple_debug
 || !all_succs.contains).
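
A standalone illustration of the early-continue shape, not GCC code. The `use` struct and `count_relevant_uses` are hypothetical stand-ins; each boolean field mirrors one of the three conditions in the quoted loop.

```cpp
#include <vector>

// Stand-in for one entry of the immediate-use list.
struct use
{
  bool has_stmt;     // USE_STMT (ptr) is non-null
  bool is_debug;     // is_gimple_debug (USE_STMT (ptr))
  bool in_successor; // all_succs.contains (gimple_bb (USE_STMT (ptr)))
};

static unsigned
count_relevant_uses (const std::vector<use> &uses)
{
  unsigned n = 0;
  for (const use &u : uses)
    {
      // Reject uninteresting uses first so the real work below is not
      // nested inside several levels of if.
      if (!u.has_stmt || u.is_debug || !u.in_successor)
        continue;
      n++;
    }
  return n;
}
```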

> +
> +static int
> 

Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-05 Thread Jeff Law via Gcc-patches




On 9/5/23 15:39, 钟居哲 wrote:

>> - Why don't we use the normal reverse postorder (or postorder) approach of
>>     computing live ranges?  Is that because we don't really need full global
>>     live ranges?
>
> Yes. We don't need global live ranges.
>
>> - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
>>     know the answer but an explanation (in a comment) would still be useful.
>
> The existing code can't help; I have tried many times.

I would expect it to be fairly hard to use for this purpose.  I've tried
to use it in other contexts as well without success.


Jeff


Re: Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-05 Thread 钟居哲
- Why don't we use the normal reverse postorder (or postorder) approach of
   computing live ranges?  Is that because we don't really need full global
   live ranges?

Yes. We don't need global live ranges.

- Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
   know the answer but an explanation (in a comment) would still be useful.

The existing code can't help; I have tried many times.

- Do we really need get_all_predecessors/get_all_successors?  As they're
   only used for "defined before" and "used after", at first glance it
   looks like some kind of dominance info could help there but I didn't
   really check in detail.

Yes. At first I wanted to use dominance analysis, but I am not sure whether
we can use df_analyze () in the cost model framework. It is worth trying.

- Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related
   passes?  I don't mind maps but just wonder if it's on purpose, for
   convenience or something else.

I don't know how to use bitmaps in place of the current map-based approach.

Besides, it might help to rename program_points_map (into program_points_per_bb
or so).  At first it looked quadratic to me but we're just iterating over
the program points of a BB.

Ok.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-06 05:02
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model


Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-05 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I think the general approach makes sense and it doesn't need to be perfect
from the beginning as we can always iterate on it.  Before continuing with a
more detailed review (hopefully tomorrow) some high-level questions upfront.
It would help to document some of these choices so it's easier to understand
the rationale.

 - Why don't we use the normal reverse postorder (or postorder) approach of
   computing live ranges?  Is that because we don't really need full global
   live ranges?

 - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
   know the answer but an explanation (in a comment) would still be useful.

 - Do we really need get_all_predecessors/get_all_successors?  As they're
   only used for "defined before" and "used after", at first glance it
   looks like some kind of dominance info could help there but I didn't
   really check in detail.

 - Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related
   passes?  I don't mind maps but just wonder if it's on purpose, for
   convenience or something else. 

Besides, it might help to rename program_points_map (into program_points_per_bb
or so).  At first it looked quadratic to me but we're just iterating over
the program points of a BB.

Regards
 Robin