Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-07 Thread Lehua Ding

Hi Vladimir,

I'll send V3 patches based on these comments. Note that these four 
patches only add subreg liveness tracking and apply it to the IRA and LRA 
passes, so no performance changes are expected until we support 
subreg coalescing. New patches will follow later to complete the subreg 
coalescing functionality. Supporting subreg coalescing requires support for 
subreg copies, i.e. modifying the conflict detection logic.


On 2024/5/2 00:24, Vladimir Makarov wrote:


On 2/3/24 05:50, Lehua Ding wrote:

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were made
to LRA than to IRA, since LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.

Typo "fileds".

(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.

The same typo.

(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.


The patch is ok for me with some minor requests:

You missed the log entry for collect_non_operand_hard_regs.  The log entry for 
lra_create_live_ranges_1 is incomplete (at least, it should be "Ditto. ...").


Also, you changed the signatures of the functions update_live_info, 
fix_bb_live_info, mark_regno_live, mark_regno_dead, and new_insn_reg but did 
not update the function comments.  Outdated comments are even worse 
than missing comments.  Please fix them.


Also some variable naming could be improved but it is up to you.

So now you just need approval for the rest of the patches to commit your 
work, but they are not in my area of responsibility.


It is difficult to predict how patches of this size will work on 
other targets.  I tested your patches on aarch64 and ppc64le.  They seem 
to work correctly, but please be prepared to switch them off (it is easy) if 
the patches create issues for other targets, at least until the issues 
are fixed.


And thank you for your contribution.  Improving GCC performance these 
days is a challenging task, as so many people are working on GCC, but you 
found such an opportunity and, most importantly, implemented it.





--
Best,
Lehua



Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-07 Thread Lehua Ding

Hi Dimitar,


Thanks for helping to review the code! I will send a V3 patch which 
addresses these comments.



Best,

Lehua


On 2024/4/26 04:56, Dimitar Dimitrov wrote:

On Wed, Apr 24, 2024 at 06:05:03PM +0800, Lehua Ding wrote:

This patch adds a new DF problem named DF_LIVE_SUBREG. This problem
extends the DF_LR problem and tracks the subreg liveness
of multireg pseudos that satisfy the following conditions:

   1. the mode size is greater than its REGMODE_NATURAL_SIZE.
   2. the reg is used in insns via a subreg pattern.
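
The two conditions above can be sketched as a small predicate, together with the block range that a single subreg access covers. This is only an illustration under stated assumptions: `PseudoInfo`, `needs_subreg_tracking`, and `subreg_blocks` are hypothetical names that do not come from the patch, sizes are plain byte counts rather than GCC's poly_int values, and the natural size is passed in directly instead of being taken from REGMODE_NATURAL_SIZE.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for what the DF problem knows about one pseudo.
struct PseudoInfo {
  unsigned mode_size;     // size of the pseudo's mode, in bytes
  unsigned natural_size;  // assumed natural register size, in bytes
  bool used_via_subreg;   // true if some insn accesses it through a subreg
};

// Conditions 1 and 2 from the list above.
bool needs_subreg_tracking(const PseudoInfo &p) {
  return p.mode_size > p.natural_size && p.used_via_subreg;
}

// Half-open range [first, last) of natural-size blocks covered by a subreg
// at byte OFFSET with byte SIZE.  Assumes natural_size > 0 and size > 0.
std::pair<unsigned, unsigned> subreg_blocks(unsigned offset, unsigned size,
                                            unsigned natural_size) {
  unsigned first = offset / natural_size;
  unsigned last = (offset + size + natural_size - 1) / natural_size;
  return {first, last};
}
```

For example, with a 16-byte natural size, a 32-byte pseudo accessed through a subreg qualifies for tracking, and a 16-byte subreg at byte offset 16 covers only block 1.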

The main methods are as follows:

   1. split the bitmap in/out/def/use fields into full_in/out/def/use and
  partial_in/out/def/use. If a pseudo needs its subreg liveness tracked,
  it is recorded in the partial_in/out/def/use fields.
  Meanwhile, the range_in/out/def/use fields record the live
  range of each tracked pseudo.
   2. in the df_live_subreg_finalize function, we move a tracked pseudo from
  partial_in/out/def/use to full_in/out/def/use if the pseudo's live
  range is full.
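
A minimal sketch of the finalize step described in point 2, using standard containers instead of GCC bitmaps. `LiveSets`, `range_full_p`, and `finalize` are hypothetical names, and dropping the range entry once a pseudo becomes fully live is an assumption of this sketch, not something the patch description states.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical per-block liveness sets mirroring the full/partial/range split.
struct LiveSets {
  std::set<int> full;                      // pseudos live in their entirety
  std::set<int> partial;                   // tracked pseudos that are partly live
  std::map<int, std::vector<bool>> range;  // live blocks of each tracked pseudo
};

// True when the range is non-empty and every block in it is live.
bool range_full_p(const std::vector<bool> &r) {
  for (bool live : r)
    if (!live)
      return false;
  return !r.empty();
}

// Move tracked pseudos whose live range is full from `partial` to `full`.
void finalize(LiveSets &s) {
  for (auto it = s.partial.begin(); it != s.partial.end();) {
    auto found = s.range.find(*it);
    if (found != s.range.end() && range_full_p(found->second)) {
      s.full.insert(*it);
      s.range.erase(found);      // assumption: the range is no longer needed
      it = s.partial.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
}
```

A pseudo whose blocks are all live ends up only in `full`, while one with a dead block stays in `partial` with its range intact.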

Hi Lehua,

I'm not familiar with LRA, so my comments below could be totally off
point.  Please treat them as mild suggestions.


gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get the in/out reg data of a bb.
(get_live_subreg_local_bb_info): Get the use/def reg data of a bb.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range): Add def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Simple declaration.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the number of blocks of a mode.
* sbitmap.cc (bitmap_full_p): New sbitmap predicate.
(bitmap_same_p): New sbitmap predicate.
(test_full): Test bitmap_full_p.
(test_same): Test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): Add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.
---
  gcc/Makefile.in  |   1 +
  gcc/df-problems.cc   | 855 ++-
  gcc/df.h | 155 +++
  gcc/regs.h   |   5 +
  gcc/sbitmap.cc   |  98 +
  gcc/sbitmap.h|   2 +
  gcc/subreg-live-range.cc |  53 +++
  gcc/subreg-live-range.h  | 206 ++
  gcc/timevar.def  |   1 +
  9 files changed, 1375 insertions(+), 1
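
The two sbitmap predicates listed in the ChangeLog above, bitmap_full_p and bitmap_same_p, both have to ignore the unused tail bits of the last word when the bit count is not a multiple of the word size. The sketch below illustrates that masking idea on a stand-alone bit vector. It is not GCC's sbitmap: `Bits`, `tail_mask`, `full_p`, and `same_p` are hypothetical names, and a 64-bit word is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size bit vector; not GCC's sbitmap.
struct Bits {
  unsigned n_bits;
  std::vector<std::uint64_t> words;
  explicit Bits(unsigned n) : n_bits(n), words((n + 63) / 64, 0) {}
  void set(unsigned i) { words[i / 64] |= std::uint64_t{1} << (i % 64); }
  void clear(unsigned i) { words[i / 64] &= ~(std::uint64_t{1} << (i % 64)); }
  // Sets every word to all ones, which may also set unused tail bits.
  void set_all() { for (auto &w : words) w = ~std::uint64_t{0}; }
};

// Mask selecting the valid bits of the last word (all ones if fully used).
static std::uint64_t tail_mask(unsigned n_bits) {
  unsigned rem = n_bits % 64;
  return rem == 0 ? ~std::uint64_t{0} : (std::uint64_t{1} << rem) - 1;
}

// True if every valid bit is set.  An empty vector counts as full here.
bool full_p(const Bits &b) {
  if (b.words.empty()) return true;
  std::size_t last = b.words.size() - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (b.words[i] != ~std::uint64_t{0}) return false;
  std::uint64_t m = tail_mask(b.n_bits);
  return (b.words[last] & m) == m;
}

// True if both vectors have the same length and the same valid bits.
bool same_p(const Bits &a, const Bits &b) {
  if (a.n_bits != b.n_bits) return false;
  if (a.words.empty()) return true;
  std::size_t last = a.words.size() - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (a.words[i] != b.words[i]) return false;
  std::uint64_t m = tail_mask(a.n_bits);
  return (a.words[last] & m) == (b.words[last] & m);
}
```

With 193 bits, as in the patch's selftests, the last word holds a single valid bit, so only that bit takes part in the final comparison.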

[PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-04-24 Thread Lehua Ding
+  if (end_bitno == 0)
+return true;
+
+  gcc_checking_assert (i + 1 == a->size);
+
+  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE) 1 << end_bitno) - 1;
+
+  /* Make sure the tail bits are same.  */
+  return (a->elms[i] & mask) == (b->elms[i] & mask);
+}
+
 /* Set DST to be (A or (B and C)).
Return nonzero if any change is made.  */
 
@@ -994,6 +1047,49 @@ test_bit_in_range ()
   sbitmap_free (s);
 }
 
+/* Verify bitmap_full_p functions for sbitmap.  */
+
+static void
+test_full ()
+{
+  sbitmap s = sbitmap_alloc (193);
+
+  bitmap_clear (s);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  ASSERT_TRUE (bitmap_full_p (s));
+
+  bitmap_clear_bit (s, 192);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  bitmap_clear_bit (s, 17);
+  ASSERT_FALSE (bitmap_full_p (s));
+}
+
+/* Verify bitmap_same_p functions for sbitmap.  */
+
+static void
+test_same ()
+{
+  sbitmap s1 = sbitmap_alloc (193);
+  sbitmap s2 = sbitmap_alloc (193);
+  sbitmap s3 = sbitmap_alloc (192);
+  
+  ASSERT_FALSE (bitmap_same_p (s1, s3));
+
+  bitmap_clear (s1);
+  bitmap_clear (s2);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s2, 192);
+  ASSERT_FALSE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s1, 192);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -1001,6 +1097,8 @@ sbitmap_cc_tests ()
 {
   test_set_range ();
   test_bit_in_range ();
+  test_full ();
+  test_same ();
 }
 
 } // namespace selftest
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index da6116ce925..71cfded9fb2 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -267,6 +267,7 @@ extern void bitmap_copy (sbitmap, const_sbitmap);
 extern bool bitmap_equal_p (const_sbitmap, const_sbitmap);
 extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
+extern bool bitmap_full_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
 extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
 extern void bitmap_set_range (sbitmap, unsigned, unsigned);
@@ -287,6 +288,7 @@ extern bool bitmap_and (sbitmap, const_sbitmap, 
const_sbitmap);
 extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
+extern bool bitmap_same_p (const_sbitmap, const_sbitmap);
 extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
 
 extern int bitmap_first_set_bit (const_sbitmap);
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..7e8e081844f
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,53 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+{
+  fprintf (file, "%sempty\n", indent);
+  return;
+}
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+{
+  const_sbitmap range = kv.second;
+  if (bitmap_empty_p (range))
+   continue;
+  fprintf (file, "%d: ", kv.first);
+  if (!bitmap_full_p (range))
+   {
+ dump_bitmap_file (file, range);
+ fprintf (file, ",  ");
+   }
+  else
+fprintf (file, "full, ");
+}
+  fprintf (file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 000..4eafe006935
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,206 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (

[PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding
This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were made
to LRA than to IRA, since LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 -
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (_pseudos_bitmap))
+  if (!bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 10e3d4e4097..9586e5602e4 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6515,34 +6515,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in 
(prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, 

[PATCH 3/4] ira: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding
This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.
---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));

[PATCH 1/4] df: Add -ftrack-subreg-liveness option

2024-04-24 Thread Lehua Ding
Add a new flag, -ftrack-subreg-liveness, to enable subreg liveness tracking.
This flag is enabled at -O3 and -Ofast.

gcc/ChangeLog:

* common.opt: Add -ftrack-subreg-liveness option.
* opts.cc: Automatically enable -ftrack-subreg-liveness at -O3/-Ofast.
---
 gcc/common.opt  | 4 
 gcc/common.opt.urls | 3 +++
 gcc/doc/invoke.texi | 8 
 gcc/opts.cc | 1 +
 4 files changed, 16 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index ad348844775..bd030973434 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2157,6 +2157,10 @@ fira-share-spill-slots
 Common Var(flag_ira_share_spill_slots) Init(1) Optimization
 Share stack slots for spilled pseudo-registers.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information.
+
 fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b..59f27a6f7c6 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -880,6 +880,9 @@ 
UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
 fira-share-spill-slots
 UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
 
+ftrack-subreg-liveness
+UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
+
 fira-verbose=
 UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c31ab0c86..9724cbb32ba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13186,6 +13186,14 @@ Disable sharing of stack slots allocated for 
pseudo-registers.  Each
 pseudo-register that does not get a hard register gets a separate
 stack slot, and as a result function stack frames are larger.
 
+@opindex ftrack-subreg-liveness
+@item -ftrack-subreg-liveness
+Enable tracking subreg liveness information. This infomation allows IRA
+and LRA to support subreg coalesce feature which can improve the quality
+of register allocation.
+
+This option is enabled at level @option{-O3} for all targets.
+
 @opindex flra-remat
 @item -flra-remat
 Enable CFG-sensitive rematerialization in LRA.  Instead of loading
diff --git a/gcc/opts.cc b/gcc/opts.cc
index a90dc57f8b5..7b5d905a241 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.25.1



[PATCH V2 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-04-24 Thread Lehua Ding
Hi Vladimir and Richard,

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and then apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). These patches
are for GCC 15. The code has been pushed to the devel/subreg-coalesce
branch. In addition, my colleague Shuo Chen will also be involved in some
of the remaining work. Thank you for your support.

These patches are split out from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the comments. The next
patches will add subreg coalescing support based on them. Here is some data
about the build time of SPEC INT 2017 (x86-64 target):

                          baseline   baseline (+track-subreg-liveness)
specint2017 build time :  1892s      1883s

Regarding build times, I've run the build a few times, and the build with
subreg liveness tracking took a little less time on each run. Since the
difference is small, it may just be a change in the environment. A real
improvement is theoretically possible, though, since tracking subreg
liveness could reduce the number of live regs.

For memory usage, I tried PR 69609 under valgrind: the peak memory size grew
from 2003910656 to 2003947520, a very small increase.
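
To put the figures above in proportion, here is a small worked calculation. It assumes the valgrind numbers are in bytes, which the message does not state, and `percent_change` and `peak_growth_bytes` are helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Relative change from `before` to `after`, in percent.
double percent_change(double before, double after) {
  return (after - before) / before * 100.0;
}

// Absolute growth of the quoted peak-memory figures (assumed to be bytes).
std::int64_t peak_growth_bytes() {
  return std::int64_t{2003947520} - std::int64_t{2003910656};
}
```

Under that assumption the peak grows by 36864 bytes (36 KiB), roughly 0.002%, and the build time drops from 1892s to 1883s, roughly 0.5%.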

No regression on x86-64

Co-authored-by: Shuo Chen 

Best,
Lehua

Lehua Ding (4):
  df: Add -ftrack-subreg-liveness option
  df: Add DF_LIVE_SUBREG problem
  ira: Apply DF_LIVE_SUBREG data
  lra: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/common.opt.urls  |   3 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +++
 gcc/doc/invoke.texi  |   8 +
 gcc/ira-build.cc |   7 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  19 +-
 gcc/lra-coalesce.cc  |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h|   4 +
 gcc/lra-lives.cc | 357 
 gcc/lra-remat.cc |   8 +-
 gcc/lra-spills.cc|  27 +-
 gcc/lra.cc   |  10 +-
 gcc/opts.cc  |   1 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def  |   1 +
 25 files changed, 1851 insertions(+), 136 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.25.1



[gcc/devel/subreg-coalesce] lra: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding via Gcc-cvs
https://gcc.gnu.org/g:cde1363042b2857111e461968a6367381d46c936

commit cde1363042b2857111e461968a6367381d46c936
Author: Lehua Ding 
Date:   Fri Feb 2 10:35:37 2024 +0800

lra: Apply DF_LIVE_SUBREG data

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were made
to LRA than to IRA, since LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos 
to
include new partial_def/use and range_def/use fileds for 
DF_LIVE_SUBREG
problem.
(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg 
liveness.
(live_con_fun_0): Adjust Confluence function to support subreg 
liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.

Diff:
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 +++--
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (_pseudos_bitmap))
+  if (!bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 10e3d4e4097..9586e5602e4 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6515,34 +6515,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Updat

[gcc/devel/subreg-coalesce] ira: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding via Gcc-cvs
https://gcc.gnu.org/g:cf327312a72fe55d7e06a84bbae3d5de649a1ed3

commit cf327312a72fe55d7e06a84bbae3d5de649a1ed3
Author: Lehua Ding 
Date:   Fri Feb 2 10:35:17 2024 +0800

ira: Apply DF_LIVE_SUBREG data

This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df 
data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

Diff:
---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_

[gcc/devel/subreg-coalesce] df: Add DF_LIVE_SUBREG problem

2024-04-24 Thread Lehua Ding via Gcc-cvs
https://gcc.gnu.org/g:8e76084576fb8e0054fa19e3bc16e97d05c10630

commit 8e76084576fb8e0054fa19e3bc16e97d05c10630
Author: Lehua Ding 
Date:   Tue Jan 30 16:47:25 2024 +0800

df: Add DF_LIVE_SUBREG problem

This patch adds a new DF problem named DF_LIVE_SUBREG. The problem
extends the DF_LR problem and tracks the subreg liveness of a multireg
pseudo when the pseudo satisfies the following conditions:

  1. its mode size is greater than its REGMODE_NATURAL_SIZE.
  2. the reg is used in insns via a subreg pattern.

The main methods are as follows:

  1. Split the bitmap in/out/def/use fields into full_in/out/def/use and
     partial_in/out/def/use. If a pseudo needs its subreg liveness
     tracked, it is recorded in the partial_in/out/def/use fields.
     In addition, the range_in/out/def/use fields record the live
     range of each tracked pseudo.
  2. In the df_live_subreg_finalize function, move a tracked pseudo from
     partial_in/out/def/use to full_in/out/def/use if the pseudo's live
     range is full.
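
The full/partial split and the finalize step described above can be
illustrated with a small standalone sketch. This is not the code from
df-problems.cc: LiveSets, mark_subreg_live, finalize and the use of a
32-bit mask per pseudo are all hypothetical names and simplifications
chosen for illustration, and std::set/std::map stand in for GCC's
bitmap and subregs_live classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-in for one set of DF_LIVE_SUBREG data (e.g. "in").
// The member names mirror the commit message, not the real GCC classes.
struct LiveSets {
  std::set<int> full;                  // pseudos that are live as a whole
  std::set<int> partial;               // tracked pseudos with only some blocks live
  std::map<int, std::uint32_t> range;  // regno -> bitmask of live blocks
};

// Mask with the low NBLOCKS bits set (this sketch supports 1..31 blocks).
static std::uint32_t full_mask(unsigned nblocks) {
  assert(nblocks >= 1 && nblocks < 32);
  return (1u << nblocks) - 1u;
}

// Record that blocks [first, first + count) of REGNO are live.
void mark_subreg_live(LiveSets &s, int regno, unsigned first, unsigned count) {
  if (s.full.count(regno))
    return;  // already fully live, nothing to track
  s.range[regno] |= full_mask(count) << first;
  s.partial.insert(regno);
}

// Finalize: move every tracked pseudo whose live range covers all of its
// blocks from the partial set to the full set.
void finalize(LiveSets &s, const std::map<int, unsigned> &nblocks) {
  for (auto it = s.partial.begin(); it != s.partial.end();) {
    int regno = *it;
    if (s.range[regno] == full_mask(nblocks.at(regno))) {
      s.full.insert(regno);
      s.range.erase(regno);
      it = s.partial.erase(it);
    } else {
      ++it;
    }
  }
}
```

In this sketch a pseudo occupying two blocks stays in the partial set
after only block 0 is marked live, and moves to the full set once
block 1 is marked as well and finalize is run again.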

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get bb regs in/out data.
(get_live_subreg_local_bb_info): Get bb regs use/def data.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range): Add def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Declare.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the number of blocks of a mode.
* sbitmap.cc (bitmap_full_p): New sbitmap predicate.
(bitmap_same_p): New sbitmap predicate.
(test_full): test bitmap_full_p.
(test_same): test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.
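
The two sbitmap predicates listed above compare whole words and then
mask off the unused tail bits of the last word. A minimal standalone
sketch of that idea follows. SimpleBitmap, tail_mask, full_p and same_p
are hypothetical names for illustration, a std::vector of 64-bit words
stands in for the real sbitmap layout, and the sketch is not the
implementation from sbitmap.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified bitmap: NBITS bits stored in 64-bit words.
struct SimpleBitmap {
  std::size_t nbits;
  std::vector<std::uint64_t> words;

  explicit SimpleBitmap(std::size_t n) : nbits(n), words((n + 63) / 64, 0) {}

  void set_bit(std::size_t i) {
    assert(i < nbits);
    words[i / 64] |= std::uint64_t{1} << (i % 64);
  }
  void clear_bit(std::size_t i) {
    assert(i < nbits);
    words[i / 64] &= ~(std::uint64_t{1} << (i % 64));
  }
  void ones() {
    for (std::size_t i = 0; i < nbits; ++i)
      set_bit(i);
  }
};

// Mask selecting the valid bits of the last word; all ones when the
// last word is completely used.
static std::uint64_t tail_mask(std::size_t nbits) {
  std::size_t end_bitno = nbits % 64;
  return end_bitno == 0 ? ~std::uint64_t{0}
                        : (std::uint64_t{1} << end_bitno) - 1;
}

// True if every valid bit of A is set.
bool full_p(const SimpleBitmap &a) {
  if (a.words.empty())
    return true;
  std::size_t last = a.words.size() - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (a.words[i] != ~std::uint64_t{0})
      return false;
  std::uint64_t mask = tail_mask(a.nbits);
  return (a.words[last] & mask) == mask;
}

// True if A and B have the same size and the same valid bits.
bool same_p(const SimpleBitmap &a, const SimpleBitmap &b) {
  if (a.nbits != b.nbits)
    return false;
  if (a.words.empty())
    return true;
  std::size_t last = a.words.size() - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (a.words[i] != b.words[i])
      return false;
  // Only the tail bits inside the bitmap take part in the comparison.
  std::uint64_t mask = tail_mask(a.nbits);
  return (a.words[last] & mask) == (b.words[last] & mask);
}
```

Masking the last word means stray bits beyond the logical size do not
affect either predicate, which is why a 193-bit bitmap is a useful test
size: it leaves exactly one valid bit in the final word.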

Diff:
---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 ++
 gcc/sbitmap.h

[gcc/devel/subreg-coalesce] df: Add -ftrack-subreg-liveness option

2024-04-24 Thread Lehua Ding via Gcc-cvs
https://gcc.gnu.org/g:b6b50e19f88bd33b6c0d252795ebb6cffda9574f

commit b6b50e19f88bd33b6c0d252795ebb6cffda9574f
Author: Lehua Ding 
Date:   Tue Jan 30 16:45:25 2024 +0800

df: Add -ftrack-subreg-liveness option

Add a new flag, -ftrack-subreg-liveness, to enable subreg liveness
tracking. The flag is enabled at -O3 and -Ofast.

gcc/ChangeLog:

* common.opt: Add -ftrack-subreg-liveness option.
* opts.cc: Automatically enable -ftrack-subreg-liveness at -O3/-Ofast.

Diff:
---
 gcc/common.opt  | 4 
 gcc/common.opt.urls | 3 +++
 gcc/doc/invoke.texi | 8 
 gcc/opts.cc | 1 +
 4 files changed, 16 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index ad348844775..bd030973434 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2157,6 +2157,10 @@ fira-share-spill-slots
 Common Var(flag_ira_share_spill_slots) Init(1) Optimization
 Share stack slots for spilled pseudo-registers.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information.
+
 fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b..59f27a6f7c6 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -880,6 +880,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
 fira-share-spill-slots
 UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
 
+ftrack-subreg-liveness
+UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
+
 fira-verbose=
 UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c31ab0c86..9724cbb32ba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13186,6 +13186,14 @@ Disable sharing of stack slots allocated for pseudo-registers.  Each
 pseudo-register that does not get a hard register gets a separate
 stack slot, and as a result function stack frames are larger.
 
+@opindex ftrack-subreg-liveness
+@item -ftrack-subreg-liveness
+Enable tracking subreg liveness information. This information allows IRA
+and LRA to support the subreg coalesce feature, which can improve the quality
+of register allocation.
+
+This option is enabled at level @option{-O3} for all targets.
+
 @opindex flra-remat
 @item -flra-remat
 Enable CFG-sensitive rematerialization in LRA.  Instead of loading
diff --git a/gcc/opts.cc b/gcc/opts.cc
index a90dc57f8b5..7b5d905a241 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },


[gcc/devel/subreg-coalesce] (111 commits) tree-optimization/114787 - more careful loop update with CF

2024-04-24 Thread Lehua Ding via Gcc-cvs
The branch 'devel/subreg-coalesce' was updated to point to:

 cc48418cfc2... tree-optimization/114787 - more careful loop update with CF

It previously pointed to:

 443748259d9... libstdc++: Fix "extact" typos in comments

Diff:

Summary of changes (added commits):
---

  cc48418... tree-optimization/114787 - more careful loop update with CF (*)
  e28e8ab... tree-optimization/114832 - wrong dominator info with vect p (*)
  d279c9d... i386: Fix behavior for both using AVX10.1-256 in options an (*)
  f952745... RISC-V: Add xfail test case for highpart overlap of vext.vf (*)
  8bcefc2... Revert "RISC-V: Support highpart overlap for vext.vf" (*)
  3091f1d... Daily bump. (*)
  7318f1a... c++: Fix ICE with xobj parms and maybe incomplete decl-spec (*)
  628c222... i386: Avoid =,r,r andn double-word alternative for ia32 [ (*)
  f7a5c99... Regenerate gcc.pot (*)
  0bf94da... Fortran: check C_SIZEOF on additions from TS29113/F2018 [PR (*)
  4f9401d... c++/modules: deduced return type merging [PR114795] (*)
  d2f05fe... libbacktrace: test --compress-debug-sections=ARG for each A (*)
  0c8e99e... testsuite: Adjust testsuite expectations for diagnostic spe (*)
  6f0a646... Remove repeated information in -ftree-loop-distribute-patte (*)
  f994094... Further spelling fixes in translatable strings (*)
  4338ac1... Spelling fixes for translatable strings (*)
  3d56999... s390: testsuite: Xfail forwprop-4{0,1}.c (*)
  ca00bf0... Fortran: Check that the ICE does not reappear [PR102597] (*)
  18e8e55... tree-optimization/114799 - SLP and patterns (*)
  42189f2... s390x: Fix vec_xl/vec_xst type aliasing [PR114676] (*)
  aa73eb9... c++: Copy over DECL_DISREGARD_INLINE_LIMITS flag to inherit (*)
  cf51fe7... c++: Check if allocation functions are xobj members [PR1140 (*)
  77e114b... LoongArch: Define builtin macros for ISA evolutions (*)
  b4ebdd1... LoongArch: Define ISA versions (*)
  8c6ee63... Daily bump. (*)
  2a8187e... RISC-V: Adjust overlap attr after revert d3544cea63d and e6 (*)
  b909daa... PR modula2/114811 string set incl ICE bugfix (*)
  7ef1391... libstdc++: Fix conversion of simd to vector builtin (*)
  e7a3ad2... libstdc++: Silence irrelevant warnings in 3 constraints [PR114783] (*)
  2afdecc... c-family: Allow arguments with NULLPTR_TYPE as sentinels [P (*)
  a39983b... c: Fix ICE with -g and -std=c23 related to incomplete types (*)
  d86472a... libstdc++: Simplify constraints on <=> for std::reference_w (*)
  eed7fb1... libstdc++: Support link chains in std::chrono::tzdb::locate (*)
  e8f0540... Update gcc sv.po (*)
  33bf8e5... internal-fn: Fix up expand_arith_overflow [PR114753] (*)
  1216460... middle-end: refactory vect_recog_absolute_difference to sim (*)
  9451b6c... Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768] (*)
  ede01df... bpf: remove huge memory waste with string allocation. (*)
  d7190d0... bpf: support more instructions to match CO-RE relocations (*)
  4d4929f... d: Fix ICE in build_deref, at d/d-codegen.cc:1650 [PR111650 (*)
  9f29584... rtlanal: Fix set_noop_p for volatile loads or stores [PR114 (*)
  36f4c8a... libgcc: Another __divmodbitint4 bug fix [PR114762] (*)
  694fa37... [vxworks] avoid mangling __STDC_VERSION_LIMITS_H__ (*)
  85c187b... Daily bump. (*)
  e498ba9... Add nios2*-*-* to the list of obsolete targets (*)
  e243d0f... Fortran: Fix ICE and clear incorrect error messages [PR1147 (*)
  7eecc08... [testsuite] [i386] add -msse2 to tests that require it (*)
  0ea96af... [testsuite] [i386] work around fails with --enable-frame-po (*)
  36d0038... [testsuite] [arm] accept empty init for bfloat16 (*)
  ce2dfc5... [c++] [testsuite] adjust contracts9.C for negative addresse (*)
  df92df0... [testsuite] [aarch64] Require fpic effective target. (*)
  514c6b1... [testsuite] [i386] require fpic for pr111497.C (*)
  cc02ebf... [testsuite] xfail pr103798-2 in C++ on vxworks too [PR11370 (*)
  e965162... [testsuite] [analyzer] include sys/select.h if available (*)
  8a11709... [testsuite] [analyzer] require fork where used (*)
  5be4f20... [testsuite] [analyzer] skip access-mode: O_ACCMODE on vxwor (*)
  76a1bcc... [testsuite] [analyzer] avoid vxworks libc mode_t (*)
  5dfbc05... [testsuite] introduce strndup effective target (*)
  dcf0bd1... [libstdc++] [testsuite] disable SRA for compare_exchange_pa (*)
  5b17817... [libstdc++] [testsuite] xfail double-prec from_chars for fl (*)
  da3504a... [libstdc++] define zoneinfo_dir_override on vxworks (*)
  a2f4be3... AArch64: remove reliance on register allocator for simd/gpr (*)
  82d6d38... libgcc: Fix up __divmodbitint4 [PR114755] (*)
  6c152c9... internal-fn: Temporarily disable flag_trapv during .{ADD,SU (*)
  6e62ede... testsuite, rs6000: Fix builtins-6-p9-runnable.c for BE [PR1 (*)
  58a0b19... rs6000: Fix bcd test case (*)
  69576bc... Daily bump. (*)
  7c2a9db... libstdc++: Implement "Printing blank lines with println" fo (*)
  5705614... DOCUMENTATION_ROOT_URL vs. release branches [PR114738] 

[gcc] Created branch 'devel/subreg-coalesce'

2024-04-24 Thread Lehua Ding via Gcc-cvs
The branch 'devel/subreg-coalesce' was created pointing to:

 443748259d9... libstdc++: Fix "extact" typos in comments


Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-05 Thread Lehua Ding




On 2024/2/6 2:17, Joseph Myers wrote:

This series appears to be missing documentation for the new option in
invoke.texi.



OK, I'll add that. Thanks.

--
Best,
Lehua (RiVAI)


Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-05 Thread Lehua Ding



On 2024/2/6 0:10, Jeff Law wrote:
Just a note.  I doubt this will get much traction from a review 
standpoint until gcc-14 is basically out the door.


My recommendation is to continue development, bugfixing, cleanup, etc 
between now and then.  Consider creating a branch for the work in the 
upstream repo.


OK, thanks for the guidance.

--
Best,
Lehua (RiVAI)



Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-04 Thread Lehua Ding

For SPEC INT 2017, when using upstream GCC (without these patches), I get a
coredump when training the peak case, so no data yet. The cause of the core
dump still needs to be investigated.


Typo: SPEC INT 2017 -> SPEC FP 2017.
There is also some bad news: the SPEC INT 2017 score (with these patches)
has dropped. That is a bit strange, and I need to locate the cause.


--
Best,
Lehua (RiVAI)



[PATCH 3/4] ira: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding
This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));
   

[PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding
This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were needed
in LRA than in IRA, since LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): Include subreg-live-range.h.
(struct lra_insn_reg): Add op field to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fields for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(make_hard_regno_live): Switch to live_subreg field.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support recording subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 -
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (_pseudos_bitmap))
+  if (!bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0ae81c1ff9c..d1316620f51 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6505,34 +6505,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in (prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, 

[PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-02-03 Thread Lehua Ding
+  if (end_bitno == 0)
+return true;
+
+  gcc_checking_assert (i + 1 == a->size);
+
+  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE) 1 << end_bitno) - 1;
+
+  /* Make sure the tail bits are same.  */
+  return (a->elms[i] & mask) == (b->elms[i] & mask);
+}
+
 /* Set DST to be (A or (B and C)).
Return nonzero if any change is made.  */
 
@@ -994,6 +1047,49 @@ test_bit_in_range ()
   sbitmap_free (s);
 }
 
+/* Verify bitmap_full_p functions for sbitmap.  */
+
+static void
+test_full ()
+{
+  sbitmap s = sbitmap_alloc (193);
+
+  bitmap_clear (s);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  ASSERT_TRUE (bitmap_full_p (s));
+
+  bitmap_clear_bit (s, 192);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  bitmap_clear_bit (s, 17);
+  ASSERT_FALSE (bitmap_full_p (s));
+}
+
+/* Verify bitmap_same_p functions for sbitmap.  */
+
+static void
+test_same ()
+{
+  sbitmap s1 = sbitmap_alloc (193);
+  sbitmap s2 = sbitmap_alloc (193);
+  sbitmap s3 = sbitmap_alloc (192);
+  
+  ASSERT_FALSE (bitmap_same_p (s1, s3));
+
+  bitmap_clear (s1);
+  bitmap_clear (s2);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s2, 192);
+  ASSERT_FALSE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s1, 192);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -1001,6 +1097,8 @@ sbitmap_cc_tests ()
 {
   test_set_range ();
   test_bit_in_range ();
+  test_full ();
+  test_same ();
 }
 
 } // namespace selftest
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index da6116ce925..71cfded9fb2 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -267,6 +267,7 @@ extern void bitmap_copy (sbitmap, const_sbitmap);
 extern bool bitmap_equal_p (const_sbitmap, const_sbitmap);
 extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
+extern bool bitmap_full_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
 extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
 extern void bitmap_set_range (sbitmap, unsigned, unsigned);
@@ -287,6 +288,7 @@ extern bool bitmap_and (sbitmap, const_sbitmap, 
const_sbitmap);
 extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
+extern bool bitmap_same_p (const_sbitmap, const_sbitmap);
 extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
 
 extern int bitmap_first_set_bit (const_sbitmap);
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..fe8d4210eb6
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,53 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+{
+  fprintf (file, "%sempty\n", indent);
+  return;
+}
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+{
+  const_sbitmap range = kv.second;
+  if (bitmap_empty_p (range))
+   continue;
+  fprintf (file, "%d: ", kv.first);
+  if (!bitmap_full_p (range))
+   {
+ dump_bitmap_file (file, range);
+ fprintf (file, ",  ");
+   }
+  else
+fprintf (file, "full, ");
+}
+  fprintf (file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 000..c0b88071858
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,206 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (

[PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-03 Thread Lehua Ding
Hi,

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and then apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). These patches
are for GCC 15.

These patches are separated from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the review comments.
The next patches will support subreg coalesce based on them. Here are some
data about the build time of SPEC INT 2017 (x86-64 target):

  baseline   baseline(+track-subreg-liveness)
specint2017 build time :  1892s  1883s

Regarding build times, I've run it a few times, but they all seem to take
much less time. Since the difference is small, it's possible that it's just
a change in environment. But it's theoretically possible, since supporting
subreg-liveness could have reduced the number of living regs.

For memory usage, I tried PR 69609 under valgrind; peak memory size grew from
2003910656 to 2003947520, a very small increase.
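To put the two peak figures above in proportion, the difference can be worked
out directly. This is only a small arithmetic sketch; it assumes the valgrind
figures are in bytes, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Peak memory figures quoted in the message above (assumed to be bytes).
constexpr std::int64_t peak_before = 2003910656;
constexpr std::int64_t peak_after = 2003947520;

// Absolute growth of the peak.
constexpr std::int64_t
peak_growth_bytes ()
{
  return peak_after - peak_before;
}

// Relative growth in parts per million, rounded down.
constexpr std::int64_t
peak_growth_ppm ()
{
  return peak_growth_bytes () * 1000000 / peak_before;
}
```

Under that assumption the growth is 36864 bytes (exactly 36 KiB), roughly 18
parts per million of the peak.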

For SPEC INT 2017, when using upstream GCC (without these patches), I get a
coredump when training the peak case, so no data yet. The cause of the core
dump still needs to be investigated.

No regression on x86-64, AArch64 and RISC-V targets.

Best,
Lehua

Lehua Ding (4):
  df: Add -ftrack-subreg-liveness option
  df: Add DF_LIVE_SUBREG problem
  ira: Apply DF_LIVE_SUBREG data
  lra: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +++
 gcc/ira-build.cc |   7 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  19 +-
 gcc/lra-coalesce.cc  |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h|   4 +
 gcc/lra-lives.cc | 355 
 gcc/lra-remat.cc |   8 +-
 gcc/lra-spills.cc|  27 +-
 gcc/lra.cc   |  10 +-
 gcc/opts.cc  |   1 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def  |   1 +
 23 files changed, 1839 insertions(+), 135 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



[PATCH 1/4] df: Add -ftrack-subreg-liveness option

2024-02-03 Thread Lehua Ding
Add new flag -ftrack-subreg-liveness to enable track-subreg-liveness.
This flag is enabled at -O3/fast.

gcc/ChangeLog:

* common.opt: add -ftrack-subreg-liveness option.
* opts.cc: auto enable -ftrack-subreg-liveness in -O3/fast

---
 gcc/common.opt | 4 
 gcc/opts.cc| 1 +
 2 files changed, 5 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 51c4a17da83..d4592c6426a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2156,6 +2156,10 @@ fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information for IRA and LRA, enabled at -O3.
+
 fivopts
 Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 600e0ea..50c0b62c5af 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.36.3



Re: [PATCH V3 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-20 Thread Lehua Ding

Hi Richard,

On 2023/11/21 4:11, Richard Sandiford wrote:

Lehua Ding  writes:

This patch adds a live_subreg problem to extend the original live_reg to
track the liveness of subreg. We will only try to track pseudo registers
whose mode size is a multiple of the natural size and of which eventually a
small portion will appear to be used via subreg. Compared with the live_reg
problem, the live_subreg problem will have the following output. full_in/out
mean the entire pseudo is live in/out, partial_in/out mean the subregs of the
pseudo are live in/out, and range_in/out indicates which part of the pseudo
is live. all_in/out is the union of full_in/out and partial_in/out:

   bitmap_head all_in, full_in;
   bitmap_head all_out, full_out;
   bitmap_head partial_in;
   bitmap_head partial_out;
   subregs_live *range_in = NULL;
   subregs_live *range_out = NULL;
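
The relation between the sets described above can be sketched as follows. This
is only an illustration with standard containers; the patch itself uses
bitmap_head and subregs_live, and the type and function names below are
hypothetical.

```cpp
#include <set>

// Hypothetical stand-in for the live-in sets of one basic block.
struct live_in_sets
{
  std::set<int> full;     // pseudos whose whole register is live in
  std::set<int> partial;  // pseudos of which only some subregs are live in
};

// all_in is described as the union of full_in and partial_in.
std::set<int>
all_in (const live_in_sets &s)
{
  std::set<int> all = s.full;
  all.insert (s.partial.begin (), s.partial.end ());
  return all;
}
```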


I haven't fully processed the patch yet, sorry.  And I think I might be
about to cover things that you dealt with elsewhere.

My assumption going into this was that a subreg liveness tracker would
work as follows:

- First we would work out which registers need to have subreg tracking.
   This could be done ahead of time by iterating over regno_reg_rtx.
   The condition in need_track_subreg looks like the correct one.

   For every other register, subreg liveness degenerates to the existing
   liveness problems.  Such registers can be ignored.

- We would assign a unique identifier to each subreg that we want to track,
   with subregs for the same register being consecutive.

- There would be a mapping from pseudo registers to the first subreg
   that we want to track.  The mapping would probably just be a linear
   array, but perhaps there are times when something more compact is
   appropriate.

- The dataflow problem itself would then be very similar to the existing
   ones.  But rather than computing bitmaps with a single bit per register,
   we'd be computing bitmaps that have N bits for N-register pseudos
   (and no bits for single-register pseudos).

- There would be helper functions that consumers could use to iterate
   over a block.  E.g. for a backwards walk over a block, a consumer
   would start with the bitmap of live-out subregs.  It would then use
   these helper functions to keep the values up-to-date as it moves
   up through the block.

   That's done for normal liveness via the df_simulate_* helpers.
   But now that the codebase is C++, it might be more convenient for
   the subreg code to provide classes for walking a block.

That should be relatively compile-time-friendly, although I agree
with Vlad of course that DF does have efficiency problems.  The nature
of the way it works makes it at least O(#blocks * #regs).

Did you consider doing it that way?  Or does it not provide the
information that you need?
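
The scheme outlined above can be sketched roughly as follows. This is a toy
model under stated assumptions, not code from the patch or from DF: the class
subreg_index, its members and simulate_backward are hypothetical names, and
std::vector<bool> stands in for a real bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index: gives every tracked pseudo a run of consecutive
// subreg identifiers.  Single-register pseudos get no identifiers.
class subreg_index
{
public:
  // nregs[i] is the number of register-sized pieces of pseudo i.
  explicit subreg_index (const std::vector<unsigned> &nregs)
    : m_first (nregs.size (), -1), m_count (nregs.size (), 0u)
  {
    for (std::size_t i = 0; i < nregs.size (); ++i)
      if (nregs[i] > 1)
        {
          m_first[i] = m_total;
          m_count[i] = nregs[i];
          m_total += static_cast<int> (nregs[i]);
        }
  }

  // True if PSEUDO needs subreg tracking at all.
  bool tracked_p (std::size_t pseudo) const { return m_first[pseudo] >= 0; }

  // Bit number of piece PIECE of the tracked pseudo PSEUDO.
  int bit (std::size_t pseudo, unsigned piece) const
  {
    assert (tracked_p (pseudo) && piece < m_count[pseudo]);
    return m_first[pseudo] + static_cast<int> (piece);
  }

  // Total number of bits each liveness bitmap needs.
  int total_bits () const { return m_total; }

private:
  std::vector<int> m_first;       // first identifier, or -1 if untracked
  std::vector<unsigned> m_count;  // number of tracked pieces
  int m_total = 0;
};

// One backward step over an instruction that defines piece DEF_BIT and
// uses piece USE_BIT: the definition kills first, then the use revives.
void
simulate_backward (std::vector<bool> &live, int def_bit, int use_bit)
{
  live[def_bit] = false;
  live[use_bit] = true;
}
```

A consumer walking a block backwards would start from the live-out bits and
apply a step like simulate_backward per instruction, in the same spirit as the
df_simulate_* helpers mentioned above.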


Thanks for providing such detailed instructions, this looks like it 
should perform well. I'll give it a try and come back with any questions.





gcc/ChangeLog:

* Makefile.in: Add new object file.
* df-problems.cc (struct df_live_subreg_problem_data):
The data of the new live_subreg problem.
(need_track_subreg): New function.
(get_range): Ditto.
(remove_subreg_range): Ditto.
(add_subreg_range): Ditto.
(df_live_subreg_free_bb_info): Ditto.
(df_live_subreg_alloc): Ditto.
(df_live_subreg_reset): Ditto.
(df_live_subreg_bb_local_compute): Ditto.
(df_live_subreg_local_compute): Ditto.
(df_live_subreg_init): Ditto.
(df_live_subreg_check_result): Ditto.
(df_live_subreg_confluence_0): Ditto.
(df_live_subreg_confluence_n): Ditto.
(df_live_subreg_transfer_function): Ditto.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_free): Ditto.
(df_live_subreg_top_dump): Ditto.
(df_live_subreg_bottom_dump): Ditto.
(df_live_subreg_add_problem): Ditto.
* df.h (enum df_problem_id): Add live_subreg id.
(DF_LIVE_SUBREG_INFO): Data accessor.
(DF_LIVE_SUBREG_IN): Ditto.
(DF_LIVE_SUBREG_OUT): Ditto.
(DF_LIVE_SUBREG_FULL_IN): Ditto.
(DF_LIVE_SUBREG_FULL_OUT): Ditto.
(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
(DF_LIVE_SUBREG_RANGE_IN): Ditto.
(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
(class subregs_live): New class.
(class basic_block_subreg_live_info): Ditto.
(class df_live_subreg_bb_info): Ditto.
(df_live_subreg): Ditto.
(df_live_subreg_add_problem): Ditto.
(df_live_subreg_finalize): Ditto.
(class subreg_range): Ditto.
(need_track_subreg): Ditto.
(remove_subreg_range): Ditto.
(add_subreg_range): Ditto.
(df_live_subreg_get_bb_info): Ditto.
* regs.h (get_nblocks): Helper function.
* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding




On 2023/11/18 16:24, Sam James wrote:


Lehua Ding  writes:


Hi Sam,

On 2023/11/18 16:06, Sam James wrote:

Lehua Ding  writes:


Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:

On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel
like there are a lot of issues, especially the long compile time
issue. So I'm going to reorganize and refactor the patches so that
as many of them as possible can be reviewed separately. This way
there will be fewer patches to support subreg in the end. I plan to
split it into four separate patches like below. What do you think?


I can wait for the new version patches.  The only issue is stage1 deadline.
In my opinion, I'd recommend to work on the patches more and start
their submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the
new version; stage 1 is definitely too late. When you say before
GCC-14 release do you mean at GCC 14 stage 3? Is it possible to commit
such changes at stage 3? I was thinking that if I miss GCC 14 stage 1
I would have to wait until GCC 15 stage 1.

I took it to mean "submit it during GCC 14 stage 3 for merging during
GCC 15 stage 1", as the idea would be that if you're basing it on the
state of the tree & doing further/final testing on GCC 14 stage 3,
the tree should be in a stable state by then with only regression fixes
going in, rather than other changes which might disrupt your testing.
This means you are not constantly rebasing and getting new test failures
possibly due to changes other than yours. It also means lots of time to
review and fix any problems with less pressure.


Oh, looks like I misunderstood. Thanks for the correction.




You need a lot of testing for the patches: major targets (x86-64,
aarch64, ppc64), some big endian targets, 32-bit targets. Knowing
how even small changes in RA can affect many targets, e.g. GCC
testsuite results (there are a lot of different target tests which
expect a particular output),  it is better to do this on stabilized
GCC and stage3 is the best time for this.  In any case I'll approve
patches only if you have successful bootstraps and no GCC testsuite
regression on x86-64, ppc64le/be, aarch64, i686.
Also you have a lot of compile time performance issues which you
need to address.  So I guess you will be overwhelmed by new
different target PRs after committing the patches if you will do
this now.  You will have more time and less pressure if you
commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the
other targets later.


The compiler farm can provide access to a bunch of targets and the
community may be able to help with access to others if needed.


I applied for cfarm access the other day, I'll try to use it. Thanks.


No problem. If you need some Linux targets not on the cfarm, let me
know. I can probably help with hppa/sparc at least and I know someone
with alpha, mips.


That's great. Thanks in advance.

--
Best,
Lehua (RiVAI)



Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding

Hi Sam,

On 2023/11/18 16:06, Sam James wrote:


Lehua Ding  writes:


Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:

On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel
like there are a lot of issues, especially the long compile time
issue. So I'm going to reorganize and refactor the patches so that
as many of them as possible can be reviewed separately. This way
there will be fewer patches to support subreg in the end. I plan to
split it into four separate patches like below. What do you think?


I can wait for the new version patches.  The only issue is stage1 deadline.
In my opinion, I'd recommend to work on the patches more and start
their submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the
new version; stage 1 is definitely too late. When you say before
GCC-14 release do you mean at GCC 14 stage 3? Is it possible to commit
such changes at stage 3? I was thinking that if I miss GCC 14 stage 1
I would have to wait until GCC 15 stage 1.


I took it to mean "submit it during GCC 14 stage 3 for merging during
GCC 15 stage 1", as the idea would be that if you're basing it on the
state of the tree & doing further/final testing on GCC 14 stage 3,
the tree should be in a stable state by then with only regression fixes
going in, rather than other changes which might disrupt your testing.

This means you are not constantly rebasing and getting new test failures
possibly due to changes other than yours. It also means lots of time to
review and fix any problems with less pressure.


Oh, looks like I misunderstood. Thanks for the correction.




You need a lot of testing for the patches: major targets (x86-64,
aarch64, ppc64), some big endian targets, 32-bit targets. Knowing
how even small changes in RA can affect many targets, e.g. GCC
testsuite results (there are a lot of different target tests which
expect a particular output),  it is better to do this on stabilized
GCC and stage3 is the best time for this.  In any case I'll approve
patches only if you have successful bootstraps and no GCC testsuite
regression on x86-64, ppc64le/be, aarch64, i686.
Also you have a lot of compile time performance issues which you
need to address.  So I guess you will be overwhelmed by new
different target PRs after committing the patches if you will do
this now.  You will have more time and less pressure if you
commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the
other targets later.



The compiler farm can provide access to a bunch of targets and the
community may be able to help with access to others if needed.


I applied for cfarm access the other day, I'll try to use it. Thanks.

--
Best,
Lehua (RiVAI)


Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding

Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:


On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel like 
there are a lot of issues, especially the long compile time issue. So 
I'm going to reorganize and refactor the patches so that as many of 
them as possible can be reviewed separately. This way there will be
fewer patches to support subreg in the end. I plan to split it into
four separate patches like below. What do you think?



I can wait for the new version patches.  The only issue is stage1 deadline.

In my opinion, I'd recommend to work on the patches more and start their 
submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the
new version; stage 1 is definitely too late. When you say before
GCC-14 release do you mean at GCC 14 stage 3? Is it possible to commit
such changes at stage 3? I was thinking that if I miss GCC 14 stage 1 I
would have to wait until GCC 15 stage 1.


You need a lot of testing for the patches: major targets (x86-64,
aarch64, ppc64), some big endian targets, 32-bit targets. Knowing how
even small changes in RA can affect many targets, e.g. GCC testsuite
results (there are a lot of different target tests which expect a
particular output),  it is better to do this on stabilized GCC and
stage3 is the best time for this.  In any case I'll approve patches only
if you have successful bootstraps and no GCC testsuite regression on
x86-64, ppc64le/be, aarch64, i686.


Also you have a lot of compile time performance issues which you need to
address.  So I guess you will be overwhelmed by new different target PRs
after committing the patches if you will do this now.  You will have
more time and less pressure if you commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the other 
targets later.


Your changes are massive and in a critical part of GCC, it is better to
do all of this on a public git branch so that people can try this and
test their targets.


Okay, I'll try.

But it is up to you to decide when to submit the patches.  Still besides
approval of your patches, you need successful testing.  If new testsuite
failures occur after submitting the patch and they are not fixed within
a short period of time, the patches should be reverted.



1. live_subreg problem
2. conflict_hard_regs check refactoring
3. use object instead of allocno to create copies
4. support subreg coalesce
   4.1 ira: Apply live_subreg data to ira
   4.2 lra: Apply live_subreg data to lra
   4.3 ira: Support subreg liveness track
   4.4 lra: Support subreg liveness track

So for the two patches about LRA, maybe you can stop the review and wait
for the revised patches.




Sure. So far I only had a quick glance at them.




--
Best,
Lehua (RiVAI)


Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-17 Thread Lehua Ding

Hi Kito and Robin,

So, going back to our testcases that reported errors with this, I don't 
think we should explicitly specify -march and -mabi when compiling a 
runnable program, but use the defaults (--with-arch). Most of our 
current runnable testcases adhere to this convention, except for the
ones we are discussing now, which explicitly set -march to
rv32gcv_zfh or rv64gcv_zfh inside the rvv.exp file.


On 2023/11/17 16:29, Kito Cheng wrote:

Oh, ok I got why it happened and it is definitely caused by my patch
(but not that one, it is caused by another patch[1]), let me describe
the reason why I try to emit errors. RISC-V has a crazy number of
possible extension combinations, so it's easy to make some mistakes by
using some unsupported extension combination and generating error
messages that are difficult to understand.

Give some practical example here:
config a RISC-V toolchain with --with-arch=rv64gc --with-abi=lp64d
also build with multilib for rv32i/ilp32 and rv64imac/lp64

Now users try to use that toolchain with -march=rv32gc -mabi=ilp32d,
what will happen if there is no such error emitted?
GCC will fall back to the default multilib which is rv64gc/lp64d, and then
you may get an error message like below:

ABI is incompatible with that of the selected emulation:
  target emulation `elf32-littleriscv' does not match `elf64-littleriscv'

Experienced toolchain developers or experienced FAE may know what happened,
but the error message is really not meaningful for most users - and
then they will go back to waste our time :P

So that's the background for why I designed and implemented that mechanism.

You may ask: hey why not implement the same mechanism for linux?
Ok - the answer is simple - linux typically will build with
rv64gc/lp64d as base, without as many different combinations as a
bare metal environment.

*However* I am not trying to say: there is no solution there, let's
give up on testing with bare metal.

One possible solution is Jin Ma's patch[2], he proposed
-mdisable-multilib-check to suppress this check, but it's kind of
dangerous in most cases, this may make it compile, but will get
another error soon.

So...I think the right solution should be adding more checks before
running those tests, e.g. checking rv32gv/ilp32d can run before running
those testcases.
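
The idea behind the check described above (report an error instead of
silently falling back to the default multilib) can be sketched like this.
It is a deliberately simplified illustration: the names are hypothetical, and
an exact string match stands in for whatever compatibility rules the real
multilib selection applies.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one multilib the toolchain was built with.
struct multilib_entry
{
  std::string arch;
  std::string abi;
};

enum class multilib_status { matched, no_match_error };

// Simplified check: the requested -march/-mabi pair must equal one of
// the built multilibs, otherwise an error is reported up front rather
// than linking against an incompatible default.
multilib_status
check_multilib (const std::vector<multilib_entry> &built,
                const std::string &arch, const std::string &abi)
{
  for (const multilib_entry &m : built)
    if (m.arch == arch && m.abi == abi)
      return multilib_status::matched;
  return multilib_status::no_match_error;
}
```

With the configuration from the example above (rv64gc/lp64d, rv32i/ilp32,
rv64imac/lp64), a request for rv32gc/ilp32d is rejected by this sketch, which
is the case that would otherwise end in the linker's emulation mismatch
message.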

[1] 
https://github.com/gcc-mirror/gcc/commit/d72ca12b846a9f5c01674b280b1817876c77888f
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619234.html

On Wed, Nov 15, 2023 at 6:48 PM 钟居哲  wrote:


Hi, Kito. Could you take a look at this issue?

The -march parser is inconsistent between non-linux and linux.

You can simply verify it with these cases:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-run.c -O3 -ftree-vectorize (test for excess errors)

These cases fail on the non-linux toolchain, but pass on the linux toolchain.
This inconsistency is caused by your previous multilib patch as Lehua said:
https://github.com/gcc-mirror/gcc/commit/17d683d



juzhe.zh...@rivai.ai


From: Lehua Ding
Date: 2023-11-13 19:27
To: kito.cheng; Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.
Hi Kito,

On 2023/11/13 19:13, Lehua Ding wrote:

Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them, I think we should fix
riscv-common.cc, since I think "zvfh_zfh" should not be different
from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


This looks to be the difference between the linux and elf versions of
gcc. The elf version of gcc we build has this problem, the
linux version of gcc does not. I think the linux version of gcc has a
wrong behavior:

➜  riscv-gnu-toolchain-push git:(tintin-dev)
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc
 -march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-16 Thread Lehua Ding

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel like 
there are a lot of issues, especially the long compile time issue. So 
I'm going to reorganize and refactor the patches so that as many of them 
as possible can be reviewed separately. This way there will be fewer
patches to support subreg in the end. I plan to split it into four
separate patches like below. What do you think?


1. live_subreg problem
2. conflict_hard_regs check refactoring
3. use object instead of allocno to create copies
4. support subreg coalesce
   4.1 ira: Apply live_subreg data to ira
   4.2 lra: Apply live_subreg data to lra
   4.3 ira: Support subreg liveness track
   4.4 lra: Support subreg liveness track

So for the two patches about LRA, maybe you can stop the review and wait for
the revised patches.


On 2023/11/17 5:13, Vladimir Makarov wrote:


On 11/12/23 07:08, Lehua Ding wrote:
This patch changes the previous way of creating a copy between 
allocnos to objects.


gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
  gcc/ira-build.cc | 154 +++-
  gcc/ira-color.cc | 541 +++
  gcc/ira-conflicts.cc | 173 +++---
  gcc/ira-emit.cc  |  10 +-
  gcc/ira-int.h    |  10 +-
  gcc/ira.cc   |   5 +-
  6 files changed, 646 insertions(+), 247 deletions(-)
The patch is mostly ok for me except that there are the same issues I 
mentioned in my 1st email. Not changing comments for functions with 
changed interface like function arg types and names (e.g. 
find_allocno_copy) is particularly bad.  It makes the comments confusing 
and wrong.  Also using just "adjust" in changelog entries is too brief. 
You should at least mention that function signature is changed.

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see



-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno all thread shared.  */
+  ira_allocno_t first_thread_allocno;
+  /* The offset start relative to the first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belong to the thread.  */
+  bitmap thread_allocnos;


It is better to use bitmap_head instead of bitmap.  It avoids a separate
allocation of bitmap_head for the bitmap.  There are many places in your
patches where bitmap_head can be better used than bitmap (it is
especially profitable if there is significant probability of empty bitmap).
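
The difference being pointed out can be illustrated with a toy model. The
types below are hypothetical stand-ins, not GCC's bitmap implementation; the
sketch only shows why holding the head by value saves one allocation per
owner compared with holding a pointer to a separately allocated head.

```cpp
#include <memory>
#include <vector>

// Toy stand-in for a bitmap head; hypothetical, for illustration only.
struct toy_bitmap_head
{
  std::vector<unsigned> bits;
  bool empty_p () const { return bits.empty (); }
};

// Counts how many heads were allocated separately from their owner.
static int separate_head_allocations = 0;

// Variant 1: the owner holds a pointer, so each owner needs its own
// separately allocated head, even when the set stays empty.
struct thread_data_with_pointer
{
  std::unique_ptr<toy_bitmap_head> thread_allocnos;

  thread_data_with_pointer ()
    : thread_allocnos (new toy_bitmap_head ())
  {
    ++separate_head_allocations;
  }
};

// Variant 2: the head is embedded in the owner, so an empty set costs
// no allocation beyond the owner itself.
struct thread_data_with_head
{
  toy_bitmap_head thread_allocnos;
};
```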


Of course the patch can be committed when all the patches are approved
and fixed.





--
Best,
Lehua (RiVAI)


Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-14 Thread Lehua Ding




On 2023/11/15 7:22, Peter Bergner wrote:

On 11/12/23 6:08 AM, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase



I tested this on powerpc64le-linux and powerpc64-linux.  The LE build
bootstrapped fine and it looks like only one testsuite FAIL which I have
to look into why it's FAILing.

The BE build did bootstrap, but the 32-bit and 64-bit testsuite runs both
had lots of FAILs (over 100 between them both) which I have yet to look
into what is happening.


I've applied for machine permissions on the compile farm. Could you tell
me how to build and run the tests on a PPC64BE machine? I'll take a look
at it too, thanks a lot.



I'll also note I have done no performance testing yet until I have an
idea of what the testsuite failures are.  I think a patch like this that
can affect the performance of all architectures needs some performance
testing to ensure we don't have unintended performance degradations.
I'll have someone on my team kick off some builds once I have a handle
on the testsuite FAILs.


This is really great, thanks for helping to test the performance.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-14 Thread Lehua Ding




On 2023/11/14 0:43, Dimitar Dimitrov wrote:

On Sun, Nov 12, 2023 at 08:08:10PM +0800, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase

Hi,

These patches try to support the subreg coalesce feature in
register allocation passes (ira and lra).



Hi Lehua,

V3 indeed fixes the arm-none-eabi build. It's also confirmed by Linaro CI:
   
https://patchwork.sourceware.org/project/gcc/patch/20231112120817.2635864-8-lehua.d...@rivai.ai/

But avr and pru backends are still broken, albeit with different crash
signatures. Both targets are peculiar because they have
UNITS_PER_WORD=1. I'll try building some 16-bit target like msp430.

AVR fails when building libgcc:
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c: In function 
'__roundlr':
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:115:3: 
internal compiler error: in check_allocation, at ira.cc:2673
   115 |   }
   |   ^
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:106:3: note: 
in expansion of macro 'ROUND2'
   106 |   ROUND2 (FX)
   |   ^~
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:117:1: note: 
in expansion of macro 'ROUND1'
   117 | ROUND1(L_LABEL)
   | ^~
0xc80b8d check_allocation
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:2673
0xc89451 ira
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5873
0xc89451 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6104

Script I'm using to build avr: 
https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-avr.sh



PRU fails building newlib:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libc/stdlib/gdtoa-gdtoa.c:835:9: 
internal compiler error: in lra_create_live_ranges, at lra-lives.cc:1933
   835 | }
   | ^
0x6b951c lra_create_live_ranges(bool, bool)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra-lives.cc:1933
0xd9320c lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2638
0xd3e519 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xd3e519 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148

Script I'm using to build pru: 
https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-pru.sh


These ICEs will be fixed in the V4 patches, and both targets build
successfully on my machine. Thank you so much for the report.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Lehua Ding

Committed, thanks Hongtao.

On 2023/11/14 18:24, Hongtao Liu wrote:

On Tue, Nov 14, 2023 at 5:01 PM Lehua Ding  wrote:


Hi,

This little patch adjusts the assertions in the apx-spill_to_egprs-1.c testcase.
The -mapxf compilation option allows more registers to be used, which in
turn eliminates the need for local variables to be stored in stack memory.
Therefore, the assertion is changed to check that no memory is accessed
through the %rsp register.

Ok, thanks.


gcc/testsuite/ChangeLog:

 * gcc.target/i386/apx-spill_to_egprs-1.c: Make sure that no local
 variables are stored on the stack.

---
  .../gcc.target/i386/apx-spill_to_egprs-1.c| 19 +++
  1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
index 290863d63a7..d7952b4c550 100644
--- a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -3,22 +3,9 @@

  #include "spill_to_mask-1.c"

-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
-/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
-/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* Make sure that no local variables are stored on the stack. */
+/* { dg-final { scan-assembler-not "\\(%rsp\\)" } } */
+
  /* { dg-final { scan-assembler-not "knot" } } */
  /* { dg-final { scan-assembler-not "kxor" } } */
  /* { dg-final { scan-assembler-not "kor" } } */
--
2.36.3






--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Lehua Ding
Hi,

This little patch adjusts the assertion in the apx-spill_to_egprs-1.c
testcase. The -mapxf compilation option allows more registers to be used,
which in turn eliminates the need for local variables to be stored in stack
memory. Therefore, the assertion is changed to check that no memory is
accessed through the %rsp register.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-spill_to_egprs-1.c: Make sure that no local
variables are stored on the stack.

---
 .../gcc.target/i386/apx-spill_to_egprs-1.c    | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c 
b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
index 290863d63a7..d7952b4c550 100644
--- a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -3,22 +3,9 @@
 
 #include "spill_to_mask-1.c"
 
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
-/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
-/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* Make sure that no local variables are stored on the stack. */
+/* { dg-final { scan-assembler-not "\\(%rsp\\)" } } */
+
 /* { dg-final { scan-assembler-not "knot" } } */
 /* { dg-final { scan-assembler-not "kxor" } } */
 /* { dg-final { scan-assembler-not "kor" } } */
-- 
2.36.3



Re: [PATCH V3 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-14 Thread Lehua Ding




On 2023/11/14 16:14, Richard Biener wrote:

On Mon, Nov 13, 2023 at 11:39 PM Vladimir Makarov  wrote:



On 11/12/23 07:08, Lehua Ding wrote:

This patch adds a live_subreg problem that extends the original live_reg
problem to track the liveness of subregs. We only try to track pseudo
registers whose mode size is a multiple of the natural size and of which a
small portion is eventually accessed through a subreg. Compared with the
live_reg problem, the live_subreg problem produces the following output:
full_in/out mean the entire pseudo is live in/out, partial_in/out mean that
subregs of the pseudo are live in/out, and range_in/out indicate which part
of the pseudo is live. all_in/out is the union of full_in/out and
partial_in/out.
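
The relation between these sets can be sketched in a few lines. This is
only an illustration of the description above, not code from the patch:
the LiveSets type and all_live function are hypothetical names, std::set
stands in for GCC bitmaps, and the sketch assumes that a pseudo is never in
both the full and the partial set at once.

```cpp
#include <cassert>
#include <set>

// Hypothetical per-basic-block liveness sets, keyed by pseudo register
// number.  GCC uses bitmaps here; std::set keeps the sketch short.
struct LiveSets
{
  std::set<int> full;     // pseudos that are live in their entirety
  std::set<int> partial;  // pseudos of which only some subregs are live
};

// Compute all = union of full and partial, as described in the patch text.
std::set<int> all_live (const LiveSets &s)
{
  // Assumption of this sketch: full and partial are disjoint.
  for (int regno : s.full)
    assert (s.partial.count (regno) == 0);

  std::set<int> all = s.full;
  all.insert (s.partial.begin (), s.partial.end ());
  return all;
}
```

With full = {100, 101} and partial = {102}, all_live returns the three
pseudos 100, 101 and 102.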


I am not a maintainer or reviewer of data-flow analysis framework and
can not approve this patch except changes in regs.h.  Richard Sandiford
or Jeff Law as global reviewers probably can do this.

As for regs.h changes, they are ok for me after fixing general issues I
mentioned in my previous email (two spaces after sentence ends in the
comments).

I think all this code is a major consumer of compile time and memory
across the whole patch set.  DF analysis is slow by itself even when only
efficient data structures such as bitmaps are used, but you are introducing
an even slower data structure, maps (I believe a better-performing data
structure can be used instead).  In the very first version of LRA I used
DFA but it made LRA so slow that I had to introduce my own data structures,
which are faster in the case of massive RTL changes in LRA.  The same
problem exists when using generic C++ standard library data structures such
as vectors and maps in critical code.  It is hard to get the needed
performance when the exact implementation can vary or not be what you need,
e.g. vector initial capacity, growth etc.  But again, the performance
issues can be addressed later.


I think the important bit should be the subreg live analysis should be
opt-in and when not enabled shouldn't have a bad effect on memory
usage and compile-time.  At -O0 and -O1 RA consumes a major
amount of compile-time.


That is fine: the code for the live_subreg problem has a branch that
follows logic similar to live_reg when it finds no subreg in the program,
and when the optimization level is less than 2 it doesn't track subregs at
all. By the way, do you have particular programs to offer on which RA has a
big impact on compilation time? Or any suggestions about that?


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-13 Thread Lehua Ding

Hi Vladimir,

On 2023/11/14 3:37, Vladimir Makarov wrote:


On 11/12/23 07:08, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase

Hi,

These patches try to support the subreg coalesce feature in
the register allocation passes (ira and lra).

I've started review of v3 patches and here is my initial general 
criticism of your patches:


   * Absence of comments for some functions, e.g. for `HARD_REG_SET 
operator>> (unsigned int shift_amount) const`.


   * Adding significant functionality to existing functions is not 
reflected in the function comment, e.g. in ira_set_allocno_class.


   * A lot of typos, e.g. `pesudo` or `reprensent`.  I think you need to
check the spelling of your comments (I myself do spell checking in emacs
with the ispell-region command).


   * Grammar mistakes, e.g. `Flag means need track subreg live range for
the allocno`.  I understand English is not your native language (nor is it
mine).  In case of doubt I'd recommend checking the grammar in ChatGPT
(Proofread:  text).


   * Some local variables use upper case letters (e.g. `int A`) which 
should be used for macros or enums according to GNU coding standard 
(https://www.gnu.org/prep/standards/standards.html) .


   * Sometimes you put one space at the end of sentence.  Please see GNU 
coding standard and GCC coding conventions 
(https://gcc.gnu.org/codingconventions.html)


   * There is no uniformity in your code, e.g. sometimes you use 'i++', 
sometimes `++i` or `i += 1`.  Although the uniformity is not necessary, 
it makes a better impression about the patches.


Sorry for these issues; I'll address all of those comments.

I also did not find which targets you used for testing.  I am asking
this because I see new testsuite failures (apx-spill_to_egprs-1.c) even
on x86-64.  It might be nothing, as the test expects a specific code
generation.


I tested on x86, aarch64 and riscv not long ago, but it looks like I
missed something. I just tested locally with the latest code and
reproduced the failure you mentioned, along with a C++ failure
(pr106877.C). I'll look into the cause.


Also, besides testing major targets, I'd recommend testing at least one
big endian target (I'd recommend ppc64be. gcc110.fsfrance.org could be
used for this).  Plenty of RA issues occur because BE targets are not
tested.


The address you gave looks slightly off; it should be
gcc110.fsffrance.org, right? I looked it up, and it seems you first have
to apply for an account at portal.cfarm.net. I'll try that, thanks a lot.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding

Hi Kito,

On 2023/11/13 19:13, Lehua Ding wrote:

Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
If there is a difference between them, I think we should fix
riscv-common.cc, since I think "zvfh_zfh" should not differ from "zfh_zvfh".
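
To make the expectation concrete, here is a minimal sketch of an
order-insensitive comparison of two -march strings. It is not how GCC
handles this: the real canonicalization lives in riscv-common.cc and also
deals with implied extensions and canonical ordering. The function names
below are hypothetical, and the sketch simply treats everything between
underscores as one set element.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split an arch string such as "rv32gcv_zvfh_zfh" on '_' into a set, so
// that the order of the underscore-separated parts no longer matters.
std::set<std::string> split_extensions (const std::string &march)
{
  assert (!march.empty ());
  std::set<std::string> parts;
  std::istringstream in (march);
  std::string item;
  while (std::getline (in, item, '_'))
    parts.insert (item);
  return parts;
}

// True if both strings name the same base and the same extensions,
// regardless of the order in which the extensions are written.
bool same_arch_ignoring_order (const std::string &a, const std::string &b)
{
  return split_extensions (a) == split_extensions (b);
}
```

Under this sketch "rv32gcv_zvfh_zfh" and "rv32gcv_zfh_zvfh" compare equal,
while "rv32gcv_zvfh" and "rv32gcv_zvfh_zfh" do not.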


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


This looks like a difference between the linux and elf versions of
gcc. The elf version of gcc that we build has this problem; the linux
version does not. I think the linux version of gcc behaves incorrectly:


➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc -march=rv32gcv_zfh build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set 
for 
'-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'

compilation terminated.
➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc -march=rv32gcv_zfh build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c




It looks like this commit[1] from you makes the difference between elf
and linux. Could you check whether it makes sense for them to behave
differently now? For the elf version, --with-arch is rv32gcv_zvfh_zfh and
the user gets an error with -march=rv32gcv_zfh; the linux version does not
give an error.


[1] https://github.com/gcc-mirror/gcc/commit/17d683d

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding

Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them, I think we should fix riscv-common.cc,
since I think "zvfh_zfh" should not differ from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


This looks like a difference between the linux and elf versions of
gcc. The elf version of gcc that we build has this problem; the linux
version does not. I think the linux version of gcc behaves incorrectly:


➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc 
-march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set 
for 
'-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'

compilation terminated.
➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc 
-march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding




On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them, I think we should fix riscv-common.cc,
since I think "zvfh_zfh" should not differ from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


We use the riscv-gnu-toolchain and run this `make report-newlib 
SIM=spike RUNTESTFLAGS="rvv.exp" -j100`


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding



On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
If there is a difference between them, I think we should fix
riscv-common.cc, since I think "zvfh_zfh" should not differ from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding




On 2023/11/13 17:59, Robin Dapp wrote:

Hi Lehua,


Executing on host: 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc
 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
  -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o ./slp-mask-run-1.exe    
(timeout = 6000)
spawn -ignore SIGHUP 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc
 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/
 
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
 -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany -fdiagnostics-plain-output 
-O3 -ftree-vectorize -ansi -pedantic-errors -march=rv64gcv_zfh -mabi=lp64d -O3 
-std=gnu99 -O3 --param=riscv-autovec-preference=scalable -lm -o 
./slp-mask-run-1.exe



Executing on host: /home/rdapp/projects/gcc32/build/gcc/xgcc 
-B/home/rdapp/projects/gcc32/build/gcc/  
/home/rdapp/projects/gcc32/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
  -march=rv32gcv_zvfh   -fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi 
-pedantic-errors -march=rv32gcv_zfh -mabi=ilp32d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o ./slp-mask-run-1.exe
(timeout = 300)
spawn -ignore SIGHUP /home/rdapp/projects/gcc32/build/gcc/xgcc 
-B/home/rdapp/projects/gcc32/build/gcc/ 
/home/rdapp/projects/gcc32/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
 -march=rv32gcv_zvfh -fdiagnostics-plain-output -O3 -ftree-vectorize -ansi 
-pedantic-errors -march=rv32gcv_zfh -mabi=ilp32d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable -lm -o ./slp-mask-run-1.exe


Looks like your configure is --with-march=rv32gcv_zvfh, can you change 
to --with-march=rv32gcv_zvfh_zfh?


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding

Hi Robin,

Can you show me the compile command in gcc.log for
slp-mask-run-1.exe, like the one below? I'd like to see the -march option
on your side.


Executing on host: 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/ 

/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c 
 -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o 
./slp-mask-run-1.exe(timeout = 6000)
spawn -ignore SIGHUP 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/ 
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c 
-march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable -lm -o ./slp-mask-run-1.exe


On 2023/11/13 17:31, Robin Dapp wrote:

I'm going to configure with --with-arch=rv32gcv_zfh_zvfh --with-abi=ilp32d
to see if there is any difference.


No change for me, how do you invoke the testsuite? I.e. Which target board?

Regards
  Robin


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding




On 2023/11/13 9:11, juzhe.zh...@rivai.ai wrote:

Ah, nice!  How configurable are the bit ranges?

I think Lehua's patch is configurable with respect to bit ranges, since
it lets the target flexibly track subreg liveness according to
REGMODE_NATURAL_SIZE.


+/* Return true if REGNO is a pseudo and MODE is a multil regs size.  */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+&& multiple_p (total_size, natural_size)
+&& regno >= FIRST_PSEUDO_REGISTER;
+}

It depends on how targets configure REGMODE_NATURAL_SIZE.

If we return the QImode size, his patch enables tracking bit ranges of 7
bits subreg.
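
As a rough illustration of how the predicate above reacts to different
natural sizes, here is a sketch with plain integers. It is a
simplification, not the patch's code: poly_int64 is replaced by int, so
maybe_gt and multiple_p collapse to > and a remainder test, and
kFirstPseudoRegister is a made-up stand-in for the target-specific
FIRST_PSEUDO_REGISTER.

```cpp
#include <cassert>

// Stand-in for FIRST_PSEUDO_REGISTER; the real value depends on the target.
constexpr int kFirstPseudoRegister = 76;

// Integer-only version of need_track_subreg: REGNO is tracked if it is a
// pseudo whose mode is larger than, and a whole multiple of, the natural
// register size.
bool need_track_subreg_sketch (int regno, int total_size, int natural_size)
{
  assert (natural_size > 0);
  return total_size > natural_size
	 && total_size % natural_size == 0
	 && regno >= kFirstPseudoRegister;
}
```

With a natural size of 8 bytes, a 16-byte pseudo is tracked, while an
8-byte or 12-byte pseudo and any hard register are not. A smaller natural
size gives finer-grained tracking units.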


Yes, the current subreg_ranges class provides the
remove_range/add_range/remove_ranges/add_ranges interfaces to modify
ranges. Each subreg_range contains start and end fields representing the
half-open range [start, end). For the live_subreg problem, the value
returned by REGMODE_NATURAL_SIZE is used as the unit; for bit-level
tracking like on Jeff's side, a bit could be used as the unit.
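
A minimal sketch of such a range set follows. It is not the patch's
subreg_ranges class: RangeSet, Range and live_p are hypothetical names, and
only the add_range/remove_range idea and the half-open [start, end)
convention are taken from the description above. The unit of start and end
is left abstract, so it could be a natural-size block or a single bit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range
{
  int start;  // first live unit
  int end;    // one past the last live unit, i.e. [start, end)
};

// Hypothetical set of disjoint, sorted, non-adjacent half-open ranges.
class RangeSet
{
public:
  // Mark [start, end) live, merging with overlapping or adjacent ranges.
  void add_range (int start, int end)
  {
    assert (start < end);
    std::vector<Range> merged;
    for (const Range &r : ranges_)
      if (r.end < start || end < r.start)
	merged.push_back (r);	// untouched by the new range
      else
	{
	  start = std::min (start, r.start);
	  end = std::max (end, r.end);
	}
    merged.push_back ({start, end});
    std::sort (merged.begin (), merged.end (),
	       [] (const Range &a, const Range &b)
	       { return a.start < b.start; });
    ranges_ = merged;
  }

  // Mark [start, end) dead, splitting ranges that are only partly covered.
  void remove_range (int start, int end)
  {
    assert (start < end);
    std::vector<Range> kept;
    for (const Range &r : ranges_)
      {
	if (r.end <= start || end <= r.start)
	  {
	    kept.push_back (r);
	    continue;
	  }
	if (r.start < start)
	  kept.push_back ({r.start, start});
	if (end < r.end)
	  kept.push_back ({end, r.end});
      }
    ranges_ = kept;
  }

  // True if UNIT lies inside one of the live ranges.
  bool live_p (int unit) const
  {
    for (const Range &r : ranges_)
      if (r.start <= unit && unit < r.end)
	return true;
    return false;
  }

  const std::vector<Range> &ranges () const { return ranges_; }

private:
  std::vector<Range> ranges_;
};
```

Adding [0, 2) and then [2, 4) yields the single range [0, 4); removing
[1, 3) afterwards leaves [0, 1) and [3, 4).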


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding

Hi Vladimir,

While you're starting your review, please review v3 version that fixes 
some ICE issues, thanks.


https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636178.html

On 2023/11/12 20:01, Lehua Ding wrote:

Hi Vladimir,

On 2023/11/10 4:24, Vladimir Makarov wrote:


On 11/7/23 22:47, Lehua Ding wrote:


Lehua Ding (7):
   ira: Refactor the handling of register conflicts to make it more
 general
   ira: Add live_subreg problem and apply to ira pass
   ira: Support subreg live range track
   ira: Support subreg copy
   ira: Add all nregs >= 2 pseudos to tracke subreg list
   lra: Apply live_subreg df_problem to lra pass
   lra: Support subreg live range track and conflict detect

Thank you very much for addressing subreg RA.  It is a big piece of
work.  I wanted to address this a long time ago but had no time to do it
myself.


I tried to evaluate your patches on x86-64 (i7-9700k) release mode 
GCC. I used -O3 for SPEC2017 compilation.


Here are the results:

              baseline    baseline(+patches)
specint2017:  8.51        8.58     (+0.8%)
specfp2017:   21.1        21.1     (+0%)
compile time: 2426.41s    2580.58s (+6.4%)

Spec2017 average code size change: -0.07%
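
For reference, the percentages quoted above follow from the usual relative
change formula, (patched - baseline) / baseline. The small helper below is
only an illustration of that arithmetic; percent_change is a made-up name.

```cpp
#include <cassert>

// Relative change from BASELINE to PATCHED, in percent.
double percent_change (double baseline, double patched)
{
  assert (baseline != 0.0);
  return (patched - baseline) / baseline * 100.0;
}
```

For the compile times, percent_change (2426.41, 2580.58) is about 6.35,
which rounds to the +6.4% shown; for specint2017, percent_change (8.51,
8.58) is about 0.82, i.e. the +0.8% shown.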

Improving specint by 0.8% is impressive for me.

Unfortunately, it is achieved by decreasing compilation speed by 6.4%
(although on a smaller benchmark I saw only a 3% slowdown). I don't know
how, but we should mitigate this speed degradation.  Maybe we can find
a hot spot in the new code (but I think it is not the linear search
pointed out by Richard Biener, as the object vectors most probably contain
1-2 elements) and this code spot can be improved, or we could use this
only for -O3/fast, or the code can be function or target dependent.


I also find GCC consumes more memory with the patches. Maybe it can
be improved too (although I am not sure about this).


Thanks for the specint performance data. I'll do my best to get the 
compile time and memory issues fixed. I'm very curious to know if the 
way used to solve the subreg coalesce problem makes sense to you?


I'll start to review the patches next week.  I don't expect
that I'll find something serious enough to reject the patches, but again
we should work on mitigating the compilation speed problem.  We can
file a new PR for this and resolve the problem during the release cycle.




--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


[PATCH V3 7/7] lra: Support subreg live range track and conflict detect

2023-11-12 Thread Lehua Ding
This patch supports tracking the liveness of a subreg in the LRA pass, with
the goal of getting it to agree with IRA's register allocation scheme. There
is some duplication; maybe in the future this part of the code logic can be
unified.

gcc/ChangeLog:

* ira-build.cc (setup_pseudos_has_subreg_object):
Collect new data for lra to use.
(ira_build): Ditto.
* lra-assigns.cc (set_offset_conflicts): New function.
(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
(lra_assign): Ditto.
* lra-constraints.cc (process_alt_operands): Ditto.
* lra-int.h (GCC_LRA_INT_H): Ditto.
(struct lra_live_range): Ditto.
(struct lra_insn_reg): Ditto.
(get_range_hard_regs): New.
(get_nregs): New.
(has_subreg_object_p): New.
* lra-lives.cc (INCLUDE_VECTOR): Adjust.
(lra_live_range_pool): Ditto.
(create_live_range): Ditto.
(lra_merge_live_ranges): Ditto.
(update_pseudo_point): Ditto.
(mark_regno_live): Ditto.
(mark_regno_dead): Ditto.
(process_bb_lives): Ditto.
(remove_some_program_points_and_update_live_ranges): Ditto.
(lra_print_live_range_list): Ditto.
(class subreg_live_item): New.
(create_subregs_live_ranges): New.
(lra_create_live_ranges_1): Ditto.
* lra.cc (get_range_blocks): Ditto.
(get_range_hard_regs): Ditto.
(new_insn_reg): Ditto.
(collect_non_operand_hard_regs): Ditto.
(initialize_lra_reg_info_element): Ditto.
(reg_same_range_p): New.
(add_regs_to_insn_regno_info): Adjust.

---
 gcc/ira-build.cc   |  31 
 gcc/lra-assigns.cc | 111 --
 gcc/lra-constraints.cc |  18 ++-
 gcc/lra-int.h  |  31 
 gcc/lra-lives.cc   | 340 ++---
 gcc/lra.cc | 139 +++--
 6 files changed, 585 insertions(+), 85 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index f88aaef..bb29627d375 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
basic block.  */
 static int last_basic_block_before_change;
 
+/* Record these pseudos which has subreg object. Used by LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
the member loop_num.  */
 static void
@@ -3711,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
 }
 }
 
+/* Setup speudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (&pseudos_has_subreg_object, &reg_obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+if (has_subreg_object_p (a))
+  {
+   bitmap_set_bit (&pseudos_has_subreg_object, ALLOCNO_REGNO (a));
+   if (ira_dump_file != NULL)
+ {
+   fprintf (ira_dump_file,
+"  a%d(r%d, nregs: %d) has subreg objects:\n",
+ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+   ira_allocno_object_iterator oi;
+   ira_object_t obj;
+   FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+ fprintf (ira_dump_file, "object %d: start: %d, nregs: %d\n",
+  OBJECT_INDEX (obj), OBJECT_START (obj),
+  OBJECT_NREGS (obj));
+   fprintf (ira_dump_file, "\n");
+ }
+  }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
loop tree nodes).  The function returns TRUE if we generate loop
structure (besides nodes representing all function and the basic
@@ -3731,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1131,6 +1131,52 @@ assign_hard_regno (int hard_regno, int regno)
 /* Array used for sorting different pseudos.  */
 static int *sorted_pseudos;
 
+/* The detail conflict offsets If two live ranges conflict. Use to record
+   partail conflict.  */
+static bitmap_head live_range_conflicts;
+
+/* Set the conflict offset of the two registers REGNO1 and REGNO2. Use the
+   regno with bigger nregs as the base.  */
+static void
+set_offset_conflicts (int regno1, int regno2)
+{
+  gcc_assert (reg_renumber[regno1] >= 0 && reg_renumber[regno2] >= 0);
+  int nregs1 = get_nregs (regno1);
+  int nregs2 = get_nregs (regno2);
+  if (nregs1 < nregs2)
+{
+  std::swap (nregs1, nregs2);
+  std::swap (regno1, regno2);
+}
+
+  lra_live_range_t r1 = lra_reg_info[regno1].live_ranges;
+  lra_live_range_t r2 = lra_reg_info[regno2].live_ranges;
+  int total = nregs1;
+
+  bitmap_clear (&live_range_conflicts);
+  while (r1 != 

[PATCH V3 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list

2023-11-12 Thread Lehua Ding
This patch relaxes the subreg tracking restriction so that all pseudos
with nregs >= 2 are tracked.

gcc/ChangeLog:

* ira-build.cc (get_reg_unit_size): New.
(has_same_nregs): New.
(ira_set_allocno_class): Adjust.

---
 gcc/ira-build.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 13f0f7336ed..f88aaef 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return single register size of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS results is
+   same. It should be noted that some targets may not implement these two very
+   uniformly, and need to be debugged step by step. For example, in V3x1DI mode
+   in AArch64, TARGET_CLASS_MAX_NREGS returns 2 but TARGET_HARD_REGNO_NREGS
+   returns 3. They are in conflict and need to be repaired in the Hook of
+   AArch64.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (REGNO_REG_CLASS (i) != NO_REGS
+   && reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+   && ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+  return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class 
aclass)
 
   if (aclass == NO_REGS)
 return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set unit size and track_subreg_p flag for pseudo which need occupied multi
+ hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
 {
-  ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+  ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
   ALLOCNO_TRACK_SUBREG_P (a) = true;
   return;
 }
-- 
2.36.3
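
The unit size computed by get_reg_unit_size in the patch above boils down
to one multiplication. The sketch below restates it with plain integers;
it is a simplification in which poly_int64 becomes int, and nblocks stands
for whatever get_nblocks (mode) returns in the patch (presumably the
number of natural-size blocks in the mode, which is an assumption here).

```cpp
#include <cassert>

// Size of one hard register of a multi-register pseudo: each of the NREGS
// registers covers NBLOCKS / NREGS blocks of NATURAL_SIZE bytes.
int reg_unit_size_sketch (int natural_size, int nblocks, int nregs)
{
  assert (nregs > 0 && nblocks % nregs == 0);
  return natural_size * (nblocks / nregs);
}
```

For example, with a natural size of 8 bytes, a mode made of 4 blocks that
occupies 2 hard registers has a unit size of 16 bytes, and a mode made of
2 blocks in 2 hard registers has a unit size of 8 bytes.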



[PATCH V3 3/7] ira: Support subreg live range track

2023-11-12 Thread Lehua Ding
This patch supports tracking subreg liveness. It first extends
ira_object_t objects[2] to std::vector<ira_object_t> objects,
which can hold more than two objects and is used to collect all
accesses via subreg in the program as well as the partial_in and
partial_out of the basic block live in/out.

It then modifies the way conflicts between registers are detected.
For example, if object a conflicts with object b, the offset and size
of each object relative to the allocno it belongs to need to be taken
into account to compute the conflicting registers between the two
allocnos.
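
The role of the offset and size can be sketched as an interval overlap
test. This is only an illustration under simplifying assumptions, not the
patch's conflict code: ObjectSketch and objects_overlap_p are hypothetical
names, an object's start is assumed to map directly onto consecutive hard
register numbers, and endianness is ignored.

```cpp
#include <cassert>

// Hypothetical view of an object: the part of its allocno that it covers.
struct ObjectSketch
{
  int start;  // first register unit covered, relative to the allocno
  int nregs;  // number of register units covered
};

// True if OBJ_A (whose allocno is placed at hard register BASE_A) and
// OBJ_B (whose allocno is placed at BASE_B) would share a hard register.
bool objects_overlap_p (int base_a, ObjectSketch obj_a,
			int base_b, ObjectSketch obj_b)
{
  assert (obj_a.nregs > 0 && obj_b.nregs > 0);
  int lo_a = base_a + obj_a.start, hi_a = lo_a + obj_a.nregs;
  int lo_b = base_b + obj_b.start, hi_b = lo_b + obj_b.nregs;
  return lo_a < hi_b && lo_b < hi_a;
}
```

If only the upper half of a two-register allocno placed at hard register
10 is live (start 1, nregs 1), a single-register allocno can still be
placed at hard register 10 without overlap; without the offset, the whole
allocno would have to be treated as conflicting.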

gcc/ChangeLog:

* hard-reg-set.h (struct HARD_REG_SET): New shift operator.
* ira-build.cc (ira_create_object): Adjust.
(find_object): New.
(find_object_anyway): New.
(ira_create_allocno): Adjust.
(get_range): New.
(ira_copy_allocno_objects): New.
(merge_hard_reg_conflicts): Adjust copy.
(create_cap_allocno): Adjust.
(find_subreg_p): New.
(add_subregs): New.
(create_insn_allocnos): Collect subreg.
(create_bb_allocnos): Ditto.
(move_allocno_live_ranges): Adjust.
(copy_allocno_live_ranges): Adjust.
(setup_min_max_allocno_live_range_point): Adjust.
* ira-color.cc (INCLUDE_MAP): include map.
(setup_left_conflict_sizes_p): Adjust conflict size.
(setup_profitable_hard_regs): Adjust.
(get_conflict_and_start_profitable_regs): Adjust.
(check_hard_reg_p): Adjust conflict check.
(assign_hard_reg): Adjust.
(push_allocno_to_stack): Adjust conflict size.
(improve_allocation): Adjust.
* ira-conflicts.cc (record_object_conflict): Simplify.
(build_object_conflicts): Adjust.
(build_conflicts): Adjust.
(print_allocno_conflicts): Adjust.
* ira-emit.cc (modify_move_list): Adjust.
* ira-int.h (struct ira_object): Adjust struct.
(struct ira_allocno): Adjust struct.
(ALLOCNO_NUM_OBJECTS): New accessor.
(ALLOCNO_UNIT_SIZE): Ditto.
(ALLOCNO_TRACK_SUBREG_P): Ditto.
(ALLOCNO_NREGS): Ditto.
(OBJECT_SUBWORD): Ditto.
(OBJECT_INDEX): Ditto.
(OBJECT_START): Ditto.
(OBJECT_NREGS): Ditto.
(find_object): Exported.
(find_object_anyway): Ditto.
(ira_copy_allocno_objects): Ditto.
(has_subreg_object_p): Ditto.
(get_full_object): Ditto.
* ira-lives.cc (INCLUDE_VECTOR): Include vector.
(add_onflict_hard_regs): New.
(add_onflict_hard_reg): New.
(make_hard_regno_dead): Adjust.
(make_object_live): Adjust.
(update_allocno_pressure_excess_length): Adjust.
(make_object_dead): Adjust.
(mark_pseudo_regno_live): Adjust.
(add_subreg_point): New.
(mark_pseudo_object_live): Adjust.
(mark_pseudo_regno_subword_live): Adjust.
(mark_pseudo_regno_subreg_live): Adjust.
(mark_pseudo_regno_subregs_live): Adjust.
(mark_pseudo_reg_live): Adjust.
(mark_pseudo_regno_dead): Adjust.
(mark_pseudo_object_dead): Adjust.
(mark_pseudo_regno_subword_dead): Adjust.
(mark_pseudo_regno_subreg_dead): Adjust.
(mark_pseudo_reg_dead): Adjust.
(process_single_reg_class_operands): Adjust.
(process_out_of_region_eh_regs): Adjust.
(add_conflict_from_region_landing_pads): Adjust.
(process_bb_node_lives): Adjust.
(class subreg_live_item): New class.
(create_subregs_live_ranges): New function.
(ira_create_allocno_live_ranges): Adjust.
* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h   |  33 +++
 gcc/ira-build.cc | 235 +---
 gcc/ira-color.cc | 302 +-
 gcc/ira-conflicts.cc |  48 ++---
 gcc/ira-emit.cc  |   2 +-
 gcc/ira-int.h|  57 -
 gcc/ira-lives.cc | 500 ---
 gcc/ira.cc   |  52 ++---
 8 files changed, 907 insertions(+), 322 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
 return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+if (shift_amount == 0)
+  return *this;
+
+HARD_REG_SET res;
+unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+if (shift_amount >= total_bits)
+  {
+   unsigned int n_elt = shift_amount % total_bits;
+   shift_amount -= n_elt * total_bits;
+   for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt - 1; i += 1)
+ res.elts[i] = elts[i + n_elt];
+   /* clear upper n_elt elements.  */
+   for (unsigned int i = 0; i < n_elt; i += 1)
+ res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
+  }
+
+if (shift_amount > 0)
+  {
+   /* The left bits of an element 
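
The hunk above is cut off in this archive. As an independent illustration
of the technique, here is a sketch of a right shift over a multi-element
bit set. It is not the patch's code: std::array of 64-bit words stands in
for the elts array of HARD_REG_SET, shift_right is a made-up name, and
element 0 is assumed to hold the lowest-numbered bits. In this sketch the
number of whole elements to skip is shift / bits and the leftover bit
count is shift % bits; the quoted hunk computes n_elt with %, which may
have been intended as a division.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift a bit set stored in N 64-bit words right by SHIFT bits.  Word 0
// holds bits 0..63, word 1 holds bits 64..127, and so on.
template <std::size_t N>
std::array<std::uint64_t, N>
shift_right (const std::array<std::uint64_t, N> &elts, unsigned shift)
{
  constexpr unsigned bits = 64;
  assert (shift < N * bits);

  std::array<std::uint64_t, N> res{};	    // upper words stay zero
  const std::size_t n_elt = shift / bits;  // whole words to skip
  const unsigned n_bit = shift % bits;	    // remaining bit shift

  for (std::size_t i = 0; i + n_elt < N; ++i)
    {
      res[i] = elts[i + n_elt] >> n_bit;
      // Bring in the low bits of the next-higher word, if there is one.
      if (n_bit != 0 && i + n_elt + 1 < N)
	res[i] |= elts[i + n_elt + 1] << (bits - n_bit);
    }
  return res;
}
```

For a two-word set with only bit 64 set, shifting by 64 moves it to bit 0
and shifting by 1 moves it to bit 63 of word 0.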

[PATCH V3 6/7] lra: Switch to live_subreg data flow

2023-11-12 Thread Lehua Ding
This patch switches LRA from the live_reg data to the live_subreg data.
The change is more involved than in IRA because LRA also modifies this
data itself, so the live_subreg data has to be updated in place and
recalculated.
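
To make the bookkeeping concrete, here is a minimal sketch of the three
sets that update_live_info and update_ebb_live_info touch in the hunks
below. The names all/full/partial come from the patch; the invariant
(all is the disjoint union of full and partial) is an assumption inferred
from the visible hunks, and live_sets and the helper functions are
hypothetical stand-ins rather than GCC's bitmap API.

```cpp
#include <set>

// Hypothetical stand-in for the three per-block bitmaps in the patch.
struct live_sets
{
  std::set<unsigned> all;      // pseudos with any part live
  std::set<unsigned> full;     // pseudos live in their entirety
  std::set<unsigned> partial;  // pseudos with only some subregs live
};

// The whole pseudo becomes live: it can no longer be "partial".
inline void
mark_fully_live (live_sets &s, unsigned regno)
{
  s.all.insert (regno);
  s.full.insert (regno);
  s.partial.erase (regno);
}

// Only some subregs are live: it can no longer be "full".
inline void
mark_partially_live (live_sets &s, unsigned regno)
{
  s.all.insert (regno);
  s.partial.insert (regno);
  s.full.erase (regno);
}

// The pseudo is dead: drop it from every set.
inline void
mark_dead (live_sets &s, unsigned regno)
{
  s.all.erase (regno);
  s.full.erase (regno);
  s.partial.erase (regno);
}

// Assumed invariant: full and partial are disjoint subsets of all,
// and together they cover all.
inline bool
consistent_p (const live_sets &s)
{
  for (unsigned r : s.full)
    if (s.partial.count (r) != 0 || s.all.count (r) == 0)
      return false;
  for (unsigned r : s.partial)
    if (s.all.count (r) == 0)
      return false;
  return s.all.size () == s.full.size () + s.partial.size ();
}
```

Every update in LRA then has to keep all three sets in step, which is
why a single bitmap_set_bit on the old live_reg data turns into several
operations on the live_subreg data.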

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info):
Adjust to new live subreg data.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): Ditto.
* lra-lives.cc (class bb_data_pseudos): Ditto.
(make_hard_regno_live): Ditto.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Ditto.
(mark_regno_dead): Ditto.
(live_trans_fun): Ditto.
(live_con_fun_0): Ditto.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Ditto.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Ditto.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.

---
 gcc/lra-coalesce.cc|  20 ++-
 gcc/lra-constraints.cc |  93 +---
 gcc/lra-int.h  |   2 +
 gcc/lra-lives.cc   | 328 -
 gcc/lra-remat.cc   |  13 +-
 gcc/lra-spills.cc  |  22 ++-
 6 files changed, 374 insertions(+), 104 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  bitmap_and_compl_into (full, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+  bitmap_and_compl_into (partial, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
 }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+   DF_LIVE_SUBREG_PARTIAL_IN (bb));
+  update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+   DF_LIVE_SUBREG_PARTIAL_OUT (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+ bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+ bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+ subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, bi)
if (bitmap_bit_p (_regs, j))
- bitmap_set_bit (df_get_live_in (prev_bb), j);
-   else
- bitmap_clear_bit (df_get_live_in (prev_bb), j);
+ {
+   bitmap_set_bit (subreg_all_in, j);
+   bitmap_set_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
+   else if (bitmap_bit_p (subreg_all_in, j))
+ {
+   bitmap_clear_bit (subreg_all_in, j);
+   bitmap_clear_bit (subreg_full_in, j);
+   if 

Re: [PATCH V2 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding
I found a new bug in these patches and have resent them as V3. Sorry
about this.


V3: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636178.html

On 2023/11/12 17:58, Lehua Ding wrote:

Hi,

These patches add support for subreg coalescing to the
register allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
   vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
   vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
   vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
   for (size_t i = 0; i < m; i++)
 {
   v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
   v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
 }
   *(vint32m1_t*)(out+4*0) = v0;
   *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
li  a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v4,0(a0)
vmv1r.v v2,v4
vmv1r.v v1,v5
beq a2,zero,.L2
li  a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v1,v1,v1
bne a2,a5,.L3
.L2:
vs1r.v  v2,0(a1)
addi a1,a1,16
vs1r.v  v1,0(a1)
ret
```

After these patches:

```
foo:
li  a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v2,0(a0)
beq a2,zero,.L2
li  a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v3,v3,v3
bne a2,a5,.L3
.L2:
vs1r.v  v2,0(a1)
addi a1,a1,16
vs1r.v  v3,0(a1)
ret
```

As you can see, the two redundant vmv1r.v instructions are gone.
They appear because the current ira pass is conservative when it
computes the live ranges of pseudo registers that occupy multiple
hard registers. Consider the following two RTL instructions, where
r134 occupies two physical registers and r135 and r136 occupy one
physical register each. At insn 12, ira considers the entire r134
pseudo register to be live, so r135 conflicts with r134, as shown in
the ira dump info. Then, when the physical registers are allocated,
r135 and r136 are allocated first because they are inside the loop body
and have higher priority. This makes it difficult to assign r136 to
overlap with r134, i.e., to assign r136 to hr100, which would eliminate
the need for the vmv1r.v instruction. Thus two vmv1r.v instructions
appear.

If we refine the live information of r134 down to each subreg,
we can remove this conflict. We can then create copies for sets
that reference a subreg, which raises the priority of the r134
allocation and lets registers with larger alignment requirements get
their physical registers first. In RVV, a pseudo register occupying
two physical registers must start at a register number that is a
multiple of 2.
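
The difference between whole-register and per-subreg liveness can be
sketched with one bit per hard-register-sized block. This is only an
illustration of the idea, not the data structure used in the patches:
block_mask, pseudo_liveness and the helpers are hypothetical names, and
the example state (block 0 of r134 already dead, block 1 still live)
is an assumption about the point just after insn 11.

```cpp
#include <cstdint>

// Hypothetical model: bit i set means block i of the pseudo is live.
using block_mask = std::uint32_t;

struct pseudo_liveness
{
  unsigned nblocks;  // number of hard registers the pseudo occupies
  block_mask live;   // blocks live at the program point of interest
};

// Mask with the low NBLOCKS bits set.
inline block_mask
all_blocks (unsigned nblocks)
{
  return nblocks >= 32 ? ~block_mask (0) : (block_mask (1) << nblocks) - 1;
}

// Whole-register tracking: any live block makes every block live.
inline block_mask
coarse_live (const pseudo_liveness &p)
{
  return p.live != 0 ? all_blocks (p.nblocks) : block_mask (0);
}

// Would a value placed over DEF_BLOCKS overlap live data?
inline bool
conflicts_p (block_mask live_blocks, block_mask def_blocks)
{
  return (live_blocks & def_blocks) != 0;
}
```

With r134 modelled as {2, 0x2}, conflicts_p (coarse_live (r134), 0x1)
is true while conflicts_p (r134.live, 0x1) is false, so only the
refined view allows r135 to share the first hard register of r134.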

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;;
;; ...
   Popping a1(r135,l0)  -- assign reg 97
   Popping a3(r136,l0)  -- assign reg 98
   Popping a4(r137,l0)  -- assign reg 15
   Popping a5(r140,l0)  -- assign reg 12
   Popping a10(r145,l0)  -- assign reg 12
   Popping a2(r139,l0)  -- assign reg 11
   Popping a9(r144,l0)  -- assign reg 11
   Popping a0(r142,l0)  -- assign reg 11
   Popping a6(r134,l0)  -- assign reg 100
   Popping a7(r143,l0)  -- assign reg 10
   Popping a8(r141,l0)  -- assign reg 15

The AArch64 SVE has the same problem. Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
   svint64x4_t result = svld4_s64 (pg, base);
   svint64_t v0 = svget4_s64(result, 0);
   svint64_t v1 = svget4_s64(result, 1);
   svint64_t v2 = svget4_s64(result, 2);
   svint64_t v3 = svget4_s64(result, 3);

   for (int i = 0; i < n; i += 1)
 {
 svint64_t v18 = svld1_s64(pg, in1);
 svint64_t v19 = svld1_s64(pg, in2);
 v0 = svmad_s64_z(pg, v0, v18, v19);
 v1 = svmad_s64_z(pg, v1, v18, v19);
 v2 = svmad_s64_z(pg, v2, v18, v19);
 v3

[PATCH V3 4/7] ira: Support subreg copy

2023-11-12 Thread Lehua Ding
This patch changes copies so that they are created between objects
rather than between allocnos.
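
A copy attached to an object needs a way to find the object that a
given subreg refers to; in the diff below, the new find_object overload
does this from SUBREG_BYTE and the mode size. The following is a rough
sketch of that mapping using plain byte counts instead of poly_int64.
block_range and subreg_to_blocks are hypothetical names, and the 16-byte
natural size in the usage note is an assumption taken from the [16, 16]
offset in the RISC-V example of the cover letter.

```cpp
#include <cstdint>

// Hypothetical result type: first block and number of blocks covered.
struct block_range
{
  unsigned start;
  unsigned nregs;
};

// Map a subreg given by byte OFFSET and byte SIZE onto the
// register-sized blocks of its inner pseudo.  NATURAL_SIZE is the
// size of one block in bytes and must be nonzero.
inline block_range
subreg_to_blocks (std::uint64_t offset, std::uint64_t size,
                  std::uint64_t natural_size)
{
  // First block touched: round the offset down.
  unsigned start = static_cast<unsigned> (offset / natural_size);
  // One past the last block touched: round the end up.
  unsigned end = static_cast<unsigned> ((offset + size + natural_size - 1)
                                        / natural_size);
  return block_range{start, end - start};
}
```

For example, subreg_to_blocks (16, 16, 16) describes the second block
only, which corresponds to the subreg read by insn 12 in the cover
letter, while subreg_to_blocks (0, 32, 16) covers both blocks.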

gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
 gcc/ira-build.cc | 154 +++-
 gcc/ira-color.cc | 541 +++
 gcc/ira-conflicts.cc | 173 +++---
 gcc/ira-emit.cc  |  10 +-
 gcc/ira-int.h|  10 +-
 gcc/ira.cc   |   5 +-
 6 files changed, 646 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -520,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return object in allocno A for REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
found.  */
 ira_object_t
@@ -1503,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
 {
-  if (cp->first == a1)
+  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+  if (first_a == a1)
{
  next_cp = cp->next_first_allocno_copy;
- another_a = cp->second;
+ if (cp->first == obj1)
+   another_obj = cp->second;
+ else
+   continue;
}
-  else if (cp->second == a1)
+  else if (second_a == a1)
{
  next_cp = cp->next_second_allocno_copy;
- another_a = cp->first;
+ if (cp->second == obj1)
+   another_obj = cp->first;
+ else
+   continue;
}
   else
gcc_unreachable ();
-  if (another_a == a2 && 

[PATCH V3 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-12 Thread Lehua Ding
nd backward live
@@ -946,6 +983,7 @@ extern class df_d *df;
 #define df_note (df->problems_by_index[DF_NOTE])
 #define df_md  (df->problems_by_index[DF_MD])
 #define df_mir (df->problems_by_index[DF_MIR])
+#define df_live_subreg (df->problems_by_index[DF_LIVE_SUBREG])
 
 /* This symbol turns on checking that each modification of the cfg has
   been identified to the appropriate df routines.  It is not part of
@@ -1031,6 +1069,25 @@ extern void df_lr_add_problem (void);
 extern void df_lr_verify_transfer_functions (void);
 extern void df_live_verify_transfer_functions (void);
 extern void df_live_add_problem (void);
+extern void
+df_live_subreg_add_problem (void);
+extern void
+df_live_subreg_finalize (bitmap all_blocks);
+class subreg_range;
+extern bool
+need_track_subreg (int regno, machine_mode mode);
+extern void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+machine_mode mode, const subreg_range );
+extern bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref);
+extern void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+ machine_mode mode, const subreg_range ,
+ bool is_def = false);
+extern bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+ bool is_def = false);
 extern void df_live_set_all_dirty (void);
 extern void df_chain_add_problem (unsigned int);
 extern void df_word_lr_add_problem (void);
@@ -1124,6 +1181,16 @@ df_lr_get_bb_info (unsigned int index)
 return NULL;
 }
 
+inline class df_live_subreg_bb_info *
+df_live_subreg_get_bb_info (unsigned int index)
+{
+  if (index < df_live_subreg->block_info_size)
+return &(
+  (class df_live_subreg_bb_info *) df_live_subreg->block_info)[index];
+  else
+return NULL;
+}
+
 inline class df_md_bb_info *
 df_md_get_bb_info (unsigned int index)
 {
diff --git a/gcc/regs.h b/gcc/regs.h
index aea093ed795..84c6bdb980c 100644
--- a/gcc/regs.h
+++ b/gcc/regs.h
@@ -389,4 +389,11 @@ range_in_hard_reg_set_p (const_hard_reg_set set, unsigned regno, int nregs)
   return true;
 }
 
+/* Return the number of blocks the MODE overlap. One block equal mode's natural
+   size. So, satisfy the following equation:
+ (nblocks - 1) * natural_size < GET_MODE_SIZE (mode)
+   <= nblocks * natural_size. */
+#define get_nblocks(mode)                                                      \
+  (exact_div (GET_MODE_SIZE (mode), REGMODE_NATURAL_SIZE (mode)).to_constant ())
+
 #endif /* GCC_REGS_H */
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..43a5eafedf1
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,628 @@
+/* SUBREG live range track classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+#include "selftest.h"
+#include "print-rtl.h"
+
+/* class subreg_range */
+void
+subreg_range::dump (FILE *file) const
+{
+  fprintf (file, "[%d, %d)", start, end);
+}
+
+/* class subreg_ranges */
+bool
+subreg_ranges::add_range (int max, const subreg_range &new_range)
+{
+  subreg_range range = new_range;
+  if (full_p ())
+return false;
+  else if (max == 1)
+{
+  gcc_assert (range.start == 0 && range.end == 1);
+  make_full ();
+  return true;
+}
+
+  if (this->max == 1)
+change_max (max);
+
+  gcc_assert (this->max == max);
+  gcc_assert (range.start < range.end);
+
+  bool changed = empty_p ();
+  auto it = ranges.begin ();
+  while (it != ranges.end ())
+{
+  const subreg_range &r = *it;
+  gcc_assert (r.start < r.end);
+
+  /* The possible positional relationship of R and RANGE.
+1~5 means R.start's possible position relative to RANGE
+A~G means R.end's possible position relative to RANGE
+caseN means when R.start at N positon, the R.end can be in which
+positions.
+
+RANGE.start RANGE.end
+ [   )
+ |   |
+   R.start   1   2   3   4   5
+   R

[PATCH V3 2/7] ira: Switch to live_subreg data

2023-11-12 Thread Lehua Ding
This patch switches ira from the live_reg data to the live_subreg data.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 16 +---
 5 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..f931c6e304c 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1919,7 +1919,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1935,9 +1936,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = DF_LIVE_SUBREG_IN (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_OUT (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..4aa3e316282 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2783,8 +2783,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+   && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2792,8 +2792,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+   && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index bcc4f09f7c4..84ed482e568 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = DF_LIVE_SUBREG_IN (e->dest);
+  regs_live_out_src = DF_LIVE_SUBREG_OUT (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, DF_LIVE_SUBREG_IN (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, DF_LIVE_SUBREG_OUT (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, DF_LIVE_SUBREG_IN (e->dest),
+ DF_LIVE_SUBREG_OUT (bb));
  add_range_and_copies_from_move_list
((move_t) e->aux, node, live_through,
 REG_FREQ_FROM_EDGE_FREQ 

[PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding
ld4d {z4.d - z7.d}, p0/z, [x0]
mov z26.d, z4.d
mov z27.d, z5.d
mov z28.d, z6.d
mov z29.d, z7.d
cmp w1, 0
...
```

After these patches:

```
bar:
ld4d {z28.d - z31.d}, p0/z, [x0]
cmp w1, 0
...
```

Lehua Ding (7):
  df: Add DF_LIVE_SUBREG problem
  ira: Switch to live_subreg data
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Switch to live_subreg data flow
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 889 ++-
 gcc/df.h |  67 +++
 gcc/hard-reg-set.h   |  33 ++
 gcc/ira-build.cc | 456 
 gcc/ira-color.cc | 851 ++---
 gcc/ira-conflicts.cc | 221 +++---
 gcc/ira-emit.cc  |  24 +-
 gcc/ira-int.h|  67 ++-
 gcc/ira-lives.cc | 507 --
 gcc/ira.cc   |  73 ++--
 gcc/lra-assigns.cc   | 111 -
 gcc/lra-coalesce.cc  |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h|  33 ++
 gcc/lra-lives.cc | 660 -
 gcc/lra-remat.cc |  13 +-
 gcc/lra-spills.cc|  22 +-
 gcc/lra.cc   | 139 +-
 gcc/regs.h   |   7 +
 gcc/subreg-live-range.cc | 628 +++
 gcc/subreg-live-range.h  | 333 +++
 gcc/timevar.def  |   1 +
 23 files changed, 4490 insertions(+), 777 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding

Hi Vladimir,

On 2023/11/10 4:24, Vladimir Makarov wrote:


On 11/7/23 22:47, Lehua Ding wrote:


Lehua Ding (7):
   ira: Refactor the handling of register conflicts to make it more
 general
   ira: Add live_subreg problem and apply to ira pass
   ira: Support subreg live range track
   ira: Support subreg copy
   ira: Add all nregs >= 2 pseudos to tracke subreg list
   lra: Apply live_subreg df_problem to lra pass
   lra: Support subreg live range track and conflict detect

Thank you very much for addressing subreg RA.  It is a big piece of work.  I 
wanted to address this a long time ago but had no time to do it myself.


I tried to evaluate your patches on x86-64 (i7-9700k) release mode GCC. 
I used -O3 for SPEC2017 compilation.


Here are the results:

    baseline baseline(+patches)
specint2017:  8.51 vs 8.58 (+0.8%)
specfp2017:   21.1 vs 21.1 (+0%)
compile time: 2426.41s vs 2580.58s (+6.4%)

Spec2017 average code size change: -0.07%

Improving specint by 0.8% is impressive for me.

Unfortunately, it is achieved by decreasing compilation speed by 6.4% 
(although on a smaller benchmark I saw only a 3% slowdown). I don't know 
how, but we should mitigate this speed degradation.  Maybe we can find a 
hot spot in the new code and improve it (I don't think it is the linear 
search pointed out by Richard Biener, as the object vectors most probably 
contain 1-2 elements), or we could use this only for -O3/fast, or make 
the code function or target dependent.


I also find that GCC consumes more memory with the patches. Maybe it can 
be improved too (although I am not sure about this).


Thanks for the specint performance data. I'll do my best to get the 
compile time and memory issues fixed. I'm very curious to know if the 
way used to solve the subreg coalesce problem makes sense to you?


I'll start to review the patches next week.  I don't expect that 
I'll find anything serious enough to reject the patches, but again we 
should work on mitigating the compilation speed problem.  We can file a 
new PR for this and resolve the problem during the release cycle.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding

Hi Dimitar,

I fixed the problem you reported in the V2 patches 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636166.html). 
Could you help confirm this? Thank you very much.


On 2023/11/9 0:56, Dimitar Dimitrov wrote:

On Wed, Nov 08, 2023 at 11:47:33AM +0800, Lehua Ding wrote:

Hi,

These patches add support for subreg coalescing to the
register allocation passes (ira and lra).


Hi Lehua,

This patch set breaks the build for at least three embedded targets. See
below.

For avr the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded ‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
   149 | set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);


For arm-none-eabi the newlib build fails with:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal compiler error: Floating point exception
   279 | }
   | ^
0x1176e0f crash_signal
 /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
0xf60312 new_insn_reg
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0xf6064d add_regs_to_insn_regno_info
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
0xf62909 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0xf62e46 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
0xf62e46 lra_push_insn_1
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0xf62f2d lra_push_insn(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0xf62f2d push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0xf63302 push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
0xf63302 lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0xf0e399 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xf0e399 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


For pru-elf the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c: In function 'linear_search_fdes':
/mnt/nvme/dinux/local-workspace/gcc/libgcc/unwind-dw2-fde.c:1035:1: internal compiler error: Floating point exception
  1035 | }
   | ^
0x1694f2e crash_signal
 /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0x1313178 get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0x131343a new_insn_reg
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0x13174f0 add_regs_to_insn_regno_info
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1608
0x1318479 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0x13196ab lra_push_insn_1
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0x13196de lra_push_insn(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0x13197da push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0x131b6dc lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0x129f237 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0x129f6c6 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148


The divide by zero error above is interesting. I'm not sure why
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the
following rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding ])) -1
     (nil))

Regards,
Dimitar



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[PATCH V2 6/7] lra: Switch to live_subreg data flow

2023-11-12 Thread Lehua Ding
This patch switches LRA from the live_reg data to the live_subreg data.
The change is more involved than in IRA because LRA also modifies this
data itself, so the live_subreg data has to be updated in place and
recalculated.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info):
Adjust to new live subreg data.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): Ditto.
* lra-lives.cc (class bb_data_pseudos): Ditto.
(make_hard_regno_live): Ditto.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Ditto.
(mark_regno_dead): Ditto.
(live_trans_fun): Ditto.
(live_con_fun_0): Ditto.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Ditto.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Ditto.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.

---
 gcc/lra-coalesce.cc|  20 ++-
 gcc/lra-constraints.cc |  93 +---
 gcc/lra-int.h  |   2 +
 gcc/lra-lives.cc   | 328 -
 gcc/lra-remat.cc   |  13 +-
 gcc/lra-spills.cc  |  22 ++-
 6 files changed, 374 insertions(+), 104 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  bitmap_and_compl_into (full, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+  bitmap_and_compl_into (partial, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
 }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+   DF_LIVE_SUBREG_PARTIAL_IN (bb));
+  update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+   DF_LIVE_SUBREG_PARTIAL_OUT (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+ bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+ bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+ subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, bi)
if (bitmap_bit_p (_regs, j))
- bitmap_set_bit (df_get_live_in (prev_bb), j);
-   else
- bitmap_clear_bit (df_get_live_in (prev_bb), j);
+ {
+   bitmap_set_bit (subreg_all_in, j);
+   bitmap_set_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
+   else if (bitmap_bit_p (subreg_all_in, j))
+ {
+   bitmap_clear_bit (subreg_all_in, j);
+   bitmap_clear_bit (subreg_full_in, j);
+   if 

[PATCH V2 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-12 Thread Lehua Ding
nd backward live
@@ -946,6 +983,7 @@ extern class df_d *df;
 #define df_note(df->problems_by_index[DF_NOTE])
 #define df_md  (df->problems_by_index[DF_MD])
 #define df_mir (df->problems_by_index[DF_MIR])
+#define df_live_subreg (df->problems_by_index[DF_LIVE_SUBREG])
 
 /* This symbol turns on checking that each modification of the cfg has
   been identified to the appropriate df routines.  It is not part of
@@ -1031,6 +1069,25 @@ extern void df_lr_add_problem (void);
 extern void df_lr_verify_transfer_functions (void);
 extern void df_live_verify_transfer_functions (void);
 extern void df_live_add_problem (void);
+extern void
+df_live_subreg_add_problem (void);
+extern void
+df_live_subreg_finalize (bitmap all_blocks);
+class subreg_range;
+extern bool
+need_track_subreg (int regno, machine_mode mode);
+extern void
+remove_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+machine_mode mode, const subreg_range );
+extern bool
+remove_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref);
+extern void
+add_subreg_range (basic_block_subreg_live_info *bb_info, unsigned int regno,
+ machine_mode mode, const subreg_range ,
+ bool is_def = false);
+extern bool
+add_subreg_range (basic_block_subreg_live_info *bb_info, df_ref ref,
+ bool is_def = false);
 extern void df_live_set_all_dirty (void);
 extern void df_chain_add_problem (unsigned int);
 extern void df_word_lr_add_problem (void);
@@ -1124,6 +1181,16 @@ df_lr_get_bb_info (unsigned int index)
 return NULL;
 }
 
+inline class df_live_subreg_bb_info *
+df_live_subreg_get_bb_info (unsigned int index)
+{
+  if (index < df_live_subreg->block_info_size)
+return &(
+  (class df_live_subreg_bb_info *) df_live_subreg->block_info)[index];
+  else
+return NULL;
+}
+
 inline class df_md_bb_info *
 df_md_get_bb_info (unsigned int index)
 {
diff --git a/gcc/regs.h b/gcc/regs.h
index aea093ed795..84c6bdb980c 100644
--- a/gcc/regs.h
+++ b/gcc/regs.h
@@ -389,4 +389,11 @@ range_in_hard_reg_set_p (const_hard_reg_set set, unsigned 
regno, int nregs)
   return true;
 }
 
+/* Return the number of blocks the MODE overlap. One block equal mode's natural
+   size. So, satisfy the following equation:
+ (nblocks - 1) * natural_size < GET_MODE_SIZE (mode)
+   <= nblocks * natural_size. */
+#define get_nblocks(mode)  
\
+  (exact_div (GET_MODE_SIZE (mode), REGMODE_NATURAL_SIZE (mode)).to_constant 
())
+
 #endif /* GCC_REGS_H */
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..43a5eafedf1
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,628 @@
+/* SUBREG live range track classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+#include "selftest.h"
+#include "print-rtl.h"
+
+/* class subreg_range */
+void
+subreg_range::dump (FILE *file) const
+{
+  fprintf (file, "[%d, %d)", start, end);
+}
+
+/* class subreg_ranges */
+bool
+subreg_ranges::add_range (int max, const subreg_range _range)
+{
+  subreg_range range = new_range;
+  if (full_p ())
+return false;
+  else if (max == 1)
+{
+  gcc_assert (range.start == 0 && range.end == 1);
+  make_full ();
+  return true;
+}
+
+  if (this->max == 1)
+change_max (max);
+
+  gcc_assert (this->max == max);
+  gcc_assert (range.start < range.end);
+
+  bool changed = empty_p ();
+  auto it = ranges.begin ();
+  while (it != ranges.end ())
+{
+  const subreg_range  = *it;
+  gcc_assert (r.start < r.end);
+
+  /* The possible positional relationship of R and RANGE.
+1~5 means R.start's possible position relative to RANGE
+A~G means R.end's possible position relative to RANGE
+caseN means when R.start at N positon, the R.end can be in which
+positions.
+
+RANGE.start RANGE.end
+ [   )
+ |   |
+   R.start   1   2   3   4   5
+   R

[PATCH V2 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list

2023-11-12 Thread Lehua Ding
This patch relaxes the subreg tracking restriction so that all pseudos
occupying two or more hard registers are tracked.

gcc/ChangeLog:

* ira-build.cc (get_reg_unit_size): New.
(has_same_nregs): New.
(ira_set_allocno_class): Adjust.

---
 gcc/ira-build.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 13f0f7336ed..f88aaef 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return single register size of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS results is
+   same. It should be noted that some targets may not implement these two very
+   uniformly, and need to be debugged step by step. For example, in V3x1DI mode
+   in AArch64, TARGET_CLASS_MAX_NREGS returns 2 but TARGET_HARD_REGNO_NREGS
+   returns 3. They are in conflict and need to be repaired in the Hook of
+   AArch64.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (REGNO_REG_CLASS (i) != NO_REGS
+   && reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+   && ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+  return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class 
aclass)
 
   if (aclass == NO_REGS)
 return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set unit size and track_subreg_p flag for pseudo which need occupied multi
+ hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
 {
-  ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+  ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
   ALLOCNO_TRACK_SUBREG_P (a) = true;
   return;
 }
-- 
2.36.3



[PATCH V2 7/7] lra: Support subreg live range track and conflict detect

2023-11-12 Thread Lehua Ding
This patch supports tracking the liveness of a subreg in the LRA pass, with the
goal of making LRA agree with IRA's register allocation scheme. There is some
duplication; in the future this part of the code logic could perhaps be harmonized.

gcc/ChangeLog:

* ira-build.cc (setup_pseudos_has_subreg_object):
Collect new data for lra to use.
(ira_build): Ditto.
* lra-assigns.cc (set_offset_conflicts): New function.
(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
(lra_assign): Ditto.
* lra-constraints.cc (process_alt_operands): Ditto.
* lra-int.h (GCC_LRA_INT_H): Ditto.
(struct lra_live_range): Ditto.
(struct lra_insn_reg): Ditto.
(get_range_hard_regs): New.
(get_nregs): New.
(has_subreg_object_p): New.
* lra-lives.cc (INCLUDE_VECTOR): Adjust.
(lra_live_range_pool): Ditto.
(create_live_range): Ditto.
(lra_merge_live_ranges): Ditto.
(update_pseudo_point): Ditto.
(mark_regno_live): Ditto.
(mark_regno_dead): Ditto.
(process_bb_lives): Ditto.
(remove_some_program_points_and_update_live_ranges): Ditto.
(lra_print_live_range_list): Ditto.
(class subreg_live_item): New.
(create_subregs_live_ranges): New.
(lra_create_live_ranges_1): Ditto.
* lra.cc (get_range_blocks): Ditto.
(get_range_hard_regs): Ditto.
(new_insn_reg): Ditto.
(collect_non_operand_hard_regs): Ditto.
(initialize_lra_reg_info_element): Ditto.
(reg_same_range_p): New.
(add_regs_to_insn_regno_info): Adjust.

---
 gcc/ira-build.cc   |  31 
 gcc/lra-assigns.cc | 111 --
 gcc/lra-constraints.cc |  18 ++-
 gcc/lra-int.h  |  31 
 gcc/lra-lives.cc   | 340 ++---
 gcc/lra.cc | 139 +++--
 6 files changed, 585 insertions(+), 85 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index f88aaef..bb29627d375 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
basic block.  */
 static int last_basic_block_before_change;
 
+/* Record these pseudos which has subreg object. Used by LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
the member loop_num.  */
 static void
@@ -3711,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
 }
 }
 
+/* Setup speudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (_has_subreg_object, _obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+if (has_subreg_object_p (a))
+  {
+   bitmap_set_bit (_has_subreg_object, ALLOCNO_REGNO (a));
+   if (ira_dump_file != NULL)
+ {
+   fprintf (ira_dump_file,
+"  a%d(r%d, nregs: %d) has subreg objects:\n",
+ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+   ira_allocno_object_iterator oi;
+   ira_object_t obj;
+   FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+ fprintf (ira_dump_file, "object %d: start: %d, nregs: %d\n",
+  OBJECT_INDEX (obj), OBJECT_START (obj),
+  OBJECT_NREGS (obj));
+   fprintf (ira_dump_file, "\n");
+ }
+  }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
loop tree nodes).  The function returns TRUE if we generate loop
structure (besides nodes representing all function and the basic
@@ -3731,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1131,6 +1131,52 @@ assign_hard_regno (int hard_regno, int regno)
 /* Array used for sorting different pseudos.  */
 static int *sorted_pseudos;
 
+/* The detail conflict offsets If two live ranges conflict. Use to record
+   partail conflict.  */
+static bitmap_head live_range_conflicts;
+
+/* Set the conflict offset of the two registers REGNO1 and REGNO2. Use the
+   regno with bigger nregs as the base.  */
+static void
+set_offset_conflicts (int regno1, int regno2)
+{
+  gcc_assert (reg_renumber[regno1] >= 0 && reg_renumber[regno2] >= 0);
+  int nregs1 = get_nregs (regno1);
+  int nregs2 = get_nregs (regno2);
+  if (nregs1 < nregs2)
+{
+  std::swap (nregs1, nregs2);
+  std::swap (regno1, regno2);
+}
+
+  lra_live_range_t r1 = lra_reg_info[regno1].live_ranges;
+  lra_live_range_t r2 = lra_reg_info[regno2].live_ranges;
+  int total = nregs1;
+
+  bitmap_clear (_range_conflicts);
+  while (r1 != 

[PATCH V2 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Lehua Ding
   mov z26.d, z4.d
mov z27.d, z5.d
mov z28.d, z6.d
mov z29.d, z7.d
cmp w1, 0
...
```

After these patches:

```
bar:
ld4d{z28.d - z31.d}, p0/z, [x0]
cmp     w1, 0
    ...
```

Lehua Ding (7):
  df: Add DF_LIVE_SUBREG problem
  ira: Switch to live_subreg data
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Switch to live_subreg data flow
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 889 ++-
 gcc/df.h |  67 +++
 gcc/hard-reg-set.h   |  33 ++
 gcc/ira-build.cc | 456 
 gcc/ira-color.cc | 851 ++---
 gcc/ira-conflicts.cc | 221 +++---
 gcc/ira-emit.cc  |  24 +-
 gcc/ira-int.h|  67 ++-
 gcc/ira-lives.cc | 507 --
 gcc/ira.cc   |  73 ++--
 gcc/lra-assigns.cc   | 111 -
 gcc/lra-coalesce.cc  |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h|  33 ++
 gcc/lra-lives.cc | 660 -
 gcc/lra-remat.cc |  13 +-
 gcc/lra-spills.cc|  22 +-
 gcc/lra.cc   | 139 +-
 gcc/regs.h   |   7 +
 gcc/subreg-live-range.cc | 628 +++
 gcc/subreg-live-range.h  | 333 +++
 gcc/timevar.def  |   1 +
 23 files changed, 4490 insertions(+), 777 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



[PATCH V2 4/7] ira: Support subreg copy

2023-11-12 Thread Lehua Ding
This patch changes copies so that they are created between objects rather
than between allocnos.

gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
 gcc/ira-build.cc | 154 +++-
 gcc/ira-color.cc | 541 +++
 gcc/ira-conflicts.cc | 173 +++---
 gcc/ira-emit.cc  |  10 +-
 gcc/ira-int.h|  10 +-
 gcc/ira.cc   |   5 +-
 6 files changed, 646 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -520,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, 
poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return object in allocno A for REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
found.  */
 ira_object_t
@@ -1503,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
 {
-  if (cp->first == a1)
+  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+  if (first_a == a1)
{
  next_cp = cp->next_first_allocno_copy;
- another_a = cp->second;
+ if (cp->first == obj1)
+   another_obj = cp->second;
+ else
+   continue;
}
-  else if (cp->second == a1)
+  else if (second_a == a1)
{
  next_cp = cp->next_second_allocno_copy;
- another_a = cp->first;
+ if (cp->second == obj1)
+   another_obj = cp->first;
+ else
+   continue;
}
   else
gcc_unreachable ();
-  if (another_a == a2 && 

[PATCH V2 3/7] ira: Support subreg live range track

2023-11-12 Thread Lehua Ding
This patch supports tracking subreg liveness. It first extends
ira_object_t objects[2] to a std::vector of objects, which can hold
an arbitrary number of objects and is used to collect every access
via subreg in the program, together with the partial_in and
partial_out sets of each basic block's live in/out.

It then changes how conflicts between registers are detected. For
example, if object a conflicts with object b, the offset and size of
each object relative to the allocno it belongs to must be taken into
account when computing the conflicting registers between the two
allocnos.
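
The offset-aware conflict rule described above can be illustrated with a
small standalone sketch. This is not code from the patch: object_slice and
forbidden_starts are hypothetical names, hard registers are simply numbered
0..63, and a plain 64-bit mask stands in for HARD_REG_SET.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IRA object: a slice of its allocno,
// measured in hard-register units from the allocno's first register.
struct object_slice
{
  int start;  // first register unit covered, relative to the allocno
  int nregs;  // number of register units covered
};

// Return a bit mask of start registers that allocno A must avoid, given
// that slice A_OBJ of A conflicts with slice B_OBJ of an allocno already
// assigned to start at hard register B_HARD_REGNO.  Bit H set means
// "A may not start at hard register H".  (Assumption: registers 0..63.)
uint64_t
forbidden_starts (object_slice a_obj, object_slice b_obj, int b_hard_regno)
{
  assert (a_obj.nregs > 0 && b_obj.nregs > 0);

  // Hard registers actually occupied by the conflicting slice of B.
  int b_lo = b_hard_regno + b_obj.start;
  int b_hi = b_lo + b_obj.nregs;        // half-open upper bound

  uint64_t mask = 0;
  for (int h = 0; h < 64; h++)
    {
      int a_lo = h + a_obj.start;
      int a_hi = a_lo + a_obj.nregs;    // half-open upper bound
      if (a_lo < b_hi && b_lo < a_hi)   // the two slices overlap
        mask |= uint64_t (1) << h;
    }
  return mask;
}
```

With whole-allocno conflicts, two four-register allocnos exclude seven start
positions from each other. When only the upper half of one conflicts with
the first register of the other, the sketch excludes just two, which is the
kind of freedom subreg tracking is meant to expose.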

gcc/ChangeLog:

* hard-reg-set.h (struct HARD_REG_SET): New shift operator.
* ira-build.cc (ira_create_object): Adjust.
(find_object): New.
(find_object_anyway): New.
(ira_create_allocno): Adjust.
(get_range): New.
(ira_copy_allocno_objects): New.
(merge_hard_reg_conflicts): Adjust copy.
(create_cap_allocno): Adjust.
(find_subreg_p): New.
(add_subregs): New.
(create_insn_allocnos): Collect subreg.
(create_bb_allocnos): Ditto.
(move_allocno_live_ranges): Adjust.
(copy_allocno_live_ranges): Adjust.
(setup_min_max_allocno_live_range_point): Adjust.
* ira-color.cc (INCLUDE_MAP): include map.
(setup_left_conflict_sizes_p): Adjust conflict size.
(setup_profitable_hard_regs): Adjust.
(get_conflict_and_start_profitable_regs): Adjust.
(check_hard_reg_p): Adjust conflict check.
(assign_hard_reg): Adjust.
(push_allocno_to_stack): Adjust conflict size.
(improve_allocation): Adjust.
* ira-conflicts.cc (record_object_conflict): Simplify.
(build_object_conflicts): Adjust.
(build_conflicts): Adjust.
(print_allocno_conflicts): Adjust.
* ira-emit.cc (modify_move_list): Adjust.
* ira-int.h (struct ira_object): Adjust struct.
(struct ira_allocno): Adjust struct.
(ALLOCNO_NUM_OBJECTS): New accessor.
(ALLOCNO_UNIT_SIZE): Ditto.
(ALLOCNO_TRACK_SUBREG_P): Ditto.
(ALLOCNO_NREGS): Ditto.
(OBJECT_SUBWORD): Ditto.
(OBJECT_INDEX): Ditto.
(OBJECT_START): Ditto.
(OBJECT_NREGS): Ditto.
(find_object): Exported.
(find_object_anyway): Ditto.
(ira_copy_allocno_objects): Ditto.
(has_subreg_object_p): Ditto.
(get_full_object): Ditto.
* ira-lives.cc (INCLUDE_VECTOR): Include vector.
(add_onflict_hard_regs): New.
(add_onflict_hard_reg): New.
(make_hard_regno_dead): Adjust.
(make_object_live): Adjust.
(update_allocno_pressure_excess_length): Adjust.
(make_object_dead): Adjust.
(mark_pseudo_regno_live): Adjust.
(add_subreg_point): New.
(mark_pseudo_object_live): Adjust.
(mark_pseudo_regno_subword_live): Adjust.
(mark_pseudo_regno_subreg_live): Adjust.
(mark_pseudo_regno_subregs_live): Adjust.
(mark_pseudo_reg_live): Adjust.
(mark_pseudo_regno_dead): Adjust.
(mark_pseudo_object_dead): Adjust.
(mark_pseudo_regno_subword_dead): Adjust.
(mark_pseudo_regno_subreg_dead): Adjust.
(mark_pseudo_reg_dead): Adjust.
(process_single_reg_class_operands): Adjust.
(process_out_of_region_eh_regs): Adjust.
(add_conflict_from_region_landing_pads): Adjust.
(process_bb_node_lives): Adjust.
(class subreg_live_item): New class.
(create_subregs_live_ranges): New function.
(ira_create_allocno_live_ranges): Adjust.
* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h   |  33 +++
 gcc/ira-build.cc | 235 +---
 gcc/ira-color.cc | 302 +-
 gcc/ira-conflicts.cc |  48 ++---
 gcc/ira-emit.cc  |   2 +-
 gcc/ira-int.h|  57 -
 gcc/ira-lives.cc | 500 ---
 gcc/ira.cc   |  52 ++---
 8 files changed, 907 insertions(+), 322 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
 return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+if (shift_amount == 0)
+  return *this;
+
+HARD_REG_SET res;
+unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+if (shift_amount >= total_bits)
+  {
+   unsigned int n_elt = shift_amount % total_bits;
+   shift_amount -= n_elt * total_bits;
+   for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt - 1; i += 1)
+ res.elts[i] = elts[i + n_elt];
+   /* clear upper n_elt elements.  */
+   for (unsigned int i = 0; i < n_elt; i += 1)
+ res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
+  }
+
+if (shift_amount > 0)
+  {
+   /* The left bits of an element 
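
The hunk above is cut off in the archive, so the following is only an
independent sketch of a multi-word right shift, not the patch's code.
word_set, N_WORDS and shift_right are hypothetical names. Note that the
number of whole elements to skip is the quotient shift / word_bits, and the
remainder is the residual bit shift.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical multi-word bit set; element 0 holds bits 0..63.
constexpr std::size_t N_WORDS = 4;
using word_set = std::array<uint64_t, N_WORDS>;

word_set
shift_right (const word_set &in, unsigned int shift)
{
  const unsigned int word_bits = 64;
  const std::size_t word_shift = shift / word_bits;  // whole words to drop
  const unsigned int bit_shift = shift % word_bits;  // remaining bit offset

  word_set out{};  // zero-initialized, so the upper words end up cleared
  for (std::size_t i = 0; i + word_shift < N_WORDS; i++)
    {
      uint64_t lo = in[i + word_shift] >> bit_shift;
      uint64_t hi = 0;
      // Bring in the low bits of the next word; skip when bit_shift is 0
      // because shifting a 64-bit value by 64 is undefined.
      if (bit_shift != 0 && i + word_shift + 1 < N_WORDS)
        hi = in[i + word_shift + 1] << (word_bits - bit_shift);
      out[i] = lo | hi;
    }
  return out;
}
```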

[PATCH V2 2/7] ira: Switch to live_subreg data

2023-11-12 Thread Lehua Ding
This patch switches from live_reg data to live_subreg data.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 16 +---
 5 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..f931c6e304c 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1919,7 +1919,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_IN (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1935,9 +1936,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = DF_LIVE_SUBREG_IN (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (DF_LIVE_SUBREG_OUT (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..4aa3e316282 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2783,8 +2783,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno
+   || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+   && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2792,8 +2792,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   auto_vec edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (DF_LIVE_SUBREG_OUT (e->src), regno)
+   && bitmap_bit_p (DF_LIVE_SUBREG_IN (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index bcc4f09f7c4..84ed482e568 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = DF_LIVE_SUBREG_IN (e->dest);
+  regs_live_out_src = DF_LIVE_SUBREG_OUT (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, DF_LIVE_SUBREG_IN (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, DF_LIVE_SUBREG_OUT (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, DF_LIVE_SUBREG_IN (e->dest),
+ DF_LIVE_SUBREG_OUT (bb));
  add_range_and_copies_from_move_list
((move_t) e->aux, node, live_through,
 REG_FREQ_FROM_EDGE_FREQ 

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-11 Thread Lehua Ding

Hi Dimitar,

On 2023/11/11 0:00, Dimitar Dimitrov wrote:

On Fri, Nov 10, 2023 at 04:53:57PM +0800, Lehua Ding wrote:

The divide by zero error above is interesting. I'm not sure why
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in
the following rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [
encoding ])) -1
   (nil))


I just cross-compiled an arm-none-eabi compiler and didn't encounter
this error. Can you give me a little more config info about the build? For
example, flags_for_target, etc. Thanks again.



I forgot: please also provide the version information for the newlib code.



These are the GIT commit hashes which I tested:
   gcc 39d81b667373b0033f44702a4b532a4618dde9ff
   binutils c96ceed9dce7617f270aa4742645706e535f74b7
   newlib 39f734a857e2692224715b03b99fc7bd83e94a0f

This is the script I'm using to build arm-none-eabi:
https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-arm.sh
The build steps and config parameters are easily seen there.

Note that the Linaro CI is also detecting issues. It hits ICEs when
building libgcc:
   
https://patchwork.sourceware.org/project/gcc/patch/20231108034740.834590-8-lehua.d...@rivai.ai/


Thanks so much for the information, I can reproduce the problem now! I
will fix these bugs in the V2 patches.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

On 2023/11/10 18:16, Richard Sandiford wrote:

Lehua Ding  writes:

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.


Yes, I think init-regs should be enhanced to reduce unnecessary
initialization. My previous internal patches did this in a separate
patch. Maybe I should split the live_subreg problem out of the second
patch and not couple it with these patches. That way it can be reviewed
separately.


But my point was that this kind of dead code is possible even without
init-regs.  So I think we should have something that removes the dead
code.  And we can try it on that PR (without changing init-regs).


Got it, so we should add a fast dead-code removal step after the init-regs pass.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Jeff,

On 2023/11/9 3:13, Jeff Law wrote:
The other thing to ponder.  Jivan and I have been banging on Joern's 
sub-object tracking bits for a totally different problem in the RISC-V 
space.  But there may be some overlap.


Essentially Joern's code tracks liveness for a few chunks in registers. 
bits 0..7, bits 8..15, bits 16..31 and bits 32..63.  This includes 
propagating liveness from the destination through to the sources.  So
for example if we have


(set (reg:SI dest) (plus:SI (srcreg1:SI) (srcreg2:SI)))

If we had previously determined that only bits 0..15 were live in DEST, 
then we'll propagate that into the source registers.


The goal is to ultimately transform something like

(set (dest:mode) (any_extend:mode (reg:narrower_mode)))

into

(set (dest:mode) (subreg:mode (reg:narrower_mode)))

Where the latter typically will get simplified and propagated away.
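
The chunk propagation described above can be sketched as follows. This is
not Joern's code: the chunk enum, plus_source_live and
extension_is_redundant are hypothetical names, and the rule for when an
extension can be dropped is an assumption inferred from the description
(the bits above the narrower value are not live in the destination).

```cpp
#include <cassert>
#include <cstdint>

// One bit per tracked chunk, following the four chunks named above.
enum chunk : uint8_t
{
  BITS_0_7 = 1 << 0,
  BITS_8_15 = 1 << 1,
  BITS_16_31 = 1 << 2,
  BITS_32_63 = 1 << 3
};

// Live chunks of each source of an addition, given the live chunks of its
// destination.  A carry only travels upward, so a source chunk matters
// when it is at or below the highest live destination chunk.
uint8_t
plus_source_live (uint8_t dest_live)
{
  uint8_t src_live = 0;
  for (uint8_t bit = BITS_32_63; bit != 0; bit >>= 1)
    if (dest_live & bit)
      {
        src_live = uint8_t ((bit << 1) - 1);  // this chunk and all below it
        break;
      }
  return src_live;
}

// True when no live destination chunk lies outside NARROW_CHUNKS, i.e. the
// bits produced by the extension itself are never read (assumed rule).
bool
extension_is_redundant (uint8_t dest_live, uint8_t narrow_chunks)
{
  return (dest_live & ~narrow_chunks) == 0;
}
```

For the SImode addition in the example, a destination with only bits 0..15
live gives sources with only bits 0..15 live.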


Joern's code is a bit of a mess, but Jivan and I are slowly untangling 
it from a correctness standpoint.  It'll also need the usual cleanups.


Anyway, point being I think it'll be worth looking at Lehua's bits and 
Joern's bits to see if there's anything that can and should be shared. 
Given I'm getting fairly familiar with Joern's bits, that likely falls 
to me.


Maybe the subreg live range tracking classes (in patch 2) could be shared. 
The range operations, such as union and difference, should be similar. 
I'll see whether I can extract them into a separate patch so that this 
part can be reviewed on its own. What do you think?
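
To make the idea of shared range operations concrete, here is a minimal sketch of a range set with union and difference. `block_ranges` is a hypothetical stand-in, not the class from patch 2: it assumes a pseudo is split into at most 64 equally sized blocks and keeps one bit per block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a subreg live range set: bit I set means that
// block I of the pseudo is live.
class block_ranges
{
public:
  block_ranges () = default;

  // Half-open block range [START, END), with END <= 64.
  block_ranges (unsigned start, unsigned end)
    : m_bits (low_mask (end) & ~low_mask (start))
  {
    assert (start <= end && end <= 64);
  }

  block_ranges union_with (const block_ranges &o) const
  { return from_bits (m_bits | o.m_bits); }

  block_ranges diff (const block_ranges &o) const
  { return from_bits (m_bits & ~o.m_bits); }

  bool empty_p () const { return m_bits == 0; }
  bool operator== (const block_ranges &o) const { return m_bits == o.m_bits; }

private:
  // Mask with the low N bits set.
  static std::uint64_t low_mask (unsigned n)
  { return n >= 64 ? ~std::uint64_t (0) : (std::uint64_t (1) << n) - 1; }

  static block_ranges from_bits (std::uint64_t bits)
  {
    block_ranges r;
    r.m_bits = bits;
    return r;
  }

  std::uint64_t m_bits = 0;
};
```

A bitmask keeps both operations to a single instruction, at the cost of a fixed upper bound on the number of blocks; a list of intervals would lift that bound but make union and difference more involved.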


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding




On 2023/11/8 11:55, juzhe.zh...@rivai.ai wrote:

Thanks Lehua.

Thanks for all the work on supporting subreg liveness tracking.

One nit: I think you should mention the following PRs:

106694
89967
106146
99161

No need to send V2 now. You can send V2 after Richard and Vlad have reviewed.

Okay, thanks :)

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Richard,

On 2023/11/8 17:40, Richard Sandiford wrote:

Tracking subreg liveness will sometimes expose dead code that
wasn't obvious without it.  PR89606 has an example of this.
There the dead code was introduced by init-regs, and there's a
debate about (a) whether init-regs should still be run and (b) if it
should still be run, whether it should use subreg liveness tracking too.

But I think such dead code is possible even without init-regs.
So for the purpose of this series, I think the init-regs behaviour
in that PR creates a helpful example.


Yes, I think init-regs should be enhanced to reduce unnecessary 
initialization. My previous internal patches did this in a separate 
patch. Maybe I should split the live_subreg problem out of the second 
patch and not couple it with these patches. That way it can be reviewed 
separately.



I agree with Richi of course that compile-time is a concern.
The patch seems to add quite a bit of new data to ira_allocno,
but perhaps that's OK.  ira_object + ira_allocno is already quite big.

However:

@@ -387,8 +398,8 @@ struct ira_allocno
/* An array of structures describing conflict information and live
   ranges for each object associated with the allocno.  There may be
   more than one such object in cases where the allocno represents a
- multi-word register.  */
-  ira_object_t objects[2];
+ multi-hardreg pseudo.  */
+  std::vector<ira_object_t> objects;
/* Registers clobbered by intersected calls.  */
 HARD_REG_SET crossed_calls_clobbered_regs;
/* Array of usage costs (accumulated and the one updated during

adds an extra level of indirection (and separate extra storage) for
every allocno, not just multi-hardreg ones.  It'd be worth optimising
the data structures' representation of single-hardreg pseudos even if
that slows down the multi-hardreg code, since single-hardreg pseudos are
so much more common.  And the different single-hardreg and multi-hardreg
representations could be hidden behind accessors, to make life easier
for consumers.  (Of course, performance of the accessors is also then
an issue. :))
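
One possible shape for such accessors, as a sketch only: the single-object case is stored inline and the vector is allocated only once a second object is added. `object_stub` and `object_list` are hypothetical names, not part of the posted patch or of IRA.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ira_object.
struct object_stub
{
  int start;
  int nregs;
};

// Keeps one object pointer inline and only allocates separate storage when
// a second object is added.
class object_list
{
public:
  void push_back (object_stub *obj)
  {
    if (!m_extra && m_single == nullptr)
      {
        m_single = obj; // Common case: no allocation, no indirection.
        return;
      }
    if (!m_extra)
      {
        m_extra = std::make_unique<std::vector<object_stub *>> ();
        m_extra->push_back (m_single);
      }
    m_extra->push_back (obj);
  }

  std::size_t size () const
  { return m_extra ? m_extra->size () : (m_single ? 1 : 0); }

  object_stub *operator[] (std::size_t i) const
  {
    assert (i < size ());
    return m_extra ? (*m_extra)[i] : m_single;
  }

private:
  object_stub *m_single = nullptr;
  std::unique_ptr<std::vector<object_stub *>> m_extra;
};
```

Consumers only see `size ()` and `operator[]`, so the representation can change later without touching them. Each access pays for one extra branch, which is the accessor cost mentioned above.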


Okay, I'll try. Thank you so much.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding
The divide by zero error above is interesting. I'm not sure why 
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the 
following rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ 
encoding ])) -1

  (nil))


I just cross-compiled an arm-none-eabi compiler and didn't encounter 
this error. Could you give me a little more configuration information 
about your build, for example flags_for_target? Thanks again.




I forgot: please also provide the version of the newlib sources you used.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Lehua Ding

Hi Dimitar,

Thanks for the tests.


This patch set breaks the build for at least three embedded targets. See
below.

For avr the GCC build fails with:
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira-lives.cc:149:39: error: call of overloaded 
‘set_subreg_conflict_hard_regs(ira_allocno*&, int&)’ is ambiguous
   149 | set_subreg_conflict_hard_regs (OBJECT_ALLOCNO (obj), regno);


I think it's because `HARD_REG_SET` and `unsigned int` are the same 
type on the avr target (i.e. it has no more than 32 registers), so the 
two function prototypes below conflict. I'll adjust it.


static void
set_subreg_conflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)

static void
set_subreg_conflict_hard_regs (ira_allocno_t a, unsigned int regno)
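
A sketch of one way to sidestep the problem, under the assumption that on targets with few hard registers `HARD_REG_SET` is a plain unsigned integer typedef instead of a struct. In that case a call with an `int` argument needs an integral conversion for either overload, and neither is preferred. The names `reg_set_scalar`, `add_conflict_regs` and `add_conflict_regno` are hypothetical; the sketch simply gives the two operations different names so overload resolution is never involved.

```cpp
#include <cassert>

// Assumption: stand-in for HARD_REG_SET on a target where it is a scalar.
typedef unsigned long reg_set_scalar;

// Add a whole set of hard registers to CONFLICTS.
inline reg_set_scalar
add_conflict_regs (reg_set_scalar conflicts, reg_set_scalar regs)
{
  return conflicts | regs;
}

// Add the single hard register REGNO to CONFLICTS.  If this function shared
// its name with add_conflict_regs, a call such as f (conflicts, some_int)
// would be ambiguous: int -> unsigned long and int -> unsigned int are both
// integral conversions of equal rank.
inline reg_set_scalar
add_conflict_regno (reg_set_scalar conflicts, unsigned int regno)
{
  return conflicts | (reg_set_scalar (1) << regno);
}
```

Another option would be to keep the overloads and cast at every call site, but distinct names keep the behaviour identical on targets where `HARD_REG_SET` is a struct.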


For arm-none-eabi the newlib build fails with:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libm/math/e_jn.c:279:1: internal 
compiler error: Floating point exception
   279 | }
   | ^
0x1176e0f crash_signal
 /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:316
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:609
0xf6008d get_range_hard_regs(int, subreg_range const&)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:601
0xf60312 new_insn_reg
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:658
0xf6064d add_regs_to_insn_regno_info
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1623
0xf62909 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1769
0xf62e46 lra_update_insn_regno_info(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1762
0xf62e46 lra_push_insn_1
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1919
0xf62f2d lra_push_insn(rtx_insn*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1927
0xf62f2d push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1970
0xf63302 push_insns
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:1966
0xf63302 lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2511
0xf0e399 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xf0e399 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148

The divide by zero error above is interesting. I'm not sure why 
ira_reg_class_max_nregs[] yields 0 for the pseudo register 168 in the following 
rtx:
(debug_insn 168 167 169 19 (var_location:SI encoding (reg/v:SI 168 [ encoding 
])) -1
  (nil))


I just cross-compiled an arm-none-eabi compiler and didn't encounter 
this error. Could you give me a little more configuration information 
about your build, for example flags_for_target? Thanks again.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-09 Thread Lehua Ding

Hi Kito,

On 2023/11/9 17:21, Kito Cheng wrote:

Do we need a zero-ext version as well?


It's not needed at the moment, since the sign_extend is currently used 
for both int32_t and uint32_t. I can't find a case where zero_extend 
would occur.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread Lehua Ding

Committed, thanks Juzhe.

On 2023/11/8 21:29, juzhe.zhong wrote:

lgtm
 Replied Message 
From: Lehua Ding <lehua.d...@rivai.ai>
Date: 11/08/2023 21:27
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, kito.ch...@gmail.com, rdapp@gmail.com,
    pal...@rivosinc.com, jeffreya...@gmail.com, lehua.d...@rivai.ai
Subject: [PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[PATCH] RISC-V: Removed unnecessary sign-extend for vsetvl

2023-11-08 Thread Lehua Ding
Hi,

This patch tries to combine the two insns below and then remove the
unnecessary sign_extend operation. This optimization is borrowed
from LLVM (https://godbolt.org/z/4f6v56xej):
  (set (reg:DI 134 [ _1 ])
   (unspec:DI [
   (const_int 19 [0x13])
   (const_int 8 [0x8])
   (const_int 5 [0x5])
   (const_int 2 [0x2]) repeated x2
   ] UNSPEC_VSETVL))
  (set (reg/v:DI 135 [  ])
  (sign_extend:DI (subreg:SI (reg:DI 134 [ _1 ]) 0)))

The sign_extend can be removed because the vl value currently returned by
the vsetvl instruction ranges from 0 to 65536, which fits in 17 bits, so
bits 17 to 63 (including bit 31) are always 0 and sign_extend changes
nothing. Note that we cannot do this for HI and QI modes.
Of course, if the range of values returned by vsetvl later expands
to 32 bits, then this combine pattern needs to be removed. But that could be
a long time from now.
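
The argument can be checked exhaustively with a small model. This is only a sketch: `sext_low32` is a hypothetical helper that models `(sign_extend:DI (subreg:SI x 0))` on a 64-bit value, and the range 0..65536 is the one stated above.

```cpp
#include <cassert>
#include <cstdint>

// Model of (sign_extend:DI (subreg:SI x 0)): keep the low 32 bits of X and
// sign-extend them back to 64 bits.
inline std::uint64_t
sext_low32 (std::uint64_t x)
{
  std::int32_t low = static_cast<std::int32_t> (static_cast<std::uint32_t> (x));
  return static_cast<std::uint64_t> (static_cast<std::int64_t> (low));
}

// True when the sign extension leaves every value in [0, 65536] unchanged.
inline bool
sext_is_noop_for_vl_range ()
{
  for (std::uint64_t vl = 0; vl <= 65536; ++vl)
    if (sext_low32 (vl) != vl)
      return false;
  return true;
}
```

The same model shows why the pattern would have to go if vl could ever have bit 31 set: `sext_low32 (0x80000000)` no longer equals its input.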

gcc/ChangeLog:

* config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
New combine pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_int.c: New test.

---
 gcc/config/riscv/vector.md| 41 +++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl_int.c  | 31 ++
 2 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e23f64938b7..d1499d330ff 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1604,6 +1604,47 @@
   [(set_attr "type" "vsetvl")
(set_attr "mode" "SI")])
 
+;; This pattern is used to combine the two insns below and then remove the
+;; unnecessary sign_extend operation:
+;;   (set (reg:DI 134 [ _1 ])
+;;(unspec:DI [
+;;(const_int 19 [0x13])
+;;(const_int 8 [0x8])
+;;(const_int 5 [0x5])
+;;(const_int 2 [0x2]) repeated x2
+;;] UNSPEC_VSETVL))
+;;   (set (reg/v:DI 135 [  ])
+;;   (sign_extend:DI (subreg:SI (reg:DI 134 [ _1 ]) 0)))
+;;
+;; The sign_extend can be removed because the vl value currently returned by
+;; the vsetvl instruction ranges from 0 to 65536, which fits in 17 bits, so
+;; bits 17 to 63 (including bit 31) are always 0 and sign_extend changes
+;; nothing. Note that we cannot do this for HI and QI modes.
+;; Of course, if the range of values returned by vsetvl later expands
+;; to 32 bits, then this combine pattern needs to be removed. But that could be
+;; a long time from now.
+(define_insn_and_split "*vsetvldi_no_side_effects_si_extend"
+  [(set (match_operand:DI 0 "register_operand")
+(sign_extend:DI
+  (subreg:SI
+   (unspec:DI [(match_operand:P 1 "csr_operand")
+   (match_operand 2 "const_int_operand")
+   (match_operand 3 "const_int_operand")
+   (match_operand 4 "const_int_operand")
+   (match_operand 5 "const_int_operand")] UNSPEC_VSETVL) 
0)))]
+  "TARGET_VECTOR && TARGET_64BIT"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(unspec:DI [(match_dup 1)
+(match_dup 2)
+(match_dup 3)
+(match_dup 4)
+(match_dup 5)] UNSPEC_VSETVL))]
+  ""
+  [(set_attr "type" "vsetvl")
+   (set_attr "mode" "SI")])
+
 ;; RVV machine description matching format
 ;; (define_insn ""
 ;;   [(set (match_operand:MODE 0)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c
new file mode 100644
index 000..4cdd5877742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_int.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d" } */
+
+#include "riscv_vector.h"
+
+void bar1 (int32_t a);
+
+int32_t
+foo1 ()
+{
+  int32_t a = __riscv_vsetvl_e8mf8(19);
+  bar1 (a);
+  return a;
+}
+
+void bar2 (uint32_t a);
+
+uint32_t
+foo2 ()
+{
+  uint32_t a = __riscv_vsetvl_e8mf8(19);
+  bar2 (a);
+  return a;
+}
+
+int32_t foo3 ()
+{
+  return __riscv_vsetvl_e8mf8(19);
+}
+
+/* { dg-final { scan-assembler-not {sext\.w} { target { no-opts "-O0" "-g" } } 
} } */
-- 
2.36.3



Re: [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general

2023-11-08 Thread Lehua Ding

Hi Richard,

Thanks for taking the time to review the code.

On 2023/11/8 15:57, Richard Biener wrote:

On Wed, Nov 8, 2023 at 4:48 AM Lehua Ding  wrote:


This patch does not make any functional changes. It mainly refactors two parts:

1. The ira_allocno's objects field is expanded to a scalable array, and
multi-word pseudo registers are split and tracked only when necessary.
2. Since the objects array has been expanded, more subreg objects will
pass through later, rather than the previous fixed two. Therefore, the
check for whether two objects conflict has to be modified: the new check
maps the registers occupied by each object back to the first register of
the allocno and compares there.


Did you profile this before/after?  RA performance is critical ...


Based on the data I ran earlier, the performance changes on spec2017 
were very slight. I'll run it again and give you the data. I expect the 
impact on existing performance to be minimal, except for examples like 
the ones I put up.



diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
  return !operator== (other);
}

+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const


This is a quite costly operation, why do we need it instead
of keeping an "offset" for set queries?


Because there are logic operations after the shift. A multi-hardreg 
pseudo register records the conflicting physical registers for each of 
its parts, and each part sits at a different offset. We need to 
normalize these differences into conflicts against the first single 
register of the pseudo register. That is, we first convert each part's 
conflicts into conflicts against first_single_reg, and then collect 
all the conflicting registers with an OR operation, like this:


*start_conflict_regs |= OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) >> 
(OBJECT_START (obj) + j)
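
A simplified sketch of this normalization, using `std::bitset` in place of `HARD_REG_SET`. The names `hard_reg_set`, `sub_object` and `start_conflicts` are hypothetical, the register count is made up, and each object is assumed to cover exactly one register, so the inner loop over `j` from the line above is not modelled.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NUM_HARD_REGS = 64; // hypothetical register count
using hard_reg_set = std::bitset<NUM_HARD_REGS>;

// One single-register part of a multi-hardreg pseudo.
struct sub_object
{
  unsigned start;         // index of this part within the pseudo
  hard_reg_set conflicts; // hard registers this part must not occupy
};

// Return the hard registers that cannot be used as the FIRST register of the
// pseudo.  If the part at index START conflicts with hard register R, the
// pseudo cannot begin at R - START, hence the right shift before the OR.
inline hard_reg_set
start_conflicts (const std::vector<sub_object> &objs)
{
  hard_reg_set result;
  for (const sub_object &obj : objs)
    result |= obj.conflicts >> obj.start;
  return result;
}
```

For a two-register pseudo whose first part conflicts with hard register 5 and whose second part conflicts with hard register 9, the pseudo can start at neither register 5 nor register 8.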



+/* Return the object in allocno A which match START & NREGS.  */
+ira_object_t
+find_object (ira_allocno_t a, int start, int nregs)
+{
+  for (ira_object_t obj : a->objects)


linear search?  really?


I was thinking that most allocnos have only one object, and most of the 
others have no more than 10, so I chose the easiest way to find them. 
Thanks for the heads up; it's really not very good here. I'll see if 
there's a faster way.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



[PATCH 6/7] lra: Apply live_subreg df_problem to lra pass

2023-11-07 Thread Lehua Ding
This patch changes the uses of the old live data to the new live_subreg data.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Update.
(lra_coalesce): Update.
* lra-constraints.cc (update_ebb_live_info): Update.
(get_live_on_other_edges): Update.
(inherit_in_ebb): Update.
(lra_inheritance): Update.
(fix_bb_live_info): Update.
(remove_inheritance_pseudos): Update.
* lra-lives.cc (make_hard_regno_live): Update.
(make_hard_regno_dead): Update.
(mark_regno_live): Update.
(mark_regno_dead): Update.
(class bb_data_pseudos): Update.
(live_trans_fun): Update.
(live_con_fun_0): Update.
(live_con_fun_n): Update.
(initiate_live_solver): Update.
(finish_live_solver): Update.
(process_bb_lives): Update.
(lra_create_live_ranges_1): Update.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Update.
(calculate_livein_cands): Update.
(do_remat): Update.
* lra-spills.cc (spill_pseudos): Update.

---
 gcc/lra-coalesce.cc|  20 ++-
 gcc/lra-constraints.cc |  93 ++---
 gcc/lra-lives.cc   | 308 -
 gcc/lra-remat.cc   |  13 +-
 gcc/lra-spills.cc  |  22 ++-
 5 files changed, 354 insertions(+), 102 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  bitmap_and_compl_into (full, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+  bitmap_and_compl_into (partial, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
 }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+   DF_LIVE_SUBREG_PARTIAL_IN (bb));
+  update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+   DF_LIVE_SUBREG_PARTIAL_OUT (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+ bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+ bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+ subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, bi)
if (bitmap_bit_p (_regs, j))
- bitmap_set_bit (df_get_live_in (prev_bb), j);
-   else
- bitmap_clear_bit (df_get_live_in (prev_bb), j);
+ {
+   bitmap_set_bit (subreg_all_in, j);
+   bitmap_set_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
+   else if (bitmap_bit_p (subreg_all_in, j))
+ {
+   bitmap_clear_bit (subreg_all_in, j);
+   bitmap_clear_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
}
+  

[PATCH 4/7] ira: Support subreg copy

2023-11-07 Thread Lehua Ding
This patch changes copies from allocno-to-allocno to object-to-object,
that is, it allows partial copies between pseudo registers.

gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(ira_create_object): Adjust.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): use std::vector
(struct allocno_color_data): New fields.
(struct allocno_hard_regs_subnode): More comments.
(form_allocno_hard_regs_nodes_forest): More comments.
(update_left_conflict_sizes_p): More comments.
(struct update_cost_queue_elem): New field.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Removed.
(object_thread_conflict_p): New.
(merge_threads): Adjust.
(form_threads_from_copies): Adjust.
(form_threads_from_bucket): Adjust.
(form_threads_from_colorable_allocno): Adjust.
(init_allocno_threads): Adjust.
(add_allocno_to_bucket): Adjust.
(delete_allocno_from_bucket): Adjust.
(allocno_copy_cost_saving): Adjust.
(color_allocnos): Adjust.
(color_pass): Adjust.
(update_curr_costs): Adjust.
(coalesce_allocnos): Adjust.
(ira_reuse_stack_slot): Adjust.
(ira_initiate_assign): Adjust.
(ira_finish_assign): Adjust.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Removed.
(REG_SUBREG_P): Adjust.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Adjust.
(add_insn_allocno_copies): Adjust.
(propagate_copies): Adjust.
* ira-emit.cc (add_range_and_copies_from_move_list): Adjust.
* ira-int.h (struct ira_object): New field.
(OBJECT_INDEX): New macro.
(struct ira_allocno_copy): Adjust fields.
(ira_add_allocno_copy): Exported.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Adjust.

---
 gcc/ira-build.cc | 150 +++-
 gcc/ira-color.cc | 541 +++
 gcc/ira-conflicts.cc | 173 +++---
 gcc/ira-emit.cc  |  10 +-
 gcc/ira-int.h|  13 +-
 gcc/ira.cc   |   5 +-
 6 files changed, 645 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 5fb7a9f800f..1c47f81ce9d 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -463,6 +460,7 @@ ira_create_object (ira_allocno_t a, int start, int nregs)
   OBJECT_LIVE_RANGES (obj) = NULL;
   OBJECT_START (obj) = start;
   OBJECT_NREGS (obj) = nregs;
+  OBJECT_INDEX (obj) = ALLOCNO_NUM_OBJECTS (a);
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
@@ -519,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, 
poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return object in allocno A for REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
found.  */
 ira_object_t
@@ -1502,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
 {
-  if (cp->first == a1)
+  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+  if (first_a == a1)
{
  next_cp = 

[PATCH 3/7] ira: Support subreg live range track

2023-11-07 Thread Lehua Ding
This patch extends the reg live range in ira to track the lifecycle
of subregs, thus enabling more granular tracking of the live range and
conflicts of a part of a pseudo. This patch divides allocnos into
two categories: those with a single object, and those that contain
subreg objects.

gcc/ChangeLog:

* ira-build.cc (init_object_start_and_nregs): Removed.
(ira_create_object): Adjust.
(find_object): New.
(find_object_anyway): New.
(ira_create_allocno): Removed regs_with_subreg.
(ira_set_allocno_class): Adjust.
(get_range): New.
(ira_copy_allocno_objects): New.
(merge_hard_reg_conflicts): Adjust.
(create_cap_allocno): Adjust.
(find_subreg_p): New.
(add_subregs): New.
(create_insn_allocnos): Adjust.
(create_bb_allocnos): Adjust.
(move_allocno_live_ranges): Adjust.
(copy_allocno_live_ranges):  Adjust.
(setup_min_max_allocno_live_range_point): Adjust.
(init_regs_with_subreg): Removed.
(ira_build): Removed.
(ira_destroy): Removed.
* ira-color.cc (INCLUDE_MAP): use std::map
(setup_left_conflict_sizes_p): Adjust.
(push_allocno_to_stack): Adjust.
* ira-conflicts.cc (record_object_conflict): Adjust.
(build_object_conflicts): Adjust.
(build_conflicts): Adjust.
(print_allocno_conflicts): Adjust.
* ira-emit.cc (modify_move_list): Adjust.
* ira-int.h (struct ira_object): Adjust.
(struct ira_allocno): Adjust.
(OBJECT_SIZE): New.
(OBJECT_OFFSET): New.
(OBJECT_SUBWORD): New.
(find_object): New.
(find_object_anyway): New.
(ira_copy_allocno_objects):  New.
* ira-lives.cc (INCLUDE_VECTOR): use std::vector.
(set_subreg_conflict_hard_regs): New.
(make_hard_regno_dead): Adjust.
(make_object_live): Adjust.
(update_allocno_pressure_excess_length): Adjust.
(make_object_dead): Adjust.
(mark_pseudo_regno_live): New.
(add_subreg_point): New.
(mark_pseudo_object_live): New.
(mark_pseudo_regno_subword_live): Removed.
(mark_pseudo_regno_subreg_live): New.
(mark_pseudo_regno_subregs_live): New.
(mark_pseudo_reg_live): New.
(mark_pseudo_regno_dead): Removed.
(mark_pseudo_object_dead): New.
(mark_pseudo_regno_subword_dead): Removed.
(mark_pseudo_regno_subreg_dead): New.
(mark_pseudo_reg_dead): Adjust.
(process_single_reg_class_operands): Adjust.
(process_out_of_region_eh_regs): Adjust.
(process_bb_node_lives): Adjust.
(class subreg_live_item): New.
(create_subregs_live_ranges): New.
(ira_create_allocno_live_ranges): Adjust.
* subreg-live-range.h: New fields.

---
 gcc/ira-build.cc| 275 +
 gcc/ira-color.cc|  68 --
 gcc/ira-conflicts.cc|  48 ++--
 gcc/ira-emit.cc |   2 +-
 gcc/ira-int.h   |  21 +-
 gcc/ira-lives.cc| 522 +---
 gcc/subreg-live-range.h |  16 ++
 7 files changed, 653 insertions(+), 299 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 7df98164503..5fb7a9f800f 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -29,10 +29,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-config.h"
 #include "regs.h"
 #include "memmodel.h"
+#include "tm_p.h"
 #include "ira.h"
 #include "ira-int.h"
 #include "sparseset.h"
 #include "cfgloop.h"
+#include "subreg-live-range.h"
 
 static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
 ira_loop_tree_node_t);
@@ -440,49 +442,14 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
-/* Update OBJ's start and nregs field according A and OBJ info.  */
-static void
-init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
-{
-  enum reg_class aclass = ALLOCNO_CLASS (a);
-  gcc_assert (aclass != NO_REGS);
-
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (ALLOCNO_TRACK_SUBREG_P (a))
-{
-  poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
-  for (int i = 0; i < nregs; i += 1)
-   {
- poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
- if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
-   {
- OBJECT_START (obj) = i;
-   }
- if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
-   {
- OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
- break;
-   }
-   }
-  gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
-}
-  else
-{
-  OBJECT_START (obj) = 0;
-  OBJECT_NREGS (obj) = nregs;
-}
-}
-
 /* Create and return an 

[PATCH 7/7] lra: Support subreg live range track and conflict detect

2023-11-07 Thread Lehua Ding
This patch implements tracking of the live ranges of subregs and updates
conflict detection accordingly.

gcc/ChangeLog:

* ira-build.cc (print_copy): Adjust print.
(setup_pseudos_has_subreg_object): New.
(ira_build): collect subreg object allocno.
* lra-assigns.cc (set_offset_conflicts): New.
(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
(lra_assign): Adjust.
* lra-constraints.cc (process_alt_operands): Relax.
* lra-int.h (GCC_LRA_INT_H): New include.
(struct lra_live_range): New field subreg.
(struct lra_insn_reg): New fields.
(get_range_hard_regs):  Exported.
(get_nregs): New.
(has_subreg_object_p): New.
* lra-lives.cc (INCLUDE_VECTOR): New.
(lra_live_range_pool): New.
(create_live_range): Adjust.
(lra_merge_live_ranges): Adjust.
(update_pseudo_point): Adjust.
(class bb_data_pseudos): New.
(mark_regno_live): Adjust.
(mark_regno_dead): Adjust.
(process_bb_lives): Adjust.
(remove_some_program_points_and_update_live_ranges): Adjust.
(lra_print_live_range_list): Adjust print.
(class subreg_live_item): New class.
(create_subregs_live_ranges): New.
(lra_create_live_ranges_1): Add subreg live ranges.
* lra.cc (get_range_blocks): New.
(get_range_hard_regs): New.
(new_insn_reg): Adjust.
(collect_non_operand_hard_regs): Adjust.
(initialize_lra_reg_info_element): Adjust.
(reg_same_range_p): New.
(add_regs_to_insn_regno_info): Adjust.
* subreg-live-range.h: New constructor.

---
 gcc/ira-build.cc|  40 -
 gcc/lra-assigns.cc  | 111 ++--
 gcc/lra-constraints.cc  |  18 +-
 gcc/lra-int.h   |  33 
 gcc/lra-lives.cc| 361 ++--
 gcc/lra.cc  | 139 ++--
 gcc/subreg-live-range.h |   1 +
 7 files changed, 614 insertions(+), 89 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 379f877ca67..cba38d5fecb 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
basic block.  */
 static int last_basic_block_before_change;
 
+/* Record these pseudos which has subreg object. Used by LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
the member loop_num.  */
 static void
@@ -1688,8 +1691,13 @@ print_copy (FILE *f, ira_copy_t cp)
 {
   ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
   ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
-  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num, ALLOCNO_NUM (a1),
-  ALLOCNO_REGNO (a1), ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2), cp->freq,
+  fprintf (f, "  cp%d:a%d(r%d", cp->num, ALLOCNO_NUM (a1), ALLOCNO_REGNO (a1));
+  if (ALLOCNO_NREGS (a1) != OBJECT_NREGS (cp->first))
+fprintf (f, "_obj%d", OBJECT_INDEX (cp->first));
+  fprintf (f, ")<->a%d(r%d", ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2));
+  if (ALLOCNO_NREGS (a2) != OBJECT_NREGS (cp->second))
+fprintf (f, "_obj%d", OBJECT_INDEX (cp->second));
+  fprintf (f, ")@%d:%s\n", cp->freq,
   cp->insn != NULL   ? "move"
   : cp->constraint_p ? "constraint"
  : "shuffle");
@@ -3706,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
 }
 }
 
+/* Setup speudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (_has_subreg_object, _obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+if (has_subreg_object_p (a))
+  {
+   bitmap_set_bit (_has_subreg_object, ALLOCNO_REGNO (a));
+   if (ira_dump_file != NULL)
+ {
+   fprintf (ira_dump_file,
+"  a%d(r%d, nregs: %d) has subreg objects:\n",
+ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+   ira_allocno_object_iterator oi;
+   ira_object_t obj;
+   FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+ fprintf (ira_dump_file, "object %d: start: %d, nregs: %d\n",
+  OBJECT_INDEX (obj), OBJECT_START (obj),
+  OBJECT_NREGS (obj));
+   fprintf (ira_dump_file, "\n");
+ }
+  }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
loop tree nodes).  The function returns TRUE if we generate loop
structure (besides nodes representing all function and the basic
@@ -3726,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ 

[PATCH 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list

2023-11-07 Thread Lehua Ding
This patch fully relaxes the restriction so that all eligible subregs are tracked.

gcc/ChangeLog:

* ira-build.cc (get_reg_unit_size): New.
(has_same_nregs): New.
(ira_set_allocno_class): Relax.

---
 gcc/ira-build.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 1c47f81ce9d..379f877ca67 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return single register size of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS give the
+   same result.  Note that some targets do not implement these two hooks
+   consistently, which has to be debugged case by case.  For example, for
+   V3x1DI mode on AArch64, TARGET_CLASS_MAX_NREGS returns 2 but
+   TARGET_HARD_REGNO_NREGS returns 3.  The two conflict, and this needs to be
+   fixed in the AArch64 hooks.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (REGNO_REG_CLASS (i) != NO_REGS
+   && reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+   && ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+  return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class 
aclass)
 
   if (aclass == NO_REGS)
 return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set the unit size and the track_subreg_p flag for pseudos that need to
+ occupy multiple hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
 {
-  ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+  ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
   ALLOCNO_TRACK_SUBREG_P (a) = true;
   return;
 }
-- 
2.36.3



[PATCH 2/7] ira: Add live_subreg problem and apply to ira pass

2023-11-07 Thread Lehua Ding
This patch adds a live_subreg problem that extends the original live_reg
problem to track the lifetime of subregs. At the same time, the IRA pass is
switched from the old live data to the new live data.

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range.o
* df-problems.cc (struct df_live_subreg_problem_data): New df problem.
(need_track_subreg): helper function.
(get_range): helper function.
(remove_subreg_range): helper function.
(add_subreg_range): helper function.
(df_live_subreg_free_bb_info): df function.
(df_live_subreg_alloc): Ditto.
(df_live_subreg_reset): Ditto.
(df_live_subreg_bb_local_compute): Ditto.
(df_live_subreg_local_compute): Ditto.
(df_live_subreg_init): Ditto.
(df_live_subreg_check_result): Ditto.
(df_live_subreg_confluence_0): Ditto.
(df_live_subreg_confluence_n): Ditto.
(df_live_subreg_transfer_function): Ditto.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_free): Ditto.
(df_live_subreg_top_dump): Ditto.
(df_live_subreg_bottom_dump): Ditto.
(df_live_subreg_add_problem): Ditto.
* df.h (enum df_problem_id): New df problem.
(DF_LIVE_SUBREG_INFO): New macro.
(DF_LIVE_SUBREG_IN): Ditto.
(DF_LIVE_SUBREG_OUT): Ditto.
(DF_LIVE_SUBREG_FULL_IN): Ditto.
(DF_LIVE_SUBREG_FULL_OUT): Ditto.
(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
(DF_LIVE_SUBREG_RANGE_IN): Ditto.
(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
(class subregs_live): New class.
(class basic_block_subreg_live_info): New class.
(class df_live_subreg_bb_info): New class.
(df_live_subreg): New function.
(df_live_subreg_add_problem): Ditto.
(df_live_subreg_finalize): Ditto.
(class subreg_range): New class.
(need_track_subreg): Exported.
(remove_subreg_range): Exported.
(add_subreg_range): Exported.
(df_live_subreg_get_bb_info): Exported.
* ira-build.cc (create_bb_allocnos): Use new live data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.
* reginfo.cc (get_nblocks_slow): Helper function.
* rtl.h (get_nblocks_slow): Helper function.
(get_nblocks): Helper function.
* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.

---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 889 ++-
 gcc/df.h |  93 +++-
 gcc/ira-build.cc |  14 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  20 +-
 gcc/reginfo.cc   |  14 +
 gcc/rtl.h|  14 +
 gcc/subreg-live-range.cc | 649 
 gcc/subreg-live-range.h  | 326 ++
 gcc/timevar.def  |   1 +
 13 files changed, 2008 insertions(+), 40 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 29cec21c825..e4403b5a30c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1675,6 +1675,7 @@ OBJS = \
store-motion.o \
streamer-hooks.o \
stringpool.o \
+subreg-live-range.o \
substring-locations.o \
target-globals.o \
targhooks.o \
diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f..2585c762fd1 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "df.h"
+#include "subreg-live-range.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
@@ -1344,8 +1345,894 @@ df_lr_verify_transfer_functions (void)
   bitmap_clear (_blocks);
 }
 
+/*
+   REGISTER AND SUBREG LIVES
+   Like DF_LR, but fine-grained tracking of subreg lifecycle.
+   
*/
+
+/* Private data used to verify the solution for this problem.  */
+struct df_live_subreg_problem_data
+{
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack live_subreg_bitmaps;
+  bool has_subreg_live_p;
+};
+
+/* Helper functions */
+
+/* Return true if REGNO is a pseudo and MODE spans multiple registers.  */
+bool
+need_track_subreg (int 

[PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general

2023-11-07 Thread Lehua Ding
This patch does not make any functional changes. It mainly refactors two parts:

1. The ira_allocno's objects field is expanded to a scalable array, and
   multi-word pseudo registers are split and tracked only when necessary.
2. Since the objects array has been expanded, later patches can pass through
   more subreg objects than the previous fixed two. Therefore, the check for
   whether two objects conflict has to change: the registers occupied by an
   object are shifted back to the first register of the allocno before the
   comparison.
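
The shift back to the first register in item 2 is presumably what the new HARD_REG_SET operator>> in this patch is for. The following is a standalone sketch of a multi-element right shift, not the patch's code: reg_set, elt_type, num_elts and shift_right are hypothetical names, and a std::array stands in for HARD_REG_SET. The shift amount is split into whole elements (division) and leftover bits (remainder).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a multi-element register bit set.
using elt_type = std::uint64_t;
constexpr std::size_t num_elts = 2;
using reg_set = std::array<elt_type, num_elts>;

// Shift SET right by SHIFT bits, so that bit SHIFT becomes bit 0.
// Element 0 holds the lowest-numbered registers.
inline reg_set
shift_right (const reg_set &set, unsigned int shift)
{
  constexpr unsigned int elt_bits = sizeof (elt_type) * 8;
  const std::size_t n_elt = shift / elt_bits;   // whole elements to drop
  const unsigned int n_bit = shift % elt_bits;  // remaining bit shift

  reg_set res{};  // zero-initialized, so vacated high bits stay clear
  for (std::size_t i = 0; i + n_elt < num_elts; ++i)
    {
      res[i] = set[i + n_elt] >> n_bit;
      // Bring in the low bits of the next higher element, if there is one.
      if (n_bit != 0 && i + n_elt + 1 < num_elts)
        res[i] |= set[i + n_elt + 1] << (elt_bits - n_bit);
    }
  return res;
}
```

Guarding the carry with n_bit != 0 avoids shifting an element by its full width, which is undefined behavior in C++.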

gcc/ChangeLog:

* hard-reg-set.h (struct HARD_REG_SET): Add operator>>.
* ira-build.cc (init_object_start_and_nregs): New func.
(find_object): Ditto.
(ira_create_allocno): Adjust.
(ira_set_allocno_class): Set subreg info.
(ira_create_allocno_objects): Adjust.
(init_regs_with_subreg): Collect access in subreg.
(ira_build): Call init_regs_with_subreg
(ira_destroy): Clear regs_with_subreg
* ira-color.cc (setup_profitable_hard_regs): Adjust.
(get_conflict_and_start_profitable_regs): Adjust.
(check_hard_reg_p): Adjust.
(assign_hard_reg): Adjust.
(improve_allocation): Adjust.
* ira-int.h (struct ira_object): Adjust fields.
(struct ira_allocno): Adjust objects filed.
(ALLOCNO_NUM_OBJECTS): Adjust.
(ALLOCNO_UNIT_SIZE): New.
(ALLOCNO_TRACK_SUBREG_P): New.
(ALLOCNO_NREGS): New.
(OBJECT_SIZE): New.
(OBJECT_OFFSET): New.
(OBJECT_START): New.
(OBJECT_NREGS): New.
(find_object): New.
(has_subreg_object_p): New.
(get_full_object): New.
* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h |  33 +++
 gcc/ira-build.cc   | 106 +++-
 gcc/ira-color.cc   | 234 ++---
 gcc/ira-int.h  |  45 -
 gcc/ira.cc |  52 --
 5 files changed, 349 insertions(+), 121 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
 return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+if (shift_amount == 0)
+  return *this;
+
+HARD_REG_SET res;
+unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+if (shift_amount >= total_bits)
+  {
+   unsigned int n_elt = shift_amount % total_bits;
+   shift_amount -= n_elt * total_bits;
+   for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt - 1; i += 1)
+ res.elts[i] = elts[i + n_elt];
+   /* clear upper n_elt elements.  */
+   for (unsigned int i = 0; i < n_elt; i += 1)
+ res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
+  }
+
+if (shift_amount > 0)
+  {
+   /* The left bits of an element be shifted.  */
+   HARD_REG_ELT_TYPE left = 0;
+   /* Total bits of an element.  */
+   for (int i = ARRAY_SIZE (elts); i >= 0; --i)
+ {
+   res.elts[i] = (elts[i] >> shift_amount) | left;
+   left = elts[i] << (total_bits - shift_amount);
+ }
+  }
+return res;
+  }
+
   HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
 };
 typedef const HARD_REG_SET _hard_reg_set;
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..07aba27c1c9 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -440,6 +440,40 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
+/* Update OBJ's start and nregs field according A and OBJ info.  */
+static void
+init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+  if (ALLOCNO_TRACK_SUBREG_P (a))
+{
+  poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
+  for (int i = 0; i < nregs; i += 1)
+   {
+ poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
+ if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
+   {
+ OBJECT_START (obj) = i;
+   }
+ if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
+   {
+ OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
+ break;
+   }
+   }
+  gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
+}
+  else
+{
+  OBJECT_START (obj) = 0;
+  OBJECT_NREGS (obj) = nregs;
+}
+}
+
 /* Create and return an object corresponding to a new allocno A.  */
 static ira_object_t
 ira_create_object (ira_allocno_t a, int subword)
@@ -460,15 +494,36 @@ ira_create_object (ira_allocno_t a, int subword)
   OBJECT_MIN (obj) = INT_MAX;
   OBJECT_MAX (obj) = -1;
   

[PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-07 Thread Lehua Ding
        mov     z26.d, z4.d
        mov     z27.d, z5.d
        mov     z28.d, z6.d
        mov     z29.d, z7.d
        cmp     w1, 0
        ...
```

After these patches:

```
bar:
        ld4d    {z28.d - z31.d}, p0/z, [x0]
        cmp     w1, 0
        ...
```

Lehua Ding (7):
  ira: Refactor the handling of register conflicts to make it more
general
  ira: Add live_subreg problem and apply to ira pass
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Apply live_subreg df_problem to lra pass
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 889 ++-
 gcc/df.h |  93 +++-
 gcc/hard-reg-set.h   |  33 ++
 gcc/ira-build.cc | 458 
 gcc/ira-color.cc | 851 ++---
 gcc/ira-conflicts.cc | 221 +++---
 gcc/ira-emit.cc  |  24 +-
 gcc/ira-int.h|  67 ++-
 gcc/ira-lives.cc | 527 +--
 gcc/ira.cc   |  77 ++--
 gcc/lra-assigns.cc   | 111 -
 gcc/lra-coalesce.cc  |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h|  33 ++
 gcc/lra-lives.cc | 661 -
 gcc/lra-remat.cc |  13 +-
 gcc/lra-spills.cc|  22 +-
 gcc/lra.cc   | 139 +-
 gcc/reginfo.cc   |  14 +
 gcc/rtl.h|  14 +
 gcc/subreg-live-range.cc | 649 
 gcc/subreg-live-range.h  | 343 +++
 gcc/timevar.def  |   1 +
 24 files changed, 4564 insertions(+), 808 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



Re: [PATCH] RISC-V: Fixed failed rvv combine testcases

2023-11-07 Thread Lehua Ding

Hi Robin,

On 2023/11/7 15:57, Robin Dapp wrote:

Thanks, what I was slightly concerned about is that we now have
the implicit assumption that the initial value is 0.  I mean
that's what the vectorizer does for reductions but theoretically,
wouldn't we also combine other values into 0 now?


Sorry, I don't quite understand what you mean. I think this combine is only
safe when the initial value is 0, because the combine actually throws away
the operation of adding 0 (via the mask operand).
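
A scalar sketch of that argument, under the assumption that the merge fills inactive lanes with some else-value before the reduction. The function names are hypothetical and the code only models the arithmetic, not the RTL patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Form before the combine: inactive lanes are first replaced by ELSE_VALUE
// (the merge), then every lane is added to the running sum.
inline std::int16_t
sum_after_merge (const std::vector<std::int8_t> &a,
                 const std::vector<bool> &mask, std::int16_t else_value)
{
  std::int16_t sum = 0;
  for (std::size_t i = 0; i < a.size (); ++i)
    sum += mask[i] ? static_cast<std::int16_t> (a[i]) : else_value;
  return sum;
}

// Form after the combine: inactive lanes are skipped entirely.
inline std::int16_t
masked_sum (const std::vector<std::int8_t> &a, const std::vector<bool> &mask)
{
  std::int16_t sum = 0;
  for (std::size_t i = 0; i < a.size (); ++i)
    if (mask[i])
      sum += a[i];
  return sum;
}
```

The two forms agree when the else-value is 0 and differ as soon as it is anything else, which is why the combine has to be restricted to a zero else-value.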


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] RISC-V: Fixed failed rvv combine testcases

2023-11-06 Thread Lehua Ding

Committed, thanks Juzhe.

On 2023/11/7 15:51, juzhe.zh...@rivai.ai wrote:

LGTM. Thanks for fixing it.


juzhe.zh...@rivai.ai

[PATCH] RISC-V: Fixed failed rvv combine testcases

2023-11-06 Thread Lehua Ding
Hi,

This patch fixes the following failing testcases on trunk:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
...
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
...

These testcases fail because .VCOND_MASK_LEN was introduced in the middle end
as part of another bug fix, which in turn produces a new vcond_mask_len RTL
pattern after expand. We therefore need new combine patterns to handle this
case.

Consider this code:

int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
  int16_t sum = 0;
  for (int i = 0; i < 16; i += 1)
    if (pred[i])
      sum += a[i];
  return sum;
}

Before this patch:
foo:
        vsetivli    zero,16,e8,m1,ta,ma
        vle8.v      v0,0(a1)
        vsetvli     a5,zero,e8,m1,ta,ma
        vmsne.vi    v0,v0,0
        vsetvli     zero,zero,e16,m2,ta,ma
        li          a3,0
        vmv.v.i     v2,0
        vsetivli    zero,16,e16,m2,ta,ma
        vle8.v      v6,0(a0),v0.t
        vmv.s.x     v1,a3
        vsetvli     a5,zero,e16,m2,ta,ma
        vsext.vf2   v4,v6
        vsetivli    zero,16,e16,m2,tu,ma
        vmerge.vvm  v2,v2,v4,v0
        vsetvli     a5,zero,e16,m2,ta,ma
        vredsum.vs  v2,v2,v1
        vmv.x.s     a0,v2
        slliw       a0,a0,16
        sraiw       a0,a0,16
        ret

After this patch:
foo:
        vsetivli    zero,16,e16,m2,ta,ma
        li          a5,0
        vle8.v      v0,0(a1)
        vmv.s.x     v1,a5
        vsetvli     zero,zero,e8,m1,ta,ma
        vmsne.vi    v0,v0,0
        vle8.v      v2,0(a0),v0.t
        vwredsum.vs v1,v2,v1,v0.t
        vsetvli     zero,zero,e16,m1,ta,ma
        vmv.x.s     a0,v1
        slliw       a0,a0,16
        sraiw       a0,a0,16
        ret

The patch combines the vsext.vf2, vmerge.vvm, and vredsum.vs instructions
into a single masked vwredsum.vs and removes the corresponding vsetvl
instructions.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_len_):
New combine pattern.
(*cond_len_): Ditto.
(*cond_len_): Ditto.
(*cond_len_extend): Ditto.
(*cond_len_widen_reduc_plus_scal_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c:
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c:

---
 gcc/config/riscv/autovec-opt.md   | 214 ++
 .../rvv/autovec/cond/cond_widen_reduc-1.c |  13 +-
 .../rvv/autovec/cond/cond_widen_reduc-2.c |  30 +--
 3 files changed, 232 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index d0f8b3cde4e..3c87e66ea49 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -194,6 +194,84 @@
 }
 [(set_attr "type" "vector")])
 
+;; Combine sign_extend/zero_extend(vf2) and vcond_mask_len
+(define_insn_and_split "*cond_len_"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+(if_then_else:VWEXTI
+  (unspec:
+[(match_operand 4 "vector_length_operand")
+ (match_operand 5 "const_int_operand")
+ (match_operand 6 "const_int_operand")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (vec_merge:VWEXTI
+(any_extend:VWEXTI (match_operand: 2 
"register_operand"))
+(match_operand:VWEXTI 1 "vector_merge_operand")
+   (match_operand: 3 "register_operand"))
+  (match_dup 1)))]
+  "TARGET_VECTOR"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  emit_insn (gen_pred__vf2 (operands[0], operands[3], 
operands[1], operands[2],
+ operands[4], operands[5], 
operands[6], CONST0_RTX (Pmode)));
+  DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine sign_extend/zero_extend(vf4) and vcond_mask_len
+(define_insn_and_split "*cond_len_"
+  [(set (match_operand:VQEXTI 0 "register_operand")
+(if_then_else:VQEXTI
+  (unspec:
+[(match_operand 4 "vector_length_operand")
+ (match_operand 5 "const_int_operand")
+ (match_operand 6 "const_int_operand")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (vec_merge:VQEXTI
+(any_extend:VQEXTI (match_operand: 2 "register_operand"))
+(match_operand:VQEXTI 1 "vector_merge_operand")
+   (match_operand: 3 "register_operand"))
+  (match_dup 1)))]
+  "TARGET_VECTOR"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  emit_insn (gen_pred__vf4 (operands[0], operands[3], 
operands[1], operands[2],
+ operands[4], operands[5], 
operands[6], CONST0_RTX (Pmode)));
+  DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine sign_extend/zero_extend(vf8) and vcond_mask_len
+(define_insn_and_split "*cond_len_"
+  [(set (match_operand:VOEXTI 0 "register_operand")
+(if_then_else:VOEXTI
+  (unspec:
+[(match_operand 4 "vector_length_operand")
+ (match_operand 5 "const_int_operand")
+ (match_operand 6 

Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Lehua Ding

Hi Andrew,


Yes and maybe use tree for the type of op_list instead of auto.
I suspect this code was originally written before GCC was written in C++11.
Maybe if this code is being compiled with C++20 we could do something like:
#include 
template< std::same_as... op_types>

To get a decent error message earlier ...


Or I think it's easier to understand without using a template by 
changing it to the following:


inline
gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
                                  code_helper code_in, tree type_in,
                                  tree ops[], int num_op)
  : cond (cond_in), code (code_in), type (type_in), reverse (false),
    num_ops (num_op)
{
  for (int i = 0; i < num_op; i++)
    this->ops[i] = ops[i];
}

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Lehua Ding

Hi Andrew,

On 2023/10/31 14:48, Andrew Pinski wrote:

+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+ code_helper code_in, tree type_in,
+ tree op0, tree op1, tree op2, tree op3,
+ tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}

Hmm, does it make sense to start to use variadic templates for these
constructors instead of writing them out?
And we can even add a static_assert to make sure the number of
arguments is <= MAX_NUM_OPS to make sure they are correct. And use
std::is_same to make sure we are only passing tree types.


You mean something like this?:

template<typename... op_types>
inline
gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
                                  code_helper code_in, tree type_in,
                                  op_types... ops)
  : cond (cond_in), code (code_in), type (type_in), reverse (false),
    num_ops (sizeof...(ops))
{
  static_assert (sizeof...(ops) <= MAX_NUM_OPS);
  auto op_list[] = {ops...};
  for (int i = 0; i < sizeof...(ops); i++)
    this->ops[i] = op_list[i];
}
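
For reference, a self-contained sketch of the same idea that compiles on its own as C++17. It uses stand-in types rather than the real GCC ones: tree_node, tree, match_op_sketch and the value of MAX_NUM_OPS here are all assumptions for illustration. The array member is initialized directly from the pack, and std::is_same rejects non-tree arguments at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins: in GCC, tree is a pointer to a tree node and
// MAX_NUM_OPS comes from gimple-match.h.  The value 7 is only an example.
struct tree_node {};
using tree = tree_node *;
constexpr std::size_t MAX_NUM_OPS = 7;

struct match_op_sketch
{
  // Accept any number of operands up to MAX_NUM_OPS, all of type tree.
  template <typename... op_types>
  explicit match_op_sketch (op_types... ops_in)
    : num_ops (sizeof... (ops_in)), ops{ops_in...}
  {
    static_assert (sizeof... (ops_in) <= MAX_NUM_OPS, "too many operands");
    static_assert (std::conjunction<std::is_same<op_types, tree>...>::value,
                   "every operand must be a tree");
  }

  std::size_t num_ops;
  tree ops[MAX_NUM_OPS];  // unused trailing slots are null
};
```

Initializing ops from the pack avoids the temporary array and the copy loop.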

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Lehua Ding

Committed, thanks Jeff.

On 2023/9/28 6:24, Jeff Law wrote:



On 9/20/23 07:09, Lehua Ding wrote:

This patch adds combine cond_len_op and vec_cond to cond_len_op like
cond_op.

gcc/ChangeLog:

* gimple-match.h (gimple_match_op::gimple_match_op):
Add interfaces for more arguments.
(gimple_match_op::set_op): Add interfaces for more arguments.
* match.pd: Add support of combining cond_len_op + vec_cond

OK
jeff



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH 2/2] RISC-V: Add assert of the number of vmerge in autovec cond testcases

2023-10-31 Thread Lehua Ding

Committed, thanks Jeff.

On 2023/10/17 11:19, Lehua Ding wrote:

Hi Jeff,

Can you replace riscv_vector with riscv_v?  That way this will still 
work after Joern commits his change to standardize on the riscv_v 
target selector.


OK with that change, no need to wait for a review on V2, just go ahead 
and blast it in.


No problem, I'll tweak it later and submit it. Thanks.



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

2023-10-30 Thread Lehua Ding

Committed, thanks Juzhe.

On 2023/10/31 11:43, juzhe.zh...@rivai.ai wrote:

LGTM.


juzhe.zh...@rivai.ai


[PATCH] RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

2023-10-30 Thread Lehua Ding
Hi,

This patch splits the INT64 to FP16 conversion into two smaller conversions
(INT64 -> FP32 and FP32 -> FP16) at expand time instead of delaying the
split to the split1 pass. This makes it possible to combine the FP32 to FP16
conversion with the vcond pattern, so we don't need to add a separate
combine pattern for INT64 to FP16 plus vcond.

Consider this code:
  void
  foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
       int64_t *__restrict pred, int n)
  {
    for (int i = 0; i < n; i += 1)
      {
        r[i] = pred[i] ? (_Float16) a[i] : b[i];
      }
  }

Before this patch:
  ...
  vfncvt.f.f.w    v2,v2
  vmerge.vvm      v1,v1,v2,v0
  vse16.v         v1,0(a0)
  ...

After this patch:
  ...
  vfncvt.f.f.w    v1,v2,v0.t
  vse16.v         v1,0(a0)
  ...

gcc/ChangeLog:

* config/riscv/autovec.md (2):
Change to define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.

---
 gcc/config/riscv/autovec.md  | 5 +
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c   | 2 ++
 .../riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c   | 2 ++
 5 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5f49d73be44..bfd45dd76ff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -977,14 +977,11 @@
 ;; This operation can be performed in the loop vectorizer but unfortunately
 ;; not applicable for now. We can remove this pattern after loop vectorizer
 ;; is able to take care of INT64 to FP16 conversion.
-(define_insn_and_split "2"
+(define_expand "2"
   [(set (match_operand:  0 "register_operand")
(any_float:
  (match_operand:VWWCONVERTI 1 "register_operand")))]
   "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && 
!flag_trapping_math"
-  "#"
-  "&& 1"
-  [(const_int 0)]
   {
 rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
index f5d3bb4c789..030c8fe33ce 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler 
{\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
index f5d3bb4c789..030c8fe33ce 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler 
{\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
index 5ebed2f7fdc..d6298f5351a 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
@@ -12,4 +12,6 @@
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.xu\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 /* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.x\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
 
+/* { dg-final { scan-assembler-times 
{\tvfncvt\.f\.f\.w\tv[0-9]+,v[0-9]+,v0\.t} 2 } } */
+
 /* { dg-final { scan-assembler 
{\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
index 

Re: [PATCH] Fixup vect_get_and_check_slp_defs for gathers and .MASK_LOAD

2023-10-20 Thread Lehua Ding

Hi Richard,

On 2023/10/20 16:28, Richard Biener wrote:

On Fri, 20 Oct 2023, Lehua Ding wrote:


Hi Richard,

I recompiled the testcase with the fixup patch and still get the same ICE.


The following fixes it.


Using this patch did fix it, thank you very much.



 From 377e911b1b64298def75ba9d9c46fdd22fe4cf84 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 20 Oct 2023 10:25:31 +0200
Subject: [PATCH] Rewrite more refs for epilogue vectorization
To: gcc-patches@gcc.gnu.org

The following makes sure to rewrite all gather/scatter detected by
dataref analysis plus stmts classified as VMAT_GATHER_SCATTER.  Maybe
we need to rewrite all refs, the following covers the cases I've
run into now.

* tree-vect-loop.cc (update_epilogue_loop_vinfo): Rewrite
both STMT_VINFO_GATHER_SCATTER_P and VMAT_GATHER_SCATTER
stmt refs.
---
  gcc/tree-vect-loop.cc | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8877ebde246..4a8b0a18800 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11361,8 +11361,12 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree 
advance)
/* Data references for gather loads and scatter stores do not use the
 updated offset we set using ADVANCE.  Instead we have to make sure the
 reference in the data references point to the corresponding copy of
-the original in the epilogue.  */
-  if (STMT_VINFO_GATHER_SCATTER_P (vect_stmt_to_vectorize (stmt_vinfo)))
+the original in the epilogue.  Make sure to update both
+gather/scatters recognized by dataref analysis and also other
+refs that get_load_store_type classified as VMAT_GATHER_SCATTER.  */
+  auto vstmt_vinfo = vect_stmt_to_vectorize (stmt_vinfo);
+  if (STMT_VINFO_MEMORY_ACCESS_TYPE (vstmt_vinfo) == VMAT_GATHER_SCATTER
+ || STMT_VINFO_GATHER_SCATTER_P (vstmt_vinfo))
{
  DR_REF (dr)
= simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE,
@@ -11371,9 +11375,6 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree 
advance)
= simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE, NULL_TREE,
 _in_mapping, );
}
-  else
-   gcc_assert (STMT_VINFO_MEMORY_ACCESS_TYPE (vect_stmt_to_vectorize 
(stmt_vinfo))
-   != VMAT_GATHER_SCATTER);
DR_STMT (dr) = STMT_VINFO_STMT (stmt_vinfo);
stmt_vinfo->dr_aux.stmt = stmt_vinfo;
/* The vector size of the epilogue is smaller than that of the main loop


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] Fixup vect_get_and_check_slp_defs for gathers and .MASK_LOAD

2023-10-20 Thread Lehua Ding

Hi Richard,

I recompiled the testcase with the fixup patch and still get the same ICE.

On 2023/10/20 15:37, Richard Biener wrote:

I went a little bit too simple with implementing SLP gather support
for emulated and builtin based gathers.  The following fixes the
conflict that appears when running into .MASK_LOAD where we rely
on vect_get_operand_map and the bolted-on STMT_VINFO_GATHER_SCATTER_P
checking wrecks that.  The following properly integrates this with
vect_get_operand_map, adding another special index referring to
the vect_check_gather_scatter analyzed offset.
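
As a rough illustration of the map convention described above (an editorial sketch, not code from the patch): each map starts with the number of SLP children, followed by one entry per child that is either an argument index or a special negative index. The two map arrays are copied from the patch; toy_stmt, resolve_operand, collect_children and example_usage are hypothetical names, and plain strings stand in for the tree operands that GCC would take from a gimple statement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a statement: GCC works on gimple and trees,
// strings are used here only to keep the sketch self-contained.
struct toy_stmt
{
  std::vector<std::string> args;  // ordinary operands / call arguments
  std::string cmp_op0, cmp_op1;   // operands of an embedded comparison
  std::string gather_offset;      // offset found by gather/scatter analysis
};

// Layout as in the patch: element 0 is the number of children, the rest
// are argument indices or special negative indices.
static const int arg2_map[] = { 1, 2 };
static const int off_arg2_map[] = { 2, -3, 2 };

// Resolve one map entry to the operand it denotes.
std::string
resolve_operand (const toy_stmt &stmt, int opno)
{
  switch (opno)
    {
    case -1: return stmt.cmp_op0;        // first comparison operand
    case -2: return stmt.cmp_op1;        // second comparison operand
    case -3: return stmt.gather_offset;  // analyzed gather/scatter offset
    default: return stmt.args.at (static_cast<std::size_t> (opno));
    }
}

// Collect the children selected by MAP, reading the count prefix first.
std::vector<std::string>
collect_children (const toy_stmt &stmt, const int *map)
{
  int number_of_oprnds = *map++;
  std::vector<std::string> children;
  for (int i = 0; i < number_of_oprnds; i++)
    children.push_back (resolve_operand (stmt, map[i]));
  return children;
}

// Small self-check; the argument names are placeholders.
void
example_usage ()
{
  toy_stmt load;
  load.args = { "ptr", "align", "mask" };
  load.gather_offset = "off";

  const std::vector<std::string> plain = collect_children (load, arg2_map);
  assert (plain.size () == 1 && plain[0] == "mask");

  const std::vector<std::string> gs = collect_children (load, off_arg2_map);
  assert (gs.size () == 2 && gs[0] == "off" && gs[1] == "mask");
}
```

With the plain map only argument 2 becomes a child; with the gather variant the analyzed offset is added as an extra child in front of it, which is the effect the special index -3 is meant to have.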

This unbreaks aarch64 (and hopefully riscv), I'll followup with
more fixes and testsuite coverage for x86 where I think I got
masked gather SLP support wrong.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

* tree-vect-slp.cc (off_map, off_op0_map, off_arg2_map,
off_arg3_arg2_map): New.
(vect_get_operand_map): Get flag whether the stmt was
recognized as gather or scatter and use the above
accordingly.
(vect_get_and_check_slp_defs): Adjust.
(vect_build_slp_tree_2): Likewise.
---
  gcc/tree-vect-slp.cc | 57 +++-
  1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8efff2e912d..c905ed40a94 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -508,6 +508,10 @@ static const int arg2_map[] = { 1, 2 };
  static const int arg1_arg4_map[] = { 2, 1, 4 };
  static const int arg3_arg2_map[] = { 2, 3, 2 };
  static const int op1_op0_map[] = { 2, 1, 0 };
+static const int off_map[] = { 1, -3 };
+static const int off_op0_map[] = { 2, -3, 0 };
+static const int off_arg2_map[] = { 2, -3, 2 };
+static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 };
  static const int mask_call_maps[6][7] = {
{ 1, 1, },
{ 2, 1, 2, },
@@ -525,11 +529,14 @@ static const int mask_call_maps[6][7] = {
 - for each child node, the index of the argument associated with that node.
   The special index -1 is the first operand of an embedded comparison and
   the special index -2 is the second operand of an embedded comparison.
+ The special indes -3 is the offset of a gather as analyzed by
+ vect_check_gather_scatter.
  
 SWAP is as for vect_get_and_check_slp_defs.  */
  
  static const int *

-vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
+vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false,
+ unsigned char swap = 0)
  {
if (auto assign = dyn_cast (stmt))
  {
@@ -539,6 +546,8 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
if (TREE_CODE_CLASS (gimple_assign_rhs_code (assign)) == tcc_comparison
  && swap)
return op1_op0_map;
+  if (gather_scatter_p)
+   return gimple_vdef (stmt) ? off_op0_map : off_map;
  }
gcc_assert (!swap);
if (auto call = dyn_cast (stmt))
@@ -547,7 +556,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
switch (gimple_call_internal_fn (call))
  {
  case IFN_MASK_LOAD:
-   return arg2_map;
+   return gather_scatter_p ? off_arg2_map : arg2_map;
  
  	  case IFN_GATHER_LOAD:

return arg1_map;
@@ -556,7 +565,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
return arg1_arg4_map;
  
  	  case IFN_MASK_STORE:

-   return arg3_arg2_map;
+   return gather_scatter_p ? off_arg3_arg2_map : arg3_arg2_map;
  
  	  case IFN_MASK_CALL:

{
@@ -611,6 +620,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char 
swap,
enum vect_def_type dt = vect_uninitialized_def;
slp_oprnd_info oprnd_info;
gather_scatter_info gs_info;
+  bool gs_p = false;
unsigned int commutative_op = -1U;
bool first = stmt_num == 0;
  
@@ -620,7 +630,9 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,

  return -1;
  
number_of_oprnds = gimple_num_args (stmt_info->stmt);

-  const int *map = vect_get_operand_map (stmt_info->stmt, swap);
+  const int *map
+= vect_get_operand_map (stmt_info->stmt,
+   STMT_VINFO_GATHER_SCATTER_P (stmt_info), swap);
if (map)
  number_of_oprnds = *map++;
if (gcall *stmt = dyn_cast  (stmt_info->stmt))
@@ -642,8 +654,22 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
enum vect_def_type *dts = XALLOCAVEC (enum vect_def_type, number_of_oprnds);
for (i = 0; i < number_of_oprnds; i++)
  {
+  oprnd_info = (*oprnds_info)[i];
int opno = map ? map[i] : int (i);
-  if (opno < 0)
+  if (opno == -3)
+   {
+ gcc_assert (STMT_VINFO_GATHER_SCATTER_P (stmt_info));
+ if (!is_a  (vinfo)
+ || !vect_check_gather_scatter (stmt_info,
+as_a  (vinfo),
+

Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-19 Thread Lehua Ding
Committed after the vsetvl pass refactor patch was committed, thanks 
Robin.


On 2023/10/19 16:43, Robin Dapp wrote:

Hi Juzhe,

as discussed off-list this approach generally makes sense to me so
the patch LGTM once the vsetvl rework is upstream and settled.

Independently, we still need to understand why the more complex
broadcast pattern is not hoisted out of the loop.

Regards
  Robin


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Committed, thanks Patrick and Juzhe.

On 2023/10/20 2:04, Patrick O'Neill wrote:

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick's CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your 
GitHub so that we can see the full results?


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub 
<https://github.com/patrick-rivos/riscv-gnu-toolchain/issues>



juzhe.zh...@rivai.ai

*From:* Lehua Ding <mailto:lehua.d...@rivai.ai>
*Date:* 2023-10-19 16:33
*To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
*CC:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>; kito.cheng
<mailto:kito.ch...@gmail.com>; rdapp.gcc
<mailto:rdapp@gmail.com>; palmer <mailto:pal...@rivosinc.com>;
jeffreyalaw <mailto:jeffreya...@gmail.com>; lehua.ding
<mailto:lehua.d...@rivai.ai>
*Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:
1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification and
   deletion of vsetvl insns based on the virtual CFG. A basic block in the
   virtual CFG is called vsetvl_block_info and the vsetvl information inside
   it is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unify the demand system;
   this phase only fuses local vsetvl infos in the forward direction.
3. Refactor Phase 3: change the logic for deciding whether to uplift a vsetvl
   info to a pred basic block to a more unified method, namely that the
   reaching vsetvl definitions contain a vsetvl info compatible with it.
4. Place all modifications of the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on the fully optimized vsetvl infos. Phase 5 removes
   the avl operand from the RVV instructions and removes the unused dest
   operand register from the vsetvl insns.
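
To make item 2 a little more concrete, here is a minimal editorial sketch of forward local fusion inside one block of the virtual CFG. Only the names vsetvl_info and vsetvl_block_info come from the cover letter; the fields, the compatible_p rule and fuse_local_forward are assumptions made for illustration, and the real pass tracks AVL, policy and a richer demand system that this toy model leaves out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model of one vsetvl requirement.
struct vsetvl_info
{
  int sew;          // element width in bits
  int lmul;         // register group multiplier (integral only in this toy)
  bool ratio_only;  // true: only the SEW/LMUL ratio matters to this insn
  int ratio () const { return sew / lmul; }
};

// One node of the virtual CFG: the infos of a basic block, in program order.
struct vsetvl_block_info
{
  std::vector<vsetvl_info> local_infos;
};

// Assumed rule: if either side only demands the ratio, equal ratios are
// enough; otherwise SEW and LMUL must both match.
bool
compatible_p (const vsetvl_info &prev, const vsetvl_info &next)
{
  if (prev.ratio_only || next.ratio_only)
    return prev.ratio () == next.ratio ();
  return prev.sew == next.sew && prev.lmul == next.lmul;
}

// Walk the block front to back and merge each info into its predecessor
// when they are compatible, so that fewer vsetvl insns are needed later.
void
fuse_local_forward (vsetvl_block_info &block)
{
  std::vector<vsetvl_info> fused;
  for (const vsetvl_info &next : block.local_infos)
    {
      if (!fused.empty () && compatible_p (fused.back (), next))
        {
          // Keep the stricter demand in the earlier info.
          if (fused.back ().ratio_only && !next.ratio_only)
            fused.back () = next;
        }
      else
        fused.push_back (next);
    }
  block.local_infos = fused;
}

// Small self-check on a four-entry block.
void
example_usage ()
{
  vsetvl_block_info block;
  block.local_infos = { { 16, 1, true }, { 32, 2, false },
                        { 32, 2, false }, { 16, 1, false } };
  fuse_local_forward (block);
  assert (block.local_infos.size () == 2);
  assert (block.local_infos[0].sew == 32 && block.local_infos[0].lmul == 2);
  assert (block.local_infos[1].sew == 16 && block.local_infos[1].lmul == 1);
}
```

In this toy model the first three infos collapse into a single exact e32/m2 requirement, because the ratio-only info is satisfied by any setting with the same ratio, while the last info demands a different exact setting and stays separate.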
These modifications resulted in some testcases needing to be updated. The
reasons for updating are summarized below:
1. more optimized
   vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
   avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
PR target/111037
PR target/111234
PR target/111725
Lehua Ding (11):
  RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |    2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |    2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/vsetvl/avl_single-106.c  

Re: [PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers

2023-10-19 Thread Lehua Ding

Hi Richard,

I'm hitting a couple of testcase ICEs for RISC-V while testing with the 
latest trunk code. It looks like these two patches are causing them. Can 
you help me look into it? The ICE log is below:


➜  vsetvl git:(tintin-dev) 
~/open-source/riscv-gnu-toolchain-golden/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug-4/install/bin/riscv64-unknown-elf-gcc 
-march=rv64gcv_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output   -ftree-vectorize -O3 --param 
riscv-autovec-lmul=m1 --param=riscv-autovec-preference=scalable 
-fno-vect-cost-model -ffast-math 
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
 -march=rv64gcv_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output -ftree-vectorize -O2 --param 
riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 
-fno-vect-cost-model -ffast-math -mcmodel=medany -lm

during GIMPLE pass: vect
In file included from 
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:4:
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: 
In function 'f_int16_t_8':
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:15:3: 
internal compiler error: in update_epilogue_loop_vinfo, at 
tree-vect-loop.cc:11376
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:23:3: 
note: in expansion of macro 'TEST_LOOP'
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:31:3: 
note: in expansion of macro 'TEST_TYPE'
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:41:1: 
note: in expansion of macro 'TEST_ALL'

0x1bf5838 update_epilogue_loop_vinfo
../../../../gcc/gcc/tree-vect-loop.cc:11375
0x1bf746f vect_transform_loop(_loop_vec_info*, gimple*)
../../../../gcc/gcc/tree-vect-loop.cc:11826
0x1c4e8a7 vect_transform_loops
../../../../gcc/gcc/tree-vectorizer.cc:1007
0x1c4eff9 try_vectorize_loop_1
../../../../gcc/gcc/tree-vectorizer.cc:1152
0x1c4f135 try_vectorize_loop
../../../../gcc/gcc/tree-vectorizer.cc:1184
0x1c4f3e5 execute
../../../../gcc/gcc/tree-vectorizer.cc:1298
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).

Please include the complete backtrace with any bug report.
See  for instructions.


On 2023/10/19 19:47, Richard Biener wrote:

The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target).  That includes support for emulated gathers but also
the legacy x86 builtin path.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

PR tree-optimization/111131
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one.  Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.
---
  .../gcc.dg/vect/tsvc/vect-tsvc-s353.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-1.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 --
  gcc/testsuite/gcc.dg/vect/vect-gather-3.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-4.c |  6 +--
  gcc/tree-vect-loop.cc |  6 ++-
  gcc/tree-vect-slp.cc  | 45 +--
  7 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
index 98ba7522471..2c4fa3f5991 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
@@ -44,4 +44,4 @@ int main (int argc, char **argv)
return 0;
  }
  
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v } } } } */

+/* { dg-final { scan-tree-dump 

Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Hi Patrick,

Thanks a lot for helping to test these patches!

On 2023/10/20 2:04, Patrick O'Neill wrote:

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick's CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your 
GitHub so that we can see the full results?


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub 
<https://github.com/patrick-rivos/riscv-gnu-toolchain/issues>



juzhe.zh...@rivai.ai

*From:* Lehua Ding <mailto:lehua.d...@rivai.ai>
*Date:* 2023-10-19 16:33
*To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
*CC:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>; kito.cheng
<mailto:kito.ch...@gmail.com>; rdapp.gcc
<mailto:rdapp@gmail.com>; palmer <mailto:pal...@rivosinc.com>;
jeffreyalaw <mailto:jeffreya...@gmail.com>; lehua.ding
<mailto:lehua.d...@rivai.ai>
*Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:
1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification and
   deletion of vsetvl insns based on the virtual CFG. A basic block in the
   virtual CFG is called vsetvl_block_info and the vsetvl information inside
   it is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unify the demand system;
   this phase only fuses local vsetvl infos in the forward direction.
3. Refactor Phase 3: change the logic for deciding whether to uplift a vsetvl
   info to a pred basic block to a more unified method, namely that the
   reaching vsetvl definitions contain a vsetvl info compatible with it.
4. Place all modifications of the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on the fully optimized vsetvl infos. Phase 5 removes
   the avl operand from the RVV instructions and removes the unused dest
   operand register from the vsetvl insns.
These modifications resulted in some testcases needing to be updated. The
reasons for updating are summarized below:
1. more optimized
   vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
   avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
PR target/111037
PR target/111234
PR target/111725
Lehua Ding (11):
  RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |    2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |    2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/v

Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Okay, thanks anyway.

On 2023/10/19 16:38, Robin Dapp wrote:

Hi Lehua,

thanks for the extensive rework.  I'm going to let Juzhe handle the review
since it's his pass and he knows it best.  Delegated it to him in patchwork.

Regards
  Robin



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH V2 00/14] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding
bler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O1 
scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 
scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -Os 
scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-65.c   -O2 
scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-65.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-times 
vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-65.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O0 scan-assembler-not 
vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O1 scan-assembler-not 
vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O2 scan-assembler-not 
vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler-not vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-not vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -O3 -g 
scan-assembler-not vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/pr111037-2.c   -Os scan-assembler-not 
vsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-10.c   -O3 -g 
scan-assembler-times vsetvli 15
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-11.c   -O3 -g 
scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-12.c   -O3 -g 
scan-assembler-times vsetvli 9
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-13.c   -O3 -g 
scan-assembler-times vsetvli 9
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-14.c   -O3 -g 
scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-15.c   -O3 -g 
scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-16.c   -O3 -g 
scan-assembler-times vsetvli 15
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-18.c   -O3 -g 
scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-19.c   -O3 -g 
scan-assembler-times vsetvli 15
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_single_block-9.c   -O3 -g 
scan-assembler-times vsetvli 15


On 10/17/23 04:34, Lehua Ding wrote:
This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:

1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only maintain
    and modify this virtual CFG. Phase 4 performs insertion, modification and
    deletion of vsetvl insns based on the virtual CFG. A basic block in the
    virtual CFG is called vsetvl_block_info and the vsetvl information inside
    it is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unify the demand system;
    this phase only fuses local vsetvl infos in the forward direction.
3. Refactor Phase 3: change the logic for deciding whether to uplift a vsetvl
    info to a pred basic block to a more unified method, namely that the
    reaching vsetvl definitions contain a vsetvl info compatible with it.
4. Place all modifications of the RTL in Phase 4 and Phase 5.
    Phase 4 is responsible for inserting, modifying and deleting vsetvl
    instructions based on the fully optimized vsetvl infos. Phase 5 removes
    the avl operand from the RVV instructions and removes the unused dest
    operand register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated. The
reasons for updating are summarized below:

1. more optimized
    vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
    vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
    avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
    avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
    scalar_move-1.c/
4. add some bugfix testcases.
    pr111037-3.c/pr111037-4.c
    avl_single-89.c

PR target/111037
PR target/111234
PR target/111725


Lehua Ding (14):
   RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info
   RISC-V: P2: Refactor and cleanup demand system
   RISC-V: P3: Refactor class vector_infos_manager to pre_vsetvl
   RISC-V: P4: move method from class pass_vsetvl to pre_vsetvl
   RISC-V

[PATCH V3 10/11] RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.def (DEF_INCOMPATIBLE_COND): Removed.
(DEF_SEW_LMUL_RULE): New.
(DEF_SEW_LMUL_FUSE_RULE): Removed.
(DEF_POLICY_RULE): New.
(DEF_UNAVAILABLE_COND): Removed.
(DEF_AVL_RULE): New.
(sew_lmul): New.
(ratio_only): New.
(sew_only): New.
(ge_sew): New.
(ratio_and_ge_sew): New.
(tail_mask_policy): New.
(tail_policy_only): New.
(mask_policy_only): New.
(ignore_policy): New.
(avl): New.
(non_zero_avl): New.
(ignore_avl): New.
* config/riscv/t-riscv: Removed.
* config/riscv/riscv-vsetvl.h: Removed.

---
 gcc/config/riscv/riscv-vsetvl.def | 641 +++---
 gcc/config/riscv/riscv-vsetvl.h   | 488 ---
 gcc/config/riscv/t-riscv  |   2 +-
 3 files changed, 155 insertions(+), 976 deletions(-)
 delete mode 100644 gcc/config/riscv/riscv-vsetvl.h

diff --git a/gcc/config/riscv/riscv-vsetvl.def 
b/gcc/config/riscv/riscv-vsetvl.def
index 709cc4ee0df..401d2c6f421 100644
--- a/gcc/config/riscv/riscv-vsetvl.def
+++ b/gcc/config/riscv/riscv-vsetvl.def
@@ -18,496 +18,163 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#ifndef DEF_INCOMPATIBLE_COND
-#define DEF_INCOMPATIBLE_COND(AVL1, SEW1, LMUL1, RATIO1, NONZERO_AVL1, 
\
- GE_SEW1, TAIL_POLICTY1, MASK_POLICY1, AVL2,  \
- SEW2, LMUL2, RATIO2, NONZERO_AVL2, GE_SEW2,  \
- TAIL_POLICTY2, MASK_POLICY2, COND)
+/* DEF_XXX_RULE (prev_demand, next_demand, fused_demand, compatible_p,
+   available_p, fuse)
+   prev_demand: the prev vector insn's sew_lmul_type
+   next_demand: the next vector insn's sew_lmul_type
+   fused_demand: if them are compatible, change prev_info demand to the
+fused_demand after fuse prev_info and next_info
+   compatible_p: check if prev_demand and next_demand are compatible
+   available_p: check if prev_demand is available for next_demand
+   fuse: if them are compatible, how to modify prev_info  */
+
+#ifndef DEF_SEW_LMUL_RULE
+#define DEF_SEW_LMUL_RULE(prev_demand, next_demand, fused_demand,  
\
+ compatible_p, available_p, fuse)
 #endif
 
-#ifndef DEF_SEW_LMUL_FUSE_RULE
-#define DEF_SEW_LMUL_FUSE_RULE(DEMAND_SEW1, DEMAND_LMUL1, DEMAND_RATIO1,   
\
-  DEMAND_GE_SEW1, DEMAND_SEW2, DEMAND_LMUL2,  \
-  DEMAND_RATIO2, DEMAND_GE_SEW2, NEW_DEMAND_SEW,  \
-  NEW_DEMAND_LMUL, NEW_DEMAND_RATIO,  \
-  NEW_DEMAND_GE_SEW, NEW_SEW, NEW_VLMUL,  \
-  NEW_RATIO)
+#ifndef DEF_POLICY_RULE
+#define DEF_POLICY_RULE(prev_demand, next_demand, fused_demand, compatible_p,  
\
+   available_p, fuse)
 #endif
 
-#ifndef DEF_UNAVAILABLE_COND
-#define DEF_UNAVAILABLE_COND(AVL1, SEW1, LMUL1, RATIO1, NONZERO_AVL1, GE_SEW1, 
\
-TAIL_POLICTY1, MASK_POLICY1, AVL2, SEW2, LMUL2,   \
-RATIO2, NONZERO_AVL2, GE_SEW2, TAIL_POLICTY2, \
-MASK_POLICY2, COND)
+#ifndef DEF_AVL_RULE
+#define DEF_AVL_RULE(prev_demand, next_demand, fused_demand, compatible_p, 
\
+available_p, fuse)
 #endif
 
-/* Case 1: Demand compatible AVL.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_TRUE, /*SEW*/ DEMAND_ANY,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_FALSE, /*GE_SEW*/ DEMAND_ANY,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*AVL*/ DEMAND_TRUE, /*SEW*/ DEMAND_ANY,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_FALSE, /*GE_SEW*/ DEMAND_ANY,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*COND*/ incompatible_avl_p)
-
-/* Case 2: Demand same SEW.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_ANY, /*SEW*/ DEMAND_TRUE,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_ANY, /*GE_SEW*/ DEMAND_FALSE,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*AVL*/ DEMAND_ANY, /*SEW*/ DEMAND_TRUE,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_ANY, /*GE_SEW*/ DEMAND_FALSE,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*COND*/ different_sew_p)
-
-/* Case 3: Demand same LMUL.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_ANY, /*SEW*/ 
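
The DEF_SEW_LMUL_RULE / DEF_POLICY_RULE / DEF_AVL_RULE macros in the hunk above appear to follow the usual X-macro pattern: the .def file lists one rule per line, and each consumer defines the macro to pick out the columns it needs. Below is a minimal single-file sketch of that pattern. The rule list, the rule names (sew_only, same_sew_p, use_next, ...) and the DEMO_RULES, TO_ROW and COUNT_ONE macros are all hypothetical stand-ins for illustration, not the contents of the real .def file.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical rule list standing in for an included .def file.  The six
// columns mirror the comment in the patch: prev_demand, next_demand,
// fused_demand, compatible_p, available_p, fuse.
#define DEMO_RULES(DEF_RULE)                                              \
  DEF_RULE (sew_only, sew_only, sew_only, same_sew_p, same_sew_p, nop)    \
  DEF_RULE (ratio_only, sew_only, sew_lmul, always_p, never_p, use_next)

// One table row per rule; every column is kept as a string here so the
// sketch needs no real predicate or fuse functions.
struct demo_rule
{
  const char *prev_demand;
  const char *next_demand;
  const char *fused_demand;
  const char *compatible_p;
  const char *available_p;
  const char *fuse;
};

// First expansion: turn each rule into a table row.
#define TO_ROW(prev, next, fused, compat, avail, fuse)                    \
  {#prev, #next, #fused, #compat, #avail, #fuse},
static const demo_rule demo_rules[] = {DEMO_RULES (TO_ROW)};
#undef TO_ROW

// Second expansion of the same list: count the rules.
#define COUNT_ONE(...) +1
static const unsigned num_demo_rules = 0 DEMO_RULES (COUNT_ONE);
#undef COUNT_ONE
```

With a real .def file the list macro is replaced by an #include, and the #ifndef defaults seen in the patch let a consumer define only the rule kinds it cares about while the others expand to nothing.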

[PATCH V3 11/11] RISC-V: P11: Adjust and add testcases

2023-10-19 Thread Lehua Ding
PR target/111037
PR target/111234
PR target/111725

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust.
* gcc.target/riscv/rvv/base/pr111037-1.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here.
* gcc.target/riscv/rvv/base/pr111037-2.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-106.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-108.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-109.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test.

---
 .../gcc.target/riscv/rvv/base/scalar_move-1.c |  2 +-
 .../riscv/rvv/vsetvl/avl_single-104.c | 35 +++
 .../riscv/rvv/vsetvl/avl_single-105.c | 23 ++
 .../riscv/rvv/vsetvl/avl_single-106.c | 34 ++
 .../riscv/rvv/vsetvl/avl_single-107.c | 41 +
 .../riscv/rvv/vsetvl/avl_single-108.c | 41 +
 .../riscv/rvv/vsetvl/avl_single-109.c | 45 +++
 .../riscv/rvv/vsetvl/avl_single-23.c  |  7 +--
 .../riscv/rvv/vsetvl/avl_single-46.c  |  3 +-
 .../riscv/rvv/vsetvl/avl_single-84.c  |  5 +--
 .../riscv/rvv/vsetvl/avl_single-89.c  |  8 ++--
 .../riscv/rvv/vsetvl/avl_single-95.c  |  2 +-
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  7 +--
 .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |  2 +-
 .../riscv/rvv/{base => vsetvl}/pr111037-1.c   |  0
 .../riscv/rvv/{base => vsetvl}/pr111037-2.c   |  0
 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  | 16 +++
 .../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  | 16 +++
 .../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 ++---
 .../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 ++---
 .../riscv/rvv/vsetvl/vlmax_conflict-12.c  |  1 -
 .../riscv/rvv/vsetvl/vlmax_conflict-3.c   |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |  4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |  4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  2 +-
 26 files changed, 288 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-106.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-107.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-108.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-109.c
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
index 18349132a88..c833d8989e9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
@@ -46,8 +46,8 @@ int32_t foo3 (int32_t *base, size_t vl)
 ** vl1re32\.v\tv[0-9]+,0\([a-x0-9]+\)
 ** vsetvli\tzero,[a-x0-9]+,e32,m1,t[au],m[au]
 ** vadd.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
-** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
 ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
+** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
 ** vmv.v.x\tv[0-9]+,\s*[a-x0-9]+
 ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
 ** ret
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c 

[PATCH V3 06/11] RISC-V: P6: Add computing reaching definition data flow

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_lcm_local_properties): New.

---
 gcc/config/riscv/riscv-vsetvl.cc | 395 +++
 1 file changed, 395 insertions(+)
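
For readers following compute_avl_def_data in the diff below: it sets up a classic reaching-definitions problem in which each "expression" is a (register, defining block) pair. The following is a minimal standalone sketch of that idea using std::vector<bool> instead of GCC's sbitmap. The id layout regno * num_bbs + bb is an assumption inferred from the bitmap_set_range (m_kill[...], regno * num_bbs, num_bbs) call in the patch, and every function name here is hypothetical rather than taken from riscv-vsetvl.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using bitset = std::vector<bool>;

// Assumed encoding: one id per (register, defining block) pair, laid out
// so that all ids belonging to one register are contiguous.
inline unsigned expr_id (unsigned bb, unsigned regno, unsigned num_bbs)
{
  return regno * num_bbs + bb;
}
inline unsigned expr_regno (unsigned id, unsigned num_bbs) { return id / num_bbs; }
inline unsigned expr_bb (unsigned id, unsigned num_bbs) { return id % num_bbs; }

// Forward "may" problem, iterated to a fixed point:
//   in[b]  = union of out[p] over all predecessors p of b
//   out[b] = out[b] | gen[b] | (in[b] & ~kill[b])
// OR-ing into out[b] preserves any seed the caller placed there.
void
reaching_definitions (const std::vector<std::vector<unsigned>> &preds,
                      const std::vector<bitset> &gen,
                      const std::vector<bitset> &kill,
                      std::vector<bitset> &in, std::vector<bitset> &out)
{
  for (bool changed = true; changed;)
    {
      changed = false;
      for (std::size_t b = 0; b < preds.size (); b++)
        for (std::size_t i = 0; i < out[b].size (); i++)
          {
            for (unsigned p : preds[b])
              if (out[p][i])
                in[b][i] = true;
            bool v = out[b][i] || gen[b][i] || (in[b][i] && !kill[b][i]);
            if (v != out[b][i])
              {
                out[b][i] = true;
                changed = true;
              }
          }
    }
}

// Diamond CFG 0 -> {1, 2} -> 3 with two registers; bb1 defines r0 and
// bb2 defines r1.  Returns the set reaching the entry of the join block.
bitset
diamond_in_at_join ()
{
  const unsigned num_bbs = 4, num_regs = 2, n = num_bbs * num_regs;
  std::vector<std::vector<unsigned>> preds = {{}, {0}, {0}, {1, 2}};
  std::vector<bitset> gen (num_bbs, bitset (n)), kill (gen), in (gen), out (gen);

  // Seed: every register counts as defined on function entry.
  for (unsigned r = 0; r < num_regs; r++)
    out[0][expr_id (0, r, num_bbs)] = true;

  auto define = [&] (unsigned bb, unsigned r) {
    gen[bb][expr_id (bb, r, num_bbs)] = true;
    for (unsigned k = 0; k < num_bbs; k++) // kill all defs of r; gen wins
      kill[bb][expr_id (k, r, num_bbs)] = true;
  };
  define (1, 0);
  define (2, 1);

  reaching_definitions (preds, gen, kill, in, out);
  return in[3];
}
```

In the diamond example, both the entry definition and the bb1 definition of r0 reach the join block, because the path through bb2 leaves r0 untouched. More than one reaching definition of an AVL register is the situation such a data flow is meant to expose.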

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index dad3d7c941e..27d47d7c039 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2722,6 +2722,401 @@ public:
 };
 
 
+void
+pre_vsetvl::compute_avl_def_data ()
+{
+  if (bitmap_empty_p (m_avl_regs))
+return;
+
+  unsigned num_regs = GP_REG_LAST + 1;
+  unsigned num_bbs = last_basic_block_for_fn (cfun);
+
+  sbitmap *avl_def_loc_temp = sbitmap_vector_alloc (num_bbs, num_regs);
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  bitmap_and (avl_def_loc_temp[bb->index ()], m_avl_regs,
+ m_reg_def_loc[bb->index ()]);
+
+  vsetvl_block_info &block_info = get_block_info (bb);
+  if (block_info.has_info ())
+   {
+ vsetvl_info &footer_info = block_info.get_exit_info ();
+ gcc_assert (footer_info.valid_p ());
+ if (footer_info.has_vl ())
+   bitmap_set_bit (avl_def_loc_temp[bb->index ()],
+   REGNO (footer_info.get_vl ()));
+   }
+}
+
+  if (m_avl_def_in)
+sbitmap_vector_free (m_avl_def_in);
+  if (m_avl_def_out)
+sbitmap_vector_free (m_avl_def_out);
+
+  unsigned num_exprs = num_bbs * num_regs;
+  sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
+  sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
+  m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
+  m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);
+
+  bitmap_vector_clear (avl_def_loc, num_bbs);
+  bitmap_vector_clear (m_kill, num_bbs);
+  bitmap_vector_clear (m_avl_def_out, num_bbs);
+
+  unsigned regno;
+  sbitmap_iterator sbi;
+  for (const bb_info *bb : crtl->ssa->bbs ())
+EXECUTE_IF_SET_IN_BITMAP (avl_def_loc_temp[bb->index ()], 0, regno, sbi)
+  {
+   bitmap_set_bit (avl_def_loc[bb->index ()],
+   get_expr_id (bb->index (), regno, num_bbs));
+   bitmap_set_range (m_kill[bb->index ()], regno * num_bbs, num_bbs);
+  }
+
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  EXECUTE_IF_SET_IN_BITMAP (m_avl_regs, 0, regno, sbi)
+bitmap_set_bit (m_avl_def_out[entry->index],
+   get_expr_id (entry->index, regno, num_bbs));
+
+  compute_reaching_defintion (avl_def_loc, m_kill, m_avl_def_in, m_avl_def_out);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file,
+  "  Compute avl reaching definition data (num_bbs %d, num_regs "
+  "%d):\n\n",
+  num_bbs, num_regs);
+  fprintf (dump_file, "avl_regs: ");
+  dump_bitmap_file (dump_file, m_avl_regs);
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ unsigned int i = bb->index ();
+ fprintf (dump_file, "  BB %u:\n", i);
+ fprintf (dump_file, "avl_def_loc:");
+ unsigned expr_id;
+ sbitmap_iterator sbi;
+ EXECUTE_IF_SET_IN_BITMAP (avl_def_loc[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\nkill:");
+ EXECUTE_IF_SET_IN_BITMAP (m_kill[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\navl_def_in:");
+ EXECUTE_IF_SET_IN_BITMAP (m_avl_def_in[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\navl_def_out:");
+ EXECUTE_IF_SET_IN_BITMAP (m_avl_def_out[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\n");
+   }
+}
+
+  sbitmap_vector_free (avl_def_loc);
+  sbitmap_vector_free (m_kill);
+  sbitmap_vector_free (avl_def_loc_temp);
+
+  m_dem.set_avl_in_out_data (m_avl_def_in, m_avl_def_out);
+}
+
+void
+pre_vsetvl::compute_vsetvl_def_data ()
+{
+  m_vsetvl_def_exprs.truncate (0);
+  add_expr (m_vsetvl_def_exprs, m_unknow_info);
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  vsetvl_block_info &block_info = get_block_info (bb);
+  if (block_info.empty_p ())
+   continue;
+  vsetvl_info &footer_info = block_info.get_exit_info ();
+  gcc_assert (footer_info.valid_p () || 

[PATCH V3 07/11] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): New.
(pre_vsetvl::pre_global_vsetvl_info): New.
(pass_vsetvl::prune_expressions): Removed.
(pass_vsetvl::compute_local_properties): Removed.
(pass_vsetvl::earliest_fusion): Removed.
(pass_vsetvl::vsetvl_fusion): Removed.
(pass_vsetvl::can_refine_vsetvl_p): Removed.
(pass_vsetvl::refine_vsetvls): Removed.
(pass_vsetvl::cleanup_vsetvls): Removed.
(pass_vsetvl::commit_vsetvls): Removed.
(pass_vsetvl::pre_vsetvl): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 1004 +++---
 1 file changed, 361 insertions(+), 643 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 27d47d7c039..855edd6d0f5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2721,7 +2721,6 @@ public:
   }
 };
 
-
 void
 pre_vsetvl::compute_avl_def_data ()
 {
@@ -3241,6 +3240,367 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 }
 
 
+bool
+pre_vsetvl::earliest_fuse_vsetvl_info ()
+{
+  compute_avl_def_data ();
+  compute_vsetvl_def_data ();
+  compute_lcm_local_properties ();
+
+  unsigned num_exprs = m_exprs.length ();
+  struct edge_list *m_edges = create_edge_list ();
+  unsigned num_edges = NUM_EDGES (m_edges);
+  sbitmap *antin
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+  sbitmap *antout
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+
+  sbitmap *earliest = sbitmap_vector_alloc (num_edges, num_exprs);
+
+  compute_available (m_avloc, m_kill, m_avout, m_avin);
+  compute_antinout_edge (m_antloc, m_transp, antin, antout);
+  compute_earliest (m_edges, num_exprs, antin, antout, m_avout, m_kill,
+   earliest);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "Expression List (%u):\n", num_exprs);
+  for (unsigned i = 0; i < num_exprs; i++)
+   {
+ const auto &info = *m_exprs[i];
+ fprintf (dump_file, "  Expr[%u]: ", i);
+ info.dump (dump_file, "");
+   }
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ unsigned int i = bb->index ();
+ fprintf (dump_file, "  BB %u:\n", i);
+ fprintf (dump_file, "avloc: ");
+ dump_bitmap_file (dump_file, m_avloc[i]);
+ fprintf (dump_file, "kill: ");
+ dump_bitmap_file (dump_file, m_kill[i]);
+ fprintf (dump_file, "antloc: ");
+ dump_bitmap_file (dump_file, m_antloc[i]);
+ fprintf (dump_file, "transp: ");
+ dump_bitmap_file (dump_file, m_transp[i]);
+
+ fprintf (dump_file, "avin: ");
+ dump_bitmap_file (dump_file, m_avin[i]);
+ fprintf (dump_file, "avout: ");
+ dump_bitmap_file (dump_file, m_avout[i]);
+ fprintf (dump_file, "antin: ");
+ dump_bitmap_file (dump_file, antin[i]);
+ fprintf (dump_file, "antout: ");
+ dump_bitmap_file (dump_file, antout[i]);
+   }
+  fprintf (dump_file, "\n");
+  fprintf (dump_file, "  earliest:\n");
+  for (unsigned ed = 0; ed < num_edges; ed++)
+   {
+ edge eg = INDEX_EDGE (m_edges, ed);
+
+ if (bitmap_empty_p (earliest[ed]))
+   continue;
+ fprintf (dump_file, "Edge(bb %u -> bb %u): ", eg->src->index,
+  eg->dest->index);
+ dump_bitmap_file (dump_file, earliest[ed]);
+   }
+  fprintf (dump_file, "\n");
+}
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Fused global info result:\n");
+}
+
+  bool changed = false;
+  for (unsigned ed = 0; ed < num_edges; ed++)
+{
+  sbitmap e = earliest[ed];
+  if (bitmap_empty_p (e))
+   continue;
+
+  unsigned int expr_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi)
+   {
+ vsetvl_info &curr_info = *m_exprs[expr_index];
+ if (!curr_info.valid_p ())
+   continue;
+
+ edge eg = INDEX_EDGE (m_edges, ed);
+ if (eg->probability == profile_probability::never ())
+   continue;
+ if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+ || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+   continue;
+
+ vsetvl_block_info &src_block_info = get_block_info (eg->src);
+ vsetvl_block_info &dest_block_info = get_block_info (eg->dest);
+
+ if (src_block_info.probability
+ == profile_probability::uninitialized ())
+   continue;
+
+ if (src_block_info.empty_p ())
+   {
+ vsetvl_info new_curr_info = curr_info;
+ new_curr_info.set_bb (crtl->ssa->bb 
