Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-07 Thread Lehua Ding


Hi Vladimir,

I'll send V3 patches based on these comments. Note that these four
patches only add subreg liveness tracking and apply it to the IRA and LRA
passes, so no performance change is expected until subreg coalescing is
supported. Later patches will complete the subreg coalescing
functionality. Supporting subreg coalescing requires support for subreg
copies, i.e. modifying the conflict detection logic.


On 2024/5/2 00:24, Vladimir Makarov wrote:


On 2/3/24 05:50, Lehua Ding wrote:

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were needed
in LRA than in IRA because LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.

Typo "fileds".

(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.

The same typo.

(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.


The patch is OK with me, with some minor requests:

You missed the log entry for collect_non_operand_hard_regs.  The log entry
for lra_create_live_ranges_1 is incomplete (at least, it should be "Ditto. ...").


Also, you changed the signatures of update_live_info,
fix_bb_live_info, mark_regno_live, mark_regno_dead and new_insn_reg but did
not update the function comments.  Outdated comments are even worse
than missing comments.  Please fix them.


Also, some variable naming could be improved, but that is up to you.

So now you just need approval for the rest of the patches to commit your
work, but they are not in my area of responsibility.


For patches of this size it is difficult to predict how they will work on
other targets.  I tested your patches on aarch64 and ppc64le. They seem
to work correctly, but please be prepared to switch them off (it is easy) if
the patches cause issues on other targets, at least until the issues
are fixed.


And thank you for your contribution.  Improving GCC performance these
days is a challenging task, as so many people are working on GCC, but you
found such an opportunity and, most importantly, implemented it.





--
Best,
Lehua

Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-07 Thread Lehua Ding


Hi Dimitar,


Thanks for helping to review the code! I will send a V3 patch that
addresses these comments.



Best,

Lehua


On 2024/4/26 04:56, Dimitar Dimitrov wrote:

On Wed, Apr 24, 2024 at 06:05:03PM +0800, Lehua Ding wrote:

This patch adds a new DF problem named DF_LIVE_SUBREG. The problem
extends the DF_LR problem and tracks the subreg liveness of a
multireg pseudo if the pseudo satisfies the following conditions:

   1. its mode size is greater than its REGMODE_NATURAL_SIZE.
   2. the reg is used in insns via a subreg pattern.

The main methods are as follows:

   1. split the bitmap in/out/def/use fields into full_in/out/def/use and
  partial_in/out/def/use. If a pseudo needs its subreg liveness tracked,
  it is recorded in the partial_in/out/def/use fields.
  In addition, the range_in/out/def/use fields record the live
  range of each tracked pseudo.
   2. in the df_live_subreg_finalize function, move a tracked pseudo from
  partial_in/out/def/use to full_in/out/def/use if the pseudo's live
  range is full.
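
To make the conditions and the two steps above concrete, here is a
minimal sketch; it is not the patch's code. It assumes a hypothetical
subreg_range type holding one flag per REGMODE_NATURAL_SIZE-sized block
of a pseudo. The names needs_tracking, classify_range and live_kind are
illustrative only and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tracked pseudo's live range: one flag per
// REGMODE_NATURAL_SIZE-sized block (an assumption, not GCC's subregs_live).
using subreg_range = std::vector<bool>;

enum class live_kind { dead, partial, full };

// Sketch of the two tracking conditions from the description: the mode is
// larger than the natural size and the pseudo is accessed through a subreg.
inline bool
needs_tracking (std::size_t mode_size, std::size_t natural_size,
                bool used_via_subreg)
{
  return mode_size > natural_size && used_via_subreg;
}

// Classify a range the way the finalize step is described: every block
// live means "full", some blocks live means "partial", none means "dead".
inline live_kind
classify_range (const subreg_range &range)
{
  std::size_t live = 0;
  for (bool block_live : range)
    live += block_live ? 1 : 0;
  if (live == 0)
    return live_kind::dead;
  return live == range.size () ? live_kind::full : live_kind::partial;
}
```

A pseudo classified as "full" here corresponds to one that would be moved
from the partial sets to the full sets.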

Hi Lehua,

I'm not familiar with LRA, so my comments below could be totally off
the point.  Please treat them as mild suggestions.


gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): getting bb regs in/out data.
(get_live_subreg_local_bb_info): getting bb regs use/def data.
(multireg_p): checking is the regno a pseudo multireg.
(need_track_subreg_p): checking is the regno need to be tracked.
(init_range): getting the range of subreg rtx.
(remove_subreg_range): removing use data for the reg/subreg rtx.
(add_subreg_range): adding def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Simple decalare.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): getting the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor the all_in df data.
(df_get_subreg_live_out): Accessor the all_out df data.
(df_get_subreg_live_full_in): Accessor the full_in df data.
(df_get_subreg_live_full_out): Accessor the full_out df data.
(df_get_subreg_live_partial_in): Accessor the partial_in df data.
(df_get_subreg_live_partial_out): Accessor the partial_out df data.
(df_get_subreg_live_range_in): Accessor the range_in df data.
(df_get_subreg_live_range_out): Accessor the range_out df data.
* regs.h (get_nblocks): Get the blocks of mode.
* sbitmap.cc (bitmap_full_p): sbitmap predicator.
(bitmap_same_p): sbitmap predicator.
(test_full): test bitmap_full_p.
(test_same): test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.
---
  gcc/Makefile.in          |   1 +
  gcc/df-problems.cc       | 855 ++-
  gcc/df.h                 | 155 +++
  gcc/regs.h               |   5 +
  gcc/sbitmap.cc           |  98 +
  gcc/sbitmap.h            |   2 +
  gcc/subreg-live-range.cc |  53 +++
  gcc/subreg-live-range.h  | 206 ++
  gcc/timevar.def          |   1 +
  9 files changed, 1375 insertions(+), 1
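
The bitmap_full_p and bitmap_same_p predicates listed in the ChangeLog
above have to ignore unused bits in the last word of the bitmap. Below is
a minimal sketch of that tail masking, assuming 64-bit words and a
hypothetical tiny_sbitmap type (GCC's real sbitmap and SBITMAP_ELT_TYPE
differ). The names full_p, same_p and tail_mask are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified bitmap; not GCC's sbitmap.
struct tiny_sbitmap
{
  std::size_t n_bits;
  std::vector<std::uint64_t> elms;

  explicit tiny_sbitmap (std::size_t n)
    : n_bits (n), elms ((n + 63) / 64, 0) {}

  void set_bit (std::size_t i)
  { elms[i / 64] |= std::uint64_t (1) << (i % 64); }
  void clear_bit (std::size_t i)
  { elms[i / 64] &= ~(std::uint64_t (1) << (i % 64)); }
  void ones ()
  { for (std::size_t i = 0; i < n_bits; ++i) set_bit (i); }
};

// Mask selecting the valid bits of the last word; every bit is valid
// when the size is a multiple of the word width.
inline std::uint64_t
tail_mask (const tiny_sbitmap &a)
{
  std::size_t end_bitno = a.n_bits % 64;
  return end_bitno == 0 ? ~std::uint64_t (0)
                        : (std::uint64_t (1) << end_bitno) - 1;
}

// True if every valid bit is set.
inline bool
full_p (const tiny_sbitmap &a)
{
  if (a.elms.empty ())
    return true;
  std::size_t last = a.elms.size () - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (a.elms[i] != ~std::uint64_t (0))
      return false;
  std::uint64_t mask = tail_mask (a);
  return (a.elms[last] & mask) == mask;
}

// True if both bitmaps have the same size and the same valid bits.
inline bool
same_p (const tiny_sbitmap &a, const tiny_sbitmap &b)
{
  if (a.n_bits != b.n_bits)
    return false;
  if (a.elms.empty ())
    return true;
  std::size_t last = a.elms.size () - 1;
  for (std::size_t i = 0; i < last; ++i)
    if (a.elms[i] != b.elms[i])
      return false;
  std::uint64_t mask = tail_mask (a);
  return (a.elms[last] & mask) == (b.elms[last] & mask);
}
```

With a 193-bit bitmap only one bit of the fourth word is valid, which is
the case the selftests in the patch exercise by clearing or setting bit 192.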

[PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-04-24 Thread Lehua Ding

+  if (end_bitno == 0)
+return true;
+
+  gcc_checking_assert (i + 1 == a->size);
+
+  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE) 1 << end_bitno) - 1;
+
+  /* Make sure the tail bits are same.  */
+  return (a->elms[i] & mask) == (b->elms[i] & mask);
+}
+
 /* Set DST to be (A or (B and C)).
Return nonzero if any change is made.  */
 
@@ -994,6 +1047,49 @@ test_bit_in_range ()
   sbitmap_free (s);
 }
 
+/* Verify bitmap_full_p functions for sbitmap.  */
+
+static void
+test_full ()
+{
+  sbitmap s = sbitmap_alloc (193);
+
+  bitmap_clear (s);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  ASSERT_TRUE (bitmap_full_p (s));
+
+  bitmap_clear_bit (s, 192);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  bitmap_clear_bit (s, 17);
+  ASSERT_FALSE (bitmap_full_p (s));
+}
+
+/* Verify bitmap_same_p functions for sbitmap.  */
+
+static void
+test_same ()
+{
+  sbitmap s1 = sbitmap_alloc (193);
+  sbitmap s2 = sbitmap_alloc (193);
+  sbitmap s3 = sbitmap_alloc (192);
+  
+  ASSERT_FALSE (bitmap_same_p (s1, s3));
+
+  bitmap_clear (s1);
+  bitmap_clear (s2);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s2, 192);
+  ASSERT_FALSE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s1, 192);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -1001,6 +1097,8 @@ sbitmap_cc_tests ()
 {
   test_set_range ();
   test_bit_in_range ();
+  test_full ();
+  test_same ();
 }
 
 } // namespace selftest
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index da6116ce925..71cfded9fb2 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -267,6 +267,7 @@ extern void bitmap_copy (sbitmap, const_sbitmap);
 extern bool bitmap_equal_p (const_sbitmap, const_sbitmap);
 extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
+extern bool bitmap_full_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
 extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
 extern void bitmap_set_range (sbitmap, unsigned, unsigned);
@@ -287,6 +288,7 @@ extern bool bitmap_and (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
+extern bool bitmap_same_p (const_sbitmap, const_sbitmap);
 extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
 
 extern int bitmap_first_set_bit (const_sbitmap);
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..7e8e081844f
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,53 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+{
+  fprintf (file, "%sempty\n", indent);
+  return;
+}
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+{
+  const_sbitmap range = kv.second;
+  if (bitmap_empty_p (range))
+   continue;
+  fprintf (file, "%d: ", kv.first);
+  if (!bitmap_full_p (range))
+   {
+ dump_bitmap_file (file, range);
+ fprintf (file, ",  ");
+   }
+  else
+fprintf (file, "full, ");
+}
+  fprintf (file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 000..4eafe006935
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,206 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (

[PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were needed
in LRA than in IRA because LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.
---
 gcc/lra-coalesce.cc    |  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h          |   4 +
 gcc/lra-lives.cc       | 357 -
 gcc/lra-remat.cc       |   8 +-
 gcc/lra-spills.cc      |  27 +++-
 gcc/lra.cc             |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)
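
The extended update_live_info in the diff below keeps three sets in step.
Here is a minimal sketch of that bookkeeping under stated assumptions:
std::set<int> stands in for GCC's bitmap, the helper names are
illustrative, and the assignment of the "coalesced" and "used" pseudo sets
to each call is inferred from the pre-patch function rather than quoted
from the patch. The only GCC semantics relied on are that
bitmap_and_compl_into (A, B) computes A &= ~B and that
bitmap_ior_and_compl_into (A, B, C) computes A |= B & ~C.

```cpp
#include <cassert>
#include <set>

// Stand-in for a GCC bitmap of register numbers (assumption).
using regset = std::set<int>;

// dst &= ~mask
inline void
and_compl_into (regset &dst, const regset &mask)
{
  for (int r : mask)
    dst.erase (r);
}

// dst |= src
inline void
ior_into (regset &dst, const regset &src)
{
  dst.insert (src.begin (), src.end ());
}

// dst |= (src & ~excl)
inline void
ior_and_compl_into (regset &dst, const regset &src, const regset &excl)
{
  for (int r : src)
    if (excl.count (r) == 0)
      dst.insert (r);
}

// Replace coalesced pseudos by their representatives ("used") in the
// all/full/partial sets.  A representative is added to "full" unless it
// is already tracked as partial, and to "partial" unless it is in "full".
inline void
update_live_sets (regset &all, regset &full, regset &partial,
                  const regset &coalesced, const regset &used)
{
  and_compl_into (all, coalesced);
  ior_into (all, used);

  and_compl_into (full, coalesced);
  ior_and_compl_into (full, used, partial);

  and_compl_into (partial, coalesced);
  ior_and_compl_into (partial, used, full);
}
```

In this sketch a representative pseudo never ends up in both the full and
the partial set, which is the property the two ior_and_compl_into calls
are there to preserve.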

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (&used_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, all,
 FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (&used_pseudos_bitmap))
+  if (!bitmap_empty_p (&used_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+  bitmap_and_compl_into (all, &coalesced_pseudos_bitmap);
+  bitmap_ior_into (all, &used_pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, &used_pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, &used_pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (&coalesced_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 10e3d4e4097..9586e5602e4 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6515,34 +6515,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in (prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
   EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j,

[PATCH 3/4] ira: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding

This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.
---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc       | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));

[PATCH 1/4] df: Add -ftrack-subreg-liveness option

2024-04-24 Thread Lehua Ding

Add new flag -ftrack-subreg-liveness to enable track-subreg-liveness.
This flag is enabled at -O3/fast.

gcc/ChangeLog:

* common.opt: Add -ftrack-subreg-liveness option.
* opts.cc: Auto-enable -ftrack-subreg-liveness at -O3/fast.
---
 gcc/common.opt      | 4
 gcc/common.opt.urls | 3 +++
 gcc/doc/invoke.texi | 8
 gcc/opts.cc         | 1 +
 4 files changed, 16 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index ad348844775..bd030973434 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2157,6 +2157,10 @@ fira-share-spill-slots
 Common Var(flag_ira_share_spill_slots) Init(1) Optimization
 Share stack slots for spilled pseudo-registers.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information.
+
 fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b..59f27a6f7c6 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -880,6 +880,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
 fira-share-spill-slots
 UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
 
+ftrack-subreg-liveness
+UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
+
 fira-verbose=
 UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c31ab0c86..9724cbb32ba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13186,6 +13186,14 @@ Disable sharing of stack slots allocated for pseudo-registers.  Each
 pseudo-register that does not get a hard register gets a separate
 stack slot, and as a result function stack frames are larger.
 
+@opindex ftrack-subreg-liveness
+@item -ftrack-subreg-liveness
+Enable tracking subreg liveness information. This infomation allows IRA
+and LRA to support subreg coalesce feature which can improve the quality
+of register allocation.
+
+This option is enabled at level @option{-O3} for all targets.
+
 @opindex flra-remat
 @item -flra-remat
 Enable CFG-sensitive rematerialization in LRA.  Instead of loading
diff --git a/gcc/opts.cc b/gcc/opts.cc
index a90dc57f8b5..7b5d905a241 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.25.1

[PATCH V2 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-04-24 Thread Lehua Ding

Hi Vladimir and Richard,

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and then apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). The patches
target GCC 15, and the code has been pushed to the devel/subreg-coalesce
branch. In addition, my colleague Shuo Chen will be involved in some
of the remaining work. Thank you for your support.

These patches are split out from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the review comments.
The next patches will add subreg coalesce support on top of them. Here is
some data about the build time of SPEC INT 2017 (x86-64 target):

                          baseline   baseline(+track-subreg-liveness)
specint2017 build time :  1892s      1883s

Regarding build times, I ran the build a few times, and the runs with
subreg liveness tracking consistently took less time. Since the difference
is small, it may just be noise from the environment. A real speedup is
theoretically possible, though, since tracking subreg liveness could
reduce the number of live regs.

For memory usage, I tried PR 69609 under valgrind: peak memory size grew from
2003910656 to 2003947520 bytes, a very small increase.
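
As a quick check of the figures above, the sketch below computes the
absolute and relative changes. The function names are illustrative only;
the inputs are the numbers quoted in this message.

```cpp
#include <cassert>
#include <cstdint>

// Absolute change between two measurements.
inline std::int64_t
absolute_change (std::int64_t before, std::int64_t after)
{
  return after - before;
}

// Relative change in percent; negative means the value went down.
inline double
relative_change_percent (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

With the reported values, peak memory grows by 36864 bytes (36 KiB, about
0.002%), and the build time drops by 9 s, roughly 0.5%.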

No regressions on x86-64.

Co-authored-by: Shuo Chen 

Best,
Lehua

Lehua Ding (4):
  df: Add -ftrack-subreg-liveness option
  df: Add DF_LIVE_SUBREG problem
  ira: Apply DF_LIVE_SUBREG data
  lra: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in          |   1 +
 gcc/common.opt           |   4 +
 gcc/common.opt.urls      |   3 +
 gcc/df-problems.cc       | 855 ++-
 gcc/df.h                 | 155 +++
 gcc/doc/invoke.texi      |   8 +
 gcc/ira-build.cc         |   7 +-
 gcc/ira-color.cc         |   8 +-
 gcc/ira-emit.cc          |  12 +-
 gcc/ira-lives.cc         |   7 +-
 gcc/ira.cc               |  19 +-
 gcc/lra-coalesce.cc      |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h            |   4 +
 gcc/lra-lives.cc         | 357 
 gcc/lra-remat.cc         |   8 +-
 gcc/lra-spills.cc        |  27 +-
 gcc/lra.cc               |  10 +-
 gcc/opts.cc              |   1 +
 gcc/regs.h               |   5 +
 gcc/sbitmap.cc           |  98 +
 gcc/sbitmap.h            |   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def          |   1 +
 25 files changed, 1851 insertions(+), 136 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.25.1

[gcc/devel/subreg-coalesce] lra: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding via Gcc-cvs

https://gcc.gnu.org/g:cde1363042b2857111e461968a6367381d46c936

commit cde1363042b2857111e461968a6367381d46c936
Author: Lehua Ding 
Date:   Fri Feb 2 10:35:37 2024 +0800

lra: Apply DF_LIVE_SUBREG data

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were needed
in LRA than in IRA because LRA modifies the DF data directly.
The main changes are concentrated in the lra-lives.cc file.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op filed to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fileds for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): checking is the regno need to be tracked.
(make_hard_regno_live): switch to live_subreg filed.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.

Diff:
---
 gcc/lra-coalesce.cc    |  27 +++-
 gcc/lra-constraints.cc | 109 ---
 gcc/lra-int.h          |   4 +
 gcc/lra-lives.cc       | 357 +++--
 gcc/lra-remat.cc       |   8 +-
 gcc/lra-spills.cc      |  27 +++-
 gcc/lra.cc             |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (&used_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, all,
 FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (&used_pseudos_bitmap))
+  if (!bitmap_empty_p (&used_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+  bitmap_and_compl_into (all, &coalesced_pseudos_bitmap);
+  bitmap_ior_into (all, &used_pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, &used_pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, &used_pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (&coalesced_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 10e3d4e4097..9586e5602e4 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6515,34 +6515,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Updat

[gcc/devel/subreg-coalesce] ira: Apply DF_LIVE_SUBREG data

2024-04-24 Thread Lehua Ding via Gcc-cvs

https://gcc.gnu.org/g:cf327312a72fe55d7e06a84bbae3d5de649a1ed3

commit cf327312a72fe55d7e06a84bbae3d5de649a1ed3
Author: Lehua Ding 
Date:   Fri Feb 2 10:35:17 2024 +0800

ira: Apply DF_LIVE_SUBREG data

This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

Diff:
---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_

[gcc/devel/subreg-coalesce] df: Add DF_LIVE_SUBREG problem

2024-04-24 Thread Lehua Ding via Gcc-cvs

https://gcc.gnu.org/g:8e76084576fb8e0054fa19e3bc16e97d05c10630

commit 8e76084576fb8e0054fa19e3bc16e97d05c10630
Author: Lehua Ding 
Date:   Tue Jan 30 16:47:25 2024 +0800

df: Add DF_LIVE_SUBREG problem

This patch adds a new DF problem, named DF_LIVE_SUBREG. This problem
is extended from the DF_LR problem and supports tracking the subreg liveness
of a multireg pseudo if the pseudo satisfies the following conditions:

  1. the mode size is greater than its REGMODE_NATURAL_SIZE.
  2. the reg is used in insns via a subreg pattern.

The main methods are as follows:

  1. split the bitmap in/out/def/use fields into full_in/out/def/use and
     partial_in/out/def/use. If a pseudo needs its subreg liveness
     tracked, it is recorded in the partial_in/out/def/use fields.
     In addition, the range_in/out/def/use fields record the live
     range of each tracked pseudo.
  2. in the df_live_subreg_finalize function, move a tracked pseudo from
     the partial_in/out/def/use to full_in/out/def/use if the pseudo's live
     range is full.
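
As a rough illustration of step 2, the sketch below models the full/partial split with standard containers. It is not the code from df-problems.cc: `LiveSets`, `range_is_full` and `finalize_live_sets` are hypothetical names, the per-pseudo range is modeled as a `std::vector<bool>` rather than an sbitmap, and treating the combined set as the union of full and partial is an assumption.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical model of one block's live-in (or live-out) data.
struct LiveSets
{
  std::set<int> full;     // pseudos whose whole register is live
  std::set<int> partial;  // tracked pseudos with only some parts live
  std::map<int, std::vector<bool>> range;  // live parts of each tracked pseudo
};

// True if every part of the pseudo is live (an empty range is not full).
static bool
range_is_full (const std::vector<bool> &parts)
{
  if (parts.empty ())
    return false;
  for (bool live : parts)
    if (!live)
      return false;
  return true;
}

// Move tracked pseudos with a full range from partial to full, then
// return the union of both sets (assumed here to correspond to "all").
static std::set<int>
finalize_live_sets (LiveSets &s)
{
  for (auto it = s.partial.begin (); it != s.partial.end ();)
    {
      auto r = s.range.find (*it);
      if (r != s.range.end () && range_is_full (r->second))
        {
          s.full.insert (*it);
          it = s.partial.erase (it);
        }
      else
        ++it;
    }
  std::set<int> all = s.full;
  all.insert (s.partial.begin (), s.partial.end ());
  return all;
}
```

In this model a pseudo with every part live is indistinguishable from an untracked live pseudo, so it moves to the full set, while a pseudo with at least one dead part stays in partial.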

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range object file.
* df-problems.cc (struct df_live_subreg_problem_data): Private struct
for DF_LIVE_SUBREG problem.
(df_live_subreg_get_bb_info): Get bb regs in/out data.
(get_live_subreg_local_bb_info): Get bb regs use/def data.
(multireg_p): Check whether the regno is a multireg pseudo.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(init_range): Get the range of a subreg rtx.
(remove_subreg_range): Remove use data for the reg/subreg rtx.
(add_subreg_range): Add def/use data for the reg/subreg rtx.
(df_live_subreg_free_bb_info): Free basic block df data.
(df_live_subreg_alloc): Allocate and init df data.
(df_live_subreg_reset): Reset the live in/out df data.
(df_live_subreg_bb_local_compute): Compute basic block df data.
(df_live_subreg_local_compute): Compute all basic blocks df data.
(df_live_subreg_init): Init the in/out df data.
(df_live_subreg_check_result): Assert the full and partial df data.
(df_live_subreg_confluence_0): Confluence function for infinite loops.
(df_live_subreg_confluence_n): Confluence function for normal edge.
(df_live_subreg_transfer_function): Transfer function.
(df_live_subreg_finalize): Finalize the all_in/all_out df data.
(df_live_subreg_free): Free the df data.
(df_live_subreg_top_dump): Dump top df data.
(df_live_subreg_bottom_dump): Dump bottom df data.
(df_live_subreg_add_problem): Add the DF_LIVE_SUBREG problem.
* df.h (enum df_problem_id): Add DF_LIVE_SUBREG.
(class subregs_live): Simple declaration.
(class df_live_subreg_local_bb_info): New class for full/partial def/use
df data.
(class df_live_subreg_bb_info): New class for full/partial in/out
df data.
(df_live_subreg): Get the df_live_subreg data.
(df_live_subreg_add_problem): Exported.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_check_result): Ditto.
(multireg_p): Ditto.
(init_range): Ditto.
(add_subreg_range): Ditto.
(remove_subreg_range): Ditto.
(df_get_subreg_live_in): Accessor for the all_in df data.
(df_get_subreg_live_out): Accessor for the all_out df data.
(df_get_subreg_live_full_in): Accessor for the full_in df data.
(df_get_subreg_live_full_out): Accessor for the full_out df data.
(df_get_subreg_live_partial_in): Accessor for the partial_in df data.
(df_get_subreg_live_partial_out): Accessor for the partial_out df data.
(df_get_subreg_live_range_in): Accessor for the range_in df data.
(df_get_subreg_live_range_out): Accessor for the range_out df data.
* regs.h (get_nblocks): Get the blocks of mode.
* sbitmap.cc (bitmap_full_p): sbitmap predicate.
(bitmap_same_p): sbitmap predicate.
(test_full): test bitmap_full_p.
(test_same): test bitmap_same_p.
(sbitmap_cc_tests): Add test_full and test_same.
* sbitmap.h (bitmap_full_p): Exported.
(bitmap_same_p): Ditto.
* timevar.def (TV_DF_LIVE_SUBREG): add DF_LIVE_SUBREG timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.

Diff:
---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 ++
 gcc/sbitmap.h

[gcc/devel/subreg-coalesce] df: Add -ftrack-subreg-liveness option

2024-04-24 Thread Lehua Ding via Gcc-cvs

https://gcc.gnu.org/g:b6b50e19f88bd33b6c0d252795ebb6cffda9574f

commit b6b50e19f88bd33b6c0d252795ebb6cffda9574f
Author: Lehua Ding 
Date:   Tue Jan 30 16:45:25 2024 +0800

df: Add -ftrack-subreg-liveness option

Add a new flag, -ftrack-subreg-liveness, to enable subreg liveness tracking.
This flag is enabled at -O3 and -Ofast.

gcc/ChangeLog:

* common.opt: Add -ftrack-subreg-liveness option.
* opts.cc: Auto-enable -ftrack-subreg-liveness at -O3/fast.

Diff:
---
 gcc/common.opt  | 4 
 gcc/common.opt.urls | 3 +++
 gcc/doc/invoke.texi | 8 
 gcc/opts.cc | 1 +
 4 files changed, 16 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index ad348844775..bd030973434 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2157,6 +2157,10 @@ fira-share-spill-slots
 Common Var(flag_ira_share_spill_slots) Init(1) Optimization
 Share stack slots for spilled pseudo-registers.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information.
+
 fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b..59f27a6f7c6 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -880,6 +880,9 @@ 
UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
 fira-share-spill-slots
 UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
 
+ftrack-subreg-liveness
+UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
+
 fira-verbose=
 UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c31ab0c86..9724cbb32ba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13186,6 +13186,14 @@ Disable sharing of stack slots allocated for 
pseudo-registers.  Each
 pseudo-register that does not get a hard register gets a separate
 stack slot, and as a result function stack frames are larger.
 
+@opindex ftrack-subreg-liveness
+@item -ftrack-subreg-liveness
+Enable tracking subreg liveness information. This information allows IRA
+and LRA to support the subreg coalesce feature, which can improve the quality
+of register allocation.
+
+This option is enabled at level @option{-O3} for all targets.
+
 @opindex flra-remat
 @item -flra-remat
 Enable CFG-sensitive rematerialization in LRA.  Instead of loading
diff --git a/gcc/opts.cc b/gcc/opts.cc
index a90dc57f8b5..7b5d905a241 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },

[gcc/devel/subreg-coalesce] (111 commits) tree-optimization/114787 - more careful loop update with CF

2024-04-24 Thread Lehua Ding via Gcc-cvs

The branch 'devel/subreg-coalesce' was updated to point to:

 cc48418cfc2... tree-optimization/114787 - more careful loop update with CF

It previously pointed to:

 443748259d9... libstdc++: Fix "extact" typos in comments

Diff:

Summary of changes (added commits):
---

  cc48418... tree-optimization/114787 - more careful loop update with CF (*)
  e28e8ab... tree-optimization/114832 - wrong dominator info with vect p (*)
  d279c9d... i386: Fix behavior for both using AVX10.1-256 in options an (*)
  f952745... RISC-V: Add xfail test case for highpart overlap of vext.vf (*)
  8bcefc2... Revert "RISC-V: Support highpart overlap for vext.vf" (*)
  3091f1d... Daily bump. (*)
  7318f1a... c++: Fix ICE with xobj parms and maybe incomplete decl-spec (*)
  628c222... i386: Avoid =,r,r andn double-word alternative for ia32 [ (*)
  f7a5c99... Regenerate gcc.pot (*)
  0bf94da... Fortran: check C_SIZEOF on additions from TS29113/F2018 [PR (*)
  4f9401d... c++/modules: deduced return type merging [PR114795] (*)
  d2f05fe... libbacktrace: test --compress-debug-sections=ARG for each A (*)
  0c8e99e... testsuite: Adjust testsuite expectations for diagnostic spe (*)
  6f0a646... Remove repeated information in -ftree-loop-distribute-patte (*)
  f994094... Further spelling fixes in translatable strings (*)
  4338ac1... Spelling fixes for translatable strings (*)
  3d56999... s390: testsuite: Xfail forwprop-4{0,1}.c (*)
  ca00bf0... Fortran: Check that the ICE does not reappear [PR102597] (*)
  18e8e55... tree-optimization/114799 - SLP and patterns (*)
  42189f2... s390x: Fix vec_xl/vec_xst type aliasing [PR114676] (*)
  aa73eb9... c++: Copy over DECL_DISREGARD_INLINE_LIMITS flag to inherit (*)
  cf51fe7... c++: Check if allocation functions are xobj members [PR1140 (*)
  77e114b... LoongArch: Define builtin macros for ISA evolutions (*)
  b4ebdd1... LoongArch: Define ISA versions (*)
  8c6ee63... Daily bump. (*)
  2a8187e... RISC-V: Adjust overlap attr after revert d3544cea63d and e6 (*)
  b909daa... PR modula2/114811 string set incl ICE bugfix (*)
  7ef1391... libstdc++: Fix conversion of simd to vector builtin (*)
  e7a3ad2... libstdc++: Silence irrelevant warnings in 3 constraints [PR114783] (*)
  2afdecc... c-family: Allow arguments with NULLPTR_TYPE as sentinels [P (*)
  a39983b... c: Fix ICE with -g and -std=c23 related to incomplete types (*)
  d86472a... libstdc++: Simplify constraints on <=> for std::reference_w (*)
  eed7fb1... libstdc++: Support link chains in std::chrono::tzdb::locate (*)
  e8f0540... Update gcc sv.po (*)
  33bf8e5... internal-fn: Fix up expand_arith_overflow [PR114753] (*)
  1216460... middle-end: refactory vect_recog_absolute_difference to sim (*)
  9451b6c... Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768] (*)
  ede01df... bpf: remove huge memory waste with string allocation. (*)
  d7190d0... bpf: support more instructions to match CO-RE relocations (*)
  4d4929f... d: Fix ICE in build_deref, at d/d-codegen.cc:1650 [PR111650 (*)
  9f29584... rtlanal: Fix set_noop_p for volatile loads or stores [PR114 (*)
  36f4c8a... libgcc: Another __divmodbitint4 bug fix [PR114762] (*)
  694fa37... [vxworks] avoid mangling __STDC_VERSION_LIMITS_H__ (*)
  85c187b... Daily bump. (*)
  e498ba9... Add nios2*-*-* to the list of obsolete targets (*)
  e243d0f... Fortran: Fix ICE and clear incorrect error messages [PR1147 (*)
  7eecc08... [testsuite] [i386] add -msse2 to tests that require it (*)
  0ea96af... [testsuite] [i386] work around fails with --enable-frame-po (*)
  36d0038... [testsuite] [arm] accept empty init for bfloat16 (*)
  ce2dfc5... [c++] [testsuite] adjust contracts9.C for negative addresse (*)
  df92df0... [testsuite] [aarch64] Require fpic effective target. (*)
  514c6b1... [testsuite] [i386] require fpic for pr111497.C (*)
  cc02ebf... [testsuite] xfail pr103798-2 in C++ on vxworks too [PR11370 (*)
  e965162... [testsuite] [analyzer] include sys/select.h if available (*)
  8a11709... [testsuite] [analyzer] require fork where used (*)
  5be4f20... [testsuite] [analyzer] skip access-mode: O_ACCMODE on vxwor (*)
  76a1bcc... [testsuite] [analyzer] avoid vxworks libc mode_t (*)
  5dfbc05... [testsuite] introduce strndup effective target (*)
  dcf0bd1... [libstdc++] [testsuite] disable SRA for compare_exchange_pa (*)
  5b17817... [libstdc++] [testsuite] xfail double-prec from_chars for fl (*)
  da3504a... [libstdc++] define zoneinfo_dir_override on vxworks (*)
  a2f4be3... AArch64: remove reliance on register allocator for simd/gpr (*)
  82d6d38... libgcc: Fix up __divmodbitint4 [PR114755] (*)
  6c152c9... internal-fn: Temporarily disable flag_trapv during .{ADD,SU (*)
  6e62ede... testsuite, rs6000: Fix builtins-6-p9-runnable.c for BE [PR1 (*)
  58a0b19... rs6000: Fix bcd test case (*)
  69576bc... Daily bump. (*)
  7c2a9db... libstdc++: Implement "Printing blank lines with println" fo (*)
  5705614... DOCUMENTATION_ROOT_URL vs. release branches [PR114738]

[gcc] Created branch 'devel/subreg-coalesce'

2024-04-24 Thread Lehua Ding via Gcc-cvs

The branch 'devel/subreg-coalesce' was created pointing to:

 443748259d9... libstdc++: Fix "extact" typos in comments

Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-05 Thread Lehua Ding





On 2024/2/6 2:17, Joseph Myers wrote:

This series appears to be missing documentation for the new option in
invoke.texi.



OK, I'll add that. Thanks.

--
Best,
Lehua (RiVAI)

Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-05 Thread Lehua Ding




On 2024/2/6 0:10, Jeff Law wrote:
Just a note.  I doubt this will get much traction from a review 
standpoint until gcc-14 is basically out the door.


My recommendation is to continue development, bugfixing, cleanup, etc 
between now and then.  Consider creating a branch for the work in the 
upstream repo.


OK, thanks for the guidance.

--
Best,
Lehua (RiVAI)

Re: [PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-04 Thread Lehua Ding


For SPEC INT 2017, when using upstream GCC (without these patches), I get a
coredump when training the peak case, so no data yet. The cause of the core
dump still needs to be investigated.


Typo: SPEC INT 2017 -> SPEC FP 2017.
There is also some bad news: the specint 2017 score (with these patches)
dropped, which is a bit strange, and I need to locate the cause.


--
Best,
Lehua (RiVAI)

[PATCH 3/4] ira: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding

This patch simply replaces df_get_live_in with df_get_subreg_live_in
and df_get_live_out with df_get_subreg_live_out.

gcc/ChangeLog:

* ira-build.cc (create_bb_allocnos): Switch to DF_LIVE_SUBREG df data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(add_conflict_from_region_landing_pads): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.

---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));

[PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-02-03 Thread Lehua Ding

This patch applies DF_LIVE_SUBREG to the LRA pass. More changes were made
to LRA than to IRA since LRA modifies the DF data directly.
The main changes are centered on the lra-lives.cc file.
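
To make "modify the DF data directly" concrete: when LRA coalesces pseudos, the live sets have to be rewritten so that each coalesced pseudo is replaced by its representative. The sketch below is one possible reading of the update_live_info change in this patch, expressed with `std::set`. The two helpers are stand-ins named after GCC's bitmap_and_compl_into and bitmap_ior_and_compl_into, `update_live_sets` is a hypothetical name, and which pseudo set (coalesced or used) is passed at each call is an assumption rather than something taken from the patch.

```cpp
#include <set>

using RegSet = std::set<int>;

// a &= ~b (stand-in for bitmap_and_compl_into).
static void
and_compl_into (RegSet &a, const RegSet &b)
{
  for (int r : b)
    a.erase (r);
}

// a |= b & ~c (stand-in for bitmap_ior_and_compl_into).
static void
ior_and_compl_into (RegSet &a, const RegSet &b, const RegSet &c)
{
  for (int r : b)
    if (c.count (r) == 0)
      a.insert (r);
}

// Replace coalesced pseudos by their representatives (USED) in the
// all/full/partial sets.  The argument choice per call is assumed.
static void
update_live_sets (RegSet &all, RegSet &full, RegSet &partial,
                  const RegSet &coalesced, const RegSet &used)
{
  if (used.empty ())
    return;

  and_compl_into (all, coalesced);
  all.insert (used.begin (), used.end ());

  and_compl_into (full, coalesced);
  ior_and_compl_into (full, used, partial);

  and_compl_into (partial, coalesced);
  ior_and_compl_into (partial, used, full);
}
```

With this reading, a representative that was previously only partially live stays in the partial set, while one that was not in the partial set ends up in the full set.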

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Extend to DF_LIVE_SUBREG.
(lra_coalesce): Ditto.
* lra-constraints.cc (update_ebb_live_info): Ditto.
(get_live_on_other_edges): Ditto.
(inherit_in_ebb): Ditto.
(lra_inheritance): Ditto.
(fix_bb_live_info): Ditto.
(remove_inheritance_pseudos): Ditto.
* lra-int.h (GCC_LRA_INT_H): include subreg-live-range.h
(struct lra_insn_reg): Add op field to record the corresponding rtx.
* lra-lives.cc (class bb_data_pseudos): Extend the bb_data_pseudos to
include new partial_def/use and range_def/use fields for DF_LIVE_SUBREG
problem.
(need_track_subreg_p): Check whether the regno needs to be tracked.
(make_hard_regno_live): Switch to live_subreg field.
(make_hard_regno_dead): Ditto.
(mark_regno_live): Support record subreg liveness.
(mark_regno_dead): Ditto.
(live_trans_fun): Adjust transfer function to support subreg liveness.
(live_con_fun_0): Adjust Confluence function to support subreg liveness.
(live_con_fun_n): Ditto.
(initiate_live_solver): Ditto.
(finish_live_solver): Ditto.
(process_bb_lives): Ditto.
(lra_create_live_ranges_1): Dump subreg liveness.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Switch to
DF_LIVE_SUBREG df data.
(calculate_livein_cands): Ditto.
(do_remat): Ditto.
* lra-spills.cc (spill_pseudos): Ditto.
* lra.cc (new_insn_reg): New argument op.
(add_regs_to_insn_regno_info): Add new argument op.
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 -
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (_pseudos_bitmap))
+  if (!bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, _pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0ae81c1ff9c..d1316620f51 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6505,34 +6505,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in (prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j,

[PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-02-03 Thread Lehua Ding

+  if (end_bitno == 0)
+return true;
+
+  gcc_checking_assert (i + 1 == a->size);
+
+  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE) 1 << end_bitno) - 1;
+
+  /* Make sure the tail bits are same.  */
+  return (a->elms[i] & mask) == (b->elms[i] & mask);
+}
+
 /* Set DST to be (A or (B and C)).
Return nonzero if any change is made.  */
 
@@ -994,6 +1047,49 @@ test_bit_in_range ()
   sbitmap_free (s);
 }
 
+/* Verify bitmap_full_p functions for sbitmap.  */
+
+static void
+test_full ()
+{
+  sbitmap s = sbitmap_alloc (193);
+
+  bitmap_clear (s);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  ASSERT_TRUE (bitmap_full_p (s));
+
+  bitmap_clear_bit (s, 192);
+  ASSERT_FALSE (bitmap_full_p (s));
+
+  bitmap_ones (s);
+  bitmap_clear_bit (s, 17);
+  ASSERT_FALSE (bitmap_full_p (s));
+}
+
+/* Verify bitmap_same_p functions for sbitmap.  */
+
+static void
+test_same ()
+{
+  sbitmap s1 = sbitmap_alloc (193);
+  sbitmap s2 = sbitmap_alloc (193);
+  sbitmap s3 = sbitmap_alloc (192);
+  
+  ASSERT_FALSE (bitmap_same_p (s1, s3));
+
+  bitmap_clear (s1);
+  bitmap_clear (s2);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s2, 192);
+  ASSERT_FALSE (bitmap_same_p (s1, s2));
+  
+  bitmap_set_bit (s1, 192);
+  ASSERT_TRUE (bitmap_same_p (s1, s2));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -1001,6 +1097,8 @@ sbitmap_cc_tests ()
 {
   test_set_range ();
   test_bit_in_range ();
+  test_full ();
+  test_same ();
 }
 
 } // namespace selftest
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index da6116ce925..71cfded9fb2 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -267,6 +267,7 @@ extern void bitmap_copy (sbitmap, const_sbitmap);
 extern bool bitmap_equal_p (const_sbitmap, const_sbitmap);
 extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
+extern bool bitmap_full_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
 extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
 extern void bitmap_set_range (sbitmap, unsigned, unsigned);
@@ -287,6 +288,7 @@ extern bool bitmap_and (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_ior (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_xor (sbitmap, const_sbitmap, const_sbitmap);
 extern bool bitmap_subset_p (const_sbitmap, const_sbitmap);
+extern bool bitmap_same_p (const_sbitmap, const_sbitmap);
 extern bool bitmap_bit_in_range_p (const_sbitmap, unsigned int, unsigned int);
 
 extern int bitmap_first_set_bit (const_sbitmap);
diff --git a/gcc/subreg-live-range.cc b/gcc/subreg-live-range.cc
new file mode 100644
index 000..fe8d4210eb6
--- /dev/null
+++ b/gcc/subreg-live-range.cc
@@ -0,0 +1,53 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "subreg-live-range.h"
+
+void
+subregs_live::dump (FILE *file, const char *indent) const
+{
+  if (lives.empty ())
+{
+  fprintf (file, "%sempty\n", indent);
+  return;
+}
+  fprintf (file, "%s", indent);
+  for (auto &kv : lives)
+{
+  const_sbitmap range = kv.second;
+  if (bitmap_empty_p (range))
+   continue;
+  fprintf (file, "%d: ", kv.first);
+  if (!bitmap_full_p (range))
+   {
+ dump_bitmap_file (file, range);
+ fprintf (file, ",  ");
+   }
+  else
+fprintf (file, "full, ");
+}
+  fprintf (file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug (const subregs_live &l)
+{
+  l.dump (stderr, "");
+}
diff --git a/gcc/subreg-live-range.h b/gcc/subreg-live-range.h
new file mode 100644
index 000..c0b88071858
--- /dev/null
+++ b/gcc/subreg-live-range.h
@@ -0,0 +1,206 @@
+/* SUBREG liveness tracking classes for DF & IRA & LRA.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Lehua Ding (lehua.d...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (

[PATCH 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-02-03 Thread Lehua Ding

Hi,

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and then apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). These patches
are for GCC 15.

These patches are separated from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the comments. The next
patches will support subreg coalesce based on them. Here are some data
about the build time of SPEC INT 2017 (x86-64 target):

  baseline   baseline(+track-subreg-liveness)
specint2017 build time :  1892s  1883s

Regarding build times, I've run it a few times, and the build with subreg
tracking took less time in each run. Since the difference is small, it may
just be a change in environment. But it is theoretically possible, since
supporting subreg liveness could have reduced the number of live regs.

For memory usage, I tried PR 69609 under valgrind; peak memory size grew from
2003910656 to 2003947520, a very small increase.

For SPEC INT 2017, when using upstream GCC (without these patches), I get a
core dump when training the peak case, so there is no data yet. The cause of
the core dump still needs to be investigated.

No regressions on x86-64, AArch64 and RISC-V targets.

Best,
Lehua

Lehua Ding (4):
  df: Add -ftrack-subreg-liveness option
  df: Add DF_LIVE_SUBREG problem
  ira: Apply DF_LIVE_SUBREG data
  lra: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/df-problems.cc   | 855 ++-
 gcc/df.h | 155 +++
 gcc/ira-build.cc |   7 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  19 +-
 gcc/lra-coalesce.cc  |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h|   4 +
 gcc/lra-lives.cc | 355 
 gcc/lra-remat.cc |   8 +-
 gcc/lra-spills.cc|  27 +-
 gcc/lra.cc   |  10 +-
 gcc/opts.cc  |   1 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc |  53 +++
 gcc/subreg-live-range.h  | 206 ++
 gcc/timevar.def  |   1 +
 23 files changed, 1839 insertions(+), 135 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3

[PATCH 1/4] df: Add -ftrack-subreg-liveness option

2024-02-03 Thread Lehua Ding

Add new flag -ftrack-subreg-liveness to enable track-subreg-liveness.
This flag is enabled at -O3/fast.

gcc/ChangeLog:

* common.opt: add -ftrack-subreg-liveness option.
* opts.cc: auto enable -ftrack-subreg-liveness in -O3/fast

---
 gcc/common.opt | 4 
 gcc/opts.cc| 1 +
 2 files changed, 5 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 51c4a17da83..d4592c6426a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2156,6 +2156,10 @@ fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information for IRA and LRA, enabled at -O3.
+
 fivopts
 Common Var(flag_ivopts) Init(1) Optimization
 Optimize induction variables on trees.
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 600e0ea..50c0b62c5af 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -689,6 +689,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.36.3

Re: [PATCH V3 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-20 Thread Lehua Ding


Hi Richard,

On 2023/11/21 4:11, Richard Sandiford wrote:

Lehua Ding  writes:

This patch adds a live_subreg problem to extend the original live_reg to
track the liveness of subregs. We only try to track pseudo registers
whose mode size is a multiple of the natural size and parts of which are
eventually accessed through subregs. Alongside the live_reg problem, the
live_subreg problem produces the following output. full_in/out means the
entire pseudo is live in/out, partial_in/out means only some subregs of the
pseudo are live in/out, and range_in/out indicates which parts of the pseudo
are live. all_in/out is the union of full_in/out and partial_in/out:

   bitmap_head all_in, full_in;
   bitmap_head all_out, full_out;
   bitmap_head partial_in;
   bitmap_head partial_out;
   subregs_live *range_in = NULL;
   subregs_live *range_out = NULL;

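To make the relationship between these sets concrete, here is a minimal, self-contained sketch. It is not the patch's code: `block_live_in`, `mark_part_live` and the use of standard containers instead of GCC bitmaps are assumptions made purely for illustration, and promoting a pseudo from partial to full once all of its parts are live is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the per-block live-in sets described above.
struct block_live_in
{
  std::set<int> full;                      // pseudos live in their entirety
  std::set<int> partial;                   // pseudos with only some parts live
  std::map<int, std::vector<bool>> range;  // live parts of each partial pseudo

  // all_in is the union of full_in and partial_in.
  std::set<int> all () const
  {
    std::set<int> result = full;
    result.insert (partial.begin (), partial.end ());
    return result;
  }

  // Mark part PART of the NPARTS-part pseudo REGNO as live.
  void mark_part_live (int regno, int part, int nparts)
  {
    assert (part >= 0 && part < nparts);
    if (full.count (regno))
      return;  // Already entirely live.

    std::vector<bool> &r = range[regno];
    r.resize (nparts, false);
    r[part] = true;

    bool all_live = true;
    for (bool b : r)
      all_live = all_live && b;

    if (all_live)
      {
        // Assumption of this sketch: a fully covered pseudo becomes "full".
        partial.erase (regno);
        range.erase (regno);
        full.insert (regno);
      }
    else
      partial.insert (regno);
  }
};
```

With this model a two-part pseudo first shows up in `partial` with one bit set in `range`, and moves to `full` once its second part becomes live, while `all ()` contains it in both states.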

I haven't fully processed the patch yet, sorry.  And I think I might be
about to cover things that you dealt with elsewhere.

My assumption going into this was that a subreg liveness tracker would
work as follows:

- First we would work out which registers need to have subreg tracking.
   This could be done ahead of time by iterating over regno_reg_rtx.
   The condition in need_track_subreg looks like the correct one.

   For every other register, subreg liveness degenerates to the existing
   liveness problems.  Such registers can be ignored.

- We would assign a unique identifier to each subreg that we want to track,
   with subregs for the same register being consecutive.

- There would be a mapping from pseudo registers to the first subreg
   that we want to track.  The mapping would probably just be a linear
   array, but perhaps there are times when something more compact is
   appropriate.

- The dataflow problem itself would then be very similar to the existing
   ones.  But rather than computing bitmaps with a single bit per register,
   we'd be computing bitmaps that have N bits for N-register pseudos
   (and no bits for single-register pseudos).

- There would be helper functions that consumers could use to iterate
   over a block.  E.g. for a backwards walk over a block, a consumer
   would start with the bitmap of live-out subregs.  It would then use
   these helper functions to keep the values up-to-date as it moves
   up through the block.

   That's done for normal liveness via the df_simulate_* helpers.
   But now that the codebase is C++, it might be more convenient for
   the subreg code to provide classes for walking a block.

That should be relatively compile-time-friendly, although I agree
with Vlad of course that DF does have efficiency problems.  The nature
of the way it works makes it at least O(#blocks * #regs).

Did you consider doing it that way?  Or does it not provide the
information that you need?
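
The numbering scheme outlined above could look roughly like the following self-contained sketch. Everything here is hypothetical and only illustrates the idea: `subreg_numbering`, `subreg_id` and `simulate_backwards` are made-up names, the "tracked" condition is simplified to "occupies more than one register", and `std::vector<bool>` stands in for a GCC bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical numbering of tracked subregs: ids for the same pseudo
// are consecutive, and single-register pseudos get no ids at all.
class subreg_numbering
{
public:
  // NREGS_PER_PSEUDO[i] is the number of registers pseudo i occupies.
  explicit subreg_numbering (const std::vector<int> &nregs_per_pseudo)
    : m_first (nregs_per_pseudo.size (), -1), m_count (nregs_per_pseudo)
  {
    for (std::size_t i = 0; i < nregs_per_pseudo.size (); ++i)
      if (nregs_per_pseudo[i] > 1)
        {
          m_first[i] = m_total;
          m_total += nregs_per_pseudo[i];
        }
  }

  bool tracked_p (int pseudo) const { return m_first[pseudo] >= 0; }

  // Linear-array mapping from a pseudo to its first id, plus the part index.
  int subreg_id (int pseudo, int part) const
  {
    assert (tracked_p (pseudo) && part >= 0 && part < m_count[pseudo]);
    return m_first[pseudo] + part;
  }

  int total () const { return m_total; }

private:
  std::vector<int> m_first;  // first subreg id of each pseudo, or -1
  std::vector<int> m_count;  // number of parts of each pseudo
  int m_total = 0;           // number of ids handed out so far
};

// One backward step over an instruction: live = (live - defs) | uses.
std::vector<bool>
simulate_backwards (std::vector<bool> live, const std::vector<int> &defs,
                    const std::vector<int> &uses)
{
  for (int id : defs)
    live[id] = false;
  for (int id : uses)
    live[id] = true;
  return live;
}
```

A consumer walking a block backwards would start from the live-out vector of size `total ()` and apply `simulate_backwards` once per instruction, which mirrors the role the df_simulate_* helpers play for ordinary liveness.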


Thanks for providing such detailed instructions, this looks like it 
should perform well. I'll give it a try and come back with any questions.





gcc/ChangeLog:

* Makefile.in: Add new object file.
* df-problems.cc (struct df_live_subreg_problem_data):
The data of the new live_subreg problem.
(need_track_subreg): New function.
(get_range): Ditto.
(remove_subreg_range): Ditto.
(add_subreg_range): Ditto.
(df_live_subreg_free_bb_info): Ditto.
(df_live_subreg_alloc): Ditto.
(df_live_subreg_reset): Ditto.
(df_live_subreg_bb_local_compute): Ditto.
(df_live_subreg_local_compute): Ditto.
(df_live_subreg_init): Ditto.
(df_live_subreg_check_result): Ditto.
(df_live_subreg_confluence_0): Ditto.
(df_live_subreg_confluence_n): Ditto.
(df_live_subreg_transfer_function): Ditto.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_free): Ditto.
(df_live_subreg_top_dump): Ditto.
(df_live_subreg_bottom_dump): Ditto.
(df_live_subreg_add_problem): Ditto.
* df.h (enum df_problem_id): Add live_subreg id.
(DF_LIVE_SUBREG_INFO): Data accessor.
(DF_LIVE_SUBREG_IN): Ditto.
(DF_LIVE_SUBREG_OUT): Ditto.
(DF_LIVE_SUBREG_FULL_IN): Ditto.
(DF_LIVE_SUBREG_FULL_OUT): Ditto.
(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
(DF_LIVE_SUBREG_RANGE_IN): Ditto.
(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
(class subregs_live): New class.
(class basic_block_subreg_live_info): Ditto.
(class df_live_subreg_bb_info): Ditto.
(df_live_subreg): Ditto.
(df_live_subreg_add_problem): Ditto.
(df_live_subreg_finalize): Ditto.
(class subreg_range): Ditto.
(need_track_subreg): Ditto.
(remove_subreg_range): Ditto.
(add_subreg_range): Ditto.
(df_live_subreg_get_bb_info): Ditto.
* regs.h (get_nblocks): Helper function.
* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding





On 2023/11/18 16:24, Sam James wrote:


Lehua Ding  writes:


Hi Sam,

On 2023/11/18 16:06, Sam James wrote:

Lehua Ding  writes:


Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:

On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel
like there are a lot of issues, especially the long compile time
issue. So I'm going to reorganize and refactor the patches so that
as many of them as possible can be reviewed separately. This way
there will be fewer patches to support subreg in the end. I plan to
split it into four separate patches like below. What do you think?


I can wait for the new version patches.  The only issue is stage1 deadline.
In my opinion, I'd recommend to work on the patches more and start
their submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the
new version of the patches; stage 1 is definitely too late. When you say
before GCC-14 release do you mean at GCC 14 stage 3? Is it possible to commit
such changes at stage 3? I was thinking that if I miss GCC 14 stage 1
I would have to wait until GCC 15 stage 1.

I took it to mean "submit it during GCC 14 stage 3 for merging during
GCC 15 stage 1", as the idea would be that if you're basing it on the
state of the tree & doing further/final testing on GCC 14 stage 3,
the tree should be in a stable state by then with only regression fixes
going in, rather than other changes which might disrupt your testing.
This means you are not constantly rebasing and getting new test failures
possibly due to changes other than yours. It also means lots of time to
review and fix any problems with less pressure.


Oh, looks like I misunderstood. Thanks for the correction.




You need a lot of testing for the patches: major targets (x86-64,
aarch64, ppc64), some big endian targets, a 32-bit target. Knowing
how even small changes in RA can affect many targets, e.g. GCC
testsuite results (there are a lot of different target tests which
expect a particular output),  it is better to do this on stabilized
GCC and stage3 is the best time for this.  In any case I'll approve
patches only if you have successful bootstraps and no GCC testsuite
regression on x86-64, ppc64le/be, aarch64, i686.
Also you have a lot of compile time performance issues which you
need to address.  So I guess you will be overwhelmed by new
different target PRs after committing the patches if you will do
this now.  You will have more time and less pressure work if you
commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the
other targets later.


The compiler farm can provide access to a bunch of targets and the
community may be able to help with access to others if needed.


I applied for cfarm access the other day, I'll try to use it. Thanks.


No problem. If you need some Linux targets not on the cfarm, let me
know. I can probably help with hppa/sparc at least and I know someone
with alpha, mips.


That's great. Thanks in advance.

--
Best,
Lehua (RiVAI)

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding


Hi Sam,

On 2023/11/18 16:06, Sam James wrote:


Lehua Ding  writes:


Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:

On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel
like there are a lot of issues, especially the long compile time
issue. So I'm going to reorganize and refactor the patches so that
as many of them as possible can be reviewed separately. This way
there will be fewer patches to support subreg in the end. I plan to
split it into four separate patches like below. What do you think?


I can wait for the new version patches.  The only issue is stage1 deadline.
In my opinion, I'd recommend to work on the patches more and start
their submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the
new version of the patches; stage 1 is definitely too late. When you say
before GCC-14 release do you mean at GCC 14 stage 3? Is it possible to commit
such changes at stage 3? I was thinking that if I miss GCC 14 stage 1
I would have to wait until GCC 15 stage 1.


I took it to mean "submit it during GCC 14 stage 3 for merging during
GCC 15 stage 1", as the idea would be that if you're basing it on the
state of the tree & doing further/final testing on GCC 14 stage 3,
the tree should be in a stable state by then with only regression fixes
going in, rather than other changes which might disrupt your testing.

This means you are not constantly rebasing and getting new test failures
possibly due to changes other than yours. It also means lots of time to
review and fix any problems with less pressure.


Oh, looks like I misunderstood. Thanks for the correction.




You need a lot of testing for the patches: major targets (x86-64,
aarch64, ppc64), some big endian targets, a 32-bit target. Knowing
how even small changes in RA can affect many targets, e.g. GCC
testsuite results (there are a lot of different target tests which
expect a particular output),  it is better to do this on stabilized
GCC and stage3 is the best time for this.  In any case I'll approve
patches only if you have successful bootstraps and no GCC testsuite
regression on x86-64, ppc64le/be, aarch64, i686.
Also you have a lot of compile time performance issues which you
need to address.  So I guess you will be overwhelmed by new
different target PRs after committing the patches if you will do
this now.  You will have more time and less pressure work if you
commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the
other targets later.



The compiler farm can provide access to a bunch of targets and the
community may be able to help with access to others if needed.


I applied for cfarm access the other day, I'll try to use it. Thanks.

--
Best,
Lehua (RiVAI)

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-18 Thread Lehua Ding


Hi Vladimir,

On 2023/11/17 22:05, Vladimir Makarov wrote:


On 11/16/23 21:06, Lehua Ding wrote:

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel like 
there are a lot of issues, especially the long compile time issue. So 
I'm going to reorganize and refactor the patches so that as many of 
them as possible can be reviewed separately. This way there will be 
fewer patches to support subreg in the end. I plan to split it into 
four separate patches like below. What do you think?



I can wait for the new version patches.  The only issue is stage1 deadline.

In my opinion, I'd recommend to work on the patches more and start their 
submission right before GCC-14 release (somewhere in April).


Quite agree, I'll rewrite the patches a bit better before resending the 
new version of the patches; stage 1 is definitely too late. When you say 
before GCC-14 release do you mean at GCC 14 stage 3? Is it possible to 
commit such changes at stage 3? I was thinking that if I miss GCC 14 
stage 1 I would have to wait until GCC 15 stage 1.


You need a lot of testing for the patches: major targets (x86-64, 
aarch64, ppc64), some big endian targets, a 32-bit target. Knowing how 
even small changes in RA can affect many targets, e.g. GCC testsuite 
results (there are a lot of different target tests which expect a 
particular output),  it is better to do this on stabilized GCC and 
stage3 is the best time for this.  In any case I'll approve patches only 
if you have successful bootstraps and no GCC testsuite regression on 
x86-64, ppc64le/be, aarch64, i686.


Also you have a lot of compile time performance issues which you need to 
address.  So I guess you will be overwhelmed by new different target PRs 
after committing the patches if you will do this now.  You will have 
more time and less pressure work if you commit these patches in April.


Hm, I'll test the targets I can get first. I'll figure out the other 
targets later.


Your changes are massive and in a critical part of GCC, so it is better to 
do all of this on a public git branch so that people can try this and 
test their targets.


Okay, I'll try.

But it is up to you to decide when to submit the patches.  Still, besides 
approval of your patches, you need successful testing.  If new testsuite 
failures occur after submitting the patch and they are not fixed within a 
short period of time, the patches should be reverted.



 1. live_subreg problem
2. conflict_hard_regs check refactoring
3. use object instead of allocno to create copies
4. support subreg coalesce
   4.1 ira: Apply live_subreg data to ira
   4.2 lra: Apply live_subreg data to lra
   4.3 ira: Support subreg liveness track
   4.4 lra: Support subreg liveness track

So for the two patches about LRA, maybe you can stop reviewing and wait 
for the revised patches.




Sure. So far I only had a quick glance on them.




--
Best,
Lehua (RiVAI)

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-17 Thread Lehua Ding


Hi Kito and Robin,

So, going back to our testcases that reported errors with this, I don't 
think we should explicitly specify -march and -mabi when compiling a 
runnable program, but use the defaults (--with-arch). Most of our 
current runnable testcases adhere to this convention, except for the 
ones we are discussing now, which explicitly set -march to 
rv32gcv_zfh or rv64gcv_zfh inside the rvv.exp file.


On 2023/11/17 16:29, Kito Cheng wrote:

Oh, ok I got why it happened and it is definitely caused by my patch
(but not that one, it is caused by another patch[1]), let me describe
the reason why I try to emit errors. RISC-V has a crazy number of
possible extension combinations, so it's easy to make some mistakes by
using some unsupported extension combination and generating error
messages that are difficult to understand.

Give some practical example here:
config a RISC-V toolchain with --with-arch=rv64gc --with-abi=lp64d
also build with multilib for rv32i/ilp32 and rv64imac/lp64

Now users try to use that toolchain with -march=rv32gc -mabi=ilp32d,
what will happen if there is no such error emitted?
GCC will fall back to the default multilib, which is rv64gc/lp64d, and then
you may get an error message like below:

ABI is incompatible with that of the selected emulation:
  target emulation `elf32-littleriscv' does not match `elf64-littleriscv'

Experienced toolchain developers or experienced FAE may know what happened,
but the error message is really not meaningful for most users - and
then they will go back to waste our time :P

So that's the background on why I designed and implemented that mechanism.

You may ask: hey, why not implement the same mechanism for linux?
OK, the answer is simple: linux typically builds with
rv64gc/lp64d as the base, so there are not many different combinations
as in a bare-metal environment.

*However* I am not trying to say: there is no solution there, let's
give up on testing with bare metal.

One possible solution is Jin Ma's patch[2], he proposed
-mdisable-multilib-check to suppress this check, but it's kind of
dangerous in most cases, this may make it compile, but will get
another error soon.

So...I think the right solution should be adding more checks before
running those tests, e.g.checking rv32gv/ilp32d can run before running
those testcase.

[1] 
https://github.com/gcc-mirror/gcc/commit/d72ca12b846a9f5c01674b280b1817876c77888f
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619234.html

On Wed, Nov 15, 2023 at 6:48 PM 钟居哲  wrote:


Hi, Kito. Could you take a look at this issue?

-march parser is consistent between non-linux and linux.

You can simply verify it with these cases:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-run.c -O3 -ftree-vectorize (test for excess errors)

These cases fail on the non-linux toolchain, but pass on the linux toolchain.
This inconsistency is caused by your previous multilib patch, as Lehua said:
https://github.com/gcc-mirror/gcc/commit/17d683d



juzhe.zh...@rivai.ai


From: Lehua Ding
Date: 2023-11-13 19:27
To: kito.cheng; Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.
Hi Kito,

On 2023/11/13 19:13, Lehua Ding wrote:

Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them. I think we should fix
riscv-common.cc.
Since I think "zvfh_zfh" should not be different with "zfh_zvfh"


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


This looks to be the difference between the linux and elf versions of
gcc. The elf version of gcc we build will have this problem, the
linux version of gcc will not. I think the linux version of gcc has the
wrong behavior:

➜  riscv-gnu-toolchain-push git:(tintin-dev)
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc
 -march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-

Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-16 Thread Lehua Ding


Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel like 
there are a lot of issues, especially the long compile time issue. So 
I'm going to reorganize and refactor the patches so that as many of them 
as possible can be reviewed separately. This way there will be fewer 
patches to support subreg in the end. I plan to split it into four 
separate patches like below. What do you think?


1. live_subreg problem
2. conflict_hard_regs check refactoring
3. use object instead of allocno to create copies
4. support subreg coalesce
   4.1 ira: Apply live_subreg data to ira
   4.2 lra: Apply live_subreg data to lra
   4.3 ira: Support subreg liveness track
   4.4 lra: Support subreg liveness track

So for the two patches about LRA, maybe you can stop reviewing and wait for 
the revised patches.


On 2023/11/17 5:13, Vladimir Makarov wrote:


On 11/12/23 07:08, Lehua Ding wrote:
This patch changes the previous way of creating a copy between 
allocnos to objects.


gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
  gcc/ira-build.cc | 154 +++-
  gcc/ira-color.cc | 541 +++
  gcc/ira-conflicts.cc | 173 +++---
  gcc/ira-emit.cc  |  10 +-
  gcc/ira-int.h    |  10 +-
  gcc/ira.cc   |   5 +-
  6 files changed, 646 insertions(+), 247 deletions(-)
The patch is mostly ok for me except that there are the same issues I 
mentioned in my 1st email. Not changing comments for functions with 
changed interface like function arg types and names (e.g. 
find_allocno_copy) is particularly bad.  It makes the comments confusing 
and wrong.  Also using just "adjust" in changelog entries is too brief. 
You should at least mention that function signature is changed.

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see



-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno all thread shared.  */
+  ira_allocno_t first_thread_allocno;
+  /* The offset start relative to the first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belong to the thread.  */
+  bitmap thread_allocnos;


It is better to use bitmap_head instead of bitmap.  It permits avoiding 
allocation of bitmap_head for bitmap.  There are many places where 
bitmap_head in your patches can be better used than bitmap (it is 
especially profitable if there is significant probability of empty bitmap).
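
The difference can be illustrated with a toy model. This is not GCC's bitmap implementation: `toy_bitmap_head` and the two `thread_data_*` structs below are hypothetical, and `std::unique_ptr` merely stands in for a separately allocated head that a pointer-typed field would need.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy stand-in for a bitmap head; NOT GCC's real bitmap_head.
struct toy_bitmap_head
{
  std::uint64_t bits = 0;

  bool empty_p () const { return bits == 0; }

  void set_bit (unsigned n)
  {
    assert (n < 64);
    bits |= std::uint64_t{1} << n;
  }
};

// Pointer style: the head is a separate heap object that has to be
// allocated (and later freed) even if the set stays empty.
struct thread_data_ptr
{
  std::unique_ptr<toy_bitmap_head> thread_allocnos
    = std::make_unique<toy_bitmap_head> ();
};

// Embedded style: the head lives inside the owning struct, so an empty
// set costs no extra allocation at all.
struct thread_data_embedded
{
  toy_bitmap_head thread_allocnos;
};
```

In the embedded variant the owning struct grows by the size of the head, but constructing it involves no allocation, which is where the saving for mostly empty sets comes from.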


Of course the patch can be committed when all the patches are approved 
and fixed.





--
Best,
Lehua (RiVAI)

Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-14 Thread Lehua Ding





On 2023/11/15 7:22, Peter Bergner wrote:

On 11/12/23 6:08 AM, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase



I tested this on powerpc64le-linux and powerpc64-linux.  The LE build
bootstrapped fine and it looks like only one testsuite FAIL which I have
to look into why it's FAILing.

The BE build did bootstrap, but the 32-bit and 64-bit testsuite runs both
had lots of FAILs (over 100 between them both) which I have yet to look
into what is happening.


I've applied for machine access on the compile farm. Could you tell me 
how to build and run the tests on a PPC64BE machine? I'll take a look 
at it too, thanks a lot.



I'll also note I have done no performance testing yet until I have an
idea of what the testsuite failures are.  I think a patch like this that
can affect the performance of all architectures needs some performance
testing to ensure we don't have unintended performance degradations.
I'll have someone on my team kick off some builds once I have a handle
on the testsuite FAILs.


This is really great, thanks for helping to test the performance.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-14 Thread Lehua Ding





On 2023/11/14 0:43, Dimitar Dimitrov wrote:

On Sun, Nov 12, 2023 at 08:08:10PM +0800, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase

Hi,

These patches try to support the subreg coalesce feature in the
register allocation passes (ira and lra).



Hi Lehua,

V3 indeed fixes the arm-none-eabi build. It's also confirmed by Linaro CI:
   
https://patchwork.sourceware.org/project/gcc/patch/20231112120817.2635864-8-lehua.d...@rivai.ai/

But avr and pru backends are still broken, albeit with different crash
signatures. Both targets are peculiar because they have
UNITS_PER_WORD=1. I'll try building some 16-bit target like msp430.

AVR fails when building libgcc:
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c: In function 
'__roundlr':
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:115:3: 
internal compiler error: in check_allocation, at ira.cc:2673
   115 |   }
   |   ^
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:106:3: note: 
in expansion of macro 'ROUND2'
   106 |   ROUND2 (FX)
   |   ^~
/mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:117:1: note: 
in expansion of macro 'ROUND1'
   117 | ROUND1(L_LABEL)
   | ^~
0xc80b8d check_allocation
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:2673
0xc89451 ira
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5873
0xc89451 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6104

Script I'm using to build avr: 
https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-avr.sh



PRU fails building newlib:
/mnt/nvme/dinux/local-workspace/newlib/newlib/libc/stdlib/gdtoa-gdtoa.c:835:9: 
internal compiler error: in lra_create_live_ranges, at lra-lives.cc:1933
   835 | }
   | ^
0x6b951c lra_create_live_ranges(bool, bool)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra-lives.cc:1933
0xd9320c lra(_IO_FILE*)
 /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2638
0xd3e519 do_reload
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
0xd3e519 execute
 /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148

Script I'm using to build pru: 
https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-pru.sh


These ICEs will be fixed in the V4 patches, and both targets build 
successfully on my machine. Thank you so much for the reports.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Lehua Ding


Committed, thanks Hongtao.

On 2023/11/14 18:24, Hongtao Liu wrote:

On Tue, Nov 14, 2023 at 5:01 PM Lehua Ding  wrote:


Hi,

This little patch adjusts the assertion in the apx-spill_to_egprs-1.c testcase.
The -mapxf compilation option allows more registers to be used, which in
turn eliminates the need for local variables to be stored in stack memory.
Therefore, the assertion is changed to check that no memory is accessed
through the %rsp register.

Ok, thanks.


gcc/testsuite/ChangeLog:

 * gcc.target/i386/apx-spill_to_egprs-1.c: Make sure that no local
 variables are stored on the stack.

---
  .../gcc.target/i386/apx-spill_to_egprs-1.c| 19 +++
  1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
index 290863d63a7..d7952b4c550 100644
--- a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -3,22 +3,9 @@

  #include "spill_to_mask-1.c"

-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
-/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
-/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* Make sure that no local variables are stored on the stack. */
+/* { dg-final { scan-assembler-not "\\(%rsp\\)" } } */
+
  /* { dg-final { scan-assembler-not "knot" } } */
  /* { dg-final { scan-assembler-not "kxor" } } */
  /* { dg-final { scan-assembler-not "kor" } } */
--
2.36.3






--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

[PATCH] x86: Make testcase apx-spill_to_egprs-1.c more robust

2023-11-14 Thread Lehua Ding

Hi,

This little patch adjusts the assertion in the apx-spill_to_egprs-1.c testcase.
The -mapxf compilation option allows more registers to be used, which in
turn eliminates the need for local variables to be stored in stack memory.
Therefore, the assertion is changed to detect that no memory is accessed
through the %rsp register.
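
As a rough illustration of what the new directive checks, here is a minimal
C++ sketch. The helper name scan_assembler_not is borrowed from the DejaGnu
directive but the function itself is hypothetical, and it uses std::regex
(ECMAScript syntax) rather than the Tcl regular expressions the testsuite
really uses.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for the scan-assembler-not directive: returns true
// when PATTERN matches nowhere in ASSEMBLY.  The real directive is evaluated
// by DejaGnu with Tcl regular expressions, not std::regex.
bool scan_assembler_not(const std::string &assembly, const std::string &pattern)
{
    assert(!pattern.empty());
    return !std::regex_search(assembly, std::regex(pattern));
}

// The pattern used by the patch: a literal "(%rsp)" memory operand.
const std::string rsp_operand = "\\(%rsp\\)";
```

With this sketch, a line such as "movl 8(%rsp), %eax" makes the check fail,
while code that only moves values between registers passes.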

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-spill_to_egprs-1.c: Make sure that no local
variables are stored on the stack.

---
 .../gcc.target/i386/apx-spill_to_egprs-1.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
index 290863d63a7..d7952b4c550 100644
--- a/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
@@ -3,22 +3,9 @@
 
 #include "spill_to_mask-1.c"
 
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r16d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r17d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r18d" } } */
-/* { dg-final { scan-assembler "movq\[ \t]+\[^\\n\\r\]*, %r19" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r20d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r21d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r22d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r23d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r24d" } } */
-/* { dg-final { scan-assembler "addl\[ \t]+\[^\\n\\r\]*, %r25d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r26d" } } */
-/* { dg-final { scan-assembler "movl\[ \t]+\[^\\n\\r\]*, %r27d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r28d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r29d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r30d" } } */
-/* { dg-final { scan-assembler "movbel\[ \t]+\[^\\n\\r\]*, %r31d" } } */
+/* Make sure that no local variables are stored on the stack. */
+/* { dg-final { scan-assembler-not "\\(%rsp\\)" } } */
+
 /* { dg-final { scan-assembler-not "knot" } } */
 /* { dg-final { scan-assembler-not "kxor" } } */
 /* { dg-final { scan-assembler-not "kor" } } */
-- 
2.36.3

Re: [PATCH V3 1/7] df: Add DF_LIVE_SUBREG problem

2023-11-14 Thread Lehua Ding





On 2023/11/14 16:14, Richard Biener wrote:

On Mon, Nov 13, 2023 at 11:39 PM Vladimir Makarov  wrote:



On 11/12/23 07:08, Lehua Ding wrote:

This patch adds a live_subreg problem that extends the original live_reg
problem to track the liveness of subregs. We only try to track pseudo
registers whose mode size is a multiple of the natural size, and eventually
only a small portion of the inside will appear to use subreg. Compared with the
live_reg problem, the live_subreg problem has the following output.
full_in/out means the entire pseudo is live in/out, partial_in/out means only
some subregs of the pseudo are live in/out, and range_in/out indicates which
part of the pseudo is live. all_in/out is the union of full_in/out and
partial_in/out:
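
The relationship between these sets can be sketched in C++ as follows. This
is not the dump output or the data structures from the patch: the struct, the
function names and the fixed MAX_PARTS bound are assumptions for illustration
only, and standard containers stand in for GCC's bitmaps.

```cpp
#include <bitset>
#include <cassert>
#include <map>
#include <set>

// Assumed upper bound on the number of natural-size parts in one pseudo.
constexpr unsigned MAX_PARTS = 8;

// Hypothetical per-basic-block summary mirroring the described output.
struct BlockLiveInfo
{
    std::set<unsigned> full;     // pseudos that are live as a whole
    std::set<unsigned> partial;  // pseudos with only some subregs live
    std::map<unsigned, std::bitset<MAX_PARTS>> range;  // live parts per pseudo
};

// Record that part PART of pseudo REGNO is live.
void mark_part_live(BlockLiveInfo &info, unsigned regno, unsigned part)
{
    assert(part < MAX_PARTS);
    if (info.full.count(regno))
        return;  // already live as a whole, nothing more to track
    info.partial.insert(regno);
    info.range[regno].set(part);
}

// all = full U partial
std::set<unsigned> all_live(const BlockLiveInfo &info)
{
    std::set<unsigned> all = info.full;
    all.insert(info.partial.begin(), info.partial.end());
    return all;
}
```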


I am not a maintainer or reviewer of data-flow analysis framework and
can not approve this patch except changes in regs.h.  Richard Sandiford
or Jeff Law as global reviewers probably can do this.

As for regs.h changes, they are ok for me after fixing general issues I
mentioned in my previous email (two spaces after sentence ends in the
comments).

I think all this code is a major compile-time and memory consumer in
the whole patch set.  DF analysis is slow by itself even when only
efficient data structures such as bitmaps are used, but you are
introducing an even slower data structure, maps (I believe a
better-performing data structure can be used instead).  In the very first
version of LRA I used DFA, but it made LRA so slow that I had to introduce
my own data structures, which are faster in case of massive RTL changes in
LRA.  The same problem exists for using generic C++ standard library data
structures such as vectors and maps in critical code.  It is hard to get
the needed performance when the exact implementation can vary or not be
what you need, e.g. vector initial capacity, growth etc.  But again, the
performance issues can be addressed later.
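
To make the point about vector capacity and growth concrete, here is a small
C++ sketch (the function name is made up for illustration). Reserving the
capacity up front guarantees that later push_back calls do not reallocate,
whereas the default growth policy is left to the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends N elements after reserving room for them up front and reports
// whether the buffer stayed in place, i.e. no reallocation happened.
bool fill_without_reallocation(std::size_t n)
{
    assert(n > 0);
    std::vector<int> regs;
    regs.reserve(n);     // capacity is now at least N
    regs.push_back(0);   // first element fixes the buffer address
    const int *before = regs.data();
    for (std::size_t i = 1; i < n; ++i)
        regs.push_back(static_cast<int>(i));
    return regs.data() == before;
}
```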


I think the important bit is that the subreg live analysis should be
opt-in and, when not enabled, shouldn't have a bad effect on memory
usage and compile-time.  At -O0 and -O1 RA consumes a major
amount of compile-time.


This is perfectly fine. The code inside the live_subreg problem has a 
branch that goes through logic similar to live_reg if it finds no subreg 
inside the program. Also, when the optimization level is less than 2, it 
doesn't track subregs. By the way, do you have any programs to offer 
where RA has a big impact on compilation time? Or any suggestions about 
it?
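
The gating described here can be pictured as a single predicate. The C++
sketch below is only an illustration: PseudoInfo and should_track_subreg are
made-up names, and the requirement that the mode be strictly larger than the
natural size is an assumption, not something taken from the patch.

```cpp
#include <cassert>

// Hypothetical summary of one pseudo register.
struct PseudoInfo
{
    unsigned mode_size;        // size of the pseudo's mode in bytes
    unsigned natural_size;     // natural register size in bytes
    bool accessed_via_subreg;  // true if some insn uses a subreg of it
};

// Returns true when subreg liveness should be tracked for P.  Otherwise the
// analysis would fall back to ordinary whole-register liveness.
bool should_track_subreg(int optimize_level, const PseudoInfo &p)
{
    assert(p.natural_size > 0);
    if (optimize_level < 2)
        return false;  // no subreg tracking below -O2
    if (!p.accessed_via_subreg)
        return false;  // nothing to gain without subreg uses
    // Assumption: only multi-register pseudos qualify.
    return p.mode_size > p.natural_size
           && p.mode_size % p.natural_size == 0;
}
```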


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH V3 0/7] ira/lra: Support subreg coalesce

2023-11-13 Thread Lehua Ding


Hi Vladimir,

On 2023/11/14 3:37, Vladimir Makarov wrote:


On 11/12/23 07:08, Lehua Ding wrote:

V3 Changes:
   1. fix three ICE.
   2. rebase

Hi,

These patches try to support the subreg coalesce feature in the
register allocation passes (ira and lra).

I've started review of v3 patches and here is my initial general 
criticism of your patches:


   * Absence of comments for some functions, e.g. for `HARD_REG_SET 
operator>> (unsigned int shift_amount) const`.


   * Adding significant functionality to existing functions is not 
reflected in the function comment, e.g. in ira_set_allocno_class.


   * A lot of typos, e.g. `pesudo` or `reprensent`.  I think you need to 
check the spelling of your comments (I myself do spell checking in emacs 
with the ispell-region command).


   * Grammar mistakes, e.g. `Flag means need track subreg live range for 
the allocno`.  I understand English is not your native language (nor is it 
mine).  If in doubt, I'd recommend checking the grammar with ChatGPT 
(Proofread:  text).


   * Some local variables use upper case letters (e.g. `int A`) which 
should be used for macros or enums according to GNU coding standard 
(https://www.gnu.org/prep/standards/standards.html) .


   * Sometimes you put one space at the end of a sentence.  Please see the 
GNU coding standard and GCC coding conventions 
(https://gcc.gnu.org/codingconventions.html)


   * There is no uniformity in your code, e.g. sometimes you use 'i++', 
sometimes `++i` or `i += 1`.  Although the uniformity is not necessary, 
it makes a better impression about the patches.


Sorry for these issues, I'll address all of those comments.

I also did not find what targets did you use for testing.  I am asking 
this because I see new testsuite failures (apx-spill_to_egprs-1.c) even 
on x86-64.  It might be nothing as the test expects a specific code 
generation.


I tested on x86, aarch64 and riscv not long ago, but it looks like I 
missed something. I just tested locally with the latest code and also 
reproduced the failure you mentioned, along with a C++ failure 
(pr106877.C). I'll have a look at the cause.


Also, besides testing major targets, I'd recommend testing at least one 
big endian target (I'd recommend ppc64be. gcc110.fsfrance.org could be 
used for this).  Plenty of RA issues occur because BE targets are not tested.


The address you gave looks a bit wrong; it should be 
gcc110.fsffrance.org, right? I looked it up and it seems you have to go 
to portal.cfarm.net first to apply for an account on this site. I'll try 
that, thanks a lot.


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding

Hi Kito,

On 2023/11/13 19:13, Lehua Ding wrote:

Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
If there is a difference between them, I think we should fix
riscv-common.cc, since I think "zvfh_zfh" should not be different from
"zfh_zvfh".
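
The expectation that extension order should not matter can be expressed as an
order-insensitive comparison. The C++ sketch below is a simplified
illustration with made-up helper names; it is not how riscv-common.cc
canonicalizes -march strings, and it ignores implied extensions.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split an arch string such as "rv32gcv_zfh_zvfh" on '_' into a set, so the
// order in which the pieces were written no longer matters.
std::set<std::string> split_arch(const std::string &arch)
{
    assert(!arch.empty());
    std::set<std::string> parts;
    std::istringstream in(arch);
    std::string piece;
    while (std::getline(in, piece, '_'))
        if (!piece.empty())
            parts.insert(piece);
    return parts;
}

// Two arch strings are treated as equal when they name the same pieces.
bool same_arch(const std::string &a, const std::string &b)
{
    return split_arch(a) == split_arch(b);
}
```

Under this comparison "rv32gcv_zvfh_zfh" and "rv32gcv_zfh_zvfh" are the same,
while "rv32gcv_zfh" differs from both because it lacks zvfh.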

It's possible. Let me debug it and see if there's a problem.

I don't think it is different. Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?

This looks to be a difference between the linux and elf versions of
gcc. The elf version of gcc we built has this problem, the
linux version of gcc does not. I think the linux version of gcc behaves
incorrectly:

➜ riscv-gnu-toolchain-push git:(tintin-dev)
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc -march=rv32gcv_zfh build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set
for
'-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'

compilation terminated.
➜ riscv-gnu-toolchain-push git:(tintin-dev)
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc -march=rv32gcv_zfh build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c

It looks like this commit[1] from you makes the difference between elf
and linux. Can you help check whether it makes sense for them to behave
differently now? For the elf version, --with-arch is rv32gcv_zvfh_zfh, and
the user gets an error with -march=rv32gcv_zfh. The linux version does not.

[1] https://github.com/gcc-mirror/gcc/commit/17d683d

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding


Hi Robin,

On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them, I think we should fix riscv-common.cc,
since I think "zvfh_zfh" should not be different from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


This looks to be a difference between the linux and elf versions of 
gcc. The elf version of gcc we built has this problem, the 
linux version of gcc does not. I think the linux version of gcc behaves 
incorrectly:


➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc 
-march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set 
for 
'-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'

compilation terminated.
➜  riscv-gnu-toolchain-push git:(tintin-dev) 
./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc 
-march=rv32gcv_zfh 
build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding





On 2023/11/13 18:33, Robin Dapp wrote:

On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:

If there is a difference between them, I think we should fix riscv-common.cc,
since I think "zvfh_zfh" should not be different from "zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.


I don't think it is different.  Just checked and it still works for me.

Could you please tell me how you invoke the testsuite?


We use the riscv-gnu-toolchain and run this `make report-newlib 
SIM=spike RUNTESTFLAGS="rvv.exp" -j100`


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding




On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
If there is a difference between them, I think we should fix 
riscv-common.cc, since I think "zvfh_zfh" should not be different from 
"zfh_zvfh".


It's possible. Let me debug it and see if there's a problem.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding





On 2023/11/13 17:59, Robin Dapp wrote:

Hi Lehua,


Executing on host: 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc
 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
  -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o ./slp-mask-run-1.exe    
(timeout = 6000)
spawn -ignore SIGHUP 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc
 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/
 
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
 -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany -fdiagnostics-plain-output 
-O3 -ftree-vectorize -ansi -pedantic-errors -march=rv64gcv_zfh -mabi=lp64d -O3 
-std=gnu99 -O3 --param=riscv-autovec-preference=scalable -lm -o 
./slp-mask-run-1.exe



Executing on host: /home/rdapp/projects/gcc32/build/gcc/xgcc 
-B/home/rdapp/projects/gcc32/build/gcc/  
/home/rdapp/projects/gcc32/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
  -march=rv32gcv_zvfh   -fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi 
-pedantic-errors -march=rv32gcv_zfh -mabi=ilp32d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o ./slp-mask-run-1.exe
(timeout = 300)
spawn -ignore SIGHUP /home/rdapp/projects/gcc32/build/gcc/xgcc 
-B/home/rdapp/projects/gcc32/build/gcc/ 
/home/rdapp/projects/gcc32/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c
 -march=rv32gcv_zvfh -fdiagnostics-plain-output -O3 -ftree-vectorize -ansi 
-pedantic-errors -march=rv32gcv_zfh -mabi=ilp32d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable -lm -o ./slp-mask-run-1.exe


It looks like you configured with --with-arch=rv32gcv_zvfh; can you 
change it to --with-arch=rv32gcv_zvfh_zfh?


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai

Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-13 Thread Lehua Ding


Hi Robin,

Can you show me the compile command in gcc.log for 
slp-mask-run-1.exe, like the one below? I'd like to see the -march 
option on your side.


Executing on host: 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/ 

/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c 
 -march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output  -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable  -lm  -o 
./slp-mask-run-1.exe    (timeout = 6000)
spawn -ignore SIGHUP 
/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/xgcc 
-B/work/home/lding/open-source/riscv-gnu-toolchain-push/build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/build-gcc-newlib-stage2/gcc/ 
/work/home/lding/open-source/riscv-gnu-toolchain-push/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/slp-mask-run-1.c 
-march=rv64gcv_zvfh_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output -O3 -ftree-vectorize -ansi -pedantic-errors 
-march=rv64gcv_zfh -mabi=lp64d -O3 -std=gnu99 -O3 
--param=riscv-autovec-preference=scalable -lm -o ./slp-mask-run-1.exe


On 2023/11/13 17:31, Robin Dapp wrote:

I'm going to configure with --with-arch=rv32gcv_zfh_zvfh --with-abi=ilp32d
to see if there is any difference.


No change for me, how do you invoke the testsuite? I.e. Which target board?

Regards
  Robin


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai
