Re: [PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-18 Thread Vladimir Makarov via Gcc-patches



On 9/15/23 10:48, Vladimir Makarov wrote:


On 9/14/23 06:45, Surya Kumari Jangala wrote:

ira: Consider save/restore costs of callee-save registers [PR110071]

In the improve_allocation() routine, IRA checks, for each allocno, whether
spilling any conflicting allocnos can improve the allocation of this allocno.
This routine computes the cost improvement for the usage of each profitable
hard register for a given allocno. The existing code in
improve_allocation() does not consider the save/restore costs of callee-save
registers while computing the cost improvement.

This can result in a callee-save register being assigned to a pseudo
that is live through the entire function and across a call, overriding a
non-callee-save register assigned to the pseudo by graph coloring. The
entry basic block then requires a prologue, thereby causing shrink
wrapping to fail.
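To make the cost comparison concrete, here is a standalone, simplified model of the improvement check with the save/restore term included. This is not GCC code; all names are invented, and the real IRA logic scales these costs by block frequencies.

```cpp
// Simplified model of the improve_allocation() decision: when evaluating
// giving an allocno a hard register by spilling its conflicting allocnos,
// a callee-save register must additionally be charged its prologue/epilogue
// save/restore cost before it can look profitable.
struct hard_reg_choice {
  int alloc_gain;        // gain from keeping the allocno in the register
  int spill_cost;        // cost of spilling the conflicting allocnos
  bool callee_saved;     // does the register need a save/restore pair?
  int save_restore_cost; // cost of that save/restore pair
};

// Net improvement of the reassignment; positive means it pays off.
inline int net_improvement (const hard_reg_choice &c)
{
  int gain = c.alloc_gain - c.spill_cost;
  if (c.callee_saved)
    gain -= c.save_restore_cost; // the term the old code ignored
  return gain;
}
```

With the extra term, a callee-save register stops looking artificially cheap for pseudos live across calls, which is what let shrink wrapping succeed again.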


Yes, that can be a problem. The general idea is ok for me, and common 
sense tells me that the performance should be better, but I would like 
to benchmark the patch on x86-64 spec2017 first.  Real applications 
have high register pressure, and the results might not be what we expect.  
So I'll do it, report the results, and give my approval if there is no 
big performance degradation.  I think the results will be ready on 
Monday.



I've benchmarked the patch on x86-64.  The specint2017 rate changed from 
8.54 to 8.51 and the specfp2017 rate changed from 21.1 to 21.2.  This is 
probably within the range of measurement error.


So the patch is ok for me to commit.  Thank you for working on the issue.




Re: [PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-15 Thread Vladimir Makarov via Gcc-patches



On 9/14/23 06:45, Surya Kumari Jangala wrote:

ira: Consider save/restore costs of callee-save registers [PR110071]

In the improve_allocation() routine, IRA checks, for each allocno, whether
spilling any conflicting allocnos can improve the allocation of this allocno.
This routine computes the cost improvement for the usage of each profitable
hard register for a given allocno. The existing code in
improve_allocation() does not consider the save/restore costs of callee-save
registers while computing the cost improvement.

This can result in a callee-save register being assigned to a pseudo
that is live through the entire function and across a call, overriding a
non-callee-save register assigned to the pseudo by graph coloring. The
entry basic block then requires a prologue, thereby causing shrink
wrapping to fail.


Yes, that can be a problem. The general idea is ok for me, and common 
sense tells me that the performance should be better, but I would like to 
benchmark the patch on x86-64 spec2017 first.  Real applications have 
high register pressure, and the results might not be what we expect.  So I'll 
do it, report the results, and give my approval if there is no big 
performance degradation.  I think the results will be ready on Monday.





[pushed] [RA]: Improve cost calculation of pseudos with equivalences

2023-09-14 Thread Vladimir Makarov via Gcc-patches
I've committed the following patch.  The reason for this patch is 
explained in its commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit 3c834d85f2ec42c60995c2b678196a06cb744959
Author: Vladimir N. Makarov 
Date:   Thu Sep 14 10:26:48 2023 -0400

[RA]: Improve cost calculation of pseudos with equivalences

RISC-V target developers reported that RA can spill a pseudo used in a
loop although there are enough registers to assign.  It happens when
the pseudo has an equivalence outside the loop and the equivalence is
not merged into the insns using the pseudo.  IRA sets the memory cost
to zero when the pseudo has an equivalence, which means that the
pseudo will most probably be spilled.  This approach worked well for i686
(different approaches were benchmarked a long time ago on spec2k),
although common sense says that the code is wrong, and this was
confirmed by the RISC-V developers.

I've tried the following patch on an i7-9700K and it improved spec17 fp
by 1.5% (21.1 vs 20.8), although spec17 int is a bit worse, by 0.45%
(8.54 vs 8.58).  The average generated code size is practically the
same (0.001% difference).

In the future we probably need to try a more sophisticated cost
calculation which takes into account that the equivalence cannot be
combined into the insns using the pseudo, and the costs of the resulting
reloads.
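The old and new behaviors can be contrasted with a standalone model (not GCC code; the old path also biased the per-class costs, which this sketch omits):

```cpp
// Old behavior: a profitable equivalence forced the memory cost to zero,
// making the pseudo a near-certain spill even when it is used inside a loop.
inline int old_mem_cost (int mem_cost, int equiv_savings)
{
  if (equiv_savings < 0)
    return -equiv_savings;
  if (equiv_savings > 0)
    return 0; // pseudo looks free to spill -- too aggressive for loops
  return mem_cost;
}

// New behavior: the equivalence savings merely bias the memory cost, so a
// frequently used pseudo keeps a high memory cost and can still get a hard reg.
inline int new_mem_cost (int mem_cost, int equiv_savings)
{
  return mem_cost - equiv_savings;
}
```

Under the new scheme a loop pseudo with a high memory cost is no longer forced toward spilling just because an equivalence exists outside the loop.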

gcc/ChangeLog:

* ira-costs.cc (find_costs_and_classes): Decrease memory cost
by equiv savings.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index d9e700e8947..8c93ace5094 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1947,15 +1947,8 @@ find_costs_and_classes (FILE *dump_file)
 	}
 	  if (i >= first_moveable_pseudo && i < last_moveable_pseudo)
 	i_mem_cost = 0;
-	  else if (equiv_savings < 0)
-	i_mem_cost = -equiv_savings;
-	  else if (equiv_savings > 0)
-	{
-	  i_mem_cost = 0;
-	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-		i_costs[k] += equiv_savings;
-	}
-
+	  else
+	i_mem_cost -= equiv_savings;
 	  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
 	  best = ALL_REGS;
 	  alt_class = NO_REGS;


Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-14 Thread Vladimir Makarov via Gcc-patches



On 9/10/23 00:49, Hongyu Wang wrote:

Vladimir Makarov via Gcc-patches wrote on Sat, Sep 9, 2023 at 01:04:


On 8/31/23 04:20, Hongyu Wang wrote:

@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression 
(@code{MEM} for the top level
   of an address, @code{ADDRESS} for something that occurs in an
   @code{address_operand}).  @var{index_code} is the code of the corresponding
   index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} 
otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
   @end defmac

I'd prefer a more general description of the 'insn' argument for the macros.
Something like this:

@code{insn} can be used to define an insn-specific base register class.


Sure, will adjust in the V2 patch.
Also, currently we reuse the old macro MODE_CODE_BASE_REG_CLASS; do
you think we need a new macro like INSN_BASE_REG_CLASS, as the other
parameters are actually unused? Then we wouldn't need to change other
targets like avr/gcn.

I thought about this too.  Adding new macros would definitely be 
worthwhile, especially as you are already adding INSN_INDEX_REG_CLASS.


The names INSN_BASE_REG_CLASS instead of MODE_CODE_BASE_REG_CLASS and 
REGNO_OK_FOR_INSN_BASE_P instead of REGNO_MODE_CODE_OK_FOR_BASE_P are ok 
for me too.


When you submit the v2 patch, I'll review the RA part as soon as 
possible (actually I have already looked at it) and will most probably give my 
approval for the RA part, because I prefer your current approach for RA 
over introducing new memory constraints.




Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-08 Thread Vladimir Makarov via Gcc-patches



On 8/31/23 04:20, Hongyu Wang wrote:

@@ -2542,6 +2542,8 @@ the code of the immediately enclosing expression 
(@code{MEM} for the top level
  of an address, @code{ADDRESS} for something that occurs in an
  @code{address_operand}).  @var{index_code} is the code of the corresponding
  index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
+@code{insn} indicates insn specific base register class should be subset
+of the original base register class.
  @end defmac


I'd prefer a more general description of the 'insn' argument for the macros.  
Something like this:


@code{insn} can be used to define an insn-specific base register class.




[pushed][PR111225][LRA]: Don't reuse chosen insn alternative with special memory constraint

2023-09-07 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit f7bca44d97ad01b39f9d6e7809df7bf517eeb2fb
Author: Vladimir N. Makarov 
Date:   Thu Sep 7 09:59:10 2023 -0400

[LRA]: Don't reuse chosen insn alternative with special memory constraint

To speed up GCC, LRA reuses the chosen alternative from the previous
constraint subpass.  A spilled pseudo is considered ok for any memory
constraint, although the stack slot assigned to the pseudo later might not
satisfy the chosen alternative's constraint.  As we don't consider all insn
alternatives on the subsequent LRA sub-passes, this might result in LRA failing
to generate a correct insn.  This patch solves the problem.
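The core idea of the fix can be sketched standalone (not GCC code; names invented): remember whether the alternative won only because a spilled pseudo trivially satisfied a special memory constraint, and if so, refuse to pin the insn to that alternative for the next sub-pass.

```cpp
#include <optional>

// Result of choosing an alternative for an insn.
struct chosen_alt {
  int number;    // the chosen alternative
  bool reuse_p;  // safe to reuse it on the next constraint sub-pass?
};

// Returns the alternative to record on the insn, or nothing (the analogue
// of LRA_UNKNOWN_ALT) when reuse would be unsafe and all alternatives must
// be re-examined after stack slots are assigned.
inline std::optional<int> alt_to_record (const chosen_alt &c)
{
  return c.reuse_p ? std::optional<int> (c.number) : std::nullopt;
}
```

This trades a little compile time (re-checking alternatives for the affected insns) for correctness once the real stack slot is known.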

gcc/ChangeLog:

PR target/111225
* lra-constraints.cc (goal_reuse_alt_p): New global flag.
(process_alt_operands): Set up the flag.  Clear flag for chosen
alternative with special memory constraints.
(curr_insn_transform): Set up the used insn alternative depending on the flag.

gcc/testsuite/ChangeLog:

PR target/111225
* gcc.target/i386/pr111225.c: New test.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c718bedff32..3aaa4906999 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1462,6 +1462,9 @@ static int goal_alt_matches[MAX_RECOG_OPERANDS];
 static int goal_alt_dont_inherit_ops_num;
 /* Numbers of operands whose reload pseudos should not be inherited.  */
 static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+/* True if we should try only this alternative for the next constraint sub-pass
+   to speed up the sub-pass.  */
+static bool goal_reuse_alt_p;
 /* True if the insn commutative operands should be swapped.  */
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
@@ -2130,6 +2133,7 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  bool curr_reuse_alt_p;
   /* True if output stack pointer reload should be generated for the current
  alternative.  */
   bool curr_alt_out_sp_reload_p;
@@ -2217,6 +2221,7 @@ process_alt_operands (int only_alternative)
   reject += static_reject;
   early_clobbered_regs_num = 0;
   curr_alt_out_sp_reload_p = false;
+  curr_reuse_alt_p = true;
   
   for (nop = 0; nop < n_operands; nop++)
 	{
@@ -2574,7 +2579,10 @@ process_alt_operands (int only_alternative)
 		  if (satisfies_memory_constraint_p (op, cn))
 			win = true;
 		  else if (spilled_pseudo_p (op))
-			win = true;
+			{
+			  curr_reuse_alt_p = false;
+			  win = true;
+			}
 		  break;
 		}
 		  break;
@@ -3318,6 +3326,7 @@ process_alt_operands (int only_alternative)
 	  goal_alt_offmemok[nop] = curr_alt_offmemok[nop];
 	}
 	  goal_alt_dont_inherit_ops_num = curr_alt_dont_inherit_ops_num;
+	  goal_reuse_alt_p = curr_reuse_alt_p;
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
@@ -4399,7 +4408,8 @@ curr_insn_transform (bool check_only_p)
 }
 
   lra_assert (goal_alt_number >= 0);
-  lra_set_used_insn_alternative (curr_insn, goal_alt_number);
+  lra_set_used_insn_alternative (curr_insn, goal_reuse_alt_p
+ ? goal_alt_number : LRA_UNKNOWN_ALT);
 
   if (lra_dump_file != NULL)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr111225.c b/gcc/testsuite/gcc.target/i386/pr111225.c
new file mode 100644
index 000..5d92daf215b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111225.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fsanitize=thread -mforce-drap -mavx512cd" } */
+
+typedef long long __m256i __attribute__ ((__vector_size__ (32)));
+
+int foo (__m256i x, __m256i y)
+{
+  __m256i a = x & ~y;
+  return !__builtin_ia32_ptestz256 (a, a);
+}
+
+int bar (__m256i x, __m256i y)
+{
+  __m256i a = ~x & y;
+  return !__builtin_ia32_ptestz256 (a, a);
+}


Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-07 Thread Vladimir Makarov via Gcc-patches



On 9/7/23 02:23, Uros Bizjak wrote:

On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov  wrote:


On 9/1/23 05:07, Hongyu Wang wrote:



I think the approach proposed by the Intel developers is better.  In some way 
we already use such an approach when we pass the memory mode to get the base 
reg class, although we could use different memory constraints for 
different modes when the possible base reg differs for some memory 
modes.  Using special memory constraints could probably be implemented too (I 
understand the attractiveness of such an approach for the readability of the 
machine description), but in my opinion it would require much bigger 
work in IRA/LRA/reload.  It would also significantly slow down RA, as we 
would need to process insn constraints for each memory operand in many 
places (e.g. for the calculation of reg classes and costs in IRA).  Still, 
I think there would be a few cases where this approach results in a bigger 
probability of assigning a hard reg outside the specific base reg class, 
and this would result in additional reloads.

So the approach proposed by Intel is ok for me.  Although, if the x86 
maintainers are strongly against this approach and the changes in the x86 
machine dependent code, and the Intel developers implement Uros's approach, 
I am ready to review it.  But I still prefer the current Intel developers' 
approach for the reasons I mentioned above.

My above proposal is more or less a wish from a target maintainer PoV.
Ideally, we would have a bunch of different memory constraints, and a
target hook that returns corresponding BASE/INDEX reg classes.
However, I have no idea about the complexity of the implementation in
the infrastructure part of the compiler.

Basically, it requires introducing new hooks which return the base and index 
classes for special memory constraints.  When we process memory in an 
insn (in a lot of places in IRA, LRA, and reload) we should consider all 
possible memory insn constraints, take the intersection of the base and index 
reg classes for the constraints, and use it instead of the default base 
and index classes.


The required functionality is absent in reload too.

I would say that it is a moderate size project (1-2 months for me).  It 
still requires introducing new hooks, and I guess there are a few cases 
where we will still assign hard regs outside the desirable base class for 
address pseudos, which will result in the generation of additional reload 
insns.  It also means many more additional changes in the RA source code and 
the x86 machine dependent files.


Probably, with this approach there will also be edge cases where we need 
to solve new PRs because of LRA failures to generate correct code, 
but I believe they can be solved.


Therefore I lean toward the current Intel approach, in which, to get the base 
reg class, we pass the insn as a parameter in addition to the memory mode.
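For illustration, the constraint-based design discussed above might need hooks shaped roughly like the following. This is a hypothetical sketch, not a GCC interface; all names and the bitmask representation are invented.

```cpp
#include <cstdint>
#include <initializer_list>

// Toy bitmask standing in for a register class; each bit is a hard register.
using reg_class_mask = std::uint64_t;

// Hypothetical core of the scheme: each special memory constraint would
// report its acceptable base register class, and the allocator would
// intersect those classes over an insn's memory constraints, starting from
// the default base class.  A narrower result means fewer hard regs are
// usable for address pseudos, hence the extra-reload concern above.
inline reg_class_mask
intersect_base_classes (std::initializer_list<reg_class_mask> per_constraint,
                        reg_class_mask default_base)
{
  reg_class_mask m = default_base;
  for (reg_class_mask c : per_constraint)
    m &= c; // keep only regs acceptable to every constraint
  return m;
}
```

The repeated intersection at every place memory is processed is exactly the RA slowdown the message warns about.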





Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-06 Thread Vladimir Makarov via Gcc-patches



On 9/1/23 05:07, Hongyu Wang wrote:

Uros Bizjak via Gcc-patches wrote on Thu, Aug 31, 2023 at 18:16:

On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang  wrote:

From: Kong Lingling 

The current reload infrastructure does not support a selective base_reg_class
for backend insns. Add an insn argument to base_reg_class for
lra/reload usage.

I don't think this is the correct approach. Ideally, a memory
constraint should somehow encode its BASE/INDEX register class.
Instead of passing "insn", simply a different constraint could be used
in the constraint string of the relevant insn.

We tried a constraint-only approach at the beginning, but then we found
that the reload infrastructure does not work like that.

The BASE/INDEX reg classes are determined before choosing alternatives, in
process_address under curr_insn_transform. Process_address creates the mem
operand according to the BASE/INDEX reg class. Then, the memory operand
constraint check will evaluate the mem op with targetm.legitimate_address_p.

If we want to make use of EGPR in base/index we need to either extend BASE/INDEX
reg class in the backend, or, for specific insns, add a target hook to
tell reload
that the extended reg class with EGPR can be used to construct memory operand.

CC'd Vladimir as git send-mail failed to add recipient.



I think the approach proposed by the Intel developers is better.  In some way
we already use such an approach when we pass the memory mode to get the base
reg class, although we could use different memory constraints for
different modes when the possible base reg differs for some memory
modes.  Using special memory constraints could probably be implemented too (I
understand the attractiveness of such an approach for the readability of the
machine description), but in my opinion it would require much bigger
work in IRA/LRA/reload.  It would also significantly slow down RA, as we
would need to process insn constraints for each memory operand in many
places (e.g. for the calculation of reg classes and costs in IRA).  Still,
I think there would be a few cases where this approach results in a bigger
probability of assigning a hard reg outside the specific base reg class,
and this would result in additional reloads.

So the approach proposed by Intel is ok for me.  Although, if the x86
maintainers are strongly against this approach and the changes in the x86
machine dependent code, and the Intel developers implement Uros's approach,
I am ready to review it.  But I still prefer the current Intel developers'
approach for the reasons I mentioned above.



Re: [pushed][LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible

2023-08-17 Thread Vladimir Makarov via Gcc-patches



On 8/17/23 07:19, senthilkumar.selva...@microchip.com wrote:

On Wed, 2023-08-16 at 12:13 -0400, Vladimir Makarov wrote:


The attached patch fixes recently found wrong insn removal in LRA port
for AVR.

The patch was successfully tested and bootstrapped on x86-64 and aarch64.



Hi Vladimir,

   Thanks for working on this. After applying the patch, I'm seeing that the
   spilled pseudo previously assigned to the frame pointer is taking up the same
   stack slot that was already assigned to another spilled pseudo, and that is
   causing an execution failure (it is also causing a crash when building libgcc
   for avr).

...
   I tried a hacky workaround (see patch below) to create a new stack slot and
   assign the spilled pseudo to it, and that works.

   Not sure if that's the right way to do it though.


The general approach is right, but I've just committed a different 
version of the patch.





[pushed][LRA]: When assigning stack slots to pseudos previously assigned to fp consider other spilled pseudos

2023-08-17 Thread Vladimir Makarov via Gcc-patches
The following patch fixes a problem with allocating the same stack slots 
to conflicting pseudos.  The problem exists only for the AVR LRA port.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit c024867d1aa9d465e0236fc9d45d8e1d4bb6bd30
Author: Vladimir N. Makarov 
Date:   Thu Aug 17 11:57:45 2023 -0400

[LRA]: When assigning stack slots to pseudos previously assigned to fp 
consider other spilled pseudos

The previous LRA patch could assign the stack slot of conflicting pseudos to
pseudos spilled after prohibiting the fp->sp elimination.  This patch
fixes the problem.

gcc/ChangeLog:

* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Moving
slots_num initialization from here ...
(lra_spill): ... to here before the 1st call of
assign_stack_slot_num_and_sort_pseudos.  Add the 2nd call after
fp->sp elimination.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 7e1d35b5e4e..a663a1931e3 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -363,7 +363,6 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, 
int n)
 {
   int i, j, regno;
 
-  slots_num = 0;
   /* Assign stack slot numbers to spilled pseudos, use smaller numbers
  for most frequently used pseudos. */
   for (i = 0; i < n; i++)
@@ -628,6 +627,7 @@ lra_spill (void)
   /* Sort regnos according their usage frequencies.  */
   qsort (pseudo_regnos, n, sizeof (int), regno_freq_compare);
   n = assign_spill_hard_regs (pseudo_regnos, n);
+  slots_num = 0;
   assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n);
   for (i = 0; i < n; i++)
 if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
@@ -635,6 +635,7 @@ lra_spill (void)
   if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
 {
   /* Assign stack slots to spilled pseudos assigned to fp.  */
+  assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n2);
   for (i = 0; i < n2; i++)
if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
  assign_mem_slot (pseudo_regnos[i]);


[pushed][LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible

2023-08-16 Thread Vladimir Makarov via Gcc-patches
The attached patch fixes recently found wrong insn removal in LRA port 
for AVR.


The patch was successfully tested and bootstrapped on x86-64 and aarch64.


commit 748a77558ff37761faa234e19327ad1decaace33
Author: Vladimir N. Makarov 
Date:   Wed Aug 16 09:13:54 2023 -0400

[LRA]: Spill pseudos assigned to fp when fp->sp elimination became 
impossible

Porting LRA to AVR revealed that creating a stack slot can make the fp->sp
elimination impossible.  The previous patch undoes the fp assignment after
the stack slot creation but then calculated the live info wrongly.  This
resulted in wrong code generation, deleting some insns that were still
live.  This patch fixes the problem.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): Change the prototype.
* lra-eliminations.cc (spill_pseudos): Record spilled pseudos.
(lra_update_fp2sp_elimination): Ditto.
(update_reg_eliminate): Adjust spill_pseudos call.
* lra-spills.cc (lra_spill): Assign stack slots to pseudos spilled
in lra_update_fp2sp_elimination.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 1f4e3fec9e0..3c58d4a3815 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1086,18 +1086,18 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, 
bool first_p,
   lra_update_insn_recog_data (insn);
 }
 
-/* Spill pseudos which are assigned to hard registers in SET.  Add
-   affected insns for processing in the subsequent constraint
-   pass.  */
-static void
-spill_pseudos (HARD_REG_SET set)
+/* Spill pseudos which are assigned to hard registers in SET, record them in
+   SPILLED_PSEUDOS unless it is null, and return the recorded pseudos number.
+   Add affected insns for processing in the subsequent constraint pass.  */
+static int
+spill_pseudos (HARD_REG_SET set, int *spilled_pseudos)
 {
-  int i;
+  int i, n;
   bitmap_head to_process;
   rtx_insn *insn;
 
   if (hard_reg_set_empty_p (set))
-return;
+return 0;
   if (lra_dump_file != NULL)
 {
   fprintf (lra_dump_file, "   Spilling non-eliminable hard regs:");
@@ -1107,6 +1107,7 @@ spill_pseudos (HARD_REG_SET set)
   fprintf (lra_dump_file, "\n");
 }
   bitmap_initialize (&to_process, &reg_obstack);
+  n = 0;
   for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
 if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
&& overlaps_hard_reg_set_p (set,
@@ -1116,6 +1117,8 @@ spill_pseudos (HARD_REG_SET set)
  fprintf (lra_dump_file, "  Spilling r%d(%d)\n",
   i, reg_renumber[i]);
reg_renumber[i] = -1;
+   if (spilled_pseudos != NULL)
+ spilled_pseudos[n++] = i;
	bitmap_ior_into (&to_process, &lra_reg_info[i].insn_bitmap);
   }
   lra_no_alloc_regs |= set;
@@ -1126,6 +1129,7 @@ spill_pseudos (HARD_REG_SET set)
lra_set_used_insn_alternative (insn, LRA_UNKNOWN_ALT);
   }
   bitmap_clear (&to_process);
+  return n;
 }
 
 /* Update all offsets and possibility for elimination on eliminable
@@ -1238,7 +1242,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
   }
   lra_no_alloc_regs |= temp_hard_reg_set;
   eliminable_regset &= ~temp_hard_reg_set;
-  spill_pseudos (temp_hard_reg_set);
+  spill_pseudos (temp_hard_reg_set, NULL);
   return result;
 }
 
@@ -1382,15 +1386,17 @@ process_insn_for_elimination (rtx_insn *insn, bool 
final_p, bool first_p)
 
 /* Update frame pointer to stack pointer elimination if we started with
permitted frame pointer elimination and now target reports that we can not
-   do this elimination anymore.  */
-void
-lra_update_fp2sp_elimination (void)
+   do this elimination anymore.  Record spilled pseudos in SPILLED_PSEUDOS
+   unless it is null, and return the recorded pseudos number.  */
+int
+lra_update_fp2sp_elimination (int *spilled_pseudos)
 {
+  int n;
   HARD_REG_SET set;
   class lra_elim_table *ep;
 
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
-return;
+return 0;
   gcc_assert (!elimination_fp2sp_occured_p);
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
@@ -1398,10 +1404,11 @@ lra_update_fp2sp_elimination (void)
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
-  spill_pseudos (set);
+  n = spill_pseudos (set, spilled_pseudos);
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
   setup_can_eliminate (ep, false);
+  return n;
 }
 
 /* Entry function to do final elimination if FINAL_P or to update
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 633d9af8058..d0752c2ae50 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -414,7 +414,7 @@ extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
 bool, bool, 

[pushed][LRA]: Process output stack pointer reloads before emitting reload insns

2023-08-14 Thread Vladimir Makarov via Gcc-patches

The patch fixes a failure in building the aarch64 port with my patch from yesterday.

The patch was successfully bootstrapped on x86-64 and aarch64.
commit c4760c0161f92b92361feba11836e3d066bb330c
Author: Vladimir N. Makarov 
Date:   Mon Aug 14 16:06:27 2023 -0400

[LRA]: Process output stack pointer reloads before emitting reload insns

Previous patch setting up asserts for processing stack pointer reloads
caught an error in code moving sp offset.  This resulted in failure of
building aarch64 port. The code wrongly processed insns beyond the
output reloads of the current insn.  This patch fixes it.

gcc/ChangeLog:

* lra-constraints.cc (curr_insn_transform): Process output stack
pointer reloads before emitting reload insns.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 8d9443adeb6..c718bedff32 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4840,7 +4840,6 @@ curr_insn_transform (bool check_only_p)
/* Most probably there are no enough registers to satisfy asm insn: */
lra_asm_insn_error (curr_insn);
 }
-  lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   if (goal_alt_out_sp_reload_p)
 {
   /* We have an output stack pointer reload -- update sp offset: */
@@ -4863,6 +4862,7 @@ curr_insn_transform (bool check_only_p)
  }
   lra_assert (done_p);
 }
+  lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   return change_p;
 }
 


Re: [pushed][LRA]: Fix asserts for output stack pointer reloads

2023-08-14 Thread Vladimir Makarov via Gcc-patches



On 8/14/23 14:37, Prathamesh Kulkarni wrote:

On Mon, 14 Aug 2023 at 06:39, Vladimir Makarov via Gcc-patches
 wrote:

The following patch fixes useless asserts in my latest patch
implementing output stack pointer reloads.

Hi Vladimir,
It seems that this patch caused the following ICE on aarch64-linux-gnu
while building cp-demangle.c:
compile:  
/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/./gcc/xgcc
-B/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/./gcc/
-B/usr/local/aarch64-unknown-linux-gnu/bin/
-B/usr/local/aarch64-unknown-linux-gnu/lib/ -isystem
/usr/local/aarch64-unknown-linux-gnu/include -isystem
/usr/local/aarch64-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I..
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/../libiberty
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/../include
-D_GLIBCXX_SHARED
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/aarch64-unknown-linux-gnu/libstdc++-v3/include/aarch64-unknown-linux-gnu
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/stage1-build/aarch64-unknown-linux-gnu/libstdc++-v3/include
-I/home/prathamesh.kulkarni/gnu-toolchain/gcc/master/gcc/libstdc++-v3/libsupc++
-g -O2 -DIN_GLIBCPP_V3 -Wno-error -c cp-demangle.c  -fPIC -DPIC -o
cp-demangle.o
during RTL pass: reload
cp-demangle.c: In function ‘d_demangle_callback.constprop’:
cp-demangle.c:6815:1: internal compiler error: in curr_insn_transform,
at lra-constraints.cc:4854
  6815 | }
   | ^
0xce6b37 curr_insn_transform
 ../../gcc/gcc/lra-constraints.cc:4854
0xce7887 lra_constraints(bool)
 ../../gcc/gcc/lra-constraints.cc:5478
0xccdfa7 lra(_IO_FILE*)
 ../../gcc/gcc/lra.cc:2419
0xc7e417 do_reload
 ../../gcc/gcc/ira.cc:5970
0xc7e417 execute
 ../../gcc/gcc/ira.cc:6156
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.


Sorry, I should have bootstrapped my patch on aarch64.

The asserts actually seem very useful, as I found they caught a bug in 
my previous patch.


I'll push a patch fixing the problems after finishing bootstraps, 
probably in a couple of hours.


Thank you





[pushed][LRA]: Fix asserts for output stack pointer reloads

2023-08-13 Thread Vladimir Makarov via Gcc-patches
The following patch fixes useless asserts in my latest patch 
implementing output stack pointer reloads.
commit 18b417fe1a46d37738243267c1f559cd0acc4886
Author: Vladimir N. Makarov 
Date:   Sun Aug 13 20:54:58 2023 -0400

[LRA]: Fix asserts for output stack pointer reloads

The patch implementing output stack pointer reloads contained superfluous
asserts.  The patch makes them useful.

gcc/ChangeLog:

* lra-constraints.cc (curr_insn_transform): Set done_p up and
check it on true after processing output stack pointer reload.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 26239908747..8d9443adeb6 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4852,6 +4852,7 @@ curr_insn_transform (bool check_only_p)
&& SET_DEST (set) == stack_pointer_rtx)
  {
lra_assert (!done_p);
+   done_p = true;
curr_id->sp_offset = 0;
lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
id->sp_offset = sp_offset;
@@ -4860,7 +4861,7 @@ curr_insn_transform (bool check_only_p)
   "Moving sp offset from insn %u to %u\n",
   INSN_UID (curr_insn), INSN_UID (insn));
  }
-  lra_assert (!done_p);
+  lra_assert (done_p);
 }
   return change_p;
 }


[pushed][LRA]: Implement output stack pointer reloads

2023-08-11 Thread Vladimir Makarov via Gcc-patches
Sorry, I had some problems with email.  As a result there are duplicate 
emails, and they were sent to g...@gcc.gnu.org instead of 
gcc-patches@gcc.gnu.org.



On 8/9/23 16:54, Vladimir Makarov wrote:




On 8/9/23 07:15, senthilkumar.selva...@microchip.com wrote:

Hi,

   After turning on FP -> SP elimination after Vlad fixed
   an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,

   I'm now running into a reload failure if arithmetic is done on SP.

I think we can permit output reloads of the stack pointer.  The only thing 
we need is to update the sp offset accurately for the original and reload 
insns.  I'll try to make the patch this week.
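The bookkeeping described here can be modeled standalone (not GCC code; names invented): when an output reload of the stack pointer is generated, the sp offset recorded on the original insn must migrate to the reload insn that now writes sp, so later elimination passes see consistent offsets.

```cpp
// Result of moving the sp offset when an output sp reload is generated.
struct sp_move {
  long from_after; // offset left on the original insn
  long to_after;   // offset now carried by the reload insn
};

// The original insn no longer changes sp, so its recorded offset becomes 0;
// the reload insn inherits the old offset.
inline sp_move move_sp_offset (long from_offset)
{
  return { 0, from_offset };
}
```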



The following patch fixes the problem.  The patch was successfully 
bootstrapped and tested on x86_64, aarch64, and ppc64le.


The test case is actually one from GCC test suite.

commit c0121083d07ffd4a8424f4be50de769d9ad0386d
Author: Vladimir N. Makarov 
Date:   Fri Aug 11 07:57:37 2023 -0400

[LRA]: Implement output stack pointer reloads

LRA prohibited output stack pointer reloads, but this resulted in LRA
failures for the AVR target, which has no arithmetic insns working with the
stack pointer register.  This patch implements output stack
pointer reloads.

gcc/ChangeLog:

* lra-constraints.cc (goal_alt_out_sp_reload_p): New flag.
(process_alt_operands): Set the flag.
(curr_insn_transform): Modify stack pointer offsets if output
stack pointer reload is generated.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 09ff6de1657..26239908747 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1466,6 +1466,8 @@ static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
 static int goal_alt_number;
+/* True if output reload of the stack pointer should be generated.  */
+static bool goal_alt_out_sp_reload_p;
 
 /* True if the corresponding operand is the result of an equivalence
substitution.  */
@@ -2128,6 +2130,9 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  /* True if output stack pointer reload should be generated for the current
+ alternative.  */
+  bool curr_alt_out_sp_reload_p;
   rtx op;
   /* The register when the operand is a subreg of register, otherwise the
  operand itself.  */
@@ -2211,7 +2216,8 @@ process_alt_operands (int only_alternative)
 	}
   reject += static_reject;
   early_clobbered_regs_num = 0;
-
+  curr_alt_out_sp_reload_p = false;
+  
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2682,12 +2688,10 @@ process_alt_operands (int only_alternative)
 	  bool no_regs_p;
 
 	  reject += op_reject;
-	  /* Never do output reload of stack pointer.  It makes
-		 impossible to do elimination when SP is changed in
-		 RTL.  */
-	  if (op == stack_pointer_rtx && ! frame_pointer_needed
+	  /* Mark output reload of the stack pointer.  */
+	  if (op == stack_pointer_rtx
 		  && curr_static_id->operand[nop].type != OP_IN)
-		goto fail;
+		curr_alt_out_sp_reload_p = true;
 
 	  /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
@@ -3317,6 +3321,7 @@ process_alt_operands (int only_alternative)
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
+	  goal_alt_out_sp_reload_p = curr_alt_out_sp_reload_p;
 	  best_overall = overall;
 	  best_losers = losers;
 	  best_reload_nregs = reload_nregs;
@@ -4836,6 +4841,27 @@ curr_insn_transform (bool check_only_p)
 	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  if (goal_alt_out_sp_reload_p)
+{
+  /* We have an output stack pointer reload -- update sp offset: */
+  rtx set;
+  bool done_p = false;
+  poly_int64 sp_offset = curr_id->sp_offset;
+  for (rtx_insn *insn = after; insn != NULL_RTX; insn = NEXT_INSN (insn))
+	if ((set = single_set (insn)) != NULL_RTX
+	&& SET_DEST (set) == stack_pointer_rtx)
+	  {
+	lra_assert (!done_p);
+	curr_id->sp_offset = 0;
+	lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	id->sp_offset = sp_offset;
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Moving sp offset from insn %u to %u\n",
+		   INSN_UID (curr_insn), INSN_UID (insn));
+	  }
+  lra_assert (!done_p);
+}
   return change_p;
 }
 


Re: [PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

2023-08-08 Thread Vladimir Makarov via Gcc-patches



On 8/7/23 09:18, Richard Biener wrote:

On Wed, 2 Aug 2023, Richard Biener wrote:


On Mon, 31 Jul 2023, Jeff Law wrote:



On 7/31/23 04:54, Richard Biener via Gcc-patches wrote:

On Tue, 25 Jul 2023, Richard Biener wrote:


The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
passes?

Ping.


Thanks,
Richard.

  PR rtl-optimization/110587
  * lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
---
   gcc/lra-assigns.cc | 9 +
   1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index b8582dcafff..d2ebcfd5056 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int try_only_hard_regno,
   r2 != NULL;
   r2 = r2->start_next)
{
- if (r2->regno >= lra_constraint_new_regno_start
+ if (live_pseudos_reg_renumber[r2->regno] < 0
+ && r2->regno >= lra_constraint_new_regno_start
   && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
- && live_pseudos_reg_renumber[r2->regno] < 0
   && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
 sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
   r2->regno);
- if (live_pseudos_reg_renumber[r2->regno] >= 0
- && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+ else if (live_pseudos_reg_renumber[r2->regno] >= 0
+  && rclass_intersect_p
+   [regno_allocno_class_array[r2->regno]])
 sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);

My biggest concern here would be r2->regno < 0  in the new code which could
cause an OOB array reference in the first condition of the test.

Isn't that the point if the original ordering?  Test that r2->regno is
reasonable before using it as an array index?

Note the original code is

   if (r2->regno >= lra_constraint_new_regno_start
...
  if (live_pseudos_reg_renumber[r2->regno] >= 0
...

so we are going to access live_pseudos_reg_renumber[r2->regno]
independent on the r2->regno >= lra_constraint_new_regno_start check,
so I don't think that's the point of the original ordering.  Note
I preserved the ordering with respect to other array accesses,
the speedup seen is because we now have the


if (live_pseudos_reg_renumber[r2->regno] < 0
...
else if (live_pseudos_reg_renumber[r2->regno] >= 0
 ...

structure directly exposed which helps the compiler.

I think the check on r2->regno is to decide whether to alter
conflict_reload_and_inheritance_pseudos or
live_range_hard_reg_pseudos (so it's also somewhat natural to check
that first).

So - OK?


Richard, sorry, I overlooked this thread.

Yes, it is OK to commit.  In general Jeff has a reasonable concern, but 
in this case r2->regno is always >= 0, and I cannot imagine a reason we 
would change the algorithm in the future in such a way that this no longer holds.






[pushed][LRA] Check input insn pattern hard regs against early clobber hard regs for live info

2023-08-04 Thread Vladimir Makarov via Gcc-patches
The following patch fixes a problem found by the LRA port for the avr target.  
The problem description is in the commit message.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.
commit abf953042ace471720c1dc284b5f38e546fc0595
Author: Vladimir N. Makarov 
Date:   Fri Aug 4 08:04:44 2023 -0400

LRA: Check input insn pattern hard regs against early clobber hard regs for live info

For the test case LRA generates wrong code for AVR cpymem_qi insn:

(insn 16 15 17 3 (parallel [
(set (mem:BLK (reg:HI 26 r26) [0  A8])
(mem:BLK (reg:HI 30 r30) [0  A8]))
(unspec [
(const_int 0 [0])
] UNSPEC_CPYMEM)
(use (reg:QI 52))
(clobber (reg:HI 26 r26))
(clobber (reg:HI 30 r30))
(clobber (reg:QI 0 r0))
(clobber (reg:QI 52))
]) "t.c":16:22 132 {cpymem_qi}

The insn gets the same value in r26 and r30.  The culprit is clobbering
r30 and using r30 as input.  In such a situation LRA wrongly assumes that
r30 is not live before the insn.  The patch fixes this.

gcc/ChangeLog:

* lra-lives.cc (process_bb_lives): Check input insn pattern hard regs
against early clobber hard regs.

gcc/testsuite/ChangeLog:

* gcc.target/avr/lra-cpymem_qi.c: New.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f7a3ba8d76a..f60e564da82 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -989,7 +989,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	/* We can have early clobbered non-operand hard reg and
 	   the same hard reg as an insn input.  Don't make hard
 	   reg dead before the insns.  */
-	for (reg2 = curr_id->regs; reg2 != NULL; reg2 = reg2->next)
+	for (reg2 = curr_static_id->hard_regs; reg2 != NULL; reg2 = reg2->next)
 	  if (reg2->type != OP_OUT && reg2->regno == reg->regno)
 		break;
 	if (reg2 == NULL)
diff --git a/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
new file mode 100644
index 000..fdffb445b45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/lra-cpymem_qi.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mmcu=avr51 -Os" } */
+
+#include <stdbool.h>
+
+struct A
+{
+  unsigned int a;
+  unsigned char c1, c2;
+  bool b1 : 1;
+};
+
+void
+foo (const struct A *x, int y)
+{
+  int s = 0, i;
+  for (i = 0; i < y; ++i)
+{
+  const struct A a = x[i];
+  s += a.b1 ? 1 : 0;
+}
+  if (s != 0)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-assembler-not "movw\[^\n\r]*r26,r30" } } */


Re: [PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-08-02 Thread Vladimir Makarov via Gcc-patches



On 8/1/23 01:20, Surya Kumari Jangala wrote:

Ping

Sorry for the delay in answering.  I was on vacation.

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:

The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

The patch is OK for me.  Thank you for finding and fixing this issue.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/PR110254
* ira-color.cc (improve_allocation): Update array


I guess you missed the next line in the changelog.  I suspect it should 
be "Update array allocated_hard_reg_p."


Please, fix it before committing the patch.


---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
/* Assign the best chosen hard register to A.  */
ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));




Re: [PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

2023-08-01 Thread Vladimir Makarov via Gcc-patches



On 7/25/23 09:40, Richard Biener wrote:

The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE.  Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment.  That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu with
all languages and {,-m32}.

OK?


Richard, sorry for the delay with the answer.  I was on vacation.

There is a long history of changes to this code.  I believe your change 
is right.  I don't think RTL will ever contain a noop return move insn 
involving the return hard register, especially after hard reg propagation 
was removed a couple of years ago; at least IRA/LRA do not generate such 
insns during their work.


So the patch is OK for me.  I especially like that a big part of the code 
is removed.  No code, no problem (including performance ones).  Thank you 
for the patch.



PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.
---
  gcc/lra-spills.cc | 69 +--
  1 file changed, 1 insertion(+), 68 deletions(-)

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e8cd9..fe58f162d05 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,72 +705,6 @@ alter_subregs (rtx *loc, bool final_p)
return res;
  }




[pushed][LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr LRA port

2023-07-21 Thread Vladimir Makarov via Gcc-patches
The following patch fixes sparc solaris bootstrap.  The explanation of 
the patch is in the commit message.


The patch was successfully bootstrapped on x86-64, aarch64, and sparc64 
solaris.


commit d17be8f7f36abe257a7d026dad61e5f8d14bdafc
Author: Vladimir N. Makarov 
Date:   Fri Jul 21 20:28:50 2023 -0400

[LRA]: Fix sparc bootstrap after recent patch for fp elimination for avr 
LRA port

The recent patch for fp elimination for the avr LRA port modified an assert
which can be wrong for targets using a hard frame pointer different from
the frame pointer.  Also, for such ports, spilling pseudos assigned to fp
was wrong in the new code, although this code is currently not used for
any target using LRA except for avr.  The given patch fixes the issues.

gcc/ChangeLog:

* lra-eliminations.cc (update_reg_eliminate): Fix the assert.
(lra_update_fp2sp_elimination): Use HARD_FRAME_POINTER_REGNUM
instead of FRAME_POINTER_REGNUM to spill pseudos.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index cf0aa94b69a..1f4e3fec9e0 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1179,8 +1179,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
  gcc_assert (ep->to_rtx != stack_pointer_rtx
  || (ep->from == FRAME_POINTER_REGNUM
  && !elimination_fp2sp_occured_p)
- || (ep->from != FRAME_POINTER_REGNUM
- && ep->from < FIRST_PSEUDO_REGISTER
+ || (ep->from < FIRST_PSEUDO_REGISTER
  && fixed_regs [ep->from]));
 
  /* Mark that is not eliminable anymore.  */
@@ -1398,7 +1397,7 @@ lra_update_fp2sp_elimination (void)
 " Frame pointer can not be eliminated anymore\n");
   frame_pointer_needed = true;
   CLEAR_HARD_REG_SET (set);
-  add_to_hard_reg_set (&set, Pmode, FRAME_POINTER_REGNUM);
+  add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   spill_pseudos (set);
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)


Re: [pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-21 Thread Vladimir Makarov via Gcc-patches



On 7/20/23 16:45, Rainer Orth wrote:

Hi Vladimir,


The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, and
ppc64le.

There is still an avr porting problem with reloading of a subreg of the frame
pointer.  I'll address it later this week.

this patch most likely broke sparc-sun-solaris2.11 bootstrap:

/var/gcc/regression/master/11.4-gcc/build/./gcc/xgcc 
-B/var/gcc/regression/master/11.4-gcc/build/./gcc/ 
-B/vol/gcc/sparc-sun-solaris2.11/bin/ -B/vol/gcc/sparc-sun-solaris2.11/lib/ 
-isystem /vol/gcc/sparc-sun-solaris2.11/include -isystem 
/vol/gcc/sparc-sun-solaris2.11/sys-include   -fchecking=1 -c -g -O2   -W -Wall 
-gnatpg -nostdinc   g-alleve.adb -o g-alleve.o
+===GNAT BUG DETECTED==+
| 14.0.0 20230720 (experimental) [master 
506f068e7d01ad2fb107185b8fb204a0ec23785c] (sparc-sun-solaris2.11) GCC error:|
| in update_reg_eliminate, at lra-eliminations.cc:1179 |
| Error detected around g-alleve.adb:4132:8

This is in stage 3.  I haven't investigated further yet.


Thank you for reporting this.  I'll try to fix it this week.  I have a 
patch but unfortunately bootstrap is too slow.  If the patch does not 
work, I'll revert the original patch.





[pushed][LRA]: Exclude reloading of frame pointer in subreg for some cases

2023-07-20 Thread Vladimir Makarov via Gcc-patches
The following patch improves code for the avr LRA port.  More explanation 
of the patch can be found in the commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.
commit 4b8878fbf7b74ea5c3405c9f558df0517036f131
Author: Vladimir N. Makarov 
Date:   Thu Jul 20 14:34:26 2023 -0400

[LRA]: Exclude reloading of frame pointer in subreg for some cases

LRA for the avr port reloads the frame pointer in a subreg although we can
just simplify the subreg.  This results in generation of badly performing
code.  The following patch fixes this.

gcc/ChangeLog:

* lra-constraints.cc (simplify_operand_subreg): Check frame pointer
simplification.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 76a155e99c2..f3784cf5a5b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1797,6 +1797,16 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
   alter_subreg (curr_id->operand_loc[nop], false);
   return true;
 }
+  auto fp_subreg_can_be_simplified_after_reload_p = [] (machine_mode innermode,
+   poly_uint64 offset,
+   machine_mode mode) {
+reload_completed = 1;
+bool res = simplify_subreg_regno (FRAME_POINTER_REGNUM,
+ innermode,
+ offset, mode) >= 0;
+reload_completed = 0;
+return res;
+  };
   /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
  if there may be a problem accessing OPERAND in the outer
  mode.  */
@@ -1809,6 +1819,12 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
   >= hard_regno_nregs (hard_regno, mode))
&& simplify_subreg_regno (hard_regno, innermode,
 SUBREG_BYTE (operand), mode) < 0
+   /* Exclude reloading of frame pointer in subreg if frame pointer can not
+ be simplified here only because the reload is not finished yet.  */
+   && (hard_regno != FRAME_POINTER_REGNUM
+  || !fp_subreg_can_be_simplified_after_reload_p (innermode,
+  SUBREG_BYTE (operand),
+  mode))
/* Don't reload subreg for matching reload.  It is actually
  valid subreg in LRA.  */
&& ! LRA_SUBREG_P (operand))


[pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-19 Thread Vladimir Makarov via Gcc-patches

The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


There is still an avr porting problem with reloading of a subreg of the frame 
pointer.  I'll address it later this week.


commit 2971ff7b1d564ac04b537d907c70e6093af70832
Author: Vladimir N. Makarov 
Date:   Wed Jul 19 09:35:37 2023 -0400

[LRA]: Check and update frame to stack pointer elimination after stack slot 
allocation

Avr is an interesting target which does not use the stack pointer to
address stack slots.  The elimination of the frame pointer to the stack
pointer is impossible if there are stack slots.  While LRA works, stack
slots can be allocated and used, and the elimination can no longer be
done.  The situation can be complicated even more if some pseudos
were allocated to the frame pointer.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): New prototype.
(lra_asm_insn_error): New prototype.
* lra-spills.cc (remove_pseudos): Add check for pseudo slot memory
existence.
(lra_spill): Call lra_update_fp2sp_elimination.
* lra-eliminations.cc: Remove trailing spaces.
(elimination_fp2sp_occured_p): New static flag.
(lra_eliminate_regs_1): Set the flag up.
(update_reg_eliminate): Modify the assert for stack to frame
pointer elimination.
(lra_update_fp2sp_elimination): New function.
(lra_eliminate): Clear flag elimination_fp2sp_occured_p.

gcc/testsuite/ChangeLog:

* gcc.target/avr/lra-elim.c: New test.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 68225339cb6..cf0aa94b69a 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -286,7 +286,7 @@ move_plus_up (rtx x)
 {
   rtx subreg_reg;
   machine_mode x_mode, subreg_reg_mode;
-  
+
   if (GET_CODE (x) != SUBREG || !subreg_lowpart_p (x))
 return x;
   subreg_reg = SUBREG_REG (x);
@@ -309,6 +309,9 @@ move_plus_up (rtx x)
   return x;
 }
 
+/* Flag that we already did frame pointer to stack pointer elimination.  */
+static bool elimination_fp2sp_occured_p = false;
+
 /* Scan X and replace any eliminable registers (such as fp) with a
replacement (such as sp) if SUBST_P, plus an offset.  The offset is
a change in the offset between the eliminable register and its
@@ -366,6 +369,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -396,9 +402,12 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
  poly_int64 offset, curr_offset;
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (! update_p && ! full_p)
return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
- 
+
  if (maybe_ne (update_sp_offset, 0))
offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
  else
@@ -456,6 +465,9 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
{
  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
+ if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM)
+   elimination_fp2sp_occured_p = true;
+
  if (maybe_ne (update_sp_offset, 0))
{
  if (ep->to_rtx == stack_pointer_rtx)
@@ -500,7 +512,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
 case LE:  case LT:   case LEU:case LTU:
   {
rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode,
-subst_p, update_p, 
+subst_p, update_p,
 update_sp_offset, full_p);
rtx new1 = XEXP (x, 1)
   ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode,
@@ -749,7 +761,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
	 && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset))
	{
  poly_int64 size = GET_MODE_SIZE (mem_mode);
- 
+
 #ifdef PUSH_ROUNDING
  /* If more bytes than MEM_MODE are pushed, account for
 them.  */
@@ -822,7 +834,7 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
{
  /* See if this is setting the replacement hard register for
 an elimination.
-
+
 If DEST is the hard frame pointer, we do nothing because
 we assume 

[pushed][RA][PR109520]: Catch error when there are not enough registers for asm insn

2023-07-13 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109520

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit b175b4887f928118af997f6d4d75097a64dcec5d
Author: Vladimir N. Makarov 
Date:   Thu Jul 13 10:42:17 2023 -0400

[RA][PR109520]: Catch error when there are not enough registers for asm insn

An asm insn, unlike other insns, can have so many operands that their
constraints can not be satisfied.  This results in LRA cycling for such a
test case.  The following patch catches this situation and reports the
problem.

PR middle-end/109520

gcc/ChangeLog:

* lra-int.h (lra_insn_recog_data): Add member asm_reloads_num.
(lra_asm_insn_error): New prototype.
* lra.cc: Include rtl_error.h.
(lra_set_insn_recog_data): Initialize asm_reloads_num.
(lra_asm_insn_error): New func whose code is taken from ...
* lra-assigns.cc (lra_split_hard_reg_for): ... here.  Use lra_asm_insn_error.
* lra-constraints.cc (curr_insn_transform): Check reloads nummber for asm.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109520.c: New test.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 2f95121df06..3555926af66 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1851,20 +1851,8 @@ lra_split_hard_reg_for (void)
   insn = lra_insn_recog_data[u]->insn;
   if (asm_noperands (PATTERN (insn)) >= 0)
 	{
-	  lra_asm_error_p = asm_p = true;
-	  error_for_asm (insn,
-			 "% operand has impossible constraints");
-	  /* Avoid further trouble with this insn.  */
-	  if (JUMP_P (insn))
-	{
-	  ira_nullify_asm_goto (insn);
-	  lra_update_insn_regno_info (insn);
-	}
-	  else
-	{
-	  PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
-	  lra_set_insn_deleted (insn);
-	}
+	  asm_p = true;
+	  lra_asm_insn_error (insn);
 	}
   else if (!asm_p)
 	{
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9bfc88149ff..0c6912d6e7d 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4813,6 +4813,10 @@ curr_insn_transform (bool check_only_p)
   lra_update_operator_dups (curr_id);
   /* Something changes -- process the insn.	 */
   lra_update_insn_regno_info (curr_insn);
+  if (asm_noperands (PATTERN (curr_insn)) >= 0
+	  && ++curr_id->asm_reloads_num >= FIRST_PSEUDO_REGISTER)
+	/* Most probably there are no enough registers to satisfy asm insn: */
+	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
   return change_p;
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 4dbe6672f3a..a32359e5772 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -209,6 +209,9 @@ public:
  debug insn.  LRA_NON_CLOBBERED_ALT means ignoring any earlier
  clobbers for the insn.  */
   int used_insn_alternative;
+  /* Defined for asm insn and it is how many times we already generated reloads
+ for the asm insn.  */
+  int asm_reloads_num;
   /* SP offset before the insn relative to one at the func start.  */
   poly_int64 sp_offset;
   /* The insn itself.  */
@@ -307,6 +310,7 @@ extern void lra_delete_dead_insn (rtx_insn *);
 extern void lra_emit_add (rtx, rtx, rtx);
 extern void lra_emit_move (rtx, rtx);
 extern void lra_update_dups (lra_insn_recog_data_t, signed char *);
+extern void lra_asm_insn_error (rtx_insn *insn);
 
 extern void lra_process_new_insns (rtx_insn *, rtx_insn *, rtx_insn *,
    const char *);
diff --git a/gcc/lra.cc b/gcc/lra.cc
index c8b3f139acd..563aff10b96 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -106,6 +106,7 @@ along with GCC; see the file COPYING3.	If not see
 #include "backend.h"
 #include "target.h"
 #include "rtl.h"
+#include "rtl-error.h"
 #include "tree.h"
 #include "predict.h"
 #include "df.h"
@@ -536,6 +537,27 @@ lra_update_dups (lra_insn_recog_data_t id, signed char *nops)
 	*id->dup_loc[i] = *id->operand_loc[nop];
 }
 
+/* Report asm insn error and modify the asm insn.  */
+void
+lra_asm_insn_error (rtx_insn *insn)
+{
+  lra_asm_error_p = true;
+  error_for_asm (insn,
+		 "% operand has impossible constraints"
+		 " or there are not enough registers");
+  /* Avoid further trouble with this insn.  */
+  if (JUMP_P (insn))
+{
+  ira_nullify_asm_goto (insn);
+  lra_update_insn_regno_info (insn);
+}
+  else
+{
+  PATTERN (insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+  lra_set_insn_deleted (insn);
+}
+}
+
 
 
 /* This page contains code dealing with info about registers in the
@@ -973,6 +995,7 @@ lra_set_insn_recog_data (rtx_insn *insn)
   lra_insn_recog_data[uid] = data;
   data->insn = insn;
   data->used_insn_alternative = LRA_UNKNOWN_ALT;
+  data->asm_reloads_num = 0;
   data->icode = icode;
   data->regs = NULL;
   if (DEBUG_INSN_P (insn))
diff --git a/gcc/testsuite/gcc.target/i386/pr109520.c 

Re: [IRA] Skip empty register classes in setup_reg_class_relations

2023-07-13 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 07:05, senthilkumar.selva...@microchip.com wrote:

Hi,

   I've been spending some (spare) time trying to get LRA working
   for the avr target.


Thank you for addressing this problem.

The code you are changing is very sensitive and was a source of multiple PRs 
in the past.  But I find the change you propose logical and I think it 
will not create problems.  Still, please be alert and revert the patch if 
people report problems with this change.



  After making a couple of changes to get
   libgcc going, I'm now hitting an assert at
   lra-constraints.cc:4423 for a subarch (avrtiny) that has a
   couple of regclasses with no available registers.

   The assert fires because in_class_p (correctly) returns
   false for get_reg_class (regno) = ALL_REGS, and new_class =
   NO_LD_REGS. For avrtiny, NO_LD_REGS is an empty regset, and
   therefore hard_reg_set_subset_p (NO_LD_REGS, lra_no_alloc_regs)
   is always true, making in_class_p return false.

   in_class_p picks NO_LD_REGS as new_class because common_class =
   ira_reg_class_subset[ALL_REGS][NO_REGS] evaluates as
   NO_LD_REGS. This appears wrong to me - it should be NO_REGS
   instead (lra-constraints.cc:4421 checks for NO_REGS).

   ira.cc:setup_reg_class_relations sets up
   ira_reg_class_subset (among other things), and the problem
   appears to be a missing continue statement if
   reg_class_contents[cl3] (in the innermost loop) is empty.

   In this case, for cl1 = ALL_REGS and cl2 = NO_REGS, cl3 =
   NO_LD_REGS, temp_hard_regset and temp_set2 are both empty, and
   hard_reg_set_subset_p (temp_hard_regset, temp_set2) is always true, so
   ira_reg_class_subset[ALL_REGS][NO_REGS] ends up being set to
   cl3 = NO_LD_REGS. Adding a continue if hard_reg_set_empty_p (temp_hard_regset)
   fixes the problem for me.

   Does the below patch look ok? Bootstrapping and regression
   testing passed on x86_64.

OK.



Re: [pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-12 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 12:22, Richard Sandiford wrote:

Vladimir Makarov  writes:

On 7/12/23 06:07, Richard Sandiford wrote:

Vladimir Makarov via Gcc-patches  writes:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 pass.  Indicate that it is no longer spilled.  */
  bitmap_clear_bit (&all_spilled_pseudos, regno);
  assign_hard_regno (hard_regno, regno);
- if (! reload_p)
-   /* As non-reload pseudo assignment is changed we
-  should reconsider insns referring for the
-  pseudo.  */
+ if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)

Is this test meaningful on all targets?  We have some for which
GENERAL_REGS == ALL_REGS (e.g. nios2 and nvptx), so ALL_REGS can
be a valid allocation class.


Richard, thank you for the question.

As I remember nvptx does not use IRA/LRA.

I don't think it is a problem.  For targets with GENERAL_REGS ==
ALL_REGS, it only results in one more insn processing on the next
constraint sub-pass.

Ah, ok, thanks.  If there's no risk of cycling then I agree it
doesn't matter.
No.  There is no additional risk of cycling, as insn processing only 
starts after assigning a hard reg to the reload pseudo, and that can happen 
only once for the reload pseudo before the spilling sub-pass.




Re: [pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-12 Thread Vladimir Makarov via Gcc-patches



On 7/12/23 06:07, Richard Sandiford wrote:

Vladimir Makarov via Gcc-patches  writes:

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 pass.  Indicate that it is no longer spilled.  */
  bitmap_clear_bit (&all_spilled_pseudos, regno);
  assign_hard_regno (hard_regno, regno);
- if (! reload_p)
-   /* As non-reload pseudo assignment is changed we
-  should reconsider insns referring for the
-  pseudo.  */
+ if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)

Is this test meaningful on all targets?  We have some for which
GENERAL_REGS == ALL_REGS (e.g. nios2 and nvptx), so ALL_REGS can
be a valid allocation class.


Richard, thank you for the question.

As I remember nvptx does not use IRA/LRA.

I don't think it is a problem.  For targets with GENERAL_REGS == 
ALL_REGS, it only results in one more insn processing on the next 
constraint sub-pass.


I could do more accurate solution but it would need introducing new data 
(flags) for pseudos which I'd like to avoid.




[pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-07 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110372

The patch was successfully bootstrapped and tested on x86-64.
commit 1f7e5a7b91862b999aab88ee0319052aaf00f0f1
Author: Vladimir N. Makarov 
Date:   Fri Jul 7 09:53:38 2023 -0400

LRA: Refine reload pseudo class

For given testcase a reload pseudo happened to occur only in reload
insns created on one constraint sub-pass.  Therefore its initial class
(ALL_REGS) was not refined and the reload insns were not processed on
the next constraint sub-passes.  This resulted into the wrong insn.

PR rtl-optimization/110372

gcc/ChangeLog:

* lra-assigns.cc (assign_by_spills): Add reload insns involving
reload pseudos with non-refined class to be processed on the next
sub-pass.
* lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
(in_class_p): Use it.
(print_curr_insn_alt): New func.
(process_alt_operands): Use it.  Improve debug info.
(curr_insn_transform): Use print_curr_insn_alt.  Refine reload
pseudo class if it is not refined yet.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110372.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 73fbef29912..2f95121df06 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1443,10 +1443,11 @@ assign_by_spills (void)
 		 pass.  Indicate that it is no longer spilled.  */
	  bitmap_clear_bit (&all_spilled_pseudos, regno);
 	  assign_hard_regno (hard_regno, regno);
-	  if (! reload_p)
-		/* As non-reload pseudo assignment is changed we
-		   should reconsider insns referring for the
-		   pseudo.  */
+	  if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)
+		/* As non-reload pseudo assignment is changed we should
+		   reconsider insns referring for the pseudo.  Do the same if a
+		   reload pseudo did not refine its class which can happen
+		   when the pseudo occurs only in reload insns.  */
  		bitmap_set_bit (&changed_pseudo_bitmap, regno);
 	}
 	}
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 4dc2d70c402..123ff662cbc 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -233,6 +233,34 @@ get_reg_class (int regno)
   return NO_REGS;
 }
 
+/* Return true if REG_CLASS has enough allocatable hard regs to keep value of
+   REG_MODE.  */
+static bool
+enough_allocatable_hard_regs_p (enum reg_class reg_class,
+enum machine_mode reg_mode)
+{
+  int i, j, hard_regno, class_size, nregs;
+  
+  if (hard_reg_set_subset_p (reg_class_contents[reg_class], lra_no_alloc_regs))
+return false;
+  class_size = ira_class_hard_regs_num[reg_class];
+  for (i = 0; i < class_size; i++)
+{
+  hard_regno = ira_class_hard_regs[reg_class][i];
+  nregs = hard_regno_nregs (hard_regno, reg_mode);
+  if (nregs == 1)
+	return true;
+  for (j = 0; j < nregs; j++)
+	if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)
+	|| ! TEST_HARD_REG_BIT (reg_class_contents[reg_class],
+hard_regno + j))
+	  break;
+  if (j >= nregs)
+	return true;
+}
+  return false;
+}
+
 /* Return true if REG satisfies (or will satisfy) reg class constraint
CL.  Use elimination first if REG is a hard register.  If REG is a
reload pseudo created by this constraints pass, assume that it will
@@ -252,7 +280,6 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
   enum reg_class rclass, common_class;
   machine_mode reg_mode;
   rtx src;
-  int class_size, hard_regno, nregs, i, j;
   int regno = REGNO (reg);
 
   if (new_class != NULL)
@@ -291,26 +318,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
   common_class = ira_reg_class_subset[rclass][cl];
   if (new_class != NULL)
 	*new_class = common_class;
-  if (hard_reg_set_subset_p (reg_class_contents[common_class],
- lra_no_alloc_regs))
-	return false;
-  /* Check that there are enough allocatable regs.  */
-  class_size = ira_class_hard_regs_num[common_class];
-  for (i = 0; i < class_size; i++)
-	{
-	  hard_regno = ira_class_hard_regs[common_class][i];
-	  nregs = hard_regno_nregs (hard_regno, reg_mode);
-	  if (nregs == 1)
-	return true;
-	  for (j = 0; j < nregs; j++)
-	if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)
-		|| ! TEST_HARD_REG_BIT (reg_class_contents[common_class],
-	hard_regno + j))
-	  break;
-	  if (j >= nregs)
-	return true;
-	}
-  return false;
+  return enough_allocatable_hard_regs_p (common_class, reg_mode);
 }
 }
 
@@ -2046,6 +2054,23 @@ update_and_check_small_class_inputs (int nop, int nalt,
   return false;
 }
 
+/* Print operand constraints for alternative ALT_NUMBER of the current
+   insn.  */
+static void
+print_curr_insn_alt (int alt_number)
+{
+  for (int i = 0; i < curr_static_id->n_operands; i++)
+{
+  const char *p = 

[pushed] [RA] [PR110215] Ignore conflicts for some pseudos from insns throwing a final exception

2023-06-16 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

The patch was successfully tested and bootstrapped on x86-64, aarch64, 
and ppc64le.


It is difficult to make a stable test for the PR.  So there is no test 
in the patch.


commit 154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov 
Date:   Fri Jun 16 11:12:32 2023 -0400

RA: Ignore conflicts for some pseudos from insns throwing a final exception

IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code.  This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).

The following patch fixes the problem.

PR rtl-optimization/110215

gcc/ChangeLog:

* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.

diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 6a3901ee234..bc8493856a4 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "sparseset.h"
 #include "function-abi.h"
+#include "except.h"
 
 /* The code in this file is similar to one in global but the code
works on the allocno basis and creates live ranges instead of
@@ -1383,14 +1384,24 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 		  SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
 		  SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
 		}
-		  if (can_throw_internal (insn))
+		  eh_region r;
+		  eh_landing_pad lp;
+		  rtx_code_label *landing_label;
+		  basic_block landing_bb;
+		  if (can_throw_internal (insn)
+		  && (r = get_eh_region_from_rtx (insn)) != NULL
+		  && (lp = gen_eh_landing_pad (r)) != NULL
+		  && (landing_label = lp->landing_pad) != NULL
+		  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
+		  && (r->type != ERT_CLEANUP
+			  || bitmap_bit_p (df_get_live_in (landing_bb),
	   ALLOCNO_REGNO (a))))
 		{
-		  OBJECT_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  HARD_REG_SET new_conflict_regs
+			= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
 		}
-
 		  if (sparseset_bit_p (allocnos_processed, num))
 		continue;
 		  sparseset_set_bit (allocnos_processed, num);


Re: [pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches


On 6/7/23 12:20, Jeff Law wrote:



On 6/7/23 09:35, Vladimir Makarov via Gcc-patches wrote:

The following patch fixes



-ENOPATCH


Sorry, here is the patch.

commit 08ca31fb27841cb7f3bff7086be6f139136be1a7
Author: Vladimir N. Makarov 
Date:   Wed Jun 7 09:51:54 2023 -0400

RA: Constrain class of pic offset table pseudo to general regs

On some targets an integer pseudo can be assigned to a FP reg.  For
the pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as a wrong one.  The patch fixes this problem.

PR target/109541

gcc/ChangeLog:

* ira-costs.cc (find_costs_and_classes): Constrain classes of pic
  offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/pr109541.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index ae8304ff938..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -2016,6 +2016,16 @@ find_costs_and_classes (FILE *dump_file)
 	  ira_assert (regno_aclass[i] != NO_REGS
 			  && ira_reg_allocno_class_p[regno_aclass[i]]);
 	}
+	  if (pic_offset_table_rtx != NULL
+	  && i == (int) REGNO (pic_offset_table_rtx))
+	{
+	  /* For some targets, integer pseudos can be assigned to fp
+		 regs.  As we don't want reload pic offset table pseudo, we
+		 should avoid using non-integer regs.  */
+	  regno_aclass[i]
+		= ira_reg_class_intersect[regno_aclass[i]][GENERAL_REGS];
+	  alt_class = ira_reg_class_intersect[alt_class][GENERAL_REGS];
+	}
 	  if ((new_class
 	   = (reg_class) (targetm.ira_change_pseudo_allocno_class
 			  (i, regno_aclass[i], best))) != regno_aclass[i])
diff --git a/gcc/testsuite/gcc.target/sparc/pr109541.c b/gcc/testsuite/gcc.target/sparc/pr109541.c
new file mode 100644
index 000..1360f101930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/pr109541.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mcpu=niagara4 -fpic -w" } */
+
+int rhash_sha512_process_block_A, rhash_sha512_process_block_i,
+rhash_sha512_process_block_block, rhash_sha512_process_block_W_0;
+
+unsigned rhash_sha512_process_block_W_2;
+
+void rhash_sha512_process_block (void)
+{
+  unsigned C, E, F, G, H, W_0, W_4, W_9, W_5, W_3, T1;
+
+  for (; rhash_sha512_process_block_i; rhash_sha512_process_block_i += 6) {
+T1 = F + (rhash_sha512_process_block_W_2 += 6);
+rhash_sha512_process_block_A += H & G + (W_5 += rhash_sha512_process_block_W_0);
+H = C & T1 & E ^ F + (W_9 += rhash_sha512_process_block_W_0);
+G = T1 ^ 6 + (W_0 += rhash_sha512_process_block_block);
+F = (unsigned) 
+T1 = (unsigned) ( + (W_3 += rhash_sha512_process_block_block > 9 > W_4));
+C = (unsigned) (T1 + );
+W_4 += W_5 += rhash_sha512_process_block_W_0;
+  }
+}


[pushed] [PR109541] RA: Constrain class of pic offset table pseudo to general regs

2023-06-07 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109541

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.




[pushed] LRA: Update insn sp offset if its input reload changes SP

2023-05-30 Thread Vladimir Makarov via Gcc-patches
The following patch fixes an LRA bug triggered by switching the H8300 target 
from reload to LRA.  The description of the problem is in the commit 
message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.


commit 30038a207c10a2783fa2695b62c7c8458ef05e73
Author: Vladimir N. Makarov 
Date:   Tue May 30 15:54:28 2023 -0400

LRA: Update insn sp offset if its input reload changes SP

The patch fixes a bug when there is input reload changing SP.  The bug was
triggered by switching H8300 target to LRA.  The insn in question is

(insn 21 20 22 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
(reg/f:SI 31)) "j.c":10:3 19 {*movsi}
 (expr_list:REG_DEAD (reg/f:SI 31)
(expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
(nil

The memory address is reloaded but the SP offset for the original insn was 
not updated.

gcc/ChangeLog:

* lra-int.h (lra_update_sp_offset): Add the prototype.
* lra.cc (setup_sp_offset): Change the return type.  Use
lra_update_sp_offset.
* lra-eliminations.cc (lra_update_sp_offset): New function.
(lra_process_new_insns): Push the current insn to reprocess if the
input reload changes sp offset.

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4220639..68225339cb6 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1308,6 +1308,16 @@ init_elimination (void)
   setup_elimination_map ();
 }
 
+/* Update and return stack pointer OFFSET after processing X.  */
+poly_int64
+lra_update_sp_offset (rtx x, poly_int64 offset)
+{
+  curr_sp_change = offset;
+  mark_not_eliminable (x, VOIDmode);
+  return curr_sp_change;
+}
+
+
 /* Eliminate hard reg given by its location LOC.  */
 void
 lra_eliminate_reg_if_possible (rtx *loc)
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index a400a0f85e2..4dbe6672f3a 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -412,6 +412,7 @@ extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, 
machine_mode,
 extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
 extern void lra_eliminate (bool, bool);
 
+extern poly_int64 lra_update_sp_offset (rtx, poly_int64);
 extern void lra_eliminate_reg_if_possible (rtx *);
 
 
diff --git a/gcc/lra.cc b/gcc/lra.cc
index eb3ee1f8b63..c8b3f139acd 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -1838,10 +1838,10 @@ push_insns (rtx_insn *from, rtx_insn *to)
   lra_push_insn (insn);
 }
 
-/* Set up sp offset for insn in range [FROM, LAST].  The offset is
+/* Set up and return sp offset for insns in range [FROM, LAST].  The offset is
taken from the next BB insn after LAST or zero if there is no such
insn.  */
-static void
+static poly_int64
 setup_sp_offset (rtx_insn *from, rtx_insn *last)
 {
   rtx_insn *before = next_nonnote_nondebug_insn_bb (last);
@@ -1849,7 +1849,11 @@ setup_sp_offset (rtx_insn *from, rtx_insn *last)
   ? 0 : lra_get_insn_recog_data (before)->sp_offset);
 
   for (rtx_insn *insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN 
(insn))
-lra_get_insn_recog_data (insn)->sp_offset = offset;
+{
+  lra_get_insn_recog_data (insn)->sp_offset = offset;
+  offset = lra_update_sp_offset (PATTERN (insn), offset);
+}
+  return offset;
 }
 
 /* Emit insns BEFORE before INSN and insns AFTER after INSN.  Put the
@@ -1875,8 +1879,25 @@ lra_process_new_insns (rtx_insn *insn, rtx_insn *before, 
rtx_insn *after,
   if (cfun->can_throw_non_call_exceptions)
copy_reg_eh_region_note_forward (insn, before, NULL);
   emit_insn_before (before, insn);
+  poly_int64 old_sp_offset = lra_get_insn_recog_data (insn)->sp_offset;
+  poly_int64 new_sp_offset = setup_sp_offset (before, PREV_INSN (insn));
+  if (maybe_ne (old_sp_offset, new_sp_offset))
+   {
+ if (lra_dump_file != NULL)
+   {
+ fprintf (lra_dump_file, "Changing sp offset from ");
+ print_dec (old_sp_offset, lra_dump_file);
+ fprintf (lra_dump_file, " to ");
+ print_dec (new_sp_offset, lra_dump_file);
+ fprintf (lra_dump_file, " for insn");
+ dump_rtl_slim (lra_dump_file, insn, NULL, -1, 0);
+   }
+ lra_get_insn_recog_data (insn)->sp_offset = new_sp_offset;
+ eliminate_regs_in_insn (insn, false, false,
+ old_sp_offset - new_sp_offset);
+ lra_push_insn (insn);
+   }
   push_insns (PREV_INSN (insn), PREV_INSN (before));
-  setup_sp_offset (before, PREV_INSN (insn));
 }
   if (after != NULL_RTX)
 {


Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-25 Thread Vladimir Makarov via Gcc-patches



On 5/17/23 02:57, liuhongt wrote:

r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register class is not known yet.
It regressed powerpc PR109610 and PR109858, it looks too aggressive to use
NO_REGS when mode can be allocated with GENERAL_REGS.
The patch takes a step back, still use GENERAL_REGS when
hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS.
Kewen confirmed the patch fixed PR109858, and I verified it also fixed PR109610.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact for SPEC2017 on icelake server.
Ok for trunk?

gcc/ChangeLog:

* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.


Thank you for the patch.  It looks good to me.  It is ok to commit it 
into the trunk.





Re: [PATCH] ira: Don't create copies for earlyclobbered pairs

2023-05-08 Thread Vladimir Makarov via Gcc-patches



On 5/5/23 12:59, Richard Sandiford wrote:

This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common.  It is then impossible for the output
operand to reuse the dying input register.

This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output.  (Matched earlyclobbers are fine.)

Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
on aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:

Target   Tests   GoodBad   DeltaBest   Worst  Median
==   =   ===   =   =  ==
amdgcn-amdhsa   14  7  7   3 -18  10  -1
arm-linux-gnueabihf 16 15  1 -22  -4   2  -1
csky-elf 6  6  0 -21  -6  -2  -4
hppa64-hp-hpux11.23  5  5  0  -7  -2  -1  -1
ia64-linux-gnu  16 16  0 -70 -15  -1  -3
m32r-elf53  1 52  64  -2   8   1
mcore-elf2  2  0  -8  -6  -2  -6
microblaze-elf 285283  2-909 -68   4  -1
mmix 7  7  0   -2101   -2091  -1  -1
msp430-elf   1  1  0  -4  -4  -4  -4
pru-elf  8  6  2 -12  -6   2  -2
rx-elf  22 18  4 -40  -5   6  -2
sparc-linux-gnu 15 14  1 -40  -8   1  -2
sparc-wrs-vxworks   15 14  1 -40  -8   1  -2
visium-elf   2  1  1   0  -2   2  -2
xstormy16-elf1  1  0  -2  -2  -2  -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.

OK to install?


Yes, Richard.

Thank you for measuring the patch effect.  I wish other people would do 
the same for patches affecting generated code performance.




Re: [PATCH 1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.

2023-04-21 Thread Vladimir Makarov via Gcc-patches



On 4/19/23 20:46, liuhongt via Gcc-patches wrote:

1547  /* If this insn loads a parameter from its stack slot, then it
1548 represents a savings, rather than a cost, if the parameter is
1549 stored in memory.  Record this fact.
1550
1551 Similarly if we're loading other constants from memory (constant
1552 pool, TOC references, small data areas, etc) and this is the only
1553 assignment to the destination pseudo.

At that time, preferred regclass is unknown, and GENERAL_REGS is used to
record memory move cost, but it's not accurate especially for large vector
modes, i.e. 512-bit vector in x86 which would most probably allocate with
SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate
the cost of this load and make RA propagate the memory operand into many
consume instructions which causes worse performance.


For this case GENERAL_REGS was used in GCC practically all the time.  
You can check this in the old regclass.c file (which existed until the 
IRA introduction).


But I guess it is ok to use NO_REGS for this to promote more usage of 
registers instead of equivalent memory, especially as a lot of code has 
changed since then (old versions of GCC did not even support vector regs).


Although it would be nice to do some benchmarking (SPEC is preferable) 
for such kind of changes.


On the other hand, I expect that any performance regression (if any) 
will be reported anyway.


The patch is ok for me.  You can commit it into the trunk.

Thank you for addressing this issue.


Fortunately, NO_REGS is used to record the best scenario, so the patch uses
NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
and aarch64-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

PR rtl-optimization/108707
* ira-costs.cc (scan_one_insn): Use NO_REGS instead of
GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108707.c: New test.




[pushed] [LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in asm in different mode

2023-04-20 Thread Vladimir Makarov via Gcc-patches
The following patch fixes a test failure of 20030222-1.c on the moxie 
port.  But the problem can occur on other targets.  The patch actually 
implements the old reload approach for the test case.


The patch was successfully tested and bootstrapped on x86-64, aarch64, 
and ppc64le.


commit 51703ac3c722cd94011ab5b499921f6c9fe9fab5
Author: Vladimir N. Makarov 
Date:   Thu Apr 20 10:02:13 2023 -0400

[LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in 
asm in different mode

See gcc.c-torture/execute/20030222-1.c.  Consider the code for 32-bit (e.g. 
BE) target:
  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
We generate the following RTL with reload insns:
  1. subreg:si(x:di, 0) = 0;
  2. subreg:si(x:di, 4) = v:si;
  3. t:di = x:di, dead x;
  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
  5. i:si = subreg:si(t:di,4);
If we assign hard reg of x to t, dead code elimination will remove insn #2
and we will use an uninitialized hard reg.  So exclude the hard reg of x for t.
We could ignore this problem for a non-empty asm using the whole x value but it 
is hard to
check that the asm is expanded into an insn really using x and setting r.
The old reload pass used the same approach.
The old reload pass used the same approach.

gcc/ChangeLog

* lra-constraints.cc (match_reload): Exclude some hard regs for
multi-reg inout reload pseudos used in asm in different mode.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index b231cb60529..4dc2d70c402 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1022,6 +1022,34 @@ match_reload (signed char out, signed char *ins, signed 
char *outs,
 are ordered.  */
   if (partial_subreg_p (outmode, inmode))
{
+ bool asm_p = asm_noperands (PATTERN (curr_insn)) >= 0;
+ int hr;
+ HARD_REG_SET temp_hard_reg_set;
+ 
+ if (asm_p && (hr = get_hard_regno (out_rtx)) >= 0
+ && hard_regno_nregs (hr, inmode) > 1)
+   {
+ /* See gcc.c-torture/execute/20030222-1.c.
+Consider the code for 32-bit (e.g. BE) target:
+  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
+We generate the following RTL with reload insns:
+  1. subreg:si(x:di, 0) = 0;
+  2. subreg:si(x:di, 4) = v:si;
+  3. t:di = x:di, dead x;
+  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
+  5. i:si = subreg:si(t:di,4);
+If we assign hard reg of x to t, dead code elimination
+will remove insn #2 and we will use an uninitialized hard reg.
+So exclude the hard reg of x for t.  We could ignore this
+problem for non-empty asm using all x value but it is hard to
+check that the asm is expanded into an insn really using x
+and setting r.  */
+ CLEAR_HARD_REG_SET (temp_hard_reg_set);
+ if (exclude_start_hard_regs != NULL)
+   temp_hard_reg_set = *exclude_start_hard_regs;
+ SET_HARD_REG_BIT (temp_hard_reg_set, hr);
+ exclude_start_hard_regs = &temp_hard_reg_set;
+   }
  reg = new_in_reg
= lra_create_new_reg_with_unique_value (inmode, in_rtx, goal_class,
exclude_start_hard_regs,


Re: [PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

2023-04-05 Thread Vladimir Makarov via Gcc-patches



On 4/4/23 21:29, Jeff Law wrote:



On 4/3/23 23:13, liuhongt via Gcc-patches wrote:

There's a potential performance issue when backend returns some
unreasonable value for the mode which can be never be allocate with
reg class.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk(or GCC14 stage1)?

gcc/ChangeLog:

PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
Not a regression *and* changing register allocation.  This seems like 
it should defer to gcc-14.


Yes, I agree.  It should wait for gcc-14, especially when we are 
close to the release.  Also, testing only x86-64 is not enough for such 
changes (although I tried ppc64le and did not find any problem).


Cost-related patches for RA frequently result in new testsuite failures 
on some targets, even if the change seems obvious and is expected to 
improve the generated code.


Target-dependent code sometimes defines the costs correctly only for 
some possible cases, and making the allocator less dependent on this 
pitfall is good.  So I think the patch moves us in the right direction.


The patch is ok for me to commit it to the trunk after the gcc-13 
release and if arm64 testing shows no GCC testsuite regression.


Thank you for working on this issue.




[pushed][PR109052] LRA: Implement commutative operands exchange for combining secondary memory reload and original insn

2023-03-31 Thread Vladimir Makarov via Gcc-patches

This is one more patch for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109052

The patch adds trying commutative operand exchange for the recently 
implemented combining of a secondary memory reload and the original insn.


The patch was successfully bootstrapped and tested on x86_64.

commit 378d19cfebfa2bc4f693dfc9e6f0dd993e7c45f7
Author: Vladimir N. Makarov 
Date:   Fri Mar 31 11:04:44 2023 -0400

LRA: Implement commutative operands exchange for combining secondary memory reload and original insn

The patch implements trying commutative operands exchange for
combining secondary memory reload and original insn.

PR rtl-optimization/109052

gcc/ChangeLog:

* lra-constraints.cc (combine_reload_insn): New function.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109052-2.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 405b8b92f5e..ff4e8f06063 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5061,7 +5061,23 @@ combine_reload_insn (rtx_insn *from, rtx_insn *to)
   curr_insn = to;
   curr_id = lra_get_insn_recog_data (curr_insn);
   curr_static_id = curr_id->insn_static_data;
-  ok_p = !curr_insn_transform (true);
+  for (bool swapped_p = false;;)
+	{
+	  ok_p = !curr_insn_transform (true);
+	  if (ok_p || curr_static_id->commutative < 0)
+	break;
+	  swap_operands (curr_static_id->commutative);
+	  if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "Swapping %scombined insn operands:\n",
+		   swapped_p ? "back " : "");
+	  dump_insn_slim (lra_dump_file, to);
+	}
+	  if (swapped_p)
+	break;
+	  swapped_p = true;
+	}
   curr_insn = saved_insn;
   curr_id = lra_get_insn_recog_data (curr_insn);
   curr_static_id = curr_id->insn_static_data;
diff --git a/gcc/testsuite/gcc.target/i386/pr109052-2.c b/gcc/testsuite/gcc.target/i386/pr109052-2.c
new file mode 100644
index 000..337d1f49c2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109052-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mfpmath=both -mavx -fno-math-errno" } */
+
+double foo (double a, double b)
+{
+  double z = __builtin_fmod (a, 3.14);
+  return z * b;
+}
+
+/* { dg-final { scan-assembler-not "vmulsd\[ \t]\+%xmm\[0-9]\+, %xmm\[0-9]\+, %xmm\[0-9]\+"} } */


[pushed] [PR109137] LRA: Do not repeat inheritance and live range splitting in case of asm error

2023-03-22 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137

The patch was successfully bootstrapped and tested on x86-64.

commit 81d762cbec9685c2f2571da21d48f42c42eff33b
Author: Vladimir N. Makarov 
Date:   Wed Mar 22 12:33:11 2023 -0400

LRA: Do not repeat inheritance and live range splitting in case of asm error

LRA was trying to do live range splitting again and again as there were
not enough regs for the asm.  This patch solves the problem.

PR target/109137

gcc/ChangeLog:

* lra.cc (lra): Do not repeat inheritance and live range splitting
when asm error is found.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109137.c: New.

diff --git a/gcc/lra.cc b/gcc/lra.cc
index f7fdd601e71..eb3ee1f8b63 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -2453,7 +2453,7 @@ lra (FILE *f)
 		  lra_hard_reg_split_p = true;
 		}
 	}
-	  while (fails_p);
+	  while (fails_p && !lra_asm_error_p);
 	  if (! live_p) {
 	/* We need the correct reg notes for work of constraint sub-pass.  */
 	lra_create_live_ranges (true, true);
diff --git a/gcc/testsuite/gcc.target/i386/pr109137.c b/gcc/testsuite/gcc.target/i386/pr109137.c
new file mode 100644
index 000..ffd8e8c574b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109137.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-m32 -O3 -march=znver1 -fPIC -mfpmath=sse -w" } */
+#include <string.h>
+typedef struct {
+  char bytestream_end;
+} CABACContext;
+int get_cabac___trans_tmp_3, get_cabac_tmp, get_cabac_c,
+decode_cabac_mb_intra4x4_pred_mode_mode, ff_h264_decode_mb_cabac_h_0,
+ff_h264_decode_mb_cabac_bit;
+typedef struct {
+  char intra4x4_pred_mode_cache[2];
+} H264SliceContext;
+H264SliceContext ff_h264_decode_mb_cabac_sl;
+void ff_h264_decode_mb_cabac(void) {
+  memset((void*)ff_h264_decode_mb_cabac_h_0, 6, 48);
+  int i;
+  for (;; i++) {
+__asm__(""/* { dg-error "'asm' operand has impossible constraints" } */
+: "="(ff_h264_decode_mb_cabac_bit), "="(get_cabac_c),
+  "="(get_cabac_c), "="(get_cabac_tmp)
+: "r"(get_cabac___trans_tmp_3),
+  "r"(__builtin_offsetof(CABACContext, bytestream_end))
+: "ecx");
+ff_h264_decode_mb_cabac_sl.intra4x4_pred_mode_cache[i] =
+decode_cabac_mb_intra4x4_pred_mode_mode;
+  }
+}
+


[pushed] [PR109052] LRA: Implement combining secondary memory reload and original insn

2023-03-17 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109052

The patch was successfully bootstrapped and tested on x86-64, i686, 
aarch64, and ppc64le.
commit 57688950b9328cbb4a9c21eb3199f9132b5119d3
Author: Vladimir N. Makarov 
Date:   Fri Mar 17 08:58:58 2023 -0400

LRA: Implement combining secondary memory reload and original insn

LRA creates secondary memory reload insns but does not try to combine
them with the original insn.  This patch implements a simple insn
combining for such cases in LRA.

PR rtl-optimization/109052

gcc/ChangeLog:

* lra-constraints.cc: Include hooks.h.
(combine_reload_insn): New function.
(lra_constraints): Call it.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109052.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c38566a7451..95b534e1a70 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -110,6 +110,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "hooks.h"
 #include "target.h"
 #include "rtl.h"
 #include "tree.h"
@@ -5001,6 +5002,96 @@ contains_reloaded_insn_p (int regno)
   return false;
 }
 
+/* Try combine secondary memory reload insn FROM for insn TO into TO insn.
+   FROM should be a load insn (usually a secondary memory reload insn).  Return
+   TRUE in case of success.  */
+static bool
+combine_reload_insn (rtx_insn *from, rtx_insn *to)
+{
+  bool ok_p;
+  rtx_insn *saved_insn;
+  rtx set, from_reg, to_reg, op;
+  enum reg_class to_class, from_class;
+  int n, nop;
+  signed char changed_nops[MAX_RECOG_OPERANDS + 1];
+  lra_insn_recog_data_t id = lra_get_insn_recog_data (to);
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+  
+  /* Check conditions for second memory reload and original insn:  */
+  if ((targetm.secondary_memory_needed
+   == hook_bool_mode_reg_class_t_reg_class_t_false)
+  || NEXT_INSN (from) != to || CALL_P (to)
+  || id->used_insn_alternative == LRA_UNKNOWN_ALT
+  || (set = single_set (from)) == NULL_RTX)
+return false;
+  from_reg = SET_DEST (set);
+  to_reg = SET_SRC (set);
+  /* Ignore optional reloads: */
+  if (! REG_P (from_reg) || ! REG_P (to_reg)
+  || bitmap_bit_p (&lra_optional_reload_pseudos, REGNO (from_reg)))
+return false;
+  to_class = lra_get_allocno_class (REGNO (to_reg));
+  from_class = lra_get_allocno_class (REGNO (from_reg));
+  /* Check that reload insn is a load:  */
+  if (to_class != NO_REGS || from_class == NO_REGS)
+return false;
+  for (n = nop = 0; nop < static_id->n_operands; nop++)
+{
+  if (static_id->operand[nop].type != OP_IN)
+	continue;
+  op = *id->operand_loc[nop];
+  if (!REG_P (op) || REGNO (op) != REGNO (from_reg))
+	continue;
+  *id->operand_loc[nop] = to_reg;
+  changed_nops[n++] = nop;
+}
+  changed_nops[n] = -1;
+  lra_update_dups (id, changed_nops);
+  lra_update_insn_regno_info (to);
+  ok_p = recog_memoized (to) >= 0;
+  if (ok_p)
+{
+  /* Check that combined insn does not need any reloads: */
+  saved_insn = curr_insn;
+  curr_insn = to;
+  curr_id = lra_get_insn_recog_data (curr_insn);
+  curr_static_id = curr_id->insn_static_data;
+  ok_p = !curr_insn_transform (true);
+  curr_insn = saved_insn;
+  curr_id = lra_get_insn_recog_data (curr_insn);
+  curr_static_id = curr_id->insn_static_data;
+}
+  if (ok_p)
+{
+  id->used_insn_alternative = -1;
+  lra_push_insn_and_update_insn_regno_info (to);
+  if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file, "Use combined insn:\n");
+	  dump_insn_slim (lra_dump_file, to);
+	}
+  return true;
+}
+  if (lra_dump_file != NULL)
+{
+  fprintf (lra_dump_file, "Failed combined insn:\n");
+  dump_insn_slim (lra_dump_file, to);
+}
+  for (int i = 0; i < n; i++)
+{
+  nop = changed_nops[i];
+  *id->operand_loc[nop] = from_reg;
+}
+  lra_update_dups (id, changed_nops);
+  lra_update_insn_regno_info (to);
+  if (lra_dump_file != NULL)
+{
+  fprintf (lra_dump_file, "Restoring insn after failed combining:\n");
+  dump_insn_slim (lra_dump_file, to);
+}
+  return false;
+}
+
 /* Entry function of LRA constraint pass.  Return true if the
constraint pass did change the code.	 */
 bool
@@ -5010,6 +5101,7 @@ lra_constraints (bool first_p)
   int i, hard_regno, new_insns_num;
   unsigned int min_len, new_min_len, uid;
   rtx set, x, reg, dest_reg;
+  rtx_insn *original_insn;
   basic_block last_bb;
   bitmap_iterator bi;
 
@@ -5119,6 +5211,7 @@ lra_constraints (bool first_p)
   new_insns_num = 0;
   last_bb = NULL;
   changed_p = false;
+  original_insn = NULL;
   while ((new_min_len = lra_insn_stack_length ()) != 0)
 {
   curr_insn = lra_pop_insn ();
@@ -5133,7 +5226,12 @@ lra_constraints (bool first_p)
 	{
 	  min_len = new_min_len;
 	  new_insns_num = 0;
+	  

[pushed] [PR108999] LRA: For clobbered regs use operand mode instead of the biggest mode

2023-03-09 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

The patch was successfully bootstrapped and tested on i686, x86-64, 
aarch64, and ppc64 be/le.
commit 3c75631fc09a22f2513fab80ef502c2a8b0f9121
Author: Vladimir N. Makarov 
Date:   Thu Mar 9 08:41:09 2023 -0500

LRA: For clobbered regs use operand mode instead of the biggest mode

LRA is too conservative when calculating conflicts with clobbered regs,
as it uses the biggest access mode.  This prevents possible reg
coalescing and produces worse code.  This patch solves the problem.

PR rtl-optimization/108999

gcc/ChangeLog:

* lra-constraints.cc (process_alt_operands): Use operand modes for
clobbered regs instead of the biggest access mode.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr108999.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index dbfaf0485a5..c38566a7451 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3108,7 +3108,8 @@ process_alt_operands (int only_alternative)
 	  lra_assert (operand_reg[i] != NULL_RTX);
 	  clobbered_hard_regno = hard_regno[i];
 	  CLEAR_HARD_REG_SET (temp_set);
-	  add_to_hard_reg_set (&temp_set, biggest_mode[i], clobbered_hard_regno);
+	  add_to_hard_reg_set (&temp_set, GET_MODE (*curr_id->operand_loc[i]),
+			   clobbered_hard_regno);
 	  first_conflict_j = last_conflict_j = -1;
 	  for (j = 0; j < n_operands; j++)
 	if (j == i
diff --git a/gcc/testsuite/gcc.target/aarch64/pr108999.c b/gcc/testsuite/gcc.target/aarch64/pr108999.c
new file mode 100644
index 000..a34db85be83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr108999.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve" } */
+#include <arm_sve.h>
+
+void subreg_coalesce5 (
+svbool_t pg, int64_t* base, int n,
+int64_t *in1, int64_t *in2, int64_t*out
+)
+{
+svint64x2_t result = svld2_s64 (pg, base);
+
+for (int i = 0; i < n; i += 1) {
+svint64_t v18 = svld1_s64(pg, in1 + i);
+svint64_t v19 = svld1_s64(pg, in2 + i);
+result.__val[0] = svmad_s64_z(pg, v18, v19, result.__val[0]);
+result.__val[1] = svmad_s64_z(pg, v18, v19, result.__val[1]);
+}
+svst2_s64(pg, out, result);
+}
+
+/* { dg-final { scan-assembler-not {[ \t]*mov[ \t]*z[0-9]+\.d} } } */


[pushed][PR90706] IRA: Use minimal cost for hard register movement

2023-03-02 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

The patch was successfully bootstrapped and tested on i686, x86-64, 
aarch64, ppc64le.


commit 23661e39df76e07fb4ce1ea015379c7601d947ef
Author: Vladimir N. Makarov 
Date:   Thu Mar 2 16:29:05 2023 -0500

IRA: Use minimal cost for hard register movement

This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
costs for moving general hard regs of SFmode.  This was the reason for
spilling a pseudo in the PR.  In this patch we use the smaller move
cost of a hard reg between its natural and operand modes.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 4c28171f27d..c0fdef807dd 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "addresses.h"
 #include "reload.h"
+#include "print-rtl.h"
 
 /* The flags is set up every time when we calculate pseudo register
classes through function ira_set_pseudo_classes.  */
@@ -503,6 +504,18 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   int insn_allows_mem[MAX_RECOG_OPERANDS];
   move_table *move_in_cost, *move_out_cost;
   short (*mem_cost)[2];
+  const char *p;
+
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+{
+  fprintf (ira_dump_file, "Processing insn %u", INSN_UID (insn));
+  if (INSN_CODE (insn) >= 0
+	  && (p = get_insn_name (INSN_CODE (insn))) != NULL)
+	fprintf (ira_dump_file, " {%s}", p);
+  fprintf (ira_dump_file, " (freq=%d)\n",
+	   REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)));
+  dump_insn_slim (ira_dump_file, insn);
+  }
 
   for (i = 0; i < n_ops; i++)
 insn_allows_mem[i] = 0;
@@ -526,6 +539,21 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 	  continue;
 	}
 
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+	{
+	  fprintf (ira_dump_file, "  Alt %d:", alt);
+	  for (i = 0; i < n_ops; i++)
+	{
+	  p = constraints[i];
+	  if (*p == '\0')
+		continue;
+	  fprintf (ira_dump_file, "  (%d) ", i);
+	  for (; *p != '\0' && *p != ',' && *p != '#'; p++)
+		fputc (*p, ira_dump_file);
+	}
+	  fprintf (ira_dump_file, "\n");
+	}
+
   for (i = 0; i < n_ops; i++)
 	{
 	  unsigned char c;
@@ -593,12 +621,16 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 		 register, this alternative can't be used.  */
 
 		  if (classes[j] == NO_REGS)
-		alt_fail = 1;
-		  /* Otherwise, add to the cost of this alternative
-		 the cost to copy the other operand to the hard
-		 register used for this operand.  */
+		{
+		  alt_fail = 1;
+		}
 		  else
-		alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		/* Otherwise, add to the cost of this alternative the cost
+		   to copy the other operand to the hard register used for
+		   this operand.  */
+		{
+		  alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		}
 		}
 	  else
 		{
@@ -1021,18 +1053,45 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   for (i = 0; i < n_ops; i++)
 	if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
 	  {
+	int old_cost;
+	bool cost_change_p = false;
 	struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	int *pp_costs = pp->cost, *qq_costs = qq->cost;
 	int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
 	cost_classes_t cost_classes_ptr
 	  = regno_cost_classes[REGNO (ops[i])];
 
-	pp->mem_cost = MIN (pp->mem_cost,
+	old_cost = pp->mem_cost;
+	pp->mem_cost = MIN (old_cost,
 (qq->mem_cost + op_cost_add) * scale);
 
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp->mem_cost < old_cost)
+	  {
+		cost_change_p = true;
+		fprintf (ira_dump_file, "op %d(r=%u) new costs MEM:%d",
+			 i, REGNO(ops[i]), pp->mem_cost);
+	  }
 	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	  pp_costs[k]
-		= MIN (pp_costs[k], (qq_costs[k] + op_cost_add) * scale);
+	  {
+		old_cost = pp_costs[k];
+		pp_costs[k]
+		  = MIN (old_cost, (qq_costs[k] + op_cost_add) * scale);
+		if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp_costs[k] < old_cost)
+		  {
+		if (!cost_change_p)
+		  fprintf (ira_dump_file, "op %d(r=%u) new costs",
+			   i, REGNO(ops[i]));
+		cost_change_p = true;
+		fprintf (ira_dump_file, " %s:%d",
+			 reg_class_names[cost_classes_ptr->classes[k]],
+			 pp_costs[k]);
+		  }
+	  }
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5

[pushed] [PR108774] RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

2023-02-13 Thread Vladimir Makarov via Gcc-patches

The following patch solves

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108774

The patch was successfully bootstrapped and tested on i686, x86_64, and 
aarch64.
commit a33e3dcbd15e73603796e30b5eeec11a0c8bacec
Author: Vladimir N. Makarov 
Date:   Mon Feb 13 16:05:04 2023 -0500

RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

IRA can invalidate an initially set up equivalence in setup_reg_equiv.
The caller_save_p flag was not cleared during invalidation although
init_insns were cleared.  This resulted in a segmentation fault in
get_equiv.  Clearing the flag solves the problem.  As an extra
precaution I added clearing of the flag in other places too, although
it might not be necessary.

PR rtl-optimization/108774

gcc/ChangeLog:

* ira.cc (ira_update_equiv_info_by_shuffle_insn): Clear equiv
caller_save_p flag when clearing defined_p flag.
(setup_reg_equiv): Ditto.
* lra-constraints.cc (lra_constraints): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108774.c: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 9f9af808f63..6c7f4901e4c 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2725,6 +2725,7 @@ ira_update_equiv_info_by_shuffle_insn (int to_regno, int from_regno, rtx_insn *i
 	  return;
 	}
   ira_reg_equiv[to_regno].defined_p = false;
+  ira_reg_equiv[to_regno].caller_save_p = false;
   ira_reg_equiv[to_regno].memory
 	= ira_reg_equiv[to_regno].constant
 	= ira_reg_equiv[to_regno].invariant
@@ -4193,6 +4194,7 @@ setup_reg_equiv (void)
 			if (ira_reg_equiv[i].memory == NULL_RTX)
 			  {
 			ira_reg_equiv[i].defined_p = false;
+			ira_reg_equiv[i].caller_save_p = false;
 			ira_reg_equiv[i].init_insns = NULL;
 			break;
 			  }
@@ -4203,6 +4205,7 @@ setup_reg_equiv (void)
 	  }
 	  }
 	ira_reg_equiv[i].defined_p = false;
+	ira_reg_equiv[i].caller_save_p = false;
 	ira_reg_equiv[i].init_insns = NULL;
 	break;
   }
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index dd4f68bbfc0..dbfaf0485a5 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5100,7 +5100,8 @@ lra_constraints (bool first_p)
 			 && (targetm.preferred_reload_class
 			 (x, lra_get_allocno_class (i)) == NO_REGS))
  			|| contains_symbol_ref_p (x))
-	  ira_reg_equiv[i].defined_p = false;
+	  ira_reg_equiv[i].defined_p
+		= ira_reg_equiv[i].caller_save_p = false;
 	if (contains_reg_p (x, false, true))
 	  ira_reg_equiv[i].profitable_p = false;
 	if (get_equiv (reg) != reg)
diff --git a/gcc/testsuite/gcc.target/i386/pr108774.c b/gcc/testsuite/gcc.target/i386/pr108774.c
new file mode 100644
index 000..482bc490cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108774.c
@@ -0,0 +1,11 @@
+/* PR target/108774 */
+/* { dg-do compile  { target x86_64-*-* } } */
+/* { dg-options "-Os -ftrapv -mcmodel=large" } */
+
+int i, j;
+
+void
+foo (void)
+{
+  i = ((1 << j) - 1) >> j;
+}


[pushed] [PR108754] RA: Use caller save equivalent memory only for LRA

2023-02-10 Thread Vladimir Makarov via Gcc-patches

The following patch should  solve

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

The patch simply switches off a new optimization for targets using the 
old reload pass.


The patch was successfully bootstrapped on x86-64.

commit 7757567358a84c3774cb972350bd7ea299daaa8d
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 12:17:07 2023 -0500

RA: Use caller save equivalent memory only for LRA

Recently I submitted a patch to reuse memory with a constant address for
the caller saves optimization across a constant or pure function call.  It
seems to work only for targets using LRA instead of the old reload
pass.  So the patch switches off this optimization when the old reload
pass is used.

PR middle-end/108754

gcc/ChangeLog:

* ira.cc (update_equiv_regs): Set up ira_reg_equiv for
valid_combine only when ira_use_lra_p is true.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index d0b6ea062e8..9f9af808f63 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3773,7 +3773,7 @@ update_equiv_regs (void)
 		{
 		  note = set_unique_reg_note (insn, REG_EQUIV, replacement);
 		}
-		  else
+		  else if (ira_use_lra_p)
 		{
 		  /* We still can use this equivalence for caller save
 			 optimization in LRA.  Mark this.  */


[pushed] [PR108500] RA: Use simple LRA for huge functions

2023-02-10 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

The patch improves compilation speed.  Compilation time of the biggest 
test in the PR decreases from 1235s to 709s.


The patch was successfully bootstrapped on x86-64.
commit 02371cdd755d2b53fb580d3e8209c44e0c45c337
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 11:12:37 2023 -0500

RA: Use simple LRA for huge functions

The PR108500 test contains a huge function and RA spends a lot of time
compiling the test with -O0.  The patch decreases compilation time
considerably for huge functions.  Compilation time for the PR test
decreases from 1235s to 709s on Intel i7-13600K.

PR tree-optimization/108500

gcc/ChangeLog:

* params.opt (ira-simple-lra-insn-threshold): Add new param.
* ira.cc (ira): Use the param to switch on simple LRA.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 6143db06c52..d0b6ea062e8 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5624,12 +5624,16 @@ ira (FILE *f)
 if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
   num_used_regs++;
 
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
- pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
- use simplified and faster algorithms in LRA.  */
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K pseudos and
+ 10K blocks or 100K pseudos and 1K blocks) or we have too many function
+ insns, we will use simplified and faster algorithms in LRA.  */
   lra_simple_p
-= ira_use_lra_p
-  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+= (ira_use_lra_p
+   && (num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun)
+   /* max uid is a good evaluation of the number of insns as most
+  optimizations are done on tree-SSA level.  */
+   || ((uint64_t) get_max_uid ()
+	   > (uint64_t) param_ira_simple_lra_insn_threshold * 1000)));
 
   if (lra_simple_p)
 {
diff --git a/gcc/params.opt b/gcc/params.opt
index 8a128c321c9..c7913d9063a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -302,6 +302,10 @@ The number of registers in each class kept unused by loop invariant motion.
 Common Joined UInteger Var(param_ira_max_conflict_table_size) Init(1000) Param Optimization
 Max size of conflict table in MB.
 
+-param=ira-simple-lra-insn-threshold=
+Common Joined UInteger Var(param_ira_simple_lra_insn_threshold) Init(1000) Param Optimization
+Approximate function insn number in 1K units triggering simple local RA.
+
 -param=ira-max-loops-num=
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.


[pushed] [PR103541] RA: Implement reuse of equivalent memory for caller saves optimization (version 2)

2023-02-09 Thread Vladimir Makarov via Gcc-patches

This is another try to solve

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103541

The patch was successfully bootstrapped (--enable-languages=all) and 
tested on x86, x86-64, aarch64
commit 1ad898d18904ac68432ba9b8ffa2b083d007cc2d
Author: Vladimir N. Makarov 
Date:   Thu Feb 9 15:18:48 2023 -0500

RA: Implement reuse of equivalent memory for caller saves optimization (2nd version)

The test pr103541.c shows an opportunity to reuse memory with a constant
address for the caller saves optimization across a constant or pure
function call.  The patch implements the memory reuse.

PR rtl-optimization/103541
PR rtl-optimization/108711

gcc/ChangeLog:

* ira.h (struct ira_reg_equiv_s): Add new field caller_save_p.
* ira.cc (validate_equiv_mem): Check memref address variance.
(no_equiv): Clear caller_save_p flag.
(update_equiv_regs): Define caller save equivalence for
valid_combine.
(setup_reg_equiv): Clear defined_p flag for caller save equivalence.
* lra-constraints.cc (lra_copy_reg_equiv): Add new arg
call_save_p.  Use caller save equivalence depending on the arg.
(split_reg): Adjust the call.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103541.c: New.
* g++.target/i386/pr108711.C: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 66df03e8a59..6143db06c52 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3070,6 +3070,8 @@ validate_equiv_mem_from_store (rtx dest, const_rtx set ATTRIBUTE_UNUSED,
 info->equiv_mem_modified = true;
 }
 
+static int equiv_init_varies_p (rtx x);
+
 enum valid_equiv { valid_none, valid_combine, valid_reload };
 
 /* Verify that no store between START and the death of REG invalidates
@@ -3113,7 +3115,8 @@ validate_equiv_mem (rtx_insn *start, rtx reg, rtx memref)
 	 been changed and all hell breaks loose.  */
 	  ret = valid_combine;
 	  if (!MEM_READONLY_P (memref)
-	  && !RTL_CONST_OR_PURE_CALL_P (insn))
+	  && (!RTL_CONST_OR_PURE_CALL_P (insn)
+		  || equiv_init_varies_p (XEXP (memref, 0))))
 	return valid_none;
 	}
 
@@ -3414,6 +3417,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSED,
   if (reg_equiv[regno].is_arg_equivalence)
 return;
   ira_reg_equiv[regno].defined_p = false;
+  ira_reg_equiv[regno].caller_save_p = false;
   ira_reg_equiv[regno].init_insns = NULL;
   for (; list; list = list->next ())
 {
@@ -3766,7 +3770,18 @@ update_equiv_regs (void)
 		{
 		  replacement = copy_rtx (SET_SRC (set));
 		  if (validity == valid_reload)
-		note = set_unique_reg_note (insn, REG_EQUIV, replacement);
+		{
+		  note = set_unique_reg_note (insn, REG_EQUIV, replacement);
+		}
+		  else
+		{
+		  /* We still can use this equivalence for caller save
+			 optimization in LRA.  Mark this.  */
+		  ira_reg_equiv[regno].caller_save_p = true;
+		  ira_reg_equiv[regno].init_insns
+			= gen_rtx_INSN_LIST (VOIDmode, insn,
+	 ira_reg_equiv[regno].init_insns);
+		}
 		}
 	}
 
@@ -4156,7 +4171,7 @@ setup_reg_equiv (void)
 		   legitimate, we ignore such REG_EQUIV notes.  */
 		if (memory_operand (x, VOIDmode))
 		  {
-		ira_reg_equiv[i].defined_p = true;
+		ira_reg_equiv[i].defined_p = !ira_reg_equiv[i].caller_save_p;
 		ira_reg_equiv[i].memory = x;
 		continue;
 		  }
diff --git a/gcc/ira.h b/gcc/ira.h
index 58b50dbe8a2..3d35025a46e 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -175,8 +175,11 @@ extern struct target_ira *this_target_ira;
 /* Major structure describing equivalence info for a pseudo.  */
 struct ira_reg_equiv_s
 {
-  /* True if we can use this equivalence.  */
+  /* True if we can use this as a general equivalence.  */
   bool defined_p;
+  /* True if we can use this equivalence only for caller save/restore
+ location.  */
+  bool caller_save_p;
   /* True if the usage of the equivalence is profitable.  */
   bool profitable_p;
   /* Equiv. memory, constant, invariant, and initializing insns of
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 7bffbc07ee2..dd4f68bbfc0 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5771,14 +5771,17 @@ choose_split_class (enum reg_class allocno_class,
   return best_cl;
 }
 
-/* Copy any equivalence information from ORIGINAL_REGNO to NEW_REGNO.
-   It only makes sense to call this function if NEW_REGNO is always
-   equal to ORIGINAL_REGNO.  */
+/* Copy any equivalence information from ORIGINAL_REGNO to NEW_REGNO.  It only
+   makes sense to call this function if NEW_REGNO is always equal to
+   ORIGINAL_REGNO.  Set up defined_p flag when caller_save_p flag is set up and
+   CALL_SAVE_P is true.  */
 
 static void
-lra_copy_reg_equiv (unsigned int new_regno, unsigned int original_regno)
+lra_copy_reg_equiv (unsigned int new_regno, unsigned int original_regno,
+		bool call_save_p)
 {
-  if (!ira_reg_equiv[original_regno].defined_p)

Re: [pushed] [PR103541] RA: Implement reuse of equivalent memory for caller saves optimization

2023-02-08 Thread Vladimir Makarov via Gcc-patches



On 2/7/23 22:48, Andrew Pinski wrote:

On Tue, Feb 7, 2023 at 6:08 AM Vladimir Makarov via Gcc-patches wrote:

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103541

The patch was successfully bootstrapped and tested on x86-64, aarch64,
and ppc64le.

What languages did you test? Because I think I am getting a bootstrap
failure while building libgo in 32bit x86 due to this patch.
I used c and c++ only.  Sorry for all the troubles with the patch. I've 
just reverted the patch and I will try to resolve the issue with it.




[pushed] [PR103541] RA: Implement reuse of equivalent memory for caller saves optimization

2023-02-07 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103541

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.
commit f661c0bb6371f355966a67b5ce71398e80792948
Author: Vladimir N. Makarov 
Date:   Tue Feb 7 08:27:36 2023 -0500

RA: Implement reuse of equivalent memory for caller saves optimization

The test case shows an opportunity to reuse memory with a constant
address for the caller saves optimization across a constant or pure
function call.  The patch implements the memory reuse.

PR rtl-optimization/103541

gcc/ChangeLog:

* ira.h (struct ira_reg_equiv_s): Add new field caller_save_p.
* ira.cc (validate_equiv_mem): Check memref address variance.
(update_equiv_regs): Define caller save equivalence for
valid_combine.
(setup_reg_equiv): Clear defined_p flag for caller save equivalence.
* lra-constraints.cc (lra_copy_reg_equiv): Add new arg
call_save_p.  Use caller save equivalence depending on the arg.
(split_reg): Adjust the call.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103541.c: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 66df03e8a59..c6ee46286bc 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3070,6 +3070,8 @@ validate_equiv_mem_from_store (rtx dest, const_rtx set ATTRIBUTE_UNUSED,
 info->equiv_mem_modified = true;
 }
 
+static int equiv_init_varies_p (rtx x);
+
 enum valid_equiv { valid_none, valid_combine, valid_reload };
 
 /* Verify that no store between START and the death of REG invalidates
@@ -3113,7 +3115,8 @@ validate_equiv_mem (rtx_insn *start, rtx reg, rtx memref)
 	 been changed and all hell breaks loose.  */
 	  ret = valid_combine;
 	  if (!MEM_READONLY_P (memref)
-	  && !RTL_CONST_OR_PURE_CALL_P (insn))
+	  && (!RTL_CONST_OR_PURE_CALL_P (insn)
+		  || equiv_init_varies_p (XEXP (memref, 0))))
 	return valid_none;
 	}
 
@@ -3766,7 +3769,18 @@ update_equiv_regs (void)
 		{
 		  replacement = copy_rtx (SET_SRC (set));
 		  if (validity == valid_reload)
-		note = set_unique_reg_note (insn, REG_EQUIV, replacement);
+		{
+		  note = set_unique_reg_note (insn, REG_EQUIV, replacement);
+		}
+		  else
+		{
+		  /* We still can use this equivalence for caller save
+			 optimization in LRA.  Mark this.  */
+		  ira_reg_equiv[regno].caller_save_p = true;
+		  ira_reg_equiv[regno].init_insns
+			= gen_rtx_INSN_LIST (VOIDmode, insn,
+	 ira_reg_equiv[regno].init_insns);
+		}
 		}
 	}
 
@@ -4156,7 +4170,7 @@ setup_reg_equiv (void)
 		   legitimate, we ignore such REG_EQUIV notes.  */
 		if (memory_operand (x, VOIDmode))
 		  {
-		ira_reg_equiv[i].defined_p = true;
+		ira_reg_equiv[i].defined_p = !ira_reg_equiv[i].caller_save_p;
 		ira_reg_equiv[i].memory = x;
 		continue;
 		  }
diff --git a/gcc/ira.h b/gcc/ira.h
index 58b50dbe8a2..3d35025a46e 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -175,8 +175,11 @@ extern struct target_ira *this_target_ira;
 /* Major structure describing equivalence info for a pseudo.  */
 struct ira_reg_equiv_s
 {
-  /* True if we can use this equivalence.  */
+  /* True if we can use this as a general equivalence.  */
   bool defined_p;
+  /* True if we can use this equivalence only for caller save/restore
+ location.  */
+  bool caller_save_p;
   /* True if the usage of the equivalence is profitable.  */
   bool profitable_p;
   /* Equiv. memory, constant, invariant, and initializing insns of
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 7bffbc07ee2..dd4f68bbfc0 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5771,14 +5771,17 @@ choose_split_class (enum reg_class allocno_class,
   return best_cl;
 }
 
-/* Copy any equivalence information from ORIGINAL_REGNO to NEW_REGNO.
-   It only makes sense to call this function if NEW_REGNO is always
-   equal to ORIGINAL_REGNO.  */
+/* Copy any equivalence information from ORIGINAL_REGNO to NEW_REGNO.  It only
+   makes sense to call this function if NEW_REGNO is always equal to
+   ORIGINAL_REGNO.  Set up defined_p flag when caller_save_p flag is set up and
+   CALL_SAVE_P is true.  */
 
 static void
-lra_copy_reg_equiv (unsigned int new_regno, unsigned int original_regno)
+lra_copy_reg_equiv (unsigned int new_regno, unsigned int original_regno,
+		bool call_save_p)
 {
-  if (!ira_reg_equiv[original_regno].defined_p)
+  if (!ira_reg_equiv[original_regno].defined_p
+  && !(call_save_p && ira_reg_equiv[original_regno].caller_save_p))
 return;
 
   ira_expand_reg_equiv ();
@@ -5958,7 +5961,7 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
  rematerializing the original value instead of spilling to the stack.  */
   if (!HARD_REGISTER_NUM_P (original_regno)
   && mode == PSEUDO_REGNO_MODE (original_regno))
-lra_copy_reg_equiv (new_regno, original_regno);
+

[committed] [PR108388] LRA: Always do elimination and only for hard register to check insn constraints

2023-01-24 Thread Vladimir Makarov via Gcc-patches

The following patch solves

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108388

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.
commit 265a749f290f7c6adc9a3aaa9c585b498a8a38ea
Author: Vladimir N. Makarov 
Date:   Tue Jan 24 16:10:59 2023 -0500

LRA: Always do elimination and only for hard register to check insn constraints

LRA does elimination but does not always check insn constraints in this
case.  This results in an LRA failure for the PDP11 target, whose addition
is only a 2-operand insn.  The same might happen for other analogous
targets.  The patch fixes this problem.

PR rtl-optimization/108388

gcc/ChangeLog:

* lra-constraints.cc (get_hard_regno): Remove final_p arg.  Always
do elimination but only for hard register.
(operands_match_p, uses_hard_regs_p, process_alt_operands): Adjust
calls of get_hard_regno.

gcc/testsuite/ChangeLog:

* gcc.target/pdp11/pdp11.exp: New.
* gcc.target/pdp11/pr108388.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index b0b3c5b01dc..7bffbc07ee2 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -184,12 +184,12 @@ get_try_hard_regno (int regno)
   return ira_class_hard_regs[rclass][0];
 }
 
-/* Return the hard regno of X after removing its subreg.  If X is not
-   a register or a subreg of a register, return -1.  If X is a pseudo,
-   use its assignment.  If FINAL_P return the final hard regno which will
-   be after elimination.  */
+/* Return the hard regno of X after removing its subreg.  If X is not a
+   register or a subreg of a register, return -1.  If X is a pseudo, use its
+   assignment.  If X is a hard regno, return the final hard regno which will be
+   after elimination.  */
 static int
-get_hard_regno (rtx x, bool final_p)
+get_hard_regno (rtx x)
 {
   rtx reg;
   int hard_regno;
@@ -203,7 +203,7 @@ get_hard_regno (rtx x, bool final_p)
 hard_regno = lra_get_regno_hard_regno (hard_regno);
   if (hard_regno < 0)
 return -1;
-  if (final_p)
+  if (HARD_REGISTER_NUM_P (REGNO (reg)))
 hard_regno = lra_get_elimination_hard_regno (hard_regno);
   if (SUBREG_P (x))
 hard_regno += subreg_regno_offset (hard_regno, GET_MODE (reg),
@@ -782,7 +782,7 @@ operands_match_p (rtx x, rtx y, int y_hard_regno)
 {
   int j;
 
-  i = get_hard_regno (x, false);
+  i = get_hard_regno (x);
   if (i < 0)
 	goto slow;
 
@@ -1920,7 +1920,7 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
 
   if (REG_P (x) || SUBREG_P (x))
 {
-  x_hard_regno = get_hard_regno (x, true);
+  x_hard_regno = get_hard_regno (x);
   return (x_hard_regno >= 0
 	  && overlaps_hard_reg_set_p (set, mode, x_hard_regno));
 }
@@ -2078,7 +2078,7 @@ process_alt_operands (int only_alternative)
 
   op = no_subreg_reg_operand[nop] = *curr_id->operand_loc[nop];
   /* The real hard regno of the operand after the allocation.  */
-  hard_regno[nop] = get_hard_regno (op, true);
+  hard_regno[nop] = get_hard_regno (op);
 
   operand_reg[nop] = reg = op;
   biggest_mode[nop] = GET_MODE (op);
@@ -2258,7 +2258,7 @@ process_alt_operands (int only_alternative)
 			&& curr_operand_mode[m] != curr_operand_mode[nop])
 		  break;
 		
-		m_hregno = get_hard_regno (*curr_id->operand_loc[m], false);
+		m_hregno = get_hard_regno (*curr_id->operand_loc[m]);
 		/* We are supposed to match a previous operand.
 		   If we do, we win if that one did.  If we do
 		   not, count both of the operands as losers.
diff --git a/gcc/testsuite/gcc.target/pdp11/pdp11.exp b/gcc/testsuite/gcc.target/pdp11/pdp11.exp
new file mode 100644
index 000..89b1f257329
--- /dev/null
+++ b/gcc/testsuite/gcc.target/pdp11/pdp11.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't a pdp11 target.
+if ![istarget pdp11*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.

[committed] [PR90706] IRA: Check that reg classes contain a hard reg of given mode in reg move cost calculation

2022-12-15 Thread Vladimir Makarov via Gcc-patches

The following patch solves a spill problem for

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

There are still redundant moves which should be removed to fully solve the 
PR.  I'll continue my work on this in January.



commit 12abd5a7d13209f79664ea603b3f3517f71b8c4f
Author: Vladimir N. Makarov 
Date:   Thu Dec 15 14:11:05 2022 -0500

IRA: Check that reg classes contain a hard reg of given mode in reg move cost calculation

IRA calculated wrong AVR costs for moving general hard regs of SFmode.  When
calculating the costs we did not exclude sub-classes which do not contain
hard regs of the given mode.  This was the reason for spilling a pseudo in the
PR.  The patch fixes this.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
* ira.cc (ira_init_register_move_cost): Check that at least one hard
reg of the mode is in the class contents when calculating the
register move costs.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 964c94a06ef..732a0edd4c1 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "addresses.h"
 #include "reload.h"
+#include "print-rtl.h"
 
 /* The flags is set up every time when we calculate pseudo register
classes through function ira_set_pseudo_classes.  */
@@ -503,6 +504,18 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   int insn_allows_mem[MAX_RECOG_OPERANDS];
   move_table *move_in_cost, *move_out_cost;
   short (*mem_cost)[2];
+  const char *p;
+
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+{
+  fprintf (ira_dump_file, "Processing insn %u", INSN_UID (insn));
+  if (INSN_CODE (insn) >= 0
+	  && (p = get_insn_name (INSN_CODE (insn))) != NULL)
+	fprintf (ira_dump_file, " {%s}", p);
+  fprintf (ira_dump_file, " (freq=%d)\n",
+	   REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)));
+  dump_insn_slim (ira_dump_file, insn);
+  }
 
   for (i = 0; i < n_ops; i++)
 insn_allows_mem[i] = 0;
@@ -526,6 +539,21 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 	  continue;
 	}
 
+  if (ira_dump_file != NULL && internal_flag_ira_verbose > 5)
+	{
+	  fprintf (ira_dump_file, "  Alt %d:", alt);
+	  for (i = 0; i < n_ops; i++)
+	{
+	  p = constraints[i];
+	  if (*p == '\0')
+		continue;
+	  fprintf (ira_dump_file, "  (%d) ", i);
+	  for (; *p != '\0' && *p != ',' && *p != '#'; p++)
+		fputc (*p, ira_dump_file);
+	}
+	  fprintf (ira_dump_file, "\n");
+	}
+  
   for (i = 0; i < n_ops; i++)
 	{
 	  unsigned char c;
@@ -593,12 +621,16 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
 		 register, this alternative can't be used.  */
 
 		  if (classes[j] == NO_REGS)
-		alt_fail = 1;
-		  /* Otherwise, add to the cost of this alternative
-		 the cost to copy the other operand to the hard
-		 register used for this operand.  */
+		{
+		  alt_fail = 1;
+		}
 		  else
-		alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		/* Otherwise, add to the cost of this alternative the cost
+		   to copy the other operand to the hard register used for
+		   this operand.  */
+		{
+		  alt_cost += copy_cost (ops[j], mode, classes[j], 1, NULL);
+		}
 		}
 	  else
 		{
@@ -1021,18 +1053,45 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
   for (i = 0; i < n_ops; i++)
 	if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
 	  {
+	int old_cost;
+	bool cost_change_p = false;
 	struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	int *pp_costs = pp->cost, *qq_costs = qq->cost;
 	int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
 	cost_classes_t cost_classes_ptr
 	  = regno_cost_classes[REGNO (ops[i])];
 
-	pp->mem_cost = MIN (pp->mem_cost,
+	old_cost = pp->mem_cost;
+	pp->mem_cost = MIN (old_cost,
 (qq->mem_cost + op_cost_add) * scale);
 
+	if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp->mem_cost < old_cost)
+	  {
+		cost_change_p = true;
+		fprintf (ira_dump_file, "op %d(r=%u) new costs MEM:%d",
+			 i, REGNO(ops[i]), pp->mem_cost);
+	  }
 	for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	  pp_costs[k]
-		= MIN (pp_costs[k], (qq_costs[k] + op_cost_add) * scale);
+	  {
+		old_cost = pp_costs[k];
+		pp_costs[k]
+		  = MIN (old_cost, (qq_costs[k] + op_cost_add) * scale);
+		if (ira_dump_file != NULL && internal_flag_ira_verbose > 5
+		&& pp_costs[k] < old_cost)
+		  {
+		if (!cost_change_p)
+		  fprintf (ira_dump_file, "op %d(r=%u) new costs",
+			   i, REGNO(ops[i]));
+		cost_change_p = true;
+		fprintf 

[committed] [PR106462] LRA: Check hard reg availability of pseudo and its subreg for pseudo reload

2022-12-02 Thread Vladimir Makarov via Gcc-patches

The following patch solves

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106462

The patch was successfully bootstrapped and tested on x86-64.
commit 70596a0fb2a2ec072e1e97e37616e05041dfa4e5
Author: Vladimir N. Makarov 
Date:   Fri Dec 2 08:18:04 2022 -0500

LRA: Check hard reg availability of pseudo and its subreg for pseudo reload

Do not reload subreg pseudo if there are hard regs for subreg mode
but there are no hard regs for pseudo mode.

PR target/106462

gcc/ChangeLog:

* lra-constraints.cc (curr_insn_transform): Check available hard
regs for pseudo and its subreg to decide what to reload.

gcc/testsuite/ChangeLog:

* gcc.target/mips/pr106462.c: New test.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index d92ab76908c..02b5ab4a316 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4582,7 +4582,18 @@ curr_insn_transform (bool check_only_p)
 		  || (partial_subreg_p (mode, GET_MODE (reg))
 			  && known_le (GET_MODE_SIZE (GET_MODE (reg)),
    UNITS_PER_WORD)
-			  && WORD_REGISTER_OPERATIONS)))
+			  && WORD_REGISTER_OPERATIONS))
+		  /* Avoid the situation when there are no available hard regs
+		 for the pseudo mode but there are ones for the subreg
+		 mode: */
+		  && !(goal_alt[i] != NO_REGS
+		   && REGNO (reg) >= FIRST_PSEUDO_REGISTER
+		   && (prohibited_class_reg_set_mode_p
+			   (goal_alt[i], reg_class_contents[goal_alt[i]],
+			GET_MODE (reg)))
+		   && !(prohibited_class_reg_set_mode_p
+			(goal_alt[i], reg_class_contents[goal_alt[i]],
+			 mode
 		{
 		  /* An OP_INOUT is required when reloading a subreg of a
 		 mode wider than a word to ensure that data beyond the
diff --git a/gcc/testsuite/gcc.target/mips/pr106462.c b/gcc/testsuite/gcc.target/mips/pr106462.c
new file mode 100644
index 000..c9105409524
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr106462.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=64 -msingle-float" } */
+
+extern void bar (float x, short y);
+
+void foo (int argc)
+{
+short c = argc * 2;
+float a = (float)(short)c, b = 9.5;
+
+bar (b/a, c);
+}


Re: [RFA] gcc: fix PR rtl-optimization/107482

2022-11-07 Thread Vladimir Makarov via Gcc-patches



On 2022-11-07 04:46, Max Filippov wrote:

gcc/
* ira-color.cc (update_costs_from_allocno): Check that allocno
is in the consideration_allocno_bitmap before dereferencing
ALLOCNO_COLOR_DATA (allocno).
---
This fixes the invalid memory access, but I'm not sure if that's
sufficient and there's no remaining higher level logical issue.


Thank you for reporting and working on this issue.

I believe your approach is sufficient, although the patch could be 
improved in three ways:


The simplest one is to move the consideration-allocno check out of the loop 
by using the following patch:


diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 4a1a325e8e3..a8e52b6b265 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -1413,7 +1413,9 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,

   ira_copy_t cp, next_cp;

   rclass = REGNO_REG_CLASS (hard_regno);
-  do
+  if (!bitmap_bit_p (consideration_allocno_bitmap, ALLOCNO_NUM (allocno)))
+    return;
+  do
 {
   mode = ALLOCNO_MODE (allocno);
   ira_init_register_move_cost_if_necessary (mode);


or by even better patch:

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 4a1a325e8e3..ffe73b61c45 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2209,8 +2209,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 restore_costs_from_copies (a);
   ALLOCNO_HARD_REGNO (a) = best_hard_regno;
   ALLOCNO_ASSIGNED_P (a) = true;
-  if (best_hard_regno >= 0)
-    update_costs_from_copies (a, true, ! retry_p);
+  if (best_hard_regno >= 0 && !retry_p)
+    update_costs_from_copies (a, true, true);
   ira_assert (ALLOCNO_CLASS (a) == aclass);
   /* We don't need updated costs anymore.  */
   ira_free_allocno_updated_costs (a);

Probably the best way would be to allocate and set up data for new 
allocnos of pseudos created on the borders of the allocation regions.  
But it is too complicated and I am not sure it would give any visible 
performance improvement.


So I'd prefer the second patch with change in assign_hard_reg.

Please check that my proposed patch works and commit it if it 
succeeds.


Thank you.


Regtested for target=xtensa-linux-uclibc, no new regressions.

  gcc/ira-color.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 4a1a325e8e31..4527eab39bb7 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -1434,6 +1434,8 @@ update_costs_from_allocno (ira_allocno_t allocno, int hard_regno,
 
 	  if (another_allocno == from
 	      || (ALLOCNO_COLOR_DATA (another_allocno) != NULL
+		  && bitmap_bit_p (consideration_allocno_bitmap,
+				   ALLOCNO_NUM (allocno))
 		  && (ALLOCNO_COLOR_DATA (allocno)->first_thread_allocno
 		      != ALLOCNO_COLOR_DATA (another_allocno)->first_thread_allocno)))
 	    continue;




Re: [PATCH] IRA: Make sure array is big enough

2022-10-26 Thread Vladimir Makarov via Gcc-patches



On 2022-10-25 06:01, Torbjörn SVENSSON wrote:

In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches(); the latter can increase the number of
regs, which would cause an out-of-bounds access of the reg_renumber
global array.

Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from /src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c: In function 'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat//fp-struct-test-by-value-y.h:28:1: internal compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat//struct-by-value-5b_y.c:22:1: note: in expansion of macro 'TEST'

gcc/ChangeLog:

* ira.cc: Resize reg info after the number of regs has increased.


The patch is OK to commit to the gcc-11 and gcc-12 branches and master.

Thank you for fixing this.


Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/ira.cc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 42c9cead9f8..d28a67b2546 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5718,6 +5718,7 @@ ira (FILE *f)
  regstat_free_ri ();
  regstat_init_n_sets_and_refs ();
  regstat_compute_ri ();
+resize_reg_info ();
};
  
int max_regno_before_rm = max_reg_num ();




Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-31 Thread Vladimir Makarov via Gcc-patches



On 2022-05-29 23:05, Hongtao Liu wrote:

On Fri, May 27, 2022 at 5:12 AM Vladimir Makarov via Gcc-patches
 wrote:


On 2022-05-24 23:39, liuhongt wrote:

Right now, mem_cost for the separate mem alternative is 1 * frequency, which
is pretty small and caused the unnecessary SSE spill in the PR.  I've tried
to rework the backend cost model, but RA is still not happy with that (it
regresses somewhere else).  I think the root cause of this is that the cost
for the separate 'm' alternative is too small, especially considering that
the mov cost of gprs is 2 (the default for REGISTER_MOVE_COST).  So this
patch increases mem_cost to 2 * frequency, and also increases the reg_class
cost by 1 for the 'm' alternative.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Thank you for addressing this problem.  And sorry, I cannot approve this
patch, at least without additional benchmarking of this change.

This code is very old.  It is coming from older RA (former file
regclass.c) and existed practically since GCC day 1.  People tried many
times to improve this code.  The code also affects many targets.

Yes, that's why I increased it by as little as possible, so it won't regress
#c6 in the PR.

I can approve this patch if you show that there is no regression at
least on x86-64 on some credible benchmark, e.g. SPEC2006 or SPEC2017.


I've tested the patch for SPEC2017 with both  -march=cascadelake
-Ofast -flto and -O2 -mtune=generic.
No obvious regression is observed.  The binaries are all different from
before, so I looked at 2 of them; the difference mainly comes from
different choices of registers (xmm13 -> xmm12).
Ok for trunk then?


OK.

Thank you for checking SPEC2017.

I hope it will not create troubles for other targets.




Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-26 Thread Vladimir Makarov via Gcc-patches



On 2022-05-24 23:39, liuhongt wrote:

Right now, mem_cost for the separate mem alternative is 1 * frequency, which
is pretty small and caused the unnecessary SSE spill in the PR.  I've tried
to rework the backend cost model, but RA is still not happy with that (it
regresses somewhere else).  I think the root cause of this is that the cost
for the separate 'm' alternative is too small, especially considering that
the mov cost of gprs is 2 (the default for REGISTER_MOVE_COST).  So this
patch increases mem_cost to 2 * frequency, and also increases the reg_class
cost by 1 for the 'm' alternative.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?


Thank you for addressing this problem.  And sorry, I cannot approve this 
patch, at least without additional benchmarking of this change.


This code is very old.  It is coming from older RA (former file 
regclass.c) and existed practically since GCC day 1.  People tried many 
times to improve this code.  The code also affects many targets.


I can approve this patch if you show that there is no regression at 
least on x86-64 on some credible benchmark, e.g. SPEC2006 or SPEC2017.


I know it is a lot of work, but when I myself make such changes I check 
SPEC2017.  I have rejected changes like this one several times after 
benchmarking them on SPEC2017, although at first glance they looked 
reasonable.



gcc/ChangeLog:

PR target/105513
* ira-costs.cc (record_reg_classes): Increase both mem_cost
and reg class cost by 1 for separate mem alternative when
REG_P (op).

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105513-1.c: New test.
---
  gcc/ira-costs.cc   | 26 +-
  gcc/testsuite/gcc.target/i386/pr105513-1.c | 16 +
  2 files changed, 31 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr105513-1.c

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 964c94a06ef..f7b8325e195 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -625,7 +625,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  rclass = cost_classes[k];
- pp_costs[k] = mem_cost[rclass][0] * frequency;
+ pp_costs[k] = (mem_cost[rclass][0]
++ 1) * frequency;
}
}
  else
@@ -648,7 +649,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  rclass = cost_classes[k];
- pp_costs[k] = mem_cost[rclass][1] * frequency;
+ pp_costs[k] = (mem_cost[rclass][1]
++ 1) * frequency;
}
}
  else
@@ -670,9 +672,9 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  rclass = cost_classes[k];
- pp_costs[k] = ((mem_cost[rclass][0]
- + mem_cost[rclass][1])
-* frequency);
+ pp_costs[k] = (mem_cost[rclass][0]
++ mem_cost[rclass][1]
++ 2) * frequency;
}
}
  else
@@ -861,7 +863,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  rclass = cost_classes[k];
- pp_costs[k] = mem_cost[rclass][0] * frequency;
+ pp_costs[k] = (mem_cost[rclass][0]
++ 1) * frequency;
}
}
  else
@@ -884,7 +887,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  rclass = cost_classes[k];
- pp_costs[k] = mem_cost[rclass][1] * frequency;
+ pp_costs[k] = (mem_cost[rclass][1]
++ 1) * frequency;
}
}
  else
@@ -906,9 +910,9 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
{
  

Re: [PATCH] [PR100106] Reject unaligned subregs when strict alignment is required

2022-05-06 Thread Vladimir Makarov via Gcc-patches



On 2022-05-05 02:52, Alexandre Oliva wrote:


Regstrapped on x86_64-linux-gnu and ppc64le-linux-gnu, also tested
targeting ppc- and ppc64-vx7r2.  Ok to install?

I am ok with the modified version of the patch.  It looks reasonable to 
me and I support committing it.


But I think I cannot approve the patch formally, as emit-rtl.cc is out 
of my jurisdiction and validate_subreg is used in many places besides RA.


Sorry, Alex, some global reviewer should do this.


for  gcc/ChangeLog

PR target/100106
* emit-rtl.c (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.
---
  gcc/emit-rtl.cc|3 +++
  gcc/testsuite/gcc.target/powerpc/pr100106-sa.c |4 
  2 files changed, 7 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100106-sa.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1e02ae254d012..642e47eada0d7 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -982,6 +982,9 @@ validate_subreg (machine_mode omode, machine_mode imode,
  
return subreg_offset_representable_p (regno, imode, offset, omode);

  }
+  else if (reg && MEM_P (reg)
+  && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
+return false;
  
/* The outer size must be ordered wrt the register size, otherwise

   we wouldn't know at compile time how many registers the outer
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c 
b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
new file mode 100644
index 0..6cc29595c8b25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100106-sa.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ilp32 } } } */
+/* { dg-options "-mcpu=604 -O -mstrict-align" } */
+
+#include "../../gcc.c-torture/compile/pr100106.c"






Re: [committed] [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

2022-03-30 Thread Vladimir Makarov via Gcc-patches



On 2022-03-30 15:18, Uros Bizjak wrote:

On Wed, Mar 30, 2022 at 7:15 PM Vladimir Makarov via Gcc-patches
 wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

The patch was successfully bootstrapped and tested on x86-64.

diff --git a/gcc/testsuite/gcc.target/i386/pr105032.c
b/gcc/testsuite/gcc.target/i386/pr105032.c
new file mode 100644
index 000..57b21d3cd7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105032.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+/* { dg-additional-options "-m32" { target x86_64-*-* } } */

Please don't use -m32 in options, but instead conditionally compile
the testcase with


Sorry for what may be a stupid question.  I am interested in the 
reasons for this.  Is it just to save computer cycles?


I think the test is important, therefore I'd like to run it on 
x86-64 too, because people rarely test the i686 target.



/* { dg-do compile { target ia32 } } */




[committed] [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

2022-03-30 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

The patch was successfully bootstrapped and tested on x86-64.
commit 25de4889c16fec80172a5e2d1825f3ff505d0cc4
Author: Vladimir N. Makarov 
Date:   Wed Mar 30 13:03:44 2022 -0400

[PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

When trying to split hard reg live range to assign hard reg to a reload
pseudo, LRA searches for reload insns of the reload pseudo
assuming a specific order of the reload insns.  This order is violated if
reload involved in inheritance transformation. In such case, the loop used
for reload insn searching can become infinite.  The patch fixes this.

gcc/ChangeLog:

PR middle-end/105032
* lra-assigns.cc (find_reload_regno_insns): Modify loop condition.

gcc/testsuite/ChangeLog:

PR middle-end/105032
* gcc.target/i386/pr105032.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index af30a673142..486e94f2006 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1730,7 +1730,8 @@ find_reload_regno_insns (int regno, rtx_insn * , rtx_insn * )
 {
   for (prev_insn = PREV_INSN (start_insn),
 	 next_insn = NEXT_INSN (start_insn);
-	   insns_num != 1 && (prev_insn != NULL || next_insn != NULL); )
+	   insns_num != 1 && (prev_insn != NULL
+			  || (next_insn != NULL && second_insn == NULL)); )
 	{
 	  if (prev_insn != NULL)
 	{
diff --git a/gcc/testsuite/gcc.target/i386/pr105032.c b/gcc/testsuite/gcc.target/i386/pr105032.c
new file mode 100644
index 000..57b21d3cd7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105032.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+/* { dg-additional-options "-m32" { target x86_64-*-* } } */
+
+typedef unsigned int size_t;	
+__extension__ typedef long int __off_t;
+typedef __off_t off_t;
+static void *__sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
+			off_t offset)
+{
+  offset >>= 12;
+  return (void *)({ long _ret;
+  register long _num asm("eax") = (192);
+  register long _arg1 asm("ebx") = (long)(addr);
+  register long _arg2 asm("ecx") = (long)(length);
+  register long _arg3 asm("edx") = (long)(prot);
+  register long _arg4 asm("esi") = (long)(flags);
+  register long _arg5 asm("edi") = (long)(fd);
+  long _arg6 = (long)(offset);
+  asm volatile ("pushl	%[_arg6]\n\t"
+		"pushl	%%ebp\n\t"
+		"movl	4(%%esp), %%ebp\n\t"
+		"int	$0x80\n\t"
+		"popl	%%ebp\n\t"
+		"addl	$4,%%esp\n\t"
+		: "=a"(_ret)
+		: "r"(_num), "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4),"r"(_arg5), [_arg6]"m"(_arg6)
+		: "memory", "cc" );
+  _ret; });
+}
+
+int main(void)
+{
+  __sys_mmap(((void *)0), 0x1000, 0x1 | 0x2, 0x20 | 0x02, -1, 0);
+  return 0;
+}


[committed] [PR104971] LRA: check live hard regs to remove a dead insn

2022-03-25 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104971

The PR was already fixed by Jakub but his patch did not fix a latent LRA 
bug mentioned in the PR comments.  The current patch fixes the latent bug.


The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit 33904327c92bd914d4e0e076be12dc0a6b453c2d
Author: Vladimir N. Makarov 
Date:   Fri Mar 25 12:22:08 2022 -0400

[PR104971] LRA: check live hard regs to remove a dead insn

LRA removed an insn modifying sp for the given PR test case.  We should also
check live hard regs to prevent this.  The patch fixes this.

gcc/ChangeLog:

PR middle-end/104971
* lra-lives.cc (process_bb_lives): Check hard_regs_live for hard
regs to clear remove_p flag.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 796f00629b4..a755464ee81 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -724,7 +724,10 @@ process_bb_lives (basic_block bb, int _point, bool dead_insn_p)
 	  bool remove_p = true;
 
 	  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type != OP_IN && sparseset_bit_p (pseudos_live, reg->regno))
+	if (reg->type != OP_IN
+		&& (reg->regno < FIRST_PSEUDO_REGISTER
+		? TEST_HARD_REG_BIT (hard_regs_live, reg->regno)
+		: sparseset_bit_p (pseudos_live, reg->regno)))
 	  {
 		remove_p = false;
 		break;


Re: [PATCH] rtl-optimization/105028 - fix compile-time hog in form_threads_from_copies

2022-03-23 Thread Vladimir Makarov via Gcc-patches



On 2022-03-23 07:49, Richard Biener wrote:

form_threads_from_copies processes a sorted array of copies, skipping
those with the same thread and conflicting threads and merging the
first non-conflicting ones.  After that it terminates the loop and
gathers the remaining elements of the array, skipping same thread
copies, re-starting the process.  For a large number of copies this
gathering of the rest takes considerable time and it also appears
pointless.  The following simply continues processing the array
which should be equivalent as far as I can see.


It looks to me too that the resulting code is equivalent to the 
original one.


As I remember, originally this was a more sophisticated but even slower 
algorithm, which took into account that merging 2 threads could remove 
several copies (not just one) from the array and chose the best copy 
from this point of view.  It was later transformed into this inefficient 
leftover code.



This takes form_threads_from_copies off the profile radar from
previously taking ~50% of the compile-time.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK if testing succeeds?

Yes.  Thank you for working on this, Richard.



[committed] [PR104961] LRA: split hard reg for reload pseudo with clobber

2022-03-18 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104961

The patch was successfully bootstrapped and tested on x86-64.

commit 4e2291789a8b31c550271405782356e8aeddcee3
Author: Vladimir N. Makarov 
Date:   Fri Mar 18 14:23:40 2022 -0400

[PR104961] LRA: split hard reg for reload pseudo with clobber.

Splitting hard register live range did not work for subreg of a
multi-reg reload pseudo.  Reload insns for such pseudo contain clobber
of the pseudo and splitting did not take this into account.  The patch
fixes it.

gcc/ChangeLog:

PR rtl-optimization/104961
* lra-assigns.cc (find_reload_regno_insns): Process reload pseudo clobber.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104961
* gcc.target/i386/pr104961.c: New.

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index ab3a6e6e9cc..af30a673142 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1706,7 +1706,8 @@ find_reload_regno_insns (int regno, rtx_insn * , rtx_insn * )
 {
   unsigned int uid;
   bitmap_iterator bi;
-  int n = 0;
+  int insns_num = 0;
+  bool clobber_p = false;
   rtx_insn *prev_insn, *next_insn;
   rtx_insn *start_insn = NULL, *first_insn = NULL, *second_insn = NULL;
   
@@ -1714,28 +1715,32 @@ find_reload_regno_insns (int regno, rtx_insn * , rtx_insn * )
 {
   if (start_insn == NULL)
 	start_insn = lra_insn_recog_data[uid]->insn;
-  n++;
+  if (GET_CODE (PATTERN (lra_insn_recog_data[uid]->insn)) == CLOBBER)
+	clobber_p = true;
+  else
+	insns_num++;
 }
-  /* For reload pseudo we should have at most 3 insns referring for
+  /* For reload pseudo we should have at most 3 insns besides clobber referring for
  it: input/output reload insns and the original insn.  */
-  if (n > 3)
+  if (insns_num > 3)
 return false;
-  if (n > 1)
+  if (clobber_p)
+insns_num++;
+  if (insns_num > 1)
 {
   for (prev_insn = PREV_INSN (start_insn),
 	 next_insn = NEXT_INSN (start_insn);
-	   n != 1 && (prev_insn != NULL || next_insn != NULL); )
+	   insns_num != 1 && (prev_insn != NULL || next_insn != NULL); )
 	{
-	  if (prev_insn != NULL && first_insn == NULL)
+	  if (prev_insn != NULL)
 	{
-	  if (! bitmap_bit_p (_reg_info[regno].insn_bitmap,
-  INSN_UID (prev_insn)))
-		prev_insn = PREV_INSN (prev_insn);
-	  else
+	  if (bitmap_bit_p (_reg_info[regno].insn_bitmap,
+INSN_UID (prev_insn)))
 		{
 		  first_insn = prev_insn;
-		  n--;
+		  insns_num--;
 		}
+		prev_insn = PREV_INSN (prev_insn);
 	}
 	  if (next_insn != NULL && second_insn == NULL)
 	{
@@ -1745,11 +1750,11 @@ find_reload_regno_insns (int regno, rtx_insn * , rtx_insn * )
 	  else
 		{
 		  second_insn = next_insn;
-		  n--;
+		  insns_num--;
 		}
 	}
 	}
-  if (n > 1)
+  if (insns_num > 1)
 	return false;
 }
   start = first_insn != NULL ? first_insn : start_insn;
diff --git a/gcc/testsuite/gcc.target/i386/pr104961.c b/gcc/testsuite/gcc.target/i386/pr104961.c
new file mode 100644
index 000..11ea95afe44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104961.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og" } */
+
+__int128 i;
+
+void bar (int);
+
+void
+foo (int a, char b, _Complex unsigned char c)
+{
+  __int128 j = i * i;
+  c -= 1;
+  bar (j);
+  bar (__imag__ c);
+}


Re: [PATCH] lra: Fix up debug_p handling in lra_substitute_pseudo [PR104778]

2022-03-14 Thread Vladimir Makarov via Gcc-patches



On 2022-03-12 14:37, Jakub Jelinek wrote:

Hi!

The following testcase ICEs on powerpc-linux, because lra_substitute_pseudo
substitutes (const_int 1) into a subreg operand.  First a subreg of subreg
of a reg appears in a debug insn (which surely is invalid outside of
debug insns, but in debug insns we allow even what is normally invalid in
RTL like subregs which the target doesn't like, because either dwarf2out
is able to handle it, or we just throw away the location expression,
making some var .

lra_substitute_pseudo already has some code to deal with specifically
SUBREG of REG with the REG being substituted for VOIDmode constant,
but that doesn't cover this case, so the following patch extends
lra_substitute_pseudo for debug_p mode to treat stuff like e.g.
combiner's subst function to ensure we don't lose mode which is essential
for the IL.

Bootstrapped/regtested on {powerpc64{,le},x86_64,i686}-linux, ok for trunk?



Sure.  Thank you for working on this PR, Jakub.



2022-03-12  Jakub Jelinek  

PR debug/104778
* lra.cc (lra_substitute_pseudo): For debug_p mode, simplify
SUBREG, ZERO_EXTEND, SIGN_EXTEND, FLOAT or UNSIGNED_FLOAT if recursive
call simplified the first operand into VOIDmode constant.

* gcc.target/powerpc/pr104778.c: New test.





[committed] [PR103074] LRA: Check new conflicts when splitting hard reg live range

2022-03-10 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103074

The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit d8e5fff6b74b82c2ac3254be9a1f0fb6b30dbdbf
Author: Vladimir N. Makarov 
Date:   Thu Mar 10 16:16:00 2022 -0500

[PR103074] LRA: Check new conflicts when splitting hard reg live range.

Splitting hard register live range can create (artificial)
conflict of the hard register with another pseudo because of simplified
conflict calculation in LRA.  We should check such conflict on the next
assignment sub-pass and spill and reassign the pseudo if necessary.
The patch implements this.

gcc/ChangeLog:

PR target/103074
* lra-constraints.cc (split_reg): Set up
check_and_force_assignment_correctness_p when splitting hard
register live range.

gcc/testsuite/ChangeLog:

PR target/103074
* gcc.target/i386/pr103074.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 080b44ad87a..d92ab76908c 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5994,12 +5994,17 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 			 before_p ? NULL : save,
 			 call_save_p
 			 ?  "Add save<-reg" : "Add split<-reg");
-  if (nregs > 1)
+  if (nregs > 1 || original_regno < FIRST_PSEUDO_REGISTER)
 /* If we are trying to split multi-register.  We should check
conflicts on the next assignment sub-pass.  IRA can allocate on
sub-register levels, LRA do this on pseudos level right now and
this discrepancy may create allocation conflicts after
-   splitting.  */
+   splitting.
+
+   If we are trying to split hard register we should also check conflicts
+   as such splitting can create artificial conflict of the hard register
+   with another pseudo because of simplified conflict calculation in
+   LRA.  */
 check_and_force_assignment_correctness_p = true;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
diff --git a/gcc/testsuite/gcc.target/i386/pr103074.c b/gcc/testsuite/gcc.target/i386/pr103074.c
new file mode 100644
index 000..276ad82a1de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103074.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=bonnell -Os -fPIC -fschedule-insns -w" } */
+
+void
+serialize_collection (char *ptr, int a, int need_owner)
+{
+  if (need_owner)
+__builtin_sprintf(ptr, "%d:%d", 0, a);
+  else
+{
+  static char buff[32];
+
+  __builtin_sprintf(buff, "%d:%d", a >> 32, a);
+  __builtin_sprintf(ptr, "%d:%d:\"%s\"", 0, 0, buff);
+}
+}


Re: [PR103302] skip multi-word pre-move clobber during lra

2022-03-02 Thread Vladimir Makarov via Gcc-patches



On 2022-03-02 07:25, Alexandre Oliva wrote:

Regstrapped on x86_64-linux-gnu, also tested on various riscv and arm
targets (with gcc-11).  Ok to install?


Yes.

Thank you for working on this, Alex.


for  gcc/ChangeLog
* lra-constraints.cc (undo_optional_reloads): Recognize and
drop insns of multi-word move sequences, tolerate removal
iteration on an already-removed clobber, and refuse to
substitute original pseudos into clobbers.


Re: [PATCH] rtl-optimization/104686 - speedup IRA allocno conflict test

2022-03-02 Thread Vladimir Makarov via Gcc-patches



On 2022-03-02 03:58, Richard Biener wrote:

In this PR allocnos_conflict_p takes 90% of the compile-time via
the calls from update_conflict_hard_regno_costs.  This is due to
the high number of conflicts recorded in the dense bitvector
representation.  Fortunately we can take advantage of the bitvector
representation here and turn the O(n) conflict test into an O(1) one,
greatly speeding up the compile of the testcase from 39s to just 4s
(93% IRA time to 26% IRA time).

While for the testcase in question the first allocno is almost always
the nice one the patch tries a more systematic approach to finding
the allocno to iterate object conflicts over.  That does reduce
the actual number of compares for the testcase but it doesn't make
a measurable difference wall-clock wise.  That's not guaranteed
though I think so I've kept this systematic way of choosing the
cheapest allocno.
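The O(1) test described above can be sketched like this (the structure and names here are illustrative assumptions, not the actual ira-int.h representation, under the premise that an object's dense conflict vector covers a contiguous id range):

```c
#include <limits.h>

#define WORD_BITS (CHAR_BIT * sizeof (unsigned long))

/* A conflict record in a dense bit-vector representation: the vector
   covers conflict ids in [min, max], one bit per id.  */
struct conflict_vec
{
  int min, max;            /* id range covered by the vector */
  unsigned long bits[4];   /* bit (id - min) is set when id conflicts */
};

/* Replaces an O(n) walk over all recorded conflicts: ids outside the
   covered range cannot conflict at all; otherwise one bit test
   decides.  */
static int
conflict_id_p (const struct conflict_vec *v, int id)
{
  if (id < v->min || id > v->max)
    return 0;
  unsigned bit = (unsigned) (id - v->min);
  return (v->bits[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1;
}
```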

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?


Yes.

Richard, thank you again for working on this issue.


2022-03-02  Richard Biener  

PR rtl-optimization/104686
* ira-color.cc (object_conflicts_with_allocno_p): New function
using a bitvector test instead of iterating when possible.
(allocnos_conflict_p): Choose the best allocno to iterate over
object conflicts.
(update_conflict_hard_regno_costs): Do allocnos_conflict_p test
last.




[committed] [PR104637] LRA: Split hard regs as many as possible on one subpass

2022-02-28 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64.


commit ec1b9ba2d7913fe5e9deacc8e55e7539262f5124
Author: Vladimir N. Makarov 
Date:   Mon Feb 28 16:43:50 2022 -0500

[PR104637] LRA: Split hard regs as many as possible on one subpass

The LRA hard reg split subpass is a small subpass used as the last
resort for LRA when it cannot assign a hard reg to a reload
pseudo in other ways (e.g. by spilling non-reload pseudos).  For
simplicity the subpass works on one split base (as each split
changes pseudo live range info).  In this case it results in
reaching the maximal possible number of subpasses.  The patch
implements as many non-overlapping hard reg splits as possible
on each subpass.

gcc/ChangeLog:

PR rtl-optimization/104637
* lra-assigns.cc (lra_split_hard_reg_for): Split hard regs as many
as possible on one subpass.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: New.
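The change can be modelled in isolation roughly as follows (a toy sketch: insn "uids" are small ints and the acceptance test is simplified, standing in for LRA's over_split_insns bookkeeping rather than reproducing it):

```c
#include <stdbool.h>

/* Toy model of the subpass change: accept a split candidate only if no
   insn in its range was already claimed by a split accepted earlier on
   the same subpass, so several non-overlapping splits can be done
   before live range info must be recomputed.  */
#define MAX_UID 64
static bool over_split_insns[MAX_UID];

static bool
try_accept_split (int first, int last)
{
  for (int uid = first; uid <= last; uid++)
    if (over_split_insns[uid])
      return false;             /* overlaps an earlier split: retry on
                                   the next subpass */
  for (int uid = first; uid <= last; uid++)
    over_split_insns[uid] = true;
  return true;
}
```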

diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index c1d40ea2a14..ab3a6e6e9cc 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -1774,8 +1774,8 @@ lra_split_hard_reg_for (void)
  iterations.  Either it's an asm and something is wrong with the
  constraints, or we have run out of spill registers; error out in
  either case.  */
-  bool asm_p = false;
-  bitmap_head failed_reload_insns, failed_reload_pseudos;
+  bool asm_p = false, spill_p = false;
+  bitmap_head failed_reload_insns, failed_reload_pseudos, over_split_insns;
   
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
@@ -1786,6 +1786,7 @@ lra_split_hard_reg_for (void)
   bitmap_ior (&non_reload_pseudos, &lra_inheritance_pseudos, &lra_split_regs);
   bitmap_ior_into (&non_reload_pseudos, &lra_subreg_reload_pseudos);
   bitmap_ior_into (&non_reload_pseudos, &lra_optional_reload_pseudos);
+  bitmap_initialize (&over_split_insns, &reg_obstack);
   for (i = lra_constraint_new_regno_start; i < max_regno; i++)
 if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
 	&& (rclass = lra_get_allocno_class (i)) != NO_REGS
@@ -1793,14 +1794,41 @@ lra_split_hard_reg_for (void)
   {
 	if (! find_reload_regno_insns (i, first, last))
 	  continue;
-	if (BLOCK_FOR_INSN (first) == BLOCK_FOR_INSN (last)
-	&& spill_hard_reg_in_range (i, rclass, first, last))
+	if (BLOCK_FOR_INSN (first) == BLOCK_FOR_INSN (last))
 	  {
-	bitmap_clear (&failed_reload_pseudos);
-	return true;
+	/* Check that we are not trying to split over the same insn
+	   requiring reloads to avoid splitting the same hard reg twice or
+	   more.  If we need several hard regs splitting over the same insn
+	   it can be finished on the next iterations.
+
+	   The following loop iteration number is small as we split hard
+	   reg in a very small range.  */
+	for (insn = first;
+		 insn != NEXT_INSN (last);
+		 insn = NEXT_INSN (insn))
+	  if (bitmap_bit_p (&over_split_insns, INSN_UID (insn)))
+		break;
+	if (insn != NEXT_INSN (last)
+		|| !spill_hard_reg_in_range (i, rclass, first, last))
+	  {
+		bitmap_set_bit (&failed_reload_pseudos, i);
+	  }
+	else
+	  {
+		for (insn = first;
+		 insn != NEXT_INSN (last);
+		 insn = NEXT_INSN (insn))
+		  bitmap_set_bit (&over_split_insns, INSN_UID (insn));
+		spill_p = true;
+	  }
 	  }
-	bitmap_set_bit (&failed_reload_pseudos, i);
   }
+  bitmap_clear (&over_split_insns);
+  if (spill_p)
+{
+  bitmap_clear (&failed_reload_pseudos);
+  return true;
+}
   bitmap_clear (&non_reload_pseudos);
   bitmap_initialize (&failed_reload_insns, &reg_obstack);
   EXECUTE_IF_SET_IN_BITMAP (&failed_reload_pseudos, 0, u, bi)
diff --git a/gcc/testsuite/gcc.target/i386/pr104637.c b/gcc/testsuite/gcc.target/i386/pr104637.c
new file mode 100644
index 000..65e8635d55e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104637.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -fno-forward-propagate -mavx" } */
+
+typedef short __attribute__((__vector_size__ (64))) U;
+typedef unsigned long long __attribute__((__vector_size__ (32))) V;
+typedef long double __attribute__((__vector_size__ (64))) F;
+
+int i;
+U u;
+F f;
+
+void
+foo (char a, char b, _Complex char c, V v)
+{
+  u = (U) { u[0] / 0, u[1] / 0, u[2] / 0, u[3] / 0, u[4] / 0, u[5] / 0, u[6] / 0, u[7] / 0,
+	u[8] / 0, u[0] / 0, u[9] / 0, u[10] / 0, u[11] / 0, u[12] / 0, u[13] / 0, u[14] / 0, u[15] / 0,
+	u[16] / 0, u[17] / 0, u[18] / 0, u[19] / 0, u[20] / 0, u[21] / 0, u[22] / 0, u[23] / 0,
+	u[24] / 0, u[25] / 0, u[26] / 0, u[27] / 0, u[28] / 0, u[29] / 0, u[30] / 0, u[31] / 0 };
+  c += i;
+  f = (F) { v[0], v[1], v[2], v[3] };
+  i = (char) (__imag__ c + i);
+}


Re: [PATCH] rtl-optimization/104686 - speed up conflict iteration

2022-02-25 Thread Vladimir Makarov via Gcc-patches



On 2022-02-25 09:14, Richard Biener wrote:

The following replaces

/* Skip bits that are zero.  */
for (; (word & 1) == 0; word >>= 1)
  bit_num++;

idioms in ira-int.h in the attempt to speedup update_conflict_hard_regno_costs
which we're bound on in PR104686.  The trick is to use ctz_hwi here
which should pay off even with dense bitmaps on architectures that
have HW support for this.
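The rewrite can be modelled portably like this (ctz_hwi is a GCC-internal function; __builtin_ctzl stands in for it here, assuming a nonzero word, which the iterators guarantee since they only run while bits remain):

```c
/* Old idiom: skip zero bits one position at a time.  */
static int
skip_zeros_loop (unsigned long word, int bit_num)
{
  for (; (word & 1) == 0; word >>= 1)
    bit_num++;
  return bit_num;
}

/* New idiom: one count-trailing-zeros step does the same job, and maps
   to a single instruction on targets with hardware support.  */
static int
skip_zeros_ctz (unsigned long word, int bit_num)
{
  return bit_num + __builtin_ctzl (word);
}
```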

For the PR in question this speeds up compile-time from 31s to 24s for
me.

It is a really significant improvement.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Yes.  Thank you for working on this PR, Richard.

2022-02-25  Richard Biener  

PR rtl-optimization/104686
* ira-int.h (minmax_set_iter_cond): Use ctz_hwi to elide loop
skipping bits that are zero.
(ira_object_conflict_iter_cond): Likewise.




Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-22 Thread Vladimir Makarov via Gcc-patches



On 2022-02-20 12:34, Iain Sandoe wrote:


^^^ this is mostly for my education - the stuff below is a potential solution 
to leaving lra-constraints unchanged and fixing the Darwin bug….

I'd be really glad if you do manage to fix this w/o changing LRA. 
Richard has a legitimate point that my proposed change in LRA 
prohibiting `...;reg=low_sum; ...mem[reg]` might force LRA to generate 
less optimized code, or even make LRA generate unrecognized 
insns `reg = original addr` for some ports, requiring further fixes in the 
machine-dependent code of those ports.




[committed] [PR104447] LRA: Do not split non-alloc hard regs

2022-02-17 Thread Vladimir Makarov via Gcc-patches

The patch solves the following PR:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104447

The patch was successfully bootstrapped and tested on x86-64.

commit db69f666a728ce800a840115829f6b64bc3174d2
Author: Vladimir N. Makarov 
Date:   Thu Feb 17 11:31:50 2022 -0500

[PR104447] LRA: Do not split non-alloc hard regs.

LRA tried to split a non-allocated hard reg for reload pseudos again and
again until the number of assignment passes reached the limit.  The patch
fixes this.
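The fix can be sketched outside of LRA like this (register sets are modeled as plain bitmasks and the selection loop is an illustrative assumption, not spill_hard_reg_in_range itself):

```c
/* Sketch of the fix: seed the set of hard regs to ignore with the
   non-allocatable ones (lra_no_alloc_regs in LRA) instead of starting
   from the empty set, so the splitter can never propose a register
   that is not allocatable, over and over again.  */
typedef unsigned long hard_reg_set;

static int
pick_hard_reg_to_split (hard_reg_set candidates, hard_reg_set no_alloc_regs)
{
  hard_reg_set ignore = no_alloc_regs;  /* was: CLEAR_HARD_REG_SET */
  for (int regno = 0; regno < 64; regno++)
    if ((candidates >> regno & 1) && !(ignore >> regno & 1))
      return regno;
  return -1;  /* nothing splittable: fail instead of retrying forever */
}
```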

gcc/ChangeLog:

PR rtl-optimization/104447
* lra-constraints.cc (spill_hard_reg_in_range): Initiate ignore
hard reg set by lra_no_alloc_regs.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104447
* gcc.target/i386/pr104447.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c700c3f4578..b2c4590153c 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6008,7 +6008,7 @@ spill_hard_reg_in_range (int regno, enum reg_class rclass, rtx_insn *from, rtx_i
   HARD_REG_SET ignore;
   
   lra_assert (from != NULL && to != NULL);
-  CLEAR_HARD_REG_SET (ignore);
+  ignore = lra_no_alloc_regs;
   EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
 {
   lra_insn_recog_data_t id = lra_insn_recog_data[uid];
diff --git a/gcc/testsuite/gcc.target/i386/pr104447.c b/gcc/testsuite/gcc.target/i386/pr104447.c
new file mode 100644
index 000..bf11e8696e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104447.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -pg" } */
+
+int
+bar (int x)
+{
+  asm goto ("" : : "r" (x), "r" (x + 1), "r" (x + 2), "r" (x + 3), /* { dg-error "operand has impossible constraints" } */
+	"r" (x + 4), "r" (x + 5), "r" (x + 6), "r" (x + 7),
+	"r" (x + 8), "r" (x + 9), "r" (x + 10), "r" (x + 11),
+	"r" (x + 12), "r" (x + 13), "r" (x + 14), "r" (x + 15),
+	"r" (x + 16) : : lab);
+ lab:
+  return 0;
+}


Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 11:00, Richard Sandiford wrote:

Hi Vlad,

Vladimir Makarov via Gcc-patches  writes:


Hi, Richard.  The LRA change is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
will help for any existing target.  As the machine-dependent code for any
target most probably checks the address only in memory (for ppc64 darwin
this is exactly the case), it can wrongly accept a wrong address by
reloading it into a reg and using it in memory.  These are my arguments
for removing this code from process_address_1.

I'm probably making too much of this, but:

I think the code is potentially useful in that existing targets do forbid
forbid lo_sum addresses in certain contexts (due to limited offset range)
while still wanting lo_sum to be used to be load the address.  If we
handle the high/lo_sum split in generic code then we have more chance
of being able to optimise things.  So it feels like this is setting an
unfortunate precedent.

I still don't understand what went wrong before though (the PR trail
was a bit too long to process :-)).  Is there a case where
(lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

Sometimes it is hard to draw the line between an RA bug being a bug in
machine-dependent code and a bug in RA itself.


For this case I would say it is a bug in the both parts.

The low_sum is generated by LRA, which does not know that it should be 
wrapped in an unspec for darwin.  Generally speaking we could avoid the 
change in LRA, but that would require doing non-trivial analysis in machine-
dependent code to find the cases where 'reg=low_sum ... mem[reg]' is 
incorrect code for the darwin (PIC) target (and maybe some other PIC 
targets too).  Therefore I believe the change in LRA is a good solution, 
even if it can potentially result in less optimized code in some cases.  
Taking your concern into account, we could probably improve the patch by 
introducing a hook (I never liked such solutions, as we already have too 
many hooks directing RA) or, better, by making the LRA change work only 
for PIC targets.  Something like this (it probably needs better 
recognition of PIC targets):


--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
  if (HAVE_lo_sum)
    {
  /* addr => lo_sum (new_base, addr), case (2) above.  */
  insn = emit_insn (gen_rtx_SET
    (new_reg,
 gen_rtx_HIGH (Pmode, copy_rtx (addr))));
  code = recog_memoized (insn);
  if (code >= 0)
    {
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
- if (!valid_address_p (op, &ad, cn))
+ if (!valid_address_p (op, &ad, cn) && !flag_pic)
    {
  /* Try to put lo_sum into register.  */
  insn = emit_insn (gen_rtx_SET
    (new_reg,
 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
  code = recog_memoized (insn);
  if (code >= 0)
    {
  *ad.inner = new_reg;
  if (!valid_address_p (op, &ad, cn))



Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:

Iain Sandoe via Gcc-patches  writes:

Two issues resulted in this PR, which manifests when we force a constant into
memory in LRA (in PIC code on Darwin).  The presence of such forced constants
is quite dependent on other RTL optimisations, and it is easy for the issue to
become latent for a specific case.

First, in the Darwin-specific rs6000 backend code, we were not being careful
enough in rejecting invalid symbolic addresses.  Specifically, when generating
PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.

Second, LRA was attempting to load a register using an invalid lo_sum address.

The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
Darwin-specific (although, of course, any observations are welcome).

Tested on several lo_sum targets and x86_64 all languages except as noted:
powerpc64-linux (m32/m64) -D
powerpc64le-linux  -D
powerpc64-aix -Ada -Go -D
aarch64-linux -Ada -D
x86_64-linux all langs -D
powerpc-darwin9 (master and 11.2) -D -Go.

pushed to master, thanks,
Iain

Signed-off-by: Iain Sandoe 
Co-authored-by: Vladimir Makarov 

PR target/104117

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
emitting PIC code.
(legitimate_lo_sum_address_p): Likewise.
* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
load from an invalid lo_sum address.
---
  gcc/config/rs6000/rs6000.cc | 38 +++--
  gcc/lra-constraints.cc  | 17 ++---
  2 files changed, 38 insertions(+), 17 deletions(-)

[…]
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index fdff9e0720a..c700c3f4578 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
  if (!valid_address_p (op, &ad, cn))
{
- /* Try to put lo_sum into register.  */
- insn = emit_insn (gen_rtx_SET
-   (new_reg,
-gen_rtx_LO_SUM (Pmode, new_reg, addr)));
- code = recog_memoized (insn);
- if (code >= 0)
-   {
- *ad.inner = new_reg;
- if (!valid_address_p (op, &ad, cn))
-   {
- *ad.inner = addr;
- code = -1;
-   }
-   }
-
+ *ad.inner = addr; /* Punt.  */
+ code = -1;
}
}
  if (code < 0)

Could you go into more details about this?  Why is it OK to continue
to try:

   (lo_sum new_reg addr)

directly as an address (the context at the top of the hunk), but not try
moving the lo_sum into a register?  They should be semantically equivalent,
so it seems that if one is wrong, the other would be too.


Hi, Richard.  The LRA change is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If 
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
will help for any existing target.  As the machine-dependent code for any 
target most probably checks the address only in memory (for ppc64 darwin 
this is exactly the case), it can wrongly accept a wrong address by 
reloading it into a reg and using it in memory.  These are my arguments 
for removing this code from process_address_1.





[committed] [PR104400] LRA: Modify exclude start hard register calculation for insn alternative

2022-02-11 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400

The patch was successfully tested and bootstrapped on x86-64 and aarch64.
commit 274a4d29421e73c9b40c1641986c6ed904e20184
Author: Vladimir N. Makarov 
Date:   Fri Feb 11 09:52:14 2022 -0500

[PR104400] LRA: Modify exclude start hard register calculation for insn alternative

The v850 target has an interesting insn alternative constraint 'e!r', where e
denotes even general regs and is a subset of r.  We cannot just take the
union of the exclude start hard registers for e and r; we should use only
the exclude start hard registers of r.  The following patch implements this.

gcc/ChangeLog:

PR rtl-optimization/104400
* lra-constraints.cc (process_alt_operands): Don't make union of
this_alternative_exclude_start_hard_regs when reg class in insn
alternative covers other reg classes in the same alternative.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104400
* gcc.target/v850/pr104400.c: New.
* gcc.target/v850/v850.exp: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9cee17479ba..fdff9e0720a 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2498,9 +2498,15 @@ process_alt_operands (int only_alternative)
 		  if (mode == BLKmode)
 		break;
 		  this_alternative = reg_class_subunion[this_alternative][cl];
+		  if (hard_reg_set_subset_p (this_alternative_set,
+	 reg_class_contents[cl]))
+		this_alternative_exclude_start_hard_regs
+		  = ira_exclude_class_mode_regs[cl][mode];
+		  else if (!hard_reg_set_subset_p (reg_class_contents[cl],
+		   this_alternative_set))
+		this_alternative_exclude_start_hard_regs
+		  |= ira_exclude_class_mode_regs[cl][mode];
 		  this_alternative_set |= reg_class_contents[cl];
-		  this_alternative_exclude_start_hard_regs
-		|= ira_exclude_class_mode_regs[cl][mode];
 		  if (costly_p)
 		{
 		  this_costly_alternative
diff --git a/gcc/testsuite/gcc.target/v850/pr104400.c b/gcc/testsuite/gcc.target/v850/pr104400.c
new file mode 100644
index 000..5d78a77345c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/v850/pr104400.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mv850e3v5" } */
+
+double frob (double r)
+{
+r = -r;
+return r;
+}
diff --git a/gcc/testsuite/gcc.target/v850/v850.exp b/gcc/testsuite/gcc.target/v850/v850.exp
new file mode 100644
index 000..4e8c745a0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/v850/v850.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an v850 target.
+if ![istarget v850*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+	"" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish


[pushed] [PR103676] LRA: Calculate and exclude some start hard registers for reload pseudos

2022-01-21 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103676

The patch was successfully bootstrapped and tested on x86_64, aarch64, 
and ppc64.
commit 85419ac59724b7ce710ebb4acf03dbd747edeea3
Author: Vladimir N. Makarov 
Date:   Fri Jan 21 13:34:32 2022 -0500

[PR103676] LRA: Calculate and exclude some start hard registers for reload pseudos

LRA and the old reload pass use only one register class for reload pseudos even if
the operand constraints contain more than one register class.  Let us consider the
constraint 'lh' for thumb arm, which means low and high thumb registers.
Reload pseudo for such constraint will have general reg class (union of
low and high reg classes).  Assigning the last low register to the reload
pseudo is wrong if the pseudo is of DImode as it requires two hard regs.
But it is considered OK if we use general reg class.  The following patch
solves this problem for LRA.
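The problem can be modelled in miniature like this (classes are bitmasks over 16 registers and "low" regs are 0-7 as in the thumb example; these are illustrative assumptions, not the actual IRA/LRA data structures):

```c
#include <stdbool.h>

typedef unsigned hard_reg_set;
#define LOW_REGS  0x00ffu   /* thumb-style low regs 0-7 */
#define HIGH_REGS 0xff00u   /* high regs 8-15 */

/* A reload pseudo needing NREGS consecutive hard regs must not start at
   a register whose span [start, start + nregs) leaves the set the
   constraint actually allows.  */
static bool
valid_start_p (hard_reg_set allowed, int start, int nregs)
{
  for (int r = start; r < start + nregs; r++)
    if (r >= 16 || !(allowed >> r & 1))
      return false;
  return true;
}
```

The last assertion below reproduces the bug described above: the union class wrongly accepts a start register that a DImode-sized pseudo constrained to low regs cannot use.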

gcc/ChangeLog:

PR target/103676
* ira.h (struct target_ira): Add member
x_ira_exclude_class_mode_regs.
(ira_exclude_class_mode_regs): New macro.
* lra.h (lra_create_new_reg): Add arg exclude_start_hard_regs and
move from here ...
* lra-int.h: ... to here.
(lra_create_new_reg_with_unique_value): Add arg
exclude_start_hard_regs.
(class lra_reg): Add member exclude_start_hard_regs.
* lra-assigns.cc (find_hard_regno_for_1): Setup
impossible_start_hard_regs from exclude_start_hard_regs.
* lra-constraints.cc (get_reload_reg): Add arg exclude_start_hard_regs and pass
it lra_create_new_reg[_with_unique_value].
(match_reload): Ditto.
(check_and_process_move): Pass NULL
exclude_start_hard_regs to lra_create_new_reg_with_unique_value.
(goal_alt_exclude_start_hard_regs): New static variable.
(process_addr_reg, simplify_operand_subreg): Pass NULL
exclude_start_hard_regs to lra_create_new_reg_with_unique_value
and get_reload_reg.
(process_alt_operands): Setup goal_alt_exclude_start_hard_regs.
Use this_alternative_exclude_start_hard_regs additionally to find
winning operand alternative.
(base_to_reg, base_plus_disp_to_reg, index_part_to_reg): Pass NULL
exclude_start_hard_regs to lra_create_new_reg.
(process_address_1, emit_inc): Ditto.
(curr_insn_transform): Pass exclude_start_hard_regs value to
lra_create_new_reg, get_reload_reg, match_reload.
(inherit_reload_reg, split_reg): Pass NULL exclude_start_hard_regs
to lra_create_new_reg.
(process_invariant_for_inheritance): Ditto.
* lra-remat.cc (update_scratch_ops): Ditto.
* lra.cc (lra_create_new_reg_with_unique_value): Add arg
exclude_start_hard_regs.  Setup the corresponding member of
lra reg info.
(lra_create_new_reg): Add arg exclude_start_hard_regs and pass it
to lra_create_new_reg_with_unique_value.
(initialize_lra_reg_info_element): Initialize member
exclude_start_hard_regs.
(get_scratch_reg): Pass NULL to lra_create_new_reg.
* ira.cc (setup_prohibited_class_mode_regs): Rename to
setup_prohibited_and_exclude_class_mode_regs and calculate
ira_exclude_class_mode_regs.

gcc/testsuite/ChangeLog:

PR target/103676
* g++.target/arm/pr103676.C: New.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index f294f035d74..e3b3c549120 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -1465,10 +1465,11 @@ setup_reg_class_nregs (void)
 
 
 
-/* Set up IRA_PROHIBITED_CLASS_MODE_REGS and IRA_CLASS_SINGLETON.
-   This function is called once IRA_CLASS_HARD_REGS has been initialized.  */
+/* Set up IRA_PROHIBITED_CLASS_MODE_REGS, IRA_EXCLUDE_CLASS_MODE_REGS, and
+   IRA_CLASS_SINGLETON.  This function is called once IRA_CLASS_HARD_REGS has
+   been initialized.  */
 static void
-setup_prohibited_class_mode_regs (void)
+setup_prohibited_and_exclude_class_mode_regs (void)
 {
   int j, k, hard_regno, cl, last_hard_regno, count;
 
@@ -1480,6 +1481,7 @@ setup_prohibited_class_mode_regs (void)
 	  count = 0;
 	  last_hard_regno = -1;
 	  CLEAR_HARD_REG_SET (ira_prohibited_class_mode_regs[cl][j]);
+	  CLEAR_HARD_REG_SET (ira_exclude_class_mode_regs[cl][j]);
 	  for (k = ira_class_hard_regs_num[cl] - 1; k >= 0; k--)
 	{
 	  hard_regno = ira_class_hard_regs[cl][k];
@@ -1492,6 +1494,10 @@ setup_prohibited_class_mode_regs (void)
 		  last_hard_regno = hard_regno;
 		  count++;
 		}
+	  else
+		{
+		  SET_HARD_REG_BIT (ira_exclude_class_mode_regs[cl][j], hard_regno);
+		}
 	}
 	  ira_class_singleton[cl][j] = (count == 1 ? last_hard_regno : -1);
 	}
@@ -1707,7 +1713,7 @@ ira_init (void)
   

Re: [PATCH] ira: Fix old-reload targets [PR103974]

2022-01-12 Thread Vladimir Makarov via Gcc-patches



On 2022-01-12 03:47, Richard Biener wrote:

On Tue, Jan 11, 2022 at 7:41 PM Vladimir Makarov via Gcc-patches
 wrote:


On 2022-01-11 12:42, Richard Sandiford wrote:

The new IRA heuristics would need more work on old-reload targets,
since flattening needs to be able to undo the cost propagation.
It's doable, but hardly seems worth it.

Agreed.  It is not worth spending your time on work for reload.

This patch therefore makes all the new calls to
ira_subloop_allocnos_can_differ_p return false if !ira_use_lra_p.
The color_pass code that predated the new function (and that was
the source of ira_subloop_allocnos_can_differ_p) continues to
behave as before.

It's a hack, but at least it has the advantage that the new parameter
would become obviously unused if reload and (!)ira_use_lra_p were
removed.  The hack should therefore disappear alongside reload.

I have a feeling that it will stay for a long time if not forever.

We indeed seem to have 34 targets w/o LRA by default and only 15 with :/

At some point Eric wrote a nice summary for how to transition targets
away from CC0, I wonder if there's something similar for transitioning
a port away from reload to LRA?  In those 34 targets there must be some
for which that's a relatively easy task?  I suppose it depends on how
much of the reload target hooks are put to use and how those translate
to LRA.


First of all, the target should get rid of CC0.  Then, theoretically :), 
the target should work with LRA, as LRA uses the existing reload hooks 
(more accurately, a subset of them).


In practice some work has to be done for switching to LRA.  I ported the first 
4 major targets to LRA and unfortunately did not find any 
repeating patterns in this work.  The problems for the first targets 
were mostly unique and required a lot of LRA code modifications.  After 
that, people did the other target switches pretty easily and spent little time 
on them, as I remember.


So, based on my experience of porting targets to LRA, I cannot formalize 
this work.  But it can probably be done by examining the code of all LRA 
targets (mostly looking at machine-dependent code related to the use of 
lra_in_progress_p) or by collecting information about what was done from the 
other people who ported targets to LRA.




Re: [PATCH] ira: Fix old-reload targets [PR103974]

2022-01-11 Thread Vladimir Makarov via Gcc-patches



On 2022-01-11 12:42, Richard Sandiford wrote:

The new IRA heuristics would need more work on old-reload targets,
since flattening needs to be able to undo the cost propagation.
It's doable, but hardly seems worth it.

Agreed.  It is not worth spending your time on work for reload.

This patch therefore makes all the new calls to
ira_subloop_allocnos_can_differ_p return false if !ira_use_lra_p.
The color_pass code that predated the new function (and that was
the source of ira_subloop_allocnos_can_differ_p) continues to
behave as before.

It's a hack, but at least it has the advantage that the new parameter
would become obviously unused if reload and (!)ira_use_lra_p were
removed.  The hack should therefore disappear alongside reload.
I have a feeling that it will stay for a long time if not forever. 
Recently I had the same problem.  My performance patch for IRA resulted 
in an ICE in the reload pass on SH4.

Tested on aarch64-linux-gnu and cris-elf.  OK to install?

OK.  Thank you.



gcc/
PR rtl-optimization/103974
* ira-int.h (ira_subloop_allocnos_can_differ_p): Take an
extra argument, default true, that says whether old-reload
targets should be excluded.
* ira-color.c (color_pass): Pass false.




Re: [PATCH 6/6] ira: Handle "soft" conflicts between cap and non-cap allocnos

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:48, Richard Sandiford wrote:

This patch looks for allocno conflicts of the following form:

- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.

In this case, Y's register is live across L2 but is not used within it,
whereas X's register is used only within L2.  The conflict is therefore
only "soft", in that it can easily be avoided by spilling Y2 inside L2
without affecting any insn references.

In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the
callers then taking Y2's ALLOCNO_MEMORY_COST into account.  There would
then be no "cliff edge" between a Y2 that has no references and a Y2 that
has (say) a single cold reference.

However, doing that isn't necessary for the PR and seems to give
variable results in practice.  (fotonik3d_r improves slightly but
namd_r regresses slightly.)  It therefore seemed better to start
with the higher-value zero-reference case and see how things go.

On top of the previous patches in the series, this fixes the exchange2
regression seen in GCC 11.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_soft_conflict): Declare.
* ira-costs.c (max_soft_conflict_loop_depth): New constant.
(ira_soft_conflict): New function.
(spill_soft_conflicts): Likewise.
(assign_hard_reg): Use them to handle the case described by
the comment above ira_soft_conflict.
(improve_allocation): Likewise.
* ira.c (check_allocation): Allow allocnos with "soft" conflicts
to share the same register.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-4.c: New test.


OK.  If something goes wrong with the patches (e.g. a lot of GCC 
testsuite failures or a performance degradation), we can revert just the 
last 3 of them, as those are the ones actually changing the heuristics.  
But I hope that will not be necessary.


Thank you again for working on the PR.  Fixing it required a big effort 
in thinking, testing, and benchmarking.





Re: [PATCH 5/6] ira: Consider modelling caller-save allocations as loop spills

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:48, Richard Sandiford wrote:

If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:

(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L

(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.

Before the patch we always did (1).  The patch adds support for
picking (2) instead, when it seems cheaper.  It builds on the
earlier support for not propagating conflicts to parent allocnos.
Another cost calculation improvement for calls would be taking into 
account that an allocno can be saved and restored once for several 
subsequent calls (e.g. in one BB).
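The choice between the two strategies amounts to a cost comparison.  A minimal sketch, with made-up frequency and cost parameters; the real decision (ira_caller_save_loop_spill_p) works with allocno frequencies and target-specific move costs:

```c
#include <assert.h>
#include <stdbool.h>

/* Strategy (1): save R before and restore R after each call in L.
   CALL_FREQ_SUM is the summed execution frequency of those calls.  */
static int
save_around_calls_cost (int call_freq_sum, int save_cost, int restore_cost)
{
  return call_freq_sum * (save_cost + restore_cost);
}

/* Strategy (2): spill R to memory throughout L: store on entry,
   memory references inside L (zero if L does not reference A),
   load back on exit.  */
static int
spill_throughout_loop_cost (int entry_freq, int exit_freq,
                            int store_cost, int load_cost,
                            int memory_cost_in_loop)
{
  return entry_freq * store_cost + memory_cost_in_loop
         + exit_freq * load_cost;
}

/* Pick (2) when it is cheaper than (1).  */
static bool
prefer_loop_spill_p (int call_freq_sum, int save_cost, int restore_cost,
                     int entry_freq, int exit_freq,
                     int store_cost, int load_cost, int memory_cost_in_loop)
{
  return spill_throughout_loop_cost (entry_freq, exit_freq, store_cost,
                                     load_cost, memory_cost_in_loop)
         < save_around_calls_cost (call_freq_sum, save_cost, restore_cost);
}
```

Intuitively, hot calls inside the loop make per-call save/restore expensive, while a loop that references A heavily makes spilling throughout expensive instead.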

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_caller_save_cost): New function.
(ira_caller_save_loop_spill_p): Likewise.
* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
cheaper to spill a call-clobbered register throughout a loop rather
than spill it around each individual call.  If so, treat all
call-clobbered registers as conflicts and...
(propagate_allocno_info): ...do not propagate call information
from the child to the parent.
* ira-color.c (move_spill_restore): Update accordingly.
* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-3.c: New test.

OK for me.  Thank you for the patch.



Re: [PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:47, Richard Sandiford wrote:

Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).
My thinking behind propagating hard register conflicts was to avoid 
changing allocations on the region border as much as possible.  The 
solution you are proposing might result in allocating R to the allocno 
and creating moves/loads/stores on the region border when it would be 
possible to assign R to another allocno and another hard reg to the 
allocno in consideration.  As it is all about heuristics, it is hard to 
say, just speculating, what the probability of such a situation is and 
which heuristic is better.  Checking credible benchmarks is the only 
criterion for choosing heuristics, and it seems yours work better.  
Thank you for putting deep thought into improving the existing 
heuristics in this and the following patches, Richard.

However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory.  Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops.  Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts).  Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A).  But this scale factor
doesn't matter to AP.  All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions).  Propagating a factor of 100 would distort the
register costs for AP.
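In code form, the cap described above is simply a min with the spill-throughout-L cost.  Parameter names are illustrative; the real logic lives in ira_propagate_hard_reg_costs:

```c
#include <assert.h>

/* Cost that child allocno A contributes to parent AP for one hard
   register, capped by the cost of spilling A throughout the child
   loop L: store on entry, keep A in memory inside, reload on exit.  */
static int
propagated_reg_cost (int child_reg_cost, int entry_freq, int exit_freq,
                     int store_cost, int load_cost, int child_memory_cost)
{
  int spill_cost = entry_freq * store_cost
                   + child_memory_cost
                   + exit_freq * load_cost;
  return child_reg_cost < spill_cost ? child_reg_cost : spill_cost;
}
```

A huge child register cost (the "100 times greater than ALLOCNO_MEMORY_COST" case) is clamped to the spill cost, while a register that is genuinely cheap for the child propagates unchanged.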

move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it.  Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead.  Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.

Thank you for the patch.  It is ok for me.



Re: [PATCH 3/6] ira: Add ira_subloop_allocnos_can_differ_p

2022-01-07 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:47, Richard Sandiford wrote:

color_pass has two instances of the same code for propagating non-cap
assignments from parent loops to subloops.  This patch adds a helper
function for testing when such propagations are required for correctness
and uses it to remove the duplicated code.

A later patch will use this in ira-build.c too, which is why the
function is exported to ira-int.h.

No functional change intended.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_subloop_allocnos_can_differ_p): New function,
extracted from...
* ira-color.c (color_pass): ...here.

OK.



Re: [PATCH 2/6] ira: Add comments and fix move_spill_restore calculation

2022-01-07 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:46, Richard Sandiford wrote:

This patch adds comments to describe each use of ira_loop_border_costs.
I think this highlights that move_spill_restore was using the wrong cost
in one case, which came from tranposing [0] and [1] in the original
(pre-ira_loop_border_costs) ira_memory_move_cost expressions.  The
difference would only be noticeable on targets that distinguish between
load and store costs.

gcc/
PR rtl-optimization/98782
* ira-color.c (color_pass): Add comments to describe the spill costs.
(move_spill_restore): Likewise.  Fix reversed calculation.

OK for me.  Thank you for fixing the cost typo.



Re: [PATCH 1/6] ira: Add a ira_loop_border_costs class

2022-01-07 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:46, Richard Sandiford wrote:

The final index into (ira_)memory_move_cost is 1 for loads and
0 for stores.  Thus the combination:

   entry_freq * memory_cost[1] + exit_freq * memory_cost[0]

is the cost of loading a register on entry to a loop and
storing it back on exit from the loop.  This is the cost to
use if the register is successfully allocated within the
loop but is spilled in the parent loop.  Similarly:

   entry_freq * memory_cost[0] + exit_freq * memory_cost[1]

is the cost of storing a register on entry to the loop and
restoring it on exit from the loop.  This is the cost to
use if the register is spilled within the loop but is
successfully allocated in the parent loop.
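The two border costs can be written out directly, using the same index convention (memory_cost[1] for loads, memory_cost[0] for stores).  Function and parameter names here are mine for illustration, not the actual members of the ira_loop_border_costs class:

```c
#include <assert.h>

/* Cost when the register is allocated within the loop but spilled in
   the parent: load it on entry to the loop, store it back on exit.  */
static int
spill_outside_loop_cost (int entry_freq, int exit_freq,
                         const int memory_cost[2])
{
  return entry_freq * memory_cost[1] + exit_freq * memory_cost[0];
}

/* Cost when the register is spilled within the loop but allocated in
   the parent: store it on entry to the loop, restore it on exit.  */
static int
spill_inside_loop_cost (int entry_freq, int exit_freq,
                        const int memory_cost[2])
{
  return entry_freq * memory_cost[0] + exit_freq * memory_cost[1];
}
```

On a target where stores and loads cost differently, the two values differ, which is why transposing [0] and [1] (the bug fixed in patch 2) matters.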

The patch adds a helper class for calculating these values and
mechanically replaces the existing instances.  There is no attempt to
editorialise the choice between using “spill inside” and “spill outside”
costs.  (I think one of them is the wrong way round, but a later patch
deals with that.)

No functional change intended.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_loop_border_costs): New class.
* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
New constructor.
(calculate_allocno_spill_cost): Use ira_loop_border_costs.
(color_pass): Likewise.
(move_spill_restore): Likewise.

It is OK for me.



Re: [PATCH 0/6] ira: Fix performance regression in exchange2 [PR98782]

2022-01-07 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:45, Richard Sandiford wrote:

This series of patches recovers the exchange2 performance lost in the
GCC 11 timeframe (at least on aarch64 and Power9 -- thanks Pat for
testing the latter).

There are 6 patches, split into two groups of 3.  The first 3 are just
preparatory patches, although patch 2 does contain a minor bug fix.
The other 3 patches are the ones that together fix the regression.

I realise this is a bit invasive for stage 3.  However, the series is
fixing a large regression in an important benchmark and AFAIK there are
no known acceptable mitigations that we could apply instead.  I think
the series is also working with concepts that already exist in IRA:
it's really about tweaking the cost model used to control them.

We also still have at least 3 months (more realistically 4 months) of
testing before GCC 12 is released.  So perhaps one option would be to
apply any approved version of the series now, but with the understanding
that if there's significant fallout (more than a handful of small tweaks
or fixes), we would simply revert the patches rather than trying to
rework them in-situ.  The series is confined to IRA so reverting it
should be simple.  Would that be OK?


Richard, thank you for working on these issues.

I don't think there is a problem with the GCC development stage here.  
These are patches solving existing PR(s).  Of course, it is better to do 
such changes earlier in stage 3, so IMHO the timing is right.


I don't expect that the changes will result in serious problems like 
wrong code generation or RA crashes, as they are about improving RA 
heuristics.  They can result in new GCC test failures on some targets 
(we have many overconstrained tests expecting exact GCC output).  If 
we are overwhelmed with new failures, we can revert the patches.


The first 3 patches are ok to commit.  I'll look at the remaining 3 this 
weekend and send you my opinion on Monday.  I don't think there will be 
a problem with the last 3 patches.  They clearly improve the RA 
heuristics.  I just need some time to think about them.


Thank you again for picking this difficult PR and working on it.


Each patch bootstrapped & regression-tested individually on
aarch64-linux-gnu.  Also tested as a series on aarch64_be-elf,
arm-linux-gnueabihf, powerpc64le-linux-gnu, and x86_64-linux-gnu.





[committed] [PR99531] Do not scan push insn for ia32 in the test

2021-12-14 Thread Vladimir Makarov via Gcc-patches

This is one more patch for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

The following patch fixes the test failure on ia32.

commit 4ddeae2b2777aa5136fc2bb21c15b0fcccdafece
Author: Vladimir N. Makarov 
Date:   Tue Dec 14 08:57:30 2021 -0500

[PR99531] Do not scan push insn for ia32 in the test

The patch disables scanning for push insns on ia32, as pushes are expected not to be generated only for the x86_64 Linux ABI.

gcc/testsuite/ChangeLog:

PR target/99531
* gcc.target/i386/pr99531.c: Do not scan for ia32.

diff --git a/gcc/testsuite/gcc.target/i386/pr99531.c b/gcc/testsuite/gcc.target/i386/pr99531.c
index 0e1a08b7c77..98536452488 100644
--- a/gcc/testsuite/gcc.target/i386/pr99531.c
+++ b/gcc/testsuite/gcc.target/i386/pr99531.c
@@ -4,4 +4,4 @@
 int func(int, int, int, int, int, int);
 int caller(int a, int b, int c, int d, int e) { return func(0, a, b, c, d, e); }
 
-/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "push"  { target { ! ia32 } } } } */


[committed][PR99531] IRA:Modify pseudo class cost calculation when processing move involving the pseudo and a hard register

2021-12-13 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64.


After some observation, if all is ok, I will commit the patch into the 
GCC release branches mentioned in the PR.


[PR99531] Modify pseudo class cost calculation when processing move involving the pseudo and a hard register

The pseudo class calculated on the 1st iteration should not receive
special treatment in cost calculation when processing a move involving
the pseudo and a hard register.

gcc/ChangeLog:

PR target/99531
* ira-costs.c (record_operand_costs): Do not take pseudo class
calculated on the 1st iteration into account when processing move
involving the pseudo and a hard register.

gcc/testsuite/ChangeLog:

PR target/99531
* gcc.target/i386/pr99531.c: New test.

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index cb5ca8bc21b..d7191dcee3e 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -1310,7 +1310,7 @@ record_operand_costs (rtx_insn *insn, enum reg_class *pref)
 	  machine_mode mode = GET_MODE (SET_SRC (set));
 	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
 	  enum reg_class *cost_classes = cost_classes_ptr->classes;
-	  reg_class_t rclass, hard_reg_class, pref_class, bigger_hard_reg_class;
+	  reg_class_t rclass, hard_reg_class, bigger_hard_reg_class;
 	  int cost, k;
 	  move_table *move_costs;
 	  bool dead_p = find_regno_note (insn, REG_DEAD, REGNO (src));
@@ -1336,23 +1336,6 @@ record_operand_costs (rtx_insn *insn, enum reg_class *pref)
 		  : move_costs[rclass][hard_reg_class]);
 	  
 	  op_costs[i]->cost[k] = cost * frequency;
-	  /* If we have assigned a class to this allocno in our
-		 first pass, add a cost to this alternative
-		 corresponding to what we would add if this allocno
-		 were not in the appropriate class.  */
-	  if (pref)
-		{
-		  if ((pref_class = pref[COST_INDEX (regno)]) == NO_REGS)
-		op_costs[i]->cost[k]
-		  += ((i == 0 ? ira_memory_move_cost[mode][rclass][0] : 0)
-			  + (i == 1 ? ira_memory_move_cost[mode][rclass][1] : 0)
-			  * frequency);
-		  else if (ira_reg_class_intersect[pref_class][rclass]
-			   == NO_REGS)
-		op_costs[i]->cost[k]
-		  += (move_costs[pref_class][rclass]
-			  * frequency);
-		}
 	  /* If this insn is a single set copying operand 1 to
 		 operand 0 and one operand is an allocno with the
 		 other a hard reg or an allocno that prefers a hard
@@ -1378,9 +1361,6 @@ record_operand_costs (rtx_insn *insn, enum reg_class *pref)
 	}
 	  op_costs[i]->mem_cost
 	= ira_memory_move_cost[mode][hard_reg_class][i] * frequency;
-	  if (pref && (pref_class = pref[COST_INDEX (regno)]) != NO_REGS)
-	op_costs[i]->mem_cost
-	  += ira_memory_move_cost[mode][pref_class][i] * frequency;
 	  return;
 	}
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr99531.c b/gcc/testsuite/gcc.target/i386/pr99531.c
new file mode 100644
index 000..0e1a08b7c77
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr99531.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { x86_64-*-linux* } } } */
+/* { dg-options "-O2" } */
+
+int func(int, int, int, int, int, int);
+int caller(int a, int b, int c, int d, int e) { return func(0, a, b, c, d, e); }
+
+/* { dg-final { scan-assembler-not "push" } } */


Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches


On 2021-12-02 12:21, Vladimir Makarov via Gcc-patches wrote:


On 2021-12-02 12:06, Vladimir Makarov wrote:



So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers 
support only targets where this assertion holds.  I guess it is 
better to find this problem earlier on targets (if any) where it is 
not true, *independently* of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang-oriented world, your variant is better, as it permits 
compiling the code with GCC even on targets where the assertion is 
false.



After some more consideration, I think you are right and the backup 
code should be conditional.  Otherwise, there is no point in using 
code with __builtin_smul_overflow.  I'll make the changes.



Here is one more patch I've committed.  Jakub, thank you for the 
discussion and your patience.


commit a72b8f376a176c620f1c1c684f2eee2016e6b4c3
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 12:31:28 2021 -0500

[PR103437] Make backup code for overflow conditional

Switch off the long long variant of the overflow code via the
preprocessor if the build compiler has __builtin_smul_overflow.

gcc/ChangeLog:
PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Switch off backup code
for overflow if compiler has __builtin_smul_overflow.  Use <
for comparison with -INT_MAX.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 3b19a58e1f0..a1b02776e77 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,7 +2797,6 @@ static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
   int i, length, nrefs, priority, max_priority, mult, diff;
-  bool overflow_backup_p = true;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2810,27 +2809,27 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   ira_assert (mult >= 0);
   mult *= ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
   diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
-  /* Multiplication can overflow for very large functions.
-	 Check the overflow and constrain the result if necessary: */
 #ifdef __has_builtin
 #if __has_builtin(__builtin_smul_overflow)
-  overflow_backup_p = false;
+#define HAS_SMUL_OVERFLOW
+#endif
+#endif
+  /* Multiplication can overflow for very large functions.
+	 Check the overflow and constrain the result if necessary: */
+#ifdef HAS_SMUL_OVERFLOW
   if (__builtin_smul_overflow (mult, diff, )
-	  || priority <= -INT_MAX)
+	  || priority < -INT_MAX)
 	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#else
+  static_assert
+	(sizeof (long long) >= 2 * sizeof (int),
+	 "overflow code does not work for such int and long long sizes");
+  long long priorityll = (long long) mult * diff;
+  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+  else
+	priority = priorityll;
 #endif
-#endif
-  if (overflow_backup_p)
-	{
-	  static_assert
-	(sizeof (long long) >= 2 * sizeof (int),
-	 "overflow code does not work for such int and long long sizes");
-	  long long priorityll = (long long) mult * diff;
-	  if (priorityll < -INT_MAX || priorityll > INT_MAX)
-	priority = diff >= 0 ? INT_MAX : -INT_MAX;
-	  else
-	priority = priorityll;
-	}
   allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;


Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 12:06, Vladimir Makarov wrote:


On 2021-12-02 11:13, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 11:03:46AM -0500, Vladimir Makarov wrote:

--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
  setup_allocno_priorities (ira_allocno_t *consideration_allocnos, 
int n)

  {
    int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
    ira_allocno_t a;
      max_priority = 0;
@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t 
*consideration_allocnos, int n)

    diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
    /* Multiplication can overflow for very large functions.
   Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+  overflow_backup_p = false;
    if (__builtin_smul_overflow (mult, diff, )
    || priority <= -INT_MAX)
  priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+  if (overflow_backup_p)
+    {
+  static_assert
+    (sizeof (long long) >= 2 * sizeof (int),
+ "overflow code does not work for such int and long long 
sizes");

+  long long priorityll = (long long) mult * diff;
+  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+    priority = diff >= 0 ? INT_MAX : -INT_MAX;
+  else
+    priority = priorityll;
+    }

So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers 
support only targets where this assertion holds.  I guess it is 
better to find this problem earlier on targets (if any) where it is 
not true, *independently* of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang-oriented world, your variant is better, as it permits 
compiling the code with GCC even on targets where the assertion is 
false.



After some more consideration, I think you are right and the backup 
code should be conditional.  Otherwise, there is no point in using code 
with __builtin_smul_overflow.  I'll make the changes.





Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 11:13, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 11:03:46AM -0500, Vladimir Makarov wrote:

--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
  setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
  {
int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
ira_allocno_t a;
  
max_priority = 0;

@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t 
*consideration_allocnos, int n)
diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
/* Multiplication can overflow for very large functions.
 Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+  overflow_backup_p = false;
if (__builtin_smul_overflow (mult, diff, )
  || priority <= -INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+  if (overflow_backup_p)
+   {
+ static_assert
+   (sizeof (long long) >= 2 * sizeof (int),
+"overflow code does not work for such int and long long sizes");
+ long long priorityll = (long long) mult * diff;
+ if (priorityll < -INT_MAX || priorityll > INT_MAX)
+   priority = diff >= 0 ? INT_MAX : -INT_MAX;
+ else
+   priority = priorityll;
+   }

So simple problem and so many details :)

This will require that long long is at least twice as large as int
everywhere, I thought you wanted to do that only when
__builtin_smul_overflow isn't available.


That is not critical, as GCC and probably all other C++ compilers support 
only targets where this assertion holds.  I guess it is better to find 
this problem earlier on targets (if any) where it is not true 
*independently* of the compiler used.


So it is difficult for me to know what is better.  Probably for a 
GCC/Clang-oriented world, your variant is better, as it permits 
compiling the code with GCC even on targets where the assertion is false.



That would be
#ifdef __has_builtin
#if __has_builtin(__builtin_smul_overflow)
#define HAS_SMUL_OVERFLOW
#endif
#endif
#ifdef HAS_SMUL_OVERFLOW
   if (__builtin_smul_overflow (mult, diff, )
  || priority <= -INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
#else
   static_assert (sizeof (long long) >= 2 * sizeof (int),
 "overflow code does not work for int wider"
 "than half of long long");
   long long priorityll = (long long) mult * diff;
   if (priorityll < -INT_MAX || priorityll > INT_MAX)
priority = diff >= 0 ? INT_MAX : -INT_MAX;
   else
priority = priorityll;
#endif
Why priority <= -INT_MAX in the first case though,
shouldn't that be < -INT_MAX ?


My thought was to avoid an 'always false' warning on targets with a 
non-two's-complement binary representation.  As I remember, C++17 
started to require two's complement integers.  If we require C++17 or 
later, then it is probably better to fix it.


In any case, I feel these details are not my area of expertise.  If you 
believe I should make these changes, please confirm that you want them 
and I'll definitely do this.  Thank you, Jakub.








Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches


On 2021-12-02 10:52, Christophe Lyon wrote:



On Thu, Dec 2, 2021 at 3:38 PM Vladimir Makarov via Gcc-patches wrote:



On 2021-12-02 09:29, Jakub Jelinek wrote:
> On Thu, Dec 02, 2021 at 09:23:20AM -0500, Vladimir Makarov wrote:
>> On 2021-12-02 09:00, Jakub Jelinek wrote:
>>> On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via
Gcc-patches wrote:
>>>> The following patch fixes
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437
>>>>
>>>> The patch was successfully bootstrapped and tested on x86-64.
There is no
>>>> test as the bug occurs on GCC built with sanitizing for an
existing go test.
>>> I'm afraid we can't use __builtin_smul_overflow, not all
system compilers
>>> will have that.
>>> But, as it is done in int and we kind of rely on int being
32-bit on host
>>> and rely on long long being 64-bit, I think you can do
something like:
>>>         long long priorityll = (long long) mult * diff;
>>>         priority = priorityll;
>>>         if (priorityll != priority
>>> ...
>>>
>>>
>> My 1st version of the patch was based on long long but the
standard does not
>> guarantee that int size is smaller than long long size. 
Although it is true
>> for all targets supported by GCC.
>>
>> Another solution would be to switching to int32_t instead of
int for costs
>> but it will require a lot of changes in RA code.
>>
>> I see your point for usage system compiler different from GCC
and LLVM.  I
>> guess I could change it to
>>
>> #if __GNUC__ >= 5
> #ifdef __has_builtin
> # if __has_builtin(__builtin_smul_overflow)
> would be the best check.
> And you can just gcc_assert (sizeof (long long) >= 2 * sizeof
(int));
> in the fallback code ;)

I used static_assert in my 1st patch version.  I think it is
better than
gcc_assert..

I'll commit patch fix today.  Thank you for your feedback, Jakub.


Thanks, I confirm I am seeing build failures with gcc-4.8.5 ;-)

I've committed the following patch with the backup code.  Sorry for 
the inconvenience.


commit 0eb22e619c294efb0f007178a230cac413dccb87
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 10:55:59 2021 -0500

[PR103437] Use long long multiplication as backup for overflow processing

__builtin_smul_overflow can be unavailable for some C++ compilers.
Add long long multiplication as a backup for overflow processing.

gcc/ChangeLog:
PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Use long long
multiplication as backup for overflow processing.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 1f80cbea0e2..3b19a58e1f0 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2797,6 +2797,7 @@ static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
   int i, length, nrefs, priority, max_priority, mult, diff;
+  bool overflow_backup_p = true;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2811,9 +2812,25 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
   /* Multiplication can overflow for very large functions.
 	 Check the overflow and constrain the result if necessary: */
+#ifdef __has_builtin
+#if __has_builtin(__builtin_smul_overflow)
+  overflow_backup_p = false;
   if (__builtin_smul_overflow (mult, diff, )
 	  || priority <= -INT_MAX)
 	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+#endif
+#endif
+  if (overflow_backup_p)
+	{
+	  static_assert
+	(sizeof (long long) >= 2 * sizeof (int),
+	 "overflow code does not work for such int and long long sizes");
+	  long long priorityll = (long long) mult * diff;
+	  if (priorityll < -INT_MAX || priorityll > INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+	  else
+	priority = priorityll;
+	}
   allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;


Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 09:29, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 09:23:20AM -0500, Vladimir Makarov wrote:

On 2021-12-02 09:00, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via Gcc-patches 
wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64. There is no
test as the bug occurs on GCC built with sanitizing for an existing go test.

I'm afraid we can't use __builtin_smul_overflow, not all system compilers
will have that.
But, as it is done in int and we kind of rely on int being 32-bit on host
and rely on long long being 64-bit, I think you can do something like:
long long priorityll = (long long) mult * diff;
priority = priorityll;
if (priorityll != priority
...



My 1st version of the patch was based on long long, but the standard does not
guarantee that the int size is smaller than the long long size, although it
is true for all targets supported by GCC.

Another solution would be switching to int32_t instead of int for costs,
but it would require a lot of changes in the RA code.

I see your point about using a system compiler different from GCC and LLVM.  I
guess I could change it to

#if __GNUC__ >= 5

#ifdef __has_builtin
# if __has_builtin(__builtin_smul_overflow)
would be the best check.
And you can just gcc_assert (sizeof (long long) >= 2 * sizeof (int));
in the fallback code ;)


I used static_assert in my 1st patch version.  I think it is better 
than gcc_assert.


I'll commit patch fix today.  Thank you for your feedback, Jakub.



Re: [PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches



On 2021-12-02 09:00, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 08:53:31AM -0500, Vladimir Makarov via Gcc-patches 
wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64. There is no
test as the bug occurs on GCC built with sanitizing for an existing go test.

I'm afraid we can't use __builtin_smul_overflow, not all system compilers
will have that.
But, as it is done in int and we kind of rely on int being 32-bit on host
and rely on long long being 64-bit, I think you can do something like:
   long long priorityll = (long long) mult * diff;
   priority = priorityll;
   if (priorityll != priority
...


My 1st version of the patch was based on long long, but the standard 
does not guarantee that the int size is smaller than the long long 
size, although it is true for all targets supported by GCC.


Another solution would be to switching to int32_t instead of int for 
costs but it will require a lot of changes in RA code.


I see your point about using a system compiler different from GCC or LLVM.
I guess I could change it to


#if __GNUC__ >= 5

current code

#else

long long code

#endif


What do you think?




[PR103437] [committed] IRA: Process multiplication overflow in priority calculation for allocno assignments

2021-12-02 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437

The patch was successfully bootstrapped and tested on x86-64. There is 
no test as the bug occurs on GCC built with sanitizing for an existing 
go test.
commit c6cf5ac1522c54b2ced98fc687e973a9ff17ba1e
Author: Vladimir N. Makarov 
Date:   Thu Dec 2 08:29:45 2021 -0500

[PR103437] Process multiplication overflow in priority calculation for allocno assignments

We process overflows in cost calculations but for huge functions
priority calculation can overflow as priority can be bigger the cost
used for it.  The patch fixes the problem.

gcc/ChangeLog:

PR rtl-optimization/103437
* ira-color.c (setup_allocno_priorities): Process multiplication
overflow.

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 3d01c60800c..1f80cbea0e2 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2796,7 +2796,7 @@ static int *allocno_priorities;
 static void
 setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
 {
-  int i, length, nrefs, priority, max_priority, mult;
+  int i, length, nrefs, priority, max_priority, mult, diff;
   ira_allocno_t a;
 
   max_priority = 0;
@@ -2807,11 +2807,14 @@ setup_allocno_priorities (ira_allocno_t *consideration_allocnos, int n)
   ira_assert (nrefs >= 0);
   mult = floor_log2 (ALLOCNO_NREFS (a)) + 1;
   ira_assert (mult >= 0);
-  allocno_priorities[ALLOCNO_NUM (a)]
-	= priority
-	= (mult
-	   * (ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a))
-	   * ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
+  mult *= ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)];
+  diff = ALLOCNO_MEMORY_COST (a) - ALLOCNO_CLASS_COST (a);
+  /* Multiplication can overflow for very large functions.
+	 Check the overflow and constrain the result if necessary: */
+  if (__builtin_smul_overflow (mult, diff, &priority)
+	  || priority <= -INT_MAX)
+	priority = diff >= 0 ? INT_MAX : -INT_MAX;
+  allocno_priorities[ALLOCNO_NUM (a)] = priority;
   if (priority < 0)
 	priority = -priority;
   if (max_priority < priority)


[committed] [PR102842] LRA: Consider all outputs in generation of matching reloads

2021-10-26 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102842

As the patch touches a sensitive LRA code, the patch was bootstrapped 
tested on x86-64, aarch64, and ppc64.


I've committed the patch only in master branch.  Later (after some 
observation), I'll commit it into gcc-10 and gcc-11 branches.


commit 8c59f4118357789cfa8df2cf0d3ecb61be7e9041
Author: Vladimir N. Makarov 
Date:   Tue Oct 26 14:03:42 2021 -0400

[PR102842] Consider all outputs in generation of matching reloads

Without considering all output insn operands (not only processed
before), in rare cases LRA can use the same hard register for
different outputs of the insn on different assignment subpasses.  The
patch fixes the problem.

gcc/ChangeLog:

PR rtl-optimization/102842
* lra-constraints.c (match_reload): Ignore out in checking values
of outs.
(curr_insn_transform): Collect outputs before doing reloads of operands.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102842
* g++.target/arm/pr102842.C: New test.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 8f75125fc2e..0195b4fb9c3 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1102,7 +1102,7 @@ match_reload (signed char out, signed char *ins, signed char *outs,
 	  for (i = 0; outs[i] >= 0; i++)
 	{
 	  rtx other_out_rtx = *curr_id->operand_loc[outs[i]];
-	  if (REG_P (other_out_rtx)
+	  if (outs[i] != out && REG_P (other_out_rtx)
 		  && (regno_val_use_in (REGNO (in_rtx), other_out_rtx)
 		  != NULL_RTX))
 		{
@@ -4382,7 +4382,10 @@ curr_insn_transform (bool check_only_p)
   }
 
   n_outputs = 0;
-  outputs[0] = -1;
+  for (i = 0; i < n_operands; i++)
+if (curr_static_id->operand[i].type == OP_OUT)
+  outputs[n_outputs++] = i;
+  outputs[n_outputs] = -1;
   for (i = 0; i < n_operands; i++)
 {
   int regno;
@@ -4457,8 +4460,6 @@ curr_insn_transform (bool check_only_p)
 		 lra-lives.c.  */
 		  match_reload (i, goal_alt_matched[i], outputs, goal_alt[i],
 				&before, &after, TRUE);
-		  outputs[n_outputs++] = i;
-		  outputs[n_outputs] = -1;
 		}
 	  continue;
 	}
@@ -4636,14 +4637,6 @@ curr_insn_transform (bool check_only_p)
 	   process_alt_operands decides that it is possible.  */
 	gcc_unreachable ();
 
-  /* Memorise processed outputs so that output remaining to be processed
-	 can avoid using the same register value (see match_reload).  */
-  if (curr_static_id->operand[i].type == OP_OUT)
-	{
-	  outputs[n_outputs++] = i;
-	  outputs[n_outputs] = -1;
-	}
-
   if (optional_p)
 	{
 	  rtx reg = op;
diff --git a/gcc/testsuite/g++.target/arm/pr102842.C b/gcc/testsuite/g++.target/arm/pr102842.C
new file mode 100644
index 000..a2bac66091a
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/pr102842.C
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/102842 */
+/* { dg-do compile } */
+/* { dg-options "-fPIC  -O2 -fno-omit-frame-pointer -mthumb -march=armv7-a+fp" } */
+
+struct Plane {
+  using T = float;
+  T *Row();
+};
+using ImageF = Plane;
+long long Mirror_x;
+struct EnsurePaddingInPlaceRowByRow {
+  void Process() {
+switch (strategy_) {
+case kSlow:
+  float *row = img_.Row();
+  long long xsize = x1_;
+  while (Mirror_x >= xsize)
+if (Mirror_x)
+  Mirror_x = 2 * xsize - 1;
+  *row = Mirror_x;
+}
+  }
+  ImageF img_;
+  unsigned x1_;
+  enum { kSlow } strategy_;
+};
+void FinalizeImageRect() {
+  EnsurePaddingInPlaceRowByRow ensure_padding;
+  ensure_padding.Process();
+}


[committed] LRA: [PR102627] Use at least natural mode during splitting hard reg live range

2021-10-08 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102627

The patch was successfully bootstrapped and tested on x86-64.


commit fab2d977e69539aad9bef81caff17de48e53aedf (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Oct 8 10:16:09 2021 -0400

[PR102627] Use at least natural mode during splitting hard reg live range

In the PR test case SImode was used to split live range of cx on x86-64
because it was the biggest mode for this hard reg in the function.  But
all 64-bits of cx contain structure members.  We need always to use at least
natural mode of hard reg in splitting to fix this problem.

gcc/ChangeLog:

PR rtl-optimization/102627
* lra-constraints.c (split_reg): Use at least natural mode of hard reg.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102627
* gcc.target/i386/pr102627.c: New test.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 4d734548c38..8f75125fc2e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5799,11 +5799,12 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 	 part of a multi-word register.  In that case, just use the reg_rtx
 	 mode.  Do the same also if the biggest mode was larger than a register
 	 or we can not compare the modes.  Otherwise, limit the size to that of
-	 the biggest access in the function.  */
+	 the biggest access in the function or to the natural mode at least.  */
   if (mode == VOIDmode
 	  || !ordered_p (GET_MODE_PRECISION (mode),
 			 GET_MODE_PRECISION (reg_rtx_mode))
-	  || paradoxical_subreg_p (mode, reg_rtx_mode))
+	  || paradoxical_subreg_p (mode, reg_rtx_mode)
+	  || maybe_gt (GET_MODE_PRECISION (reg_rtx_mode), GET_MODE_PRECISION (mode)))
 	{
 	  original_reg = regno_reg_rtx[hard_regno];
 	  mode = reg_rtx_mode;
diff --git a/gcc/testsuite/gcc.target/i386/pr102627.c b/gcc/testsuite/gcc.target/i386/pr102627.c
new file mode 100644
index 000..8ab9acaf002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102627.c
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/102627 */
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+int a, f, l, m, q, c, d, g;
+long b, e;
+struct g {
+  signed h;
+  signed i;
+  unsigned j;
+  unsigned k;
+};
+unsigned n;
+char o;
int *p = &m;
+long r(int s) { return s && b ?: b; }
+long __attribute__((noipa)) v() {
+  l = 0 || r(n & o);
+  return q;
+}
+void w(int, unsigned, struct g x) {
+  c ?: a;
+  for (; d < 2; d++)
+*p = x.k;
+}
+struct g __attribute__((noipa)) y() {
+  struct g h = {3, 908, 1, 20};
+  for (; g; g++)
+;
+  return h;
+}
+int main() {
+  long t;
+  struct g u = y();
+  t = e << f;
+  w(0, t, u);
+  v(0, 4, 4, 4);
+  if (m != 20)
+__builtin_abort ();
+  return 0;
+}


[pushed] IRA: Make profitability calculation of RA conflict presentations independent of host compiler type sizes [PR102147]

2021-09-24 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102147

The patch was successfully bootstrapped and tested on x86-64.


commit ec4c30b64942e615b4bb4b9761cd3b2635158608 (HEAD -> master)
Author: Vladimir N. Makarov 
Date:   Fri Sep 24 10:06:45 2021 -0400

    Make profitability calculation of RA conflict presentations independent of host compiler type sizes. [PR102147]


    gcc/ChangeLog:

    2021-09-24  Vladimir Makarov  

    PR rtl-optimization/102147
    * ira-build.c (ira_conflict_vector_profitable_p): Make
    profitability calculation independent of host compiler pointer and
    IRA_INT_BITS sizes.

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 42120656366..2a30efc4f2f 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -629,7 +629,7 @@ ior_hard_reg_conflicts (ira_allocno_t a, const_hard_reg_set set)
 
 bool
 ira_conflict_vector_profitable_p (ira_object_t obj, int num)
 {
-  int nw;
+  int nbytes;
   int max = OBJECT_MAX (obj);
   int min = OBJECT_MIN (obj);
 
@@ -638,9 +638,14 @@ ira_conflict_vector_profitable_p (ira_object_t obj, int num)
        in allocation.  */
     return false;
 
-  nw = (max - min + IRA_INT_BITS) / IRA_INT_BITS;
-  return (2 * sizeof (ira_object_t) * (num + 1)
-	  < 3 * nw * sizeof (IRA_INT_TYPE));
+  nbytes = (max - min) / 8 + 1;
+  STATIC_ASSERT (sizeof (ira_object_t) <= 8);
+  /* Don't use sizeof (ira_object_t), use constant 8.  Size of ira_object_t (a
+     pointer) is different on 32-bit and 64-bit targets.  Usage sizeof
+     (ira_object_t) can result in different code generation by GCC built as 32-
+     and 64-bit program.  In any case the profitability is just an estimation
+     and border cases are rare.  */
+  return (2 * 8 /* sizeof (ira_object_t) */ * (num + 1) < 3 * nbytes);
 }

 /* Allocates and initialize the conflict vector of OBJ for NUM



Re: [PATCH v4] ira: Support more matching constraint forms with param [PR100328]

2021-07-05 Thread Vladimir Makarov via Gcc-patches



On 2021-07-01 10:11 p.m., Kewen.Lin wrote:

Hi Vladimir,

on 2021/6/30 下午11:24, Vladimir Makarov wrote:


Many thanks for your review!  I've updated the patch according to your comments 
and also polished some comments and document words a bit.  Does it look better 
to you?

Sorry for the delay with the answer.  The patch is better for me now and 
can be committed into the trunk.


Thanks again for working on this performance issue.




Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-06-30 Thread Vladimir Makarov via Gcc-patches



On 2021-06-28 2:26 a.m., Kewen.Lin wrote:

Hi!

on 2021/6/9 下午1:18, Kewen.Lin via Gcc-patches wrote:

Hi,

PR100328 has some details about this issue, I am trying to
brief it here.  In the hottest function LBM_performStreamCollideTRT
of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
(27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
insn has two flavors: FLOAT_REG and VSX_REG, the VSX_REG reg
class have 64 registers whose foregoing 32 ones make up the
whole FLOAT_REG.  There are some differences for these two
flavors, taking "*fma4_fpr" as example:

(define_insn "*fma4_fpr"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
(fma:SFDF
  (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
  (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
  (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]

// wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
//  (f/d) => A floating point register, aka. FLOAT_REG.

So for VSX_REG, we only have the destructive form, when VSX_REG
alternative being used, the operand 2 or operand 3 is required
to be the same as operand 0.  reload has to take care of this
constraint and create some non-free register copies if required.

Assuming one fma insn looks like:
   op0 = FMA (op1, op2, op3)

The best regclass of them are VSX_REG, when op1,op2,op3 are all dead,
IRA simply creates three shuffle copies for them (here the operand
order matters, since with the same freq, the one with smaller number
takes preference), but IMO both op2 and op3 should take higher priority
in copy queue due to the matching constraint.

I noticed that there is one function ira_get_dup_out_num, which meant
to create this kind of constraint copy, but the below code looks to
refuse to create if there is an alternative which has valid regclass
without spilled need.

   default:
{
  enum constraint_num cn = lookup_constraint (str);
  enum reg_class cl = reg_class_for_constraint (cn);
  if (cl != NO_REGS
  && !targetm.class_likely_spilled_p (cl))
goto fail

 ...

I cooked one patch attached to make ira respect this kind of matching
constraint guarded with one parameter.  As I stated in the PR, I was
not sure this is on the right track.  The RFC patch is to check the
matching constraint in all alternatives, if there is one alternative
with matching constraint and matches the current preferred regclass
(or best of allocno?), it will record the output operand number and
further create one constraint copy for it.  Normally it can get the
priority against shuffle copies and the matching constraint will get
satisfied with higher possibility, reload doesn't create extra copies
to meet the matching constraint or the desirable register class when
it has to.

For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay
as shuffle copies, and later any of A,B,C,D gets assigned by one
hardware register which is a VSX register (VSX_REG) but not a FP
register (FLOAT_REG), which means it has to pay costs once we can NOT
go with VSX alternatives, so at that time it's important to respect
the matching constraint then we can increase the freq for the remaining
copies related to this (A/B, A/C, A/D).  This idea requires some side
tables to record some information and seems a bit complicated in the
current framework, so the proposed patch aggressively emphasizes the
matching constraint at the time of creating copies.


Comparing with the original patch (v1), this patch v3 has
considered: (this should be v2 for this mail list, but bump
it to be consistent as PR's).

   - Excluding the case where for one preferred register class
 there can be two or more alternatives, one of them has the
 matching constraint, while another doesn't have.  So for
 the given operand, even if it's assigned by a hardware reg
 which doesn't meet the matching constraint, it can simply
 use the alternative which doesn't have matching constraint
 so no register move is needed.  One typical case is
 define_insn *mov_internal2 on rs6000.  So we
 shouldn't create constraint copy for it.

   - The possible free register move in the same register class,
 disable this if so since the register move to meet the
 constraint is considered as free.

   - Making it on by default, suggested by Segher & Vladimir, we
 hope to get rid of the parameter if the benchmarking result
 looks good on major targets.

   - Tweaking cost when either of matching constraint two sides
 is hardware register.  Before this patch, the constraint
 copy is simply taken as a real move insn for pref and
 conflict cost with one hardware register, after this patch,
 it's allowed that there are several input operands
 respecting the same matching constraint (but in different
 alternatives), so we should take it to be like shuffle copy
 for some cases to avoid over 

Re: [backport gcc10, gcc9] Requet to backport PR97969

2021-05-31 Thread Vladimir Makarov via Gcc-patches



On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:

Hi,
Just a follow up after GCC 11 release.

I've backported to gcc-10 branch (without any change to original patches)
PR97969 and following PR98722 & PR98777 patches.

Commits apply cleanly without changes.
Built and regression tested on:
* arm-none-eabi and
* aarch64-none-linux-gnu cross toolchains.

There were no issues and no regressions (all OK).

OK for backport to gcc-10 branch ?


Sorry for delay with the answer due to my vacation.

As the patches did not introduce new PRs I believe they are ok for gcc-10.

Thank you.



Kind regards,
Przemyslaw Wirkus

---
commits I've backported:

commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
Author: Vladimir N. Makarov 
Date:   Tue Jan 12 11:26:15 2021 -0500

 [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), pseudo)` 
after elimination

commit 4334b524274203125193a08a8485250c41c2daa9
Author: Vladimir N. Makarov 
Date:   Wed Jan 20 11:40:14 2021 -0500

 [PR98722] LRA: Check that target has no 3-op add insn to transform 2 plus 
expression.

commit 68ba1039c7daf0485b167fe199ed7e8031158091
Author: Vladimir N. Makarov 
Date:   Thu Jan 21 17:27:01 2021 -0500

 [PR98777] LRA: Use preliminary created pseudo for in LRA elimination 
subpass

$ ./contrib/git-backport.py cf2ac1c30af0fa783c8d72e527904dda5d8cc330
$ ./contrib/git-backport.py 4334b524274203125193a08a8485250c41c2daa9
$ ./contrib/git-backport.py 68ba1039c7daf0485b167fe199ed7e8031158091







Re: [PATCH] lra: Avoid cycling on certain subreg reloads [PR96796]

2021-04-23 Thread Vladimir Makarov via Gcc-patches



On 2021-04-23 12:13 p.m., Richard Sandiford wrote:

This is a backport of the PR96796 fix to GCC 10 and GCC 9.  The original
trunk patch was:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552878.html

reviewed here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553308.html


...


This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK for GCC 10
and GCC 9?

Yes.  I think as the previous patch did not introduced new issues and 
this patch works in less cases, the patch is ok for GCC10 and GCC9 
branches.  I definitely like this version of the patch more.


Thank you, Richard, for working on this issue.


gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
---
  gcc/lra-constraints.c | 59 +++
  gcc/testsuite/gcc.c-torture/compile/pr96796.c | 56 ++
  2 files changed, 105 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr96796.c

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7cc479b3042..29a734e0e10 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -235,12 +235,17 @@ get_reg_class (int regno)
 CL.  Use elimination first if REG is a hard register.  If REG is a
 reload pseudo created by this constraints pass, assume that it will
 be allocated a hard register from its allocno class, but allow that
-   class to be narrowed to CL if it is currently a superset of CL.
+   class to be narrowed to CL if it is currently a superset of CL and
+   if either:
+
+   - ALLOW_ALL_RELOAD_CLASS_CHANGES_P is true or
+   - the instruction we're processing is not a reload move.
  
 If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class of
 REGNO (reg), or NO_REGS if no change in its class was needed.  */
  static bool
-in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
+in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
+   bool allow_all_reload_class_changes_p = false)
  {
enum reg_class rclass, common_class;
machine_mode reg_mode;
@@ -267,7 +272,8 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
 typically moves that have many alternatives, and restricting
 reload pseudos for one alternative may lead to situations
 where other reload pseudos are no longer allocatable.  */
-  || (INSN_UID (curr_insn) >= new_insn_uid_start
+  || (!allow_all_reload_class_changes_p
+ && INSN_UID (curr_insn) >= new_insn_uid_start
  && src != NULL
  && ((REG_P (src) || MEM_P (src))
  || (GET_CODE (src) == SUBREG
@@ -570,13 +576,12 @@ init_curr_insn_input_reloads (void)
curr_insn_input_reloads_num = 0;
  }
  
-/* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already
-   created input reload pseudo (only if TYPE is not OP_OUT).  Don't
-   reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
-   wrapped up in SUBREG.  The result pseudo is returned through
-   RESULT_REG.  Return TRUE if we created a new pseudo, FALSE if we
-   reused the already created input reload pseudo.  Use TITLE to
-   describe new registers for debug purposes.  */
+/* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse an existing
+   reload pseudo.  Don't reuse an existing reload pseudo if IN_SUBREG_P
+   is true and the reused pseudo should be wrapped up in a SUBREG.
+   The result pseudo is returned through RESULT_REG.  Return TRUE if we
+   created a new pseudo, FALSE if we reused an existing reload pseudo.
+   Use TITLE to describe new registers for debug purposes.  */
  static bool
  get_reload_reg (enum op_type type, machine_mode mode, rtx original,
enum reg_class rclass, bool in_subreg_p,
@@ -588,6 +593,40 @@ get_reload_reg (enum op_type type, machine_mode mode, rtx original,
 
   if (type == OP_OUT)
     {
+  /* Output reload registers tend to start out with a conservative
+choice of register class.  Usually this is ALL_REGS, although
+a target might narrow it (for performance reasons) through
+targetm.preferred_reload_class.  It's therefore quite common
+for a reload instruction to require a more restrictive class
+than the class that was originally assigned to the reload 

[committed] [PR100066] Check paradoxical subreg when splitting hard reg live range

2021-04-14 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100066

The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64.


commit f99f64f69db49ce6343d79a39eab28dcc6b91865
Author: Vladimir N. Makarov 
Date:   Wed Apr 14 13:21:40 2021 -0400

[PR100066] Check paradoxical subreg when splitting hard reg live range

When splitting the live range of a hard reg, LRA actually splits a
multi-word register containing the hard reg.  So we need to check the
biggest used mode of the hard reg for a paradoxical subregister when
the natural and the biggest modes can be ordered.

gcc/ChangeLog:

PR rtl-optimization/100066
* lra-constraints.c (split_reg): Check paradoxical_subreg_p for
ordered modes when choosing splitting mode for hard reg.

gcc/testsuite/ChangeLog:

PR rtl-optimization/100066
* gcc.target/i386/pr100066.c: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 62bcfc31772..9425f2d7e73 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5797,10 +5797,14 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
   mode = lra_reg_info[hard_regno].biggest_mode;
   machine_mode reg_rtx_mode = GET_MODE (regno_reg_rtx[hard_regno]);
   /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as
-	 part of a multi-word register.  In that case, just use the reg_rtx.
-	 Otherwise, limit the size to that of the biggest access in the
-	 function.  */
-  if (mode == VOIDmode)
+	 part of a multi-word register.  In that case, just use the reg_rtx
+	 mode.  Do the same also if the biggest mode was larger than a register
+	 or we can not compare the modes.  Otherwise, limit the size to that of
+	 the biggest access in the function.  */
+  if (mode == VOIDmode
+	  || !ordered_p (GET_MODE_PRECISION (mode),
+			 GET_MODE_PRECISION (reg_rtx_mode))
+	  || paradoxical_subreg_p (mode, reg_rtx_mode))
 	{
 	  original_reg = regno_reg_rtx[hard_regno];
 	  mode = reg_rtx_mode;
diff --git a/gcc/testsuite/gcc.target/i386/pr100066.c b/gcc/testsuite/gcc.target/i386/pr100066.c
new file mode 100644
index 000..a795864e172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100066.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { int128 } } } */
+/* { dg-options "-O1 -w" } */
+int pm;
+
+void
+w3 (int, int, int);
+
+void
+e6 (__int128 rt, long int mo)
+{
+  mo += rt / 0;
+  w3 (pm / mo, pm, 0);
+}

