Re: [PATCH] xtensa: Correct the relative RTX cost that corresponds to the Move Immediate "MOVI" instruction

2022-07-18 Thread Max Filippov via Gcc-patches
On Mon, Jul 18, 2022 at 5:47 AM Takayuki 'January June' Suwa
 wrote:
>
> This patch corrects the overestimation of the relative cost of
> '(set (reg) (const_int N))' where N fits into the instruction itself.
>
> In fact, such overestimation confuses the RTL loop invariant motion pass.
> As a result, it brings almost no negative impact from the speed point of
> view, but addtiional reg-reg move instructions and register allocation
> pressure about the size.
>
...
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_rtx_costs):
> Change the relative cost of '(set (reg) (const_int N))' where
> N fits into signed 12-bit from 4 to 0 if optimizing for size.
> And use the appropriate macro instead of the bare number 4.
> ---
>  gcc/config/xtensa/xtensa.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


RE: [PATCH] Add a heuristic for eliminate redundant load and store in inline pass.

2022-07-18 Thread Cui, Lili via Gcc-patches
Hi Honza,
Gentle ping  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597891.html

Thanks,
Lili.

> -Original Message-
> From: Gcc-patches  On
> Behalf Of Cui, Lili via Gcc-patches
> Sent: Sunday, July 10, 2022 10:05 PM
> To: Jan Hubicka 
> Cc: Lu, Hongjiu ; Liu, Hongtao
> ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Add a heuristic for eliminate redundant load and store
> in inline pass.
> 
> 
> > -Original Message-
> > From: Jan Hubicka  This is interesting idea.
> > Basically we want to guess if inlining will
> > make SRA and or strore->load propagation possible.   I think the
> > solution using INLINE_HINT may be bit too trigger happy, since it is
> > very common that this happens and with -O3 the hints are taken quite
> sriously.
> >
> > We already have mechanism to predict this situaiton by simply
> > expeciting that stores to addresses pointed to by function parameter
> > will be eliminated by 50%.  See eliminated_by_inlining_prob.
> >
> > I was thinking that we may combine it with a knowledge that the
> > parameter points to caller local memory (which is done by llvm's
> > heuristics) which can be added to IPA predicates.
> >
> > The idea of checking that the actual sotre in question is paired with
> > load at caller side is bit harder: one needs to invent representation
> > for such conditions.  So I wonder how much extra help we need for
> > critical inlning to happen at imagemagics?
> 
> Hi Honza,
> 
> Really appreciate for the feedback. I found that eliminated_by_inlining_prob
> does eliminated  the stmt 50% of the time, but the gap is still big.
> SRA cannot split callee's parameter for "Do not decompose non-BLKmode
> parameters in a way that would create a BLKmode parameter. Especially for
> pass-by-reference (hence, pointer type parameters), it's not worth it."
> 
> Critical inline function information
> 
> Caller:GetVirtualPixelsFromNexus
> size:541
> time:  484.08
> e->freq: 0.83
> 
> Callee:SetPixelCacheNexusPixels
> nonspec time: 46.60
> time : 36.18
> size:87
> 
> 
> Since the insns number 87 of callee function is bigger than inline_insns_auto
> (30) and there is no hint, so inline depends on "big_speedup_p (e)". 484.08
> (caller_time) * 0.15 (param_inline_min_speedup == 15)   = 72.61,  which
> means callee's time should be at least 72.61, but callee's time is 46.60, so 
> we
> need to lower param_inline_min_speedup to 3 or 4. I checked the
> history(https://gcc.gnu.org/bugzilla/show_bug.cgi?format=multiple=8366
> 5), that you tried changing it to 8,  but that increases the gzip code size by
> 2.5KB. so I want to add a heuristic hit for it.
> 
> Thanks,
> Lili.
> >
> > Honza



[PATCH] avr: Removed errant control characters

2022-07-18 Thread Joel Holdsworth via Gcc-patches
Signed-off-by: Joel Holdsworth 
---
 gcc/config/avr/avr-devices.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/avr/avr-devices.cc b/gcc/config/avr/avr-devices.cc
index aa284217f50..ff6a5441b77 100644
--- a/gcc/config/avr/avr-devices.cc
+++ b/gcc/config/avr/avr-devices.cc
@@ -126,8 +126,6 @@ avr_mcu_types[] =
 };
 
 
-
-
 #ifndef IN_GEN_AVR_MMCU_TEXI
 
 static char*
-- 
2.35.GIT



Re: [PATCH] c++: shortcut bad reference bindings [PR94894]

2022-07-18 Thread Jason Merrill via Gcc-patches

On 7/18/22 12:59, Patrick Palka wrote:

In case of l/rvalue or cv-qual mismatch during reference binding, we try
to give more helpful diagnostics by attempting a bad conversion that
ignores the mismatch.  But in doing so, we may end up instantiating an
ill-formed conversion function, something that would otherwise be
avoided if we didn't try for the better diagnostic in the first place.
We could just give up on producing a good diagnostic in this case, but
ideally we could keep the good diagnostics while avoiding unnecessary
template instantiations.

To that end, this patch adapts the bad conversion shortcutting mechanism
from r12-3346-g47543e5f9d1fc5 to handle this situation as well.  The main
observation from there applies here as well: when there's a strictly
viable candidate, we don't care about the distinction between an unviable
and non-strictly viable candidate.  And in turn it also means we don't
really care about the distinction between an invalid and bad conversion.
Of course, we don't know whether there's a strictly viable candidate until
after the fact, so we still need to keep track of when we "shortcutted"
distinguishing between an invalid and bad conversion.  This patch adds a
new kind of conversion, ck_shortcutted, to represent such conversions.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/94894
PR c++/105766
PR c++/106201

gcc/cp/ChangeLog:

* call.cc (enum conversion_kind): Add ck_shortcutted enumerator.
(has_next): Return false for it.
(reference_binding): Return a ck_shortcutted conversion instead
of an actual bad conversion when LOOKUP_SHORTCUT_BAD_CONVS is set.
Remove now obsolete early exit for the incomplete target type case.
(implicit_conversion_1): Don't mask out LOOKUP_SHORTCUT_BAD_CONVS.
(add_function_candidate): Set LOOKUP_SHORTCUT_BAD_CONVS iff
shortcut_bad_convs.
(missing_conversion_p): Return true for a ck_shortcutted
conversion.
* cp-tree.h (LOOKUP_SHORTCUT_BAD_CONVS): Define.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/ref8.C: New test.
* g++.dg/conversion/ref9.C: New test.
---
  gcc/cp/call.cc | 87 --
  gcc/cp/cp-tree.h   |  5 ++
  gcc/testsuite/g++.dg/conversion/ref8.C | 22 +++
  gcc/testsuite/g++.dg/conversion/ref9.C | 21 +++
  4 files changed, 103 insertions(+), 32 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/conversion/ref8.C
  create mode 100644 gcc/testsuite/g++.dg/conversion/ref9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index fc98552fda2..ca2f5fbca39 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -59,7 +59,8 @@ enum conversion_kind {
ck_ambig,
ck_list,
ck_aggr,
-  ck_rvalue
+  ck_rvalue,
+  ck_shortcutted,


Please give this a comment explaining how it's used.  OK with that change.

Maybe ck_deferred_bad?


  };
  
  /* The rank of the conversion.  Order of the enumerals matters; better

@@ -775,7 +776,8 @@ has_next (conversion_kind code)
return !(code == ck_identity
   || code == ck_ambig
   || code == ck_list
-  || code == ck_aggr);
+  || code == ck_aggr
+  || code == ck_shortcutted);
  }
  
  static conversion *

@@ -1912,18 +1914,38 @@ reference_binding (tree rto, tree rfrom, tree expr, 
bool c_cast_p, int flags,
   difference in top-level cv-qualification is subsumed by the
   initialization itself and does not constitute a conversion.  */
  
+  bool maybe_valid_p = true;

+
/* [dcl.init.ref]
  
   Otherwise, the reference shall be an lvalue reference to a

   non-volatile const type, or the reference shall be an rvalue
- reference.
+ reference.  */
+  if (!CP_TYPE_CONST_NON_VOLATILE_P (to) && !TYPE_REF_IS_RVALUE (rto))
+maybe_valid_p = false;
  
- We try below to treat this as a bad conversion to improve diagnostics,

- but if TO is an incomplete class, we need to reject this conversion
- now to avoid unnecessary instantiation.  */
-  if (!CP_TYPE_CONST_NON_VOLATILE_P (to) && !TYPE_REF_IS_RVALUE (rto)
-  && !COMPLETE_TYPE_P (to))
-return NULL;
+  /* [dcl.init.ref]
+
+ Otherwise, a temporary of type "cv1 T1" is created and
+ initialized from the initializer expression using the rules for a
+ non-reference copy initialization.  If T1 is reference-related to
+ T2, cv1 must be the same cv-qualification as, or greater
+ cv-qualification than, cv2; otherwise, the program is ill-formed.  */
+  if (related_p && !at_least_as_qualified_p (to, from))
+maybe_valid_p = false;
+
+  /* We try below to treat an invalid reference binding as a bad conversion
+ to improve diagnostics, but doing so may cause otherwise unnecessary
+ instantiations that can lead to hard error.  So during the first pass
+ of overload resolution wherein we shortcut bad conversions, 

[PATCH] Fortran: error recovery on invalid array reference of non-array [PR103590]

2022-07-18 Thread Harald Anlauf via Gcc-patches
Dear all,

I intend to commit the attached patch as obvious to mainline
within the next 24h unless someone complains.  It replaces a
lazy gfc_internal_error by an explicit error message and an
error recovery path.

As a side-effect, we now diagnose a previously missed error
in testcase gfortran.dg/associate_54.f90 similarly to Intel.

Regtested on x86_64-pc-linux-gnu.

Thanks,
Harald

From e6ecc4d8227afea565b0555e95a4f5dcb8f4ecab Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 18 Jul 2022 22:34:53 +0200
Subject: [PATCH] Fortran: error recovery on invalid array reference of
 non-array [PR103590]

gcc/fortran/ChangeLog:

	PR fortran/103590
	* resolve.cc (find_array_spec): Change function result to bool to
	enable error recovery.  Generate error message for missing array
	spec instead of an internal error.
	(gfc_resolve_ref): Use function result from find_array_spec for
	error recovery.

gcc/testsuite/ChangeLog:

	PR fortran/103590
	* gfortran.dg/associate_54.f90: Adjust.
	* gfortran.dg/associate_59.f90: New test.
---
 gcc/fortran/resolve.cc | 13 ++---
 gcc/testsuite/gfortran.dg/associate_54.f90 |  3 +--
 gcc/testsuite/gfortran.dg/associate_59.f90 |  9 +
 3 files changed, 20 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_59.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 2ebf076f730..dacd33561d0 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -4976,7 +4976,7 @@ gfc_resolve_dim_arg (gfc_expr *dim)
 static void
 resolve_assoc_var (gfc_symbol* sym, bool resolve_target);

-static void
+static bool
 find_array_spec (gfc_expr *e)
 {
   gfc_array_spec *as;
@@ -5004,7 +5004,11 @@ find_array_spec (gfc_expr *e)
   {
   case REF_ARRAY:
 	if (as == NULL)
-	  gfc_internal_error ("find_array_spec(): Missing spec");
+	  {
+	gfc_error ("Symbol %qs at %L has not been declared as an array",
+		   e->symtree->n.sym->name, >where);
+	return false;
+	  }

 	ref->u.ar.as = as;
 	as = NULL;
@@ -5028,6 +5032,8 @@ find_array_spec (gfc_expr *e)

   if (as != NULL)
 gfc_internal_error ("find_array_spec(): unused as(2)");
+
+  return true;
 }


@@ -5346,7 +5352,8 @@ gfc_resolve_ref (gfc_expr *expr)
   for (ref = expr->ref; ref; ref = ref->next)
 if (ref->type == REF_ARRAY && ref->u.ar.as == NULL)
   {
-	find_array_spec (expr);
+	if (!find_array_spec (expr))
+	  return false;
 	break;
   }

diff --git a/gcc/testsuite/gfortran.dg/associate_54.f90 b/gcc/testsuite/gfortran.dg/associate_54.f90
index 003175a47fd..b23a4f160ac 100644
--- a/gcc/testsuite/gfortran.dg/associate_54.f90
+++ b/gcc/testsuite/gfortran.dg/associate_54.f90
@@ -26,9 +26,8 @@ contains
 integer, intent(in) :: a
 associate (state => obj%state(TEST_STATES)) ! { dg-error "is used as array" }
 !  state = a
-  state(TEST_STATE) = a
+  state(TEST_STATE) = a ! { dg-error "has not been declared as an array" }
 end associate
   end subroutine test_alter_state1

 end module test
-
diff --git a/gcc/testsuite/gfortran.dg/associate_59.f90 b/gcc/testsuite/gfortran.dg/associate_59.f90
new file mode 100644
index 000..2da97731d39
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_59.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/103590 - ICE: find_array_spec(): Missing spec
+! Contributed by G.Steinmetz
+
+program p
+  associate (a => 1)
+print *, [character(a(1)) :: '1'] ! { dg-error "Scalar INTEGER expression" }
+  end associate
+end
--
2.35.3



[COMMITTED] tree-optimization/106280 - Check if transitives need to be registered.

2022-07-18 Thread Andrew MacLeod via Gcc-patches
Regardless of whether this is enough of an improvement for the PR, it 
should be done.


Whenever a relation is registered with the oracle, it walks the 
dominator tree trying to apply any transitives it can find.


FIrst, it should check whether the operands are already in any relation. 
If neither operand is in a relation, we can conclude there cannot be any 
transitives. this is a simple bitmask check.


The second thing this patch does is adjust the set_relation routine to 
return NULL if the relation being applied already exists.  ie, we don't 
need to set the relation or do any additionalwork if the relation is 
already true. This will also prevent additional calls to the register 
transitives.


This provides some marginal improvements across the board, and 
noticeable improvements in the testcase.


Bootstrapped on    x86_64-pc-linux-gnu with no regressions. Pushed.

Andrewcommit 5e47c9333df6df1aa9da861f07e68f985d7d28fb
Author: Andrew MacLeod 
Date:   Thu Jul 14 12:35:55 2022 -0400

Check if transitives need to be registered.

Whenever a relation is added, register_transitive is always called.
If neither operand was in a relation before, or this is not a new
relation, then there is no need to register transitives.

PR tree-optimization/106280
* value-relation.cc (dom_oracle::register_relation): Register
transitives only when it is possible for there to be one.
(dom_oracle::set_one_relation): Return NULL if this is an
existing relation.

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 13ce44199f7..bd344253af3 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -967,8 +967,12 @@ dom_oracle::register_relation (basic_block bb, relation_kind k, tree op1,
 equiv_oracle::register_relation (bb, k, op1, op2);
   else
 {
+  // if neither op1 nor op2 are in a relation before this is registered,
+  // there will be no transitive.
+  bool check = bitmap_bit_p (m_relation_set, SSA_NAME_VERSION (op1))
+		   || bitmap_bit_p (m_relation_set, SSA_NAME_VERSION (op2));
   relation_chain *ptr = set_one_relation (bb, k, op1, op2);
-  if (ptr)
+  if (ptr && check)
 	register_transitives (bb, *ptr);
 }
 }
@@ -1010,13 +1014,16 @@ dom_oracle::set_one_relation (basic_block bb, relation_kind k, tree op1,
   // Check into whether we can simply replace the relation rather than
   // intersecting it.  THis may help with some optimistic iterative
   // updating algorithms.
-  ptr->intersect (vr);
+  bool new_rel = ptr->intersect (vr);
   if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, " to produce ");
 	  ptr->dump (dump_file);
-	  fprintf (dump_file, "\n");
+	  fprintf (dump_file, " %s.\n", new_rel ? "Updated" : "No Change");
 	}
+  // If there was no change, return no record..
+  if (!new_rel)
+	return NULL;
 }
   else
 {


Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-18 Thread Sam Feifer via Gcc-patches
Just realized I had mixed up the 9 and the 2 when labelling the patch. This
patch is referring to pr94920 not pr94290. Attached is a fixed patch file.
Sorry for any confusion.

On Mon, Jul 18, 2022 at 9:07 AM Sam Feifer  wrote:

> Here's an updated version of the patch.
>
> Thanks
> -Sam
>
> On Thu, Jul 14, 2022 at 3:54 PM Andrew Pinski  wrote:
>
>> On Thu, Jul 14, 2022 at 12:38 PM Sam Feifer  wrote:
>> >
>> >
>> >
>> > On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski 
>> wrote:
>> >>
>> >> On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer  wrote:
>> >> >
>> >> >
>> >> > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski 
>> wrote:
>> >> >>
>> >> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
>> >> >>  wrote:
>> >> >> >
>> >> >> > This patch is intended to fix a missed optimization in match.pd.
>> It optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to
>> write a second simplification in match.pd to handle the commutative
>> property as the match was not ocurring otherwise. Additionally, the pattern
>> (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the
>> other simplification rule.
>> >> >>
>> >> >> You could use :c for the commutative property instead and that
>> should
>> >> >> simplify things.
>> >> >> That is:
>> >> >>
>> >> >> (simplify
>> >> >>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
>> >> >>   (abs @0))
>> >> >>
>> >> >> Also since integer_zerop works on vectors, it seems like you should
>> >> >> add a testcase or two for the vector case.
>> >> >> Also would be useful if you write a testcase that uses different
>> >> >> statements rather than one big one so it gets exercised in the
>> >> >> forwprop case.
>> >> >> Note also if either of the max are used more than just in this
>> >> >> simplification, it could increase the lifetime of @0, maybe you need
>> >> >> to add :s to the max expressions.
>> >> >>
>> >> >
>> >> > Thanks for the feedback. I'm not quite sure what a vector test case
>> would look like for this. Could I get some guidance on that?
>> >>
>> >> Yes this should produce the pattern at forwprop1 time (with the C++
>> >> front-end, the C front-end does not support vector selects):
>> >> typedef int __attribute__((vector_size(4*sizeof(int vint;
>> >>
>> >> vint foo(vint x) {
>> >> vint t = (x >= 0 ? x : 0) ;
>> >> vint xx = -x;
>> >> vint t1 =  (xx >= 0 ? xx : 0);
>> >> return t + t1;
>> >> }
>> >>
>> >> int foo(int x) {
>> >> int t = (x >= 0 ? x : 0) ;
>> >> int xx = -x;
>> >> int t1 =  (xx >= 0 ? xx : 0);
>> >> return t + t1;
>> >> }
>> >>
>> >> Thanks,
>> >> Andrew Pinski
>> >>
>> >
>> > Thanks for the help. I'm still having trouble with the vector test,
>> though. When I try to compile, I get an error saying "used vector type
>> where scalar is required", referring to the max expressions. How do I use
>> the max expression with two vectors as the inputs?
>>
>> As I mentioned it only works with the C++ front-end :). The C
>> front-end does not support ?: with vectors types.
>>
>> Thanks,
>> Andrew Pinski
>>
>> >
>> > Thanks
>> > -Sam
>> >>
>> >>
>> >> >
>> >> > Thanks
>> >> > -Sam
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> Andrew
>> >> >>
>> >> >> >
>> >> >> > Tests are also included to be added to the testsuite.
>> >> >> >
>> >> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>> >> >> >
>> >> >> > PR tree-optimization/94290
>> >> >> >
>> >> >> > gcc/ChangeLog:
>> >> >> >
>> >> >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New
>> simplification.
>> >> >> > * match.pd (x <= 0 ? -x : 0): New Simplification.
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog:
>> >> >> >
>> >> >> > * gcc.c-torture/execute/pr94290-1.c: New test.
>> >> >> > * gcc.dg/pr94290-2.c: New test.
>> >> >> > * gcc.dg/pr94290.c: New test.
>> >> >> > ---
>> >> >> >  gcc/match.pd  | 15 ++
>> >> >> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
>> >> >> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
>> >> >> >  gcc/testsuite/gcc.dg/pr94290.c| 46
>> +++
>> >> >> >  4 files changed, 92 insertions(+)
>> >> >> >  create mode 100644
>> gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
>> >> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
>> >> >> >
>> >> >> > diff --git a/gcc/match.pd b/gcc/match.pd
>> >> >> > index 45aefd96688..55ca79d7ac9 100644
>> >> >> > --- a/gcc/match.pd
>> >> >> > +++ b/gcc/match.pd
>> >> >> > @@ -7848,3 +7848,18 @@ and,
>> >> >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
>> >> >> >  (bit_and @0 @1)
>> >> >> >(cond (le @0 @1) @0 (bit_and @0 @1))
>> >> >> > +
>> >> >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
>> >> >> > +(simplify
>> >> >> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
>> >> >> > +  (abs @0))
>> 

[PATCH] c++: shortcut bad reference bindings [PR94894]

2022-07-18 Thread Patrick Palka via Gcc-patches
In case of l/rvalue or cv-qual mismatch during reference binding, we try
to give more helpful diagnostics by attempting a bad conversion that
ignores the mismatch.  But in doing so, we may end up instantiating an
ill-formed conversion function, something that would otherwise be
avoided if we didn't try for the better diagnostic in the first place.
We could just give up on producing a good diagnostic in this case, but
ideally we could keep the good diagnostics while avoiding unnecessary
template instantiations.

To that end, this patch adapts the bad conversion shortcutting mechanism
from r12-3346-g47543e5f9d1fc5 to handle this situation as well.  The main
observation from there applies here as well: when there's a strictly
viable candidate, we don't care about the distinction between an unviable
and non-strictly viable candidate.  And in turn it also means we don't
really care about the distinction between an invalid and bad conversion.
Of course, we don't know whether there's a strictly viable candidate until
after the fact, so we still need to keep track of when we "shortcutted"
distinguishing between an invalid and bad conversion.  This patch adds a
new kind of conversion, ck_shortcutted, to represent such conversions.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/94894
PR c++/105766
PR c++/106201

gcc/cp/ChangeLog:

* call.cc (enum conversion_kind): Add ck_shortcutted enumerator.
(has_next): Return false for it.
(reference_binding): Return a ck_shortcutted conversion instead
of an actual bad conversion when LOOKUP_SHORTCUT_BAD_CONVS is set.
Remove now obsolete early exit for the incomplete target type case.
(implicit_conversion_1): Don't mask out LOOKUP_SHORTCUT_BAD_CONVS.
(add_function_candidate): Set LOOKUP_SHORTCUT_BAD_CONVS iff
shortcut_bad_convs.
(missing_conversion_p): Return true for a ck_shortcutted
conversion.
* cp-tree.h (LOOKUP_SHORTCUT_BAD_CONVS): Define.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/ref8.C: New test.
* g++.dg/conversion/ref9.C: New test.
---
 gcc/cp/call.cc | 87 --
 gcc/cp/cp-tree.h   |  5 ++
 gcc/testsuite/g++.dg/conversion/ref8.C | 22 +++
 gcc/testsuite/g++.dg/conversion/ref9.C | 21 +++
 4 files changed, 103 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/conversion/ref8.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/ref9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index fc98552fda2..ca2f5fbca39 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -59,7 +59,8 @@ enum conversion_kind {
   ck_ambig,
   ck_list,
   ck_aggr,
-  ck_rvalue
+  ck_rvalue,
+  ck_shortcutted,
 };
 
 /* The rank of the conversion.  Order of the enumerals matters; better
@@ -775,7 +776,8 @@ has_next (conversion_kind code)
   return !(code == ck_identity
   || code == ck_ambig
   || code == ck_list
-  || code == ck_aggr);
+  || code == ck_aggr
+  || code == ck_shortcutted);
 }
 
 static conversion *
@@ -1912,18 +1914,38 @@ reference_binding (tree rto, tree rfrom, tree expr, 
bool c_cast_p, int flags,
  difference in top-level cv-qualification is subsumed by the
  initialization itself and does not constitute a conversion.  */
 
+  bool maybe_valid_p = true;
+
   /* [dcl.init.ref]
 
  Otherwise, the reference shall be an lvalue reference to a
  non-volatile const type, or the reference shall be an rvalue
- reference.
+ reference.  */
+  if (!CP_TYPE_CONST_NON_VOLATILE_P (to) && !TYPE_REF_IS_RVALUE (rto))
+maybe_valid_p = false;
 
- We try below to treat this as a bad conversion to improve diagnostics,
- but if TO is an incomplete class, we need to reject this conversion
- now to avoid unnecessary instantiation.  */
-  if (!CP_TYPE_CONST_NON_VOLATILE_P (to) && !TYPE_REF_IS_RVALUE (rto)
-  && !COMPLETE_TYPE_P (to))
-return NULL;
+  /* [dcl.init.ref]
+
+ Otherwise, a temporary of type "cv1 T1" is created and
+ initialized from the initializer expression using the rules for a
+ non-reference copy initialization.  If T1 is reference-related to
+ T2, cv1 must be the same cv-qualification as, or greater
+ cv-qualification than, cv2; otherwise, the program is ill-formed.  */
+  if (related_p && !at_least_as_qualified_p (to, from))
+maybe_valid_p = false;
+
+  /* We try below to treat an invalid reference binding as a bad conversion
+ to improve diagnostics, but doing so may cause otherwise unnecessary
+ instantiations that can lead to hard error.  So during the first pass
+ of overload resolution wherein we shortcut bad conversions, instead just
+ produce a special conversion that we'll use to indicate we might need to
+ perform a second pass if there's no strictly viable candidate.  */
+  if 

[PATCH] RISC-V: Add RTX costs for `if_then_else' expressions

2022-07-18 Thread Maciej W. Rozycki
Fix a performance regression from commit 391500af1932 ("Do not ignore 
costs of jump insns in combine."), a part of the m68k series for MODE_CC 
conversion (), 
observed in soft-fp code in libgcc used by some of the embench-iot 
benchmarks.

The immediate origin of the regression is the middle end, which in the 
absence of cost information from the backend estimates the cost of an 
RTL expression by assuming a single machine instruction for each of the 
expression's subexpression.

So for `if_then_else', which takes 3 operands, the estimated cost is 3 
instructions (i.e. 12 units) even though a branch instruction evaluates 
it in a single machine cycle (ignoring the cost of actually taking the 
branch of course, which is handled elsewhere).  Consequently an insn 
sequence like:

(insn 595 594 596 43 (set (reg:DI 305)
(lshiftrt:DI (reg/v:DI 160 [ R_f ])
(const_int 55 [0x37]))) ".../libgcc/soft-fp/adddf3.c":46:3 216 
{lshrdi3}
 (nil))
(insn 596 595 597 43 (set (reg:DI 304)
(and:DI (reg:DI 305)
(const_int 1 [0x1]))) ".../libgcc/soft-fp/adddf3.c":46:3 109 
{anddi3}
 (expr_list:REG_DEAD (reg:DI 305)
(nil)))
(jump_insn 597 596 598 43 (set (pc)
(if_then_else (eq (reg:DI 304)
(const_int 0 [0]))
(label_ref:DI 1644)
(pc))) ".../libgcc/soft-fp/adddf3.c":46:3 237 {*branchdi}
 (expr_list:REG_DEAD (reg:DI 304)
(int_list:REG_BR_PROB 536870916 (nil)))
 -> 1644)

does not (anymore, as from the commit referred) get combined into:

(note 595 594 596 43 NOTE_INSN_DELETED)
(note 596 595 597 43 NOTE_INSN_DELETED)
(jump_insn 597 596 598 43 (parallel [
(set (pc)
(if_then_else (eq (zero_extract:DI (reg/v:DI 160 [ R_f ])
(const_int 1 [0x1])
(const_int 55 [0x37]))
(const_int 0 [0]))
(label_ref:DI 1644)
(pc)))
(clobber (scratch:DI))
]) ".../libgcc/soft-fp/adddf3.c":46:3 243 {*branch_on_bitdi}
 (int_list:REG_BR_PROB 536870916 (nil))
 -> 1644)

This is because the new cost is incorrectly calculated as 28 units while 
the cost of the original 3 instructions was 24:

rejecting combination of insns 595, 596 and 597
original costs 4 + 4 + 16 = 24
replacement cost 28

Before the commit referred the cost of jump instruction was ignored and 
considered 0 (i.e. unknown) and a sequence of instructions of a known 
cost used to win:

allowing combination of insns 595, 596 and 597
original costs 4 + 4 + 0 = 0
replacement cost 28

Add the missing costs for the 3 variants of `if_then_else' expressions 
we currently define in the backend.

With the fix in place the cost of this particular `if_then_else' pattern 
is 2 instructions or 8 units (because of the shift operation) and 
therefore the ultimate cost of the original 3 RTL insns will work out at 
16 units (4 + 4 + 8), however the replacement single RTL insn will cost 
8 units only.

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs) : New 
case.
---
 gcc/config/riscv/riscv.cc |   27 +++
 1 file changed, 27 insertions(+)

gcc-riscv-rtx-costs-if-then-else.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -1853,6 +1853,33 @@ riscv_rtx_costs (rtx x, machine_mode mod
   /* Otherwise use the default handling.  */
   return false;
 
+case IF_THEN_ELSE:
+  if (TARGET_SFB_ALU
+ && register_operand (XEXP (x, 1), mode)
+ && sfb_alu_operand (XEXP (x, 2), mode)
+ && comparison_operator (XEXP (x, 0), VOIDmode))
+   {
+ /* For predicated conditional-move operations we assume the cost
+of a single instruction even though there are actually two.  */
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+  else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
+   {
+ if (equality_operator (XEXP (x, 0), mode)
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTRACT)
+   {
+ *total = COSTS_N_INSNS (SINGLE_SHIFT_COST + 1);
+ return true;
+   }
+ if (order_operator (XEXP (x, 0), mode))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+   }
+  return false;
+
 case NOT:
   *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1);
   return false;


Re: [PATCH] ipa-cp: Fix assert triggering with -fno-toplevel-reorder (PR 106260)

2022-07-18 Thread Martin Jambor
Hi,

On Mon, Jul 18 2022, Jan Hubicka wrote:
>> Hi,
>> 
>> with -fno-toplevel-reorder (and -fwhole-program), there apparently can
>> be local functions without any callers.  This is something that IPA-CP
>
> If there is possibility to trigger a local function without callers, I
> think one can also make two local functions calling each other but with
> no other callers.

I am not sure I understand.  The above can happen only with
-fno-toplevel-reorder, right?

>
> So I think you need to wait until dataflow solution stabilizes and
> then then find such isolated cases and do something sensible (as of
> not crashing since the code is dead anyway).

After dataflow stabilizes, it crashes only because a verifier does not
like the result.  Perhaps the solution is simply not verify nodes built
with -fno-toplevel-reorder?  But the assert was useful in the past in
the general case so I'd like to keep it.

I can now see that the proposed patch could also pessimize the lattices
of functions that the dead functions call a bit, although arguably it
might be the right thing to do, since we cannot say anything about these
calls.  But I do not care much.

>
> -fno-toplevel-reorder was kind of meant to get close to what 
> -fno-unit-at-a-time
> did to make legacy code (kernel) happy.  Without unit-at-a-time we did
> not remove local functions unless we decided to inline them since they
> were output before we had chance to realize that they were unused.

I understand that it is legacy stuff but actually I wonder what the
option means for IPA in general.  Is IPA run after we might have output
something already?  (If not then why do we see any differences?  If yes,
can all IPA optimizations work, especially with -fwhole-program?)

>
> Since we document that only about static variables, perhaps we could
> simply start removing functions to avoid these side cases?

Well, the funny thing is that we eventually do remove the function, just
at IPA-CP stage it is there and not even local yet (I have not noticed
this last week).  Removing it earlier would definitely make life simpler
for me :-)

Anyway, I'm happy to work in whichever direction you prefer, just now I
am not sure which one it is.

Thanks,

Martin

>
>> does not like because its propagation verifier checks that local
>> functions do not end up with TOP in their lattices.  Therefore there
>> is an assert checking that all call-less unreachable functions have
>> been removed, which triggers in PR 106260 with these two options.
>> 
>> This patch detects the situation and marks the lattices as variable,
>> thus avoiding both the assert trigger and the verification failure.
>> 
>> Bootstrapped and tested on x86_64-linux.  OK for master and then all
>> active release branches?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2022-07-13  Martin Jambor  
>> 
>>  PR ipa/106260
>>  * ipa-cp.cc (initialize_node_lattices): Replace assert that there are
>>  callers with handling that situation when -fno-toplevel_reorder.
>> 


[committed] RISC-V/doc: Add index references for `mrelax' and `mriscv-attribute'

2022-07-18 Thread Maciej W. Rozycki
Add missing index references for the `-mrelax' and `-mriscv-attribute' 
invocation options.

gcc/
* doc/invoke.texi (RISC-V Options): Add index references for 
`mrelax' and `mriscv-attribute'.
---
Hi,

 Verified with `make info pdf' and committed as obvious.

  Maciej
---
 gcc/doc/invoke.texi |2 ++
 1 file changed, 2 insertions(+)

gcc-riscv-doc-opindex.diff
Index: gcc/gcc/doc/invoke.texi
===
--- gcc.orig/gcc/doc/invoke.texi
+++ gcc/gcc/doc/invoke.texi
@@ -28170,12 +28170,14 @@ limit optimization.
 
 @item -mrelax
 @itemx -mno-relax
+@opindex mrelax
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses. The default is to take advantage of
 linker relaxations.
 
 @item -mriscv-attribute
 @itemx -mno-riscv-attribute
+@opindex mriscv-attribute
 Emit (do not emit) RISC-V attribute to record extra information into ELF
 objects.  This feature requires at least binutils 2.32.
 


[committed] RISC-V/doc: Correct the formatting of `-mstack-protector-guard-reg='

2022-07-18 Thread Maciej W. Rozycki
Add missing second space around the `-mstack-protector-guard-reg=' 
invocation option.

gcc/
* doc/invoke.texi (Option Summary): Add missing second space 
around `-mstack-protector-guard-reg='.
---
Hi,

 Verified with `make info pdf' and committed as obvious.

  Maciej
---
 gcc/doc/invoke.texi |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-riscv-doc-mstack-protector-guard-space.diff
Index: gcc/gcc/doc/invoke.texi
===
--- gcc.orig/gcc/doc/invoke.texi
+++ gcc/gcc/doc/invoke.texi
@@ -1214,7 +1214,7 @@ See RS/6000 and PowerPC Options.
 -mriscv-attribute  -mno-riscv-attribute @gol
 -malign-data=@var{type} @gol
 -mbig-endian  -mlittle-endian @gol
--mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
+-mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset}}
 
 @emph{RL78 Options}


[committed] RISC-V/doc: Correct the name of `-mriscv-attribute'

2022-07-18 Thread Maciej W. Rozycki
Correct the name of the `-mriscv-attribute' invocation option, including 
a typo in the negated form.

gcc/
* doc/invoke.texi (Option Summary): Fix `-mno-riscv-attribute'.
(RISC-V Options): Likewise, and `-mriscv-attribute'.
---
Hi,

 Verified with `make info pdf' and committed as obvious.

  Maciej
---
 gcc/doc/invoke.texi |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

gcc-riscv-doc-mriscv-attribute.diff
Index: gcc/gcc/doc/invoke.texi
===
--- gcc.orig/gcc/doc/invoke.texi
+++ gcc/gcc/doc/invoke.texi
@@ -1211,7 +1211,7 @@ See RS/6000 and PowerPC Options.
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
 -mrelax  -mno-relax @gol
--mriscv-attribute  -mmo-riscv-attribute @gol
+-mriscv-attribute  -mno-riscv-attribute @gol
 -malign-data=@var{type} @gol
 -mbig-endian  -mlittle-endian @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
@@ -28174,8 +28174,8 @@ Take advantage of linker relaxations to
 required to materialize symbol addresses. The default is to take advantage of
 linker relaxations.
 
-@item -memit-attribute
-@itemx -mno-emit-attribute
+@item -mriscv-attribute
+@itemx -mno-riscv-attribute
 Emit (do not emit) RISC-V attribute to record extra information into ELF
 objects.  This feature requires at least binutils 2.32.
 


[PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns

2022-07-18 Thread Maciej W. Rozycki
On Mon, 4 Jul 2022, Maciej W. Rozycki wrote:

> These instructions are only produced via an expander already, so change 
> the expander to emit individual RTL insns for each machine instruction 
> in the ultimate ultimate sequence produced rather than deferring to a 
> single RTL insn producing the whole sequence at once.

 Ping for:



  Maciej


Re: [PATCH] match.pd: Add new abs pattern [PR94290]

2022-07-18 Thread Sam Feifer via Gcc-patches
Here's an updated version of the patch.

Thanks
-Sam

On Thu, Jul 14, 2022 at 3:54 PM Andrew Pinski  wrote:

> On Thu, Jul 14, 2022 at 12:38 PM Sam Feifer  wrote:
> >
> >
> >
> > On Thu, Jul 14, 2022 at 1:24 PM Andrew Pinski  wrote:
> >>
> >> On Thu, Jul 14, 2022 at 7:09 AM Sam Feifer  wrote:
> >> >
> >> >
> >> > On Wed, Jul 13, 2022 at 3:36 PM Andrew Pinski 
> wrote:
> >> >>
> >> >> On Wed, Jul 13, 2022 at 12:26 PM Sam Feifer via Gcc-patches
> >> >>  wrote:
> >> >> >
> >> >> > This patch is intended to fix a missed optimization in match.pd.
> It optimizes (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) to just abs(x). I had to
> write a second simplification in match.pd to handle the commutative
> property as the match was not ocurring otherwise. Additionally, the pattern
> (x <= 0 ? -x : 0) now gets optimized to max(-x, 0), which helps with the
> other simplification rule.
> >> >>
> >> >> You could use :c for the commutative property instead and that should
> >> >> simplify things.
> >> >> That is:
> >> >>
> >> >> (simplify
> >> >>   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
> >> >>   (abs @0))
> >> >>
> >> >> Also since integer_zerop works on vectors, it seems like you should
> >> >> add a testcase or two for the vector case.
> >> >> Also would be useful if you write a testcase that uses different
> >> >> statements rather than one big one so it gets exercised in the
> >> >> forwprop case.
> >> >> Note also if either of the max are used more than just in this
> >> >> simplification, it could increase the lifetime of @0, maybe you need
> >> >> to add :s to the max expressions.
> >> >>
> >> >
> >> > Thanks for the feedback. I'm not quite sure what a vector test case
> would look like for this. Could I get some guidance on that?
> >>
> >> Yes this should produce the pattern at forwprop1 time (with the C++
> >> front-end, the C front-end does not support vector selects):
> >> typedef int __attribute__((vector_size(4*sizeof(int vint;
> >>
> >> vint foo(vint x) {
> >> vint t = (x >= 0 ? x : 0) ;
> >> vint xx = -x;
> >> vint t1 =  (xx >= 0 ? xx : 0);
> >> return t + t1;
> >> }
> >>
> >> int foo(int x) {
> >> int t = (x >= 0 ? x : 0) ;
> >> int xx = -x;
> >> int t1 =  (xx >= 0 ? xx : 0);
> >> return t + t1;
> >> }
> >>
> >> Thanks,
> >> Andrew Pinski
> >>
> >
> > Thanks for the help. I'm still having trouble with the vector test,
> though. When I try to compile, I get an error saying "used vector type
> where scalar is required", referring to the max expressions. How do I use
> the max expression with two vectors as the inputs?
>
> As I mentioned it only works with the C++ front-end :). The C
> front-end does not support ?: with vectors types.
>
> Thanks,
> Andrew Pinski
>
> >
> > Thanks
> > -Sam
> >>
> >>
> >> >
> >> > Thanks
> >> > -Sam
> >> >
> >> >>
> >> >> Thanks,
> >> >> Andrew
> >> >>
> >> >> >
> >> >> > Tests are also included to be added to the testsuite.
> >> >> >
> >> >> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >> >> >
> >> >> > PR tree-optimization/94290
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >
> >> >> > * match.pd (x >= 0 ? x : 0) + (x <= 0 ? -x : 0): New
> simplification.
> >> >> > * match.pd (x <= 0 ? -x : 0): New Simplification.
> >> >> >
> >> >> > gcc/testsuite/ChangeLog:
> >> >> >
> >> >> > * gcc.c-torture/execute/pr94290-1.c: New test.
> >> >> > * gcc.dg/pr94290-2.c: New test.
> >> >> > * gcc.dg/pr94290.c: New test.
> >> >> > ---
> >> >> >  gcc/match.pd  | 15 ++
> >> >> >  .../gcc.c-torture/execute/pr94290-1.c | 16 +++
> >> >> >  gcc/testsuite/gcc.dg/pr94290-2.c  | 15 ++
> >> >> >  gcc/testsuite/gcc.dg/pr94290.c| 46
> +++
> >> >> >  4 files changed, 92 insertions(+)
> >> >> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr94290-1.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290-2.c
> >> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr94290.c
> >> >> >
> >> >> > diff --git a/gcc/match.pd b/gcc/match.pd
> >> >> > index 45aefd96688..55ca79d7ac9 100644
> >> >> > --- a/gcc/match.pd
> >> >> > +++ b/gcc/match.pd
> >> >> > @@ -7848,3 +7848,18 @@ and,
> >> >> >(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> >> >> >  (bit_and @0 @1)
> >> >> >(cond (le @0 @1) @0 (bit_and @0 @1))
> >> >> > +
> >> >> > +/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
> >> >> > +(simplify
> >> >> > +  (plus (max @0 integer_zerop) (max (negate @0) integer_zerop))
> >> >> > +  (abs @0))
> >> >> > +
> >> >> > +/* (x <= 0 ? -x : 0) + (x >= 0 ? x : 0) -> abs x.  */
> >> >> > +(simplify
> >> >> > +  (plus (max (negate @0) integer_zerop) (max @0 integer_zerop) )
> >> >> > +  (abs @0))
> >> >> > +
> >> >> > +/* (x <= 0 ? -x : 0) -> max(-x, 0).  */
> >> >> > +(simplify
> >> >> > + (cond (le @0 integer_zerop@1) (negate @0) integer_zerop@1)
> >> >> > + (max (negate @0) 

[PATCH] xtensa: Correct the relative RTX cost that corresponds to the Move Immediate "MOVI" instruction

2022-07-18 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch corrects the overestimation of the relative cost of
'(set (reg) (const_int N))' where N fits into the instruction itself.

In fact, such overestimation confuses the RTL loop invariant motion pass.
As a result, it brings almost no negative impact from the speed point of
view, but addtiional reg-reg move instructions and register allocation
pressure about the size.

/* example, optimized for size */
extern int foo(void);
extern int array[16];
void test_0(void) {
  unsigned int i;
  for (i = 0; i < sizeof(array)/sizeof(*array); ++i)
array[i] = 1024;
}
void test_1(void) {
  unsigned int i;
  for (i = 0; i < sizeof(array)/sizeof(*array); ++i)
array[i] = array[i] ? 1024 : 0;
}
void test_2(void) {
  unsigned int i;
  for (i = 0; i < sizeof(array)/sizeof(*array); ++i)
array[i] = foo() ? 0 : 1024;
}

;; before
.literal_position
.literal .LC0, array
test_0:
l32ra3, .LC0
movi.n  a2, 0
movia4, 0x400   // OK
.L2:
s32i.n  a4, a3, 0
addi.n  a2, a2, 1
addi.n  a3, a3, 4
bneia2, 16, .L2
ret.n
.literal_position
.literal .LC1, array
test_1:
l32ra2, .LC1
movi.n  a3, 0
movia5, 0x400   // NG
.L6:
l32i.n  a4, a2, 0
beqz.n  a4, .L5
mov.n   a4, a5  // should be "movi a4, 0x400"
.L5:
s32i.n  a4, a2, 0
addi.n  a3, a3, 1
addi.n  a2, a2, 4
bneia3, 16, .L6
ret.n
.literal_position
.literal .LC2, array
test_2:
addisp, sp, -32
s32i.n  a12, sp, 24
l32ra12, .LC2
s32i.n  a13, sp, 20
s32i.n  a14, sp, 16
s32i.n  a15, sp, 12
s32i.n  a0, sp, 28
addia13, a12, 64
movi.n  a15, 0  // NG
movia14, 0x400  // and wastes callee-saved registers (only 4)
.L11:
call0   foo
mov.n   a3, a14 // should be "movi a3, 0x400"
movnez  a3, a15, a2
s32i.n  a3, a12, 0
addi.n  a12, a12, 4
bne a12, a13, .L11
l32i.n  a0, sp, 28
l32i.n  a12, sp, 24
l32i.n  a13, sp, 20
l32i.n  a14, sp, 16
l32i.n  a15, sp, 12
addisp, sp, 32
ret.n

;; after
.literal_position
.literal .LC0, array
test_0:
l32ra3, .LC0
movi.n  a2, 0
movia4, 0x400   // OK
.L2:
s32i.n  a4, a3, 0
addi.n  a2, a2, 1
addi.n  a3, a3, 4
bneia2, 16, .L2
ret.n
.literal_position
.literal .LC1, array
test_1:
l32ra2, .LC1
movi.n  a3, 0
.L6:
l32i.n  a4, a2, 0
beqz.n  a4, .L5
movia4, 0x400   // OK
.L5:
s32i.n  a4, a2, 0
addi.n  a3, a3, 1
addi.n  a2, a2, 4
bneia3, 16, .L6
ret.n
.literal_position
.literal .LC2, array
test_2:
addisp, sp, -16
s32i.n  a12, sp, 8
l32ra12, .LC2
s32i.n  a13, sp, 4
s32i.n  a0, sp, 12
addia13, a12, 64
.L11:
call0   foo
movi.n  a3, 0   // OK
movia4, 0x400   // and less register allocation pressure
moveqz  a3, a4, a2
s32i.n  a3, a12, 0
addi.n  a12, a12, 4
bne a12, a13, .L11
l32i.n  a0, sp, 12
l32i.n  a12, sp, 8
l32i.n  a13, sp, 4
addisp, sp, 16
ret.n

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_rtx_costs):
Change the relative cost of '(set (reg) (const_int N))' where
N fits into signed 12-bit from 4 to 0 if optimizing for size.
And use the appropriate macro instead of the bare number 4.
---
 gcc/config/xtensa/xtensa.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 94337452ba8..a851a7ae6b3 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -4073,7 +4073,7 @@ xtensa_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
case SET:
  if (xtensa_simm12b (INTVAL (x)))
{
- *total = 4;
+ *total = speed ? COSTS_N_INSNS (1) : 0;
  return true;
}
  break;
-- 
2.20.1


[committed] arc: Add ARCHS release 310a tune variant.

2022-07-18 Thread Claudiu Zissulescu via Gcc-patches
Add mtune and mcpu options for ARCHS release 310a type CPU. The
mtune=release31a is designed to be used as an alternative to the
mcpu=hs4x_rel31 option.
ARCHS4x release 31a uses DSP instructions which are implemented a bit
different than mpy9. Hence, use safer mpy2 option.

gcc/
* config/arc/arc-arch.h (arc_tune_attr): Add
ARC_TUNE_ARCHS4X_REL31A variant.
* config/arc/arc.cc (arc_override_options): Tune options for
release 310a.
(arc_sched_issue_rate): Use correct enum.
(arc600_corereg_hazard): Textual change.
(arc_hazard): Add release 310a tunning.
* config/arc/arc.md (tune): Update and take into consideration new
tune option.
(tune_dspmpy): Likewise.
(tune_store): New attribute.
* config/arc/arc.opt (mtune): New tune option.
* config/arc/arcHS4x.md (hs4x_brcc0, hs4x_brcc1): New cpu units.
(hs4x_brcc_op): New instruction rezervation.
(hs4x_data_store_1_op): Likewise.
* config/arc/arc-cpus.def (hs4x_rel31): New cpu variant.
* config/arc/arc-tables.opt: Regenerate.
* config/arc/t-multilib: Likewise.
* doc/invoke.texi (ARC): Update mcpu and tune sections.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc-arch.h |   3 +-
 gcc/config/arc/arc-cpus.def   |   1 +
 gcc/config/arc/arc-tables.opt |   3 +
 gcc/config/arc/arc.cc | 192 +-
 gcc/config/arc/arc.md |  32 +++---
 gcc/config/arc/arc.opt|   3 +
 gcc/config/arc/arcHS4x.md |  17 ++-
 gcc/config/arc/t-multilib |   4 +-
 gcc/doc/invoke.texi   |  16 +++
 9 files changed, 181 insertions(+), 90 deletions(-)

diff --git a/gcc/config/arc/arc-arch.h b/gcc/config/arc/arc-arch.h
index 4c728a87453..83b156ee34a 100644
--- a/gcc/config/arc/arc-arch.h
+++ b/gcc/config/arc/arc-arch.h
@@ -77,7 +77,8 @@ enum arc_tune_attr
 ARC_TUNE_CORE_3,
 ARC_TUNE_ARCHS4X,
 ARC_TUNE_ARCHS4XD,
-ARC_TUNE_ARCHS4XD_SLOW
+ARC_TUNE_ARCHS4XD_SLOW,
+ARC_TUNE_ARCHS4X_REL31A
   };
 
 /* Extra options for a processor template to hold any CPU specific
diff --git a/gcc/config/arc/arc-cpus.def b/gcc/config/arc/arc-cpus.def
index baf61db02ed..5668b0fbf19 100644
--- a/gcc/config/arc/arc-cpus.def
+++ b/gcc/config/arc/arc-cpus.def
@@ -64,6 +64,7 @@ ARC_CPU (hs38, hs, FL_MPYOPT_9|FL_DIVREM|FL_LL64, 
NONE, NONE)
 ARC_CPU (hs38_linux, hs, FL_MPYOPT_9|FL_DIVREM|FL_LL64|FL_FPU_FPUD_ALL, NONE, 
NONE)
 ARC_CPU (hs4x,  hs, FL_MPYOPT_9|FL_DIVREM|FL_LL64, NONE, ARCHS4X)
 ARC_CPU (hs4xd, hs, FL_MPYOPT_9|FL_DIVREM|FL_LL64, NONE, ARCHS4XD)
+ARC_CPU (hs4x_rel31, hs, FL_MPYOPT_2|FL_DIVREM|FL_LL64, NONE, ARCHS4X_REL31A)
 
 ARC_CPU (arc600, 6xx, FL_BS, NONE, ARC600)
 ARC_CPU (arc600_norm,6xx, FL_BS|FL_NORM, NONE, ARC600)
diff --git a/gcc/config/arc/arc-tables.opt b/gcc/config/arc/arc-tables.opt
index 8cc5135205d..0a0d354db60 100644
--- a/gcc/config/arc/arc-tables.opt
+++ b/gcc/config/arc/arc-tables.opt
@@ -69,6 +69,9 @@ Enum(processor_type) String(hs4x) Value(PROCESSOR_hs4x)
 EnumValue
 Enum(processor_type) String(hs4xd) Value(PROCESSOR_hs4xd)
 
+EnumValue
+Enum(processor_type) String(hs4x_rel31) Value(PROCESSOR_hs4x_rel31)
+
 EnumValue
 Enum(processor_type) String(arc600) Value(PROCESSOR_arc600)
 
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 77730c88e55..064790bf396 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -646,8 +646,8 @@ arc_sched_issue_rate (void)
 {
   switch (arc_tune)
 {
-case TUNE_ARCHS4X:
-case TUNE_ARCHS4XD:
+case ARC_TUNE_ARCHS4X:
+case ARC_TUNE_ARCHS4XD:
   return 3;
 default:
   break;
@@ -1458,6 +1458,12 @@ arc_override_options (void)
   if (!OPTION_SET_P (unaligned_access) && TARGET_HS)
 unaligned_access = 1;
 
+  if (TARGET_HS && (arc_tune == ARC_TUNE_ARCHS4X_REL31A))
+{
+  TARGET_CODE_DENSITY_FRAME = 0;
+  flag_delayed_branch = 0;
+}
+
   /* These need to be done at start up.  It's convenient to do them here.  */
   arc_init ();
 }
@@ -7817,6 +7823,115 @@ arc_store_addr_hazard_p (rtx_insn* producer, rtx_insn* 
consumer)
   return arc_store_addr_hazard_internal_p (producer, consumer);
 }
 
+/* Return length adjustment for INSN.
+   For ARC600:
+   A write to a core reg greater or equal to 32 must not be immediately
+   followed by a use.  Anticipate the length requirement to insert a nop
+   between PRED and SUCC to prevent a hazard.  */
+
+static int
+arc600_corereg_hazard (rtx_insn *pred, rtx_insn *succ)
+{
+  if (!TARGET_ARC600)
+return 0;
+  if (GET_CODE (PATTERN (pred)) == SEQUENCE)
+pred = as_a  (PATTERN (pred))->insn (1);
+  if (GET_CODE (PATTERN (succ)) == SEQUENCE)
+succ = as_a  (PATTERN (succ))->insn (0);
+  if (recog_memoized (pred) == CODE_FOR_mulsi_600
+  || recog_memoized (pred) == CODE_FOR_umul_600
+  || recog_memoized (pred) == CODE_FOR_mac_600
+  || recog_memoized (pred) == CODE_FOR_mul64_600
+ 

[PATCH] Fix builtin vs non-builtin partition merge in loop distribution

2022-07-18 Thread Richard Biener via Gcc-patches
When r7-6373-g40b6bff965d004 fixed a costing issue it failed to
make the logic symmetric which means that we now fuse
normal vs. builtin when the cost model says so but we don't fuse
builtin vs. normal.  The following fixes that, also allowing
the cost model to decide to fuse two builtin partitions as otherwise
an intermediate non-builtin can result in a partial merge as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-loop-distribution.cc (loop_distribution::distribute_loop):
When computing cost-based merging do not disregard builtin
classified partitions in some cases.

* gcc.dg/tree-ssa/ldist-24.c: XFAIL.
* gcc.dg/tree-ssa/ldist-36.c: Adjust expected outcome.
---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c | 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/ldist-36.c | 3 ++-
 gcc/tree-loop-distribution.cc| 5 +
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c
index 75f7b8f0c88..2403a24293b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c
@@ -20,5 +20,6 @@ void foo ()
   }
 }
 
-/* { dg-final { scan-tree-dump "generated memcpy" "ldist" } } */
-/* { dg-final { scan-tree-dump "generated memset zero" "ldist" } } */
+/* The cost modeling does not consider WAR as beneficial to split.  */
+/* { dg-final { scan-tree-dump "generated memcpy" "ldist" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "generated memset zero" "ldist" { xfail *-*-* } 
} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-36.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ldist-36.c
index 07393f0a665..6d560060e09 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-36.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-36.c
@@ -25,4 +25,5 @@ foo (struct st * restrict p)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Loop nest . distributed: split to 0 
loops and 3 library" 1 "ldist" } } */
+/* The cost modeling doesn't consider splitting a WAR re-use profitable.  */
+/* { dg-final { scan-tree-dump-times "Loop nest . distributed: split to 1 
loops and 1 library" 1 "ldist" } } */
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 0714bc41a43..0ee441c077d 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3090,10 +3090,7 @@ loop_distribution::distribute_loop (class loop *loop,
   for (i = 0; partitions.iterate (i, ); ++i)
 {
   bool changed = false;
-  if (partition_builtin_p (into) || into->kind == PKIND_PARTIAL_MEMSET)
-   continue;
-  for (int j = i + 1;
-  partitions.iterate (j, ); ++j)
+  for (int j = i + 1; partitions.iterate (j, ); ++j)
{
  if (share_memory_accesses (rdg, into, partition))
{
-- 
2.37.0


[committed 2/2] libgcc/arc: Update udivmodsi4 and make the lib safe for rf16

2022-07-18 Thread Claudiu Zissulescu via Gcc-patches
From: Claudiu Zissulescu 

The ARC soft udivmodsi4 algorithm and as well as using umodsi3
for reduced register set configurations are wrong.

libgcc/
* config/arc/lib2funcs.c (udivmodsi4): Update AND mask.
* config/arc/lib1funcs.S (umodsi3): Don't use it for RF16
configurations.
---
 libgcc/config/arc/lib1funcs.S | 2 ++
 libgcc/config/arc/lib2funcs.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/arc/lib1funcs.S b/libgcc/config/arc/lib1funcs.S
index 14fd1d2f303..b06361257d1 100644
--- a/libgcc/config/arc/lib1funcs.S
+++ b/libgcc/config/arc/lib1funcs.S
@@ -936,6 +936,7 @@ SYM(__divsi3):
 
 #endif /* L_divsi3 */
 
+#ifndef __ARC_RF16__
 #ifdef  L_umodsi3
.section .text
.align 4
@@ -950,6 +951,7 @@ SYM(__umodsi3):
ENDFUNC(__umodsi3)
 
 #endif /* L_umodsi3 */
+#endif /* !__ARC_RF16__ */
 
 #ifdef  L_modsi3
.section .text
diff --git a/libgcc/config/arc/lib2funcs.c b/libgcc/config/arc/lib2funcs.c
index 70727b55365..8cba45172b2 100644
--- a/libgcc/config/arc/lib2funcs.c
+++ b/libgcc/config/arc/lib2funcs.c
@@ -59,7 +59,7 @@ udivmodsi4 (nint32_t num, nint32_t den, word_t modwanted)
   nint32_t bit = 1;
   nint32_t res = 0;
 
-  while (den < num && bit && !(den & (1LL << 63)))
+  while (den < num && bit && !(den & (1L << 31)))
 {
   den <<= 1;
   bit <<= 1;
-- 
2.30.2



[committed 1/2] arc: Fix interrupt's epilogue.

2022-07-18 Thread Claudiu Zissulescu via Gcc-patches
The stack pointer adjustment in interrupt epilogue is happening after
restoring the ZOL registers which is wrong. Fixing this.

gcc/
* config/arc/arc.cc (arc_expand_epilogue): Adjust the frame
pointer first when in interrupts.

gcc/testsuite/
* gcc.target/arc/interrupt-13.c: New file.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.cc   |  2 +-
 gcc/testsuite/gcc.target/arc/interrupt-13.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/interrupt-13.c

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index fbc17e684a0..77730c88e55 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -3965,7 +3965,7 @@ arc_expand_epilogue (int sibcall_p)
   if (size)
 emit_insn (gen_blockage ());
 
-  if (ARC_INTERRUPT_P (fn_type) && restore_fp)
+  if (ARC_INTERRUPT_P (fn_type))
 {
   /* We need to restore FP before any SP operation in an
 interrupt.  */
diff --git a/gcc/testsuite/gcc.target/arc/interrupt-13.c 
b/gcc/testsuite/gcc.target/arc/interrupt-13.c
new file mode 100644
index 000..0ed8451c512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/interrupt-13.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2" } */
+
+extern int foo (int *);
+
+void __attribute__((interrupt("ilink")))
+irq (void)
+{
+  struct {
+int x0;
+int x1;
+  } a = {1 ,2};
+  foo ((int *));
+}
+
+/* { dg-final { scan-assembler "add_s\\s+sp,sp,8.*pop_s\\s+r0" } } */
-- 
2.30.2



[committed] arc: Fix interrupt's epilogue.

2022-07-18 Thread Claudiu Zissulescu via Gcc-patches
The stack pointer adjustment in interrupt epilogue is happening after
restoring the ZOL registers which is wrong. Fixing this.

gcc/
* config/arc/arc.cc (arc_expand_epilogue): Adjust the frame
pointer first when in interrupts.

gcc/testsuite/
* gcc.target/arc/interrupt-13.c: New file.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.cc   |  2 +-
 gcc/testsuite/gcc.target/arc/interrupt-13.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/interrupt-13.c

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index fbc17e684a0..77730c88e55 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -3965,7 +3965,7 @@ arc_expand_epilogue (int sibcall_p)
   if (size)
 emit_insn (gen_blockage ());
 
-  if (ARC_INTERRUPT_P (fn_type) && restore_fp)
+  if (ARC_INTERRUPT_P (fn_type))
 {
   /* We need to restore FP before any SP operation in an
 interrupt.  */
diff --git a/gcc/testsuite/gcc.target/arc/interrupt-13.c 
b/gcc/testsuite/gcc.target/arc/interrupt-13.c
new file mode 100644
index 000..0ed8451c512
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/interrupt-13.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2" } */
+
+extern int foo (int *);
+
+void __attribute__((interrupt("ilink")))
+irq (void)
+{
+  struct {
+int x0;
+int x1;
+  } a = {1 ,2};
+  foo ((int *));
+}
+
+/* { dg-final { scan-assembler "add_s\\s+sp,sp,8.*pop_s\\s+r0" } } */
-- 
2.30.2



[PATCH] Improve common reduction vs builtin code generation in loop distribution

2022-07-18 Thread Richard Biener via Gcc-patches
loop distribution currently cannot handle the situation when the
last partition is a builtin but there's a common reduction in all
partitions (like the final IV value).  The following lifts this
restriction by making the last non-builtin partition provide the
definitions for the loop-closed PHI nodes.  Since we have heuristics
in place to avoid code generating builtins last writing a testcase
is difficult (but I ran into a case with other pending patches that
made the heuristic ineffective).  What's remaining is the inability
to preserve common reductions when all partitions could be builtins
(in some cases final value replacement could come to the rescue here).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-loop-distribution.cc (copy_loop_before): Add
the ability to replace the original LC PHI defs.
(generate_loops_for_partition): Pass through a flag
whether to redirect original LC PHI defs.
(generate_code_for_partition): Likewise.
(loop_distribution::distribute_loop): Compute the partition
that should provide the LC PHI defs for common reductions
and pass that down.
---
 gcc/tree-loop-distribution.cc | 64 ---
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index ed7f432f322..0714bc41a43 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -942,7 +942,7 @@ stmt_has_scalar_dependences_outside_loop (loop_p loop, 
gimple *stmt)
 /* Return a copy of LOOP placed before LOOP.  */
 
 static class loop *
-copy_loop_before (class loop *loop)
+copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
 {
   class loop *res;
   edge preheader = loop_preheader_edge (loop);
@@ -950,6 +950,24 @@ copy_loop_before (class loop *loop)
   initialize_original_copy_tables ();
   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
   gcc_assert (res != NULL);
+
+  /* When a not last partition is supposed to keep the LC PHIs computed
+ adjust their definitions.  */
+  if (redirect_lc_phi_defs)
+{
+  edge exit = single_exit (loop);
+  for (gphi_iterator si = gsi_start_phis (exit->dest); !gsi_end_p (si);
+  gsi_next ())
+   {
+ gphi *phi = si.phi ();
+ if (virtual_operand_p (gimple_phi_result (phi)))
+   continue;
+ use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, exit);
+ tree new_def = get_current_def (USE_FROM_PTR (use_p));
+ SET_USE (use_p, new_def);
+   }
+}
+
   free_original_copy_tables ();
   delete_update_ssa ();
 
@@ -977,7 +995,7 @@ create_bb_after_loop (class loop *loop)
 
 static void
 generate_loops_for_partition (class loop *loop, partition *partition,
- bool copy_p)
+ bool copy_p, bool keep_lc_phis_p)
 {
   unsigned i;
   basic_block *bbs;
@@ -985,7 +1003,7 @@ generate_loops_for_partition (class loop *loop, partition 
*partition,
   if (copy_p)
 {
   int orig_loop_num = loop->orig_loop_num;
-  loop = copy_loop_before (loop);
+  loop = copy_loop_before (loop, keep_lc_phis_p);
   gcc_assert (loop != NULL);
   loop->orig_loop_num = orig_loop_num;
   create_preheader (loop, CP_SIMPLE_PREHEADERS);
@@ -1336,7 +1354,8 @@ destroy_loop (class loop *loop)
 
 static bool 
 generate_code_for_partition (class loop *loop,
-partition *partition, bool copy_p)
+partition *partition, bool copy_p,
+bool keep_lc_phis_p)
 {
   switch (partition->kind)
 {
@@ -1345,7 +1364,8 @@ generate_code_for_partition (class loop *loop,
   /* Reductions all have to be in the last partition.  */
   gcc_assert (!partition_reduction_p (partition)
  || !copy_p);
-  generate_loops_for_partition (loop, partition, copy_p);
+  generate_loops_for_partition (loop, partition, copy_p,
+   keep_lc_phis_p);
   return false;
 
 case PKIND_MEMSET:
@@ -3013,6 +3033,7 @@ loop_distribution::distribute_loop (class loop *loop,
 
   bool any_builtin = false;
   bool reduction_in_all = false;
+  int reduction_partition_num = -1;
   FOR_EACH_VEC_ELT (partitions, i, partition)
 {
   reduction_in_all
@@ -3092,10 +3113,13 @@ loop_distribution::distribute_loop (class loop *loop,
 }
 
   /* Put a non-builtin partition last if we need to preserve a reduction.
- ???  This is a workaround that makes sort_partitions_by_post_order do
- the correct thing while in reality it should sort each component
- separately and then put the component with a reduction or a non-builtin
- last.  */
+ In most cases this helps to keep a normal partition last avoiding to
+ spill a reduction result across builtin calls.
+ ???  The proper way would be to use dependences to see whether we
+   

Re: [PATCH] Simplify branching in algos

2022-07-18 Thread Jonathan Wakely via Gcc-patches
On Mon, 18 Jul 2022 at 11:25, François Dumont  wrote:
>
> Hi
>
>  I just noticed that I still had this nice enhancement in my local
> branches.
>
>  Ok to commit ?

OK, thanks.



RE: [PATCH] arm: Replace arm_builtin_vectorized_function [PR106253]

2022-07-18 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, July 13, 2022 9:14 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Kyrylo Tkachov
> 
> Subject: [PATCH] arm: Replace arm_builtin_vectorized_function [PR106253]
> 
> This patch extends the fix for PR106253 to AArch32.  As with AArch64,
> we were using ACLE intrinsics to vectorise scalar built-ins, even
> though the two sometimes have different ECF_* flags.  (That in turn
> is because the ACLE intrinsics should follow the instruction semantics
> as closely as possible, whereas the scalar built-ins follow language
> specs.)
> 
> The patch also removes the copysignf built-in, which only existed
> for this purpose and wasn't a “real” arm_neon.h built-in.
> 
> Doing this also has the side-effect of enabling vectorisation of
> rint and roundeven.  Logically that should be a separate patch,
> but making it one would have meant adding a new int iterator
> for the original set of instructions and then removing it again
> when including new functions.
> 
> I've restricted the bswap tests to little-endian because we end
> up with excessive spilling on big-endian.  E.g.:
> 
> sub sp, sp, #8
> vstrd1, [sp]
> vldrd16, [sp]
> vrev16.8d16, d16
> vstrd16, [sp]
> vldrd0, [sp]
> add sp, sp, #8
> @ sp needed
> bx  lr
> 
> Similarly, the copysign tests require little-endian because on
> big-endian we unnecessarily load the constant from the constant pool:
> 
> vldr.32 s15, .L3
> vdup.32 d0, d7[1]
> vbsld0, d2, d1
> bx  lr
> .L3:
> .word   -2147483648
> 
> Tested on arm-linux-gnueabihf and armeb-eabi.  OK to install?

Ok.
Thanks,
Kyrill

> 
> Richard
> 
> 
> gcc/
>   * config/arm/arm-builtins.cc (arm_builtin_vectorized_function):
>   Delete.
>   * config/arm/arm-protos.h (arm_builtin_vectorized_function):
> Delete.
>   * config/arm/arm.cc
> (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION):
>   Delete.
>   * config/arm/arm_neon_builtins.def (copysignf): Delete.
>   * config/arm/iterators.md (nvrint_pattern): New attribute.
>   * config/arm/neon.md
> (2):
>   New pattern.
>   (l esult>2):
>   Likewise.
>   (neon_copysignf): Rename to...
>   (copysign3): ...this.
> 
> gcc/testsuite/
>   * gcc.target/arm/vect_unary_1.c: New test.
>   * gcc.target/arm/vect_binary_1.c: Likewise.
> ---
>  gcc/config/arm/arm-builtins.cc   | 123 --
>  gcc/config/arm/arm-protos.h  |   1 -
>  gcc/config/arm/arm.cc|   4 -
>  gcc/config/arm/arm_neon_builtins.def |   1 -
>  gcc/config/arm/iterators.md  |   7 +
>  gcc/config/arm/neon.md   |  17 +-
>  gcc/testsuite/gcc.target/arm/vect_binary_1.c |  50 +
>  gcc/testsuite/gcc.target/arm/vect_unary_1.c  | 224 +++
>  8 files changed, 297 insertions(+), 130 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/vect_binary_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/vect_unary_1.c
> 
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index d917137e5ee..8f8155c4413 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -4026,129 +4026,6 @@ arm_expand_builtin (tree exp,
>return NULL_RTX;
>  }
> 
> -tree
> -arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree
> type_in)
> -{
> -  machine_mode in_mode, out_mode;
> -  int in_n, out_n;
> -  bool out_unsigned_p = TYPE_UNSIGNED (type_out);
> -
> -  /* Can't provide any vectorized builtins when we can't use NEON.  */
> -  if (!TARGET_NEON)
> -return NULL_TREE;
> -
> -  if (TREE_CODE (type_out) != VECTOR_TYPE
> -  || TREE_CODE (type_in) != VECTOR_TYPE)
> -return NULL_TREE;
> -
> -  out_mode = TYPE_MODE (TREE_TYPE (type_out));
> -  out_n = TYPE_VECTOR_SUBPARTS (type_out);
> -  in_mode = TYPE_MODE (TREE_TYPE (type_in));
> -  in_n = TYPE_VECTOR_SUBPARTS (type_in);
> -
> -/* ARM_CHECK_BUILTIN_MODE and ARM_FIND_VRINT_VARIANT are used
> to find the
> -   decl of the vectorized builtin for the appropriate vector mode.
> -   NULL_TREE is returned if no such builtin is available.  */
> -#undef ARM_CHECK_BUILTIN_MODE
> -#define ARM_CHECK_BUILTIN_MODE(C)\
> -  (TARGET_VFP5   \
> -   && flag_unsafe_math_optimizations \
> -   && ARM_CHECK_BUILTIN_MODE_1 (C))
> -
> -#undef ARM_CHECK_BUILTIN_MODE_1
> -#define ARM_CHECK_BUILTIN_MODE_1(C) \
> -  (out_mode == SFmode && out_n == C \
> -   && in_mode == SFmode && in_n == C)
> -
> -#undef ARM_FIND_VRINT_VARIANT
> -#define ARM_FIND_VRINT_VARIANT(N) \
> -  (ARM_CHECK_BUILTIN_MODE (2) \
> -? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v2sf, false) \
> -: (ARM_CHECK_BUILTIN_MODE (4) \
> -  ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sf, false) \
> -  : NULL_TREE))
> -
> -  switch (fn)
> -{
> -

Re: [PATCH] aarch64: Replace manual swapping idiom with std::swap in aarch64.cc

2022-07-18 Thread Richard Sandiford via Gcc-patches
Richard Ball  writes:
> Replace manual swapping idiom with std::swap in aarch64.cc
>
> gcc/config/aarch64/aarch64.cc has a few manual swapping idioms of the form:
>
> x = in0, in0 = in1, in1 = x;
>
> The preferred way is using the standard:
>
> std::swap (in0, in1);
>
> We should just fix these to use std::swap.
> This will also allow us to eliminate the x temporary rtx.
>
> gcc/ChangeLog:
>
>  * config/aarch64/aarch64.cc (aarch64_evpc_trn): Use std:swap.
>  (aarch64_evpc_uzp): Likewise.
>  (aarch64_evpc_zip): Likewise.

Thanks, pushed to trunk.

Richard

> ---
>
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> d049f9a9819628a73bfd57114c3b89d848da7d9c..b75a032c2f2c55d71bcb6b4b6ef1bd7f42a97235
>  
> 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23498,7 +23498,7 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
>   {
> HOST_WIDE_INT odd;
> poly_uint64 nelt = d->perm.length ();
> -  rtx out, in0, in1, x;
> +  rtx out, in0, in1;
> machine_mode vmode = d->vmode;
>
> if (GET_MODE_UNIT_SIZE (vmode) > 8)
> @@ -23522,7 +23522,7 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
>at the head of aarch64-sve.md for details.  */
> if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
>   {
> -  x = in0, in0 = in1, in1 = x;
> +  std::swap (in0, in1);
> odd = !odd;
>   }
> out = d->target;
> @@ -23592,7 +23592,7 @@ static bool
>   aarch64_evpc_uzp (struct expand_vec_perm_d *d)
>   {
> HOST_WIDE_INT odd;
> -  rtx out, in0, in1, x;
> +  rtx out, in0, in1;
> machine_mode vmode = d->vmode;
>
> if (GET_MODE_UNIT_SIZE (vmode) > 8)
> @@ -23615,7 +23615,7 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
>at the head of aarch64-sve.md for details.  */
> if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
>   {
> -  x = in0, in0 = in1, in1 = x;
> +  std::swap (in0, in1);
> odd = !odd;
>   }
> out = d->target;
> @@ -23631,7 +23631,7 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
>   {
> unsigned int high;
> poly_uint64 nelt = d->perm.length ();
> -  rtx out, in0, in1, x;
> +  rtx out, in0, in1;
> machine_mode vmode = d->vmode;
>
> if (GET_MODE_UNIT_SIZE (vmode) > 8)
> @@ -23656,7 +23656,7 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
>at the head of aarch64-sve.md for details.  */
> if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
>   {
> -  x = in0, in0 = in1, in1 = x;
> +  std::swap (in0, in1);
> high = !high;
>   }
> out = d->target;


Re: [PATCH] Simplify branching in algos

2022-07-18 Thread François Dumont via Gcc-patches

Hi

    I just noticed that I still had this nice enhancement in my local 
branches.


    Ok to commit ?

François

On 21/11/21 21:34, François Dumont wrote:
A recent thread on this mailing list made me remember that this 
proposal is still open.


I've updated it just to add a missing std qualification.

François

On 08/06/21 5:21 pm, Jonathan Wakely wrote:

I haven't forgotten this one, I just need to double-check that we
don't create another problem like std::rotate in 9.1

I'll try to finish the review tomorrow.

J.


On 27/05/21 07:04 +0200, François Dumont via Libstdc++ wrote:
Following latest fixes in std::inplace_merge and std::stable_sort 
you propose Jonathan to enhance branching in the first.


Here is a proposal based on yours to do so in both algos.

    libstdc++: Enhance branching in std::inplace_merge and 
std::stable_sort


    libstdc++-v3/ChangeLog:

    * include/bits/stl_algo.h
    (__merge_adaptive): Adapt to merge only when buffer is 
large enough..
    (__merge_adaptive_resize): New, adapt merge when buffer 
is too small.

    (__inplace_merge): Adapt, use latter.
    (__stable_sort_adaptive): Adapt to sort only when buffer 
is large enough.
    (__stable_sort_adaptive_resize): New, adapt sort when 
buffer is too small.

    (__stable_sort): Adapt, use latter.

Tested under Linux x64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 1d8ed4e5fa8..c6078054514 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2397,21 +2397,35 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 		 _BidirectionalIterator __middle,
 		 _BidirectionalIterator __last,
 		 _Distance __len1, _Distance __len2,
-		 _Pointer __buffer, _Distance __buffer_size,
-		 _Compare __comp)
+		 _Pointer __buffer, _Compare __comp)
 {
-  if (__len1 <= __len2 && __len1 <= __buffer_size)
+  if (__len1 <= __len2)
 	{
 	  _Pointer __buffer_end = _GLIBCXX_MOVE3(__first, __middle, __buffer);
 	  std::__move_merge_adaptive(__buffer, __buffer_end, __middle, __last,
  __first, __comp);
 	}
-  else if (__len2 <= __buffer_size)
+  else
 	{
 	  _Pointer __buffer_end = _GLIBCXX_MOVE3(__middle, __last, __buffer);
 	  std::__move_merge_adaptive_backward(__first, __middle, __buffer,
 	  __buffer_end, __last, __comp);
 	}
+}
+
+  template
+void
+__merge_adaptive_resize(_BidirectionalIterator __first,
+			_BidirectionalIterator __middle,
+			_BidirectionalIterator __last,
+			_Distance __len1, _Distance __len2,
+			_Pointer __buffer, _Distance __buffer_size,
+			_Compare __comp)
+{
+  if (__len1 <= __buffer_size || __len2 <= __buffer_size)
+	std::__merge_adaptive(__first, __middle, __last,
+			  __len1, __len2, __buffer, __comp);
   else
 	{
 	  _BidirectionalIterator __first_cut = __first;
@@ -2439,14 +2453,14 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 
 	  _BidirectionalIterator __new_middle
 	= std::__rotate_adaptive(__first_cut, __middle, __second_cut,
- __len1 - __len11, __len22, __buffer,
- __buffer_size);
-	  std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
-__len22, __buffer, __buffer_size, __comp);
-	  std::__merge_adaptive(__new_middle, __second_cut, __last,
-__len1 - __len11,
-__len2 - __len22, __buffer,
-__buffer_size, __comp);
+ __len1 - __len11, __len22,
+ __buffer, __buffer_size);
+	  std::__merge_adaptive_resize(__first, __first_cut, __new_middle,
+   __len11, __len22,
+   __buffer, __buffer_size, __comp);
+	  std::__merge_adaptive_resize(__new_middle, __second_cut, __last,
+   __len1 - __len11, __len2 - __len22,
+   __buffer, __buffer_size, __comp);
 	}
 }
 
@@ -2524,11 +2538,14 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
   // [first,middle) and [middle,last).
   _TmpBuf __buf(__first, std::min(__len1, __len2));
 
-  if (__buf.begin() == 0)
+  if (__builtin_expect(__buf.size() == __buf.requested_size(), true))
+	std::__merge_adaptive
+	  (__first, __middle, __last, __len1, __len2, __buf.begin(), __comp);
+  else if (__builtin_expect(__buf.begin() == 0, false))
 	std::__merge_without_buffer
 	  (__first, __middle, __last, __len1, __len2, __comp);
   else
-	std::__merge_adaptive
+	std::__merge_adaptive_resize
 	  (__first, __middle, __last, __len1, __len2, __buf.begin(),
 	   _DistanceType(__buf.size()), __comp);
 }
@@ -2709,10 +2726,25 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 	}
 }
 
+  template
+void
+__stable_sort_adaptive(_RandomAccessIterator __first,
+			   _RandomAccessIterator __middle,
+			   _RandomAccessIterator __last,
+			   _Pointer __buffer, _Compare __comp)
+{
+  std::__merge_sort_with_buffer(__first, __middle, __buffer, __comp);
+  std::__merge_sort_with_buffer(__middle, __last, 

Re: [PATCH] ipa-cp: Fix assert triggering with -fno-toplevel-reorder (PR 106260)

2022-07-18 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> with -fno-toplevel-reorder (and -fwhole-program), there apparently can
> be local functions without any callers.  This is something that IPA-CP

If there is possibility to trigger a local function without callers, I
think one can also make two local functions calling each other but with
no other callers.

So I think you need to wait until dataflow solution stabilizes and then
then find such isolated cases and do something sensible (as of not
crashing since the code is dead anyway).

-fno-toplevel-reorder was kind of meant to get close to what -fno-unit-at-a-time
did to make legacy code (kernel) happy.  Without unit-at-a-time we did
not remove local functions unless we decided to inline them since they
were output before we had chance to realize that they were unused.

Since we document that only about static variables, perhaps we could
simply start removing functions to avoid these side cases?

Honza

> does not like because its propagation verifier checks that local
> functions do not end up with TOP in their lattices.  Therefore there
> is an assert checking that all call-less unreachable functions have
> been removed, which triggers in PR 106260 with these two options.
> 
> This patch detects the situation and marks the lattices as variable,
> thus avoiding both the assert trigger and the verification failure.
> 
> Bootstrapped and tested on x86_64-linux.  OK for master and then all
> active release branches?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2022-07-13  Martin Jambor  
> 
>   PR ipa/106260
>   * ipa-cp.cc (initialize_node_lattices): Replace assert that there are
>   callers with handling that situation when -fno-toplevel_reorder.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-07-13  Martin Jambor  
> 
>   PR ipa/106260
>   * g++.dg/ipa/pr106260.C: New test.
> ---
>  gcc/ipa-cp.cc   |  6 ++-
>  gcc/testsuite/g++.dg/ipa/pr106260.C | 64 +
>  2 files changed, 69 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr106260.C
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 543a9334e2c..f699a8dadc0 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1286,10 +1286,14 @@ initialize_node_lattices (struct cgraph_node *node)
>int caller_count = 0;
>node->call_for_symbol_thunks_and_aliases (count_callers, _count,
>   true);
> -  gcc_checking_assert (caller_count > 0);
>if (caller_count == 1)
>   node->call_for_symbol_thunks_and_aliases (set_single_call_flag,
> NULL, true);
> +  else if (caller_count == 0)
> + {
> +   gcc_checking_assert (!opt_for_fn (node->decl, flag_toplevel_reorder));
> +   variable = true;
> + }
>  }
>else
>  {
> diff --git a/gcc/testsuite/g++.dg/ipa/pr106260.C 
> b/gcc/testsuite/g++.dg/ipa/pr106260.C
> new file mode 100644
> index 000..bd3b6e0af79
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr106260.C
> @@ -0,0 +1,64 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -std=gnu++14 -fwhole-program -fno-unit-at-a-time" }
> +
> +struct A;
> +template 
> +struct Q { Q (T); };
> +template
> +struct U {
> +  ~U () { m1 (nullptr); }
> +  D m2 ();
> +  T *u;
> +  void m1 (T *) { m2 () (u); }
> +};
> +struct F { F (int *); };
> +template 
> +using W = Q;
> +int a, b;
> +void fn1 (void *);
> +template 
> +void
> +fn2 (T *x)
> +{
> +  if (x)
> +x->~T();
> +  fn1 (x);
> +}
> +template 
> +struct C {
> +  void operator() (T *x) { fn2 (x); }
> +};
> +struct D;
> +template  >
> +using V = U;
> +struct A {
> +  A (int *);
> +};
> +struct S;
> +struct G {
> +  V m3 ();
> +};
> +struct S {
> +  int e;
> +  virtual ~S () {}
> +};
> +template
> +struct H {
> +  H (int, T x, int) : h(x) {}
> +  G g;
> +  void m4 () { g.m3 (); }
> +  T h;
> +};
> +struct I {
> +  I(A, W);
> +};
> +void
> +test ()
> +{
> +  A c ();
> +  W d ();
> +  I e (c, d);
> +  H f (0, e, a);
> +  f.m4 ();
> +}
> +
> -- 
> 2.36.1
> 


Re: Re: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit

2022-07-18 Thread Marco Falke via Gcc-patches
(in reply to https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598412.html,
adding libstdc++ to CC, with the same patch attached again)

To clarify, this is not a fix for a user-facing issue of gcc or a fix
for UB. It is just a minor UX improvement for developers that use the
clang integer sanitizer to detect implicit int conversions.

To reproduce:

$ cat 1.cpp
#include 

int main() {
  const auto a{"-1"};
  unsigned b{};
  std::from_chars(a, a + 2, b);
}
$ clang++ -fsanitize=integer -std=c++17 1.cpp -o exe && ./exe
/usr/bin/../lib64/gcc/x86_64-s
use-linux/12/../../../../include/c++/12/charconv:439:9:
runtime error: implicit conversion from type 'int' of value -3
(32-bit, signed) to type 'unsigned char' changed the value to 253
(8-bit, unsigned)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
/usr/bin/../lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/charconv:439:9

Best,
Marco
From 2d4e7cd1d476a065d824e11045c8dbc049d5f0a0 Mon Sep 17 00:00:00 2001
From: MacroFake 
Date: Thu, 14 Jul 2022 15:26:12 +0200
Subject: [PATCH] libstdc++: Make __from_chars_alnum_to_val conversion explicit

The optimizations from commit a54137c88061c7495728fc6b8dfd0474e812b2cb
introduced a clang integer sanitizer error.

Fix this with an explicit static_cast, similar to the fix in commit
074436cf8cdd2a9ce75cadd36deb8301f00e55b9.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum_to_val): Replace
  implicit conversions from int to unsigned char with explicit
  casts.
---
 libstdc++-v3/include/std/charconv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 218813e4797..bdf23e4a5ad 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -436,7 +436,7 @@ namespace __detail
 __from_chars_alnum_to_val(unsigned char __c)
 {
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
-	return __c - '0';
+   return static_cast(__c - '0');
   else
 	{
 	  // This initializer is deliberately made dependent in order to work
-- 
2.35.3



Re: [PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,
  Thanks for your comments.

On 13/7/2022 上午 1:26, Segher Boessenkool wrote:
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>>  })
>>  
>>  ;; 128-bit logical operations expanders
>> +;; Fail TImode in all 128-bit logical operations expanders and split it into
>> +;; two DI registers.
>>
>>  (define_expand "and3"
>>[(set (match_operand:BOOL_128 0 "vlogical_operand")
>>  (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>(match_operand:BOOL_128 2 "vlogical_operand")))]
>>""
>> -  "")
>> +{
>> +  if (mode == TImode)
>> +FAIL;
>> +})
> It is better to not FAIL it, but simply not have a pattern for the
> TImode version at all.
> 
> Does nothing depend on the :TI version to exist?
> 
> What about the :PTI version?  Getting rid of that as well will allow
> some nice optimisations.
> 
> Of course we *do* have instructions to do such TImode ops, on newer
> CPUs, but in vector registers only.  It isn't obvious what is faster.
> 

During expand, TI mode is split to two registers when it can't match
any expands. So I failed TI mode in each expand and expect to be
split at expand. TI mode is still in some insn_and_split patterns
(e.g. "*and3_internal"). If later rtl passes generate TI mode
logical operations, they still can be matched.

Originally, the TI mode is split after reload pass by
rs6000_split_logical. It's too late to catch some rtl optimizations.

For the PTI, it can't be split to two registers during expand. PTI
requires an even/odd register pair. So splitting it after reload can
make sure it gets correct registers, I think.

>From my understanding, it's sub-optimal to use vector logical operation
instructions for TI mode if the destination is an integer operand. It
needs three instructions (move to vector register, vector logical
operation and move from vector register). When splitting TImode, it only
needs two logical instructions on two separate registers.

Thanks again
Gui Haochen


Re: [Ada] Fix typos in comments

2022-07-18 Thread Arnaud Charlet via Gcc-patches
OK, thanks.


Re: [middle-end PATCH] PR c/106264: Silence warnings from __builtin_modf et al.

2022-07-18 Thread Richard Biener via Gcc-patches
On Sat, Jul 16, 2022 at 2:54 PM Roger Sayle  wrote:
>
>
> This middle-end patch resolves PR c/106264 which is a spurious warning
> regression caused by the tree-level expansion of modf, frexp and remquo
> producing "expression has no-effect" when the built-in function's result
> is ignored.  When these built-ins were first expanded at tree-level,
> fold_builtin_n would blindly set TREE_NO_WARNING for all built-ins.
> Now that we're more discerning, we should precisely set TREE_NO_WARNING
> selectively on those COMPOUND_EXPRs that need them.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

I think you should use

suppress_warning  (res);

I can't think of a specific OPT_Wxyz to use so the end result should
be the same.

OK with that change.

Richard.

> 2022-07-16  Roger Sayle  
>
> gcc/ChangeLog
> PR c/106264
> * builtins.cc (fold_builtin_frexp): Set TREE_NO_WARNING on
> COMPOUND_EXPR to silence spurious warning if result isn't used.
> (fold_builtin_modf): Likewise.
> (do_mpfr_remquo): Likewise.
>
> gcc/testsuite/ChangeLog
> PR c/106264
> * gcc.dg/pr106264.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Extend 16/32-bit vector bit_op patterns with (m,0,i)(vertical) alternative.

2022-07-18 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 18, 2022 at 3:59 AM liuhongt  wrote:
>
> And split it after reload.
>
> >IMO, the only case it is worth adding is a direct immediate store to
> >memory, which HJ recently added.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/106038
> * config/i386/mmx.md (3): Extend to AND mem,imm,
> and adjust below define_split.
> (mmxinsnmode): New mode attribute.
> (*mov_imm): Refactor with mmxinsnmode.
> * config/i386/predicates.md
> (register_or_x86_64_const_vector_operand): New predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106038-1.c: New test.
> ---
>  gcc/config/i386/mmx.md | 58 +++---
>  gcc/config/i386/predicates.md  |  4 ++
>  gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 ++
>  3 files changed, 60 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 3294c1e6274..fbcb34d4395 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize
>[(V8QI "b") (V4QI "b") (V2QI "b")
> (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
>
> +;; Mapping to same size integral mode.
> +(define_mode_attr mmxinsnmode
> +  [(V8QI "DI") (V4QI "SI") (V2QI "HI")
> +   (V4HI "DI") (V2HI "SI")
> +   (V2SI "DI")
> +   (V4HF "DI") (V2HF "SI")
> +   (V2SF "DI")])
> +
>  (define_mode_attr mmxdoublemode
>[(V8QI "V8HI") (V4HI "V4SI")])
>
> @@ -350,22 +358,7 @@ (define_insn_and_split "*mov_imm"
>HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1],
> mode);
>operands[1] = GEN_INT (val);
> -  machine_mode mode;
> -  switch (GET_MODE_SIZE (mode))
> -{
> -case 2:
> -  mode = HImode;
> -  break;
> -case 4:
> -  mode = SImode;
> -  break;
> -case 8:
> -  mode = DImode;
> -  break;
> -default:
> -  gcc_unreachable ();
> -}
> -  operands[0] = lowpart_subreg (mode, operands[0], mode);
> +  operands[0] = lowpart_subreg (mode, operands[0], mode);
>  })
>
>  ;; For TARGET_64BIT we always round up to 8 bytes.
> @@ -2975,32 +2968,39 @@ (define_insn "*mmx_3"
> (set_attr "mode" "DI,TI,TI,TI")])
>
>  (define_insn "3"
> -  [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v")
> +  [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v")
>  (any_logic:VI_16_32
> - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v")
> - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v")))
> + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v")
> + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" 
> "r,i,x,x,v")))
> (clobber (reg:CC FLAGS_REG))]
>""

You will need ix86_binary_operator_ok insn constraint here with
corresponding expander using ix86_fixup_binary_operands_no_copy to
prepare insn operands.

>"#"
> -  [(set_attr "isa" "*,sse2_noavx,avx,avx512vl")
> -   (set_attr "type" "alu,sselog,sselog,sselog")
> -   (set_attr "mode" "SI,TI,TI,TI")])
> +  [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl")
> +   (set_attr "type" "alu,alu,sselog,sselog,sselog")
> +   (set_attr "mode" "SI,SI,TI,TI,TI")])
>
>  (define_split
> -  [(set (match_operand:VI_16_32 0 "general_reg_operand")
> +  [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand")
>  (any_logic:VI_16_32
> - (match_operand:VI_16_32 1 "general_reg_operand")
> - (match_operand:VI_16_32 2 "general_reg_operand")))
> + (match_operand:VI_16_32 1 "nonimmediate_gr_operand")
> + (match_operand:VI_16_32 2 
> "register_or_x86_64_const_vector_operand")))
> (clobber (reg:CC FLAGS_REG))]
>"reload_completed"
>[(parallel
>   [(set (match_dup 0)
> -  (any_logic:SI (match_dup 1) (match_dup 2)))
> +  (any_logic: (match_dup 1) (match_dup 2)))
>(clobber (reg:CC FLAGS_REG))])]
>  {
> -  operands[2] = lowpart_subreg (SImode, operands[2], mode);
> -  operands[1] = lowpart_subreg (SImode, operands[1], mode);
> -  operands[0] = lowpart_subreg (SImode, operands[0], mode);
> +  if (GET_CODE (operands[2]) == CONST_VECTOR)

Please use if (!register_operand (operands[2], mode)) instead.

Uros.

> +{
> +  HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2],
> +   mode);
> +  operands[2] = GEN_INT (val);
> +}
> +  else
> +operands[2] = lowpart_subreg (mode, operands[2], 
> mode);
> +  operands[1] = lowpart_subreg (mode, operands[1], mode);
> +  operands[0] = lowpart_subreg (mode, operands[0], mode);
>  })
>
>  (define_split
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index c71c453cceb..5f63a7d52f5 100644
> --- 

Re: ICE after folding svld1rq to vec_perm_expr duing forwprop

2022-07-18 Thread Richard Biener via Gcc-patches
On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > >  wrote:
> > >>
> > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > >>  wrote:
> > >> >
> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches
> > >> >  wrote:
> > >> > >
> > >> > > Hi Richard,
> > >> > > For the following test:
> > >> > >
> > >> > > svint32_t f2(int a, int b, int c, int d)
> > >> > > {
> > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > >> > >   return svld1rq_s32 (svptrue_b8 (), [0]);
> > >> > > }
> > >> > >
> > >> > > The compiler emits following ICE with -O3 -mcpu=generic+sve:
> > >> > > foo.c: In function ‘f2’:
> > >> > > foo.c:4:11: error: non-trivial conversion in ‘view_convert_expr’
> > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > >> > >   |   ^~
> > >> > > svint32_t
> > >> > > __Int32x4_t
> > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > > during GIMPLE pass: forwprop
> > >> > > dump file: foo.c.109t.forwprop2
> > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > >> > > 0xe9371f execute_function_todo
> > >> > > ../../gcc/gcc/passes.cc:2091
> > >> > > 0xe93ccb execute_todo
> > >> > > ../../gcc/gcc/passes.cc:2145
> > >> > >
> > >> > > This happens because, after folding svld1rq_s32 to vec_perm_expr, we 
> > >> > > have:
> > >> > >   int32x4_t v;
> > >> > >   __Int32x4_t _1;
> > >> > >   svint32_t _9;
> > >> > >   vector(4) int _11;
> > >> > >
> > >> > >:
> > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > >> > >   v_12 = _1;
> > >> > >   _11 = v_12;
> > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > >> > >   return _9;
> > >> > >
> > >> > > During forwprop, simplify_permutation simplifies vec_perm_expr to
> > >> > > view_convert_expr,
> > >> > > and the end result becomes:
> > >> > >   svint32_t _7;
> > >> > >   __Int32x4_t _8;
> > >> > >
> > >> > > ;;   basic block 2, loop depth 0
> > >> > > ;;pred:   ENTRY
> > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > >> > >   return _7;
> > >> > > ;;succ:   EXIT
> > >> > >
> > >> > > which causes the error duing verify_gimple since VIEW_CONVERT_EXPR
> > >> > > has incompatible types (svint32_t, int32x4_t).
> > >> > >
> > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > >> > > in simplify_permutation, if lhs and rhs have non compatible types,
> > >> > > which resolves ICE, but am not sure if it's the correct approach ?
> > >> >
> > >> > It for sure papers over the issue.  I think the error happens earlier,
> > >> > the V_C_E should have been built with the type of the VEC_PERM_EXPR
> > >> > which is the type of the LHS.  But then you probably run into the
> > >> > different sizes ICE (VLA vs constant size).  I think for this case you
> > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > >> > selecting the "low" part of the VLA vector.
> > >> Hi Richard,
> > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR to
> > >> represent dup operation
> > >> from fixed width to VLA vector. I am not sure how folding it to
> > >> BIT_FIELD_REF will work.
> > >> Could you please elaborate ?
> > >>
> > >> Also, the issue doesn't seem restricted to this case.
> > >> The following test case also ICE's during forwprop:
> > >> svint32_t foo()
> > >> {
> > >>   int32x4_t v = (int32x4_t) {1, 2, 3, 4};
> > >>   svint32_t v2 = svld1rq_s32 (svptrue_b8 (), [0]);
> > >>   return v2;
> > >> }
> > >>
> > >> foo2.c: In function ‘foo’:
> > >> foo2.c:9:1: error: non-trivial conversion in ‘vector_cst’
> > >> 9 | }
> > >>   | ^
> > >> svint32_t
> > >> int32x4_t
> > >> v2_4 = { 1, 2, 3, 4 };
> > >>
> > >> because simplify_permutation folds
> > >> VEC_PERM_EXPR< {1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, ...} >
> > >> into:
> > >> vector_cst {1, 2, 3, 4}
> > >>
> > >> and it complains during verify_gimple_assign_single because we don't
> > >> support assignment of vector_cst to VLA vector.
> > >>
> > >> I guess the issue really is that currently, only VEC_PERM_EXPR
> > >> supports lhs and rhs
> > >> to have vector types with differing lengths, and simplifying it to
> > >> other tree codes, like above,
> > >> will result in type errors ?
> > >
> > > That might be the case - Richard should know.
> >
> > I don't see anything particularly special about VEC_PERM_EXPR here,
> > or about the VLA vs. VLS thing.  We would have the same issue trying
> > to build a 128-bit vector from 2 64-bit vectors.  And there are other
> > tree codes whose input types are/can be different from their output
> > types.
> >
> > So it just seems like a normal type correctness issue: a VEC_PERM_EXPR
> > of type T needs to be 

Re: [x86_64 PATCH] PR target/106231: Optimize (any_extend:DI (ctz:SI ...)).

2022-07-18 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 16, 2022 at 9:10 PM Roger Sayle  wrote:
>
>
> This patch resolves PR target/106231 by providing insns that recognize
> (zero_extend:DI (ctz:SI ...)) and (sign_extend:DI (ctz:SI ...)).  The
> result of ctz:SI is always between 0 and 32 (or undefined), so
> sign_extension is the same as zero_extension, and the result is already
> extended in the destination register.
>
> Things are a little complicated, because the existing implementation
> of *ctzsi2 handles multiple cases, including false dependencies, which
> we continue to support in this patch.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
> 2022-07-16  Roger Sayle  
>
> gcc/ChangeLog
> PR target/106231
> * config/i386/i386.md (*ctzsidi2_ext): New insn_and_split
> to recognize any_extend:DI of ctz:SI which is implicitly extended.
> (*ctzsidi2_ext_falsedep): New define_insn to model a DImode
> extended ctz:SI that has preceding xor to break false dependency.
>
> gcc/testsuite/ChangeLog
> PR target/106231
> * gcc.target/i386/pr106231-1.c: New test case.
> * gcc.target/i386/pr106231-2.c: New test case.

OK.

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>


Re: [x86 PATCH] Fix issue with x86_64_const_vector_operand predicate.

2022-07-18 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 16, 2022 at 2:06 PM Roger Sayle  wrote:
>
>
> This patch fixes (what I believe is) a latent bug in i386.md's
> x86_64_const_vector_operand define_predicate.  According to the
> documentation, when a predicate is called with rtx operand OP and
> machine_mode operand MODE, we can't shouldn't assume that the
> MODE is (or has been checked to be) GET_MODE (OP).
>
> The failure mode is that recog can call x86_64_const_vector_operand
> on an arbitrary CONST_VECTOR passing a MODE of V2QI_mode, but when
> the CONST_VECTOR is in fact V1TImode, it's unsafe to directly call
> ix86_convert_const_vector_to_integer, which assumes that the CONST_VECTOR
> contains CONST_INTs when it actually contains CONST_WIDE_INTs.  The
> checks in this define_predicate need to be testing OP's mode, and
> ideally confirming that this matches the passed in/specified MODE.
>
> This bug is currently latent, but adding an innocent/unrelated
> define_insn, such as "(set (reg:CCC FLAGS_REG) (const_int 0))" to
> i386.md can occasionally change the order in which genrecog generates
> its tests, then ICEing during bootstrap due to V1TI CONST_VECTORs.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target-board=unix{-m32},
> with no new failures.  Ok for mainline?
>
> 2022-07-16  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/predicates.md (x86_64_const_vector_operand):
> Check the operand's mode matches the specified mode argument.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>