date:20231121

Re: [RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-11-21 Thread Alexander Monakov



On Tue, 21 Nov 2023, Alexander Monakov wrote:

> I am concerned that if GCC ever learns to leave out the following access
> to 'this->foo', leaving tmp uninitialized, we will end up with:
> 
>   this->foo = 42;

Sorry, this store will be DSE'd out, of course, but my question stands.

Alexander

>   *this = { CLOBBER };
>   __valgrind_make_mem_undefined(this, sizeof *this);
>   int tmp(D);
>   return tmp(D); // uninitialized
> 
> and Valgrind will not report anything since the invalid load is optimized out.
> 
> With early instrumentation such optimization is not going to happen, since the
> builtin may modify *this.
> 
> Is my concern reasonable?
> 
> Thanks.
> Alexander

Re: [RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-11-21 Thread Richard Biener

On Tue, Nov 21, 2023 at 8:59 AM Alexander Monakov  wrote:
>
>
> On Tue, 21 Nov 2023, Alexander Monakov wrote:
>
> > I am concerned that if GCC ever learns to leave out the following access
> > to 'this->foo', leaving tmp uninitialized, we will end up with:
> >
> >   this->foo = 42;
>
> Sorry, this store will be DSE'd out, of course, but my question stands.

I think that would be a reasonable transform, yes.

> Alexander
>
> >   *this = { CLOBBER };
> >   __valgrind_make_mem_undefined(this, sizeof *this);
> >   int tmp(D);
> >   return tmp(D); // uninitialized

and this, too, btw. - the DSE actually happens, the latter transform not.
We specifically "opt out" of doing that for QOI to not make undefined
behavior worse.  The more correct transform would be to replace the
load with a __builtin_trap () during path isolation (or wire in path isolation
to value-numbering where we actually figure out there's no valid definition
to reach for the load).

So yes, if you want to avoid these kinds of transforms, earlier instrumentation
is better.

Richard.

> >
> > and Valgrind will not report anything since the invalid load is optimized out.
> >
> > With early instrumentation such optimization is not going to happen, since the
> > builtin may modify *this.
> >
> > Is my concern reasonable?
> >
> > Thanks.
> > Alexander

[PATCH] builtins: Fix fold_builtin_query clzg/ctzg side-effects handling [PR112639]

2023-11-21 Thread Jakub Jelinek

Hi!

As the testcase shows, I've missed one spot where initially the code thinks
it could use 2 argument IFN_CLZ/IFN_CTZ form, but then verifies it can't
because it doesn't have the right target value and turns it into the
arg0 ? .C[LT]Z (arg0) : arg1
form.  That form evaluates the argument twice though and so needs save_expr,
which I forgot to call in that case.  In the other cases where it is known
from the beginning that it will be needed (e.g. the __builtin_clzg case
on types smaller than unsigned int, where we'll need to add an addend
to the clz value, or the unsigned __int128 expansion), save_expr was already
called.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-21  Jakub Jelinek  

PR middle-end/112639
* builtins.cc (fold_builtin_bit_query): If arg0 has side-effects, arg1
is specified but cleared, call save_expr on arg0.

* gcc.dg/torture/pr112639.c: New test.

--- gcc/builtins.cc.jj  2023-11-20 10:36:18.642625716 +0100
+++ gcc/builtins.cc 2023-11-20 15:25:59.665718971 +0100
@@ -9819,6 +9819,8 @@ fold_builtin_bit_query (location_t loc,
  if (!direct_internal_fn_supported_p (ifn, arg0_type,
   OPTIMIZE_FOR_BOTH))
arg2 = NULL_TREE;
+ if (arg2 == NULL_TREE)
+   arg0 = save_expr (arg0);
}
   if (fcodei == END_BUILTINS || arg2)
call = build_call_expr_internal_loc (loc, ifn, integer_type_node,
--- gcc/testsuite/gcc.dg/torture/pr112639.c.jj  2023-11-20 15:32:24.994391656 +0100
+++ gcc/testsuite/gcc.dg/torture/pr112639.c 2023-11-20 15:34:14.395882993 +0100
@@ -0,0 +1,34 @@
+/* PR middle-end/112639 */
+/* { dg-do run } */
+
+unsigned long long b = 0;
+
+int
+foo (void)
+{
+  return __builtin_clzg (b++, __SIZEOF_LONG_LONG__ * __CHAR_BIT__);
+}
+
+int
+bar (void)
+{
+  return __builtin_ctzg (b++, __SIZEOF_LONG_LONG__ * __CHAR_BIT__);
+}
+
+int
+main ()
+{
+  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ || b != 1)
+    __builtin_abort ();
+  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 1 || b != 2)
+    __builtin_abort ();
+  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 2 || b != 3)
+    __builtin_abort ();
+  b = 0;
+  if (bar () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ || b != 1)
+    __builtin_abort ();
+  if (bar () != 0 || b != 2)
+    __builtin_abort ();
+  if (bar () != 1 || b != 3)
+    __builtin_abort ();
+}

Jakub
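
The double evaluation described in the patch above can be sketched in plain C.
This is only an illustration, not the GCC implementation: CLZG_NAIVE and
clzg_once are hypothetical names, and __builtin_clzll stands in for the
internal .CLZ call.  The macro names its operand twice, like the conditional
form without save_expr; the function evaluates the operand once when the
argument is passed, which is roughly the effect save_expr has on the tree.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long long in bits, matching the fallback value
   used by the testcase.  */
enum { ULL_BITS = sizeof (unsigned long long) * CHAR_BIT };

/* Hypothetical stand-in for the folded form without save_expr:
   X is named twice, so a side effect such as b++ can run twice.  */
#define CLZG_NAIVE(X, FALLBACK) \
  ((X) != 0 ? __builtin_clzll (X) : (FALLBACK))

/* Hypothetical evaluate-once variant: the caller's expression is
   evaluated when the argument is passed and only the saved copy is
   used afterwards, similar in spirit to wrapping arg0 in save_expr.  */
static int
clzg_once (unsigned long long x, int fallback)
{
  return x != 0 ? __builtin_clzll (x) : fallback;
}

/* Shows the difference for a starting value of 1.  */
static void
check_evaluation_counts (void)
{
  unsigned long long b = 1;
  int r = CLZG_NAIVE (b++, ULL_BITS);
  assert (b == 3);             /* incremented twice */
  assert (r == ULL_BITS - 2);  /* clz was taken of 2, not of 1 */

  b = 1;
  r = clzg_once (b++, ULL_BITS);
  assert (b == 2);             /* incremented once */
  assert (r == ULL_BITS - 1);  /* clz of 1 */
}
```

The second half of check_evaluation_counts matches what the testcase expects
from foo () on its second call: the result is the width minus 1 and b has
advanced by exactly one.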

Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 08:01:45AM +, Richard Biener wrote:
> > On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> > > As a follow-up, I'm considering changing in this routine the popcount
> > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> > 
> > Here is the follow-up which does the rtx costs testing.
> > While having to tweak internal-fn.def so that POPCOUNT can have a custom
> > expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL*
> > macros (most of them) were undefined at the end of internal-fn.def (but in
> > some cases uselessly undefined again after inclusion), while others were not
> > (and sometimes undefined after the inclusion).  I've changed it to always
> > undefine at the end of internal-fn.def.
> 
> The last bit is cleanup that could go in independently.

Ok, I've committed the following patch first and the 2 patches adjusted on
top of that.  The 2 original ones were bootstrapped/regtested successfully
on x86_64-linux and i686-linux last night.

2023-11-20  Jakub Jelinek  

* internal-fn.def: Document missing DEF_INTERNAL* macros and make sure
they are all undefined at the end.
* internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn,
widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN
macros after inclusion of internal-fn.def.

--- gcc/internal-fn.def.jj  2023-11-17 15:51:02.642802521 +0100
+++ gcc/internal-fn.def 2023-11-18 10:12:10.329236626 +0100
@@ -33,9 +33,12 @@ along with GCC; see the file COPYING3.
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB,
   UNSIGNED_OPTAB, TYPE)
  DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB,
+TYPE)
 
where NAME is the name of the function, FLAGS is a set of
ECF_* flags and FNSPEC is a string describing functions fnspec.
@@ -572,6 +585,9 @@ DEF_INTERNAL_FN (DIVMODBITINT, ECF_LEAF,
 DEF_INTERNAL_FN (FLOATTOBITINT, ECF_LEAF | ECF_NOTHROW, ". O . . ")
 DEF_INTERNAL_FN (BITINTTOFLOAT, ECF_PURE | ECF_LEAF, ". R . ")
 
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#undef DEF_INTERNAL_COND_FN
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
--- gcc/internal-fn.cc.jj   2023-11-07 08:32:01.741254155 +0100
+++ gcc/internal-fn.cc  2023-11-18 11:06:31.740703230 +0100
@@ -102,8 +102,6 @@ lookup_hilo_internal_fn (internal_fn ifn
 {
 default:
   gcc_unreachable ();
-#undef DEF_INTERNAL_FN
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
 #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)  \
 case IFN_##NAME:   \
@@ -111,8 +109,6 @@ lookup_hilo_internal_fn (internal_fn ifn
   *hi = internal_fn (IFN_##NAME##_HI); \
   break;
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 }
 }
 
@@ -129,8 +125,6 @@ lookup_evenodd_internal_fn (internal_fn
 {
 default:
   gcc_unreachable ();
-#undef DEF_INTERNAL_FN
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 #define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
 #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)  \
 case IFN_##NAME:   \
@@ -138,8 +132,6 @@ lookup_evenodd_internal_fn (internal_fn
   *odd = internal_fn (IFN_##NAME##_ODD);   \
   break;
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 }
 }
 
@@ -4261,7 +4253,6 @@ widening_fn_p (code_helper code)
   internal_fn fn = as_internal_fn ((combined_fn) code);
   switch (fn)
 {
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \
 case IFN_##NAME: \
 case IFN_##NAME##_HI:\
@@ -4270,7 +4261,6 @@ widening_fn_p (code_helper code)
 case IFN_##NAME##_ODD:   \
   return true;
 #include "internal-fn.def"
-#undef DEF_INTERNAL_WIDENING_OPTAB_FN
 
 default:
   return false;
@@ -4305,8 +4296,6 @@ set_edom_supported_p (void)
 expand_##TYPE##_optab_fn (fn, stmt, which_optab);  \
   }
 #include "internal-fn.def"
-#undef DEF_INTERNAL_OPTAB_FN
-#undef DEF_INTERNAL_SIGNED_OPTAB_FN
 
 /* Routines to expand each internal function, indexed by function number.
Each routine has the prototype:
@@ -4465,8 +4454,6 @@ get_len_internal_fn (internal_fn fn)
 {
   switch (fn)

Re: [PATCH] middle-end, v2: Add new value for vector types for __builtin_classify_type

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 08:03:18AM +, Richard Biener wrote:
> OK.  Do we have to adjust any of our documentation for this?

I've done it this way.  We don't really document the exact values
in the documentation, so I think it is sufficient like that.
Committed to trunk.

2023-11-20  Jakub Jelinek  

gcc/
* typeclass.h (enum type_class): Add vector_type_class.
* builtins.cc (type_to_class): Return vector_type_class for
VECTOR_TYPE.
* doc/extend.texi (__builtin_classify_type): Mention bit-precise
integer types and vector types.
gcc/testsuite/
* c-c++-common/builtin-classify-type-1.c (main): Add tests for vector
types.

--- gcc/typeclass.h.jj  2023-09-06 17:28:24.238977355 +0200
+++ gcc/typeclass.h 2023-11-10 10:50:59.519007647 +0100
@@ -38,7 +38,7 @@ enum type_class
   record_type_class, union_type_class,
   array_type_class, string_type_class,
   lang_type_class, opaque_type_class,
-  bitint_type_class
+  bitint_type_class, vector_type_class
 };
 
 #endif /* GCC_TYPECLASS_H */
--- gcc/builtins.cc.jj  2023-11-09 09:17:40.230182483 +0100
+++ gcc/builtins.cc 2023-11-10 11:19:29.669129855 +0100
@@ -1859,6 +1859,7 @@ type_to_class (tree type)
 case LANG_TYPE:   return lang_type_class;
 case OPAQUE_TYPE:  return opaque_type_class;
 case BITINT_TYPE: return bitint_type_class;
+case VECTOR_TYPE: return vector_type_class;
 default:  return no_type_class;
 }
 }
--- gcc/doc/extend.texi.jj
+++ gcc/doc/extend.texi
@@ -14746,11 +14746,11 @@ The @code{__builtin_classify_type} returns a small integer with a category
 of @var{arg} argument's type, like void type, integer type, enumeral type,
 boolean type, pointer type, reference type, offset type, real type, complex
 type, function type, method type, record type, union type, array type,
-string type, etc.  When the argument is an expression, for
-backwards compatibility reason the argument is promoted like arguments
-passed to @code{...} in varargs function, so some classes are never returned
-in certain languages.  Alternatively, the argument of the built-in
-function can be a typename, such as the @code{typeof} specifier.
+string type, bit-precise integer type, vector type, etc.  When the argument
+is an expression, for backwards compatibility reason the argument is promoted
+like arguments passed to @code{...} in varargs function, so some classes are
+never returned in certain languages.  Alternatively, the argument of the
+built-in function can be a typename, such as the @code{typeof} specifier.
 
 @smallexample
 int a[2];
--- gcc/testsuite/c-c++-common/builtin-classify-type-1.c.jj 2023-09-26 09:25:30.019599039 +0200
+++ gcc/testsuite/c-c++-common/builtin-classify-type-1.c 2023-11-10 11:02:01.927776922 +0100
@@ -22,6 +22,10 @@ main ()
   const char *p = (const char *) 0;
   float f = 0.0;
   _Complex double c = 0.0;
+  typedef int VI __attribute__((vector_size (4 * sizeof (int))));
+  typedef float VF __attribute__((vector_size (4 * sizeof (int))));
+  VI vi = { 0, 0, 0, 0 };
+  VF vf = { 0.0f, 0.0f, 0.0f, 0.0f };
 #ifdef __cplusplus
   struct T { void foo (); };
   int &r = a[0];
@@ -43,6 +47,8 @@ main ()
   static_assert (__builtin_classify_type (struct S) == 12, "");
   static_assert (__builtin_classify_type (union U) == 13, "");
   static_assert (__builtin_classify_type (int [2]) == 14, "");
+  static_assert (__builtin_classify_type (VI) == 19, "");
+  static_assert (__builtin_classify_type (VF) == 19, "");
   static_assert (__builtin_classify_type (__typeof__ (a[0])) == 1, "");
   static_assert (__builtin_classify_type (__typeof__ (e)) == 3, "");
   static_assert (__builtin_classify_type (__typeof__ (b)) == 4, "");
@@ -57,6 +63,8 @@ main ()
   static_assert (__builtin_classify_type (__typeof__ (s)) == 12, "");
   static_assert (__builtin_classify_type (__typeof__ (u)) == 13, "");
   static_assert (__builtin_classify_type (__typeof__ (a)) == 14, "");
+  static_assert (__builtin_classify_type (__typeof__ (vi)) == 19, "");
+  static_assert (__builtin_classify_type (__typeof__ (vf)) == 19, "");
 #ifndef __cplusplus
   static_assert (__builtin_classify_type (a[0]) == 1, "");
   static_assert (__builtin_classify_type (e) == 1, "");
@@ -102,4 +110,8 @@ main ()
     abort ();
   if (__builtin_classify_type (a) != 5)
     abort ();
+  if (__builtin_classify_type (vi) != 19)
+    abort ();
+  if (__builtin_classify_type (vf) != 19)
+    abort ();
 }


Jakub
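
As a small usage sketch of the builtin discussed above (the helper names
classify_int, classify_pointer and classify_vector are made up for this
illustration): the values 1 and 5 for integer and pointer expressions are the
ones the existing testcase checks, while 19 for vector types is only expected
from a compiler that contains this patch, so the vector result is returned
without being checked.

```c
#include <assert.h>

/* Same vector typedef shape as in the testcase above.  */
typedef int VI __attribute__((vector_size (4 * sizeof (int))));

/* Class of an int expression; the testcase expects 1.  */
static int
classify_int (void)
{
  int i = 0;
  return __builtin_classify_type (i);
}

/* Class of a pointer expression; the testcase expects 5.  */
static int
classify_pointer (void)
{
  const char *p = 0;
  return __builtin_classify_type (p);
}

/* Class of a vector expression.  With the patch above this should be
   vector_type_class (19); compilers without it may return another
   value, so nothing is asserted about it here.  */
static int
classify_vector (void)
{
  VI vi = { 0, 0, 0, 0 };
  return __builtin_classify_type (vi);
}
```

A caller could compare classify_vector () against 19 to find out whether the
compiler in use already knows vector_type_class.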

Re: [PATCH 2/2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 03:33:31PM +0100, Sebastian Huber wrote:
> This change fixes issues like this:
> 
>   gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
>   gcc.dg/gomp/pr27573.c:19:1: error: non-trivial conversion in ‘ssa_name’
>  19 | }
> | ^
>   long int
>   long unsigned int
>   # .MEM_19 = VDEF <.MEM_18>
>   __gcov7.main._omp_fn.0[0] = PROF_time_profile_12;
>   during IPA pass: profile
>   gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed
> 
> gcc/ChangeLog:
> 
>   PR middle-end/112634
> 
>   * tree-profile.cc (gen_assign_counter_update): Cast the unsigned result type of
>   __atomic_add_fetch() to the signed counter type.
> ---
>  gcc/tree-profile.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 68db09f6189..54938e1d165 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -284,7 +284,9 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi, gcall *call, tree func,
>tree tmp = make_temp_ssa_name (result_type, NULL, name);
>gimple_set_lhs (call, tmp);
>gsi_insert_after (gsi, call, GSI_NEW_STMT);
> -  gassign *assign = gimple_build_assign (result, tmp);
> +  gassign *assign = gimple_build_assign (result,
> +  build_int_cst (TREE_TYPE (result),
> + tmp));

This can't be correct.
tmp is an SSA_NAME, so calling build_int_cst on it is not appropriate; the
second argument should be some unsigned HOST_WIDE_INT value.
If result_type is a different type from TREE_TYPE (result), but both are
integer types, then you want
  gassign *assign = gimple_build_assign (result, NOP_EXPR, tmp);
or so.

Jakub
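
At the source level, the conversion suggested above corresponds to an ordinary
cast between integer types of the same width.  The following is a rough C
analogue only, not GIMPLE and not the gcov code: bump_counter is a
hypothetical helper, and the counter is assumed to be a signed 64-bit value
that is updated through its unsigned view.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically increment a signed counter and
   return the new value in the signed counter type.  */
static int64_t
bump_counter (int64_t *counter)
{
  /* The atomic builtin operates on the unsigned view of the counter,
     so its result has the unsigned type.  */
  uint64_t tmp = __atomic_add_fetch ((uint64_t *) counter, 1,
                                     __ATOMIC_RELAXED);

  /* Plain integer conversion back to the signed type.  This is the
     source-level counterpart of assigning through NOP_EXPR; the value
     is converted, not turned into a constant.  */
  return (int64_t) tmp;
}
```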

Re: [PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 02:55:33PM +, Joseph Myers wrote:
> On Sat, 18 Nov 2023, Jakub Jelinek wrote:
> 
> > +@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})}
> > +The @code{__builtin_stdc_bit_ceil} function is available only
> > +in C.  It is type-generic, the argument can be any unsigned integer
> > +(standard, extended or bit-precise).  No integral argument promotions are
> > +performed on the argument.  It is equivalent to
> > +@code{@var{arg} <= 1 ? (@var{type}) 1
> > : (@var{type}) 1 << (@var{prec} - __builtin_clzg ((@var{type}) (@var{arg} - 1)))}
> > +where @var{prec} is bit width of @var{type}, except that side-effects
> > +in @var{arg} are evaluated just once.
> > +@enddefbuiltin
> 
> Note that stdc_bit_ceil now has defined behavior (return 0) on overflow: 
> CD2 comment FR-135 was accepted for the DIS at the June WG14 meeting.  
> This affects both the documentation and the implementation, as they need 
> to avoid an undefined shift by the width of the type.  That's why my 
> stdbit.h implementations have two shifts (not claiming that's necessarily 
> the optimal way of ensuring the correct result in the overflow case).
> 
>   return __x <= 1 ? 1 : ((uint64_t) 1) << (__bw64_inline (__x - 1) - 1) << 1;

So
  return __x <= 1 ? 1 : ((uint64_t) 2) << (__bw64_inline (__x - 1) - 1);
then?

Jakub
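
For illustration, the single-shift variant proposed above can be written out
for a 64-bit type as follows.  This is a sketch under assumptions, not the
stdbit.h or GCC code: bit_width_u64 is a hypothetical stand-in for
__bw64_inline, built on __builtin_clzll, and unsigned long long is assumed to
be 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __bw64_inline: number of bits needed to
   represent x, with 0 for x == 0.  __builtin_clzll is undefined for
   0, hence the explicit check.  */
static unsigned
bit_width_u64 (uint64_t x)
{
  return x == 0 ? 0 : 64 - (unsigned) __builtin_clzll (x);
}

/* Smallest power of two not less than x, or 0 if that power does not
   fit in 64 bits.  */
static uint64_t
bit_ceil_u64 (uint64_t x)
{
  /* For x > 2^63 the width of x - 1 is 64, so the shift count is 63
     and the set bit of 2 is shifted out.  The result is 0 and the
     shift count never reaches the width of the type.  */
  return x <= 1 ? 1 : (uint64_t) 2 << (bit_width_u64 (x - 1) - 1);
}
```

Shifting 2 by one bit less gives the same result as shifting 1 by the full
amount wherever the latter is defined, and stays well defined in the overflow
case.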

Re: [PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 02:55:33PM +, Joseph Myers wrote:
> On Sat, 18 Nov 2023, Jakub Jelinek wrote:
> 
> > +@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})}
> > +The @code{__builtin_stdc_bit_ceil} function is available only
> > +in C.  It is type-generic, the argument can be any unsigned integer
> > +(standard, extended or bit-precise).  No integral argument promotions are
> > +performed on the argument.  It is equivalent to
> > +@code{@var{arg} <= 1 ? (@var{type}) 1
> > : (@var{type}) 1 << (@var{prec} - __builtin_clzg ((@var{type}) (@var{arg} - 1)))}
> > +where @var{prec} is bit width of @var{type}, except that side-effects
> > +in @var{arg} are evaluated just once.
> > +@enddefbuiltin
> 
> Note that stdc_bit_ceil now has defined behavior (return 0) on overflow: 
> CD2 comment FR-135 was accepted for the DIS at the June WG14 meeting.  
> This affects both the documentation and the implementation, as they need 
> to avoid an undefined shift by the width of the type.  That's why my 
> stdbit.h implementations have two shifts (not claiming that's necessarily 
> the optimal way of ensuring the correct result in the overflow case).
> 
>   return __x <= 1 ? 1 : ((uint64_t) 1) << (__bw64_inline (__x - 1) - 1) << 1;

Given the feedback from Richi I've in the meantime reworked the patch to
add all 14 builtins (but because the enum rid is very close to 256 values
and with 14 new ones was already 7 too many, used one RID value for all 14
builtins (different spellings)).

Will need to rework it for CD2 FR-135 then...

2023-11-20  Jakub Jelinek  

gcc/
* doc/extend.texi (__builtin_stdc_bit_ceil, __builtin_stdc_bit_floor,
__builtin_stdc_bit_width, __builtin_stdc_count_ones,
__builtin_stdc_count_zeros, __builtin_stdc_first_leading_one,
__builtin_stdc_first_leading_zero, __builtin_stdc_first_trailing_one,
__builtin_stdc_first_trailing_zero, __builtin_stdc_has_single_bit,
__builtin_stdc_leading_ones, __builtin_stdc_leading_zeros,
__builtin_stdc_trailing_ones, __builtin_stdc_trailing_zeros): Document.
gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_STDC: New.
* c-common.cc (c_common_reswords): Add __builtin_stdc_bit_ceil,
__builtin_stdc_bit_floor, __builtin_stdc_bit_width,
__builtin_stdc_count_ones, __builtin_stdc_count_zeros,
__builtin_stdc_first_leading_one, __builtin_stdc_first_leading_zero,
__builtin_stdc_first_trailing_one, __builtin_stdc_first_trailing_zero,
__builtin_stdc_has_single_bit, __builtin_stdc_leading_ones,
__builtin_stdc_leading_zeros, __builtin_stdc_trailing_ones and
__builtin_stdc_trailing_zeros.  Move __builtin_assoc_barrier
alphabetically earlier.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle RID_BUILTIN_STDC.
* c-decl.cc (names_builtin_p): Likewise.
gcc/testsuite/
* gcc.dg/builtin-stdc-bit-1.c: New test.
* gcc.dg/builtin-stdc-bit-2.c: New test.

--- gcc/c-family/c-common.h.jj  2023-11-20 09:49:34.760674813 +0100
+++ gcc/c-family/c-common.h 2023-11-20 13:55:50.691612365 +0100
@@ -106,10 +106,10 @@ enum rid
   /* C extensions */
   RID_ASM,   RID_TYPEOF,   RID_TYPEOF_UNQUAL, RID_ALIGNOF,  RID_ATTRIBUTE,
   RID_VA_ARG,
-  RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE,
-  RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
+  RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,RID_CHOOSE_EXPR,
+  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,   RID_BUILTIN_SHUFFLE,
+  RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,  RID_BUILTIN_TGMATH,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,  RID_BUILTIN_STDC,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,

   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
--- gcc/c-family/c-common.cc.jj 2023-11-20 09:49:34.691675777 +0100
+++ gcc/c-family/c-common.cc 2023-11-20 13:55:40.104758727 +0100
@@ -380,6 +380,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",   RID_ATTRIBUTE,  0 },
   { "__auto_type", RID_AUTO_TYPE,  D_CONLY },
   { "__builtin_addressof", RID_ADDRESSOF, D_CXXONLY },
+  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_bit_cast", RID_BUILTIN_BIT_CAST, D_CXXONLY },
   { "__builtin_call_with_static_chain",
 RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
@@ -388,9 +389,22 @@ const struct c_common_resword c_common_r
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
-  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shuff

Re: [PATCH v5] gcc: Introduce -fhardened

2023-11-21 Thread Jakub Jelinek

On Thu, Nov 16, 2023 at 03:51:22PM -0500, Marek Polacek wrote:
> Thanks, that's a good point.  In this version I've added a target hook.
> 
> On my system, -D_FORTIFY_SOURCE=3 will be used, and if I remove
> linux_fortify_source_default_level it's =2 as expected.
> 
> The only problem was that it doesn't seem to be possible to use
> targetm. in opts.cc -- I get an undefined reference.  But since
> the opts.cc use is for --help only, it's not a big deal either way.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> In 
> I proposed -fhardened, a new umbrella option that enables a reasonable set
> of hardening flags.  The read of the room seems to be that the option
> would be useful.  So here's a patch implementing that option.
> 
> Currently, -fhardened enables:
> 
>   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>   -D_GLIBCXX_ASSERTIONS
>   -ftrivial-auto-var-init=zero
>   -fPIE  -pie  -Wl,-z,relro,-z,now
>   -fstack-protector-strong
>   -fstack-clash-protection
>   -fcf-protection=full (x86 GNU/Linux only)
> 
> -fhardened will not override options that were specified on the command line
> (before or after -fhardened).  For example,
> 
>  -D_FORTIFY_SOURCE=1 -fhardened
> 
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> 
>   -fhardened -fstack-protector
> 
> will not enable -fstack-protector-strong.
> 
> Currently, -fhardened is only supported on GNU/Linux.
> 
> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> to anything.  This patch provides -Whardened, enabled by default, which
> warns when -fhardened couldn't enable a particular option.  I think most
> often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
> were not enabled.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-opts.cc: Include "target.h".
>   (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
>   and _GLIBCXX_ASSERTIONS.
> 
> gcc/ChangeLog:
> 
>   * common.opt (Whardened, fhardened): New options.
>   * config.in: Regenerate.
>   * config/bpf/bpf.cc: Include "opts.h".
>   (bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
>   not inform that -fstack-protector does not work.
>   * config/i386/i386-options.cc (ix86_option_override_internal): When
>   -fhardened, maybe enable -fcf-protection=full.
>   * config/linux-protos.h (linux_fortify_source_default_level): Declare.
>   * config/linux.cc (linux_fortify_source_default_level): New.
>   * config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
>   * configure: Regenerate.
>   * configure.ac: Check if the linker supports '-z now' and '-z relro'.
>   Check if -fhardened is supported on $target_os.
>   * doc/invoke.texi: Document -fhardened and -Whardened.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
>   * gcc.cc (driver_handle_option): Remember if any link options or -static
>   were specified on the command line.
>   (process_command): When -fhardened, maybe enable -pie and
>   -Wl,-z,relro,-z,now.
>   * opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
>   (finish_options): When -fhardened, enable
>   -ftrivial-auto-var-init=zero and -fstack-protector-strong.
>   (print_help_hardened): New.
>   (print_help): Call it.
>   * target.def (fortify_source_default_level): New target hook.
>   * targhooks.cc (default_fortify_source_default_level): New.
>   * targhooks.h (default_fortify_source_default_level): Declare.
>   * toplev.cc (process_options): When -fhardened, enable
>   -fstack-clash-protection.  If flag_stack_protector_set_by_fhardened_p,
>   do not warn that -fstack-protector not supported for this target.
>   Don't enable -fhardened when !HAVE_FHARDENED_SUPPORT.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.misc-tests/help.exp: Test -fhardened.
>   * c-c++-common/fhardened-1.S: New test.
>   * c-c++-common/fhardened-1.c: New test.
>   * c-c++-common/fhardened-10.c: New test.
>   * c-c++-common/fhardened-11.c: New test.
>   * c-c++-common/fhardened-12.c: New test.
>   * c-c++-common/fhardened-13.c: New test.
>   * c-c++-common/fhardened-14.c: New test.
>   * c-c++-common/fhardened-15.c: New test.
>   * c-c++-common/fhardened-2.c: New test.
>   * c-c++-common/fhardened-3.c: New test.
>   * c-c++-common/fhardened-4.c: New test.
>   * c-c++-common/fhardened-5.c: New test.
>   * c-c++-common/fhardened-6.c: New test.
>   * c-c++-common/fhardened-7.c: New test.
>   * c-c++-common/fhardened-8.c: New test.
>   * c-c++-common/fhardened-9.c: New test.
>   * gcc.target/i386/cf_check-6.c: New test.

LGTM.

Jakub
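
One way to see which fortification level a given command line ends up with is
a small probe like the one below.  This is a sketch that assumes -fhardened
predefines _FORTIFY_SOURCE for the preprocessor, as the c_finish_options
ChangeLog entry suggests; fortify_level is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical probe: report the value of _FORTIFY_SOURCE seen by
   the preprocessor, or 0 when the macro is not defined.  */
static int
fortify_level (void)
{
#ifdef _FORTIFY_SOURCE
  return _FORTIFY_SOURCE;
#else
  return 0;
#endif
}
```

Going by the description above, building this with optimization and
-fhardened would be expected to report 3 (or 2 with an older glibc), and
adding -D_FORTIFY_SOURCE=1 on the same command line would be expected to
report 1.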

Re: [PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 04:03:07PM +0100, Jakub Jelinek wrote:
> > Note that stdc_bit_ceil now has defined behavior (return 0) on overflow: 
> > CD2 comment FR-135 was accepted for the DIS at the June WG14 meeting.  
> > This affects both the documentation and the implementation, as they need 
> > to avoid an undefined shift by the width of the type.  That's why my 
> > stdbit.h implementations have two shifts (not claiming that's necessarily 
> > the optimal way of ensuring the correct result in the overflow case).
> > 
> > >   return __x <= 1 ? 1 : ((uint64_t) 1) << (__bw64_inline (__x - 1) - 1) << 1;
> 
> Given the feedback from Richi I've in the meantime reworked the patch to
> add all 14 builtins (but because the enum rid is very close to 256 values
> and with 14 new ones was already 7 too many, used one RID value for all 14
> builtins (different spellings)).
> 
> Will need to rework it for CD2 FR-135 then...

Here it is updated to use that
x <= 1 ? 1 : ((type) 2) << (prec - 1 - __builtin_clzg ((type) (x - 1)))
I've mentioned.
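
To show how the prec-based form behaves on a type narrower than int, here is
a sketch for a 16-bit type.  It is an illustration under assumptions, not the
builtin's implementation: clz16 and bit_ceil16 are hypothetical names, and
__builtin_clz on unsigned int is used in place of __builtin_clzg.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* prec for the 16-bit illustration type.  */
enum { PREC = 16 };

/* Leading zero count within 16 bits for nonzero x, derived from
   __builtin_clz on unsigned int by subtracting the extra high bits.  */
static int
clz16 (uint16_t x)
{
  int uint_bits = (int) (sizeof (unsigned int) * CHAR_BIT);
  return __builtin_clz ((unsigned int) x) - (uint_bits - PREC);
}

/* x <= 1 ? 1 : ((type) 2) << (prec - 1 - clz ((type) (x - 1)))
   for type == uint16_t.  */
static uint16_t
bit_ceil16 (uint16_t x)
{
  if (x <= 1)
    return 1;
  /* The shift is done after promotion to int (assumed wider than 16
     bits); converting the result back to uint16_t drops the bit
     shifted past position 15, so inputs above 0x8000 give 0.  */
  return (uint16_t) ((uint16_t) 2
                     << (PREC - 1 - clz16 ((uint16_t) (x - 1))));
}
```

The shift count here is at most prec - 1, which is the point of starting from
2 instead of 1.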

2023-11-20  Jakub Jelinek  

gcc/
* doc/extend.texi (__builtin_stdc_bit_ceil, __builtin_stdc_bit_floor,
__builtin_stdc_bit_width, __builtin_stdc_count_ones,
__builtin_stdc_count_zeros, __builtin_stdc_first_leading_one,
__builtin_stdc_first_leading_zero, __builtin_stdc_first_trailing_one,
__builtin_stdc_first_trailing_zero, __builtin_stdc_has_single_bit,
__builtin_stdc_leading_ones, __builtin_stdc_leading_zeros,
__builtin_stdc_trailing_ones, __builtin_stdc_trailing_zeros): Document.
gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_STDC: New.
* c-common.cc (c_common_reswords): Add __builtin_stdc_bit_ceil,
__builtin_stdc_bit_floor, __builtin_stdc_bit_width,
__builtin_stdc_count_ones, __builtin_stdc_count_zeros,
__builtin_stdc_first_leading_one, __builtin_stdc_first_leading_zero,
__builtin_stdc_first_trailing_one, __builtin_stdc_first_trailing_zero,
__builtin_stdc_has_single_bit, __builtin_stdc_leading_ones,
__builtin_stdc_leading_zeros, __builtin_stdc_trailing_ones and
__builtin_stdc_trailing_zeros.  Move __builtin_assoc_barrier
alphabetically earlier.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle RID_BUILTIN_STDC.
* c-decl.cc (names_builtin_p): Likewise.
gcc/testsuite/
* gcc.dg/builtin-stdc-bit-1.c: New test.
* gcc.dg/builtin-stdc-bit-2.c: New test.

--- gcc/c-family/c-common.h.jj  2023-11-20 09:49:34.760674813 +0100
+++ gcc/c-family/c-common.h 2023-11-20 13:55:50.691612365 +0100
@@ -106,10 +106,10 @@ enum rid
   /* C extensions */
   RID_ASM,   RID_TYPEOF,   RID_TYPEOF_UNQUAL, RID_ALIGNOF,  RID_ATTRIBUTE,
   RID_VA_ARG,
-  RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE,
-  RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
+  RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,RID_CHOOSE_EXPR,
+  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,   RID_BUILTIN_SHUFFLE,
+  RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,  RID_BUILTIN_TGMATH,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,  RID_BUILTIN_STDC,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
--- gcc/c-family/c-common.cc.jj 2023-11-20 09:49:34.691675777 +0100
+++ gcc/c-family/c-common.cc 2023-11-20 13:55:40.104758727 +0100
@@ -380,6 +380,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",   RID_ATTRIBUTE,  0 },
   { "__auto_type", RID_AUTO_TYPE,  D_CONLY },
   { "__builtin_addressof", RID_ADDRESSOF, D_CXXONLY },
+  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_bit_cast", RID_BUILTIN_BIT_CAST, D_CXXONLY },
   { "__builtin_call_with_static_chain",
 RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
@@ -388,9 +389,22 @@ const struct c_common_resword c_common_r
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
-  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
+  { "__builtin_stdc_bit_ceil", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_bit_floor", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_bit_width", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_count_ones", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_count_zeros", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_first_leading_one", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_stdc_first_leading_zero", RID_BUILTIN_STDC, D_CONLY },
+  { "__builtin_std

Re: [PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

2023-11-21 Thread Jakub Jelinek

On Mon, Nov 20, 2023 at 04:29:47PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 20, 2023 at 04:03:07PM +0100, Jakub Jelinek wrote:
> > > Note that stdc_bit_ceil now has defined behavior (return 0) on overflow: 
> > > CD2 comment FR-135 was accepted for the DIS at the June WG14 meeting.  
> > > This affects both the documentation and the implementation, as they need 
> > > to avoid an undefined shift by the width of the type.  That's why my 
> > > stdbit.h implementations have two shifts (not claiming that's necessarily 
> > > the optimal way of ensuring the correct result in the overflow case).
> > > 
> > >   return __x <= 1 ? 1 : ((uint64_t) 1) << (__bw64_inline (__x - 1) - 1) 
> > > << 1;
> > 
> > Given the feedback from Richi I've in the meantime reworked the patch to
> > add all 14 builtins (but because the enum rid is very close to 256 values
> > and with 14 new ones was already 7 too many, used one RID value for all 14
> > builtins (different spellings)).
> > 
> > Will need to rework it for CD2 FR-135 then...
> 
> Here it is updated to use that
> x <= 1 ? 1 : ((type) 2) << (prec - 1 - __builtin_clzg ((type) (x - 1)))
> I've mentioned.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux
on top of the 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637540.html
patch (the bug was discovered while working on this patch).

> 2023-11-20  Jakub Jelinek  
> 
> gcc/
>   * doc/extend.texi (__builtin_stdc_bit_ceil, __builtin_stdc_bit_floor,
>   __builtin_stdc_bit_width, __builtin_stdc_count_ones,
>   __builtin_stdc_count_zeros, __builtin_stdc_first_leading_one,
>   __builtin_stdc_first_leading_zero, __builtin_stdc_first_trailing_one,
>   __builtin_stdc_first_trailing_zero, __builtin_stdc_has_single_bit,
>   __builtin_stdc_leading_ones, __builtin_stdc_leading_zeros,
>   __builtin_stdc_trailing_ones, __builtin_stdc_trailing_zeros): Document.
> gcc/c-family/
>   * c-common.h (enum rid): Add RID_BUILTIN_STDC: New.
>   * c-common.cc (c_common_reswords): Add __builtin_stdc_bit_ceil,
>   __builtin_stdc_bit_floor, __builtin_stdc_bit_width,
>   __builtin_stdc_count_ones, __builtin_stdc_count_zeros,
>   __builtin_stdc_first_leading_one, __builtin_stdc_first_leading_zero,
>   __builtin_stdc_first_trailing_one, __builtin_stdc_first_trailing_zero,
>   __builtin_stdc_has_single_bit, __builtin_stdc_leading_ones,
>   __builtin_stdc_leading_zeros, __builtin_stdc_trailing_ones and
>   __builtin_stdc_trailing_zeros.  Move __builtin_assoc_barrier
>   alphabetically earlier.
> gcc/c/
>   * c-parser.cc (c_parser_postfix_expression): Handle RID_BUILTIN_STDC.
>   * c-decl.cc (names_builtin_p): Likewise.
> gcc/testsuite/
>   * gcc.dg/builtin-stdc-bit-1.c: New test.
>   * gcc.dg/builtin-stdc-bit-2.c: New test.

Jakub
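
For illustration, the overflow-safe formula quoted above can be sketched for a
32-bit operand roughly as follows.  This is only a sketch, not the code from
the patch: the helper name bit_ceil_u32 is made up, it uses the long-standing
__builtin_clz instead of __builtin_clzg, and it assumes unsigned int is
exactly 32 bits wide.  The point is that the shift count never exceeds
prec - 1, so the overflow case wraps to 0 without an undefined shift by the
width of the type.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption for this sketch: unsigned int is exactly 32 bits wide.  */
_Static_assert (sizeof (unsigned int) * CHAR_BIT == 32,
		"sketch assumes 32-bit unsigned int");

/* Hypothetical helper, not part of the patch: smallest power of two
   greater than or equal to X, or 0 if that value is not representable.  */
static uint32_t
bit_ceil_u32 (uint32_t x)
{
  if (x <= 1)
    return 1;

  unsigned int prec = sizeof (uint32_t) * CHAR_BIT;
  /* Index of the highest set bit of x - 1; always in [0, prec - 1].
     x - 1 is nonzero here, so __builtin_clz is well defined.  */
  unsigned int top = prec - 1 - (unsigned int) __builtin_clz (x - 1);

  /* Shifting 2 (not 1) keeps the shift count below prec; when
     top == prec - 1 the set bit is shifted out and the result is 0.  */
  return (uint32_t) 2 << top;
}
```

With this shape, inputs above 0x80000000 return 0, matching the defined
overflow behavior described for stdc_bit_ceil in the quoted text.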

Re: [PATCH] builtins: Fix fold_builtin_query clzg/ctzg side-effects handling [PR112639]

2023-11-21 Thread Richard Biener

On Tue, 21 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, I've missed one spot where initially the code thinks
> it could use 2 argument IFN_CLZ/IFN_CTZ form, but then verifies it can't
> because it doesn't have the right target value and turns it into the
> arg0 ? arg1 : .C[LT]Z (arg0)
> form.  That form evaluates the argument twice though and so needs save_expr,
> which I've missed to call in that case.  In other cases where it is known
> from the beginning that it will be needed (e.g. the __builtin_clzg case
> on types smaller than unsigned int where we'll need to add an addend
> to the clz value) or the unsigned __int128 expansion called save_expr
> before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-11-21  Jakub Jelinek  
> 
>   PR middle-end/112639
>   * builtins.cc (fold_builtin_bit_query): If arg0 has side-effects, arg1
>   is specified but cleared, call save_expr on arg0.
> 
>   * gcc.dg/torture/pr112639.c: New test.
> 
> --- gcc/builtins.cc.jj	2023-11-20 10:36:18.642625716 +0100
> +++ gcc/builtins.cc   2023-11-20 15:25:59.665718971 +0100
> @@ -9819,6 +9819,8 @@ fold_builtin_bit_query (location_t loc,
> if (!direct_internal_fn_supported_p (ifn, arg0_type,
>  OPTIMIZE_FOR_BOTH))
>   arg2 = NULL_TREE;
> +   if (arg2 == NULL_TREE)
> + arg0 = save_expr (arg0);
>   }
>if (fcodei == END_BUILTINS || arg2)
>   call = build_call_expr_internal_loc (loc, ifn, integer_type_node,
> --- gcc/testsuite/gcc.dg/torture/pr112639.c.jj	2023-11-20 15:32:24.994391656 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr112639.c	2023-11-20 15:34:14.395882993 +0100
> @@ -0,0 +1,34 @@
> +/* PR middle-end/112639 */
> +/* { dg-do run } */
> +
> +unsigned long long b = 0;
> +
> +int
> +foo (void)
> +{
> +  return __builtin_clzg (b++, __SIZEOF_LONG_LONG__ * __CHAR_BIT__);
> +}
> +
> +int
> +bar (void)
> +{
> +  return __builtin_ctzg (b++, __SIZEOF_LONG_LONG__ * __CHAR_BIT__);
> +}
> +
> +int
> +main ()
> +{
> +  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ || b != 1)
> +    __builtin_abort ();
> +  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 1 || b != 2)
> +    __builtin_abort ();
> +  if (foo () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 2 || b != 3)
> +    __builtin_abort ();
> +  b = 0;
> +  if (bar () != __SIZEOF_LONG_LONG__ * __CHAR_BIT__ || b != 1)
> +    __builtin_abort ();
> +  if (bar () != 0 || b != 2)
> +    __builtin_abort ();
> +  if (bar () != 1 || b != 3)
> +    __builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
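
The double-evaluation problem discussed in this thread can be reproduced in
plain C, outside the compiler.  The following is a rough sketch, not the
builtins.cc code: the macro CLZ_OR_NAIVE and the function clz_or_once are
made-up names, and __builtin_clzll stands in for the internal function.  The
macro mentions its argument twice, the way a folded conditional form would
without save_expr; the function evaluates its argument once.

```c
#include <limits.h>

#define ULL_BITS ((int) (sizeof (unsigned long long) * CHAR_BIT))

/* Hypothetical macro: X appears twice, so a side effect in X runs twice
   whenever X is nonzero.  */
#define CLZ_OR_NAIVE(X, FALLBACK) \
  ((X) ? __builtin_clzll (X) : (FALLBACK))

/* Hypothetical function: the argument is evaluated once by the caller,
   which is the effect save_expr has on a tree.  */
static inline int
clz_or_once (unsigned long long x, int fallback)
{
  return x ? __builtin_clzll (x) : fallback;
}

static unsigned long long counter;

/* Starts from 1 so that the nonzero branch is taken.  */
static int
run_naive (void)
{
  counter = 1;
  return CLZ_OR_NAIVE (counter++, ULL_BITS);
}

static int
run_once (void)
{
  counter = 1;
  return clz_or_once (counter++, ULL_BITS);
}
```

After run_naive the counter has been incremented twice and the count is
taken on the value 2 instead of 1; after run_once the counter has been
incremented once.  The testcase in the patch checks for the same symptom
through the final value of b.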

[PATCH] rtl-optimization: Modify loop live data with livein of loop header

2023-11-21 Thread Ajit Agarwal

Hello All:

This patch marks LOOP_DATA->live as the live-in set of the loop header basic
block, because the live-in of each basic block in the loop is live in at the
loop header.

Bootstrapped and regtested on powerpc64-linux-gnu.

SPEC CPU 2017 benchmark scores are better than trunk with this
change for both INT and FP benchmarks.

Thanks & Regards
Ajit


rtl-optimization: Modify loop live data with livein of loop header

Livein at each basic block is live in at the loop header.
Marked LOOP_DATA->live as the livein at the loop header basic
block.

2023-11-21  Ajit Kumar Agarwal  

gcc/ChangeLog:

* loop-invariant.cc (calculate_loop_reg_pressure): Mark
LOOP_DATA->live as the livein at the loop header basic block.
---
 gcc/loop-invariant.cc | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
index a9b156682bc..c15e6168d5d 100644
--- a/gcc/loop-invariant.cc
+++ b/gcc/loop-invariant.cc
@@ -2169,17 +2169,18 @@ calculate_loop_reg_pressure (void)
   }
   ira_setup_eliminable_regset ();
   bitmap_initialize (&curr_regs_live, &reg_obstack);
+
+  /* Livein (loop_hdr) is live at each of the loop basic blocks.  */
+  for (auto loop : loops_list (cfun, 0))
+if (loop->aux == NULL)
+  bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (loop->header));
+
   FOR_EACH_BB_FN (bb, cfun)
 {
   curr_loop = bb->loop_father;
   if (curr_loop == current_loops->tree_root)
continue;
 
-  for (class loop *loop = curr_loop;
-  loop != current_loops->tree_root;
-  loop = loop_outer (loop))
-   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (bb));
-
   bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
   for (i = 0; i < ira_pressure_classes_num; i++)
curr_reg_pressure[ira_pressure_classes[i]] = 0;
-- 
2.39.3

Re: [PATCH] c-family, middle-end: Add __builtin_c[lt]zg (arg, 0ULL) exception

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 12:12:18AM +, Joseph Myers wrote:
> On Mon, 20 Nov 2023, Jakub Jelinek wrote:
> 
> > and be done with that.  If there is an agreement we should do that
> > for all 14 rather than just those 3 + the 2 ugly hacks (__builtin_c{l,t}zg 
> > with
> > 0ULL second argument and __builtin_bit_complement), I can change the
> 
> I tend to agree with the "ugly hack" description of the 0ULL second 
> argument special case.

It could be done differently, e.g. by adding an optional third argument:
if that argument is present and a non-zero constant, the return value for
a zero first argument would be the second argument plus the bit size of
the first argument; if it is absent or a constant zero, the current
behavior applies, i.e. the second argument is returned for a zero first
argument.

Given that I've already posted all 14 __builtin_stdc_*, that isn't
strictly needed for stdbit.h, the only question is if it would be useful
for users in other cases.
If there is the _Generic extension or say even just a new builtin like
__builtin_save_expr (x, y)
which would introduce some special fixed identifier when evaluating y for
the value (but not side-effects) of x, that wouldn't be needed and I agree
going for something like that would be cleaner.

Jakub

[PATCH] testsuite: Fix up pr111309-2.c on arm [PR111309]

2023-11-21 Thread Jakub Jelinek

Hi!

ARM defaults to -fshort-enums and the following testcase FAILs there in 2
lines.  The difference is that in C++, E0 has enum E type, which normally
has unsigned int underlying type, so it isn't int nor something that
promotes to int, which is why we diagnose it (in C it is promoted to int).
But with -fshort-enums, the underlying type is unsigned char in that case,
which promotes to int just fine.

The following patch adjusts the expectations, such that we don't expect
it on arm or when people manually test with -fshort-enums.

Tested on x86_64-linux and i686-linux, ok for trunk?

2023-11-21  Jakub Jelinek  

PR c/111309
* c-c++-common/pr111309-2.c (foo): Don't expect errors for C++ with
-fshort-enums if second argument is E0.

--- gcc/testsuite/c-c++-common/pr111309-2.c.jj  2023-11-14 10:52:16.191276028 +0100
+++ gcc/testsuite/c-c++-common/pr111309-2.c 2023-11-20 17:52:30.606386073 +0100
@@ -32,7 +32,7 @@ foo (void)
   __builtin_clzg (0U, 2LL);/* { dg-error "does not have 'int' type" } */
   __builtin_clzg (0U, 2U); /* { dg-error "does not have 'int' type" } */
   __builtin_clzg (0U, true);
-  __builtin_clzg (0U, E0); /* { dg-error "does not have 'int' type" "" { target c++ } } */
+  __builtin_clzg (0U, E0); /* { dg-error "does not have 'int' type" "" { target { c++ && { ! short_enums } } } } */
   __builtin_ctzg ();   /* { dg-error "too few arguments" } */
   __builtin_ctzg (0U, 1, 2);   /* { dg-error "too many arguments" } */
   __builtin_ctzg (0);  /* { dg-error "has signed type" } */
@@ -51,7 +51,7 @@ foo (void)
   __builtin_ctzg (0U, 2LL);/* { dg-error "does not have 'int' type" } */
   __builtin_ctzg (0U, 2U); /* { dg-error "does not have 'int' type" } */
   __builtin_ctzg (0U, true);
-  __builtin_ctzg (0U, E0); /* { dg-error "does not have 'int' type" "" { target c++ } } */
+  __builtin_ctzg (0U, E0); /* { dg-error "does not have 'int' type" "" { target { c++ && { ! short_enums } } } } */
   __builtin_clrsbg (); /* { dg-error "too few arguments" } */
   __builtin_clrsbg (0, 1); /* { dg-error "too many arguments" } */
   __builtin_clrsbg (0U);   /* { dg-error "has unsigned type" } */

Jakub

Re: [PATCH] testsuite: Fix up pr111309-2.c on arm [PR111309]

2023-11-21 Thread Richard Biener

On Tue, 21 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> ARM defaults to -fshort-enums and the following testcase FAILs there in 2
> lines.  The difference is that in C++, E0 has enum E type, which normally
> has unsigned int underlying type, so it isn't int nor something that
> promotes to int, which is why we diagnose it (in C it is promoted to int).
> But with -fshort-enums, the underlying type is unsigned char in that case,
> which promotes to int just fine.
> 
> The following patch adjusts the expectations, such that we don't expect
> it on arm or when people manually test with -fshort-enums.
> 
> Tested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2023-11-21  Jakub Jelinek  
> 
>   PR c/111309
>   * c-c++-common/pr111309-2.c (foo): Don't expect errors for C++ with
>   -fshort-enums if second argument is E0.
> 
> --- gcc/testsuite/c-c++-common/pr111309-2.c.jj	2023-11-14 10:52:16.191276028 +0100
> +++ gcc/testsuite/c-c++-common/pr111309-2.c	2023-11-20 17:52:30.606386073 +0100
> @@ -32,7 +32,7 @@ foo (void)
>__builtin_clzg (0U, 2LL);  /* { dg-error "does not have 'int' type" } */
>__builtin_clzg (0U, 2U);   /* { dg-error "does not have 'int' type" } */
>__builtin_clzg (0U, true);
> -  __builtin_clzg (0U, E0);   /* { dg-error "does not have 'int' type" "" { target c++ } } */
> +  __builtin_clzg (0U, E0);   /* { dg-error "does not have 'int' type" "" { target { c++ && { ! short_enums } } } } */
>__builtin_ctzg (); /* { dg-error "too few arguments" } */
>__builtin_ctzg (0U, 1, 2); /* { dg-error "too many arguments" } */
>__builtin_ctzg (0);/* { dg-error "has signed type" } */
> @@ -51,7 +51,7 @@ foo (void)
>__builtin_ctzg (0U, 2LL);  /* { dg-error "does not have 'int' type" } */
>__builtin_ctzg (0U, 2U);   /* { dg-error "does not have 'int' type" } */
>__builtin_ctzg (0U, true);
> -  __builtin_ctzg (0U, E0);   /* { dg-error "does not have 'int' type" "" { target c++ } } */
> +  __builtin_ctzg (0U, E0);   /* { dg-error "does not have 'int' type" "" { target { c++ && { ! short_enums } } } } */
>__builtin_clrsbg ();   /* { dg-error "too few arguments" } */
>__builtin_clrsbg (0, 1);   /* { dg-error "too many arguments" } */
>__builtin_clrsbg (0U); /* { dg-error "has unsigned type" } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-11-21 Thread Alexander Monakov



On Tue, 21 Nov 2023, Richard Biener wrote:

> and this, too, btw. - the DSE actually happens, the latter transform not.
> We specifically "opt out" of doing that for QOI to not make undefined
> behavior worse.  The more correct transform would be to replace the
> load with a __builtin_trap () during path isolation (or wire in path isolation
> to value-numbering where we actually figure out there's no valid definition
> to reach for the load).
> 
> So yes, if you want to avoid these kind of transforms earlier instrumentation
> is better.

And then attempting to schedule it immediately after pass_ccp in the early-opts
pipeline is already too late, right?

Thanks!
Alexander

RE: [ARC PATCH] Consistent use of whitespace in assembler templates.

2023-11-21 Thread Claudiu Zissulescu

Hi Roger,

Apologies for the late reply; I was on a short vacation. Please proceed with
your commit. BTW, I consider this type of contribution obvious, and you can
always push it without waiting for my feedback.

Thank you for your contribution,
Claudiu

-Original Message-
From: Roger Sayle  
Sent: Monday, November 6, 2023 8:37 PM
To: gcc-patches@gcc.gnu.org
Cc: 'Claudiu Zissulescu' 
Subject: [ARC PATCH] Consistent use of whitespace in assembler templates.


This minor clean-up patch tweaks arc.md to use whitespace consistently in 
output templates, always using a TAB between the mnemonic and its operands, and 
avoiding spaces after commas between operands.  There should be no functional 
changes with this patch, though several test cases' scan-assembler needed to be 
updated to use \s+ instead of testing for a TAB or a space explicitly.

Tested with a cross-compiler to arc-linux hosted on x86_64, with no new 
(compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-11-06  Roger Sayle  

gcc/ChangeLog
* config/arc/arc.md: Make output template whitespace consistent.

gcc/testsuite/ChangeLog
* gcc.target/arc/jli-1.c: Update dg-final whitespace.
* gcc.target/arc/jli-2.c: Likewise.
* gcc.target/arc/naked-1.c: Likewise.
* gcc.target/arc/naked-2.c: Likewise.
* gcc.target/arc/tmac-1.c: Likewise.
* gcc.target/arc/tmac-2.c: Likewise.


Thanks again,
Roger
--

Re: [PATCH] rtl-optimization: Modify loop live data with livein of loop header

2023-11-21 Thread Richard Biener

On Tue, Nov 21, 2023 at 9:30 AM Ajit Agarwal  wrote:
>
> Hello All:
>
> This patch marked LOOP_DATA->live as the livein at the loop header basic
> block. This is because Livein at each basic block is live in at the loop 
> header.

The current code does the same, you now have fewer regs live.  In fact
your patch removes all of the settings since when
loop->aux == NULL there's no LOOP_DATA (loop), so you never do anything.

It appears that you do not fully grasp the changes done by your
patches - you need to improve in this regard and either provide
better explanations or stop sending these kinds of patches.

I will stop looking at your patches now, it appears to be a waste of
my precious time.

Peter - please work with Ajit here.

Thanks,
Richard.
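
The objection about the loop->aux == NULL guard can be shown with a small
standalone sketch.  Nothing below is GCC code: the structs are stripped-down
stand-ins, the field header_live_in and the function merge_header_livein are
invented for the example, and a plain unsigned mask replaces the bitmap.
Only the LOOP_DATA macro mirrors the idea that the per-loop data hangs off
loop->aux.

```c
#include <stddef.h>

/* Simplified stand-ins for the real structures (assumptions).  */
struct loop_data { unsigned regs_live; };
struct loop { void *aux; unsigned header_live_in; };

#define LOOP_DATA(L) ((struct loop_data *) (L)->aux)

/* ORs the header live-in mask into each loop's regs_live and returns the
   number of loops updated.  INVERTED_GUARD selects the guard from the
   posted patch (aux == NULL) instead of the intended one (aux != NULL).  */
static int
merge_header_livein (struct loop *loops, size_t n, int inverted_guard)
{
  int updated = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct loop *l = &loops[i];
      int take = inverted_guard ? l->aux == NULL : l->aux != NULL;
      if (!take)
	continue;
      /* With the inverted guard only loops without data get here, so
	 there is nothing to update (the sketch skips them rather than
	 dereferencing a null pointer).  */
      if (LOOP_DATA (l) == NULL)
	continue;
      LOOP_DATA (l)->regs_live |= l->header_live_in;
      updated++;
    }
  return updated;
}
```

When every loop already has its data attached, the inverted guard updates
no loop at all, which is the "you never do anything" case described above.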


> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> SPEC CPU 2017 benchmarks score is better than trunk wit this
> change for INT and FP benchmarks.
>
> THanks & Regards
> Ajit
>
>
> rtl-optimization: Modify loop live data with livein of loop header
>
> Livein at each basic block is live in at the loop header.
> Marked LOOP_DATA->live as the livein at the loop header basic
> block.
>
> 2023-11-21  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * loop-invariant.cc (calculate_loop_reg_pressure): Mark
> LOOP_DATA->live as the livein at the loop header basic block.
> ---
>  gcc/loop-invariant.cc | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
> index a9b156682bc..c15e6168d5d 100644
> --- a/gcc/loop-invariant.cc
> +++ b/gcc/loop-invariant.cc
> @@ -2169,17 +2169,18 @@ calculate_loop_reg_pressure (void)
>}
>ira_setup_eliminable_regset ();
>bitmap_initialize (&curr_regs_live, &reg_obstack);
> +
> +  /* Livein (loop_hdr) is live at each of the loop basic blocks.  */
> +  for (auto loop : loops_list (cfun, 0))
> +if (loop->aux == NULL)
> +  bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN 
> (loop->header));
> +
>FOR_EACH_BB_FN (bb, cfun)
>  {
>curr_loop = bb->loop_father;
>if (curr_loop == current_loops->tree_root)
> continue;
>
> -  for (class loop *loop = curr_loop;
> -  loop != current_loops->tree_root;
> -  loop = loop_outer (loop))
> -   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (bb));
> -
>bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
>for (i = 0; i < ira_pressure_classes_num; i++)
> curr_reg_pressure[ira_pressure_classes[i]] = 0;
> --
> 2.39.3
>

Re: [PATCH]AArch64 docs: update -mcpu=generic definition on aarch64

2023-11-21 Thread Richard Earnshaw





On 20/11/2023 21:49, Tamar Christina wrote:

>> -Original Message-
>> From: Richard Earnshaw
>> Sent: Monday, November 20, 2023 12:53 PM
>> To: Tamar Christina ; gcc-patches@gcc.gnu.org
>> Cc: nd ; Richard Earnshaw ;
>> Marcus Shawcroft ; Kyrylo Tkachov
>> ; Richard Sandiford
>>
>> Subject: Re: [PATCH]AArch64 docs: update -mcpu=generic definition on
>> aarch64
>>
>>
>>
>> On 16/11/2023 15:19, Tamar Christina wrote:
>> > Hi All,
>> >
>> > This documents the behavior of the generic CPU options on AArch64.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >  * doc/invoke.texi (generic): Update defintion.
>> >  (generic-armv8-a, generic-armv9-a): Document.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
>> >
>> d0b55fb106f908e8222394bbd07670aa583c5680..77684c5d7c9c0bdd5872
>> 50acc190
>> > da81e0f7f032 100644
>> > --- a/gcc/doc/invoke.texi
>> > +++ b/gcc/doc/invoke.texi
>> > @@ -20759,7 +20759,8 @@ processors implementing the target
>> architecture.
>> >   @item -mtune=@var{name}
>> >   Specify the name of the target processor for which GCC should tune the
>> >   performance of the code.  Permissible values for this option are:
>> > -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53},
>> > @samp{cortex-a55},
>> > +@samp{generic}, @samp{generic-armv8-a}, @samp{generic-armv9-a},
>> > +@samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
>> >   @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73},
>> @samp{cortex-a75},
>> >   @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
>> >   @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34}, @@
>> > -20798,6 +20799,11 @@ arithmetic instructions per cycle (2 for 256-bit
>> SVE, 4 for 128-bit SVE).
>> >   This is more general than tuning for a specific core like Neoverse V1
>> >   but is more specific than the default tuning described below.
>> >
>> > +The value @samp{generic} should not be assumed to be a static
>> configuration.
>> > +Starting with GCC 14 this value can change over time in order to
>> > +better reflect advancements in CPU microarchitecture.  If a specific
>> > +version is required you are encouraged to use one of the architecture
>> specific generic processors, e.g. @samp{generic-armv8-a}.
>> > +
>> >   Additionally on native AArch64 GNU/Linux systems the value
>> >   @samp{native} tunes performance to the host system.  This option has no
>> effect
>> >   if the compiler is unable to recognize the processor of the host system.
>> >
>> >
>> >
>> >
>> @opindex mcpu
>> @item -mcpu=@var{name}
>> Specify the name of the target processor, optionally suffixed by one or more
>> feature modifiers.  This option has the form @option{-
>> mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
>> permissible values for @var{cpu} are the same as those available
>> ^^^
>> ^
>> for @option{-mtune}.
>>
>>
>> So what is the behaviour now if these are used for -mcpu?  Do we really want
>> to permit their use here?
>>
>
> They behave as any other CPU but with the baseline architecture and no
> extensions
> i.e. -mcpu=generic == -march=armv8-a -mtune=generic.
>
> We've never blocked them before so doing so now would be a regression.
> Conceptually they do make sense as -mcpu values as they just mean "give me
> the best compatibility with this architecture as a baseline".


My point is that if 'generic' can change meaning from release to release 
(which is acceptable for tune), then it becomes somewhat ambiguous (and 
therefore useless) for a CPU.


R.

Re: [RFC PATCH] Detecting lifetime-dse issues via Valgrind

2023-11-21 Thread Richard Biener

On Tue, Nov 21, 2023 at 9:56 AM Alexander Monakov  wrote:
>
>
> On Tue, 21 Nov 2023, Richard Biener wrote:
>
> > and this, too, btw. - the DSE actually happens, the latter transform not.
> > We specifically "opt out" of doing that for QOI to not make undefined
> > behavior worse.  The more correct transform would be to replace the
> > load with a __builtin_trap () during path isolation (or wire in path 
> > isolation
> > to value-numbering where we actually figure out there's no valid definition
> > to reach for the load).
> >
> > So yes, if you want to avoid these kind of transforms earlier 
> > instrumentation
> > is better.
>
> And then attempting to schedule it immediately after pass_ccp in the 
> early-opts
> pipeline is already too late, right?

Well, yes.  CCP won't do many things but it might for example rewrite a
stack variable to a register.

> Thanks!
> Alexander

Re: [PATCH]AArch64 docs: update -mcpu=generic definition on aarch64

2023-11-21 Thread Richard Biener

On Tue, Nov 21, 2023 at 10:38 AM Richard Earnshaw
 wrote:
>
>
>
> On 20/11/2023 21:49, Tamar Christina wrote:
> >> -Original Message-
> >> From: Richard Earnshaw 
> >> Sent: Monday, November 20, 2023 12:53 PM
> >> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> >> Cc: nd ; Richard Earnshaw ;
> >> Marcus Shawcroft ; Kyrylo Tkachov
> >> ; Richard Sandiford
> >> 
> >> Subject: Re: [PATCH]AArch64 docs: update -mcpu=generic definition on
> >> aarch64
> >>
> >>
> >>
> >> On 16/11/2023 15:19, Tamar Christina wrote:
> >> > Hi All,
> >> >
> >> > This documents the behavior of the generic CPU options on AArch64.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * doc/invoke.texi (generic): Update defintion.
> >> >  (generic-armv8-a, generic-armv9-a): Document.
> >> >
> >> > --- inline copy of patch --
> >> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> >> >
> >> d0b55fb106f908e8222394bbd07670aa583c5680..77684c5d7c9c0bdd5872
> >> 50acc190
> >> > da81e0f7f032 100644
> >> > --- a/gcc/doc/invoke.texi
> >> > +++ b/gcc/doc/invoke.texi
> >> > @@ -20759,7 +20759,8 @@ processors implementing the target
> >> architecture.
> >> >   @item -mtune=@var{name}
> >> >   Specify the name of the target processor for which GCC should tune the
> >> >   performance of the code.  Permissible values for this option are:
> >> > -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53},
> >> > @samp{cortex-a55},
> >> > +@samp{generic}, @samp{generic-armv8-a}, @samp{generic-armv9-a},
> >> > +@samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
> >> >   @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73},
> >> @samp{cortex-a75},
> >> >   @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
> >> >   @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34}, @@
> >> > -20798,6 +20799,11 @@ arithmetic instructions per cycle (2 for 256-bit
> >> SVE, 4 for 128-bit SVE).
> >> >   This is more general than tuning for a specific core like Neoverse V1
> >> >   but is more specific than the default tuning described below.
> >> >
> >> > +The value @samp{generic} should not be assumed to be a static
> >> configuration.
> >> > +Starting with GCC 14 this value can change over time in order to
> >> > +better reflect advancements in CPU microarchitecture.  If a specific
> >> > +version is required you are encouraged to use one of the architecture
> >> specific generic processors, e.g. @samp{generic-armv8-a}.
> >> > +
> >> >   Additionally on native AArch64 GNU/Linux systems the value
> >> >   @samp{native} tunes performance to the host system.  This option has no
> >> effect
> >> >   if the compiler is unable to recognize the processor of the host 
> >> > system.
> >> >
> >> >
> >> >
> >> >
> >> @opindex mcpu
> >> @item -mcpu=@var{name}
> >> Specify the name of the target processor, optionally suffixed by one or 
> >> more
> >> feature modifiers.  This option has the form @option{-
> >> mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
> >> permissible values for @var{cpu} are the same as those available
> >> ^^^
> >> ^
> >> for @option{-mtune}.
> >> 
> >>
> >> So what is the behaviour now if these are used for -mcpu?  Do we really 
> >> want
> >> to permit their use here?
> >>
> >
> > They behave as any other CPU but with the baseline architecture and no
> > extensions
> > i.e. -mcpu=generic == -march=armv8-a -mtune=generic.
> >
> > We've never blocked them before so doing so now would be a regression.
> > Conceptually they do make sense as -mcpu values as they just mean "give me
> > the best compatibility with this architecture as a baseline".
>
> My point is that if 'generic' can change meaning from release to release
> (which is acceptable for tune), then it becomes somewhat ambiguous (and
> therefore useless) for a CPU.

Which is why x86 doesn't have -march=generic but only -mtune=generic.
IMHO options selecting ISA features shouldn't change their meaning over time.

> R.

RE: [PATCH]AArch64 docs: update -mcpu=generic definition on aarch64

2023-11-21 Thread Tamar Christina

> 
> On 20/11/2023 21:49, Tamar Christina wrote:
> >> -Original Message-
> >> From: Richard Earnshaw 
> >> Sent: Monday, November 20, 2023 12:53 PM
> >> To: Tamar Christina ;
> >> gcc-patches@gcc.gnu.org
> >> Cc: nd ; Richard Earnshaw ;
> >> Marcus Shawcroft ; Kyrylo Tkachov
> >> ; Richard Sandiford
> >> 
> >> Subject: Re: [PATCH]AArch64 docs: update -mcpu=generic definition on
> >> aarch64
> >>
> >>
> >>
> >> On 16/11/2023 15:19, Tamar Christina wrote:
> >> > Hi All,
> >> >
> >> > This documents the behavior of the generic CPU options on AArch64.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * doc/invoke.texi (generic): Update defintion.
> >> >  (generic-armv8-a, generic-armv9-a): Document.
> >> >
> >> > --- inline copy of patch --
> >> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> >> >
> >>
> d0b55fb106f908e8222394bbd07670aa583c5680..77684c5d7c9c0bdd5872
> >> 50acc190
> >> > da81e0f7f032 100644
> >> > --- a/gcc/doc/invoke.texi
> >> > +++ b/gcc/doc/invoke.texi
> >> > @@ -20759,7 +20759,8 @@ processors implementing the target
> >> architecture.
> >> >   @item -mtune=@var{name}
> >> >   Specify the name of the target processor for which GCC should
> >> >tune the
> >> >   performance of the code.  Permissible values for this option are:
> >> > -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53},
> >> >@samp{cortex-a55},
> >> > +@samp{generic}, @samp{generic-armv8-a}, @samp{generic-armv9-a},
> >> > +@samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
> >> >   @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73},
> >> @samp{cortex-a75},
> >> >   @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
> >> >   @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34}, @@
> >> > -20798,6 +20799,11 @@ arithmetic instructions per cycle (2 for
> >> >256-bit
> >> SVE, 4 for 128-bit SVE).
> >> >   This is more general than tuning for a specific core like
> >> >Neoverse V1
> >> >   but is more specific than the default tuning described below.
> >> >
> >> > +The value @samp{generic} should not be assumed to be a static
> >> configuration.
> >> > +Starting with GCC 14 this value can change over time in order to
> >> > +better reflect advancements in CPU microarchitecture.  If a
> >> > +specific version is required you are encouraged to use one of the
> >> > +architecture
> >> specific generic processors, e.g. @samp{generic-armv8-a}.
> >> > +
> >> >   Additionally on native AArch64 GNU/Linux systems the value
> >> >   @samp{native} tunes performance to the host system.  This option
> >> >has no
> >> effect
> >> >   if the compiler is unable to recognize the processor of the host 
> >> >system.
> >> >
> >> >
> >> >
> >> >
> >> @opindex mcpu
> >> @item -mcpu=@var{name}
> >> Specify the name of the target processor, optionally suffixed by one
> >> or more feature modifiers.  This option has the form @option{-
> >> mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
> >> permissible values for @var{cpu} are the same as those available
> >>
> ^^^
> >> ^
> >> for @option{-mtune}.
> >> 
> >>
> >> So what is the behaviour now if these are used for -mcpu?  Do we
> >> really want to permit their use here?
> >>
> >
> > They behave as any other CPU but with the baseline architecture and no
> > extensions i.e. -mcpu=generic == -march=armv8-a -mtune=generic.
> >
> > We've never blocked them before so doing so now would be a regression.
> > Conceptually they do make sense as -mcpu values as they just mean
> > "give me the best compatibility with this architecture as a baseline".
> 
> My point is that if 'generic' can change meaning from release to release 
> (which
> is acceptable for tune), then it becomes somewhat ambiguous (and therefore
> useless) for a CPU.
> 

But I don't think we should be moving the baseline architecture for generic,
only the tuning part.

For moving your baseline we have the more specific ones like
-mcpu=generic-armv9-a for instance.

So the meaning of generic should stay the same, widest compatible binaries
with good but not the best performance.

Perhaps I should clarify that.

Cheers,
Tamar

> R.

Re: [PATCH] rtl-optimization: Modify loop live data with livein of loop header

2023-11-21 Thread Ajit Agarwal




On 21/11/23 3:02 pm, Richard Biener wrote:
> On Tue, Nov 21, 2023 at 9:30 AM Ajit Agarwal  wrote:
>>
>> Hello All:
>>
>> This patch marked LOOP_DATA->live as the livein at the loop header basic
>> block. This is because Livein at each basic block is live in at the loop 
>> header.
> 
> The current code does the same, you now have fewer regs live.  In fact
> your patch removes all of the settings since when
> loop->aux == NULL there's no LOOP_DATA (loop), so you never do anything.
> 

Sorry for the inconvenience caused. I forgot to remove the check
loop->aux == NULL in the patch that I sent.

My mistake. Sorry for that.

Thanks & Regards
Ajit
> It appears that you do not fully grasp the changes done by your
> patches - you need to improve
> in this regard and either provide better explanations or stop sending these 
> kind
> of patches.
> 
> I will stop looking at your patches now, it appears to be a waste of
> my precious time.
> 
> Peter - please work with Ajit here.
> 
> Thanks,
> Richard.
> 
> 
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>
>> SPEC CPU 2017 benchmarks score is better than trunk with this
>> change for INT and FP benchmarks.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> rtl-optimization: Modify loop live data with livein of loop header
>>
>> Livein at each basic block is live in at the loop header.
>> Marked LOOP_DATA->live as the livein at the loop header basic
>> block.
>>
>> 2023-11-21  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * loop-invariant.cc (calculate_loop_reg_pressure): Mark
>> LOOP_DATA->live as the livein at the loop header basic block.
>> ---
>>  gcc/loop-invariant.cc | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
>> index a9b156682bc..c15e6168d5d 100644
>> --- a/gcc/loop-invariant.cc
>> +++ b/gcc/loop-invariant.cc
>> @@ -2169,17 +2169,18 @@ calculate_loop_reg_pressure (void)
>>}
>>ira_setup_eliminable_regset ();
>>bitmap_initialize (&curr_regs_live, &reg_obstack);
>> +
>> +  /* Livein (loop_hdr) is live at each of the loop basic blocks.  */
>> +  for (auto loop : loops_list (cfun, 0))
>> +if (loop->aux == NULL)
>> +  bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (loop->header));
>> +
>>FOR_EACH_BB_FN (bb, cfun)
>>  {
>>curr_loop = bb->loop_father;
>>if (curr_loop == current_loops->tree_root)
>> continue;
>>
>> -  for (class loop *loop = curr_loop;
>> -  loop != current_loops->tree_root;
>> -  loop = loop_outer (loop))
>> -   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (bb));
>> -
>>bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
>>for (i = 0; i < ira_pressure_classes_num; i++)
>> curr_reg_pressure[ira_pressure_classes[i]] = 0;
>> --
>> 2.39.3
>>

Re: [PATCH 2/2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Sebastian Huber





On 20.11.23 15:56, Jakub Jelinek wrote:

On Mon, Nov 20, 2023 at 03:33:31PM +0100, Sebastian Huber wrote:

This change fixes issues like this:

   gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
   gcc.dg/gomp/pr27573.c:19:1: error: non-trivial conversion in ‘ssa_name’
  19 | }
 | ^
   long int
   long unsigned int
   # .MEM_19 = VDEF <.MEM_18>
   __gcov7.main._omp_fn.0[0] = PROF_time_profile_12;
   during IPA pass: profile
   gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed

gcc/ChangeLog:

PR middle-end/112634

* tree-profile.cc (gen_assign_counter_update): Cast the unsigned result type of
__atomic_add_fetch() to the signed counter type.
---
  gcc/tree-profile.cc | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 68db09f6189..54938e1d165 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -284,7 +284,9 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi, gcall *call, tree func,
tree tmp = make_temp_ssa_name (result_type, NULL, name);
gimple_set_lhs (call, tmp);
gsi_insert_after (gsi, call, GSI_NEW_STMT);
-  gassign *assign = gimple_build_assign (result, tmp);
+  gassign *assign = gimple_build_assign (result,
+build_int_cst (TREE_TYPE (result),
+   tmp));

This can't be correct.
tmp is a SSA_NAME, so calling build_int_cst on it is not appropriate, the
second argument should be some unsigned HOST_WIDE_INT value.
If result_type is different type from TREE_TYPE (result), but both are
integer types, then you want
   gassign *assing = gimple_build_assign (result, NOP_EXPR, tmp);
or so.


I really don't know what I am doing here, so a lot of guesswork is
involved on my side. The change at least fixed the failing test case.
When I use the NOP_EXPR:


static inline void
gen_assign_counter_update (gimple_stmt_iterator *gsi, gcall *call, tree func,
                           tree result, const char *name)
{
  if (result)
    {
      tree result_type = TREE_TYPE (TREE_TYPE (func));
      tree tmp = make_temp_ssa_name (result_type, NULL, name);
      gimple_set_lhs (call, tmp);
      gsi_insert_after (gsi, call, GSI_NEW_STMT);
      gassign *assign = gimple_build_assign (result, NOP_EXPR, tmp);
      gsi_insert_after (gsi, assign, GSI_NEW_STMT);
    }
  else
    gsi_insert_after (gsi, call, GSI_NEW_STMT);
}

I get

gcc -O2 -fopenmp -fprofile-generate ./gcc/testsuite/gcc.dg/gomp/pr27573.c -S -o -

.file   "pr27573.c"
./gcc/testsuite/gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
./gcc/testsuite/gcc.dg/gomp/pr27573.c:19:1: error: non-register as LHS of unary operation
   19 | }
      | ^
# .MEM_19 = VDEF <.MEM_18>
__gcov7.main._omp_fn.0[0] = (long int) PROF_time_profile_12;
during IPA pass: profile
./gcc/testsuite/gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed
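
One reading of the verifier message is that the conversion has to land in a register of its own before the value is stored to memory, instead of converting and storing in a single statement. That is an interpretation of the error text, not a fix confirmed in this thread. A source-level analogue of that three-step shape, with hypothetical names and assumed types (`counter_type` standing in for "long int", `result_type` for "long unsigned int"):

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the types in the dump above.
using counter_type = std::int64_t;   // "long int" storage
using result_type = std::uint64_t;   // "long unsigned int" builtin result

// Hypothetical counter array playing the role of __gcov7.main._omp_fn.0.
static counter_type counters[1];
static result_type source_counter;

inline void
record_time_profile ()
{
  // Step 1: the atomic builtin yields an unsigned value in a temporary.
  result_type tmp = __atomic_add_fetch (&source_counter, 1, __ATOMIC_RELAXED);
  // Step 2: the conversion gets its own temporary (a register in GIMPLE terms).
  counter_type converted = static_cast<counter_type> (tmp);
  // Step 3: a plain store of the already converted value into memory.
  counters[0] = converted;
}
```

`__atomic_add_fetch` is the GCC/Clang builtin; everything else here is illustrative scaffolding.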


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/

RE: [PATCH]AArch64 docs: update -mcpu=generic definition on aarch64

2023-11-21 Thread Tamar Christina

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, November 21, 2023 9:41 AM
> To: Richard Earnshaw 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: Re: [PATCH]AArch64 docs: update -mcpu=generic definition on
> aarch64
> 
> On Tue, Nov 21, 2023 at 10:3 AM Richard Earnshaw
>  wrote:
> >
> >
> >
> > On 20/11/2023 21:49, Tamar Christina wrote:
> > >> -Original Message-
> > >> From: Richard Earnshaw 
> > >> Sent: Monday, November 20, 2023 12:53 PM
> > >> To: Tamar Christina ;
> > >> gcc-patches@gcc.gnu.org
> > >> Cc: nd ; Richard Earnshaw
> ;
> > >> Marcus Shawcroft ; Kyrylo Tkachov
> > >> ; Richard Sandiford
> > >> 
> > >> Subject: Re: [PATCH]AArch64 docs: update -mcpu=generic definition
> > >> on
> > >> aarch64
> > >>
> > >>
> > >>
> > >> On 16/11/2023 15:19, Tamar Christina wrote:
> > >> > Hi All,
> > >> >
> > >> > This documents the behavior of the generic CPU options on AArch64.
> > >> >
> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >> >
> > >> > Ok for master?
> > >> >
> > >> > Thanks,
> > >> > Tamar
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >  * doc/invoke.texi (generic): Update definition.
> > >> >  (generic-armv8-a, generic-armv9-a): Document.
> > >> >
> > >> > --- inline copy of patch --
> > >> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> > >> >
> > >>
> d0b55fb106f908e8222394bbd07670aa583c5680..77684c5d7c9c0bdd5872
> > >> 50acc190
> > >> > da81e0f7f032 100644
> > >> > --- a/gcc/doc/invoke.texi
> > >> > +++ b/gcc/doc/invoke.texi
> > >> > @@ -20759,7 +20759,8 @@ processors implementing the target
> > >> architecture.
> > >> >   @item -mtune=@var{name}
> > >> >   Specify the name of the target processor for which GCC should tune
> the
> > >> >   performance of the code.  Permissible values for this option are:
> > >> > -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53},
> > >> > @samp{cortex-a55},
> > >> > +@samp{generic}, @samp{generic-armv8-a}, @samp{generic-armv9-
> a},
> > >> > +@samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
> > >> >   @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73},
> > >> @samp{cortex-a75},
> > >> >   @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
> > >> >   @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34}, @@
> > >> > -20798,6 +20799,11 @@ arithmetic instructions per cycle (2 for
> > >> > 256-bit
> > >> SVE, 4 for 128-bit SVE).
> > >> >   This is more general than tuning for a specific core like Neoverse V1
> > >> >   but is more specific than the default tuning described below.
> > >> >
> > >> > +The value @samp{generic} should not be assumed to be a static
> > >> configuration.
> > >> > +Starting with GCC 14 this value can change over time in order to
> > >> > +better reflect advancements in CPU microarchitecture.  If a
> > >> > +specific version is required you are encouraged to use one of
> > >> > +the architecture
> > >> specific generic processors, e.g. @samp{generic-armv8-a}.
> > >> > +
> > >> >   Additionally on native AArch64 GNU/Linux systems the value
> > >> >   @samp{native} tunes performance to the host system.  This
> > >> > option has no
> > >> effect
> > >> >   if the compiler is unable to recognize the processor of the host 
> > >> > system.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> @opindex mcpu
> > >> @item -mcpu=@var{name}
> > >> Specify the name of the target processor, optionally suffixed by
> > >> one or more feature modifiers.  This option has the form @option{-
> > >> mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where
> the
> > >> permissible values for @var{cpu} are the same as those available
> > >>
> ^^^
> > >> ^
> > >> for @option{-mtune}.
> > >> 
> > >>
> > >> So what is the behaviour now if these are used for -mcpu?  Do we
> > >> really want to permit their use here?
> > >>
> > >
> > > They behave as any other CPU but with the baseline architecture and
> > > no extensions i.e. -mcpu=generic == -march=armv8-a -mtune=generic.
> > >
> > > We've never blocked them before so doing so now would be a regression.
> > > Conceptually they do make sense as -mcpu values as they just mean
> > > "give me the best compatibility with this architecture as a baseline".
> >
> > My point is that if 'generic' can change meaning from release to
> > release (which is acceptable for tune), then it becomes somewhat
> > ambiguous (and therefore useless) for a CPU.
> 
> Which is why x86 doesn't have -march=generic but only -mtune=generic.
> IMHO options selecting ISA features shouldn't change their meaning over
> time.
> 

Agreed, and that's not the plan.  Perhaps this was unclear.  Today generic
generates code for the lowest baseline architecture, but tuned for a
10-year-old core.

The intention of this clarification is to say that the target being tuned

Re: [PATCH] rtl-optimization: Modify loop live data with livein of loop header

2023-11-21 Thread Ajit Agarwal




On 21/11/23 3:15 pm, Ajit Agarwal wrote:
> 
> 
> On 21/11/23 3:02 pm, Richard Biener wrote:
>> On Tue, Nov 21, 2023 at 9:30 AM Ajit Agarwal  wrote:
>>>
>>> Hello All:
>>>
>>> This patch marked LOOP_DATA->live as the livein at the loop header basic
>>> block. This is because Livein at each basic block is live in at the loop 
>>> header.
>>
>> The current code does the same, you now have fewer regs live.  In fact
>> your patch removes all of the settings since when
>> loop->aux == NULL there's no LOOP_DATA (loop), so you never do anything.
>>
> 
> Sorry for the inconvenience caused. I forgot to remove the loop->aux == NULL
> check in the patch that I sent.
> 
> My mistake. Sorry for that.
> 
> Thanks & Regards
> Ajit

I copied the patch from one directory to another and forgot to remove the
loop->aux == NULL check in the patch that I sent.

My mistake. In any case, I have tested without that check.

Sorry for the inconvenience caused. I will make sure this won't happen again
in future patches.

Thanks & Regards
Ajit
>> It appears that you do not fully grasp the changes done by your
>> patches - you need to improve
>> in this regard and either provide better explanations or stop sending
>> these kind of patches.
>>
>> I will stop looking at your patches now, it appears to be a waste of
>> my precious time.
>>
>> Peter - please work with Ajit here.
>>
>> Thanks,
>> Richard.
>>
>>
>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>
>>> SPEC CPU 2017 benchmarks score is better than trunk with this
>>> change for INT and FP benchmarks.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>>
>>> rtl-optimization: Modify loop live data with livein of loop header
>>>
>>> Livein at each basic block is live in at the loop header.
>>> Marked LOOP_DATA->live as the livein at the loop header basic
>>> block.
>>>
>>> 2023-11-21  Ajit Kumar Agarwal  
>>>
>>> gcc/ChangeLog:
>>>
>>> * loop-invariant.cc (calculate_loop_reg_pressure): Mark
>>> LOOP_DATA->live as the livein at the loop header basic block.
>>> ---
>>>  gcc/loop-invariant.cc | 11 ++-
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
>>> index a9b156682bc..c15e6168d5d 100644
>>> --- a/gcc/loop-invariant.cc
>>> +++ b/gcc/loop-invariant.cc
>>> @@ -2169,17 +2169,18 @@ calculate_loop_reg_pressure (void)
>>>}
>>>ira_setup_eliminable_regset ();
>>>bitmap_initialize (&curr_regs_live, &reg_obstack);
>>> +
>>> +  /* Livein (loop_hdr) is live at each of the loop basic blocks.  */
>>> +  for (auto loop : loops_list (cfun, 0))
>>> +if (loop->aux == NULL)
>>> +  bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (loop->header));
>>> +
>>>FOR_EACH_BB_FN (bb, cfun)
>>>  {
>>>curr_loop = bb->loop_father;
>>>if (curr_loop == current_loops->tree_root)
>>> continue;
>>>
>>> -  for (class loop *loop = curr_loop;
>>> -  loop != current_loops->tree_root;
>>> -  loop = loop_outer (loop))
>>> -   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (bb));
>>> -
>>>bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
>>>for (i = 0; i < ira_pressure_classes_num; i++)
>>> curr_reg_pressure[ira_pressure_classes[i]] = 0;
>>> --
>>> 2.39.3
>>>

Re: [PATCH 4/4] gcov: Improve -fprofile-update=atomic

2023-11-21 Thread Jakub Jelinek

On Wed, Nov 15, 2023 at 06:51:10AM +0100, Sebastian Huber wrote:
> sorry, in the patch I should use targetm.have_atomic instead of 
> TARGET_HAVE_LIBATOMIC. 

I've noticed that the r14-5579 commit introduced some formatting issues;
this patch fixes what I saw.

In particular, operators don't go at the end of a line but at the start of
the next one.
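
A minimal standalone illustration of that line-breaking rule follows. The predicate `wants_atomic_update` and its parameters are hypothetical and exist only to carry the formatting.

```cpp
#include <cassert>

// Hypothetical predicate, used only to show the line-breaking convention.
inline bool
wants_atomic_update (bool have_builtin, bool have_result, bool can_split)
{
  /* Not GNU style: the operator trails the first line.
       if (have_builtin ||
           (have_result && can_split))  */

  /* GNU style: break before the operator, so it leads the continuation.  */
  if (have_builtin
      || (have_result && can_split))
    return true;

  return false;
}
```

Both layouts compile to the same thing; the convention is purely about where the reader finds the operator.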

Committed to trunk as obvious.

2023-11-21  Jakub Jelinek  

gcc/
* tree-profile.cc (gen_counter_update, tree_profiling): Formatting
fixes.
libgcc/
* libgcov.h (GCOV_SUPPORTS_ATOMIC): Formatting fixes.

--- gcc/tree-profile.cc.jj  2023-11-21 09:31:36.349387476 +0100
+++ gcc/tree-profile.cc 2023-11-21 10:45:26.035499140 +0100
@@ -304,8 +304,8 @@ gen_counter_update (gimple_stmt_iterator
   tree one = build_int_cst (type, 1);
   tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
 
-  if (counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN ||
-  (result && counter_update == COUNTER_UPDATE_ATOMIC_SPLIT))
+  if (counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN
+  || (result && counter_update == COUNTER_UPDATE_ATOMIC_SPLIT))
 {
   /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
   tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
@@ -314,8 +314,8 @@ gen_counter_update (gimple_stmt_iterator
   gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
   gen_assign_counter_update (gsi, call, f, result, name);
 }
-  else if (!result && (counter_update == COUNTER_UPDATE_ATOMIC_SPLIT ||
-  counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL))
+  else if (!result && (counter_update == COUNTER_UPDATE_ATOMIC_SPLIT
+  || counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL))
 {
   /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
 high_inc = low == 0 ? 1 : 0;
@@ -780,8 +780,8 @@ tree_profiling (void)
   flag_profile_update = PROFILE_UPDATE_SINGLE;
 }
   else if (flag_profile_update == PROFILE_UPDATE_PREFER_ATOMIC)
-flag_profile_update = can_support_atomic
-  ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
+flag_profile_update
+  = can_support_atomic ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
 {
@@ -791,7 +791,7 @@ tree_profiling (void)
counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
 }
   else if (gcov_type_size == 8 && have_atomic_4)
-  counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
 
   /* This is a small-ipa pass that gets called only once, from
  cgraphunit.cc:ipa_passes().  */
--- libgcc/libgcov.h.jj 2023-11-20 09:50:08.434204617 +0100
+++ libgcc/libgcov.h  2023-11-21 10:47:50.320481543 +0100
@@ -96,9 +96,9 @@ typedef unsigned gcov_type_unsigned __at
 #endif
 
 /* Detect whether target can support atomic update of profilers.  */
-#if (__SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || \
-(__SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || \
-defined (__LIBGCC_HAVE_LIBATOMIC)
+#if (__SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) \
+|| (__SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) \
+|| defined (__LIBGCC_HAVE_LIBATOMIC)
 #define GCOV_SUPPORTS_ATOMIC 1
 #else
 #define GCOV_SUPPORTS_ATOMIC 0

Jakub
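
For readers unfamiliar with the COUNTER_UPDATE_ATOMIC_SPLIT / COUNTER_UPDATE_ATOMIC_PARTIAL branch touched above, here is a standalone sketch of a 64-bit counter bumped with two 32-bit atomic operations. The first two steps follow the comment visible in the hunk; the final step (folding the carry into the high half) and the `SplitCounter` layout are assumptions, since the hunk is cut off before that point.

```cpp
#include <cassert>
#include <cstdint>

// Toy 64-bit counter stored as two 32-bit halves (layout is an assumption).
struct SplitCounter
{
  std::uint32_t low;
  std::uint32_t high;
};

inline void
split_increment (SplitCounter *c)
{
  // low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
  std::uint32_t low = __atomic_add_fetch (&c->low, 1u, __ATOMIC_RELAXED);
  // high_inc = low == 0 ? 1 : 0;  a wrap to zero means a carry.
  std::uint32_t high_inc = low == 0 ? 1u : 0u;
  // Assumed final step: fold the carry into the high half.
  __atomic_add_fetch (&c->high, high_inc, __ATOMIC_RELAXED);
}

// Reassemble the two halves into one 64-bit value.
inline std::uint64_t
combined (const SplitCounter &c)
{
  return (static_cast<std::uint64_t> (c.high) << 32) | c.low;
}
```

Each half is updated atomically, but the pair is not, so a concurrent reader could observe the low half wrapped before the carry arrives.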

[COMMITTED] ada: Trivial typo fix in comment

2023-11-21 Thread Marc Poulhiès

gcc/ada/

* exp_util.ads: Typo fix.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_util.ads b/gcc/ada/exp_util.ads
index 95ea4403c5d..932bf3fdcbc 100644
--- a/gcc/ada/exp_util.ads
+++ b/gcc/ada/exp_util.ads
@@ -1070,7 +1070,7 @@ package Exp_Util is
--  call and is analyzed and resolved on return. Name_Req may only be set to
--  True if Exp has the form of a name, and the effect is to guarantee that
--  any replacement maintains the form of name. If Renaming_Req is set to
-   --  True, the routine produces an object renaming reclaration capturing the
+   --  True, the routine produces an object renaming declaration capturing the
--  expression. If Variable_Ref is set to True, a variable is considered as
--  side effect (used in implementing Force_Evaluation). Note: after call to
--  Remove_Side_Effects, it is safe to call New_Copy_Tree to obtain a copy
-- 
2.42.0

[COMMITTED] ada: Fix SCOs generation for aspect specifications

2023-11-21 Thread Marc Poulhiès

From: Pierre-Marie de Rodat 

The recent overhaul for the representation of aspect specifications in
the tree broke SCOs generation: decisions that appeared in aspects were
processed twice, leading to the emission of erroneous obligations. Tweak
SCOs generation to skip aspect specifications the second time to go back
to the previous behavior.

gcc/ada/

* par_sco.adb (Process_Decisions): Skip aspect
specifications.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/par_sco.adb | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
index 0639ca616e0..84af8bf9867 100644
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -751,6 +751,13 @@ package body Par_SCO is
   begin
  case Nkind (N) is
 
+--  Aspect specifications have dedicated processings (see
+--  Traverse_Aspects) so ignore them here, so that they are
+--  processed only once.
+
+when N_Aspect_Specification =>
+   return Skip;
+
 --  Logical operators, output table entries and then process
 --  operands recursively to deal with nested conditions.
 
-- 
2.42.0

[COMMITTED] ada: Always use -gnatg in run-time GPR files

2023-11-21 Thread Marc Poulhiès

From: Ronan Desplanques 

This patch makes it so -gnatg is always passed to the compiler when
rebuilding the run-time library with the dedicated GPR files. Before
this patch, if a user rebuilt the run-time with -XADAFLAGS=XXX where
XXX didn't include "-gnatg", the build would immediately fail. This
case occurs when following the instructions in libada.gpr, which
use '-XADAFLAGS="-gnatn"'.

gcc/ada/

* libgnat/libgnat_common.gpr: Unconditionally pass -gnatg.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/libgnat_common.gpr | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/libgnat/libgnat_common.gpr b/gcc/ada/libgnat/libgnat_common.gpr
index 63039288764..a6340332c57 100644
--- a/gcc/ada/libgnat/libgnat_common.gpr
+++ b/gcc/ada/libgnat/libgnat_common.gpr
@@ -5,7 +5,7 @@ abstract project Libgnat_Common is
("-I../include", "-DIN_RTS=1", "-fexceptions",
 "-DSTANDALONE") &
External_As_List ("EXTRALIBFLAGS", " ");
-   Ada_Flags:= Common_Flags & ("-nostdinc", "-I../adainclude")
+   Ada_Flags:= Common_Flags & ("-nostdinc", "-I../adainclude", "-gnatg")
& Split (External ("ADAFLAGS", "-gnatpg"), " ");
Library_Kind := External ("LIBRARY_KIND", "static");
 
-- 
2.42.0

[COMMITTED] ada: Deep delta aggregates

2023-11-21 Thread Marc Poulhiès

From: Steve Baird 

Add support for "deep" delta aggregates, a GNAT-defined language extension
conditionally enabled via the -gnatX0 switch. In a deep delta aggregate, a
delta choice may specify a subcomponent (as opposed to just a component).

gcc/ada/

* par.adb: Add new Boolean variable Inside_Delta_Aggregate.
* par-ch4.adb (P_Simple_Expression): Add support for a deep delta
aggregate choice. We turn a sequence of selectors into a peculiar
tree. We build a component (Indexed or Selected) whose prefix is
another such component, etc. The leftmost prefix at the bottom of
the tree has a "name" which is the first selector, without any
further prefix. For something like "with delta (1)(2) => 3" where
the type of the aggregate is an array of arrays of integers, we'll
build an N_Indexed_Component whose prefix is an integer literal 1.
This is consistent with the trees built for "regular"
(Ada-defined) delta aggregates.
* sem_aggr.adb (Is_Deep_Choice, Is_Root_Prefix_Of_Deep_Choice):
New queries.
(Resolve_Deep_Delta_Assoc): new procedure.
(Resolve_Delta_Array_Aggregate): call Resolve_Deep_Delta_Assoc in
deep case.
(Resolve_Delta_Record_Aggregate): call Resolve_Deep_Delta_Assoc in
deep case.
(Get_Component_Type): new function replaces old Get_Component
function.
* sem_aggr.ads (Is_Deep_Choice, Is_Root_Prefix_Of_Deep_Choice):
New queries.
* exp_aggr.adb (Expand_Delta_Array_Aggregate): add nested function
Make_Array_Delta_Assignment_LHS; call it instead of
Make_Indexed_Component.
(Expand_Delta_Record_Aggregate): add nested function
Make_Record_Delta_Assignment_LHS; call it instead of
Make_Selected_Component.
* exp_spark.adb (Expand_SPARK_Delta_Or_Update): Insert range
checks for indexes in deep delta aggregates.
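
The tree shape described in the par-ch4.adb entry can be modelled with a small standalone sketch. This is a toy, not GNAT's tree: `Node`, `Selector`, `build_choice` and `show` are hypothetical names, and the bare first selector is represented by a generic `Root` kind where the real parser would have a literal or identifier node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node, loosely modelled on the ChangeLog description.
struct Node
{
  enum Kind { Root, Indexed, Selected } kind;
  std::string selector;          // the index expression or component name
  std::unique_ptr<Node> prefix;  // null only for the root selector
};

struct Selector
{
  bool is_index;     // true for "(expr)", false for ".name"
  std::string text;
};

// Fold a left-to-right selector sequence into a left-nested tree: the first
// selector becomes the bare root, each later one wraps what came before.
inline std::unique_ptr<Node>
build_choice (const std::vector<Selector> &selectors)
{
  std::unique_ptr<Node> tree;
  for (const Selector &s : selectors)
    {
      std::unique_ptr<Node> n (new Node);
      if (!tree)
        n->kind = Node::Root;
      else
        n->kind = s.is_index ? Node::Indexed : Node::Selected;
      n->selector = s.text;
      n->prefix = std::move (tree);
      tree = std::move (n);
    }
  return tree;
}

// Render the tree back to text, for inspection.
inline std::string
show (const Node &n)
{
  if (n.kind == Node::Root)
    return n.selector;
  std::string p = show (*n.prefix);
  return n.kind == Node::Indexed ? p + "(" + n.selector + ")"
                                 : p + "." + n.selector;
}
```

For the choice in "with delta (1)(2) => 3" the selectors are "1" and "2", and the result is an indexed node with index 2 whose prefix is the bare 1, matching the description of an N_Indexed_Component whose prefix is an integer literal 1.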

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb  | 108 ++--
 gcc/ada/exp_spark.adb |  53 +++-
 gcc/ada/par-ch4.adb   | 120 +-
 gcc/ada/par.adb   |   5 +
 gcc/ada/sem_aggr.adb  | 288 +++---
 gcc/ada/sem_aggr.ads  |  14 +-
 6 files changed, 522 insertions(+), 66 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 319254dfd63..a6a54e892e2 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -5243,7 +5243,7 @@ package body Exp_Aggr is
  --  The bounds of the aggregate for this dimension
 
  Ind_Typ : constant Entity_Id := Aggr_Index_Typ (Dim);
- --  The index type for this dimension.xxx
+ --  The index type for this dimension.
 
  Cond  : Node_Id;
  Assoc : Node_Id;
@@ -7344,6 +7344,12 @@ package body Exp_Aggr is
   --  choices that are ranges, subtype indications, subtype names, and
   --  iterated component associations.
 
+  function Make_Array_Delta_Assignment_LHS
+(Choice : Node_Id; Temp : Entity_Id) return Node_Id;
+  --  Generate the LHS for the assignment associated with one
+  --  component association. This can be more complex than just an
+  --  indexed component in the case of a deep delta aggregate.
+
   ---
   -- Generate_Loop --
   ---
@@ -7380,6 +7386,60 @@ package body Exp_Aggr is
   End_Label   => Empty);
   end Generate_Loop;
 
+  function Make_Array_Delta_Assignment_LHS
+(Choice : Node_Id; Temp : Entity_Id) return Node_Id
+  is
+ function Make_Delta_Choice_LHS
+   (Choice  : Node_Id;
+Deep_Choice : Boolean) return Node_Id;
+ --  Recursively (but recursion only in deep delta aggregate case)
+ --  build up the LHS by successively applying selectors.
+
+ ---
+ -- Make_Delta_Choice_LHS --
+ ---
+
+ function Make_Delta_Choice_LHS
+   (Choice  : Node_Id;
+Deep_Choice : Boolean) return Node_Id
+ is
+ begin
+if not Deep_Choice
+  or else Is_Root_Prefix_Of_Deep_Choice (Choice)
+then
+   return Make_Indexed_Component (Sloc (Choice),
+Prefix  => New_Occurrence_Of (Temp, Loc),
+Expressions => New_List (New_Copy_Tree (Choice)));
+
+else
+   --  a deep delta aggregate choice
+   pragma Assert (All_Extensions_Allowed);
+
+   declare
+  --  recursively get name for prefix
+  LHS_Prefix : constant Node_Id
+:= Make_Delta_Choice_LHS (Prefix (Choice), Deep_Choice);
+   begin
+  if Nkind (Choice) = N_Indexed_Component then
+ return Make_Indexed_Component (Sloc (Choice),
+

[COMMITTED] ada: Fix misplaced index directive in documentation

2023-11-21 Thread Marc Poulhiès

The index directive must be located before the indexed element, at least
for the generated texinfo to be correct. See:

https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-index

This was reported along with changes done in 
https://inbox.sourceware.org/gcc-patches/20230223102714.3606058-3-ar...@aarsen.me/

gcc/ada/

* doc/gnat_ugn/the_gnat_compilation_model.rst: Move index
directives.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../gnat_ugn/the_gnat_compilation_model.rst   | 11 
 gcc/ada/gnat_ugn.texi | 28 +--
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
index ed24fed7293..fd15459203a 100644
--- a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
+++ b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
@@ -226,9 +226,9 @@ possible encoding schemes:
   ``16#A345#``.
   This scheme is compatible with use of the full Wide_Character set.
 
-*Upper-Half Coding*
-  .. index:: Upper-Half Coding
+.. index:: Upper-Half Coding
 
+*Upper-Half Coding*
   The wide character with encoding ``16#abcd#`` where the upper bit is on
   (in other words, 'a' is in the range 8-F) is represented as two bytes,
   ``16#ab#`` and ``16#cd#``. The second byte cannot be a format control
@@ -236,9 +236,9 @@ possible encoding schemes:
   be also used for shift-JIS or EUC, where the internal coding matches the
   external coding.
 
-*Shift JIS Coding*
-  .. index:: Shift JIS Coding
+.. index:: Shift JIS Coding
 
+*Shift JIS Coding*
   A wide character is represented by a two-character sequence,
   ``16#ab#`` and
   ``16#cd#``, with the restrictions described for upper-half encoding as
@@ -247,10 +247,9 @@ possible encoding schemes:
   conversion. Only characters defined in the JIS code set table can be
   used with this encoding method.
 
+.. index:: EUC Coding
 
 *EUC Coding*
-  .. index:: EUC Coding
-
   A wide character is represented by a two-character sequence
   ``16#ab#`` and
   ``16#cd#``, with both characters being in the upper half. The internal
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 78f8849e379..3859709afff 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , Oct 26, 2023
+GNAT User's Guide for Native Platforms , Nov 10, 2023
 
 AdaCore
 
@@ -1363,22 +1363,30 @@ characters (using uppercase letters) of the wide character code. For
 example, ESC A345 is used to represent the wide character with code
 @code{16#A345#}.
 This scheme is compatible with use of the full Wide_Character set.
-
-@item `Upper-Half Coding'
+@end table
 
 @geindex Upper-Half Coding
 
+
+@table @asis
+
+@item `Upper-Half Coding'
+
 The wide character with encoding @code{16#abcd#} where the upper bit is on
 (in other words, ‘a’ is in the range 8-F) is represented as two bytes,
 @code{16#ab#} and @code{16#cd#}. The second byte cannot be a format control
 character, but is not required to be in the upper half. This method can
 be also used for shift-JIS or EUC, where the internal coding matches the
 external coding.
-
-@item `Shift JIS Coding'
+@end table
 
 @geindex Shift JIS Coding
 
+
+@table @asis
+
+@item `Shift JIS Coding'
+
 A wide character is represented by a two-character sequence,
 @code{16#ab#} and
 @code{16#cd#}, with the restrictions described for upper-half encoding as
@@ -1386,11 +1394,15 @@ described above. The internal character code is the corresponding JIS
 character according to the standard algorithm for Shift-JIS
 conversion. Only characters defined in the JIS code set table can be
 used with this encoding method.
-
-@item `EUC Coding'
+@end table
 
 @geindex EUC Coding
 
+
+@table @asis
+
+@item `EUC Coding'
+
 A wide character is represented by a two-character sequence
 @code{16#ab#} and
 @code{16#cd#}, with both characters being in the upper half. The internal
@@ -29568,8 +29580,8 @@ to permit their use in free software.
 
 @printindex ge
 
-@anchor{d1}@w{  }
 @anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
+@anchor{d1}@w{  }
 
 @c %**end of body
 @bye
-- 
2.42.0

[COMMITTED] ada: Runtime recompilation instructions improvements.

2023-11-21 Thread Marc Poulhiès

From: Doug Rupp 

Revise instructions to work on both cross and native targets hosted
on Linux.

gcc/ada/

* libgnat/libada.gpr: Revise section 1

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/libada.gpr | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnat/libada.gpr b/gcc/ada/libgnat/libada.gpr
index 9453cae8f53..2848c566ee1 100644
--- a/gcc/ada/libgnat/libada.gpr
+++ b/gcc/ada/libgnat/libada.gpr
@@ -6,13 +6,14 @@
 -- 1. Create a new directory (e.g. "rts-debug"), then copy the adainclude
 --directory from the reference runtime that you want to rebuild.
 --You can find the relevant adainclude directory by running the command
---gprls [--target=] [--RTS=] and using the adainclude
+--gprls -v [--target=] [--RTS=] and using the adainclude
 --directory listed. For example:
--- $ cd 
--- $ mkdir rts-debug
--- $ cd rts-debug
--- $ cp -a `gprls -v | grep adainclude` .
--- $ cd adainclude
+--$ cd 
+--$ mkdir rts-debug
+--$ cd rts-debug
+--$ cp -a `gprls -v \
+--  [--target=] --RTS=native | grep adainclude` .
+--$ cd adainclude
 --
 --or under Windows:
 --
-- 
2.42.0

[COMMITTED] ada: Small cleanup in finalization machinery

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

This removes an obsolete flag and adjusts a couple of obsolete comments.

gcc/ada/

* gen_il-fields.ads (Opt_Field_Enum): Remove Is_Finalization_Wrapper
* gen_il-gen-gen_nodes.adb (N_Block_Statement): Likewise.
* sinfo.ads (Is_Finalization_Wrapper): Delete.
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Adjust comment
and remove obsolete code testing the Is_Finalization_Wrapper flag.
* exp_util.adb (Requires_Cleanup_Actions): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb  | 20 ++--
 gcc/ada/exp_util.adb | 24 ++--
 gcc/ada/gen_il-fields.ads|  1 -
 gcc/ada/gen_il-gen-gen_nodes.adb |  1 -
 gcc/ada/sinfo.ads|  7 ---
 5 files changed, 8 insertions(+), 45 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 369f0b07999..2e3da4cfaed 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -2248,8 +2248,8 @@ package body Exp_Ch7 is
--  Finalization of transient objects are treated separately in
--  order to handle sensitive cases. These include:
 
-   --* Aggregate expansion
-   --* If, case, and expression with actions expansion
+   --* Conditional expressions
+   --* Expressions with actions
--* Transient scopes
 
--  If one of those contexts has marked the transient object as
@@ -2508,22 +2508,6 @@ package body Exp_Ch7 is
then
   Last_Top_Level_Ctrl_Construct := Decl;
end if;
-
---  Handle the case where the original context has been wrapped in
---  a block to avoid interference between exception handlers and
---  At_End handlers. Treat the block as transparent and process its
---  contents.
-
-elsif Nkind (Decl) = N_Block_Statement
-  and then Is_Finalization_Wrapper (Decl)
-then
-   if Present (Handled_Statement_Sequence (Decl)) then
-  Process_Declarations
-(Statements (Handled_Statement_Sequence (Decl)),
- Preprocess);
-   end if;
-
-   Process_Declarations (Declarations (Decl), Preprocess);
 end if;
 
 Prev_Non_Pragma (Decl);
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 730889cae3e..3b34e4659f1 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -13023,8 +13023,8 @@ package body Exp_Util is
 --  Finalization of transient objects are treated separately in
 --  order to handle sensitive cases. These include:
 
---* Aggregate expansion
---* If, case, and expression with actions expansion
+--* Conditional expressions
+--* Expressions with actions
 --* Transient scopes
 
 --  If one of those contexts has marked the transient object as
@@ -13234,23 +13234,11 @@ package body Exp_Util is
return True;
 end if;
 
- elsif Nkind (Decl) = N_Block_Statement
-   and then
-
-   --  Handle a rare case caused by a controlled transient object
-   --  created as part of a record init proc. The variable is wrapped
-   --  in a block, but the block is not associated with a transient
-   --  scope.
-
-   (Inside_Init_Proc
+--  Handle a rare case caused by a controlled transient object created
+--  as part of a record init proc. The variable is wrapped in a block,
+--  but the block is not associated with a transient scope.
 
-   --  Handle the case where the original context has been wrapped in
-   --  a block to avoid interference between exception handlers and
-   --  At_End handlers. Treat the block as transparent and process its
-   --  contents.
-
- or else Is_Finalization_Wrapper (Decl))
- then
+ elsif Nkind (Decl) = N_Block_Statement and then Inside_Init_Proc then
 if Requires_Cleanup_Actions (Decl, Lib_Level) then
return True;
 end if;
diff --git a/gcc/ada/gen_il-fields.ads b/gcc/ada/gen_il-fields.ads
index a0bfb398ebb..c565e19701d 100644
--- a/gcc/ada/gen_il-fields.ads
+++ b/gcc/ada/gen_il-fields.ads
@@ -255,7 +255,6 @@ package Gen_IL.Fields is
   Is_Entry_Barrier_Function,
   Is_Expanded_Build_In_Place_Call,
   Is_Expanded_Contract,
-  Is_Finalization_Wrapper,
   Is_Folded_In_Parser,
   Is_Generic_Contract_Pragma,
   Is_Homogeneous_Aggregate,
diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index 996d8d78aea..087f78567f4 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -1

[COMMITTED] ada: Fix spurious error on call with default parameter in generic package

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

This occurs when the default value is a function call returning a private
type, and is caused by a bad interaction between two internal mechanisms.

gcc/ada/

* sem_ch12.adb (Save_Global_References.Set_Global_Type): Beef up
comment about the setting of the full view.
* sem_res.adb (Resolve_Actuals.Insert_Default): Add another bypass
for the case of a generic context.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb |  7 +--
 gcc/ada/sem_res.adb  | 14 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index f73e1b53b0e..31fcbedf774 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -16938,8 +16938,11 @@ package body Sem_Ch12 is
  elsif No (Full_View (Typ)) and then Typ /= Etype (Typ) then
 null;
 
- --  Otherwise mark the type for flipping and use the full view when
- --  available.
+ --  Otherwise mark the type for flipping and set the full view on N2
+ --  when available, which is necessary for Check_Private_View to swap
+ --  back the views in case the full declaration of Typ is visible in
+ --  the instantiation context. Note that this will be problematic if
+ --  N2 is re-analyzed later, e.g. if it's a default value in a call.
 
  else
 Set_Has_Private_View (N);
diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index 42f7c10c5c5..70a84176054 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -4017,13 +4017,21 @@ package body Sem_Res is
   Analyze_And_Resolve (Actval, Base_Type (Etype (Actval)));
 
--  Resolve entities with their own type, which may differ from
-   --  the type of a reference in a generic context (the view
-   --  swapping mechanism did not anticipate the re-analysis of
-   --  default values in calls).
+   --  the type of a reference in a generic context because of the
+   --  trick used in Save_Global_References.Set_Global_Type to set
+   --  full views forcefully, which did not anticipate the need to
+   --  re-analyze default values in calls.
 
elsif Is_Entity_Name (Actval) then
   Analyze_And_Resolve (Actval, Etype (Entity (Actval)));
 
+   --  Ditto for calls whose name is an entity, for the same reason
+
+   elsif Nkind (Actval) = N_Function_Call
+ and then Is_Entity_Name (Name (Actval))
+   then
+  Analyze_And_Resolve (Actval, Etype (Entity (Name (Actval))));
+
else
   Analyze_And_Resolve (Actval, Etype (Actval));
end if;
-- 
2.42.0

[COMMITTED] ada: Use CLOCK_MONOTONIC on VxWorks

2023-11-21 Thread Marc Poulhiès

From: Doug Rupp 

The monotonic clock keeps track of the time that has elapsed since
system startup; that is, the value returned by clock_gettime() is the
amount of time (in seconds and nanoseconds) that has passed since the
system booted. The monotonic clock cannot be reset. As a result,
time interval measurements made relative to the monotonic clock are
not subject to errors resulting from the clock time being unexpectedly
adjusted between the interval start and end.
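
To illustrate the interval-measurement point above, here is a minimal C sketch, assuming a POSIX system that provides clock_gettime() and CLOCK_MONOTONIC. The helper names monotonic_ns and elapsed_busy_ns are hypothetical and are not part of the GNAT runtime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read CLOCK_MONOTONIC as a nanosecond count,
   or return -1 if the clock cannot be read.  */
static int64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Hypothetical helper: time a small busy loop.  Both readings come from
   the monotonic clock, so a change to the wall-clock time between them
   does not distort the measured interval.  */
static int64_t elapsed_busy_ns(unsigned long iterations)
{
    volatile unsigned long sink = 0;
    int64_t start = monotonic_ns();
    unsigned long i;

    for (i = 0; i < iterations; i++)
        sink += i;
    return monotonic_ns() - start;
}
```

The same measurement taken with CLOCK_REALTIME could come out negative or far too large if the system time were set between the two readings.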

gcc/ada/

* s-oscons-tmplt.c: #define CLOCK_RT_Ada "CLOCK_MONOTONIC" for
__vxworks.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/s-oscons-tmplt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/s-oscons-tmplt.c b/gcc/ada/s-oscons-tmplt.c
index fb6bb0f043b..f1140d5ecbc 100644
--- a/gcc/ada/s-oscons-tmplt.c
+++ b/gcc/ada/s-oscons-tmplt.c
@@ -1975,7 +1975,8 @@ CND(CLOCK_THREAD_CPUTIME_ID, "Thread CPU clock")
 
 #if defined(__linux__) || defined(__FreeBSD__) \
  || (defined(_AIX) && defined(_AIXVERSION_530)) \
- || defined(__DragonFly__) || defined(__QNX__)
+ || defined(__DragonFly__) || defined(__QNX__) \
+ || defined (__vxworks)
 /** On these platforms use system provided monotonic clock instead of
  ** the default CLOCK_REALTIME. We then need to set up cond var attributes
  ** appropriately (see thread.c).
-- 
2.42.0

[COMMITTED] ada: Further cleanup in finalization machinery

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

This removes the specific treatment of transient scopes in initialization
procedures, which is obsolete.

gcc/ada/

* exp_aggr.adb (Convert_To_Assignments): Do not treat initialization
procedures specially when it comes to creating a transient scope.
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Likewise.
* exp_util.adb (Requires_Cleanup_Actions): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 10 ++
 gcc/ada/exp_ch7.adb  | 29 -
 gcc/ada/exp_util.adb |  9 -
 3 files changed, 2 insertions(+), 46 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index a6a54e892e2..691430a3e52 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -4294,15 +4294,9 @@ package body Exp_Aggr is
  return;
   end if;
 
-  --  Otherwise, if a transient scope is required, create it now. If we
-  --  are within an initialization procedure do not create such, because
-  --  the target of the assignment must not be declared within a local
-  --  block, and because cleanup will take place on return from the
-  --  initialization procedure.
+  --  Otherwise, if a transient scope is required, create it now
 
-  --  Should the condition be more restrictive ???
-
-  if Requires_Transient_Scope (Typ) and then not Inside_Init_Proc then
+  if Requires_Transient_Scope (Typ) then
  Establish_Transient_Scope (N, Manage_Sec_Stack => False);
   end if;
 
diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 2e3da4cfaed..ef3b5c95d64 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -2479,35 +2479,6 @@ package body Exp_Ch7 is
   and then Present (Library_Unit (Decl))
 then
Process_Package_Body (Proper_Body (Unit (Library_Unit (Decl))));
-
---  Handle a rare case caused by a controlled transient object
---  created as part of a record init proc. The variable is wrapped
---  in a block, but the block is not associated with a transient
---  scope.
-
-elsif Nkind (Decl) = N_Block_Statement
-  and then Inside_Init_Proc
-then
-   Old_Counter_Val := Counter_Val;
-
-   if Present (Handled_Statement_Sequence (Decl)) then
-  Process_Declarations
-(Statements (Handled_Statement_Sequence (Decl)),
- Preprocess);
-   end if;
-
-   Process_Declarations (Declarations (Decl), Preprocess);
-
-   --  Either the declaration or statement list of the block has a
-   --  controlled object.
-
-   if Preprocess
- and then Top_Level
- and then No (Last_Top_Level_Ctrl_Construct)
- and then Counter_Val > Old_Counter_Val
-   then
-  Last_Top_Level_Ctrl_Construct := Decl;
-   end if;
 end if;
 
 Prev_Non_Pragma (Decl);
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 3b34e4659f1..3952a161bd7 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -13233,15 +13233,6 @@ package body Exp_Util is
 then
return True;
 end if;
-
---  Handle a rare case caused by a controlled transient object created
---  as part of a record init proc. The variable is wrapped in a block,
---  but the block is not associated with a transient scope.
-
- elsif Nkind (Decl) = N_Block_Statement and then Inside_Init_Proc then
-if Requires_Cleanup_Actions (Decl, Lib_Level) then
-   return True;
-end if;
  end if;
 
  Next (Decl);
-- 
2.42.0

[COMMITTED] ada: Deep delta aggregates cleanup.

2023-11-21 Thread Marc Poulhiès

From: Steve Baird 

Cleanup after the introduction of deep delta aggregates.
Eliminate a new gnatcheck message.

gcc/ada/

* sem_aggr.adb: Replace "not Present (...)" call with "No (...)" call.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index c1d25404ae4..46a96a31a00 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -3903,7 +3903,7 @@ package body Sem_Aggr is
   Prefix_Type : constant Entity_Id :=
 Get_Component_Type (Prefix (Selector), Enclosing_Type);
begin
-  if not Present (Prefix_Type) then
+  if No (Prefix_Type) then
  pragma Assert (Serious_Errors_Detected > 0);
  return Empty;
   end if;
-- 
2.42.0

[COMMITTED] ada: Further cleanup in finalization machinery

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The bodies of generic units are instantiated separately by GNAT at the end
of the processing of the compilation unit.  This requires the deferral of
the generation of cleanups and finalization actions in enclosing scopes,
except for instantiations in generic units where they are not generated.

The criterion used to detect this latter case is Inside_A_Generic, but this
global variable is not properly updated during the instantiation of generic
bodies, leading to problems with nested instantiations, so it is changed to
Expander_Active instead.  As a matter of fact, the exact same idiom is used
a few lines above to clear the Needs_Body variable.

gcc/ada/

* sem_ch12.adb (Analyze_Package_Instantiation): Test Expander_Active
to detect generic contexts for the generation of cleanup actions.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 31fcbedf774..7c645c490ae 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -4824,10 +4824,7 @@ package body Sem_Ch12 is
  --  Cleanup actions are not generated within generic units
  --  or in the formal part of generic units.
 
- if Inside_A_Generic
-   or else Is_Generic_Unit (S)
-   or else Ekind (S) = E_Void
- then
+ if not Expander_Active then
 exit;
 
  --  For package scopes, cleanup actions are generated only
-- 
2.42.0

[COMMITTED] ada: Fix Ada.Text_IO.Delete with "encoding=8bits" form

2023-11-21 Thread Marc Poulhiès

From: Ronan Desplanques 

Before this patch, on Windows, files with non-ASCII Latin-1 names could be
created with Ada.Text_IO.Create by passing "encoding=8bits" through the Form
parameter and a Latin-1-encoded string through the Name parameter, but
calling Ada.Text_IO.Delete on them raised an illegitimate exception.

This patch fixes this by making the wrappers of the unlink system function
aware of the encoding value passed through the Form parameter. It also
removes an unnecessary curly-brace block.
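
Why the encoding matters can be sketched in portable C: the same bytes decode to different characters (or fail to decode) depending on whether they are read as 8-bit Latin-1 or as UTF-8. This is only a sketch under stated assumptions; enum name_encoding and decode_name are hypothetical names, not the S2WS/S2WSU/S2WSC macros used in adaint.c, and the UTF-8 branch checks sequence structure only (it does not reject overlong forms).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags, for illustration only.  */
enum name_encoding { NAME_UTF8, NAME_8BITS };

/* Decode LEN bytes of NAME into code points according to ENC.
   Returns the number of code points written to OUT, or -1 if the
   input is malformed or OUT (capacity CAP) is too small.  */
static int decode_name(const unsigned char *name, size_t len,
                       enum name_encoding enc, uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = name[i++];
        uint32_t cp;
        size_t extra = 0;

        if (enc == NAME_8BITS || b < 0x80) {
            cp = b;               /* Latin-1: the byte is the code point */
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F;        /* lead byte of a 2-byte sequence */
            extra = 1;
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F;        /* lead byte of a 3-byte sequence */
            extra = 2;
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07;        /* lead byte of a 4-byte sequence */
            extra = 3;
        } else {
            return -1;            /* stray continuation or invalid byte */
        }

        for (; extra > 0; extra--) {
            if (i >= len || (name[i] & 0xC0) != 0x80)
                return -1;        /* truncated or malformed sequence */
            cp = (cp << 6) | (uint32_t)(name[i++] & 0x3F);
        }

        if (n >= cap)
            return -1;
        out[n++] = cp;
    }
    return (int)n;
}
```

With this model, the Latin-1 bytes of "caf\xE9" decode cleanly as 8-bit text but are rejected as UTF-8, which is the kind of mismatch that arises when the name is created under one encoding and deleted under another.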

gcc/ada/

* adaint.c (__gnat_unlink): Add new parameter and fix text
conversion on Windows. Remove unnecessary curly braces.
* adaint.h (__gnat_unlink): Add new parameter.
* libgnat/i-cstrea.ads (unlink): Adapt to __gnat_unlink signature
change.
* libgnat/i-cstrea.adb (unlink): New Subprogram definition.
* libgnat/s-crtl.ads (unlink): Adapt to __gnat_unlink signature
change.
* libgnat/s-fileio.adb (Delete): Pass encoding argument to unlink.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/adaint.c | 14 +-
 gcc/ada/adaint.h |  2 +-
 gcc/ada/libgnat/i-cstrea.adb |  9 +
 gcc/ada/libgnat/i-cstrea.ads |  3 +--
 gcc/ada/libgnat/s-crtl.ads   |  3 ++-
 gcc/ada/libgnat/s-fileio.adb |  3 ++-
 6 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index bb4ed2607e5..4ab95658c62 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -747,15 +747,19 @@ __gnat_os_filename (char *filename ATTRIBUTE_UNUSED,
 /* Delete a file.  */
 
 int
-__gnat_unlink (char *path)
+__gnat_unlink (char *path, int encoding ATTRIBUTE_UNUSED)
 {
 #if defined (__MINGW32__) && ! defined (__vxworks) && ! defined (IS_CROSS)
-  {
-TCHAR wpath[GNAT_MAX_PATH_LEN];
+  TCHAR wpath[GNAT_MAX_PATH_LEN];
 
+  if (encoding == Encoding_Unspecified)
 S2WSC (wpath, path, GNAT_MAX_PATH_LEN);
-return _tunlink (wpath);
-  }
+  else if (encoding == Encoding_UTF8)
+S2WSU (wpath, path, GNAT_MAX_PATH_LEN);
+  else
+S2WS (wpath, path, GNAT_MAX_PATH_LEN);
+
+  return _tunlink (wpath);
 #else
   return unlink (path);
 #endif
diff --git a/gcc/ada/adaint.h b/gcc/ada/adaint.h
index 987432c9307..298ea9e2f9f 100644
--- a/gcc/ada/adaint.h
+++ b/gcc/ada/adaint.h
@@ -172,7 +172,7 @@ extern int__gnat_open_new_temp (char *, 
int);
 extern int__gnat_mkdir(char *, int);
 extern int__gnat_stat (char *,
GNAT_STRUCT_STAT *);
-extern int__gnat_unlink(char *);
+extern int__gnat_unlink(char *, int encoding);
 extern int__gnat_rename(char *, char *);
 extern int__gnat_chdir (char *);
 extern int__gnat_rmdir (char *);
diff --git a/gcc/ada/libgnat/i-cstrea.adb b/gcc/ada/libgnat/i-cstrea.adb
index f761f3f73ae..fe668e159ad 100644
--- a/gcc/ada/libgnat/i-cstrea.adb
+++ b/gcc/ada/libgnat/i-cstrea.adb
@@ -130,4 +130,13 @@ package body Interfaces.C_Streams is
   return C_setvbuf (stream, buffer, mode, size);
end setvbuf;
 
+   ------------
+   -- unlink --
+   ------------
+
+   function unlink (filename : chars) return int is
+   begin
+  return System.CRTL.unlink (filename);
+   end unlink;
+
 end Interfaces.C_Streams;
diff --git a/gcc/ada/libgnat/i-cstrea.ads b/gcc/ada/libgnat/i-cstrea.ads
index 39111225db4..67f10cf0b42 100644
--- a/gcc/ada/libgnat/i-cstrea.ads
+++ b/gcc/ada/libgnat/i-cstrea.ads
@@ -197,8 +197,7 @@ package Interfaces.C_Streams is
function ungetc (c : int; stream : FILEs) return int
  renames System.CRTL.ungetc;
 
-   function unlink (filename : chars) return int
- renames System.CRTL.unlink;
+   function unlink (filename : chars) return int;
 
-
-- Extra functions --
diff --git a/gcc/ada/libgnat/s-crtl.ads b/gcc/ada/libgnat/s-crtl.ads
index c3a3b6481db..56900a8756f 100644
--- a/gcc/ada/libgnat/s-crtl.ads
+++ b/gcc/ada/libgnat/s-crtl.ads
@@ -220,7 +220,8 @@ package System.CRTL is
function ungetc (c : int; stream : FILEs) return int;
pragma Import (C, ungetc, "ungetc");
 
-   function unlink (filename : chars) return int;
+   function unlink (filename : chars;
+ encoding : Filename_Encoding := Unspecified) return int;
pragma Import (C, unlink, "__gnat_unlink");
 
function open (filename : chars; oflag : int) return int;
diff --git a/gcc/ada/libgnat/s-fileio.adb b/gcc/ada/libgnat/s-fileio.adb
index 931b68a3d2e..f55cdc796a3 100644
--- a/gcc/ada/libgnat/s-fileio.adb
+++ b/gcc/ada/libgnat/s-fileio.adb
@@ -350,6 +350,7 @@ package body System.File_IO is
   declare
  Filename : aliased constant String := File.Name.all;
  Is_Temporary_File : constant Boolean := File.Is_Temporary_File;
+ Encoding : constant CRTL.Filename_Encoding := File

[COMMITTED] ada: Small improvement to Null_Status function

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The function is used to optimize away access checks.

gcc/ada/

* sem_util.adb (Null_Status): Deal with unchecked type conversions.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 3d870b1049c..eb2d83a4d6d 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -25429,7 +25429,7 @@ package body Sem_Util is
 
   --  Check the status of the operand of a type conversion
 
-  elsif Nkind (N) = N_Type_Conversion then
+  elsif Nkind (N) in N_Type_Conversion | N_Unchecked_Type_Conversion then
  return Null_Status (Expression (N));
 
   --  The input denotes a reference to an entity. Determine whether the
-- 
2.42.0

[COMMITTED] ada: Fix string indexing within GNAT.Calendar.Time_IO.Value

2023-11-21 Thread Marc Poulhiès

From: Justin Squirek 

This patch fixes an issue whereby calls to GNAT.Calendar.Time_IO.Value in
which the actual for the formal String Date has a lower bound other than one
would result in a spurious run-time exception.
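
The difference between bounding a scan by 'Length and by 'Last can be modelled in C. This is a minimal sketch, assuming a hypothetical struct ada_string that mimics an Ada String slice indexed First .. Last; none of these names come from g-catiio.adb.

```c
/* Hypothetical model of an Ada String slice: data[0] holds the
   character at index FIRST, and LAST is the index of the final one.  */
struct ada_string {
    const char *data;
    int first;
    int last;
};

static int length_of(const struct ada_string *s)
{
    return s->last - s->first + 1;
}

static char char_at(const struct ada_string *s, int index)
{
    return s->data[index - s->first];
}

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Count leading digits, bounding the scan by the last index.  */
static int count_digits_by_last(const struct ada_string *s)
{
    int index = s->first;

    while (index <= s->last && is_digit(char_at(s, index)))
        index++;
    return index - s->first;
}

/* Same scan bounded by the length: only correct when first == 1,
   because for first > 1 the length is smaller than the last index
   and the loop stops too early.  */
static int count_digits_by_length(const struct ada_string *s)
{
    int index = s->first;

    while (index <= length_of(s) && is_digit(char_at(s, index)))
        index++;
    return index - s->first;
}
```

For a slice such as "2023" indexed 5 .. 8, the length-bounded scan consumes no digits at all, whereas the version bounded by the last index consumes all four.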

gcc/ada/

* libgnat/g-catiio.adb (Value): Modify conditionals to use 'Last
instead of 'Length.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/g-catiio.adb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/g-catiio.adb b/gcc/ada/libgnat/g-catiio.adb
index 42b86cce4a1..d80e6fc1ca0 100644
--- a/gcc/ada/libgnat/g-catiio.adb
+++ b/gcc/ada/libgnat/g-catiio.adb
@@ -849,7 +849,7 @@ package body GNAT.Calendar.Time_IO is
   begin
  Advance_Digits (Num_Digits => 1);
 
- while Index <= Date'Length and then Symbol in '0' .. '9' loop
+ while Index <= Date'Last and then Symbol in '0' .. '9' loop
 Advance;
  end loop;
 
@@ -1005,7 +1005,7 @@ package body GNAT.Calendar.Time_IO is
 
   --  Check for trailing characters
 
-  if Index /= Date'Length + 1 then
+  if Index /= Date'Last + 1 then
  raise Wrong_Syntax;
   end if;
 
-- 
2.42.0

[COMMITTED] ada: Avoid Style_Checks pragmas affecting other units

2023-11-21 Thread Marc Poulhiès

From: Viljar Indus 

gcc/ada/

* par.adb: Restore Style_Checks after parsing each unit.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/par.adb | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/ada/par.adb b/gcc/ada/par.adb
index 180ec08561c..4e10dd9049c 100644
--- a/gcc/ada/par.adb
+++ b/gcc/ada/par.adb
@@ -80,6 +80,10 @@ function Par (Configuration_Pragmas : Boolean) return 
List_Id is
--  True within a delta aggregate (but only after the "delta" token has
--  been scanned). Used to distinguish syntax errors from syntactically
--  correct "deep" delta aggregates (enabled via -gnatX0).
+   Save_Style_Checks : Style_Check_Options;
+   Save_Style_Check  : Boolean;
+   --  Variables for storing the original state of whether style checks should
+   --  be active in general and which particular ones should be checked.
 

-- Error Recovery --
@@ -1601,6 +1605,11 @@ begin
else
   Save_Config_Attrs := Save_Config_Switches;
 
+  --  Store the state of Style_Checks pragmas
+
+  Save_Style_Check := Style_Check;
+  Save_Style_Check_Options (Save_Style_Checks);
+
   --  The following loop runs more than once in syntax check mode
   --  where we allow multiple compilation units in the same file
   --  and in Multiple_Unit_Per_file mode where we skip units till
@@ -1658,6 +1667,7 @@ begin
  --  syntax mode we are interested in all units in the file.
 
  else
+
 declare
Comp_Unit_Node : constant Node_Id := P_Compilation_Unit;
 
@@ -1744,6 +1754,13 @@ begin
  Restore_Config_Switches (Save_Config_Attrs);
   end loop;
 
+  --  Restore the state of Style_Checks after parsing the unit to
+  --  avoid parsed pragmas affecting other units.
+
+  Reset_Style_Check_Options;
+  Set_Style_Check_Options (Save_Style_Checks);
+  Style_Check := Save_Style_Check;
+
   --  Now that we have completely parsed the source file, we can complete
   --  the source file table entry.
 
-- 
2.42.0

[COMMITTED] ada: Compiler crash on container aggregate with loop_parameter_specifications

2023-11-21 Thread Marc Poulhiès

From: Gary Dismukes 

The compiler crashes on a container aggregate with more than one
iterated_element_association given by a loop_parameter_specification.
In such a case, the tree contains N_Iterated_Component_Association
nodes rather than N_Iterated_Element_Association nodes, and the code
for handling those needs to obtain the bounds from the Discrete_Choices
field of each N_Iterated_Component_Association rather than assuming
that the association has a normal list of choices.

gcc/ada/

* sem_aggr.adb (Resolve_Container_Aggregate): In the case where Comp
is an N_Iterated_Component_Association, pick up Discrete_Choices rather
than Choices.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index d3f9a773191..bc03a079f5a 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -3575,7 +3575,23 @@ package body Sem_Aggr is
   end if;
end if;
 else
-   Choice := First (Choices (Comp));
+
+   --  If Nkind is N_Iterated_Component_Association,
+   --  this corresponds to an iterator_specification
+   --  with a loop_parameter_specification, and we
+   --  have to pick up Discrete_Choices. In this case
+   --  there will be just one "choice", which will
+   --  typically be a range.
+
+   if Nkind (Comp) = N_Iterated_Component_Association
+   then
+  Choice := First (Discrete_Choices (Comp));
+
+   --  Case where there's a list of choices
+
+   else
+  Choice := First (Choices (Comp));
+   end if;
 
while Present (Choice) loop
   Get_Index_Bounds (Choice, Lo, Hi);
-- 
2.42.0

[COMMITTED] ada: Fix issue with indefinite vector of overaligned unconstrained array

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The problem is that the aligning machinery is not consistently triggered,
depending on whether a constrained view or the nominal unconstrained view
of the element type is used to perform the allocations and deallocations.
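
The constraint behind the padding machinery, namely that the size of a type must be a multiple of its alignment, can be observed in plain C. This is a minimal sketch, assuming a C11 compiler; struct padded_bytes and round_up are hypothetical names for illustration and are unrelated to the implementation of maybe_pad_type.

```c
#include <stddef.h>

/* A 5-byte array with a requested alignment of 16, wrapped in a
   record.  The compiler rounds the record size up to 16 so that
   consecutive objects stay correctly aligned.  */
struct padded_bytes {
    _Alignas(16) unsigned char data[5];
};

/* Round SIZE up to the next multiple of ALIGN, which must be a
   power of two.  */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

Here the element type (unsigned char) has alignment 1 while the wrapping record has alignment 16, loosely analogous to an over-aligned array whose alignment exceeds that of its element type.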

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Put
the alignment directly on the type in the constrained case too.
* gcc-interface/utils.cc (maybe_pad_type): For an array type, take
the alignment of the element type as the original alignment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc  | 12 
 gcc/ada/gcc-interface/utils.cc | 19 ---
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 9c7f6840e21..c446b146179 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -3010,6 +3010,18 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree 
gnu_expr, bool definition)
TREE_TYPE (TYPE_FIELDS (gnu_type)) = gnu_inner;
}
}
+
+ /* Otherwise, if an alignment is specified, use it if valid and, if
+the alignment was requested with an explicit clause, state so.  */
+ else if (Known_Alignment (gnat_entity))
+   {
+ SET_TYPE_ALIGN (gnu_type,
+ validate_alignment (Alignment (gnat_entity),
+ gnat_entity,
+ TYPE_ALIGN (gnu_type)));
+ if (Present (Alignment_Clause (gnat_entity)))
+   TYPE_USER_ALIGN (gnu_type) = 1;
+   }
}
   break;
 
diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 8b2c7f99ef3..e7b5c7783b1 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -1485,7 +1485,14 @@ canonicalize_pad_type (tree type)
IS_COMPONENT_TYPE is true if this is being done for the component type of
an array.  DEFINITION is true if this type is being defined.  SET_RM_SIZE
is true if the RM size of the resulting type is to be set to SIZE too; in
-   this case, the padded type is canonicalized before being returned.  */
+   this case, the padded type is canonicalized before being returned.
+
+   Note that, if TYPE is an array, then we pad it even if it has already got
+   an alignment of ALIGN, provided that it's larger than the alignment of the
+   element type.  This ensures that the size of the type is a multiple of its
+   alignment as required by the GCC type system, and alleviates the oddity of
+   the larger alignment, which is used to implement alignment clauses present
+   on unconstrained array types.  */
 
 tree
 maybe_pad_type (tree type, tree size, unsigned int align,
@@ -1493,7 +1500,10 @@ maybe_pad_type (tree type, tree size, unsigned int align,
bool definition, bool set_rm_size)
 {
   tree orig_size = TYPE_SIZE (type);
-  unsigned int orig_align = TYPE_ALIGN (type);
+  unsigned int orig_align
+= TREE_CODE (type) == ARRAY_TYPE
+  ? TYPE_ALIGN (TREE_TYPE (type))
+  : TYPE_ALIGN (type);
   tree record, field;
 
   /* If TYPE is a padded type, see if it agrees with any size and alignment
@@ -1515,7 +1525,10 @@ maybe_pad_type (tree type, tree size, unsigned int align,
 
   type = TREE_TYPE (TYPE_FIELDS (type));
   orig_size = TYPE_SIZE (type);
-  orig_align = TYPE_ALIGN (type);
+  orig_align
+   = TREE_CODE (type) == ARRAY_TYPE
+ ? TYPE_ALIGN (TREE_TYPE (type))
+ : TYPE_ALIGN (type);
 }
 
   /* If the size is either not being changed or is being made smaller (which
-- 
2.42.0

[COMMITTED] ada: Another couple of cleanups in the finalization machinery

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

For package specs and bodies that need finalizers, Build_Finalizer is
invoked from the Standard scope so it needs to adjust the scope stack
before creating new objects; this changes it to do so only once.

For other kinds of scopes, it is invoked from Expand_Cleanup_Actions,
which assumes that the correct scope is already on the stack; that's
why Cleanup_Scopes adjusts the scope stack explicitly, but it should
use Pop_Scope instead of End_Scope to do it.

gcc/ada/

* exp_ch7.adb (Build_Finalizer): For package specs and bodies, push
and pop the specs onto the scope stack only once.
* inline.adb (Cleanup_Scopes): Call Pop_Scope instead of End_Scope.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 29 +
 gcc/ada/inline.adb  |  2 +-
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index ef3b5c95d64..f8c12b73e9b 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -1575,19 +1575,10 @@ package body Exp_Ch7 is
 Prepend_To (Decls, Counter_Typ_Decl);
 
 --  The counter and its associated type must be manually analyzed
---  since N has already been analyzed. Use the scope of the spec
---  when inserting in a package.
+--  since N has already been analyzed.
 
-if For_Package then
-   Push_Scope (Spec_Id);
-   Analyze (Counter_Typ_Decl);
-   Analyze (Counter_Decl);
-   Pop_Scope;
-
-else
-   Analyze (Counter_Typ_Decl);
-   Analyze (Counter_Decl);
-end if;
+Analyze (Counter_Typ_Decl);
+Analyze (Counter_Decl);
 
 Jump_Alts := New_List;
  end if;
@@ -1933,12 +1924,8 @@ package body Exp_Ch7 is
Append_To (Decls, Fin_Body);
 end if;
 
---  Push the name of the package
-
-Push_Scope (Spec_Id);
 Analyze (Fin_Spec);
 Analyze (Fin_Body);
-Pop_Scope;
 
  --  Non-package case
 
@@ -3419,6 +3406,10 @@ package body Exp_Ch7 is
   --  Step 2: Object [pre]processing
 
   if For_Package then
+ --  For package specs and bodies, we are invoked from the Standard
+ --  scope, so we need to push the specs onto the scope stack first.
+
+ Push_Scope (Spec_Id);
 
  --  Preprocess the visible declarations now in order to obtain the
  --  correct number of controlled object by the time the private
@@ -3496,6 +3487,12 @@ package body Exp_Ch7 is
   if Acts_As_Clean or Has_Ctrl_Objs or Has_Tagged_Types then
  Create_Finalizer;
   end if;
+
+  --  Pop the scope that was pushed above for package specs and bodies
+
+  if For_Package then
+ Pop_Scope;
+  end if;
end Build_Finalizer;
 
--
diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
index 5fff88144b2..1fbbe6d32c2 100644
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -2908,7 +2908,7 @@ package body Inline is
  else
 Push_Scope (Scop);
 Expand_Cleanup_Actions (Decl);
-End_Scope;
+Pop_Scope;
  end if;
 
  Next_Elmt (Elmt);
-- 
2.42.0

[COMMITTED] ada: Fix type for SPARK expansion on deep delta aggregates

2023-11-21 Thread Marc Poulhiès

From: Yannick Moy 

gcc/ada/

* exp_spark.adb (Expand_SPARK_Delta_Or_Aggregate): Fix type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_spark.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_spark.adb b/gcc/ada/exp_spark.adb
index c19aa201bde..ae0e616c797 100644
--- a/gcc/ada/exp_spark.adb
+++ b/gcc/ada/exp_spark.adb
@@ -200,7 +200,7 @@ package body Exp_SPARK is
   begin
  loop
 if Nkind (Pref) = N_Indexed_Component then
-   Index := First (Expressions (Choice));
+   Index := First (Expressions (Pref));
Apply_Scalar_Range_Check (Index, Etype (Index));
 
 elsif Is_Array_Type (Typ)
-- 
2.42.0

[BUG FIX] RISC-V: Disallow CONST_VECTOR for DI on RV32

2023-11-21 Thread Juzhe-Zhong

This bug was exposed when testing on a zvl512b RV32 system.

The root cause is that RA reloads a DI CONST_VECTOR into vmv.v.x, which
then triggers an ICE.

So disallow DI CONST_VECTOR on RV32.
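
The underlying constraint can be sketched in C: a splat of a 64-bit element can only be sourced from one scalar register if the element is no wider than the register, or if its value happens to fit in a sign-extended register-width integer. This is a simplified model under stated assumptions, not the logic of riscv_const_insns; fits_signed and splat_fits_one_scalar_reg are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if V is representable as a signed integer of BITS bits
   (1 <= BITS <= 64).  */
static bool fits_signed(int64_t v, unsigned bits)
{
    int64_t lo, hi;

    if (bits >= 64)
        return true;
    lo = -((int64_t)1 << (bits - 1));
    hi = ((int64_t)1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Simplified model: can a vector whose elements all equal ELT
   (each ELT_BYTES wide) be built from a single scalar register of
   WORD_BYTES bytes?  */
static bool splat_fits_one_scalar_reg(int64_t elt, unsigned elt_bytes,
                                      unsigned word_bytes)
{
    if (elt_bytes <= word_bytes)
        return true;
    return fits_signed(elt, word_bytes * 8);
}
```

On RV32 (4-byte words), a DI element such as INT64_MAX fails this check, while a small value such as 5 still passes.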

PR target/112598

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Disallow DI CONST_VECTOR 
on RV32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112598-1.c: New test.

---
 gcc/config/riscv/riscv.cc |  8 +++
 .../gcc.target/riscv/rvv/autovec/pr112598-1.c | 56 +++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112598-1.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3701f41b1b3..60d3f617395 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1598,6 +1598,14 @@ riscv_const_insns (rtx x)
rtx elt;
if (const_vec_duplicate_p (x, &elt))
  {
+   /* We don't allow CONST_VECTOR for a DI vector on an RV32
+  system, since the ELT constant value cannot be held
+  within a single register; this prevents reload of a DI
+  register vec_duplicate into vmv.v.x.  */
+   scalar_mode smode = GET_MODE_INNER (GET_MODE (x));
+   if (maybe_gt (GET_MODE_SIZE (smode), UNITS_PER_WORD)
+   && !immediate_operand (elt, Pmode))
+ return 0;
/* Constants from -16 to 15 can be loaded with vmv.v.i.
   The Wc0, Wc1 constraints are already covered by the
   vi constraint so we do not need to check them here
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112598-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112598-1.c
new file mode 100644
index 000..a1d7e5bf17b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112598-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zvfh_zfh_zvl512b -mabi=ilp32d -O3 
--param=riscv-autovec-lmul=m8 -O3 -fno-vect-cost-model -ffast-math" } */
+
+#include 
+#define TEST_UNARY_CALL_CVT(TYPE_IN, TYPE_OUT, CALL) \
+  void test_##TYPE_IN##_##TYPE_OUT##_##CALL (\
+TYPE_OUT *out, TYPE_IN *in, unsigned count)  \
+  {  \
+for (unsigned i = 0; i < count; i++) \
+  out[i] = CALL (in[i]); \
+  }
+#define TEST_ASSERT(TYPE) \
+  void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+  {   \
+for (unsigned i = 0; i < size; i++)   \
+  {   \
+   if (out[i] != ref[i]) \
+ __builtin_abort (); \
+  }   \
+  }
+#define TEST_INIT_CVT(TYPE_IN, VAL_IN, TYPE_REF, VAL_REF, NUM) \
+  void test_##TYPE_IN##_##TYPE_REF##_init_##NUM (  \
+TYPE_IN *in, TYPE_REF *ref, unsigned size) \
+  {\
+for (unsigned i = 0; i < size; i++)\
+  {\
+   in[i] = VAL_IN;\
+   ref[i] = VAL_REF;  \
+  }\
+  }
+#define RUN_TEST_CVT(TYPE_IN, TYPE_OUT, NUM, CALL, IN, OUT, REF, SIZE) \
+  test_##TYPE_IN##_##TYPE_OUT##_init_##NUM (IN, REF, SIZE);\
+  test_##TYPE_IN##_##TYPE_OUT##_##CALL (OUT, IN, SIZE);\
+  test_##TYPE_OUT##_assert (OUT, REF, SIZE);
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+int64_t out[ARRAY_SIZE];
+int64_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL_CVT (float, int64_t, __builtin_llceilf)
+
+TEST_ASSERT (int64_t)
+
+
+TEST_INIT_CVT (float, 9223372036854775808.0, int64_t, 0x7fff, 26)
+TEST_INIT_CVT (float, __builtin_inf (), int64_t, __builtin_llceilf (__builtin_inf ()), 29)
+
+int64_t
+main ()
+{
+  RUN_TEST_CVT (float, int64_t, 26, __builtin_llceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST_CVT (float, int64_t, 29, __builtin_llceilf, in, out, ref, ARRAY_SIZE);
+  return 0;
+}
-- 
2.36.3

[COMMITTED] ada: Fix internal error on 'Address of task component

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

This happens when the prefix of the selected component is of an access type,
i.e. there is an implicit dereference, because the prefix is not resolved.

gcc/ada/

* sem_attr.adb (Resolve_Attribute) : Remove the
bypass for prefixes with task type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 531bc112c91..000253e7993 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -12119,9 +12119,7 @@ package body Sem_Attr is
Note_Possible_Modification (P, Sure => False);
 end if;
 
-if Nkind (P) in N_Subexpr
-  and then Is_Overloaded (P)
-then
+if Nkind (P) in N_Subexpr and then Is_Overloaded (P) then
Get_First_Interp (P, Index, It);
Get_Next_Interp (Index, It);
 
@@ -12135,11 +12133,7 @@ package body Sem_Attr is
 if not Is_Entity_Name (P)
   or else not Is_Overloadable (Entity (P))
 then
-   if not Is_Task_Type (Etype (P))
- or else Nkind (P) = N_Explicit_Dereference
-   then
-  Resolve (P);
-   end if;
+   Resolve (P);
 end if;
 
 --  If this is the name of a derived subprogram, or that of a
-- 
2.42.0

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-21 Thread waffl3x

On Monday, November 20th, 2023 at 7:35 AM, Jason Merrill wrote:

> 
> 
> On 11/19/23 16:44, waffl3x wrote:
> 
> > On Sunday, November 19th, 2023 at 1:34 PM, Jason Merrill ja...@redhat.com 
> > wrote:
> > 
> > > On 11/19/23 13:36, waffl3x wrote:
> > > 
> > > > I'm having trouble fixing the error for this case, the control flow
> > > > when the functions are overloaded is much more complex.
> > > > 
> > > > struct S {
> > > > void f(this S&) {}
> > > > void f(this S&, int)
> > > > 
> > > > void g() {
> > > > void (*fp)(S&) = &f;
> > > > }
> > > > };
> > > > 
> > > > This seemed to have fixed the non overloaded case, but I'm also not
> > > > very happy with it, it feels kind of icky. Especially since the expr's
> > > > location isn't available here, although, it just occurred to me that
> > > > the expr's location is probably stored in the node.
> > > > 
> > > > typeck.cc:cp_build_addr_expr_1
> > > > ```
> > > > case BASELINK:
> > > > arg = BASELINK_FUNCTIONS (arg);
> > > > if (DECL_XOBJ_MEMBER_FUNC_P (
> > > > {
> > > > error ("You must qualify taking address of xobj member functions");
> > > > return error_mark_node;
> > > > }
> > > 
> > > The loc variable was set earlier in the function, you can use that.
> > 
> > Will do.
> > 
> > > The overloaded case we want to handle here in
> > > resolve_address_of_overloaded_function:
> > > 
> > > > if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
> > > > && !(complain & tf_ptrmem_ok) && !flag_ms_extensions)
> > > > {
> > > > static int explained;
> > > > 
> > > > if (!(complain & tf_error))
> > > > return error_mark_node;
> > > > 
> > > > auto_diagnostic_group d;
> > > > if (permerror (input_location, "assuming pointer to member %qD", fn)
> > > > && !explained)
> > > > {
> > > > inform (input_location, "(a pointer to member can only be "
> > > > "formed with %<&%E%>)", fn);
> > > > explained = 1;
> > > > }
> > > > }
> > > 
> > > Jason
> > 
> > I'll check that out now, I just mostly finished the first lambda crash.
> > 
> > What is the proper way to error out of instantiate_body? What I have
> > right now is just not recursing down further if there's a problem. Also,
> > I'm starting to wonder if I should actually be erroring in
> > instantiate_decl instead.
> 
> 
> I think you want to error in start_preparsed_function, to handle
> template and non-template cases in the same place.
> 
> Jason

I just started a bootstrap, hopefully everything comes out just fine.
If I don't pass out before the tests finish (and the tests are all
fine) then I'll send it in for review tonight.

I stared at start_preparsed_function for a long while and couldn't
figure out where to start. So for now the error handling is
split up between instantiate_body and cp_parser_lambda_declarator_opt.
The latter is definitely not correct, but I've been stuck on this for a
long time now, so I wanted to get something that works first and then
try to make it better.

This patch is not as final as I would have liked as you can probably
deduce from the previous paragraph. I still have to write tests for a
number of cases, but I'm pretty sure everything works. I was going to
say except for by-value xobj parameters in lambdas with captures, but
that's magically working now too. I was also going to say I don't know
why, but I found where my mistake was (that I fixed without realizing
it was causing the aforementioned problem) so I do know what did it.

So rambling aside, I think everything should work now. To reiterate, I
still have to finish the tests for a few things. There are some
diagnostics I'm really not happy with, and lambdas' names still say
static, but I think I already know how to fix that.

I will make an attempt at moving the diagnostics for an unrelated
explicit object parameter in a lambda with captures tomorrow. I just
want to get the almost fully featured patch reviewed ASAP, even if it's
still got some cruft.

As soon as these tests pass I will submit the patch, I'm not going to
split it up today, I'm simply too tired, but I assure you the final
version will properly be split up with a correct commit message and
everything.

Alex

[COMMITTED] ada: Compiler error reporting illegal prefix on legal loop iterator with "in"

2023-11-21 Thread Marc Poulhiès

From: Gary Dismukes 

During semantic analysis, the compiler fails to determine the cursor type
in the case of a generalized iterator loop with "in", in the case where the
iterator type has a parent type that is a controlled type (for example) and
its ancestor iterator interface type is given after as a progenitor. It also
improperly determines the ancestor interface type during expansion (within
Expand_Iterator_Loop_Over_Container), for both "in" and "of" iterator forms.
The FE was assuming that the iterator interface is simply the parent type
of the iterator type, but that type can occur later in the interface list,
or be inherited. A new function is added that properly locates a type's
iterator interface ancestor, if any, and is called for analysis and expansion.

gcc/ada/

* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Retrieve the
iteration type's iteration interface progenitor via
Iterator_Interface_Ancestor, in the case of both "in" and "of"
iterators. Narrow the scope of Pack, so it's declared and
initialized only within the code related to "of" iterators, and
change its name to Cont_Type_Pack. Adjust comments.
* sem_ch5.adb (Get_Cursor_Type): In the case of a derived type,
retrieve the iteration type's iterator interface progenitor (if it
exists) via Iterator_Interface_Ancestor rather than assuming that
the parent type is the interface progenitor.
* sem_util.ads (Iterator_Interface_Ancestor): New function.
* sem_util.adb (Iterator_Interface_Ancestor): New function
returning a type's associated iterator interface type, if any, by
collecting and traversing the type's interfaces.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch5.adb  | 42 +-
 gcc/ada/sem_ch5.adb  | 23 ---
 gcc/ada/sem_util.adb | 34 ++
 gcc/ada/sem_util.ads |  7 +++
 4 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
index cd3b02b9360..d946f6dda5e 100644
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -5158,9 +5158,6 @@ package body Exp_Ch5 is
   --  The package in which the iterator interface is instantiated. This is
   --  typically an instance within the container package.
 
-  Pack : Entity_Id;
-  --  The package in which the container type is declared
-
begin
   if Present (Iterator_Filter (I_Spec)) then
  pragma Assert (Ada_Version >= Ada_2022);
@@ -5195,15 +5192,6 @@ package body Exp_Ch5 is
   --package Vector_Iterator_Interfaces is new
   --  Ada.Iterator_Interfaces (Cursor, Has_Element);
 
-  --  If the container type is a derived type, the cursor type is found in
-  --  the package of the ultimate ancestor type.
-
-  if Is_Derived_Type (Container_Typ) then
- Pack := Scope (Root_Type (Container_Typ));
-  else
- Pack := Scope (Container_Typ);
-  end if;
-
   if Of_Present (I_Spec) then
  Handle_Of : declare
 Container_Arg : Node_Id;
@@ -5289,6 +5277,9 @@ package body Exp_Ch5 is
 Default_Iter : Entity_Id;
 Ent  : Entity_Id;
 
+Cont_Type_Pack : Entity_Id;
+--  The package in which the container type is declared
+
 Reference_Control_Type : Entity_Id := Empty;
 Pseudo_Reference   : Entity_Id := Empty;
 
@@ -5312,11 +5303,14 @@ package body Exp_Ch5 is
 
 Iter_Type := Etype (Default_Iter);
 
---  The iterator type, which is a class-wide type, may itself be
---  derived locally, so the desired instantiation is the scope of
---  the root type of the iterator type.
+--  If the container type is a derived type, the cursor type is
+--  found in the package of the ultimate ancestor type.
 
-Iter_Pack := Scope (Root_Type (Etype (Iter_Type)));
+if Is_Derived_Type (Container_Typ) then
+   Cont_Type_Pack := Scope (Root_Type (Container_Typ));
+else
+   Cont_Type_Pack := Scope (Container_Typ);
+end if;
 
 --  Find declarations needed for "for ... of" optimization.
 --  These declarations come from GNAT sources or sources
@@ -5326,7 +5320,7 @@ package body Exp_Ch5 is
 --  Note that we use _Next or _Previous to avoid picking up
 --  some arbitrary user-defined Next or Previous.
 
-Ent := First_Entity (Pack);
+Ent := First_Entity (Cont_Type_Pack);
 while Present (Ent) loop
--  Get_Element_Access function with one parameter called
--  Position.
@@ -5400,6 +5394,11 @@ package body Exp_Ch5 is
 
 Analyze_And_Resolve (Name (I_Spec));
 
+--  The desired instantiation is the scope of

[COMMITTED] ada: Small consistency fix for -gnatwv warning

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The goal is to arrange for the warning to be issued consistently between
objects whose address is taken and objects whose address is not taken.

gcc/ada/

* sem_warn.adb (Check_References.Type_OK_For_No_Value_Assigned):
New predicate.
(Check_References): For Warn_On_No_Value_Assigned, use the same test
on the type in the address-not-taken and default cases.

gcc/testsuite/ChangeLog:

* gnat.dg/warn25.adb: Add xfail.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_warn.adb | 46 ++--
 gcc/testsuite/gnat.dg/warn25.adb |  1 +
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/sem_warn.adb b/gcc/ada/sem_warn.adb
index 7ecb4d9c4a6..125f5c701e0 100644
--- a/gcc/ada/sem_warn.adb
+++ b/gcc/ada/sem_warn.adb
@@ -857,6 +857,10 @@ package body Sem_Warn is
   --  from another unit. This is true for entities in packages that are at
   --  the library level.
 
+  function Type_OK_For_No_Value_Assigned (T : Entity_Id) return Boolean;
+  --  Return True if it is OK for an object of type T to be referenced
+  --  without having been assigned a value in the source.
+
   function Warnings_Off_E1 return Boolean;
   --  Return True if Warnings_Off is set for E1, or for its Etype (E1T),
   --  or for the base type of E1T.
@@ -1121,6 +1125,37 @@ package body Sem_Warn is
  end loop;
   end Publicly_Referenceable;
 
+  ---
+  -- Type_OK_For_No_Value_Assigned --
+  ---
+
+  function Type_OK_For_No_Value_Assigned (T : Entity_Id) return Boolean is
+  begin
+ --  No information for generic types, so be conservative
+
+ if Is_Generic_Type (T) then
+return False;
+ end if;
+
+ --  Even if objects of access types are implicitly initialized to null
+
+ if Is_Access_Type (T) then
+return False;
+ end if;
+
+ --  The criterion is whether the type is (partially) initialized in
+ --  the source, in other words we disregard implicit default values.
+ --  But we do not require full initialization for by-reference types
+ --  because they are complex and it may not be possible to have it.
+
+ if Is_By_Reference_Type (T) then
+return
+  Is_Partially_Initialized_Type (T, Include_Implicit => False);
+ else
+return Is_Fully_Initialized_Type (T);
+ end if;
+  end Type_OK_For_No_Value_Assigned;
+
   -
   -- Warnings_Off_E1 --
   -
@@ -1414,10 +1449,7 @@ package body Sem_Warn is
   and then not Warnings_Off_E1
   and then not Has_Junk_Name (E1)
 then
-   if Is_Access_Type (E1T)
- or else
-   not Is_Partially_Initialized_Type (E1T, False)
-   then
+   if not Type_OK_For_No_Value_Assigned (E1T) then
   Output_Reference_Error
 ("?v?variable& is read but never assigned!");
end if;
@@ -1456,14 +1488,12 @@ package body Sem_Warn is
   goto Continue;
end if;
 
-   --  Check for unset reference. If type of object has
-   --  preelaborable initialization, warning is misleading.
+   --  Check for unset reference
 
if Warn_On_No_Value_Assigned
  and then Present (UR)
- and then not Known_To_Have_Preelab_Init (Etype (E1))
+ and then not Type_OK_For_No_Value_Assigned (E1T)
then
-
   --  Don't issue warning if appearing inside Initial_Condition
   --  pragma or aspect, since that expression is not evaluated
   --  at the point where it occurs in the source.
diff --git a/gcc/testsuite/gnat.dg/warn25.adb b/gcc/testsuite/gnat.dg/warn25.adb
index e7848701818..cdf28aecbf5 100644
--- a/gcc/testsuite/gnat.dg/warn25.adb
+++ b/gcc/testsuite/gnat.dg/warn25.adb
@@ -1,5 +1,6 @@
 --  { dg-do compile }
 --  { dg-options "-gnatwa" }
+--  { dg-xfail-if "expected regression" { *-*-* } }
 
 with Ada.Exceptions;
 procedure Warn25 is
-- 
2.42.0

[COMMITTED] ada: Deep delta aggregates in postconditions

2023-11-21 Thread Marc Poulhiès

From: Steve Baird 

Fix a bug in handling array-valued deep delta aggregates occurring in
postconditions. The bug could result in a spurious compilation failure.

gcc/ada/

* sem_aggr.adb (Resolve_Delta_Array_Aggregate): In the case of a
deep delta choice, the expected type for the expression will
typically not be the component type of the array type, so a call
to Analyze_And_Resolve that assumes otherwise would be an error.
It turns out that such a call, while wrong, is usually harmless
because the expression has already been marked as analyzed. This
doesn't work if the aggregate occurs in a postcondition and, in
any case, we don't want to rely on this. So do not perform the
call in the deep case.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 46a96a31a00..d3f9a773191 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -3696,6 +3696,8 @@ package body Sem_Aggr is
   Choice : Node_Id;
   Expr   : Node_Id;
 
+  Deep_Choice_Seen : Boolean := False;
+
begin
   Assoc := First (Deltas);
   while Present (Assoc) loop
@@ -3750,6 +3752,7 @@ package body Sem_Aggr is
 while Present (Choice) loop
if Is_Deep_Choice (Choice, Typ) then
   pragma Assert (All_Extensions_Allowed);
+  Deep_Choice_Seen := True;
 
   --  a deep delta aggregate
   Resolve_Deep_Delta_Assoc (Assoc, Typ);
@@ -3794,7 +3797,7 @@ package body Sem_Aggr is
 if Box_Present (Assoc) then
Error_Msg_N
  ("'<'> in array delta aggregate is not allowed", Assoc);
-else
+elsif not Deep_Choice_Seen then
Analyze_And_Resolve (Expression (Assoc), Component_Type (Typ));
 end if;
  end if;
-- 
2.42.0

[COMMITTED] ada: Fix miscompilation of loop over boolean range

2023-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The optimized form generated in this case turns out to be problematic.

gcc/ada/

* gcc-interface/trans.cc (Loop_Statement_to_gnu): Always use the
simpler form for a loop with a boolean iteration variable.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index c7d91628f80..9c418beda96 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -3021,7 +3021,9 @@ Loop_Statement_to_gnu (Node_Id gnat_node)
}
 
   /* We use two different strategies to translate the loop, depending on
-whether optimization is enabled.
+whether optimization is enabled, except for the very peculiar case
+of a loop running over a boolean type where we use the simpler form
+in order to avoid manipulating negative values in a boolean context.
 
 If it is, we generate the canonical loop form expected by the loop
 optimizer and the loop vectorizer, which is the do-while form:
@@ -3067,7 +3069,9 @@ Loop_Statement_to_gnu (Node_Id gnat_node)
 
 which works in all cases.  */
 
-  if (optimize && !optimize_debug)
+  if (optimize
+ && !optimize_debug
+ && TREE_CODE (gnu_base_type) != BOOLEAN_TYPE)
{
  /* We can use the do-while form directly if GNU_FIRST-1 doesn't
 overflow.  */
-- 
2.42.0

Re: [PATCH 2/2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 10:46:13AM +0100, Sebastian Huber wrote:
> > > --- a/gcc/tree-profile.cc
> > > +++ b/gcc/tree-profile.cc
> > > @@ -284,7 +284,9 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi, 
> > > gcall *call, tree func,
> > > tree tmp = make_temp_ssa_name (result_type, NULL, name);
> > > gimple_set_lhs (call, tmp);
> > > gsi_insert_after (gsi, call, GSI_NEW_STMT);
> > > -  gassign *assign = gimple_build_assign (result, tmp);
> > > +  gassign *assign = gimple_build_assign (result,
> > > +  build_int_cst (TREE_TYPE (result),
> > > + tmp));
> > This can't be correct.
> > tmp is a SSA_NAME, so calling build_int_cst on it is not appropriate, the
> > second argument should be some unsigned HOST_WIDE_INT value.
> > If result_type is different type from TREE_TYPE (result), but both are
> > integer types, then you want
> > >gassign *assign = gimple_build_assign (result, NOP_EXPR, tmp);
> > or so.
> 
> I really don't know what I am doing here, so a lot of guesswork is involved
> from my side. The change fixed at least the failing test case. When I use
> the NOP_EXPR
> 
> static inline void
> gen_assign_counter_update (gimple_stmt_iterator *gsi, gcall *call, tree
> func,
>  tree result, const char *name)
> {
>   if (result)
> {
>   tree result_type = TREE_TYPE (TREE_TYPE (func));
>   tree tmp = make_temp_ssa_name (result_type, NULL, name);
>   gimple_set_lhs (call, tmp);
>   gsi_insert_after (gsi, call, GSI_NEW_STMT);
>   gassign *assign = gimple_build_assign (result, NOP_EXPR, tmp);
>   gsi_insert_after (gsi, assign, GSI_NEW_STMT);

Ah, sure, if result is not is_gimple_reg, then one can't use a cast into
that directly; it needs two statements, one for the cast and one for the
store.
If you know the types are never compatible and result is never is_gimple_reg,
then
gimple_set_lhs (call, tmp);
gsi_insert_after (gsi, call, GSI_NEW_STMT);
gassign *assign
  = gimple_build_assign (make_ssa_name (TREE_TYPE (result)),
 NOP_EXPR, tmp);
gsi_insert_after (gsi, assign, GSI_NEW_STMT);
assign = gimple_build_assign (result, gimple_assign_lhs (assign));
gsi_insert_after (gsi, assign, GSI_NEW_STMT);
would do it.  If the types are sometimes compatible and sometimes not,
but result is never is_gimple_reg, perhaps
gimple_set_lhs (call, tmp);
gsi_insert_after (gsi, call, GSI_NEW_STMT);
if (!useless_type_conversion_p (TREE_TYPE (result), result_type))
  {
gassign *assign
  = gimple_build_assign (make_ssa_name (TREE_TYPE (result)),
 NOP_EXPR, tmp);
gsi_insert_after (gsi, assign, GSI_NEW_STMT);
tmp = gimple_assign_lhs (assign);
  }
gassign *assign = gimple_build_assign (result, tmp);
gsi_insert_after (gsi, assign, GSI_NEW_STMT);
etc.

Jakub

Re: [PATCH 2/2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 11:07:47AM +0100, Jakub Jelinek wrote:
> > static inline void
> > gen_assign_counter_update (gimple_stmt_iterator *gsi, gcall *call, tree
> > func,
> >tree result, const char *name)
> > {
> >   if (result)
> > {
> >   tree result_type = TREE_TYPE (TREE_TYPE (func));
> >   tree tmp = make_temp_ssa_name (result_type, NULL, name);
> >   gimple_set_lhs (call, tmp);
> >   gsi_insert_after (gsi, call, GSI_NEW_STMT);
> >   gassign *assign = gimple_build_assign (result, NOP_EXPR, tmp);
> >   gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> 
> Ah, sure, if result is not is_gimple_reg, then one can't use a cast into
> that directly, needs to use another statement, one for the cast, one for the
> store.
> If you know the types are never compatible and result is never is_gimple_reg,
> then
> gimple_set_lhs (call, tmp);
> gsi_insert_after (gsi, call, GSI_NEW_STMT);
> gassign *assign
> = gimple_build_assign (make_ssa_name (TREE_TYPE (result)),
>NOP_EXPR, tmp);
> gsi_insert_after (gsi, assign, GSI_NEW_STMT);
>   assign = gimple_build_assign (result, gimple_assign_lhs (assign));
> gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> would do it

And to answer my own question, it seems that if result is non-NULL, then it
always has an incompatible type and is always an ARRAY_REF, i.e. not
is_gimple_reg, because it will have get_gcov_type (), which is a signed type,
while the call in this case is __atomic_add_fetch_{4,8}, which returns an
unsigned 32-bit or 64-bit type.  So I'd go with the above.
Another formatting nit:
  tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
  ? BUILT_IN_ATOMIC_ADD_FETCH_8:
  BUILT_IN_ATOMIC_ADD_FETCH_4);
should have been
  tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
  ? BUILT_IN_ATOMIC_ADD_FETCH_8
  : BUILT_IN_ATOMIC_ADD_FETCH_4);
The : shouldn't be at the end of the line, and there should be a space after it.
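
The signed/unsigned mismatch discussed above can be illustrated outside the
compiler.  What follows is only a sketch in plain C++, not the GIMPLE that
tree-profile.cc emits: gcov_type_sketch and bump_counter are hypothetical
names, std::int64_t merely stands in for the signed type returned by
get_gcov_type (), and the GCC/Clang __atomic_add_fetch builtin is assumed to
be available.  The builtin's result is kept in an unsigned temporary,
converted to the signed counter type in a separate step (the role NOP_EXPR
plays in the suggestion above), and only then stored.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical stand-in for the signed counter type (get_gcov_type ()).  */
using gcov_type_sketch = std::int64_t;

/* Atomically increment *COUNTER and store the new value, converted to the
   signed counter type, into *RESULT.  Returns the stored value.  */
gcov_type_sketch
bump_counter (std::uint64_t *counter, gcov_type_sketch *result)
{
  /* tmp1: the call's result, in the unsigned type of *COUNTER.  */
  std::uint64_t tmp1 = __atomic_add_fetch (counter, 1, __ATOMIC_RELAXED);

  /* tmp2: explicit conversion to the signed counter type.  */
  gcov_type_sketch tmp2 = static_cast<gcov_type_sketch> (tmp1);

  /* Plain store of the already converted value.  */
  *result = tmp2;
  return tmp2;
}
```

The point of the sketch is only the shape: one statement for the call, one
for the conversion, one for the store.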

Jakub

[PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Sebastian Huber

This change fixes issues like this:

  gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
  gcc.dg/gomp/pr27573.c:19:1: error: non-trivial conversion in ‘ssa_name’
 19 | }
| ^
  long int
  long unsigned int
  # .MEM_19 = VDEF <.MEM_18>
  __gcov7.main._omp_fn.0[0] = PROF_time_profile_12;
  during IPA pass: profile
  gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed

gcc/ChangeLog:

PR middle-end/112634

* tree-profile.cc (gen_assign_counter_update): Cast the unsigned result
type of __atomic_add_fetch() to the signed counter type.
(gen_counter_update): Fix formatting.
---

v2: Use NOP_EXPR to do the cast.  Fix formatting.

 gcc/tree-profile.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index f12b374ca27..ff95f8ef7cd 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -281,10 +281,13 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi, 
gcall *call, tree func,
   if (result)
 {
   tree result_type = TREE_TYPE (TREE_TYPE (func));
-  tree tmp = make_temp_ssa_name (result_type, NULL, name);
-  gimple_set_lhs (call, tmp);
+  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
+  gimple_set_lhs (call, tmp1);
   gsi_insert_after (gsi, call, GSI_NEW_STMT);
-  gassign *assign = gimple_build_assign (result, tmp);
+  tree tmp2 = make_ssa_name (TREE_TYPE (result));
+  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
+  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
+  assign = gimple_build_assign (result, gimple_assign_lhs (assign));
   gsi_insert_after (gsi, assign, GSI_NEW_STMT);
 }
   else
@@ -309,8 +312,8 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree 
counter, tree result,
 {
   /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
   tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
- ? BUILT_IN_ATOMIC_ADD_FETCH_8:
- BUILT_IN_ATOMIC_ADD_FETCH_4);
+ ? BUILT_IN_ATOMIC_ADD_FETCH_8
+ : BUILT_IN_ATOMIC_ADD_FETCH_4);
   gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
   gen_assign_counter_update (gsi, call, f, result, name);
 }
-- 
2.35.3

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-21 Thread Maxim Kuvyrkov

> On Nov 20, 2023, at 20:30, Alexander Monakov  wrote:
> 
> 
> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> 
>>> On Nov 20, 2023, at 17:52, Alexander Monakov  wrote:
>>> 
>>> 
>>> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
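
The N and N*M counts in the description above can be reproduced with a toy
model.  This is only a sketch under stated assumptions, not the code in
sched-deps.cc: instructions are plain integers, a dependency is a
(producer, consumer) pair, and DepCounts and count_new_deps are hypothetical
names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

/* Toy result type: how many dependencies each step of the example adds.  */
struct DepCounts
{
  std::size_t on_sp_insn_b;   /* New backward deps of sp_insnB.  */
  std::size_t on_mem_insn_b;  /* New deps created for all mem_insnB*.  */
};

/* Model the example with N mem_insnA* and M mem_insnB* instructions.  */
DepCounts
count_new_deps (std::size_t n, std::size_t m)
{
  const int sp_insn_b = -1;  /* Hypothetical id for sp_insnB.  */
  std::vector<std::pair<int, int>> deps;

  /* Step 1: every mem_insnA_i becomes a producer for sp_insnB.  */
  for (std::size_t i = 0; i < n; ++i)
    deps.emplace_back (static_cast<int> (i), sp_insn_b);
  const std::size_t on_sp = deps.size ();

  /* Step 2: every mem_insnB_j inherits all incoming deps of sp_insnB.  */
  std::size_t on_mem = 0;
  for (std::size_t j = 0; j < m; ++j)
    for (std::size_t i = 0; i < on_sp; ++i)
      {
	/* Copy first: emplace_back may reallocate the vector.  */
	const int producer = deps[i].first;
	deps.emplace_back (producer, static_cast<int> (n + j));
	++on_mem;
      }

  return { on_sp, on_mem };
}
```

With N = 3 and M = 4 the model adds 3 dependencies on sp_insnB and 12 on the
mem_insnB* instructions, matching the N and N*M figures in the description.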
>> 
>> [For avoidance of doubt, below discussion is about the general implementation
>> of find_modifiable_mems() and not about the patch.]
> 
> I was saying the commit message is hard to read (unless it's just me).
> 
>>> It's a bit hard to read this without knowing which value of 'backwards'
>>> is assumed.

Oh, sorry, I misunderstood your comment.

In the above example I want to describe the outcome that the current code
generates, without going into details about exactly how it does it.  I'm not
sure how to make it more readable, and would appreciate suggestions.


>>> 
>>> Say 'backwards' is true and we are inspecting producer sp_insnB of
>>> mem_insnB1.  This is a true dependency. We know we can break it by
>>> adjusting B1 by -4, but we need to be careful not to move such modified
>>> mem_insnB1 above sp_insnA, so we need to iterate over *incoming true
>>> dependencies* of sp_insnB and add them.
>>> 
>>> But the code seems to be iterating over *all incoming dependencies*, so it
>>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
>>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
>>> understanding is correct.
>> 
>> Yeap, your understanding is correct.  However, this is what
>> find_modifiable_mems() has to do to avoid complicated analysis of
>> second-level dependencies.
> 
> What is the reason it cannot simply skip anti-dependencies in the
> 'if (backwards)' loop, and true dependencies in the 'else' loop?

I /think/ this should be possible.  However, rather than improving the current
implementation, my preference is to rework it by integrating it with the main
dependency analysis.  This should provide both faster and more precise
dependency analysis, which would generate breakable addr/mem dependencies.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 11:29:58AM +0100, Sebastian Huber wrote:
> This change fixes issues like this:
> 
>   gcc.dg/gomp/pr27573.c: In function ‘main._omp_fn.0’:
>   gcc.dg/gomp/pr27573.c:19:1: error: non-trivial conversion in ‘ssa_name’
>  19 | }
> | ^
>   long int
>   long unsigned int
>   # .MEM_19 = VDEF <.MEM_18>
>   __gcov7.main._omp_fn.0[0] = PROF_time_profile_12;
>   during IPA pass: profile
>   gcc.dg/gomp/pr27573.c:19:1: internal compiler error: verify_gimple failed
> 
> gcc/ChangeLog:
> 
>   PR middle-end/112634
> 
>   * tree-profile.cc (gen_assign_counter_update): Cast the unsigned result 
> type of
>   __atomic_add_fetch() to the signed counter type.
>   (gen_counter_update): Fix formatting.

> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -281,10 +281,13 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi, 
> gcall *call, tree func,
>if (result)
>  {
>tree result_type = TREE_TYPE (TREE_TYPE (func));
> -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> -  gimple_set_lhs (call, tmp);
> +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> +  gimple_set_lhs (call, tmp1);
>gsi_insert_after (gsi, call, GSI_NEW_STMT);
> -  gassign *assign = gimple_build_assign (result, tmp);
> +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> +  assign = gimple_build_assign (result, gimple_assign_lhs (assign));

When you use a temporary tmp2 for the lhs of the conversion, you can just
use it here,
  assign = gimple_build_assign (result, tmp2);

Ok for trunk with that change.

>gsi_insert_after (gsi, assign, GSI_NEW_STMT);
>  }
>else
> @@ -309,8 +312,8 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree 
> counter, tree result,
>  {
>/* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
>tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
> -   ? BUILT_IN_ATOMIC_ADD_FETCH_8:
> -   BUILT_IN_ATOMIC_ADD_FETCH_4);
> +   ? BUILT_IN_ATOMIC_ADD_FETCH_8
> +   : BUILT_IN_ATOMIC_ADD_FETCH_4);
>gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
>gen_assign_counter_update (gsi, call, f, result, name);
>  }

Jakub

Re: [PATCH]AArch64 Add pattern for unsigned widenings (uxtl) to zip{1, 2}

2023-11-21 Thread Richard Sandiford

Tamar Christina  writes:
> Hi All,
>
> This changes unpack instructions to use zip{1,2} when doing a zero-extending
> widening operation.  Permutes generally have a higher throughput than the
> widening operations. Zeros are shuffled into the top half of the registers.
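
Why interleaving with zeros amounts to a zero-extension can be checked with a
scalar model.  The sketch below is an illustration only, not code from the
patch: V8H, V4S, zip1_model, zip2_model and as_v4s are hypothetical names, and
as_v4s assumes little-endian lane order (the patch swaps the zip operands for
BYTES_BIG_ENDIAN).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8H = std::array<std::uint16_t, 8>;  /* 8 x 16-bit lanes.  */
using V4S = std::array<std::uint32_t, 4>;  /* 4 x 32-bit lanes.  */

/* Scalar model of ZIP1: interleave the low halves of A and B.  */
V8H
zip1_model (const V8H &a, const V8H &b)
{
  V8H r{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      r[2 * i] = a[i];
      r[2 * i + 1] = b[i];
    }
  return r;
}

/* Scalar model of ZIP2: interleave the high halves of A and B.  */
V8H
zip2_model (const V8H &a, const V8H &b)
{
  V8H r{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      r[2 * i] = a[4 + i];
      r[2 * i + 1] = b[4 + i];
    }
  return r;
}

/* View 8 x 16-bit lanes as 4 x 32-bit lanes, little-endian lane order.  */
V4S
as_v4s (const V8H &v)
{
  V4S r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::uint32_t (v[2 * i]) | (std::uint32_t (v[2 * i + 1]) << 16);
  return r;
}
```

Zipping an input vector with an all-zero vector and viewing the result as
32-bit lanes gives the low (zip1) or high (zip2) half of the input with each
element zero-extended, which is what uxtl and uxtl2 compute.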
>
> The testcase
>
> void d2 (unsigned * restrict a, unsigned short *b, int n)
> {
> for (int i = 0; i < (n & -8); i++)
>   a[i] = b[i];
> }
>
> now generates:
>
>         movi    v1.4s, 0
> .L3:
>         ldr     q0, [x1], 16
>         zip1    v2.8h, v0.8h, v1.8h
>         zip2    v0.8h, v0.8h, v1.8h
>         stp     q2, q0, [x0]
>         add     x0, x0, 32
>         cmp     x1, x2
>         bne     .L3
>
>
> instead of:
>
> .L3:
>         ldr     q0, [x1], 16
>         uxtl    v1.4s, v0.4h
>         uxtl2   v0.4s, v0.8h
>         stp     q1, q0, [x0]
>         add     x0, x0, 32
>         cmp     x1, x2
>         bne     .L3
>
> Since we need the extra 0 register we do this only for the vectorizer's lo/hi
> pairs when we know the 0 will be floated outside of the loop.

The patterns are used by BB SLP as well, so we don't know for certain
that there's a containing loop.

We could provide patterns that match zips with zero, to allow the
zero to be combined back into the pair if the zero doesn't get hoisted.
The zip-with-zero patterns would need to have a higher cost than plain zips,
so that 2 zips + movi has the same cost as 2 zips with zero.  I guess
that means that the zips with zero should have a cost of 1.5 insns.
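
The 1.5 figure follows from simple arithmetic, sketched here.  The names are
hypothetical, a plain zip and a movi are each assumed to cost one instruction,
and costs are scaled by 4 units per instruction (an assumption mirroring the
scaling of GCC's COSTS_N_INSNS) so that half an instruction is representable
as an integer.

```cpp
#include <cassert>

/* Assumed scaling: 4 cost units per instruction.  */
constexpr int kUnitsPerInsn = 4;

constexpr int
insns (int n)
{
  return n * kUnitsPerInsn;
}

/* Assumed costs of a plain zip and of the movi that creates the zero.  */
constexpr int kZipCost = insns (1);
constexpr int kMoviCost = insns (1);

/* Per-instruction cost that makes two zips with zero cost exactly as much
   as two plain zips plus one movi.  */
constexpr int
zip_with_zero_cost ()
{
  return (2 * kZipCost + kMoviCost) / 2;
}

static_assert (2 * zip_with_zero_cost () == 2 * kZipCost + kMoviCost,
	       "the two sequences should have equal cost");
```

With these assumptions zip_with_zero_cost () is 6 units, i.e. 1.5
instructions.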

The late combine pass would then be able to get rid of the zero
if it is stuck in the same basic block (or a block with the same
execution frequency), but would keep a hoisted zero.

But if we do one thing all the time or another thing all the time,
I agree it's better to do the zips all the time.

If this turns out to be bad for non-Neoverse cores, we can add some sort
of tuning flag.  But I agree that we should use zips unconditionally
until we know of a specific core that doesn't want it.
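
To make the equivalence behind the zip approach concrete, here is a minimal scalar sketch (not GCC or intrinsics code) of why interleaving with a zero vector yields the same 32-bit lanes as a zero-extending unpack. It models the little-endian lane layout only; the patch swaps the zip operands for BYTES_BIG_ENDIAN, which this sketch does not cover. The names zip_u16, uxtl_u16, as_u32_lanes and check_zip_matches_uxtl are hypothetical helpers introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using u16x8 = std::array<std::uint16_t, 8>;
using u32x4 = std::array<std::uint32_t, 4>;

// Scalar model of ZIP1 (high == false) and ZIP2 (high == true) on
// 8 x 16-bit lanes: interleave the low or high halves of A and B.
inline u16x8 zip_u16 (const u16x8 &a, const u16x8 &b, bool high)
{
  u16x8 r{};
  const std::size_t base = high ? 4 : 0;
  for (std::size_t i = 0; i < 4; i++)
    {
      r[2 * i] = a[base + i];
      r[2 * i + 1] = b[base + i];
    }
  return r;
}

// Scalar model of UXTL (high == false) and UXTL2 (high == true):
// zero-extend the low or high four 16-bit lanes to 32 bits.
inline u32x4 uxtl_u16 (const u16x8 &a, bool high)
{
  u32x4 r{};
  const std::size_t base = high ? 4 : 0;
  for (std::size_t i = 0; i < 4; i++)
    r[i] = a[base + i];
  return r;
}

// Reinterpret 8 x 16-bit lanes as 4 x 32-bit lanes, assuming the
// little-endian layout where the even 16-bit lane is the low half.
inline u32x4 as_u32_lanes (const u16x8 &v)
{
  u32x4 r{};
  for (std::size_t i = 0; i < 4; i++)
    r[i] = std::uint32_t (v[2 * i]) | (std::uint32_t (v[2 * i + 1]) << 16);
  return r;
}

// Zipping V with zero gives the same lanes as zero-extending V.
inline void check_zip_matches_uxtl (const u16x8 &v)
{
  const u16x8 zero{};
  assert (as_u32_lanes (zip_u16 (v, zero, false)) == uxtl_u16 (v, false));
  assert (as_u32_lanes (zip_u16 (v, zero, true)) == uxtl_u16 (v, true));
}
```

The zero operand is the same for both halves, which is why a single movi can be shared by the zip1/zip2 pair when it is hoisted out of the loop.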

> This gives an 8% speed-up in Imagick in SPECCPU 2017 on Neoverse V2.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (vec_unpack_lo_   vec_unpack_lo_ (vec_unpacku_lo_   vec_unpacku_lo_   (aarch64_usubw__zip): New.
>   (aarch64_uaddw__zip): New.
>   * config/aarch64/iterators.md (PERM_EXTEND, perm_index): New.
>   (perm_hilo): Add UNSPEC_ZIP1, UNSPEC_ZIP2.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/simd/vmovl_high_1.c: Update codegen.
>   * gcc.target/aarch64/uxtl-combine-1.c: New test.
>   * gcc.target/aarch64/uxtl-combine-2.c: New test.
>   * gcc.target/aarch64/uxtl-combine-3.c: New test.
>   * gcc.target/aarch64/uxtl-combine-4.c: New test.
>   * gcc.target/aarch64/uxtl-combine-5.c: New test.
>   * gcc.target/aarch64/uxtl-combine-6.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 81ff5bad03d598fa0d48df93d172a28bc0d1d92e..3d811007dd94dcd9176d6021a41a196c12fe9c3f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1988,26 +1988,60 @@ (define_insn "aarch64_simd_vec_unpack_hi_"
>[(set_attr "type" "neon_shift_imm_long")]
>  )
>  
> -(define_expand "vec_unpack_hi_"
> +(define_expand "vec_unpacku_hi_"
>[(match_operand: 0 "register_operand")
> -   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
> +   (match_operand:VQW 1 "register_operand")]
> +  "TARGET_SIMD"
> +  {
> +rtx res = gen_reg_rtx (mode);
> +rtx tmp = aarch64_gen_shareable_zero (mode);
> +if (BYTES_BIG_ENDIAN)
> +  emit_insn (gen_aarch64_zip2 (res, tmp, operands[1]));
> +else
> + emit_insn (gen_aarch64_zip2 (res, operands[1], tmp));
> +emit_move_insn (operands[0],
> +simplify_gen_subreg (mode, res, mode, 0));
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_unpacks_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")]
>"TARGET_SIMD"
>{
>  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> -emit_insn (gen_aarch64_simd_vec_unpack_hi_ (operands[0],
> -   operands[1], p));
> +emit_insn (gen_aarch64_simd_vec_unpacks_hi_ (operands[0],
> +operands[1], p));
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_unpacku_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VQW 1 "register_operand")]
> +  "TARGET_SIMD"
> +  {
> +rtx res = gen_reg_rtx (mode);
> +rtx tmp = aarch64_gen_shareable_zero (mode);
> +if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_aarch64_zip1 (res, tmp, operands[1]));
> +else
> + emit_insn (gen_aarch64_zip1 (res, operands[1], tmp));
> +emit_move_insn (oper

Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Sebastian Huber





On 21.11.23 11:34, Jakub Jelinek wrote:

> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -281,10 +281,13 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi,
> > gcall *call, tree func,
> > if (result)
> >   {
> > tree result_type = TREE_TYPE (TREE_TYPE (func));
> > -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> > -  gimple_set_lhs (call, tmp);
> > +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> > +  gimple_set_lhs (call, tmp1);
> > gsi_insert_after (gsi, call, GSI_NEW_STMT);
> > -  gassign *assign = gimple_build_assign (result, tmp);
> > +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> > +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> > +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> > +  assign = gimple_build_assign (result, gimple_assign_lhs (assign));
>
> When you use a temporary tmp2 for the lhs of the conversion, you can just
> use it here,
>    assign = gimple_build_assign (result, tmp2);
>
> Ok for trunk with that change.


Just a question, could I also use

tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);

?

This make_temp_ssa_name() is used throughout the file and the new 
make_ssa_name() would be the first use in this file.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/

Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:
> 
> 
> On 21.11.23 11:34, Jakub Jelinek wrote:
> > > --- a/gcc/tree-profile.cc
> > > +++ b/gcc/tree-profile.cc
> > > @@ -281,10 +281,13 @@ gen_assign_counter_update (gimple_stmt_iterator 
> > > *gsi, gcall *call, tree func,
> > > if (result)
> > >   {
> > > tree result_type = TREE_TYPE (TREE_TYPE (func));
> > > -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> > > -  gimple_set_lhs (call, tmp);
> > > +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> > > +  gimple_set_lhs (call, tmp1);
> > > gsi_insert_after (gsi, call, GSI_NEW_STMT);
> > > -  gassign *assign = gimple_build_assign (result, tmp);
> > > +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> > > +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> > > +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> > > +  assign = gimple_build_assign (result, gimple_assign_lhs (assign));
> > When you use a temporary tmp2 for the lhs of the conversion, you can just
> > use it here,
> >assign = gimple_build_assign (result, tmp2);
> > 
> > Ok for trunk with that change.
> 
> Just a question, could I also use
> 
> tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);
> 
> ?
> 
> This make_temp_ssa_name() is used throughout the file and the new
> make_ssa_name() would be the first use in this file.

Yes.  The only difference is that it won't be _234 = (type) something;
but PROF_time_profile_234 = (type) something; in the dumps, but sure,
consistency is useful.

Jakub

Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

2023-11-21 Thread Alexander Monakov



On Tue, 21 Nov 2023, Maxim Kuvyrkov wrote:

>  This patch avoids sched-deps.cc:find_inc() creating exponential number
>  of dependencies, which become memory and compilation time hogs.
>  Consider example (simplified from PR96388) ...
>  ===
>  sp=sp-4 // sp_insnA
>  mem_insnA1[sp+A1]
>  ...
>  mem_insnAN[sp+AN]
>  sp=sp-4 // sp_insnB
>  mem_insnB1[sp+B1]
>  ...
>  mem_insnBM[sp+BM]
>  ===
>  ... in this example find_modifiable_mems() will arrange for mem_insnA*
>  to be able to pass sp_insnA, and, while doing this, will create
>  dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>  is a consumer of sp_insnA.  After this sp_insnB will have N new
>  backward dependencies.
>  Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>  N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>  dependencies.
> >> 
> >> [For avoidance of doubt, below discussion is about the general
> >> implementation of find_modifiable_mems() and not about the patch.]
> > 
> > I was saying the commit message is hard to read (unless it's just me).
> > 
> >>> It's a bit hard to read this without knowing which value of 'backwards'
> >>> is assumed.
> 
> Oh, sorry, I misunderstood your comment.
> 
> In the above example I want to describe outcome that current code generates,
> without going into details about exactly how it does it.  I'm not sure how to
> make it more readable, and would appreciate suggestions.

I think it would be easier to follow if you could fix a specific value of
'backwards' up front, and then ensure all following statements are consistent
with that, like I did in my explanation. Please feel free to pick up my text
into the commit message, if it helps.

> >>> Say 'backwards' is true and we are inspecting producer sp_insnB of
> >>> mem_insnB1.  This is a true dependency. We know we can break it by
> >>> adjusting B1 by -4, but we need to be careful not to move such modified
> >>> mem_insnB1 above sp_insnA, so we need to iterate over *incoming true
> >>> dependencies* of sp_insnB and add them.
> >>>
> >>> But the code seems to be iterating over *all incoming dependencies*, so it
> >>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
> >>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if
> >>> my understanding is correct.
> >>
> >> Yeap, your understanding is correct.  However, this is what
> >> find_modifiable_mems() has to do to avoid complicated analysis of
> >> second-level dependencies.
> > 
> > What is the reason it cannot simply skip anti-dependencies in the
> > 'if (backwards)' loop, and true dependencies in the 'else' loop?
> 
> I /think/, this should be possible.  However, rather than improving current
> implementation my preference is to rework it by integrating with the main
> dependency analysis.  This should provide both faster and more precise
> dependency analysis, which would generate breakable addr/mem dependencies.

I see, thank you.

Alexander
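
The N*M growth discussed above can be illustrated with a toy count. This is only a sketch under stated assumptions, not sched-deps.cc code: it assumes sp_insnB has one incoming true dependency (on sp_insnA) plus one incoming anti-dependency per mem_insnA*, and that every mem_insnB* receives a copy of each incoming dependency that is not skipped. The names dep_kind, incoming_deps_of_sp_insnB and count_new_deps are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class dep_kind { true_dep, anti_dep };

// Incoming dependencies of sp_insnB in the example: one true dependency
// on sp_insnA and one anti-dependency for each of the N mem_insnA*.
inline std::vector<dep_kind> incoming_deps_of_sp_insnB (std::size_t n)
{
  std::vector<dep_kind> deps;
  deps.push_back (dep_kind::true_dep);
  deps.insert (deps.end (), n, dep_kind::anti_dep);
  return deps;
}

// Count the dependencies created when each of the M mem_insnB* copies the
// incoming dependencies of sp_insnB.  With SKIP_ANTI, anti-dependencies
// are not copied, which models the filtering asked about in the thread.
inline std::size_t count_new_deps (std::size_t n, std::size_t m,
                                   bool skip_anti)
{
  const std::vector<dep_kind> incoming = incoming_deps_of_sp_insnB (n);
  std::size_t created = 0;
  for (std::size_t b = 0; b < m; b++)
    for (dep_kind kind : incoming)
      {
        if (skip_anti && kind == dep_kind::anti_dep)
          continue;
        created++;
      }
  // Sanity check: we can never create more than one copy per pair.
  assert (created <= m * incoming.size ());
  return created;
}
```

In this model, copying everything creates M * (N + 1) dependencies, while copying only true dependencies creates M, which is the gap the question about skipping anti-dependencies is getting at.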

Fix 'gcc.dg/tree-ssa/return-value-range-1.c' (was: Propagate value ranges of return values)

2023-11-21 Thread Thomas Schwinge

Hi!

On 2023-11-19T16:05:42+0100, Jan Hubicka  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do ling } */

ERROR: gcc.dg/tree-ssa/return-value-range-1.c: 1: syntax error for " dg-do 1 ling "

With that fixed into 'dg-do link', and...

> +/* { dg-options "-O1 -dump-tree-evrp-details" } */

... that one fixed into '-fdump-tree-evrp-details', I then get:

FAIL: gcc.dg/tree-ssa/return-value-range-1.c (test for excess errors)
UNRESOLVED: gcc.dg/tree-ssa/return-value-range-1.c scan-tree-dump-times evrp "Recording return range" 2

/tmp/ccTEuffl.o: In function `test':
return-value-range-1.c:(.text+0x24): undefined reference to `link_error'

This disappears when switching from '-O1' to '-O2'.  OK to push the
attached "Fix 'gcc.dg/tree-ssa/return-value-range-1.c'"?  (..., or did
you intend something else, here?)


Grüße
 Thomas


> +__attribute__ ((__noinline__))
> +int a(char c)
> +{
> + return c;
> +}
> +void link_error ();
> +
> +void
> +test(int d)
> +{
> + if (a(d) > 200)
> + link_error ();
> +}
> +int
> +main(int argc, char **argv)
> +{
> + test(argc);
> + return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "Recording return range" 2 "evrp"} } */


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From f3a47339a9df9726da7e3c1daeadc216e1d5b365 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 21 Nov 2023 11:51:42 +0100
Subject: [PATCH] Fix 'gcc.dg/tree-ssa/return-value-range-1.c'

... added in recent commit 53ba8d669550d3a1f809048428b97ca607f95cf5
"inter-procedural value range propagation".

	gcc/testsuite/
	* gcc.dg/tree-ssa/return-value-range-1.c: Fix.
---
 gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
index 4db52233c5d..74f1a5080bb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
@@ -1,5 +1,5 @@
-/* { dg-do ling } */
-/* { dg-options "-O1 -dump-tree-evrp-details" } */
+/* { dg-do link } */
+/* { dg-options "-O2 -fdump-tree-evrp-details" } */
 __attribute__ ((__noinline__))
 int a(char c)
 {
-- 
2.34.1

Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-21 Thread Sebastian Huber


On 21.11.23 11:46, Jakub Jelinek wrote:

> On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:
> > On 21.11.23 11:34, Jakub Jelinek wrote:
> > > > --- a/gcc/tree-profile.cc
> > > > +++ b/gcc/tree-profile.cc
> > > > @@ -281,10 +281,13 @@ gen_assign_counter_update (gimple_stmt_iterator *gsi,
> > > > gcall *call, tree func,
> > > > if (result)
> > > >   {
> > > > tree result_type = TREE_TYPE (TREE_TYPE (func));
> > > > -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> > > > -  gimple_set_lhs (call, tmp);
> > > > +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> > > > +  gimple_set_lhs (call, tmp1);
> > > > gsi_insert_after (gsi, call, GSI_NEW_STMT);
> > > > -  gassign *assign = gimple_build_assign (result, tmp);
> > > > +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> > > > +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> > > > +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> > > > +  assign = gimple_build_assign (result, gimple_assign_lhs (assign));
> > > When you use a temporary tmp2 for the lhs of the conversion, you can just
> > > use it here,
> > >    assign = gimple_build_assign (result, tmp2);
> > >
> > > Ok for trunk with that change.
> >
> > Just a question, could I also use
> >
> > tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);
> >
> > ?
> >
> > This make_temp_ssa_name() is used throughout the file and the new
> > make_ssa_name() would be the first use in this file.
>
> Yes.  The only difference is that it won't be _234 = (type) something;
> but PROF_time_profile_234 = (type) something; in the dumps, but sure,
> consistency is useful.


Thanks for your help. I checked in an updated version.

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/

Re: [PATCH] Fix PR ada/111909 On Darwin, determine filesystem case sensitivity at runtime

2023-11-21 Thread Iain Sandoe

Hello Simon, Arno,

> On 17 Nov 2023, at 13:43, Simon Wright  wrote:
> 
>> 
>>> Apple’s naming is definitely confusing in this area!
>>> 
>>> In current SDKs, TARGET_OS_MAC means code is being generated for a Mac OS X
>>> variant, which covers OSX, IOS, Watch … ; to determine which kind of device,
>>> you have to check the specific define for that device - OSX corresponds to
>>> macOS, i.e. laptops, desktops.
>>> 
>>> In older SDKs (specifically Xcode 3, for macOS Leopard (darwin 9) as
>>> mentioned by Iain) TARGET_OS_MAC means code is being generated for "Mac OS",
>>> i.e. laptops, desktops as above; TARGET_OS_OSX is undefined (as are
>>> TARGET_OS_IOS etc).
>>> 
>>> If we are compiling for macOS, using a current macOS SDK, then
>>> TARGET_OS_MAC is set to 1 and TARGET_OS_OSX is set to 1.
>>> 
>>> If we were compiling for iOS, using a current iOS SDK as supplied with
>>> current Xcode, then TARGET_OS_MAC would be set to 1, TARGET_OS_OSX would be
>>> set to 0, and TARGET_OS_IOS would be set to 1.
>> 
>> OK so then the following is sufficient for our needs:
>> 
>> #elif defined (__APPLE__)
>>  /* By default, macOS volumes are case-insensitive, iOS
>> volumes are case-sensitive.  */
>> #if TARGET_OS_IOS
>>   file_names_case_sensitive_cache = 1;
>> #else
>>   file_names_case_sensitive_cache = 0;
>> #endif
>> #else /* Neither Windows nor Apple.  */
>>   file_names_case_sensitive_cache = 1;
>> #endif
>> 
>> We want the default to be 0, and we only care about setting it to 1 on iOS
>> for recent SDKs, the case of an old SDK and iOS isn't interesting at this
>> stage, so it's fine if we set the var to 0 in this scenario.
> 
> I can’t speak for Darwin maintainers, so I’ll leave it to Iain to comment on
> this suggestion.

* We are far away from having support for watchOS (32b Arm64) so I think that
is a bridge that can be crossed later.

* It seems to me that the proposed solution is better matched to the defaults
on macOS/iOS.

* It would be better to have an automatic solution for folks (like me) who do
use case-sensitive file systems on macOS, but we do not have the resources
right now to figure out what is not working on the earlier systems.  I looked
briefly, and found that the libcalls are thin wrappers on a syscall, so that
the different behaviours we are seeing on earlier OS versions reflect the
kernel’s handling of the provided path, rather than some improvement in newer
library functions.  That suggests to me that we will need to wrap the call in
some more complex logic to obtain the correct response.

So, I think that (with a test across the range of supported OS versions) the
proposed solution is an incremental improvement and we should take it.

When there’s a final proposed patch, I can add it into my testing across the
systems.

Iain
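
For reference, the compile-time default quoted above can be sketched as a stand-alone function. This is only an illustration under stated assumptions: the Windows branch is not shown in the thread, so the `_WIN32` test and its value of 0 are assumed here, and default_file_names_case_sensitive and file_names_case_sensitive are hypothetical names rather than the actual runtime entry points. TARGET_OS_IOS comes from Apple's TargetConditionals.h; where an old SDK leaves it undefined, `#if` treats it as 0, matching the "fine if we set the var to 0" remark.

```cpp
#include <cassert>
#if defined (__APPLE__)
#include <TargetConditionals.h>
#endif

// Return 1 if file names are case-sensitive by default on the target,
// 0 otherwise.  This is purely a compile-time default and does not probe
// the file system that is actually mounted.
inline int default_file_names_case_sensitive ()
{
#if defined (_WIN32)
  // Assumption: the Windows branch is not quoted in the thread.
  return 0;
#elif defined (__APPLE__)
  // By default, macOS volumes are case-insensitive, iOS volumes are
  // case-sensitive.  An undefined TARGET_OS_IOS evaluates to 0 here.
#if TARGET_OS_IOS
  return 1;
#else
  return 0;
#endif
#else /* Neither Windows nor Apple.  */
  return 1;
#endif
}

// Cache the default on first use, in the spirit of the
// file_names_case_sensitive_cache variable in the quoted snippet.
inline bool file_names_case_sensitive ()
{
  static const int cache = default_file_names_case_sensitive ();
  assert (cache == 0 || cache == 1);
  return cache != 0;
}
```

A user with a case-sensitive macOS volume would still get 0 from this default, which is the limitation the "automatic solution" bullet refers to.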

Re: [PATCH, v2] Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

2023-11-21 Thread Mikael Morin


Hello,

Le 20/11/2023 à 20:02, Steve Kargl a écrit :

Harald,

Sorry about delayed response.  Got side-tracked by Family this weekend.

On Sun, Nov 19, 2023 at 09:46:46PM +0100, Harald Anlauf wrote:


On 11/19/23 01:04, Steve Kargl wrote:

On Sat, Nov 18, 2023 at 11:12:55PM +0100, Harald Anlauf wrote:

Regtested on x86_64-pc-linux-gnu.  OK for mainline?



Not in its current form.


   {
+  int first_int_kind = -1;
+  bool f2023 = ((gfc_option.allow_std & GFC_STD_F2023) != 0
+   && (gfc_option.allow_std & GFC_STD_GNU) == 0);
+


If you use the gfc_notify_std(), then you should not need the
above check on GFC_STD_GNU as it should include GFC_STD_F2023.


this is actually the question (and problem).  For all new features,
-std=gnu shall include everything allowed by -std=f2023.


Yes.

Harald, you mentioned the lack of GFC_STD_F2023_DEL feature group in
your first message, but I don't quite understand why you didn't add one.
It seems to me the most natural way to do this.



Here we have the problem that the testcase is valid F2018 and is
silently accepted by gfortran-13 for -std=gnu and -std=f2018.


F2023 is the Fortran standard and supersedes previous Fortran standards.
If there is a conflict between the standing standard and an old standard,
then the standing standard should take precedence unless one specifically
uses, for example, -std=f2018.

After 20+ years of contributing to gfortran, I've come to believe
that the default -std= should be the current standard, and -std=gnu
should be deprecated.  All GNU extensions should require an option
to activate.  For example,

write(*,*), 'hello'
end

gfortran12 -o z a.f90
a.f90:1:10:

1 | write(*,*), 'hello'
  |   1
Warning: Legacy Extension: Comma before i/o item list at (1)

This should be an error unless the -fallow-write-stmt-comma is used.
The option would simply degrade the error to a warning.  Why, you ask?
To encourage people to write standard conforming code.  Unfortunately,
that horse has left the barn.


I prefer to keep it that way also for gfortran-14, and apply the
new restrictions only for -std=f2023.  Do we agree on this?


I suggest we emit a warning by default, error with -std=f2023 (I agree 
with Steve that we should push towards strict f2023 conformance), and no 
diagnostic with -std=gnu or -std=f2018 or lower.



If gfortran wants to maintain the status quo for 14, then
it should probably remove the -std=f2023 patch and wait for
the branch to 15.


Now what should happen for -std=gnu -pedantic (-w)?


-pedantic is not a very effective option and should be ignored.

I have thought some more and came up with the revised attached
patch, which still has the above condition.  It now marks the
diagnostics as GNU extensions beyond F2023 for -std=f2023.

The mask f2023 in the above form suppresses new warnings even
for -pedantic; one would normally use -w to suppress them.

Now if you remove the second part of the condition, we will
regress on testcases system_clock_1.f90 and system_clock_3.f90
because they would emit GNU extension warnings because the
testsuite runs with -pedantic.


It seems that the solution is to fix the code in the testsuite.


Agreed, these seem to explicitly test mismatching kinds, so add an 
option to prevent error.


Mikael
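
The effect of the two-part condition from the patch can be shown with a small stand-alone sketch. The bit values and the DEMO_* names below are hypothetical and are not gfortran's real GFC_STD_* definitions; the only assumption carried over from the thread is that -std=gnu allows everything -std=f2023 allows, plus GNU extensions.

```cpp
#include <cassert>

// Hypothetical feature bits, standing in for the GFC_STD_* flags.
constexpr unsigned DEMO_STD_F2018 = 1u << 0;
constexpr unsigned DEMO_STD_F2023 = 1u << 1;
constexpr unsigned DEMO_STD_GNU = 1u << 2;

// Hypothetical "allow_std" masks for three -std= settings.
constexpr unsigned DEMO_ALLOW_F2018 = DEMO_STD_F2018;
constexpr unsigned DEMO_ALLOW_F2023 = DEMO_STD_F2018 | DEMO_STD_F2023;
constexpr unsigned DEMO_ALLOW_GNU = DEMO_ALLOW_F2023 | DEMO_STD_GNU;

// Mirror of the condition in the patch: true only when F2023 is allowed
// and GNU extensions are not, i.e. for a strict -std=f2023 style mask.
inline bool strict_f2023 (unsigned allow_std)
{
  // Precondition for this sketch: only the three demo bits may be set.
  assert ((allow_std & ~DEMO_ALLOW_GNU) == 0);
  return (allow_std & DEMO_STD_F2023) != 0
	 && (allow_std & DEMO_STD_GNU) == 0;
}
```

With these masks the condition holds for the f2023 mask only; dropping the second half of the test would make it hold for the gnu mask as well, which is the regression on system_clock_1.f90 and system_clock_3.f90 described above.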

Re: [PATCH 01/11] rtl-ssa: Support for inserting new insns

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> N.B. this is just a rebased (but otherwise unchanged) version of the
> same patch already posted here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633348.html
>
> this is the only unreviewed dependency from the previous series, so it
> seemed easier just to re-post it (not least to appease the pre-commit
> CI).
>
> -- >8 --
>
> The upcoming aarch64 load pair pass needs to form store pairs, and can
> re-order stores over loads when alias analysis determines this is safe.
> In the case that both mem defs have uses in the RTL-SSA IR, and both
> stores require re-ordering over their uses, we represent that as
> (tentative) deletion of the original store insns and creation of a new
> insn, to prevent requiring repeated re-parenting of uses during the
> pass.  We then update all mem uses that require re-parenting in one go
> at the end of the pass.
>
> To support this, RTL-SSA needs to handle inserting new insns (rather
> than just changing existing ones), so this patch adds support for that.
>
> New insns (and new accesses) are temporaries, allocated above a temporary
> obstack_watermark, such that the user can easily back out of a change without
> awkward bookkeeping.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
> * rtl-ssa/accesses.cc (function_info::create_set): New.
> * rtl-ssa/accesses.h (access_info::is_temporary): New.
> * rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
> (function_info::finalize_new_accesses): Handle new/temporary
> user-created accesses.
> (function_info::apply_changes_to_insn): Ensure m_is_temp flag
> on new insns gets cleared.
> (function_info::change_insns): Handle new/temporary insns.
> (function_info::create_insn): New.
> * rtl-ssa/changes.h (class insn_change): Make function_info a
> friend class.
> * rtl-ssa/functions.h (function_info): Declare new entry points:
> create_set, create_insn.  Declare new change_alloc helper.
> * rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
> dump.
> * rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
> is_temporary accessor.
> * rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
> false.
> * rtl-ssa/member-fns.inl (function_info::change_alloc): New.
> * rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
> handling for temporary defs.

Looks good, but there were a couple of things I didn't understand:

> ---
>  gcc/rtl-ssa/accesses.cc| 10 ++
>  gcc/rtl-ssa/accesses.h |  4 +++
>  gcc/rtl-ssa/changes.cc | 74 +++---
>  gcc/rtl-ssa/changes.h  |  2 ++
>  gcc/rtl-ssa/functions.h| 14 
>  gcc/rtl-ssa/insns.cc   |  5 +++
>  gcc/rtl-ssa/insns.h|  7 +++-
>  gcc/rtl-ssa/internals.inl  |  1 +
>  gcc/rtl-ssa/member-fns.inl | 12 +++
>  gcc/rtl-ssa/movement.h |  8 -
>  10 files changed, 123 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
> index 510545a8bad..76d70fd8bd3 100644
> --- a/gcc/rtl-ssa/accesses.cc
> +++ b/gcc/rtl-ssa/accesses.cc
> @@ -1456,6 +1456,16 @@ function_info::make_uses_available (obstack_watermark 
> &watermark,
>return use_array (new_uses, num_uses);
>  }
>  
> +set_info *
> +function_info::create_set (obstack_watermark &watermark,
> +insn_info *insn,
> +resource_info resource)
> +{
> +  auto set = change_alloc (watermark, insn, resource);
> +  set->m_is_temp = true;
> +  return set;
> +}
> +
>  // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
>  // represent ACCESS1.
>  static bool
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
> index fce31d46717..7e7a90ece97 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -204,6 +204,10 @@ public:
>// in the main instruction pattern.
>bool only_occurs_in_notes () const { return m_only_occurs_in_notes; }
>  
> +  // Return true if this is a temporary access, e.g. one created for
> +  // an insn that is about to be inserted.
> +  bool is_temporary () const { return m_is_temp; }
> +
>  protected:
>access_info (resource_info, access_kind);
>  
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index aab532b9f26..da2a61d701a 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -394,14 +394,20 @@ move_insn (insn_change &change, insn_info *after)
>// At the moment we don't support moving instructions between EBBs,
>// but this would be worth adding if it's useful.
>insn_info *insn = change.insn ();
> -  gcc_assert (after->ebb () == insn->ebb ());
> +
>bb_info *bb = after->bb ();
>basic_block cfg_bb = bb->cfg_bb ();
>  
> -  if (insn->bb () != bb)
> -

Re: [PATCH 1/4] libsanitizer: merge from upstream (c425db2eb558c263)

2023-11-21 Thread Iain Sandoe

Hi FX

> On 17 Nov 2023, at 11:57, FX Coudert  wrote:
> 
>> If they accept it say within a day, wait for it + cherry-pick to GCC,
>> otherwise apply to GCC as a local patch in anticipation they accept it.
>> If it is all that fixes Darwin support, great.
> 
> With that patch, I can finish bootstrap, and regtesting is undergoing but
> I’m seeing no issue so far.

I see that the fix was applied locally (and my bootstraps on various Darwin
versions worked OK), but I’m not clear if you submitted a PR upstream (or just
the bug report).  If the fix is to remain local-only, it should be added to the
list in LOCAL_PATCHES.

thanks for handling this,
Iain

Re: [PATCH 02/11] rtl-ssa: Add some helpers for removing accesses

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> This adds some helpers to access-utils.h for removing accesses from an
> access_array.  This is needed by the upcoming aarch64 load/store pair
> fusion pass.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * rtl-ssa/access-utils.h (filter_accesses): New.
>   (remove_regno_access): New.
>   (check_remove_regno_access): New.
> ---
>  gcc/rtl-ssa/access-utils.h | 42 ++
>  1 file changed, 42 insertions(+)
>
> diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
> index f078625babf..31259d742d9 100644
> --- a/gcc/rtl-ssa/access-utils.h
> +++ b/gcc/rtl-ssa/access-utils.h
> @@ -78,6 +78,48 @@ drop_memory_access (T accesses)
>return T (arr.begin (), accesses.size () - 1);
>  }
>  
> +// Filter ACCESSES to return an access_array of only those accesses that
> +// satisfy PREDICATE.  Allocate the new array above WATERMARK.
> +template
> +inline T
> +filter_accesses (obstack_watermark &watermark,
> +  T accesses,
> +  FilterPredicate predicate)
> +{
> +  access_array_builder builder (watermark);
> +  builder.reserve (accesses.size ());
> +  auto it = accesses.begin ();
> +  auto end = accesses.end ();
> +  for (; it != end; it++)
> +if (predicate (*it))
> +  builder.quick_push (*it);

It looks like the last five lines could be simplified to:

  for (access_info *access : accesses)
    if (predicate (access))
      builder.quick_push (access);

> +  return T (builder.finish ());
> +}
> +
> +// Given an array of ACCESSES, remove any access with regno REGNO.
> +// Allocate the new access array above WM.
> +template
> +inline T
> +remove_regno_access (obstack_watermark &watermark,
> +  T accesses, unsigned int regno)
> +{
> +  using Access = decltype (accesses[0]);
> +  auto pred = [regno](Access a) { return a->regno () != regno; };
> +  return filter_accesses (watermark, accesses, pred);
> +}
> +
> +// As above, but additionally check that we actually did remove an access.
> +template
> +inline T
> +check_remove_regno_access (obstack_watermark &watermark,
> +T accesses, unsigned regno)
> +{
> +  auto orig_size = accesses.size ();
> +  auto result = remove_regno_access (watermark, accesses, regno);
> +  gcc_assert (result.size () < orig_size);
> +  return result;
> +}
> +

Could you also use the helper to replace:

    access_array_builder builder (watermark);
    builder.reserve (accesses.size ());
    for (access_info *access2 : accesses)
      if (!access2->only_occurs_in_notes ())
	builder.quick_push (access2);
    return builder.finish ();

in remove_note_accesses_base?

OK with those changes, thanks.

Richard
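
As a stand-alone illustration of the helper's intended semantics (keep the elements for which the predicate holds, as the loop in the patch does), here is a minimal sketch that uses std::vector in place of access_array and plain register numbers in place of access_info pointers. Those substitutions, and the names remove_regno and check_remove_regno, are assumptions made for the example; the obstack watermark is left out entirely.

```cpp
#include <cassert>
#include <vector>

// Return the elements of ACCESSES that satisfy PREDICATE, in order.
template<typename T, typename FilterPredicate>
inline std::vector<T>
filter_accesses (const std::vector<T> &accesses, FilterPredicate predicate)
{
  std::vector<T> result;
  result.reserve (accesses.size ());
  for (const T &access : accesses)
    if (predicate (access))
      result.push_back (access);
  return result;
}

// Remove every entry equal to REGNO by keeping the ones that differ.
inline std::vector<unsigned>
remove_regno (const std::vector<unsigned> &regnos, unsigned regno)
{
  auto pred = [regno] (unsigned r) { return r != regno; };
  return filter_accesses (regnos, pred);
}

// As above, but additionally check that something was actually removed.
inline std::vector<unsigned>
check_remove_regno (const std::vector<unsigned> &regnos, unsigned regno)
{
  auto result = remove_regno (regnos, regno);
  assert (result.size () < regnos.size ());
  return result;
}
```

Expressing removal as "filter with the negated condition" is what lets a caller such as remove_note_accesses_base reuse the same helper by passing a predicate that is true for the accesses to keep.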

>  // If sorted array ACCESSES includes a reference to REGNO, return the
>  // access, otherwise return null.
>  template

Re: [PATCH 03/11] aarch64, testsuite: Fix up auto-init-padding tests

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> The tests currently depend on memcpy lowering forming stps at -O0,
> but we no longer want to form stps during memcpy lowering, but instead
> in the load/store pair fusion pass.
>
> This patch therefore tweaks affected tests to enable optimizations
> (-O1), and adjusts the tests to avoid parts of the structures being
> optimized away where necessary.
>
> OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/auto-init-padding-1.c: Add -O to options,
>   adjust test to work with optimizations enabled.
>   * gcc.target/aarch64/auto-init-padding-2.c: Add -O to options.
>   * gcc.target/aarch64/auto-init-padding-3.c: Add -O to options,
>   adjust test to work with optimizations enabled.
>   * gcc.target/aarch64/auto-init-padding-4.c: Likewise.
>   * gcc.target/aarch64/auto-init-padding-9.c: Likewise.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c | 8 +---
>  gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c | 2 +-
>  gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c | 7 ---
>  gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c | 4 ++--
>  gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c | 7 ---
>  5 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c 
> b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
> index c747ebdcdf7..7027454dc74 100644
> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
> @@ -1,17 +1,19 @@
>  /* Verify zero initialization for structure type automatic variables with
> padding.  */
>  /* { dg-do compile } */
> -/* { dg-options "-ftrivial-auto-var-init=zero" } */
> +/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
>  
>  struct test_aligned {
>  int internal1;
>  long long internal2;
>  } __attribute__ ((aligned(64)));
>  
> -int foo ()
> +void bar (struct test_aligned *);
> +
> +void foo ()
>  {
>struct test_aligned var;
> -  return var.internal1;
> +  bar(&var);
>  }
>  
>  /* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c 
> b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
> index 6e280904da1..d3b6591c9b0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
> @@ -1,7 +1,7 @@
>  /* Verify pattern initialization for structure type automatic variables with
> padding.  */
>  /* { dg-do compile } */
> -/* { dg-options "-ftrivial-auto-var-init=pattern" } */
> +/* { dg-options "-O -ftrivial-auto-var-init=pattern" } */
>  
>  struct test_aligned {
>  int internal1;
> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c 
> b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
> index 9ddea58b468..aad4bb8944f 100644
> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
> @@ -1,7 +1,7 @@
>  /* Verify zero initialization for nested structure type automatic variables 
> with
> padding.  */
>  /* { dg-do compile } */
> -/* { dg-options "-ftrivial-auto-var-init=zero" } */
> +/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
>  
>  struct test_aligned {
>  unsigned internal1;
> @@ -16,11 +16,12 @@ struct test_big_hole {
>  struct test_aligned four;
>  } __attribute__ ((aligned(64)));
>  
> +void bar (struct test_big_hole *);
>  
> -int foo ()
> +void foo ()
>  {
>struct test_big_hole var;
> -  return var.four.internal1;
> +  bar (&var);
>  }
>  
>  /* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c 
> b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
> index 75bba82ed34..efd310f054d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
> @@ -1,7 +1,7 @@
>  /* Verify pattern initialization for nested structure type automatic 
> variables with
> padding.  */
>  /* { dg-do compile } */
> -/* { dg-options "-ftrivial-auto-var-init=pattern" } */
> +/* { dg-options "-O -ftrivial-auto-var-init=pattern" } */
>  
>  struct test_aligned {
>  unsigned internal1;
> @@ -23,4 +23,4 @@ int foo ()
>return var.four.internal1;
>  }
>  
> -/* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 5 } } */
> +/* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c 
> b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
> index 0f1930f813e..64ed8f11fe6 100644
> --- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
> +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
> @@ -1,7 +1,7 @@
>  /* Verify zero initiali

Re: [PATCH 04/11] aarch64, testsuite: Allow ldp/stp on SVE regs with -msve-vector-bits=128

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> Later patches in the series allow ldp and stp to use SVE modes if
> -msve-vector-bits=128 is provided.  This patch therefore adjusts tests
> that pass -msve-vector-bits=128 to allow ldp/stp to save/restore SVE
> registers.
>
> OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pcs/stack_clash_1_128.c: Allow ldp/stp saves
>   of SVE registers.
>   * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.

OK, thanks.

Richard

> ---
>  .../aarch64/sve/pcs/stack_clash_1_128.c   | 32 +++
>  .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 29 +
>  2 files changed, 61 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
> index 404301dc0c1..795429b01cb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
> @@ -19,6 +19,7 @@
>  **   str p13, \[sp, #9, mul vl\]
>  **   str p14, \[sp, #10, mul vl\]
>  **   str p15, \[sp, #11, mul vl\]
> +** (
>  **   str z8, \[sp, #2, mul vl\]
>  **   str z9, \[sp, #3, mul vl\]
>  **   str z10, \[sp, #4, mul vl\]
> @@ -35,7 +36,18 @@
>  **   str z21, \[sp, #15, mul vl\]
>  **   str z22, \[sp, #16, mul vl\]
>  **   str z23, \[sp, #17, mul vl\]
> +** |
> +**   stp q8, q9, \[sp, 32\]
> +**   stp q10, q11, \[sp, 64\]
> +**   stp q12, q13, \[sp, 96\]
> +**   stp q14, q15, \[sp, 128\]
> +**   stp q16, q17, \[sp, 160\]
> +**   stp q18, q19, \[sp, 192\]
> +**   stp q20, q21, \[sp, 224\]
> +**   stp q22, q23, \[sp, 256\]
> +** )
>  **   ptrue   p0\.b, vl16
> +** (
>  **   ldr z8, \[sp, #2, mul vl\]
>  **   ldr z9, \[sp, #3, mul vl\]
>  **   ldr z10, \[sp, #4, mul vl\]
> @@ -52,6 +64,16 @@
>  **   ldr z21, \[sp, #15, mul vl\]
>  **   ldr z22, \[sp, #16, mul vl\]
>  **   ldr z23, \[sp, #17, mul vl\]
> +** |
> +**   ldp q8, q9, \[sp, 32\]
> +**   ldp q10, q11, \[sp, 64\]
> +**   ldp q12, q13, \[sp, 96\]
> +**   ldp q14, q15, \[sp, 128\]
> +**   ldp q16, q17, \[sp, 160\]
> +**   ldp q18, q19, \[sp, 192\]
> +**   ldp q20, q21, \[sp, 224\]
> +**   ldp q22, q23, \[sp, 256\]
> +** )
>  **   ldr p4, \[sp\]
>  **   ldr p5, \[sp, #1, mul vl\]
>  **   ldr p6, \[sp, #2, mul vl\]
> @@ -101,16 +123,26 @@ test_2 (void)
>  **   str p5, \[sp\]
>  **   str p6, \[sp, #1, mul vl\]
>  **   str p11, \[sp, #2, mul vl\]
> +** (
>  **   str z8, \[sp, #1, mul vl\]
>  **   str z13, \[sp, #2, mul vl\]
>  **   str z19, \[sp, #3, mul vl\]
>  **   str z20, \[sp, #4, mul vl\]
> +** |
> +**   stp q8, q13, \[sp, 16\]
> +**   stp q19, q20, \[sp, 48\]
> +** )
>  **   str z22, \[sp, #5, mul vl\]
>  **   ptrue   p0\.b, vl16
> +** (
>  **   ldr z8, \[sp, #1, mul vl\]
>  **   ldr z13, \[sp, #2, mul vl\]
>  **   ldr z19, \[sp, #3, mul vl\]
>  **   ldr z20, \[sp, #4, mul vl\]
> +** |
> +**   ldp q8, q13, \[sp, 16\]
> +**   ldp q19, q20, \[sp, 48\]
> +** )
>  **   ldr z22, \[sp, #5, mul vl\]
>  **   ldr p5, \[sp\]
>  **   ldr p6, \[sp, #1, mul vl\]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> index f6d78469aa5..0d330c015b9 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
> @@ -220,6 +220,7 @@ SEL2 (struct, pst_arr5)
>  /*
>  ** test_pst_arr5:
>  **   sub sp, sp, #128
> +** (
>  **   str z0, \[sp\]
>  **   str z1, \[sp, #1, mul vl\]
>  **   str z2, \[sp, #2, mul vl\]
> @@ -228,6 +229,12 @@ SEL2 (struct, pst_arr5)
>  **   str z5, \[sp, #5, mul vl\]
>  **   str z6, \[sp, #6, mul vl\]
>  **   str z7, \[sp, #7, mul vl\]
> +** |
> +**   stp q0, q1, \[sp\]
> +**   stp q2, q3, \[sp, 32\]
> +**   stp q4, q5, \[sp, 64\]
> +**   stp q6, q7, \[sp, 96\]
> +** )
>  **   mov (x7, sp|w7, wsp)
>  **   add sp, sp, #?128
>  **   ret
> @@ -374,8 +381,12 @@ SEL2 (struct, pst_uniform1)
>  /*
>  ** test_pst_uniform1:
>  **   sub sp, sp, #32
> +** (
>  **   str z0, \[sp\]
>  **   str z1, \[sp, #1, mul vl\]
> +** |
> +**   stp q0, q1, \[sp\]
> +** )
>  **   mov (x7, sp|w7, wsp)
>  **   add sp, sp, #?32
>  **   ret
> @@ -398,8 +409,12 @@ SEL2 (struct, pst_uniform2)
>  /*
>  ** test_pst_uniform2:
>  **   sub sp, sp, #48
> +** (
>  **   str z0, \[sp\]
>  **   str z1, \[sp, #1, mul vl\]
> +** |
> +**   stp q0, q1, \[sp\]
> +** )
>  **   str z2, \[sp, #2, mul vl\]
>  **   mov (x7, sp|w7, wsp)
>  **   add sp, sp, #?48
> @@ -424,10 +439,15 @@ SEL2 (struct, pst_uniform3)
>  /*
>  ** test_pst_uniform3:
>  **   sub sp, sp, #64
> +** (
>  **   str z0, \[sp\]
>  **   str z1, \[sp, #1, mul vl\]

Re: [PATCH 05/11] aarch64, testsuite: Fix up pr103147-10 tests

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> For the ret function, allow the loads to be emitted in either order in
> the codegen.  The order gets inverted with the new load/store pair pass.
>
> OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/aarch64/pr103147-10.C (ret): Allow loads in either order.

OK, but: would adding -fno-schedule-insns -fno-schedule-insns2
also have worked?  It's usually a good idea to pass those options
to -O2 check-function-bodies tests, if the test checks for insns
that have dependency.

In this case, those options would ensure that nothing else gets
moved between the loads.

So OK as-is, but also OK with the options instead.

Thanks,
Richard

>   * gcc.target/aarch64/pr103147-10.c (ret): Likewise.
> ---
>  gcc/testsuite/g++.target/aarch64/pr103147-10.C | 5 +
>  gcc/testsuite/gcc.target/aarch64/pr103147-10.c | 5 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/gcc/testsuite/g++.target/aarch64/pr103147-10.C 
> b/gcc/testsuite/g++.target/aarch64/pr103147-10.C
> index e12771533f7..5a98c30ed3f 100644
> --- a/gcc/testsuite/g++.target/aarch64/pr103147-10.C
> +++ b/gcc/testsuite/g++.target/aarch64/pr103147-10.C
> @@ -62,8 +62,13 @@ ld4 (int32x4x4_t *a, int32_t *b)
>  /*
>  ** ret:
>  **   ...
> +** (
>  **   ldp q0, q1, \[x0\]
>  **   ldr q2, \[x0, #?32\]
> +** |
> +**   ldr q2, \[x0, #?32\]
> +**   ldp q0, q1, \[x0\]
> +** )
>  **   ...
>  */
>  int32x4x3_t
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr103147-10.c 
> b/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
> index 57942bfd10a..2609266bc46 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
> @@ -60,8 +60,13 @@ ld4 (int32x4x4_t *a, int32_t *b)
>  /*
>  ** ret:
>  **   ...
> +** (
>  **   ldp q0, q1, \[x0\]
>  **   ldr q2, \[x0, #?32\]
> +** |
> +**   ldr q2, \[x0, #?32\]
> +**   ldp q0, q1, \[x0\]
> +** )
>  **   ...
>  */
>  int32x4x3_t

Re: [PATCH 06/11] aarch64: Fix up aarch64_print_operand xzr/wzr case

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> This adjusts aarch64_print_operand to recognize zero rtxes in modes other than
> VOIDmode.  This allows us to use xzr/wzr for zero vectors, for example.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_print_operand): Handle
>   non-VOIDmode CONST0_RTXes in {x,w}zr cases.

OK (and a good sign!)

Thanks,
Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 800a8b0e110..abd029887e5 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12387,7 +12387,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>  
>  case 'w':
>  case 'x':
> -  if (x == const0_rtx
> +  if (x == CONST0_RTX (GET_MODE (x))
> || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
>   {
> asm_fprintf (f, "%czr", code);

Re: Fix 'gcc.dg/tree-ssa/return-value-range-1.c' (was: Propagate value ranges of return values)

2023-11-21 Thread Jan Hubicka

> Hi!
> 
> On 2023-11-19T16:05:42+0100, Jan Hubicka  wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do ling } */
> 
> ERROR: gcc.dg/tree-ssa/return-value-range-1.c: 1: syntax error for " 
> dg-do 1 ling "
> 
> With that fixed into 'dg-do link', and...
> 
> > +/* { dg-options "-O1 -dump-tree-evrp-details" } */
> 
> ... that one fixed into '-fdump-tree-evrp-details', I then get:
> 
> FAIL: gcc.dg/tree-ssa/return-value-range-1.c (test for excess errors)
> UNRESOLVED: gcc.dg/tree-ssa/return-value-range-1.c scan-tree-dump-times 
> evrp "Recording return range" 2
> 
> /tmp/ccTEuffl.o: In function `test':
> return-value-range-1.c:(.text+0x24): undefined reference to `link_error'
> 
> This disappears when switching from '-O1' to '-O2'.  OK to push the
> attached "Fix 'gcc.dg/tree-ssa/return-value-range-1.c'"?  (..., or did
> you intend something else, here?)

Ah, sorry for that - I looked for FAIL and missed the error.  Yes, the
change is OK.  Indeed, -fipa-vrp is enabled only at -O2.  (I think basic
non-dataflow VRP could be doable and effective even at -O1, but we don't
do that.)
Honza
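
For readers unfamiliar with the idiom: the test links only if the
optimizer proves the link_error () call unreachable, and that needs the
return range of a() to be known inside test().  The sketch below is not
the testcase.  It is a self-contained illustration that checks
exhaustively the property range propagation would have to prove.  It
assumes a 'signed char' parameter (unlike the test, which uses plain
char) so that the range is the same on every target, and can_exceed is
a hypothetical helper name.

```cpp
#include <climits>

// Same shape as a() in the testcase, but with an explicitly signed
// parameter (an assumption made for this sketch only).
__attribute__ ((__noinline__))
int a (signed char c)
{
  return c;
}

// Returns true if some argument makes a() return more than LIMIT.
// Return value range propagation aims to establish at compile time
// what this loop checks at run time.
bool can_exceed (int limit)
{
  for (int i = SCHAR_MIN; i <= SCHAR_MAX; ++i)
    if (a (static_cast<signed char> (i)) > limit)
      return true;
  return false;
}
```

With the range of a() recorded, a comparison such as 'a (d) > 200' can be
folded to false in the caller, which is what lets the link succeed.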
> 
> 
> Grüße
>  Thomas
> 
> 
> > +__attribute__ ((__noinline__))
> > +int a(char c)
> > +{
> > + return c;
> > +}
> > +void link_error ();
> > +
> > +void
> > +test(int d)
> > +{
> > + if (a(d) > 200)
> > + link_error ();
> > +}
> > +int
> > +main(int argc, char **argv)
> > +{
> > + test(argc);
> > + return 0;
> > +}
> > +/* { dg-final { scan-tree-dump-times "Recording return range" 2 "evrp"} } 
> > */
> 
> 

> From f3a47339a9df9726da7e3c1daeadc216e1d5b365 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Tue, 21 Nov 2023 11:51:42 +0100
> Subject: [PATCH] Fix 'gcc.dg/tree-ssa/return-value-range-1.c'
> 
> ... added in recent commit 53ba8d669550d3a1f809048428b97ca607f95cf5
> "inter-procedural value range propagation".
> 
>   gcc/testsuite/
>   * gcc.dg/tree-ssa/return-value-range-1.c: Fix.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
> index 4db52233c5d..74f1a5080bb 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/return-value-range-1.c
> @@ -1,5 +1,5 @@
> -/* { dg-do ling } */
> -/* { dg-options "-O1 -dump-tree-evrp-details" } */
> +/* { dg-do link } */
> +/* { dg-options "-O2 -fdump-tree-evrp-details" } */
>  __attribute__ ((__noinline__))
>  int a(char c)
>  {
> -- 
> 2.34.1
>

Re: [PATCH 07/11] aarch64: Fix up printing of ldp/stp with -msve-vector-bits=128

2023-11-21 Thread Richard Sandiford

Alex Coplan  writes:
> Later patches allow using SVE modes in ldp/stp with -msve-vector-bits=128,
> so we need to make sure that we don't use SVE addressing modes when
> printing the address for the ldp/stp.
>
> This patch does that.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_print_address_internal): Handle SVE
>   modes when printing ldp/stp addresses.
> ---
>  gcc/config/aarch64/aarch64.cc | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index abd029887e5..4820fac67a1 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12661,6 +12661,9 @@ aarch64_print_address_internal (FILE *f, machine_mode 
> mode, rtx x,
>return false;
>  }
>  
> +  const bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
> +   || type == ADDR_QUERY_LDP_STP_N);
> +
>if (aarch64_classify_address (&addr, x, mode, true, type))
>  switch (addr.type)
>{
> @@ -12672,7 +12675,15 @@ aarch64_print_address_internal (FILE *f, 
> machine_mode mode, rtx x,
> }
>  
>   vec_flags = aarch64_classify_vector_mode (mode);
> - if (vec_flags & VEC_ANY_SVE)
> + if ((vec_flags & VEC_ANY_SVE)
> + && load_store_pair_p
> + && !addr.const_offset.is_constant ())
> +   {
> + output_operand_lossage ("poly offset in ldp/stp address");
> + return false;
> +   }
> +
> + if ((vec_flags & VEC_ANY_SVE) && !load_store_pair_p)
> {
>   HOST_WIDE_INT vnum
> = exact_div (addr.const_offset,

It should be possible to exercise the error form of the output_operand_lossage
with a test like:

#include <arm_sve.h>

void f(int32_t *ptr) {
  asm volatile ("%z0" :: "m" (*(svint32_t*)(ptr + svcntw())));
}

But the "poly offset" won't mean anything to users.

It's up to the caller to report the error on a false return, so I think
this code should just return without reporting its own error.

Also, IMO it'd be neater to do the tests the other way around:


if ((vec_flags & VEC_ANY_SVE) && !load_store_pair_p)
  ...

if (!CONST_INT_P (addr.offset))
  return false;

asm_fprintf (f, "[%s, %wd]", reg_names[REGNO (addr.base)],
 INTVAL (addr.offset));
return true;

OK with those changes, and with the test above if it works.

Thanks,
Richard

Re: libstdc++: Speed up push_back

2023-11-21 Thread Jan Hubicka

> > +   // RAII type to destroy initialized elements.
> 
> There's only one initialized element, not "elements".
> 
> > +   struct _Guard_elts
> > +   {
> > + pointer _M_first, _M_last;  // Elements to destroy
> 
> We only need to store one pointer here, call it _M_p.
> 
> > + _Tp_alloc_type& _M_alloc;
> > +
> > + _GLIBCXX20_CONSTEXPR
> > + _Guard_elts(pointer __elt, _Tp_alloc_type& __a)
> > + : _M_first(__elt), _M_last(__elt + 1), _M_alloc(__a)
> > + { }
> > +
> > + _GLIBCXX20_CONSTEXPR
> > + ~_Guard_elts()
> > + { std::_Destroy(_M_first, _M_last, _M_alloc); }
> 
> This should be either:
> 
>   std::_Destroy(_M_p, _M_p+1, _M_alloc);
> 
> or avoid the loop that happens in that _Destroy function:
> 
>   _Alloc_traits::destroy(_M_alloc, _M_p);
> 
> > +
> > +   private:
> > + _Guard_elts(const _Guard_elts&);
> > +   };
> > +
> > +   // Guard the new element so it will be destroyed if anything 
> > throws.
> > +   _Guard_elts __guard_elts(__new_start + __elems, _M_impl);
> > +
> > +   __new_finish = std::__uninitialized_move_if_noexcept_a(
> > +__old_start, __old_finish,
> > +__new_start, _M_get_Tp_allocator());
> > +
> > +   ++__new_finish;
> > +   // Guard everything before the new element too.
> > +   __guard_elts._M_first = __new_start;
> 
> This seems redundant, we're not doing any more insertions now, and so
> this store is dead.

I am attaching a patch with this check removed.  However, I still think
_Guard_elts needs to stay to be able to destroy the old elements
> 
> > +
> > +   // New storage has been fully initialized, destroy the old 
> > elements.
> > +   __guard_elts._M_first = __old_start;
> > +   __guard_elts._M_last = __old_finish;
... here

Does it look better? (I am not really that confident with libstdc++.)

I was also wondering whether we have any data on which libstdc++ functions
are performance critical in practice.  Given that push_back can be
sped up very noticeably, I wonder whether we have similar problems elsewhere?

Thank you,
Honza

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 5e18f6eedce..973f4d7e2e9 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1288,7 +1288,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX_ASAN_ANNOTATE_GREW(1);
  }
else
- _M_realloc_insert(end(), __x);
+ _M_realloc_append(__x);
   }
 
 #if __cplusplus >= 201103L
@@ -1822,6 +1822,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   void
   _M_realloc_insert(iterator __position, const value_type& __x);
+
+  void
+  _M_realloc_append(const value_type& __x);
 #else
   // A value_type object constructed with _Alloc_traits::construct()
   // and destroyed with _Alloc_traits::destroy().
@@ -1871,6 +1874,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
void
_M_realloc_insert(iterator __position, _Args&&... __args);
 
+  template<typename... _Args>
+   _GLIBCXX20_CONSTEXPR
+   void
+   _M_realloc_append(_Args&&... __args);
+
   // Either move-construct at the end, or forward to _M_insert_aux.
   _GLIBCXX20_CONSTEXPR
   iterator
diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index 80631d1e2a1..0ccef7911b3 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -120,7 +120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX_ASAN_ANNOTATE_GREW(1);
  }
else
- _M_realloc_insert(end(), std::forward<_Args>(__args)...);
+ _M_realloc_append(std::forward<_Args>(__args)...);
 #if __cplusplus > 201402L
return back();
 #endif
@@ -459,6 +459,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
 {
   const size_type __len = _M_check_len(1u, "vector::_M_realloc_insert");
+  if (__len <= 0)
+   __builtin_unreachable ();
   pointer __old_start = this->_M_impl._M_start;
   pointer __old_finish = this->_M_impl._M_finish;
   const size_type __elems_before = __position - begin();
@@ -571,6 +573,127 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   this->_M_impl._M_end_of_storage = __new_start + __len;
 }
 
+#if __cplusplus >= 201103L
+  template<typename _Tp, typename _Alloc>
+template<typename... _Args>
+  _GLIBCXX20_CONSTEXPR
+  void
+  vector<_Tp, _Alloc>::
+  _M_realloc_append(_Args&&... __args)
+#else
+  template<typename _Tp, typename _Alloc>
+void
+vector<_Tp, _Alloc>::
+_M_realloc_append(const _Tp& __x)
+#endif
+{
+  const size_type __len = _M_check_len(1u, "vector::_M_realloc_append");
+  if (__len <= 0)
+   __builtin_unreachable ();
+  pointer __old_start = this->_M_impl._M_start;
+  pointer __old_finish = this->_M_impl._M_finish;
+  const

[Committed] RISC-V: Add missing dump check of pr112438.c

2023-11-21 Thread Juzhe-Zhong

I noticed that the dump check was missing; this patch adds it.

Committed as obvious.
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112438.c: Add missing dump check.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
index 51f90df38a0..d6770af1842 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
@@ -31,3 +31,4 @@ float * __restrict out, float x)
 }
 
 /* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }.  */
+/* { dg-final { scan-tree-dump-not "\\+ \{ POLY_INT_CST" "optimized" } } */
-- 
2.36.3

Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-11-21 Thread Richard Biener

On Thu, Nov 9, 2023 at 6:53 PM Di Zhao OS  wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 31, 2023 9:48 PM
> > To: Di Zhao OS 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > get_reassociation_width
> >
> > On Sun, Oct 8, 2023 at 6:40 PM Di Zhao OS 
> > wrote:
> > >
> > > Attached is a new version of the patch.
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, October 6, 2023 5:33 PM
> > > > To: Di Zhao OS 
> > > > Cc: gcc-patches@gcc.gnu.org
> > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > > get_reassociation_width
> > > >
> > > > On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
> > > >  wrote:
> > > > >
> > > > > This is a new version of the patch on "nested FMA".
> > > > > Sorry for updating this after so long, I've been studying and
> > > > > writing micro cases to sort out the cause of the regression.
> > > >
> > > > Sorry for taking so long to reply.
> > > >
> > > > > First, following previous discussion:
> > > > > (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
> > > > >
> > > > > 1. From testing more altered cases, I don't think the
> > > > > problem is that reassociation works locally. In that:
> > > > >
> > > > >   1) On the example with multiplications:
> > > > >
> > > > > tmp1 = a + c * c + d * d + x * y;
> > > > > tmp2 = x * tmp1;
> > > > > result += (a + c + d + tmp2);
> > > > >
> > > > >   Given "result" rewritten by width=2, the performance is
> > > > >   worse if we rewrite "tmp1" with width=2. In contrast, if we
> > > > >   remove the multiplications from the example (and make "tmp1"
> > > > > >   not single-use), and still rewrite "result" by width=2, then
> > > > >   rewriting "tmp1" with width=2 is better. (Make sense because
> > > > >   the tree's depth at "result" is still smaller if we rewrite
> > > > >   "tmp1".)
> > > > >
> > > > >   2) I tried to modify the assembly code of the example without
> > > > >   FMA, so the width of "result" is 4. On Ampere1 there's no
> > > > >   obvious improvement. So although this is an interesting
> > > > >   problem, it doesn't seem like the cause of the regression.
> > > >
> > > > OK, I see.
> > > >
> > > > > 2. From assembly code of the case with FMA, one problem is
> > > > > that, rewriting "tmp1" to parallel didn't decrease the
> > > > > minimum CPU cycles (taking MULT_EXPRs into account), but
> > > > > increased code size, so the overhead is increased.
> > > > >
> > > > >a) When "tmp1" is not re-written to parallel:
> > > > > fmadd d31, d2, d2, d30
> > > > > fmadd d31, d3, d3, d31
> > > > > fmadd d31, d4, d5, d31  //"tmp1"
> > > > > fmadd d31, d31, d4, d3
> > > > >
> > > > >b) When "tmp1" is re-written to parallel:
> > > > > fmul  d31, d4, d5
> > > > > fmadd d27, d2, d2, d30
> > > > > fmadd d31, d3, d3, d31
> > > > > fadd  d31, d31, d27 //"tmp1"
> > > > > fmadd d31, d31, d4, d3
> > > > >
> > > > > For version a), there are 3 dependent FMAs to calculate "tmp1".
> > > > > For version b), there are also 3 dependent instructions in the
> > > > > longer path: the 1st, 3rd and 4th.
> > > >
> > > > Yes, it doesn't really change anything.  The patch has
> > > >
> > > > +  /* If there's code like "acc = a * b + c * d + acc" in a tight loop,
> > some
> > > > + uarchs can execute results like:
> > > > +
> > > > +   _1 = a * b;
> > > > +   _2 = .FMA (c, d, _1);
> > > > +   acc_1 = acc_0 + _2;
> > > > +
> > > > + in parallel, while turning it into
> > > > +
> > > > +   _1 = .FMA(a, b, acc_0);
> > > > +   acc_1 = .FMA(c, d, _1);
> > > > +
> > > > + hinders that, because then the first FMA depends on the result
> > > > of preceding
> > > > + iteration.  */
> > > >
> > > > I can't see what can be run in parallel for the first case.  The .FMA
> > > > depends on the multiplication a * b.  Iff the uarch somehow decomposes
> > > > .FMA into multiply + add then the c * d multiply could run in parallel
> > > > with the a * b multiply which _might_ be able to hide some of the
> > > > latency of the full .FMA.  Like on x86 Zen FMA has a latency of 4
> > > > cycles but a multiply only 3.  But I never got confirmation from any
> > > > of the CPU designers that .FMAs are issued when the multiply
> > > > operands are ready and the add operand can be forwarded.
> > > >
> > > > I also wonder why the multiplications of the two-FMA sequence
> > > > then cannot be executed at the same time?  So I have some doubt
> > > > of the theory above.
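
As an illustration of the two shapes being compared in the quoted patch
comment, the sketch below writes one loop iteration both ways using
std::fma.  The function names are made up for this example, and it says
nothing about which form is faster on any particular core; it only makes
the data dependencies visible.

```cpp
#include <cmath>

// Form 1: neither the multiply nor the FMA reads acc, so only the
// final add is on the dependency chain carried from one iteration
// to the next.
double step_mul_fma_add(double acc, double a, double b, double c, double d)
{
  double t1 = a * b;
  double t2 = std::fma(c, d, t1);
  return acc + t2;
}

// Form 2: the first FMA reads acc and the second reads the first, so
// both FMAs are on the chain carried between iterations.
double step_two_fma(double acc, double a, double b, double c, double d)
{
  double t1 = std::fma(a, b, acc);
  return std::fma(c, d, t1);
}
```

Both functions compute a * b + c * d + acc (up to rounding differences),
so the choice between them is purely about latency and code size.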
> > >
> > > The parallel execution for the code snippet above was the other
> > > issue (previously discussed here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628960.html).
> > > Sorry it's a bit confusing to include that here, but these 2 fixes
> > > needs to be combined to avoid new regr

Re: libstdc++: Speed up push_back

2023-11-21 Thread Jonathan Wakely

On Tue, 21 Nov 2023 at 12:50, Jan Hubicka  wrote:
>
> > > +   // RAII type to destroy initialized elements.
> >
> > There's only one initialized element, not "elements".
> >
> > > +   struct _Guard_elts
> > > +   {
> > > + pointer _M_first, _M_last;  // Elements to destroy
> >
> > We only need to store one pointer here, call it _M_p.
> >
> > > + _Tp_alloc_type& _M_alloc;
> > > +
> > > + _GLIBCXX20_CONSTEXPR
> > > + _Guard_elts(pointer __elt, _Tp_alloc_type& __a)
> > > + : _M_first(__elt), _M_last(__elt + 1), _M_alloc(__a)
> > > + { }
> > > +
> > > + _GLIBCXX20_CONSTEXPR
> > > + ~_Guard_elts()
> > > + { std::_Destroy(_M_first, _M_last, _M_alloc); }
> >
> > This should be either:
> >
> >   std::_Destroy(_M_p, _M_p+1, _M_alloc);
> >
> > or avoid the loop that happens in that _Destroy function:
> >
> >   _Alloc_traits::destroy(_M_alloc, _M_p);
> >
> > > +
> > > +   private:
> > > + _Guard_elts(const _Guard_elts&);
> > > +   };
> > > +
> > > +   // Guard the new element so it will be destroyed if anything 
> > > throws.
> > > +   _Guard_elts __guard_elts(__new_start + __elems, _M_impl);
> > > +
> > > +   __new_finish = std::__uninitialized_move_if_noexcept_a(
> > > +__old_start, __old_finish,
> > > +__new_start, _M_get_Tp_allocator());
> > > +
> > > +   ++__new_finish;
> > > +   // Guard everything before the new element too.
> > > +   __guard_elts._M_first = __new_start;
> >
> > This seems redundant, we're not doing any more insertions now, and so
> > this store is dead.
>
> I am attaching a patch with this check removed.  However, I still think
> _Guard_elts needs to stay to be able to destroy the old elements
> >
> > > +
> > > +   // New storage has been fully initialized, destroy the old 
> > > elements.
> > > +   __guard_elts._M_first = __old_start;
> > > +   __guard_elts._M_last = __old_finish;
> ... here
>
> Does it look better? (I am not really that confident with libstdc++.)

Yes, looks good, thanks.


>
> I was also wondering whether we have any data on which libstdc++ functions
> are performance critical in practice.  Given that push_back can be
> sped up very noticeably, I wonder whether we have similar problems elsewhere?

We don't have that data, no.

It's possible that we could do similar things for insert(iterator pos,
...) for the case where pos==end(), i.e. inserting multiple elements
at the end.
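
In user-code terms, the case meant here looks like the sketch below
(append_range_sketch is a hypothetical helper, not a libstdc++
function).  No existing element has to be shifted, so in principle the
same "append" specialization could apply.

```cpp
#include <vector>

// Appends [first, last) to v through the range insert overload with
// the position equal to end().
template<typename T, typename InputIt>
void append_range_sketch(std::vector<T>& v, InputIt first, InputIt last)
{
  v.insert(v.end(), first, last);
}
```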
>
> Thank you,
> Honza
>
> diff --git a/libstdc++-v3/include/bits/stl_vector.h 
> b/libstdc++-v3/include/bits/stl_vector.h
> index 5e18f6eedce..973f4d7e2e9 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -1288,7 +1288,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> _GLIBCXX_ASAN_ANNOTATE_GREW(1);
>   }
> else
> - _M_realloc_insert(end(), __x);
> + _M_realloc_append(__x);
>}
>
>  #if __cplusplus >= 201103L
> @@ -1822,6 +1822,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>
>void
>_M_realloc_insert(iterator __position, const value_type& __x);
> +
> +  void
> +  _M_realloc_append(const value_type& __x);
>  #else
>// A value_type object constructed with _Alloc_traits::construct()
>// and destroyed with _Alloc_traits::destroy().
> @@ -1871,6 +1874,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> void
> _M_realloc_insert(iterator __position, _Args&&... __args);
>
> +  template<typename... _Args>
> +   _GLIBCXX20_CONSTEXPR
> +   void
> +   _M_realloc_append(_Args&&... __args);
> +
>// Either move-construct at the end, or forward to _M_insert_aux.
>_GLIBCXX20_CONSTEXPR
>iterator
> diff --git a/libstdc++-v3/include/bits/vector.tcc 
> b/libstdc++-v3/include/bits/vector.tcc
> index 80631d1e2a1..0ccef7911b3 100644
> --- a/libstdc++-v3/include/bits/vector.tcc
> +++ b/libstdc++-v3/include/bits/vector.tcc
> @@ -120,7 +120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> _GLIBCXX_ASAN_ANNOTATE_GREW(1);
>   }
> else
> - _M_realloc_insert(end(), std::forward<_Args>(__args)...);
> + _M_realloc_append(std::forward<_Args>(__args)...);
>  #if __cplusplus > 201402L
> return back();
>  #endif
> @@ -459,6 +459,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>  #endif
>  {
>const size_type __len = _M_check_len(1u, "vector::_M_realloc_insert");
> +  if (__len <= 0)
> +   __builtin_unreachable ();
>pointer __old_start = this->_M_impl._M_start;
>pointer __old_finish = this->_M_impl._M_finish;
>const size_type __elems_before = __position - begin();
> @@ -571,6 +573,127 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>this->_M_impl._M_end_of_storage = __new_start + __len;
>  }
>
> +#if __cplusplus >= 201103L
>

Re: [PATCH 1/4] libsanitizer: merge from upstream (c425db2eb558c263)

2023-11-21 Thread FX



> I see that the fix was applied locally (and my bootstraps on various Darwin 
> versions worked OK),
> but I’m not clear if you submitted a PR upstream (or just the bug report).  
> If the fix is to remain
> local-only, it should be added to the list in LOCAL_PATCHES.

Patch was submitted upstream: https://github.com/llvm/llvm-project/pull/72642

FX

Re: [BUG FIX] RISC-V: Disallow COSNT_VECTOR for DI on RV32

2023-11-21 Thread Robin Dapp

Hi Juzhe,

> This bug is exposed when testing on a zvl512b RV32 system.
> 
> The root cause is that RA reloads a DI CONST_VECTOR into vmv.v.x, which then ICEs.
> 
> So disallow DI CONST_VECTOR on RV32.

OK. 

Regards
 Robin

RE: [PATCH] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-21 Thread Tamar Christina

Hi Philipp,

Could you rebase this patch on top of master, please?

Essentially we put each tuning model in its own file now.

Thanks,
Tamar

> -Original Message-
> From: Philipp Tomsich 
> Sent: Thursday, November 16, 2023 6:16 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Philipp Tomsich
> 
> Subject: [PATCH] aarch64: Add support for Ampere-1B (-mcpu=ampere1b)
> CPU
> 
> This patch adds initial support for Ampere-1B core.
> 
> The Ampere-1B core implements ARMv8.7 with the following (compiler
> visible) extensions:
>  - CSSC (Common Short Sequence Compression instructions),
>  - MTE (Memory Tagging Extension)
>  - SM3/SM4
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-
> 1b
>   * config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
>   * config/aarch64/aarch64.cc: Add ampere1b_prefetch_tune and
>   ampere1b_advsimd_vector_costs
>   * config/aarch64/aarch64-tune.md: Regenerate
>   * doc/invoke.texi: Document -mcpu=ampere1b
> 
> Signed-off-by: Philipp Tomsich 
> ---
> 
>  gcc/config/aarch64/aarch64-cores.def |   1 +
>  gcc/config/aarch64/aarch64-cost-tables.h | 107
> +++
>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>  gcc/config/aarch64/aarch64.cc|  89 +++
>  gcc/doc/invoke.texi  |   2 +-
>  5 files changed, 199 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index eae40b29df6..19dfb133d29 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,
> thunderx,  V8A,  (CRC, CRYPTO), thu
>  /* Ampere Computing ('\xC0') cores. */
>  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES,
> SHA3), ampere1, 0xC0, 0xac3, -1)  AARCH64_CORE("ampere1a", ampere1a,
> cortexa57, V8_6A, (F16, RNG, AES, SHA3, SM4, MEMTAG), ampere1a, 0xC0,
> 0xac4, -1)
> +AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG,
> AES,
> +SHA3, SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
>  /* Do not swap around "emag" and "xgene1",
> this order is required to handle variant correctly. */
>  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag,
> 0x50, 0x000, 3)
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h
> b/gcc/config/aarch64/aarch64-cost-tables.h
> index 0cb638f3a13..4c8da7f119b 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs
> =
>}
>  };
> 
> +const struct cpu_cost_table ampere1b_extra_costs = {
> +  /* ALU */
> +  {
> +0, /* arith.  */
> +0, /* logical.  */
> +0, /* shift.  */
> +COSTS_N_INSNS (1), /* shift_reg.  */
> +0, /* arith_shift.  */
> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> +0, /* log_shift.  */
> +COSTS_N_INSNS (1), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arith.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
> +0, /* clz.  */
> +0, /* rev.  */
> +0, /* non_exec.  */
> +true   /* non_exec_costs_exec.  */
> +  },
> +  {
> +/* MULT SImode */
> +{
> +  COSTS_N_INSNS (2),   /* simple.  */
> +  COSTS_N_INSNS (2),   /* flag_setting.  */
> +  COSTS_N_INSNS (2),   /* extend.  */
> +  COSTS_N_INSNS (3),   /* add.  */
> +  COSTS_N_INSNS (3),   /* extend_add.  */
> +  COSTS_N_INSNS (12)   /* idiv.  */
> +},
> +/* MULT DImode */
> +{
> +  COSTS_N_INSNS (2),   /* simple.  */
> +  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (2),   /* extend.  */
> +  COSTS_N_INSNS (3),   /* add.  */
> +  COSTS_N_INSNS (3),   /* extend_add.  */
> +  COSTS_N_INSNS (18)   /* idiv.  */
> +}
> +  },
> +  /* LD/ST */
> +  {
> +COSTS_N_INSNS (2), /* load.  */
> +COSTS_N_INSNS (2), /* load_sign_extend.  */
> +0, /* ldrd (n/a).  */
> +0, /* ldm_1st.  */
> +0, /* ldm_regs_per_insn_1st.  */
> +0, /* ldm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (3), /* loadf.  */
> +COSTS_N_INSNS (3), /* loadd.  */
> +COSTS_N_INSNS (3), /* load_unaligned.  */
> +0, /* store.  */
> +0, /* strd.  */
> +0, /* stm_1st.  */
> +0, /* stm_regs_per_insn_1st.  */
> +0, /* stm_regs_per_insn_subsequent.  */
> +COSTS_N_INS

Re: [PATCH] RISC-V: testsuite: Fix popcount test.

2023-11-21 Thread Robin Dapp

> Mhm, not so obvious after all.  We vectorize 250 instances with
> rv32gcv, 229 with rv64gcv and 250 with rv64gcv_zbb.  Will have
> another look tomorrow.

The problem is that tree-vect-patterns is more restrictive than
necessary and does not vectorize everything it could.  Therefore
I'm going to commit the attached with a TODO comment and a
separate check for zbb.
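
For illustration, here is a minimal C sketch of the loop shape the TODO comment in the patch describes: a popcount of a 32-bit source element stored into a 64-bit destination element.  The function name `popcount_widen` and its parameters are hypothetical and not taken from popcount.c; the sketch only shows the source pattern and makes no claim about what the vectorizer does with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits of each 32-bit source element
   and store the count into a wider 64-bit destination element.  This
   mirrors the "dst[i] = __builtin_popcountll (src[i])" shape quoted in
   the TODO comment.  */
void
popcount_widen (uint64_t *dst, const uint32_t *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}
```

Whether such a loop is reported as vectorized could be checked in the "vect" tree dump, which is the dump the test scans for "LOOP VECTORIZED".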

Regards
 Robin

Subject: [PATCH v2] RISC-V: testsuite: Fix popcount test.

Due to Jakub's recent middle-end changes we now vectorize some more
popcount instances.  This patch just adjusts the dump check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount.c: Adjust check.
* lib/target-supports.exp: Add riscv_zbb.
---
 .../gcc.target/riscv/rvv/autovec/unop/popcount.c  | 10 +-
 gcc/testsuite/lib/target-supports.exp | 11 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
index 585a522aa81..ca1319c2e7e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
@@ -1461,4 +1461,12 @@ main ()
   RUN_ALL ()
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
+/* TODO: Due to an over-zealous check in tree-vect-patterns we do not vectorize
+   e.g.
+ uint64_t dst[];
+ uint32_t src[];
+ dst[i] = __builtin_popcountll (src[i]);
+   even though we could.  Therefore, for now, adjust the following checks.
+   This difference was exposed in r14-5557-g6dd4c703be17fa.  */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" { target { { 
rv64 } && { ! riscv_zbb } } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 250 "vect" { target { { 
rv32 } || { riscv_zbb } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f3cd0311e27..87b2ae58720 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1983,6 +1983,17 @@ proc check_effective_target_riscv_ztso { } {
 }]
 }
 
+# Return 1 if the target arch supports the Zbb extension, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_riscv_zbb { } {
+return [check_no_compiler_messages riscv_ext_zbb assembly {
+   #ifndef __riscv_zbb
+   #error "Not __riscv_zbb"
+   #endif
+}]
+}
+
 # Return 1 if we can execute code when using dg-add-options riscv_v
 
 proc check_effective_target_riscv_v_ok { } {
-- 
2.42.0

Re: [PATCH] RISC-V: testsuite: Add rv64 requirement for bug-9 and bug-14.

2023-11-21 Thread Robin Dapp

> /* { dg-do run { target { { {riscv_v} && {rv64} } } } } */
> 
> It seems you should remove rv64 here, since I think it is redundant.

Going to commit with that removed.

Regards
 Robin

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-21 Thread Arsen Arsenović


Arsen Arsenović  writes:

> Bruno Haible  writes:
>
>> Arsen Arsenović wrote:
>>>   Comparing stages 2 and 3
>>>   Bootstrap comparison failure!
>>>   gettext/libasprintf/autosprintf.o differs
>>>   make[2]: *** [Makefile:23435: compare] Error 1
>>
>> You should be able to work around this by passing the additional option
>> --disable-libasprintf to gettext-runtime/configure. Nothing in GCC needs
>> libasprintf; therefore there is no need to build it.
>
> Ah, sure, that works for me too (note that the fix is to pass
> -frandom-seed=, according to Jakub, should this show up again).

Indeed, that got a bootstrap to pass.  I've also taken the opportunity
to check the problems Eric Gallager reported.  The install tree seems
clean now, and the info et al targets appear to work again.  David,
Eric, could you check whether the attached patch works for you in the
scenarios you ran into problems with?  Make sure to fetch gettext-0.22.4
into your trees.

From d0f8b623f9720947b805d71c05a5d6a638daefb8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Thu, 16 Nov 2023 23:50:30 +0100
Subject: [PATCH] gettext: disable install, docs targets, libasprintf, threads

This fixes issues reported by David Edelsohn , and by
Eric Gallager .

ChangeLog:

	* Makefile.def (gettext): Disable (via missing)
	{install-,}{pdf,html,info,dvi} and TAGS targets.  Set no_install
	to true.  Add --disable-threads --disable-libasprintf.
	* Makefile.in: Regenerate.
---
 Makefile.def |  13 +++-
 Makefile.in  | 202 ---
 2 files changed, 40 insertions(+), 175 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 792f81447e1b..6b03deb49506 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -80,8 +80,17 @@ host_modules= { module= gettext; bootstrap=true; no_install=true;
 		// need it in some configuratons, which is determined via nontrivial tests.
 		// Always enabling pic seems to make sense for something tied to
 		// user-facing output.
-extra_configure_flags='--disable-shared --disable-java --disable-csharp --with-pic';
-lib_path=intl/.libs; };
+extra_configure_flags='--disable-shared --disable-threads --disable-java --disable-csharp --with-pic --disable-libasprintf';
+		missing= pdf;
+		missing= html;
+		missing= info;
+		missing= dvi;
+		missing= install-pdf;
+		missing= install-html;
+		missing= install-info;
+		missing= install-dvi;
+		missing= TAGS;
+		no_install= true;};
 host_modules= { module= tcl;
 missing=mostlyclean; };
 host_modules= { module= itcl; };
diff --git a/Makefile.in b/Makefile.in
index da2344b3f3dc..3bd7d37e9605 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -768,7 +768,7 @@ TARGET_LIB_PATH_libatomic = $$r/$(TARGET_SUBDIR)/libatomic/.libs:
 
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the host machine work.
-HOST_LIB_PATH = $(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)$(HOST_LIB_PATH_gettext)
+HOST_LIB_PATH = $(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)
 
 # Define HOST_LIB_PATH_gcc here, for the sake of TARGET_LIB_PATH, ouch
 @if gcc
@@ -796,11 +796,6 @@ HOST_LIB_PATH_isl = \
   $$r/$(HOST_SUBDIR)/isl/.libs:$$r/$(HOST_SUBDIR)/prev-isl/.libs:
 @endif isl
 
-@if gettext
-HOST_LIB_PATH_gettext = \
-  $$r/$(HOST_SUBDIR)/gettext/intl/.libs:$$r/$(HOST_SUBDIR)/prev-gettext/intl/.libs:
-@endif gettext
-
 
 CXX_FOR_TARGET_FLAG_TO_PASS = \
 	"CXX_FOR_TARGET=$(CXX_FOR_TARGET)"
@@ -19827,7 +19822,7 @@ configure-gettext:
 	  $$s/$$module_srcdir/configure \
 	  --srcdir=$${topdir}/$$module_srcdir \
 	  $(HOST_CONFIGARGS) --build=${build_alias} --host=${host_alias} \
-	  --target=${target_alias} --disable-shared --disable-java --disable-csharp --with-pic \
+	  --target=${target_alias} --disable-shared --disable-threads --disable-java --disable-csharp --with-pic --disable-libasprintf \
 	  || exit 1
 @endif gettext
 
@@ -19863,7 +19858,7 @@ configure-stage1-gettext:
 	  --target=${target_alias} \
 	   \
 	  $(STAGE1_CONFIGURE_FLAGS) \
-	  --disable-shared --disable-java --disable-csharp --with-pic
+	  --disable-shared --disable-threads --disable-java --disable-csharp --with-pic --disable-libasprintf
 @endif gettext-bootstrap
 
 .PHONY: configure-stage2-gettext maybe-configure-stage2-gettext
@@ -19897,7 +19892,7 @@ configure-stage2-gettext:
 	  --target=${target_alias} \
 	  --with-build-libsubdir=$(HOST_SUBDIR) \
 	  $(STAGE2_CONFIGURE_FLAGS) \
-	  --disable-shared --disable-java --disable-csharp --with-pic
+	  --disable-shared --disable-threads --disable-java --disable-csharp --with-pic --disable-libasprintf
 @endif gettext-bootstrap
 
 .PHONY: configure-stage3-gettext maybe-configure-stage3-gettext
@@ -19931,7 +19926,7 @@ configure-stage3-gettext:
 	  --target=${target_alias} \
 	  --with-build-libsubdir=$(HOST_SUBDIR) \
 	  $(STAGE3_CONFIGURE_FLAGS) \
-	  --disable-shared -

Re: [PATCH 0/4] v2 of Option handling: add documentation URLs

2023-11-21 Thread David Malcolm

On Tue, 2023-11-21 at 02:09 +0100, Hans-Peter Nilsson wrote:
> > From: David Malcolm 
> > Date: Thu, 16 Nov 2023 09:28:54 -0500
> 
> > How is this looking for trunk?
> > 
> > Thanks
> > Dave
> > 
> > 
> > David Malcolm (4):
> >   options: add gcc/regenerate-opt-urls.py
> >   Add generated .opt.urls files
> >   opts: add logic to generate options-urls.cc
> >   options: wire up options-urls.cc into gcc_urlifier
> > 
> >  gcc/Makefile.in  |   29 +-
> >  gcc/ada/gcc-interface/lang.opt.urls  |   30 +
> >  gcc/analyzer/analyzer.opt.urls   |  206 ++
> >  gcc/c-family/c.opt.urls  | 1409 ++
> >  gcc/common.opt.urls  | 1832
> > ++
> >  gcc/config/aarch64/aarch64.opt.urls  |   84 +
> >  gcc/config/alpha/alpha.opt.urls  |   76 +
> >  gcc/config/alpha/elf.opt.urls    |    2 +
> >  gcc/config/arc/arc-tables.opt.urls   |    2 +
> 
> [... etc .opt.urls particularly in gcc/config/*
> autogenerated for each *.opt ...]
> 
> Sorry for barging in though I did try finding the relevant
> discussion, but is committing this generated stuff necessary?
> Is it because we don't want to depend on Python being
> present at build time?

Partly, yes, but also because this information is generated from the
generated HTML for the user manuals: gcc, gdc, and gfortran, and it
gets used to generate "optionlist", which is a dependency for all of
the rest of the "gcc" subdirectory.  Doing it automatically would make
the generation of the HTML docs for gcc, d and gfortran be a hard
dependency early on in the build, which seems to me to be a bad idea -
better to have this rarely-changing and non-critical information be
regenerated when it's needed, and not impose the requirement to build
the HTML docs for all these langs on all builds of gcc.

> 
> If nothing else, can you add a few lines to the commit
> message why this can't be/is preferably not done at build
> time?

The wording here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636060.html
may be what you're looking for.

Maybe that text (or something like it) should go in as a big comment at
the top of regenerate-opt-urls.py ?

Hope this clarifies things
Dave

Re: Propagate value ranges of return values

2023-11-21 Thread Christophe Lyon

Hi!

On Sun, 19 Nov 2023 at 16:05, Jan Hubicka  wrote:
>
> Hi,
> this is an updated version which also adds the testsuite compensation
> I lost earlier while maintaining the patch in my testing tree.
> There are quite a few testcases that use constant return values to hide
> something from the optimizer.
>
> Bootstrapped/regtested x86_64-linux.
> gcc/ChangeLog:
>
> * cgraph.cc (add_detected_attribute_1): New function.
> (cgraph_node::add_detected_attribute): Likewise.
> * cgraph.h (cgraph_node::add_detected_attribute): Declare.
> * common.opt: Add -Wsuggest-attribute=returns_nonnull.
> * doc/invoke.texi: Document new flag.
> * gimple-range-fold.cc (fold_using_range::range_of_call):
> Use known return value ranges.
> * ipa-prop.cc (struct ipa_return_value_summary): New type.
> (class ipa_return_value_sum_t): New type.
> (ipa_return_value_sum): New summary.
> (ipa_record_return_value_range): New function.
> (ipa_return_value_range): New function.
> * ipa-prop.h (ipa_return_value_range): Declare.
> (ipa_record_return_value_range): Declare.
> * ipa-pure-const.cc (warn_function_returns_nonnull): New function.
> * ipa-utils.h (warn_function_returns_nonnull): Declare.
> * symbol-summary.h: Fix comment.
> * tree-vrp.cc (execute_ranger_vrp): Record return values.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ipa/devirt-2.C: Add noipa attribute to prevent ipa-vrp.
> * g++.dg/ipa/devirt-7.C: Disable ipa-vrp.
> * g++.dg/ipa/ipa-icf-2.C: Disable ipa-vrp.
> * g++.dg/ipa/ipa-icf-3.C: Disable ipa-vrp.
> * g++.dg/ipa/ivinline-1.C: Disable ipa-vrp.
> * g++.dg/ipa/ivinline-3.C: Disable ipa-vrp.
> * g++.dg/ipa/ivinline-5.C: Disable ipa-vrp.
> * g++.dg/ipa/ivinline-8.C: Disable ipa-vrp.
> * g++.dg/ipa/nothrow-1.C: Disable ipa-vrp.
> * g++.dg/ipa/pure-const-1.C: Disable ipa-vrp.
> * g++.dg/ipa/pure-const-2.C: Disable ipa-vrp.
> * g++.dg/lto/inline-crossmodule-1_0.C: Disable ipa-vrp.
> * gcc.c-torture/compile/pr106433.c: Add noipa attribute to prevent 
> ipa-vrp.
> * gcc.c-torture/execute/frame-address.c: Likewise.
> * gcc.dg/ipa/fopt-info-inline-1.c: Disable ipa-vrp.
> * gcc.dg/ipa/ipa-icf-25.c: Disable ipa-vrp.
> * gcc.dg/ipa/ipa-icf-38.c: Disable ipa-vrp.
> * gcc.dg/ipa/pure-const-1.c: Disable ipa-vrp.
> * gcc.dg/ipa/remref-0.c: Add noipa attribute to prevent ipa-vrp.
> * gcc.dg/tree-prof/time-profiler-1.c: Disable ipa-vrp.
> * gcc.dg/tree-prof/time-profiler-2.c: Disable ipa-vrp.
> * gcc.dg/tree-ssa/pr110269.c: Disable ipa-vrp.
> * gcc.dg/tree-ssa/pr20701.c: Disable ipa-vrp.
> * gcc.dg/tree-ssa/vrp05.c: Disable ipa-vrp.
> * gcc.dg/tree-ssa/return-value-range-1.c: New test.
>

After this patch in addition to the problem already reported about
vlda1.c and return-value-range-1.c, we have noticed these regressions
on aarch64:
Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/movk.c scan-assembler movk\tx[0-9]+, 0x4667, lsl 16
FAIL: gcc.target/aarch64/movk.c scan-assembler movk\tx[0-9]+, 0x7a3d, lsl 32

Running gcc:gcc.target/aarch64/simd/simd.exp ...
FAIL: gcc.target/aarch64/simd/vmulxd_f64_2.c scan-assembler-times
fmul[ \t]+[dD][0-9]+, ?[dD][0-9]+, ?[dD][0-9]+\n 1
FAIL: gcc.target/aarch64/simd/vmulxd_f64_2.c scan-assembler-times
fmulx[ \t]+[dD][0-9]+, ?[dD][0-9]+, ?[dD][0-9]+\n 4
FAIL: gcc.target/aarch64/simd/vmulxs_f32_2.c scan-assembler-times
fmul[ \t]+[sS][0-9]+, ?[sS][0-9]+, ?[sS][0-9]+\n 1
FAIL: gcc.target/aarch64/simd/vmulxs_f32_2.c scan-assembler-times
fmulx[ \t]+[sS][0-9]+, ?[sS][0-9]+, ?[sS][0-9]+\n 4

We have already sent you a notification for the regression on arm, but
it includes only vla-1.c and return-value-range-1.c.
The notification email contains a pointer to the page where we record
all the configurations that regress because of this patch:

https://linaro.atlassian.net/browse/GNU-1025

Can you have a look?

Thanks,

Christophe




> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index e41e5ad3ae7..71dacf23ce1 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
>return changed;
>  }
>
> +/* Worker to set malloc flag.  */
> +static void
> +add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed)
> +{
> +  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
> +{
> +  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
> +NULL_TREE, DECL_ATTRIBUTES 
> (node->decl));
> +  *changed = true;
> +}
> +
> +  ipa_ref *ref;
> +  FOR_EACH_ALIAS (node, ref)
> +{
> +  cgraph_node *alias = dyn_cast (ref->referring);
> +  if (alias->get_availability () > AVAIL_INTERPOSABLE)
> +   add_detected_attribute_1 (

Re: [PATCH] testsuite: Fix up pr111309-2.c on arm [PR111309]

2023-11-21 Thread Christophe Lyon

On Tue, 21 Nov 2023 at 09:48, Jakub Jelinek  wrote:
>
> Hi!
>
> ARM defaults to -fshort-enums and the following testcase FAILs there in 2
> lines.  The difference is that in C++, E0 has enum E type, which normally
> has unsigned int underlying type, so it isn't int nor something that
> promotes to int, which is why we diagnose it (in C it is promoted to int).
> But with -fshort-enums, the underlying type is unsigned char in that case,
> which promotes to int just fine.
>
> The following patch adjusts the expectations, such that we don't expect
> it on arm or when people manually test with -fshort-enums.
>
> Tested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-11-21  Jakub Jelinek  
>
> PR c/111309
> * c-c++-common/pr111309-2.c (foo): Don't expect errors for C++ with
> -fshort-enums if second argument is E0.
>

Thanks for the fix!

I keep forgetting about the -fshort-enums difference between arm-linux
and arm-eabi targets.
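
As a small sketch of the difference in C: the storage size of an enumeration type depends on -fshort-enums, while an enumeration constant in C keeps type int either way (in C++ the constant has the enumeration type, which is what the adjusted dg-error lines account for).  The enum `E` below and both helper names are hypothetical stand-ins, not copied from pr111309-2.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum with small values, standing in for the one in the
   testcase.  */
enum E { E0, E1 };

/* Storage size chosen for enum E.  With GCC this is typically
   sizeof (int) by default and 1 with -fshort-enums.  */
size_t
enum_storage_size (void)
{
  /* Sanity check: values 0 and 1 never need more than an int.  */
  assert (sizeof (enum E) <= sizeof (int));
  return sizeof (enum E);
}

/* In C the enumeration constant E0 itself has type int, independent of
   -fshort-enums.  */
size_t
enumerator_size (void)
{
  return sizeof (E0);
}
```

Building this once with and once without -fshort-enums and comparing the result of `enum_storage_size` should show the difference between the two target defaults.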

Christophe

> --- gcc/testsuite/c-c++-common/pr111309-2.c.jj  2023-11-14 10:52:16.191276028 
> +0100
> +++ gcc/testsuite/c-c++-common/pr111309-2.c 2023-11-20 17:52:30.606386073 
> +0100
> @@ -32,7 +32,7 @@ foo (void)
>__builtin_clzg (0U, 2LL);/* { dg-error "does not have 'int' type" } */
>__builtin_clzg (0U, 2U); /* { dg-error "does not have 'int' type" } */
>__builtin_clzg (0U, true);
> -  __builtin_clzg (0U, E0); /* { dg-error "does not have 'int' type" "" { 
> target c++ } } */
> +  __builtin_clzg (0U, E0); /* { dg-error "does not have 'int' type" "" { 
> target { c++ && { ! short_enums } } } } */
>__builtin_ctzg ();   /* { dg-error "too few arguments" } */
>__builtin_ctzg (0U, 1, 2);   /* { dg-error "too many arguments" } */
>__builtin_ctzg (0);  /* { dg-error "has signed type" } */
> @@ -51,7 +51,7 @@ foo (void)
>__builtin_ctzg (0U, 2LL);/* { dg-error "does not have 'int' type" } */
>__builtin_ctzg (0U, 2U); /* { dg-error "does not have 'int' type" } */
>__builtin_ctzg (0U, true);
> -  __builtin_ctzg (0U, E0); /* { dg-error "does not have 'int' type" "" { 
> target c++ } } */
> +  __builtin_ctzg (0U, E0); /* { dg-error "does not have 'int' type" "" { 
> target { c++ && { ! short_enums } } } } */
>__builtin_clrsbg (); /* { dg-error "too few arguments" } */
>__builtin_clrsbg (0, 1); /* { dg-error "too many arguments" } */
>__builtin_clrsbg (0U);   /* { dg-error "has unsigned type" } */
>
> Jakub
>

Re: Propagate value ranges of return values

2023-11-21 Thread Jan Hubicka

> After this patch in addition to the problem already reported about
> vlda1.c and return-value-range-1.c, we have noticed these regressions
> on aarch64:
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/movk.c scan-assembler movk\tx[0-9]+, 0x4667, lsl 16
> FAIL: gcc.target/aarch64/movk.c scan-assembler movk\tx[0-9]+, 0x7a3d, lsl 32
> 
> Running gcc:gcc.target/aarch64/simd/simd.exp ...
> FAIL: gcc.target/aarch64/simd/vmulxd_f64_2.c scan-assembler-times
> fmul[ \t]+[dD][0-9]+, ?[dD][0-9]+, ?[dD][0-9]+\n 1
> FAIL: gcc.target/aarch64/simd/vmulxd_f64_2.c scan-assembler-times
> fmulx[ \t]+[dD][0-9]+, ?[dD][0-9]+, ?[dD][0-9]+\n 4
> FAIL: gcc.target/aarch64/simd/vmulxs_f32_2.c scan-assembler-times
> fmul[ \t]+[sS][0-9]+, ?[sS][0-9]+, ?[sS][0-9]+\n 1
> FAIL: gcc.target/aarch64/simd/vmulxs_f32_2.c scan-assembler-times
> fmulx[ \t]+[sS][0-9]+, ?[sS][0-9]+, ?[sS][0-9]+\n 4

Sorry for that - I guess we will see some of these on various targets.
This is quite a common issue - the testcase has a
dummy_number_generator function that returns a constant, and it prevents
inlining to keep the constant from being visible to the compiler.  This no
longer works, since we now get the constant from the return value range.
This should fix it.

return-value-range-1.c should be fixed now and I do not have vlda1.c in
my tree.  I will check.

diff --git a/gcc/testsuite/gcc.target/aarch64/movk.c 
b/gcc/testsuite/gcc.target/aarch64/movk.c
index e6e4e3a8961..6b1f3f8ecf5 100644
--- a/gcc/testsuite/gcc.target/aarch64/movk.c
+++ b/gcc/testsuite/gcc.target/aarch64/movk.c
@@ -1,8 +1,9 @@
 /* { dg-do run } */
-/* { dg-options "-O2 --save-temps -fno-inline" } */
+/* { dg-options "-O2 --save-temps" } */
 
 extern void abort (void);
 
+__attribute__ ((noipa))
 long long int
 dummy_number_generator ()
 {
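
To spell out the idea behind the change above, here is a minimal C sketch contrasting the two attributes.  The function names, the constant and the `NOIPA` macro are hypothetical; the fallback branch of the macro is an assumption for compilers that do not know `noipa`.

```c
#include <assert.h>

/* noipa is a GCC attribute; fall back to nothing elsewhere
   (assumption: other compilers may not accept it).  */
#if defined (__GNUC__) && !defined (__clang__)
# define NOIPA __attribute__ ((noipa))
#else
# define NOIPA
#endif

/* noinline only blocks inlining.  With return value range propagation
   the caller can still learn that the result is always this constant.  */
__attribute__ ((noinline)) long long
visible_constant (void)
{
  return 0x12345678LL;
}

/* noipa blocks interprocedural analysis as a whole, so the constant is
   meant to stay hidden from callers.  */
NOIPA long long
hidden_constant (void)
{
  return 0x12345678LL;
}

/* Hypothetical caller that consumes a generated value.  */
long long
add_offset (long long (*gen) (void), long long offset)
{
  assert (gen != 0);
  return gen () + offset;
}
```

Both functions return the same value at run time; the difference is only in what the optimizer is allowed to assume about them at the call site.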

> 
> We have already sent you a notification for the regression on arm, but
> it includes only vla-1.c and return-value-range-1.c.
> The notification email contains a pointer to the page where we record
> all the configurations that regress because of this patch:
> 
> https://linaro.atlassian.net/browse/GNU-1025
> 
> Can you have a look?
> 
> Thanks,
> 
> Christophe
> 
> 
> 
> 
> > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> > index e41e5ad3ae7..71dacf23ce1 100644
> > --- a/gcc/cgraph.cc
> > +++ b/gcc/cgraph.cc
> > @@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
> >return changed;
> >  }
> >
> > +/* Worker to set malloc flag.  */
> > +static void
> > +add_detected_attribute_1 (cgraph_node *node, const char *attr, bool 
> > *changed)
> > +{
> > +  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
> > +{
> > +  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
> > +NULL_TREE, DECL_ATTRIBUTES 
> > (node->decl));
> > +  *changed = true;
> > +}
> > +
> > +  ipa_ref *ref;
> > +  FOR_EACH_ALIAS (node, ref)
> > +{
> > +  cgraph_node *alias = dyn_cast (ref->referring);
> > +  if (alias->get_availability () > AVAIL_INTERPOSABLE)
> > +   add_detected_attribute_1 (alias, attr, changed);
> > +}
> > +
> > +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
> > +if (e->caller->thunk
> > +   && (e->caller->get_availability () > AVAIL_INTERPOSABLE))
> > +  add_detected_attribute_1 (e->caller, attr, changed);
> > +}
> > +
> > +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
> > +
> > +bool
> > +cgraph_node::add_detected_attribute (const char *attr)
> > +{
> > +  bool changed = false;
> > +
> > +  if (get_availability () > AVAIL_INTERPOSABLE)
> > +add_detected_attribute_1 (this, attr, &changed);
> > +  else
> > +{
> > +  ipa_ref *ref;
> > +
> > +  FOR_EACH_ALIAS (this, ref)
> > +   {
> > + cgraph_node *alias = dyn_cast (ref->referring);
> > + if (alias->get_availability () > AVAIL_INTERPOSABLE)
> > +   add_detected_attribute_1 (alias, attr, &changed);
> > +   }
> > +}
> > +  return changed;
> > +}
> > +
> >  /* Worker to set noreturng flag.  */
> >  static void
> >  set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index cedaaac3a45..cfdd9f693a8 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -1190,6 +1190,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
> > public symtab_node
> >
> >bool set_pure_flag (bool pure, bool looping);
> >
> > +  /* Add attribute ATTR to cgraph_node's decl and on aliases of the node
> > + if any.  */
> > +  bool add_detected_attribute (const char *attr);
> > +
> >/* Call callback on function and aliases associated to the function.
> >   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks 
> > are
> >   skipped. */
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index d21db5d4a20..c6599c7147b 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -781,6 +781,10 @@ Wsuggest-attribute=malloc
> >

Re: [PATCH] RISC-V: testsuite: Fix popcount test.

2023-11-21 Thread juzhe.zhong

ok

---- Replied Message ----
From: Robin Dapp
Date: 11/21/2023 21:35
To: gcc-patches, palmer, Kito Cheng, jeffreyalaw, juzhe.zh...@rivai.ai
Cc: rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: testsuite: Fix popcount test.

> Mhm, not so obvious after all.  We vectorize 250 instances with
> rv32gcv, 229 with rv64gcv and 250 with rv64gcv_zbb.  Will have
> another look tomorrow.

The problem is that tree-vect-patterns is more restrictive than
necessary and does not vectorize everything it could.  Therefore
I'm going to commit the attached with a TODO comment and a
separate check for zbb.

Regards
 Robin

Subject: [PATCH v2] RISC-V: testsuite: Fix popcount test.

Due to Jakub's recent middle-end changes we now vectorize some more
popcount instances.  This patch just adjusts the dump check.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/unop/popcount.c: Adjust check.
    * lib/target-supports.exp: Add riscv_zbb.
---
 .../gcc.target/riscv/rvv/autovec/unop/popcount.c  | 10 +-
 gcc/testsuite/lib/target-supports.exp | 11 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
index 585a522aa81..ca1319c2e7e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
@@ -1461,4 +1461,12 @@ main ()
   RUN_ALL ()
 }
  
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
+/* TODO: Due to an over-zealous check in tree-vect-patterns we do not vectorize
+   e.g.
+ uint64_t dst[];
+ uint32_t src[];
+ dst[i] = __builtin_popcountll (src[i]);
+   even though we could.  Therefore, for now, adjust the following checks.
+   This difference was exposed in r14-5557-g6dd4c703be17fa.  */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" { target { { rv64 } && { ! riscv_zbb } } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 250 "vect" { target { { rv32 } || { riscv_zbb } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f3cd0311e27..87b2ae58720 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1983,6 +1983,17 @@ proc check_effective_target_riscv_ztso { } {
 }]
 }
  
+# Return 1 if the target arch supports the Zbb extension, 0 otherwise.
+# Cache the result.
+
+proc check_effective_target_riscv_zbb { } {
+    return [check_no_compiler_messages riscv_ext_zbb assembly {
+   #ifndef __riscv_zbb
+   #error "Not __riscv_zbb"
+   #endif
+    }]
+}
+
 # Return 1 if we can execute code when using dg-add-options riscv_v
  
 proc check_effective_target_riscv_v_ok { } {
--  
2.42.0

Re: [PATCH 0/4] v2 of Option handling: add documentation URLs

2023-11-21 Thread Tobias Burnus


On 21.11.23 14:57, David Malcolm wrote:
> On Tue, 2023-11-21 at 02:09 +0100, Hans-Peter Nilsson wrote:
>> Sorry for barging in though I did try finding the relevant
>> discussion, but is committing this generated stuff necessary?
>> Is it because we don't want to depend on Python being
>> present at build time?
>
> Partly, yes, [...]


I wonder how to ensure that this remains up to date. Should there be an
item at

https://gcc.gnu.org/branching.html and/or
https://gcc.gnu.org/releasing.html similar to the .pot generation?

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Commercial register: 
München, HRB 106955

Re: Pushed: LoongArch: Fix libgcc build failure when libc is not available (was Re: genopts: Add infrastructure to generate code for new features in ISA evolution)

2023-11-21 Thread Jeff Law





On 11/20/23 20:09, Xi Ruoyao wrote:

On Tue, 2023-11-21 at 08:00 +0800, Xi Ruoyao wrote:

/* snip */


This has broken libgcc builds when target libc isn't yet available.

In file included from 
/scratch/jmyers/glibc-bot/src/gcc/libgcc/../gcc/config/loongarch/loongarch-def.h:49,
  from 
/scratch/jmyers/glibc-bot/src/gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:24,
  from ../.././gcc/options.h:8,
  from ../.././gcc/tm.h:49,
  from /scratch/jmyers/glibc-bot/src/gcc/libgcc/libgcc2.c:29:
/scratch/jmyers/glibc-bot/build/compilers/loongarch64-linux-gnu-lp64d/gcc-first/gcc/include/stdint.h:9:16:
 fatal error: stdint.h: No such file or directory
    9 | # include_next <stdint.h>
      |                ^~~~~~~~~~
compilation terminated.
make[3]: *** [Makefile:505: _muldi3.o] Error 1

[ ... ]
Thanks.  My tester had been tripping over that for a couple days.

Jeff

Re: GCC/Rust libgrust-v2/to-submit branch

2023-11-21 Thread Arthur Cohen


Hi Thomas!

A newer version of the library has been force-pushed to the branch 
`libgrust-v2/to-submit`.


On 11/20/23 15:55, Thomas Schwinge wrote:

Hi!

Arthur and Pierre-Emmanuel have prepared a GCC/Rust libgrust-v2/to-submit
branch: .
In that one, most of the issues raised have been addressed, and which
I've now successfully "tested" in my different GCC configurations,
requiring just one additional change (see end of this email).  I'm using
"tested" in quotes here, as libgrust currently is still missing its
eventual content, and still is without actual users, so we may still be
up for surprises later on.  ;-)

On 2023-10-27T22:41:52+0200, I wrote:

On 2023-09-27T00:25:16+0200, I wrote:

don't we also directly need to
incorporate here a few GCC/Rust master branch follow-on commits, like:

   - commit 171ea4e2b3e202067c50f9c206974fbe1da691c0 "fixup: Fix bootstrap 
build"
   - commit 61cbe201029658c32e5c360823b9a1a17d21b03c "fixup: Fix missing build 
dependency"


I've not yet run into the need for these two.  Let's please leave these
out of the upstream submission for now, until we understand what exactly
these are necessary for.


(Still the same.)


Do you mean that we should remove the content of these commits from the 
submission? If so, I believe it's now done.





However:


   - commit 6a8b207b9ef7f9038e0cae7766117428783825d8 "libgrust: Add dependency to 
libstdc++"


... this one definitely is necessary right now; see discussion in

"Disable target libgrust if we're not building target libstdc++".


This one still isn't in the GCC/Rust libgrust-v2/to-submit branch -- but
having now tested that branch, I'm now no longer seeing the respective
build failure.  Isn't that change "libgrust: Add dependency to libstdc++"
still necessary, conceptually?  (Maybe we're just lucky, currently?)
I'll be sure to re-test in my different GCC configurations once libgrust
gains actual content and use.  (..., which might then re-expose the
original problem?)


This commit was integrated into another one:

fb31093105e build: Add libgrust as compilation modules

(on libgrust-v2/to-submit as of 2 minutes ago)




And:


(Not sure if all of these are necessary and/or if that's the complete
list; haven't looked up the corresponding GCC/Rust GitHub PRs.)


--- a/gcc/rust/config-lang.in
+++ b/gcc/rust/config-lang.in



+target_libs="target-libffi target-libbacktrace target-libgrust"


Please don't add back 'target-libffi' and 'target-libbacktrace' here;
just 'target-libgrust'.  (As is present in GCC/Rust master branch, and
per commit 7411eca498beb13729cc2acec77e68250940aa81
"Rust: Don't depend on unused 'target-libffi', 'target-libbacktrace'".)


... that change is necessary, too.


That's still unchanged in the GCC/Rust libgrust-v2/to-submit branch;
please apply to 'gcc/rust/config-lang.in':

 -target_libs="target-libffi target-libbacktrace target-libgrust"
 +target_libs=target-libgrust

Then, we should still re-order the commits so that (re)generation of
auto-generated files comes before use of libgrust (so that later
bisection doesn't break), and move the 'contrib/gcc_update' update into
the commit that adds the auto-generated files.


Do you mean that the regeneration should happen before the commit adding 
the proc_macro library? Or that when we keep going and adding more 
commits on top of this, we need to make sure the regeneration commit 
happens before any code starts using/depending on libgrust/?


And alright, we'll move the changes to contrib/gcc_update into the 
regeneration commit.


All the best, and thanks again for testing :)

Arthur




Regards
  Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Commercial register: München, HRB 106955

[PATCH] Move VF based dependence check

2023-11-21 Thread Richard Biener

The following moves the check whether the maximum vectorization
factor determined by data dependence analysis is in conflict with
the chosen vectorization factor to after the point where we applied
both the SLP and the unrolling adjustment to the vectorization
factor.  We check the latter before applying unrolling, but the
SLP adjustment can result in both missed optimization and wrong-code.
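
The effect of the ordering can be illustrated with a small stand-alone
sketch.  It is not GCC code: kUnlimited, violates_dependence, check_early
and check_late are hypothetical names, and the SLP/unrolling adjustment is
modelled simply as replacing a preliminary factor by a final one, under the
assumption that the adjustment may either raise or lower it.

```cpp
#include <climits>

// Stand-in for MAX_VECTORIZATION_FACTOR ("dependence analysis found no limit").
constexpr unsigned kUnlimited = UINT_MAX;

enum class Result { Ok, BadDataDependence };

// Same shape as the moved condition: fail when a real limit exists and it
// is smaller than the vectorization factor under test.
constexpr bool violates_dependence(unsigned max_vf, unsigned vf) {
  return max_vf != kUnlimited && max_vf < vf;
}

// Hypothetical model of the old placement: only the preliminary factor is
// checked, whatever the later adjustment produces goes unchecked.
constexpr Result check_early(unsigned max_vf, unsigned prelim_vf,
                             unsigned /*final_vf*/) {
  return violates_dependence(max_vf, prelim_vf) ? Result::BadDataDependence
                                                : Result::Ok;
}

// Hypothetical model of the new placement: check once the factor is final.
constexpr Result check_late(unsigned max_vf, unsigned /*prelim_vf*/,
                            unsigned final_vf) {
  return violates_dependence(max_vf, final_vf) ? Result::BadDataDependence
                                               : Result::Ok;
}

// Adjustment raises the factor from 2 to 8 past a limit of 4:
// the early check accepts a loop it should have rejected.
static_assert(check_early(4, 2, 8) == Result::Ok, "early check misses it");
static_assert(check_late(4, 2, 8) == Result::BadDataDependence, "late check catches it");

// Adjustment lowers the factor from 8 to 4, within a limit of 4:
// the early check rejects a loop that would have been fine.
static_assert(check_early(4, 8, 4) == Result::BadDataDependence, "early check too strict");
static_assert(check_late(4, 8, 4) == Result::Ok, "late check accepts it");
```

The first pair of assertions corresponds to the wrong-code direction, the
second to the missed-optimization direction.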

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-loop.cc (vect_analyze_loop_2): Move check
of VF against max_vf until VF is final.
---
 gcc/tree-vect-loop.cc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 58679e91c0a..a73a533beb1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2817,9 +2817,6 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal,
 "can't determine vectorization factor.\n");
   return ok;
 }
-  if (max_vf != MAX_VECTORIZATION_FACTOR
-  && maybe_lt (max_vf, LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
-return opt_result::failure_at (vect_location, "bad data dependence.\n");
 
   /* Compute the scalar iteration cost.  */
   vect_compute_single_scalar_iteration_cost (loop_vinfo);
@@ -2881,6 +2878,10 @@ start_over:
   LOOP_VINFO_INT_NITERS (loop_vinfo));
 }
 
+  if (max_vf != MAX_VECTORIZATION_FACTOR
+  && maybe_lt (max_vf, LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+return opt_result::failure_at (vect_location, "bad data dependence.\n");
+
   loop_vinfo->vector_costs = init_cost (loop_vinfo, false);
 
   /* Analyze the alignment of the data-refs in the loop.
-- 
2.35.3

[PATCH] tree-optimization/112623 - forwprop VEC_PACK_TRUNC generation

2023-11-21 Thread Richard Biener

For vec_pack_trunc patterns there can be an ambiguity for the
source mode for BFmode vs HFmode.  The vectorizer checks
the insn's operand mode for this, the following makes forwprop
do the same.  That of course doesn't help if the target supports
both conversions.
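
As a rough illustration of the ambiguity, here is a stand-alone sketch.
It is not GCC code: Mode, Pattern, kPatterns and the usable_* helpers are
hypothetical, and the table is assumed to hold at most one pattern per
lookup key, with result_mode standing in for the mode of the matched
insn's operand 0.

```cpp
#include <cstddef>

// Hypothetical element modes, for illustration only.
enum class Mode { SF, HF, BF };

// One registered pattern: looked up by key_mode, producing result_mode.
struct Pattern {
  Mode key_mode;
  Mode result_mode;
};

// Hypothetical target that only provides an SF -> HF pattern.
constexpr Pattern kPatterns[] = { { Mode::SF, Mode::HF } };

// Keyed lookup: returns the pattern registered for key_mode, or nullptr.
constexpr const Pattern *find_pattern(Mode key_mode) {
  for (std::size_t i = 0; i < sizeof kPatterns / sizeof kPatterns[0]; ++i)
    if (kPatterns[i].key_mode == key_mode)
      return &kPatterns[i];
  return nullptr;
}

// Lookup alone: any pattern registered for the key is accepted, even if it
// produces the wrong 16-bit float format.
constexpr bool usable_unchecked(Mode key_mode) {
  return find_pattern(key_mode) != nullptr;
}

// Lookup plus a comparison of the pattern's result mode with the wanted one.
constexpr bool usable_checked(Mode key_mode, Mode wanted) {
  return find_pattern(key_mode) != nullptr
         && find_pattern(key_mode)->result_mode == wanted;
}

static_assert(usable_unchecked(Mode::SF), "a pattern exists for SF");
static_assert(usable_checked(Mode::SF, Mode::HF), "HF result matches");
static_assert(!usable_checked(Mode::SF, Mode::BF), "BF result is rejected");
```

With a single slot per key the table cannot describe a target that offers
both an HF and a BF result, which is the limitation the last sentence above
refers to.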

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/112623
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Check the source mode of the insn for vector pack/unpacks.

* gcc.target/i386/pr112623.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr112623.c | 11 +++++++++++
 gcc/tree-ssa-forwprop.cc                 | 13 +++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112623.c

diff --git a/gcc/testsuite/gcc.target/i386/pr112623.c b/gcc/testsuite/gcc.target/i386/pr112623.c
new file mode 100644
index 000..c4ebacec85c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112623.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mavx512vl -mavx512fp16" } */
+
+typedef __bf16 __attribute__((__vector_size__ (16))) BF;
+typedef float __attribute__((__vector_size__ (32))) F;
+
+BF
+foo (F f)
+{
+  return __builtin_convertvector (f, BF);
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index d39dfc1065f..0fb21e58138 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "cfganal.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
 #include "internal-fn.h"
@@ -2978,6 +2980,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
  /* Only few targets implement direct conversion patterns so try
 some simple special cases via VEC_[UN]PACK[_FLOAT]_LO_EXPR.  */
  optab optab;
+ insn_code icode;
  tree halfvectype, dblvectype;
  enum tree_code unpack_op;
 
@@ -3015,8 +3018,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
  && (optab = optab_for_tree_code (unpack_op,
   dblvectype,
   optab_default))
- && (optab_handler (optab, TYPE_MODE (dblvectype))
- != CODE_FOR_nothing))
+ && ((icode = optab_handler (optab, TYPE_MODE (dblvectype)))
+ != CODE_FOR_nothing)
+ && (insn_data[icode].operand[0].mode == TYPE_MODE (type)))
{
  gimple_seq stmts = NULL;
  tree dbl;
@@ -3054,8 +3058,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   && (optab = optab_for_tree_code (VEC_PACK_TRUNC_EXPR,
halfvectype,
optab_default))
-  && (optab_handler (optab, TYPE_MODE (halfvectype))
-  != CODE_FOR_nothing))
+  && ((icode = optab_handler (optab, TYPE_MODE (halfvectype)))
+  != CODE_FOR_nothing)
+  && (insn_data[icode].operand[0].mode == TYPE_MODE (type)))
{
  gimple_seq stmts = NULL;
  tree low = gimple_build (&stmts, BIT_FIELD_REF, halfvectype,
-- 
2.35.3

Re: [PATCH] tree-optimization/112623 - forwprop VEC_PACK_TRUNC generation

2023-11-21 Thread Jakub Jelinek

On Tue, Nov 21, 2023 at 02:35:02PM +, Richard Biener wrote:
> For vec_pack_trunc patterns there can be an ambiguity for the
> source mode for BFmode vs HFmode.  The vectorizer checks
> the insns operand mode for this, the following makes forwprop
> do the same.  That of course doesn't help if the target supports
> both conversions.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/112623
>   * tree-ssa-forwprop.cc (simplify_vector_constructor):
>   Check the source mode of the insn for vector pack/unpacks.
> 
>   * gcc.target/i386/pr112623.c: New testcase.

LGTM, thanks.

Jakub

[PATCH] vect: Allow reduc_index != 1 for COND_OPs.

2023-11-21 Thread Robin Dapp

Hi,

in PR112406 Tamar found another problem with COND_OP reductions.
I wrongly assumed that the reduction variable will always remain in
operand 1, just as we create the COND_OP in ifcvt.  But of course,
addition being commutative, we are free to swap operand 1 and 2 and
can end up with e.g.

 _ifc__60 = .COND_ADD (_2, _6, MADPictureC1_lsm.10_25, MADPictureC1_lsm.10_25);

which does not pass the asserts I put in place.

This patch removes this restriction and allows the reduction index to be
2 as well.
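
The index arithmetic involved can be checked with a small stand-alone
sketch.  It is not the vectorizer code: CondOp, kElseIndex and the helper
functions are hypothetical, SSA names are modelled as plain integers, and
the "else" operand is assumed to sit at index 3 as in the example above.

```cpp
// Operand layout assumed from the example in this mail:
//   .COND_ADD (mask, a, b, else)
//   ops[0] = mask, ops[1] = a, ops[2] = b, ops[3] = else
struct CondOp {
  int ops[4];
};

// Stand-in for the value internal_fn_else_index returns for this call shape.
constexpr int kElseIndex = 3;

// The reduction variable may now be either of the two value operands.
constexpr bool valid_reduc_index(int reduc_index) {
  return reduc_index == 1 || reduc_index == 2;
}

// Index of the non-reduction value operand: 1 -> 2 and 2 -> 1.
constexpr int other_operand_index(int reduc_index) {
  return 2 + (1 - reduc_index);
}

// The relaxed assertion: the reduction operand must equal the else operand.
constexpr bool reduction_feeds_else(const CondOp &op, int reduc_index) {
  return op.ops[reduc_index] == op.ops[kElseIndex];
}

// The example above, with _2 -> 2, _6 -> 6, MADPictureC1_lsm.10_25 -> 25.
constexpr CondOp kExample = { { 2, 6, 25, 25 } };

static_assert(valid_reduc_index(2), "index 2 is allowed now");
static_assert(other_operand_index(1) == 2, "reduction in 1, other in 2");
static_assert(other_operand_index(2) == 1, "reduction in 2, other in 1");
static_assert(reduction_feeds_else(kExample, 2), "swapped form passes");
static_assert(!reduction_feeds_else(kExample, 1), "old fixed index fails");
```

The last assertion shows why the previous check, which hard-coded operand 1,
tripped on the swapped form.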

Bootstrapped and regtested on aarch64 and regtested on riscv.  x86 is
still running.

Regards
 Robin

gcc/ChangeLog:

PR middle-end/112406

* tree-vect-loop.cc (vectorize_fold_left_reduction): Allow
reduction index != 1.
(vect_transform_reduction): Handle reduction index != 1.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr112406-2.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/pr112406-2.c | 20 ++++++++++++++++++++
 gcc/tree-vect-loop.cc                         | 15 ++++++++++-----
 2 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr112406-2.c

diff --git a/gcc/testsuite/gcc.target/aarch64/pr112406-2.c b/gcc/testsuite/gcc.target/aarch64/pr112406-2.c
new file mode 100644
index 000..bb6e9cf7c70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr112406-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-march=armv8-a+sve -Ofast" } */
+
+double MADPictureC1;
+extern int PictureRejected[];
+int PictureMAD_0, MADModelEstimator_n_windowSize_i, MADModelEstimator_n_windowSize_oneSampleQ;
+
+void MADModelEstimator_n_windowSize() {
+  int estimateX2 = 0;
+  for (; MADModelEstimator_n_windowSize_i; MADModelEstimator_n_windowSize_i++) {
+if (MADModelEstimator_n_windowSize_oneSampleQ &&
+!PictureRejected[MADModelEstimator_n_windowSize_i])
+  estimateX2 = 1;
+if (!PictureRejected[MADModelEstimator_n_windowSize_i])
+  MADPictureC1 += PictureMAD_0;
+  }
+  if (estimateX2)
+for (;;)
+  ;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 58679e91c0a..044eacddf7e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7070,7 +7070,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 op0 = ops[1 - reduc_index];
   else
 {
-  op0 = ops[2];
+  op0 = ops[2 + (1 - reduc_index)];
   opmask = ops[0];
   gcc_assert (!slp_node);
 }
@@ -8455,7 +8455,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
  || code == IFN_COND_MUL || code == IFN_COND_AND
  || code == IFN_COND_IOR || code == IFN_COND_XOR);
-  gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
+  gcc_assert (op.num_ops == 4
+ && (op.ops[reduc_index]
+ == op.ops[internal_fn_else_index ((internal_fn) code)]));
 }
 
   bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
@@ -8498,12 +8500,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 {
   /* For a conditional operation pass the truth type as mask
 vectype.  */
-  gcc_assert (single_defuse_cycle && reduc_index == 1);
+  gcc_assert (single_defuse_cycle
+ && (reduc_index == 1 || reduc_index == 2));
   vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
 op.ops[0], &vec_oprnds0,
 truth_type_for (vectype_in),
-NULL_TREE, &vec_oprnds1, NULL_TREE,
-op.ops[2], &vec_oprnds2, NULL_TREE);
+reduc_index == 1 ? NULL_TREE : op.ops[1],
+&vec_oprnds1, NULL_TREE,
+reduc_index == 2 ? NULL_TREE : op.ops[2],
+&vec_oprnds2, NULL_TREE);
 }
 
   /* For single def-use cycles get one copy of the vectorized reduction
-- 
2.42.0

Re: [PATCH] vect: Allow reduc_index != 1 for COND_OPs.

2023-11-21 Thread Richard Biener

On Tue, 21 Nov 2023, Robin Dapp wrote:

> Hi,
> 
> in PR112406 Tamar found another problem with COND_OP reductions.
> I wrongly assumed that the reduction variable will always remain in
> operand 1, just as we create the COND_OP in ifcvt.  But of course,
> addition being commutative, we are free to swap operand 1 and 2 and
> can end up with e.g.
> 
>  _ifc__60 = .COND_ADD (_2, _6, MADPictureC1_lsm.10_25, MADPictureC1_lsm.10_25);
> 
> which does not pass the asserts I put in place.
> 
> This patch removes this restriction and allows the reduction index to be
> 2 as well.
> 
> Bootstrapped and regtested on aarch64 and regtested on riscv.  x86 is
> still running.

LGTM.

Thanks,
Richard.

> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   PR middle-end/112406
> 
>   * tree-vect-loop.cc (vectorize_fold_left_reduction): Allow
>   reduction index != 1.
>   (vect_transform_reduction): Handle reduction index != 1.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/pr112406-2.c: New test.
> ---
>  gcc/testsuite/gcc.target/aarch64/pr112406-2.c | 20 +++
>  gcc/tree-vect-loop.cc | 15 +-
>  2 files changed, 30 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr112406-2.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr112406-2.c b/gcc/testsuite/gcc.target/aarch64/pr112406-2.c
> new file mode 100644
> index 000..bb6e9cf7c70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr112406-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-options "-march=armv8-a+sve -Ofast" } */
> +
> +double MADPictureC1;
> +extern int PictureRejected[];
> +int PictureMAD_0, MADModelEstimator_n_windowSize_i, MADModelEstimator_n_windowSize_oneSampleQ;
> +
> +void MADModelEstimator_n_windowSize() {
> +  int estimateX2 = 0;
> +  for (; MADModelEstimator_n_windowSize_i; MADModelEstimator_n_windowSize_i++) {
> +if (MADModelEstimator_n_windowSize_oneSampleQ &&
> +!PictureRejected[MADModelEstimator_n_windowSize_i])
> +  estimateX2 = 1;
> +if (!PictureRejected[MADModelEstimator_n_windowSize_i])
> +  MADPictureC1 += PictureMAD_0;
> +  }
> +  if (estimateX2)
> +for (;;)
> +  ;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 58679e91c0a..044eacddf7e 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7070,7 +7070,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  op0 = ops[1 - reduc_index];
>else
>  {
> -  op0 = ops[2];
> +  op0 = ops[2 + (1 - reduc_index)];
>opmask = ops[0];
>gcc_assert (!slp_node);
>  }
> @@ -8455,7 +8455,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
> || code == IFN_COND_MUL || code == IFN_COND_AND
> || code == IFN_COND_IOR || code == IFN_COND_XOR);
> -  gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +  gcc_assert (op.num_ops == 4
> +   && (op.ops[reduc_index]
> +   == op.ops[internal_fn_else_index ((internal_fn) code)]));
>  }
>  
>bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> @@ -8498,12 +8500,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  {
>/* For a conditional operation pass the truth type as mask
>vectype.  */
> -  gcc_assert (single_defuse_cycle && reduc_index == 1);
> +  gcc_assert (single_defuse_cycle
> +   && (reduc_index == 1 || reduc_index == 2));
>vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>op.ops[0], &vec_oprnds0,
>truth_type_for (vectype_in),
> -  NULL_TREE, &vec_oprnds1, NULL_TREE,
> -  op.ops[2], &vec_oprnds2, NULL_TREE);
> +  reduc_index == 1 ? NULL_TREE : op.ops[1],
> +  &vec_oprnds1, NULL_TREE,
> +  reduc_index == 2 ? NULL_TREE : op.ops[2],
> +  &vec_oprnds2, NULL_TREE);
>  }
>  
>/* For single def-use cycles get one copy of the vectorized reduction
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
