Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-24 Thread Kewen.Lin via Gcc-patches
Hi Alexandre,

on 2023/5/24 13:51, Alexandre Oliva wrote:
> 
> Codegen changes caused add instruction count mismatches on
> ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
> expected counts were adjusted for lp64, but ilp32 differences
> remained, and published test results confirm it.

Thanks for fixing this.  I tested it on ppc64le and ppc64 {-m64,-m32}
as well.

> 
> Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
> with gcc-12.
> 
> for  gcc/testsuite/ChangeLog

I think this is for PR101169; could you add a PR marker for it?

> 
>   * gcc.target/powerpc/fold-vec-extract-char.p7.c: Adjust addi
>   counts for ilp32.
>   * gcc.target/powerpc/fold-vec-extract-double.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
> ---
>  .../gcc.target/powerpc/fold-vec-extract-char.p7.c  |3 ++-
>  .../powerpc/fold-vec-extract-double.p7.c   |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-float.p7.c |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-float.p8.c |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-int.p7.c   |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-int.p8.c   |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-short.p7.c |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-short.p8.c |2 +-
>  8 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
> index 29a8aa84db282..c6647431d09c9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
> @@ -11,7 +11,8 @@
>  /* one extsb (extend sign-bit) instruction generated for each test against
> unsigned types */
> 
> -/* { dg-final { scan-assembler-times {\maddi\M} 9 } } */
> +/* { dg-final { scan-assembler-times {\maddi\M} 9 { target { lp64 } } } } */
> +/* { dg-final { scan-assembler-times {\maddi\M} 6 { target { ilp32 } } } } */
>  /* { dg-final { scan-assembler-times {\mli\M} 6 } } */
>  /* { dg-final { scan-assembler-times {\mstxvw4x\M|\mstvx\M|\mstxv\M} 6 } } */
>  /* -m32 target uses rlwinm in place of rldicl. */
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> index 3cae644b90b71..db325efbb07ff 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> @@ -14,7 +14,7 @@
>  /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
>  /* -m32 target has an 'add' in place of one of the 'addi'. */
>  /* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target ilp32 } } 
> } */

So both lp64 and ilp32 now have the same count; could we merge the two
directives and remove the selectors?
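
For illustration only, a merged directive would presumably follow the shape of the unselected scan-assembler-times lines already present in these tests.  This is a sketch of the suggestion, assuming the count of 2 holds for both ABIs; it is not taken from any committed revision:

```c
/* Same count for lp64 and ilp32, so no target selector is needed.  */
/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */
```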

>  /* -m32 target has a rlwinm in place of a rldic .  */
>  /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> index 59a4979457dcb..42ec69475fd07 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> @@ -13,7 +13,7 @@
>  /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
>  /* -m32 as an add in place of an addi. */
>  /* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target ilp32 } } 
> } */

Ditto.

>  /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 1 } } */
>  /* -m32 uses rlwinm in place of rldic */
>  /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
> index 4b1d75ee26d0f..68de4b307 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
> @@ -26,7 +26,7 @@
>  /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {\madd\M} 1 { target ilp32 } } } 

Re: [V7][PATCH 1/2] Handle component_ref to a structure/union field including flexible array member [PR101832]

2023-05-24 Thread Bernhard Reutner-Fischer via Gcc-patches
On 24 May 2023 16:09:21 CEST, Qing Zhao  wrote:
>Bernhard,
>
>Thanks a lot for your comments.
>
>> On May 19, 2023, at 7:11 PM, Bernhard Reutner-Fischer 
>>  wrote:
>> 
>> On Fri, 19 May 2023 20:49:47 +
>> Qing Zhao via Gcc-patches  wrote:
>> 
>>> GCC extension accepts the case when a struct with a flexible array member
>>> is embedded into another struct or union (possibly recursively).
>> 
>> Do you mean TYPE_TRAILING_FLEXARRAY()?
>
>The following might be more accurate description:
>
>GCC extension accepts the case when a struct with a flexible array member
> is embedded into another struct or union (possibly recursively) as the last 
> field.
>
>
>
>> 
>>> diff --git a/gcc/tree.h b/gcc/tree.h
>>> index 0b72663e6a1..237644e788e 100644
>>> --- a/gcc/tree.h
>>> +++ b/gcc/tree.h
>>> @@ -786,7 +786,12 @@ extern void omp_clause_range_check_failed (const_tree, 
>>> const char *, int,
>>>(...) prototype, where arguments can be accessed with va_start and
>>>va_arg), as opposed to an unprototyped function.  */
>>> #define TYPE_NO_NAMED_ARGS_STDARG_P(NODE) \
>>> -  (TYPE_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>>> +  (FUNC_OR_METHOD_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>>> +
>>> +/* True if this RECORD_TYPE or UNION_TYPE includes a flexible array member
>>> +   at the last field recursively.  */
>>> +#define TYPE_INCLUDE_FLEXARRAY(NODE) \
>>> +  (RECORD_OR_UNION_CHECK (NODE)->type_common.no_named_args_stdarg_p)
>> 
>> Until i read the description above i read TYPE_INCLUDE_FLEXARRAY as an
>> option to include or not include something. The description hints more
>> at TYPE_INCLUDES_FLEXARRAY (with an S) to be a type which has at least
>> one member which has a trailing flexible array or which itself has a
>> trailing flexible array.
>
>Yes, TYPE_INCLUDES_FLEXARRAY (maybe with a S is a better name) means the 
>structure/union TYPE includes a flexible array member or includes a struct 
>with a flexible array member as the last field.
>

So ANY_TRAILING_FLEXARRAY, TYPE_CONTAINS_FLEXARRAY, TYPE_INCLUDES_FLEXARRAY
or something like that would be clearer, I don't know.
I'd probably use the first, but that's enough bike shedding for me now. Let's
see what others think.

thanks,

>Hope this is clear.
>thanks.
>
>Qing
>> 
>>> 
>>> /* In an IDENTIFIER_NODE, this means that assemble_name was called with
>>>this string as an argument.  */
>> 
>



[COMMITTED] Stream out NANs correctly.

2023-05-24 Thread Aldy Hernandez via Gcc-patches
NANs don't have bounds, so there's no need to stream them out.

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_value_range): Handle NANs.
* data-streamer-out.cc (streamer_write_vrange): Same.
* value-range.h (class vrange): Make streamer_write_vrange a friend.
---
 gcc/data-streamer-in.cc  | 16 
 gcc/data-streamer-out.cc | 17 -
 gcc/value-range.h|  1 +
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 07728bef413..578c328475f 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -248,14 +248,22 @@ streamer_read_value_range (class lto_input_block *ib, 
data_in *data_in,
   if (is_a <frange> (vr))
     {
       frange &r = as_a <frange> (vr);
-      REAL_VALUE_TYPE lb, ub;
-      streamer_read_real_value (ib, &lb);
-      streamer_read_real_value (ib, &ub);
+
+      // Stream in NAN bits.
       struct bitpack_d bp = streamer_read_bitpack (ib);
       bool pos_nan = (bool) bp_unpack_value (&bp, 1);
       bool neg_nan = (bool) bp_unpack_value (&bp, 1);
       nan_state nan (pos_nan, neg_nan);
-      r.set (type, lb, ub, nan);
+
+      if (kind == VR_NAN)
+        r.set_nan (type, nan);
+      else
+        {
+          REAL_VALUE_TYPE lb, ub;
+          streamer_read_real_value (ib, &lb);
+          streamer_read_real_value (ib, &ub);
+          r.set (type, lb, ub, nan);
+        }
   return;
 }
   gcc_unreachable ();
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index afc9862062b..93dedfcb895 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -410,7 +410,7 @@ streamer_write_vrange (struct output_block *ob, const vrange &v)
   gcc_checking_assert (!v.undefined_p ());
 
   // Write the common fields to all vranges.
-  value_range_kind kind = v.varying_p () ? VR_VARYING : VR_RANGE;
+  value_range_kind kind = v.m_kind;
   streamer_write_enum (ob->main_stream, value_range_kind, VR_LAST, kind);
   stream_write_tree (ob, v.type (), true);
 
@@ -429,15 +429,22 @@ streamer_write_vrange (struct output_block *ob, const vrange &v)
   if (is_a <frange> (v))
     {
       const frange &r = as_a <frange> (v);
-      REAL_VALUE_TYPE lb = r.lower_bound ();
-      REAL_VALUE_TYPE ub = r.upper_bound ();
-      streamer_write_real_value (ob, &lb);
-      streamer_write_real_value (ob, &ub);
+
+      // Stream out NAN bits.
       bitpack_d bp = bitpack_create (ob->main_stream);
       nan_state nan = r.get_nan_state ();
       bp_pack_value (&bp, nan.pos_p (), 1);
       bp_pack_value (&bp, nan.neg_p (), 1);
       streamer_write_bitpack (&bp);
+
+      // Stream out bounds.
+      if (kind != VR_NAN)
+        {
+          REAL_VALUE_TYPE lb = r.lower_bound ();
+          REAL_VALUE_TYPE ub = r.upper_bound ();
+          streamer_write_real_value (ob, &lb);
+          streamer_write_real_value (ob, &ub);
+        }
   return;
 }
   gcc_unreachable ();
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 39023e7b5eb..2b4ebabe7c8 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -76,6 +76,7 @@ class GTY((user)) vrange
 {
   template  friend bool is_a (vrange &);
   friend class Value_Range;
+  friend void streamer_write_vrange (struct output_block *, const vrange &);
 public:
   virtual void accept (const class vrange_visitor ) const = 0;
   virtual void set (tree, tree, value_range_kind = VR_RANGE);
-- 
2.40.1
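
The idea in the patch above (write the kind and the NAN bits first, and emit bounds only when the range is not a pure NAN) can be sketched outside of GCC.  The following is a minimal illustration in plain C, not the GCC streamer: the names `range_kind`, `frange_sketch`, `write_range` and `read_range` are hypothetical, and a flat byte buffer stands in for the LTO stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for value_range_kind and frange.  */
enum range_kind { KIND_VARYING, KIND_RANGE, KIND_NAN };

struct frange_sketch {
    enum range_kind kind;
    bool pos_nan, neg_nan;
    double lb, ub;              /* meaningful only when kind != KIND_NAN */
};

/* Serialize R into BUF and return the number of bytes written.
   The kind and NAN bits always go first; bounds follow only for
   ranges that actually have bounds.  */
static size_t write_range(unsigned char *buf, const struct frange_sketch *r)
{
    size_t n = 0;
    buf[n++] = (unsigned char) r->kind;
    buf[n++] = (unsigned char) ((r->pos_nan ? 1 : 0) | (r->neg_nan ? 2 : 0));
    if (r->kind != KIND_NAN) {
        memcpy(buf + n, &r->lb, sizeof r->lb);
        n += sizeof r->lb;
        memcpy(buf + n, &r->ub, sizeof r->ub);
        n += sizeof r->ub;
    }
    return n;
}

/* Inverse of write_range; returns the number of bytes consumed.  */
static size_t read_range(const unsigned char *buf, struct frange_sketch *r)
{
    size_t n = 0;
    r->kind = (enum range_kind) buf[n++];
    unsigned char bits = buf[n++];
    r->pos_nan = (bits & 1) != 0;
    r->neg_nan = (bits & 2) != 0;
    r->lb = r->ub = 0.0;
    if (r->kind != KIND_NAN) {
        memcpy(&r->lb, buf + n, sizeof r->lb);
        n += sizeof r->lb;
        memcpy(&r->ub, buf + n, sizeof r->ub);
        n += sizeof r->ub;
    }
    return n;
}
```

In this sketch a NAN-only range occupies two bytes, while a bounded range occupies two bytes plus two doubles.  The reader has to test the kind in the same place as the writer, otherwise the two sides disagree on how many bytes a record takes.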



[COMMITTED] Disallow setting of NANs in frange setter unless setting trees.

2023-05-24 Thread Aldy Hernandez via Gcc-patches
frange::set() is confusing in that we can set a NAN by specifying a
bound of +-NAN, even though we technically disallow NANs in the setter
because the kind can never be VR_NAN.  This is a wart for
get_tree_range(), which builds a range out of a tree from the source,
to work correctly.  It's ugly, and it showed its limitation while
implementing LTO streaming of ranges.

This patch disallows passing NAN bounds in frange::set() and fixes
get_tree_range.

gcc/ChangeLog:

* value-query.cc (range_query::get_tree_range): Set NAN directly
if necessary.
* value-range.cc (frange::set): Assert that bounds are not NAN.
---
 gcc/value-query.cc | 13 ++---
 gcc/value-range.cc |  9 +
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 43297f17c39..a84f164d77b 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -189,9 +189,16 @@ range_query::get_tree_range (vrange &r, tree expr, gimple *stmt)
       {
         frange &f = as_a <frange> (r);
REAL_VALUE_TYPE *rv = TREE_REAL_CST_PTR (expr);
-   f.set (TREE_TYPE (expr), *rv, *rv);
-   if (!real_isnan (rv))
- f.clear_nan ();
+   if (real_isnan (rv))
+ {
+   bool sign = real_isneg (rv);
+   f.set_nan (TREE_TYPE (expr), sign);
+ }
+   else
+ {
+   nan_state nan (false);
+   f.set (TREE_TYPE (expr), *rv, *rv, nan);
+ }
return true;
   }
 
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 2f37ff3e58e..707b1f15fd4 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -359,14 +359,7 @@ frange::set (tree type,
   gcc_unreachable ();
 }
 
-  // Handle NANs.
-  if (real_isnan () || real_isnan ())
-{
-  gcc_checking_assert (real_identical (, ));
-  bool sign = real_isneg ();
-  set_nan (type, sign);
-  return;
-}
+  gcc_checking_assert (!real_isnan () && !real_isnan ());
 
   m_kind = kind;
   m_type = type;
-- 
2.40.1
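
The contract introduced above (the bounds setter refuses NAN bounds, and the caller decides up front whether it is building a NAN or a bounded range) can be illustrated with a small standalone sketch.  This is plain C with hypothetical names (`range_sketch`, `range_set`, `range_set_nan`, `range_from_constant`), not the frange API; `isnan` and `signbit` from <math.h> stand in for real_isnan and real_isneg.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

enum kind { KIND_RANGE, KIND_NAN };

struct range_sketch {
    enum kind kind;
    double lb, ub;
    bool pos_nan, neg_nan;
};

/* Bounds setter: NAN bounds are a caller bug, so assert instead of
   quietly turning the range into a NAN.  */
static void range_set(struct range_sketch *r, double lb, double ub)
{
    assert(!isnan(lb) && !isnan(ub));
    r->kind = KIND_RANGE;
    r->lb = lb;
    r->ub = ub;
    r->pos_nan = r->neg_nan = false;
}

/* NAN setter: SIGN is true for a negative NAN.  */
static void range_set_nan(struct range_sketch *r, bool sign)
{
    r->kind = KIND_NAN;
    r->lb = r->ub = 0.0;
    r->pos_nan = !sign;
    r->neg_nan = sign;
}

/* Build a singleton range from a constant, dispatching on NAN-ness
   before any setter is called.  */
static void range_from_constant(struct range_sketch *r, double cst)
{
    if (isnan(cst))
        range_set_nan(r, signbit(cst) != 0);
    else
        range_set(r, cst, cst);
}
```

With this split, `range_from_constant` is the only place that needs to know a constant may be a NAN, and `range_set` can rely on its bounds being ordinary numbers.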



[COMMITTED] Hash known NANs correctly for franges.

2023-05-24 Thread Aldy Hernandez via Gcc-patches
We're ICEing when trying to hash a known NAN.  This is unnoticeable
because the only user would be IPA, and even so, it currently doesn't
handle floats.  However, handling floats is a flip of a switch, so
it's best to handle them already.

gcc/ChangeLog:

* value-range.cc (add_vrange): Handle known NANs.
---
 gcc/value-range.cc | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 874a1843ebf..2f37ff3e58e 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -269,14 +269,14 @@ add_vrange (const vrange &v, inchash::hash &hstate,
   if (is_a <frange> (v))
     {
       const frange &r = as_a <frange> (v);
-  if (r.varying_p ())
-   hstate.add_int (VR_VARYING);
+  if (r.known_isnan ())
+   hstate.add_int (VR_NAN);
   else
-   hstate.add_int (VR_RANGE);
-
-  hstate.add_real_value (r.lower_bound ());
-  hstate.add_real_value (r.upper_bound ());
-
+   {
+ hstate.add_int (r.varying_p () ? VR_VARYING : VR_RANGE);
+ hstate.add_real_value (r.lower_bound ());
+ hstate.add_real_value (r.upper_bound ());
+   }
   nan_state nan = r.get_nan_state ();
   hstate.add_int (nan.pos_p ());
   hstate.add_int (nan.neg_p ());
-- 
2.40.1



[COMMITTED] Add an frange::set_nan() variant that takes a nan_state.

2023-05-24 Thread Aldy Hernandez via Gcc-patches
Generalize frange::set_nan() to take a nan_state and make current
set_nan() methods syntactic sugar.

This is in preparation for better streaming of NANs for LTO/IPA.

gcc/ChangeLog:

* value-range.h (frange::set_nan): New.
---
 gcc/value-range.h | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index b8cc2a0e76a..39023e7b5eb 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -327,6 +327,7 @@ public:
const nan_state &, value_range_kind = VR_RANGE);
   void set_nan (tree type);
   void set_nan (tree type, bool sign);
+  void set_nan (tree type, const nan_state &);
   virtual void set_varying (tree type) override;
   virtual void set_undefined () override;
   virtual bool union_ (const vrange &) override;
@@ -1219,17 +1220,18 @@ frange_val_is_max (const REAL_VALUE_TYPE , const_tree 
type)
   return real_identical (, );
 }
 
-// Build a signless NAN of type TYPE.
+// Build a NAN with a state of NAN.
 
 inline void
-frange::set_nan (tree type)
+frange::set_nan (tree type, const nan_state &nan)
 {
+  gcc_checking_assert (nan.pos_p () || nan.neg_p ());
   if (HONOR_NANS (type))
 {
   m_kind = VR_NAN;
   m_type = type;
-  m_pos_nan = true;
-  m_neg_nan = true;
+  m_neg_nan = nan.neg_p ();
+  m_pos_nan = nan.pos_p ();
   if (flag_checking)
verify_range ();
 }
@@ -1237,22 +1239,22 @@ frange::set_nan (tree type)
 set_undefined ();
 }
 
+// Build a signless NAN of type TYPE.
+
+inline void
+frange::set_nan (tree type)
+{
+  nan_state nan (true);
+  set_nan (type, nan);
+}
+
 // Build a NAN of type TYPE with SIGN.
 
 inline void
 frange::set_nan (tree type, bool sign)
 {
-  if (HONOR_NANS (type))
-{
-  m_kind = VR_NAN;
-  m_type = type;
-  m_neg_nan = sign;
-  m_pos_nan = !sign;
-  if (flag_checking)
-   verify_range ();
-}
-  else
-set_undefined ();
+  nan_state nan (/*pos=*/!sign, /*neg=*/sign);
+  set_nan (type, nan);
 }
 
 // Return TRUE if range is known to be finite.
-- 
2.40.1
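
The refactoring above keeps one general setter and turns the two older entry points into thin wrappers.  A minimal sketch of that shape in plain C follows.  C has no overloading, so the three variants get distinct, hypothetical names (`set_nan_state`, `set_nan_any`, `set_nan_signed`), and the HONOR_NANS check from the real code is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for nan_state and frange.  */
struct nan_state_sketch { bool pos, neg; };

enum kind_sketch { K_UNDEFINED, K_RANGE, K_NAN };

struct fr {
    enum kind_sketch kind;
    bool pos_nan, neg_nan;
};

/* General form: all NAN construction funnels through here.  */
static void set_nan_state(struct fr *r, struct nan_state_sketch nan)
{
    assert(nan.pos || nan.neg);   /* a NAN needs at least one sign */
    r->kind = K_NAN;
    r->pos_nan = nan.pos;
    r->neg_nan = nan.neg;
}

/* Signless NAN: either sign is possible.  */
static void set_nan_any(struct fr *r)
{
    struct nan_state_sketch nan = { true, true };
    set_nan_state(r, nan);
}

/* NAN with a known sign; SIGN is true for negative.  */
static void set_nan_signed(struct fr *r, bool sign)
{
    struct nan_state_sketch nan = { !sign, sign };
    set_nan_state(r, nan);
}
```

Because the wrappers only build a state and delegate, the invariant check lives in one place, and a caller that already has a state (such as a stream reader) can pass it straight through.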



[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #10 from Andrew Pinski  ---
(In reply to Alexander Monakov from comment #8)
> I think the following testcase indicates that GCC assumes that tail padding
> is accessible: 

Well, aligned accesses are always accessible;
the alignment of `struct S` in this case is 4 bytes after all.

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #9 from Martin Uecker  ---
Clang as well, but that would be only padding inside the first part without
taking into account extra element in the FAM. 

I am more concerned about programmers using the formula sizeof(.) + n * sizeof
for memcpy etc.  (and we have an example in the standard using this formula).
Creating objects smaller than this seems a bit dangerous.

Re: [PATCH v2] rs6000: Add builtin for mffscrn instructions

2023-05-24 Thread Kewen.Lin via Gcc-patches
on 2023/5/24 23:20, Carl Love wrote:
> On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
>> on 2023/5/24 06:30, Peter Bergner wrote:
>>> On 5/23/23 12:24 AM, Kewen.Lin wrote:
>>>> on 2023/5/23 01:31, Carl Love wrote:
>>>>> The builtins were requested for use in GLibC.  As of version 2.31 they
>>>>> were added as inline asm.  They requested a builtin so the asm could be
>>>>> removed.
>>>>
>>>> So IMHO we also want the similar support for mffscrn, that is to make
>>>> use of mffscrn and mffscrni on Power9 and later, but falls back to
>>>> __builtin_set_fpscr_rn + mffs similar on older platforms.
>>>
>>> So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
>>> uses mffscrn/mffscrni on P9 and later and uses older insns on pre-
>>> P9.
>>> The only problem is we don't return the current FPSCR bits, as the
>>> bif
>>> is defined to return void.
>>
>> Yes.
>>
>>> Crazy idea, but could we extend the built-in
>>> with an overload that returns the FPSCR bits?  
>>
>> So you agree that we should make this proposed new bif handle pre-P9
>> just
>> like some other existing bifs. :)  I think extending it is good and
>> doable,
>> but the only concern here is the bif name "__builtin_set_fpscr_rn",
>> which
>> matches the existing behavior (only set rounding) but doesn't match
>> the
>> proposed extending behavior (set rounding and get some env bits
>> back).
>> Maybe it's not a big deal if the documentation clarify it well.
> 
> Extending the builtin to pre Power 9 is straight forward and I agree
> would make good sense to do.
> 
> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add the
> new functionality.  Peter suggests overloading the builtin to either
> return void or returns FPSCR bits.  It is my understanding that the
> return value for a given builtin had to be the same, i.e. you can't
> overload the return value. Maybe you can with Bill's new
> infrastructure?  I recall having problems trying to overload the return
> value in the past and Bill said you couldn't do it.  I play with this
> and see if I can overload the return value.

Your understanding is correct: we cannot overload a builtin when only the
return type differs.  But previously I interpreted the proposal as
extending
 
  void __builtin_set_fpscr_rn (int);

to 

  void __builtin_set_fpscr_rn (int, double*);

The associated address-taking and store can normally be optimized out.

BR,
Kewen


[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
(In reply to jos...@codesourcery.com from comment #6)
> For the standard, dynamically allocated case, you should only need to 
> allocate enough memory to contain the initial part of the struct and the 
> array members being accessed - not any padding after that array.  (There 
> were wording problems before C99 TC2; see DR#282.)

I think the following testcase indicates that GCC assumes that tail padding is
accessible:

struct S {
int i;
char c;
char fam[];
};

void f(struct S *p, struct S *q)
{
*p = *q;
}

f:
movq(%rsi), %rax
movq%rax, (%rdi)
ret

Sorry for the tangential remark, but there seems to be a contradiction.
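
To make the two sizing conventions discussed in this bug concrete, here is a small standalone sketch.  The helper names (`size_by_sizeof`, `size_by_offsetof`, `alloc_s`) are made up for illustration.  On a common ABI with a 4-byte int, one would expect offsetof(struct S, fam) to be 5 and sizeof(struct S) to be 8, so for one array element the formulas give 9 and 6 bytes, but the code does not depend on those values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S {
    int i;
    char c;
    char fam[];                 /* flexible array member */
};

/* The widely used formula: whole struct, tail padding included,
   plus N array elements.  */
static size_t size_by_sizeof(size_t n)
{
    return sizeof(struct S) + n * sizeof(char);
}

/* The minimum for the dynamically allocated case: everything up to
   the array, plus N array elements.  */
static size_t size_by_offsetof(size_t n)
{
    return offsetof(struct S, fam) + n * sizeof(char);
}

/* Allocate room for N elements, but never less than sizeof (struct S),
   so that a whole-struct copy such as `*p = *q` stays inside the
   allocation even if the compiler copies the tail padding.  */
static struct S *alloc_s(size_t n)
{
    size_t need = size_by_offsetof(n);
    if (need < sizeof(struct S))
        need = sizeof(struct S);
    return calloc(1, need);
}
```

Taking the larger of the two sizes is a conservative choice made for this sketch: it keeps the struct assignment from the testcase above in bounds without reserving a full extra sizeof(struct S) of slack.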

[Bug fortran/90504] Improved NORM2 algorithm

2023-05-24 Thread jb at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90504

--- Comment #2 from Janne Blomqvist  ---
(In reply to anlauf from comment #1)
> (In reply to Janne Blomqvist from comment #0)
> > Hanson, Hopkins, Remark on Algorithm 539: A Modern Fortran Reference
> > Implementation for Carefully Computing the Euclidean Norm,
> > https://dl.acm.org/citation.cfm?id=3134441
> > 
> > Above article tests different algorithms for NORM2 and tests performance and
> > numerical accuracy.
> 
> This article is behind a paywall.
> 
> Is there a publicly available description?

https://kar.kent.ac.uk/67205/1/remark.pdf

(Found via the https://unpaywall.org/ browser extension)

Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
>> It's highly unlikely we'll switch from the mechanisms we're using.
>>They're pretty deeply embedded into how all the ports are developed and
>>work.

We just took a look at the build files. It seems that define_insn generates
a very large number of functions. Is there a chance to optimize that?
I believe the tablegen mechanism in LLVM is well optimized in terms of
generated files and functions, so it is not affected too much as the number
of instructions goes up.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 12:07
To: juzhe.zh...@rivai.ai; kito.cheng
CC: jeffreyalaw; palmer; vineetg; Kito.cheng; gcc-patches; Patrick O'Neill; 
macro
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 21:54, juzhe.zh...@rivai.ai wrote:
>  >> IIRC LLVM is using the table driven mechanism, so it's less impact 
> on the
>>>compilation time when the instruction becomes more and more.
> Oh, I see. Could you share more details ?
> Maybe we can support this in GCC.
It's highly unlikely we'll switch from the mechanisms we're using. 
They're pretty deeply embedded into how all the ports are developed and 
work.
 
The first step is to figure out what's exploding.  I strongly suspect 
we'll be able to see this in a cross, but again, the magnitude will be 
smaller.
 
jeff
 


Re: RISC-V Bootstrap problems

2023-05-24 Thread Kito Cheng via Gcc-patches
Yeah, JoJo is still working on toolchain stuff, just not active on upstream GCC.

cc. jojo

On Thu, May 25, 2023 at 12:06 PM Jeff Law  wrote:
>
>
>
> On 5/24/23 21:53, Kito Cheng wrote:
> > Jojo has a patch to try to split those things that should help this,
> > but seems not landed.
> >
> > https://patchwork.ozlabs.org/project/gcc/patch/20201104015315.81416-1-jiejie_r...@c-sky.com/
> Is JoJo still active?  I haven't heard from JoJo in many months, perhaps
> as long as a year or two.
>
> Jeff


Re: RISC-V Bootstrap problems

2023-05-24 Thread Jeff Law




On 5/24/23 21:54, juzhe.zh...@rivai.ai wrote:
>  >> IIRC LLVM is using the table driven mechanism, so it's less impact on the
>  >> compilation time when the instruction becomes more and more.
> Oh, I see. Could you share more details ?
> Maybe we can support this in GCC.
It's highly unlikely we'll switch from the mechanisms we're using. 
They're pretty deeply embedded into how all the ports are developed and 
work.


The first step is to figure out what's exploding.  I strongly suspect 
we'll be able to see this in a cross, but again, the magnitude will be 
smaller.


jeff


Re: RISC-V Bootstrap problems

2023-05-24 Thread Jeff Law




On 5/24/23 21:53, Kito Cheng wrote:
> Jojo has a patch to try to split those things that should help this,
> but seems not landed.
>
> https://patchwork.ozlabs.org/project/gcc/patch/20201104015315.81416-1-jiejie_r...@c-sky.com/
Is JoJo still active?  I haven't heard from JoJo in many months, perhaps 
as long as a year or two.


Jeff


Re: [PATCH] RISC-V: Remove FRM_REGNUM dependency for rtx conversions

2023-05-24 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Wed, May 24, 2023 at 7:26 PM  wrote:
>
> From: Juzhe-Zhong 
>
> According to RVV ISA:
> The conversions use the dynamic rounding mode in frm, except for the rtz 
> variants, which round towards zero.
>
> So rtz conversion patterns should not have FRM dependency.
>
> We can't support mode switching for FRM yet since rvv intrinsic doc is not 
> updated but
> I think this patch is correct.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Remove FRM_REGNUM dependency in rtz 
> instructions.
>
> ---
>  gcc/config/riscv/vector.md | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 9afef0d12bc..15f66efaa48 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7072,10 +7072,8 @@
>  (match_operand 5 "const_int_operand""  i,  i,  i,  
> i")
>  (match_operand 6 "const_int_operand""  i,  i,  i,  
> i")
>  (match_operand 7 "const_int_operand""  i,  i,  i,  
> i")
> -(match_operand 8 "const_int_operand""  i,  i,  i,  
> i")
>  (reg:SI VL_REGNUM)
> -(reg:SI VTYPE_REGNUM)
> -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (any_fix:
>  (match_operand:VF 3 "register_operand"  " vr, vr, vr, 
> vr"))
>   (match_operand: 2 "vector_merge_operand" " vu,  0, vu,  
> 0")))]
> @@ -7142,10 +7140,8 @@
>  (match_operand 5 "const_int_operand""i,i")
>  (match_operand 6 "const_int_operand""i,i")
>  (match_operand 7 "const_int_operand""i,i")
> -(match_operand 8 "const_int_operand""i,i")
>  (reg:SI VL_REGNUM)
> -(reg:SI VTYPE_REGNUM)
> -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (any_fix:VWCONVERTI
>  (match_operand: 3 "register_operand" "   vr,   vr"))
>   (match_operand:VWCONVERTI 2 "vector_merge_operand" "   vu,0")))]
> @@ -7233,10 +7229,8 @@
>  (match_operand 5 "const_int_operand" "  i,  i,  i,  
> i,i,i")
>  (match_operand 6 "const_int_operand" "  i,  i,  i,  
> i,i,i")
>  (match_operand 7 "const_int_operand" "  i,  i,  i,  
> i,i,i")
> -(match_operand 8 "const_int_operand" "  i,  i,  i,  
> i,i,i")
>  (reg:SI VL_REGNUM)
> -(reg:SI VTYPE_REGNUM)
> -(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (any_fix:
>  (match_operand:VF 3 "register_operand"   "  0,  0,  0,  
> 0,   vr,   vr"))
>   (match_operand: 2 "vector_merge_operand" " vu,  0, vu,  
> 0,   vu,0")))]
> --
> 2.36.1
>


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
>> IIRC LLVM is using the table driven mechanism, so it's less impact on the
>> compilation time when the instruction becomes more and more.
Oh, I see. Could you share more details ?
Maybe we can support this in GCC.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-25 11:53
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; palmer; vineetg; Kito.cheng; gcc-patches; Patrick O'Neill; 
jlaw; macro
Subject: Re: Re: RISC-V Bootstrap problems
Jojo has a patch to try to split those things that should help this,
but seems not landed.
 
https://patchwork.ozlabs.org/project/gcc/patch/20201104015315.81416-1-jiejie_r...@c-sky.com/
 
 
> How about LLVM? Can kito help with this issue?
> LLVM has already supported full intrinsics for a long time and no issues.
 
IIRC LLVM is using the table driven mechanism, so it's less impact on the
compilation time when the instruction becomes more and more.
 
 
On Thu, May 25, 2023 at 11:46 AM juzhe.zh...@rivai.ai
 wrote:
>
> segment intrinsics are really huge amount.
>
> Even though I have tried to optimized them, still we have the issues..
>
> How about LLVM? Can kito help with this issue?
> LLVM has already support full intrinsics for a long time and no issues.
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-25 11:43
> To: Palmer Dabbelt; Vineet Gupta
> CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
> juzhe.zh...@rivai.ai
> Subject: Re: RISC-V Bootstrap problems
>
>
> On 5/24/23 17:13, Palmer Dabbelt wrote:
> > On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
>
> [ ... big snip ... ]
>
> >>
> >> Never mind. Looks like I found the issue - with just trial and error and
> >> no idea of how this stuff works.
> >> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
> >> Running full tests now.
> >
> > Thanks!
> Marginally related.  I was able to bisect the "hang" when 3-staging the
> trunk on RISC-V with qemu user mode emulation.
>
> So it wasn't actually hanging, but after the introduction of segment
> intrinsics the compilation time for insn-emit explodes -- previously I
> could do a full 3-stage bootstrap, build the glibc & the kernel, then
> test c/c++/fortran in ~10 hours.
>
> Now just building insn-emit.o alone takes ~10 hours in that environment.
>   I suspect (but have not yet confirmed) that we should see a huge
> compile-time spike in cross builds as well, though obviously it won't be
> as bad since we're not using qemu emulation.
>
> Clearly something isn't scaling well.  I don't know if we've got a crazy
> large function in there, a crazy number of functions or something that's
> just triggering a compile-time scaling problem.  Whatever it is, we
> probably need to address it.
>
> jeff
>
>
 


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread Kito Cheng via Gcc-patches
Jojo has a patch to try to split those things that should help this,
but seems not landed.

https://patchwork.ozlabs.org/project/gcc/patch/20201104015315.81416-1-jiejie_r...@c-sky.com/


> How about LLVM? Can kito help with this issue?
> LLVM has already supported full intrinsics for a long time and no issues.

IIRC LLVM uses a table-driven mechanism, so compilation time is less
affected as the number of instructions grows.


On Thu, May 25, 2023 at 11:46 AM juzhe.zh...@rivai.ai
 wrote:
>
> segment intrinsics are really huge amount.
>
> Even though I have tried to optimized them, still we have the issues..
>
> How about LLVM? Can kito help with this issue?
> LLVM has already support full intrinsics for a long time and no issues.
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-25 11:43
> To: Palmer Dabbelt; Vineet Gupta
> CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
> juzhe.zh...@rivai.ai
> Subject: Re: RISC-V Bootstrap problems
>
>
> On 5/24/23 17:13, Palmer Dabbelt wrote:
> > On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
>
> [ ... big snip ... ]
>
> >>
> >> Never mind. Looks like I found the issue - with just trial and error and
> >> no idea of how this stuff works.
> >> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
> >> Running full tests now.
> >
> > Thanks!
> Marginally related.  I was able to bisect the "hang" when 3-staging the
> trunk on RISC-V with qemu user mode emulation.
>
> So it wasn't actually hanging, but after the introduction of segment
> intrinsics the compilation time for insn-emit explodes -- previously I
> could do a full 3-stage bootstrap, build the glibc & the kernel, then
> test c/c++/fortran in ~10 hours.
>
> Now just building insn-emit.o alone takes ~10 hours in that environment.
>   I suspect (but have not yet confirmed) that we should see a huge
> compile-time spike in cross builds as well, though obviously it won't be
> as bad since we're not using qemu emulation.
>
> Clearly something isn't scaling well.  I don't know if we've got a crazy
> large function in there, a crazy number of functions or something that's
> just triggering a compile-time scaling problem.  Whatever it is, we
> probably need to address it.
>
> jeff
>
>


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
Besides, we don't see compilation issues when cross-compiling (with segment
intrinsics).
But I do agree we need to address this issue.

As far as I know, GCC compiles insn-emit on a single thread and a single core.
Can we use multiple threads and cores to compile it and speed up the compilation?

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 11:43
To: Palmer Dabbelt; Vineet Gupta
CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
juzhe.zh...@rivai.ai
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 17:13, Palmer Dabbelt wrote:
> On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
 
[ ... big snip ... ]
 
>>
>> Never mind. Looks like I found the issue - with just trial and error and
>> no idea of how this stuff works.
>> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
>> Running full tests now.
> 
> Thanks!
Marginally related.  I was able to bisect the "hang" when 3-staging the 
trunk on RISC-V with qemu user mode emulation.
 
So it wasn't actually hanging, but after the introduction of segment 
intrinsics the compilation time for insn-emit explodes -- previously I 
could do a full 3-stage bootstrap, build the glibc & the kernel, then 
test c/c++/fortran in ~10 hours.
 
Now just building insn-emit.o alone takes ~10 hours in that environment. 
  I suspect (but have not yet confirmed) that we should see a huge 
compile-time spike in cross builds as well, though obviously it won't be 
as bad since we're not using qemu emulation.
 
Clearly something isn't scaling well.  I don't know if we've got a crazy 
large function in there, a crazy number of functions or something that's 
just triggering a compile-time scaling problem.  Whatever it is, we 
probably need to address it.
 
jeff
 
 


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
The number of segment intrinsics is really huge.

Even though I have tried to optimize them, we still have the issue.

How about LLVM? Can Kito help with this issue?
LLVM has supported the full set of intrinsics for a long time without issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 11:43
To: Palmer Dabbelt; Vineet Gupta
CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
juzhe.zh...@rivai.ai
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 17:13, Palmer Dabbelt wrote:
> On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
 
[ ... big snip ... ]
 
>>
>> Never mind. Looks like I found the issue - with just trial and error and
>> no idea of how this stuff works.
>> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
>> Running full tests now.
> 
> Thanks!
Marginally related.  I was able to bisect the "hang" when 3-staging the 
trunk on RISC-V with qemu user mode emulation.
 
So it wasn't actually hanging, but after the introduction of segment 
intrinsics the compilation time for insn-emit explodes -- previously I 
could do a full 3-stage bootstrap, build the glibc & the kernel, then 
test c/c++/fortran in ~10 hours.
 
Now just building insn-emit.o alone takes ~10 hours in that environment. 
  I suspect (but have not yet confirmed) that we should see a huge 
compile-time spike in cross builds as well, though obviously it won't be 
as bad since we're not using qemu emulation.
 
Clearly something isn't scaling well.  I don't know if we've got a crazy 
large function in there, a crazy number of functions or something that's 
just triggering a compile-time scaling problem.  Whatever it is, we 
probably need to address it.
 
jeff
 
 


Re: RISC-V Bootstrap problems

2023-05-24 Thread Jeff Law via Gcc-patches




On 5/24/23 17:13, Palmer Dabbelt wrote:
> On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:

[ ... big snip ... ]

>>
>> Never mind. Looks like I found the issue - with just trial and error and
>> no idea of how this stuff works.
>> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
>> Running full tests now.
>
> Thanks!
Marginally related.  I was able to bisect the "hang" when 3-staging the
trunk on RISC-V with qemu user mode emulation.

So it wasn't actually hanging, but after the introduction of segment
intrinsics the compilation time for insn-emit explodes -- previously I
could do a full 3-stage bootstrap, build the glibc & the kernel, then
test c/c++/fortran in ~10 hours.

Now just building insn-emit.o alone takes ~10 hours in that environment.
I suspect (but have not yet confirmed) that we should see a huge
compile-time spike in cross builds as well, though obviously it won't be
as bad since we're not using qemu emulation.

Clearly something isn't scaling well.  I don't know if we've got a crazy
large function in there, a crazy number of functions or something that's
just triggering a compile-time scaling problem.  Whatever it is, we
probably need to address it.

jeff



Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.

2023-05-24 Thread Lulu Cheng



On 2023/5/25 at 10:52 AM, WANG Xuerui wrote:


On 2023/5/25 10:46, Lulu Cheng wrote:


On 2023/5/25 at 4:15 AM, Jason Merrill wrote:
On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:


    On Wed, 24 May 2023 at 09:41, Xi Ruoyao  wrote:

    > Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a
    > C++ expert here...
    >
    > Jonathan: AFAIK the standard and the Itanium ABI treat an empty class
    > as size 1

    Only as a complete object, not as a subobject.


Also as a data member subobject.

    > in order to guarantee unique address, so for the following:
    >
    > class Empty {};
    > class Test { Empty empty; double a, b; };

    There is no need to have a unique address here, so Test::empty and
    Test::a have the same address. It's a potentially-overlapping subobject.

    For the Itanium ABI, sizeof(Test) == 2 * sizeof(double).


That would be true if Test::empty were marked [[no_unique_address]], 
but without that attribute, sizeof(Test) is actually 3 * 
sizeof(double).


    > When we pass "Test" via registers, we may only allocate the registers
    > for Test::a and Test::b, and completely ignore Test::empty because
    > registers have no addresses.  Is this correct or not?

    I think that's a decision for the loongarch psABI. In principle, there's
    no reason a register has to be used to pass Test::empty, since you can't
    read from it or write to it.


Agreed.  The Itanium C++ ABI has nothing to say about how registers 
are allocated for parameter passing; this is a matter for the psABI.


And there is no need for a psABI to allocate a register for 
Test::empty because it contains no data.


In the x86_64 psABI, Test above is passed in memory because of its 
size ("the size of the aggregate exceeds two eightbytes...").  But


struct Test2 { Empty empty; double a; };

is passed in a single floating-point register; the Test2::empty 
subobject is not passed anywhere, because its eightbyte is 
classified as NO_CLASS, because there is no actual data there.






I know nothing about the LoongArch psABI, but going out of your way 
to assign a register to an empty class seems like a mistake.


MIPS64 and ARM64 also allocate parameter registers for empty structs. 
https://godbolt.org/z/jT4cY3T5o


Our original intention is not to pass this empty structure member,
but to make the following two structures treat empty structure members
in the same way in the process of passing parameters.

struct st1
{
 struct empty {} e1;
 long a;
 long b;
};

struct st2
{
 struct empty {} e1;
 double f0;
 double f1;
};


Then shouldn't we try to avoid the extra register in all cases, 
instead of wasting one regardless? ;-)


https://godbolt.org/z/eK5T3Erbs

Compared with the situation of x86-64, if it is necessary not to pass 
empty structure members, it is difficult to achieve uniform processing.




Re: [PATCH] i386: Fix incorrect intrinsic signature for AVX512 s{lli|rai|rli}

2023-05-24 Thread Hongtao Liu via Gcc-patches
On Thu, May 25, 2023 at 10:55 AM Hu, Lin1 via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch aims to fix incorrect intrinsic signature for 
> _mm{512|256|}_s{lli|rai|rli}_epi*. And it has been tested on 
> x86_64-pc-linux-gnu. OK for trunk?
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/109173
> PR target/109174
> * config/i386/avx512bwintrin.h (_mm512_srli_epi16): Change type from
> int to const int.
int to unsigned int or const int to const unsigned int.
Others LGTM.
> (_mm512_mask_srli_epi16): Ditto.
> (_mm512_slli_epi16): Ditto.
> (_mm512_mask_slli_epi16): Ditto.
> (_mm512_maskz_slli_epi16): Ditto.
> (_mm512_srai_epi16): Ditto.
> (_mm512_mask_srai_epi16): Ditto.
> (_mm512_maskz_srai_epi16): Ditto.
> * config/i386/avx512vlintrin.h (_mm256_mask_srli_epi32): Ditto.
> (_mm256_maskz_srli_epi32): Ditto.
> (_mm_mask_srli_epi32): Ditto.
> (_mm_maskz_srli_epi32): Ditto.
> (_mm256_mask_srli_epi64): Ditto.
> (_mm256_maskz_srli_epi64): Ditto.
> (_mm_mask_srli_epi64): Ditto.
> (_mm_maskz_srli_epi64): Ditto.
> (_mm256_mask_srai_epi32): Ditto.
> (_mm256_maskz_srai_epi32): Ditto.
> (_mm_mask_srai_epi32): Ditto.
> (_mm_maskz_srai_epi32): Ditto.
> (_mm256_srai_epi64): Ditto.
> (_mm256_mask_srai_epi64): Ditto.
> (_mm256_maskz_srai_epi64): Ditto.
> (_mm_srai_epi64): Ditto.
> (_mm_mask_srai_epi64): Ditto.
> (_mm_maskz_srai_epi64): Ditto.
> (_mm_mask_slli_epi32): Ditto.
> (_mm_maskz_slli_epi32): Ditto.
> (_mm_mask_slli_epi64): Ditto.
> (_mm_maskz_slli_epi64): Ditto.
> (_mm256_mask_slli_epi32): Ditto.
> (_mm256_maskz_slli_epi32): Ditto.
> (_mm256_mask_slli_epi64): Ditto.
> (_mm256_maskz_slli_epi64): Ditto.
> (_mm_mask_srai_epi16): Ditto.
> (_mm_maskz_srai_epi16): Ditto.
> (_mm256_srai_epi16): Ditto.
> (_mm256_mask_srai_epi16): Ditto.
> (_mm_mask_slli_epi16): Ditto.
> (_mm_maskz_slli_epi16): Ditto.
> (_mm256_mask_slli_epi16): Ditto.
> (_mm256_maskz_slli_epi16): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> PR target/109173
> PR target/109174
> * gcc.target/i386/pr109173-1.c: New test.
> * gcc.target/i386/pr109174-1.c: Ditto.
> ---
>  gcc/config/i386/avx512bwintrin.h   |  32 +++---
>  gcc/config/i386/avx512fintrin.h|  58 +++
>  gcc/config/i386/avx512vlbwintrin.h |  36 ---
>  gcc/config/i386/avx512vlintrin.h   | 112 +++--
>  gcc/testsuite/gcc.target/i386/pr109173-1.c |  57 +++
>  gcc/testsuite/gcc.target/i386/pr109174-1.c |  45 +
>  6 files changed, 236 insertions(+), 104 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109173-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109174-1.c
>
> diff --git a/gcc/config/i386/avx512bwintrin.h 
> b/gcc/config/i386/avx512bwintrin.h
> index 89790f7917b..791d4e35f32 100644
> --- a/gcc/config/i386/avx512bwintrin.h
> +++ b/gcc/config/i386/avx512bwintrin.h
> @@ -2880,7 +2880,7 @@ _mm512_maskz_dbsad_epu8 (__mmask32 __U, __m512i __A, 
> __m512i __B,
>
>  extern __inline __m512i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_srli_epi16 (__m512i __A, const int __imm)
> +_mm512_srli_epi16 (__m512i __A, const unsigned int __imm)
>  {
>return (__m512i) __builtin_ia32_psrlwi512_mask ((__v32hi) __A, __imm,
>   (__v32hi)
> @@ -2891,7 +2891,7 @@ _mm512_srli_epi16 (__m512i __A, const int __imm)
>  extern __inline __m512i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_srli_epi16 (__m512i __W, __mmask32 __U, __m512i __A,
> -   const int __imm)
> +   const unsigned int __imm)
>  {
>return (__m512i) __builtin_ia32_psrlwi512_mask ((__v32hi) __A, __imm,
>   (__v32hi) __W,
> @@ -2910,7 +2910,7 @@ _mm512_maskz_srli_epi16 (__mmask32 __U, __m512i __A, 
> const int __imm)
>
>  extern __inline __m512i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_slli_epi16 (__m512i __A, const int __B)
> +_mm512_slli_epi16 (__m512i __A, const unsigned int __B)
>  {
>return (__m512i) __builtin_ia32_psllwi512_mask ((__v32hi) __A, __B,
>   (__v32hi)
> @@ -2921,7 +2921,7 @@ _mm512_slli_epi16 (__m512i __A, const int __B)
>  extern __inline __m512i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_slli_epi16 (__m512i __W, __mmask32 __U, __m512i __A,
> -   const int __B)
> +   const unsigned int __B)
>  {
>return (__m512i) __builtin_ia32_psllwi512_mask ((__v32hi) __A, __B,
>

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #4 from Andrew Pinski  ---
Note the underlying issue with VRP is similar to PR 109959, but it is about a
slightly different optimization.

RE: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence

2023-05-24 Thread Li, Pan2 via Gcc-patches
Oops, I forgot to remove it in the previous version. I will wait a while and
update them together.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 25, 2023 11:14 AM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating 
sequence


* machmode.h (VECTOR_BOOL_MODE_P): New macro.
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM   \
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)

+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)   \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)  \
+
Why do you add this? It is not used, so you should drop it.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-05-25 11:09
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
From: Pan Li <pan2...@intel.com>

This patch would like to optimize the VLS vector initialization like
repeating sequence. From the vslide1down to the vmerge with a simple
cost model, aka every instruction only has 1 cost.

Given code with -march=rv64gcv_zvl256b --param 
riscv-autovec-preference=fixed-vlmax
typedef int64_t vnx32di __attribute__ ((vector_size (256)));

__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
  vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
  };
  *(vnx32di *) out = v;
}

Before this patch:
vslide1down.vx (x31 times)

After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)

Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into
one SEW = 128 element and then broadcast this big element.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the
referenced class member.
(rvv_builder::get_merged_repeating_sequence):
(rvv_builder::repeating_sequence_use_merge_profitable_p): New
function to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the
merge mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get
the dup machine mode.
(expand_vector_init_merge_repeating_sequence): New function to
perform the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
* machmode.h (VECTOR_BOOL_MODE_P): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-v.cc   | 225 +-
gcc/config/riscv/riscv.h  |   1 +
gcc/machmode.h|   4 +
.../vls-vlmax/init-repeat-sequence-1.c|  21 ++
.../vls-vlmax/init-repeat-sequence-2.c|  24 ++
.../vls-vlmax/init-repeat-sequence-3.c|  25 ++
.../vls-vlmax/init-repeat-sequence-4.c|  15 ++
.../vls-vlmax/init-repeat-sequence-5.c|  17 ++
.../vls-vlmax/init-repeat-sequence-run-1.c|  47 
.../vls-vlmax/init-repeat-sequence-run-2.c|  46 
.../vls-vlmax/init-repeat-sequence-run-3.c|  41 
12 files changed, 461 insertions(+), 6 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c

Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence

2023-05-24 Thread juzhe.zh...@rivai.ai
* machmode.h (VECTOR_BOOL_MODE_P): New macro.
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM   \
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)   \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)  \
+
Why do you add this? It is not used, so you should drop it.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-05-25 11:09
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
From: Pan Li 
 
This patch would like to optimize the VLS vector initialization like
repeating sequence. From the vslide1down to the vmerge with a simple
cost model, aka every instruction only has 1 cost.
 
Given code with -march=rv64gcv_zvl256b --param 
riscv-autovec-preference=fixed-vlmax
typedef int64_t vnx32di __attribute__ ((vector_size (256)));
 
__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
  vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
  };
  *(vnx32di *) out = v;
}
 
Before this patch:
vslide1down.vx (x31 times)
 
After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)
 
Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into
one SEW = 128 element and then broadcast this big element.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the
referenced class member.
(rvv_builder::get_merged_repeating_sequence):
(rvv_builder::repeating_sequence_use_merge_profitable_p): New
function to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the
merge mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get
the dup machine mode.
(expand_vector_init_merge_repeating_sequence): New function to
perform the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
* machmode.h (VECTOR_BOOL_MODE_P): New macro.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-v.cc   | 225 +-
gcc/config/riscv/riscv.h  |   1 +
gcc/machmode.h|   4 +
.../vls-vlmax/init-repeat-sequence-1.c|  21 ++
.../vls-vlmax/init-repeat-sequence-2.c|  24 ++
.../vls-vlmax/init-repeat-sequence-3.c|  25 ++
.../vls-vlmax/init-repeat-sequence-4.c|  15 ++
.../vls-vlmax/init-repeat-sequence-5.c|  17 ++
.../vls-vlmax/init-repeat-sequence-run-1.c|  47 
.../vls-vlmax/init-repeat-sequence-run-2.c|  46 
.../vls-vlmax/init-repeat-sequence-run-3.c|  41 
12 files changed, 461 insertions(+), 6 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c
 
diff --git a/gcc/config/riscv/riscv-protos.h 

RE: Re: [PATCH V5] RISC-V: Using merge approach to optimize repeating sequence in vec_init

2023-05-24 Thread Li, Pan2 via Gcc-patches
Hi Kito,

I have updated the patch to v6 with the refactored framework, linked below. Thanks for the comments.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619536.html

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 17, 2023 11:52 AM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches ; palmer ; 
jeffreyalaw 
Subject: Re: Re: [PATCH V5] RISC-V: Using merge approach to optimize repeating 
sequence in vec_init

On Wed, May 17, 2023 at 11:36 AM juzhe.zh...@rivai.ai  
wrote:
>
> >> Does it mean we assume inner_int_mode is DImode (because of sizeof
> >> (uint64_t)), or should it be something like `for (unsigned int i =
> >> 0; i < (GET_MODE_SIZE(inner_int_mode ()) * 8 / npatterns ()); i++)`?
> No, sizeof (uint64_t) is there because of uint64_t mask = 0;

+  return gen_int_mode (mask, inner_int_mode ());
And we expect the uint64_t mask can always be put into inner_int_mode ()?
If not, why do we fill up all 64 bits?

>
> >> Do you mind giving more comments about this? What does it check and what does it do?
> We use known_gt (GET_MODE_SIZE (dup_mode), BYTES_PER_RISCV_VECTOR)
> because we are using a vector integer mode to generate the mask.
> For example, to generate the mask 0b01010101010101, we
> should use a scalar register holding the value 0b010101010...,
> then vmv.v.x it into a vector, and this vector will be used as the mask.
>
> >> Why is this hidden only in the else branch? I guess I have this question
> >> because I don't fully understand the logic of the if condition?
>
> Since we can't use a vector floating-point instruction to generate a mask.

I don't get why it's not something like below?

if (known_gt (GET_MODE_SIZE (dup_mode), BYTES_PER_RISCV_VECTOR)) { ...
}
if (FLOAT_MODE_P (dup_mode))
{
...
}



>
> >> nit: builder.inner_mode () rather than GET_MODE_INNER (dup_mode)?
>
> They are the same. I can change it using GET_MODE_INNER
>
> >> And I would like to have more comments explaining why we need force_reg here.
> Because it will cause an ICE.

But why? And why can it be resolved by force_reg? You need a few more comments in
the code.


[PATCH v6] RISC-V: Using merge approach to optimize repeating sequence

2023-05-24 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch optimizes VLS vector initialization for repeating sequences,
replacing the vslide1down sequence with a vmerge under a simple cost
model in which every instruction has a cost of 1.

Given code with -march=rv64gcv_zvl256b --param 
riscv-autovec-preference=fixed-vlmax
typedef int64_t vnx32di __attribute__ ((vector_size (256)));

__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
  vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
  };
  *(vnx32di *) out = v;
}

Before this patch:
vslide1down.vx (x31 times)

After this patch:
li          a5,-1431654400
addi        a5,a5,-1365
li          a3,-1431654400
addi        a3,a3,-1366
slli        a5,a5,32
add         a5,a5,a3
vsetvli     a4,zero,e64,m8,ta,ma
vmv.v.x     v8,a0
vmv.s.x     v0,a5
vmerge.vxm  v8,v8,a1,v0
vs8r.v      v8,0(a2)

Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into
one SEW = 128 element and then broadcast this big element.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the
referenced class member.
(rvv_builder::get_merged_repeating_sequence):
(rvv_builder::repeating_sequence_use_merge_profitable_p): New
function to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the
merge mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get
the dup machine mode.
(expand_vector_init_merge_repeating_sequence): New function to
perform the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
* machmode.h (VECTOR_BOOL_MODE_P): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New 
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New 
test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: 
New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   | 225 +-
 gcc/config/riscv/riscv.h  |   1 +
 gcc/machmode.h|   4 +
 .../vls-vlmax/init-repeat-sequence-1.c|  21 ++
 .../vls-vlmax/init-repeat-sequence-2.c|  24 ++
 .../vls-vlmax/init-repeat-sequence-3.c|  25 ++
 .../vls-vlmax/init-repeat-sequence-4.c|  15 ++
 .../vls-vlmax/init-repeat-sequence-5.c|  17 ++
 .../vls-vlmax/init-repeat-sequence-run-1.c|  47 
 .../vls-vlmax/init-repeat-sequence-run-2.c|  46 
 .../vls-vlmax/init-repeat-sequence-run-3.c|  41 
 12 files changed, 461 insertions(+), 6 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 36419c95bbd..768b646fec1 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -140,6 +140,7 @@ enum insn_type
   RVV_MERGE_OP = 4,
   RVV_CMP_OP = 4,
   RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
+  RVV_SCALAR_MOV_OP = 4,
 };
 enum vlmul_type
 {
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f71ad9e46a1..458020ce0a1 100644

[Bug fortran/109948] [13/14 Regression] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread rimvydas.jas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

--- Comment #5 from Rimvydas (RJ)  ---
(In reply to anlauf from comment #4)
> Can you check if this works for you?

This patch avoids the issue in all other associate use cases (tried on the
gcc-13 branch).

However, it is a bit suspicious that using variable name abbreviations (to dig
out arrays from deeply nested types) is enough to change how the internal
gfc_array_ref is populated.  The ICE was triggered only by patterns that first
use an indexed access through the abbreviated name (like k(1)) and then perform
any operation involving the whole array.

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread juzhe.zh...@rivai.ai
Hi, Richard.
After several tries with your testcases (which I have already added to the V15 patch),
I think "using a new IV" would be better than "multiplication".

Now:
 loop_len_34 = MIN_EXPR ;
  _74 = MIN_EXPR ;   --> the multiplication approach would change this
into  _74 = loop_len_34  * 2;
  loop_len_48 = MIN_EXPR <_74, 4>;
  _77 = _74 - loop_len_48;
  loop_len_49 = MIN_EXPR <_77, 4>;
  _78 = _77 - loop_len_49;
  loop_len_50 = MIN_EXPR <_78, 4>;
  loop_len_51 = _78 - loop_len_50;

I prefer the "new IV" approach since it looks more reasonable and gives better codegen.
Could you take a look at it:
V15 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619534.html 
  
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 04:05
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
I'll look at the samples tomorrow, but just to address one thing:
 
钟居哲  writes:
>>> What gives the best code in these cases?  Is emitting a multiplication
>>> better?  Or is using a new IV better?
> Could you give me more detail information about "new refresh IV" approach.
> I'd like to try that.
 
By “using a new IV” I meant calling vect_set_loop_controls_directly
for every rgroup, not just the first.  So in the earlier example,
there would be one decrementing IV for x and one decrementing IV for y.
 
Thanks,
Richard
 
 
 


[PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch supports decrement IV by following the flow designed by Richard:

(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.

(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rgc has 1 control, this step is the SSA name created for that control.
Otherwise the step is a fresh SSA name, as in your patch.

(3) vect_set_loop_controls_directly stores this step somewhere for later
use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.

(4) After the vect_set_loop_controls_directly call above, and outside
the "if" statement that now contains vect_set_loop_controls_directly,
check whether rgc->controls.length () > 1.  If so, use
vect_adjust_loop_lens_control to set the controls based on S.

Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors.  And the starting
step for vect_adjust_loop_lens_control is always S.

This patch has been well tested for single-rgroup and multiple-rgroup (SLP) cases
and passes all testcases in the RISC-V port.

It also passes the multiple-rgroup (non-SLP) tests, tested on vec_pack_trunk.

Bug fixed relative to the V14 patch (the failure below was reported 8 times):
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c execution test

This patch passes the testcase listed above.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Add decrement IV support.
(vect_adjust_loop_lens_control): Ditto.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variables.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
(LOOP_VINFO_DECREMENTING_IV_STEP): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c: New test.

---
 .../rvv/autovec/partial/multiple_rgroup-3.c   | 288 ++
 .../rvv/autovec/partial/multiple_rgroup-4.c   |  75 +
 .../autovec/partial/multiple_rgroup_run-3.c   |  36 +++
 .../autovec/partial/multiple_rgroup_run-4.c   |  15 +
 gcc/tree-vect-loop-manip.cc   | 153 ++
 gcc/tree-vect-loop.cc |  13 +
 gcc/tree-vectorizer.h |  12 +
 7 files changed, 592 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
new file mode 100644
index 000..9579749c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
@@ -0,0 +1,288 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+
+void __attribute__ ((noinline, noclone))
+f0 (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] += 1;
+  x[i + 1] += 2;
+  x[i + 2] += 3;
+  x[i + 3] += 4;
+  y[j + 0] += 1;
+  y[j + 1] += 2;
+  y[j + 2] += 3;
+  y[j + 3] += 4;
+  y[j + 4] += 5;
+  y[j + 5] += 6;
+  y[j + 6] += 7;
+  y[j + 7] += 8;
+}
+}
+
+void __attribute__ ((optimize (0)))
+f0_init (int8_t *__restrict x, int8_t *__restrict x2, int16_t *__restrict y,
+int16_t *__restrict y2, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] = i % 120;
+  x[i + 1] = i % 78;
+  x[i + 2] = i % 55;
+  x[i + 3] = i % 27;
+  y[j + 0] = j % 33;
+  y[j + 1] = j % 44;
+  y[j + 2] = j % 66;
+  y[j + 3] = j % 88;
+  y[j + 4] = j % 99;
+  y[j + 5] = j % 39;
+  y[j + 6] = j % 49;
+  y[j + 7] = j % 101;
+
+  x2[i + 0] = i % 120;
+  

[PATCH] i386: Fix incorrect intrinsic signature for AVX512 s{lli|rai|rli}

2023-05-24 Thread Hu, Lin1 via Gcc-patches
Hi all,

This patch fixes the incorrect intrinsic signatures of
_mm{512|256|}_s{lli|rai|rli}_epi*. It has been tested on
x86_64-pc-linux-gnu. OK for trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/109173
PR target/109174
* config/i386/avx512bwintrin.h (_mm512_srli_epi16): Change type from
const int to const unsigned int.
(_mm512_mask_srli_epi16): Ditto.
(_mm512_slli_epi16): Ditto.
(_mm512_mask_slli_epi16): Ditto.
(_mm512_maskz_slli_epi16): Ditto.
(_mm512_srai_epi16): Ditto.
(_mm512_mask_srai_epi16): Ditto.
(_mm512_maskz_srai_epi16): Ditto.
* config/i386/avx512vlintrin.h (_mm256_mask_srli_epi32): Ditto.
(_mm256_maskz_srli_epi32): Ditto.
(_mm_mask_srli_epi32): Ditto.
(_mm_maskz_srli_epi32): Ditto.
(_mm256_mask_srli_epi64): Ditto.
(_mm256_maskz_srli_epi64): Ditto.
(_mm_mask_srli_epi64): Ditto.
(_mm_maskz_srli_epi64): Ditto.
(_mm256_mask_srai_epi32): Ditto.
(_mm256_maskz_srai_epi32): Ditto.
(_mm_mask_srai_epi32): Ditto.
(_mm_maskz_srai_epi32): Ditto.
(_mm256_srai_epi64): Ditto.
(_mm256_mask_srai_epi64): Ditto.
(_mm256_maskz_srai_epi64): Ditto.
(_mm_srai_epi64): Ditto.
(_mm_mask_srai_epi64): Ditto.
(_mm_maskz_srai_epi64): Ditto.
(_mm_mask_slli_epi32): Ditto.
(_mm_maskz_slli_epi32): Ditto.
(_mm_mask_slli_epi64): Ditto.
(_mm_maskz_slli_epi64): Ditto.
(_mm256_mask_slli_epi32): Ditto.
(_mm256_maskz_slli_epi32): Ditto.
(_mm256_mask_slli_epi64): Ditto.
(_mm256_maskz_slli_epi64): Ditto.
(_mm_mask_srai_epi16): Ditto.
(_mm_maskz_srai_epi16): Ditto.
(_mm256_srai_epi16): Ditto.
(_mm256_mask_srai_epi16): Ditto.
(_mm_mask_slli_epi16): Ditto.
(_mm_maskz_slli_epi16): Ditto.
(_mm256_mask_slli_epi16): Ditto.
(_mm256_maskz_slli_epi16): Ditto.

gcc/testsuite/ChangeLog:

PR target/109173
PR target/109174
* gcc.target/i386/pr109173-1.c: New test.
* gcc.target/i386/pr109174-1.c: Ditto.
---
 gcc/config/i386/avx512bwintrin.h   |  32 +++---
 gcc/config/i386/avx512fintrin.h|  58 +++
 gcc/config/i386/avx512vlbwintrin.h |  36 ---
 gcc/config/i386/avx512vlintrin.h   | 112 +++--
 gcc/testsuite/gcc.target/i386/pr109173-1.c |  57 +++
 gcc/testsuite/gcc.target/i386/pr109174-1.c |  45 +
 6 files changed, 236 insertions(+), 104 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109173-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109174-1.c

diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h
index 89790f7917b..791d4e35f32 100644
--- a/gcc/config/i386/avx512bwintrin.h
+++ b/gcc/config/i386/avx512bwintrin.h
@@ -2880,7 +2880,7 @@ _mm512_maskz_dbsad_epu8 (__mmask32 __U, __m512i __A, 
__m512i __B,
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_srli_epi16 (__m512i __A, const int __imm)
+_mm512_srli_epi16 (__m512i __A, const unsigned int __imm)
 {
   return (__m512i) __builtin_ia32_psrlwi512_mask ((__v32hi) __A, __imm,
  (__v32hi)
@@ -2891,7 +2891,7 @@ _mm512_srli_epi16 (__m512i __A, const int __imm)
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_srli_epi16 (__m512i __W, __mmask32 __U, __m512i __A,
-   const int __imm)
+   const unsigned int __imm)
 {
   return (__m512i) __builtin_ia32_psrlwi512_mask ((__v32hi) __A, __imm,
  (__v32hi) __W,
@@ -2910,7 +2910,7 @@ _mm512_maskz_srli_epi16 (__mmask32 __U, __m512i __A, 
const int __imm)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_slli_epi16 (__m512i __A, const int __B)
+_mm512_slli_epi16 (__m512i __A, const unsigned int __B)
 {
   return (__m512i) __builtin_ia32_psllwi512_mask ((__v32hi) __A, __B,
  (__v32hi)
@@ -2921,7 +2921,7 @@ _mm512_slli_epi16 (__m512i __A, const int __B)
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_slli_epi16 (__m512i __W, __mmask32 __U, __m512i __A,
-   const int __B)
+   const unsigned int __B)
 {
   return (__m512i) __builtin_ia32_psllwi512_mask ((__v32hi) __A, __B,
  (__v32hi) __W,
@@ -2930,7 +2930,7 @@ _mm512_mask_slli_epi16 (__m512i __W, __mmask32 __U, 
__m512i __A,
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_slli_epi16 (__mmask32 __U, __m512i __A, const int __B)
+_mm512_maskz_slli_epi16 (__mmask32 __U, 

Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.

2023-05-24 Thread WANG Xuerui



On 2023/5/25 10:46, Lulu Cheng wrote:


On 2023/5/25 at 4:15 AM, Jason Merrill wrote:
On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:


On Wed, 24 May 2023 at 09:41, Xi Ruoyao  wrote:

> Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++
> expert here...
>
> Jonathan: AFAIK the standard and the Itanium ABI treats an empty class
> as size 1

Only as a complete object, not as a subobject.


Also as a data member subobject.

> in order to guarantee unique address, so for the following:
>
> class Empty {};
> class Test { Empty empty; double a, b; };

There is no need to have a unique address here, so Test::empty and Test::a
have the same address. It's a potentially-overlapping subobject.

For the Itanium ABI, sizeof(Test) == 2 * sizeof(double).


That would be true if Test::empty were marked [[no_unique_address]], 
but without that attribute, sizeof(Test) is actually 3 * sizeof(double).


> When we pass "Test" via registers, we may only allocate the registers
> for Test::a and Test::b, and completely ignore Test::empty because there
> are no addresses of registers.  Is this correct or not?

I think that's a decision for the loongarch psABI. In principle, there's no
reason a register has to be used to pass Test::empty, since you can't read
from it or write to it.


Agreed.  The Itanium C++ ABI has nothing to say about how registers 
are allocated for parameter passing; this is a matter for the psABI.


And there is no need for a psABI to allocate a register for 
Test::empty because it contains no data.


In the x86_64 psABI, Test above is passed in memory because of its 
size ("the size of the aggregate exceeds two eightbytes...").  But


struct Test2 { Empty empty; double a; };

is passed in a single floating-point register; the Test2::empty 
subobject is not passed anywhere, because its eightbyte is classified 
as NO_CLASS, because there is no actual data there.






I know nothing about the LoongArch psABI, but going out of your way to 
assign a register to an empty class seems like a mistake.


MIPS64 and ARM64 also allocate parameter registers for empty structs. 
https://godbolt.org/z/jT4cY3T5o


Our original intention is not to pass this empty structure member, but
to make the following two structures treat empty structure members
in the same way in the process of passing parameters.

struct st1
{
     struct empty {} e1;
     long a;
     long b;
};

struct st2
{
     struct empty {} e1;
     double f0;
     double f1;
};


Then shouldn't we try to avoid the extra register in all cases, instead 
of wasting one regardless? ;-)


[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

--- Comment #4 from Andrew Pinski  ---
I happened to notice this because I am working on a match patch that transforms
`a ? 1 : b` into `a | b`.

In the case of stmt_can_terminate_bb_p, I noticed we had:
   [local count: 330920071]:
  _48 = MEM[(const struct gasm *)t_22(D)].D.129035.D.128905.D.128890.subcode;
  _49 = _48 & 2;
  if (_49 != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 218407246]:
  _50 = (bool) _48;

   [local count: 940291388]:
  # _13 = PHI <0(14), _50(32), _12(29), 0(11), 0(30), 1(2), 1(31), 0(25)>

And the patch to match would do:
   [local count: 330920071]:
  _48 = MEM[(const struct gasm *)t_22(D)].D.129035.D.128905.D.128890.subcode;
  _49 = _48 & 2;
  _50 = (bool) _48;
  _127 = _49 != 0;
  _44 = _50 | _127;

   [local count: 940291388]:
  # _13 = PHI <0(14), 0(25), _12(29), 0(11), 0(30), 1(2), _44(31)>

Which is definitely better than before but I was like isn't that the same as:
  _49 = _48 & 3;
  _44 = _49 != 0;

Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.

2023-05-24 Thread Lulu Cheng



On 2023/5/25 at 4:15 AM, Jason Merrill wrote:
On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:


On Wed, 24 May 2023 at 09:41, Xi Ruoyao  wrote:

> Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++
> expert here...
>
> Jonathan: AFAIK the standard and the Itanium ABI treats an empty class
> as size 1

Only as a complete object, not as a subobject.


Also as a data member subobject.

> in order to guarantee unique address, so for the following:
>
> class Empty {};
> class Test { Empty empty; double a, b; };

There is no need to have a unique address here, so Test::empty and Test::a
have the same address. It's a potentially-overlapping subobject.

For the Itanium ABI, sizeof(Test) == 2 * sizeof(double).


That would be true if Test::empty were marked [[no_unique_address]], 
but without that attribute, sizeof(Test) is actually 3 * sizeof(double).


> When we pass "Test" via registers, we may only allocate the registers
> for Test::a and Test::b, and completely ignore Test::empty because there
> are no addresses of registers.  Is this correct or not?

I think that's a decision for the loongarch psABI. In principle, there's no
reason a register has to be used to pass Test::empty, since you can't read
from it or write to it.


Agreed.  The Itanium C++ ABI has nothing to say about how registers 
are allocated for parameter passing; this is a matter for the psABI.


And there is no need for a psABI to allocate a register for 
Test::empty because it contains no data.


In the x86_64 psABI, Test above is passed in memory because of its 
size ("the size of the aggregate exceeds two eightbytes...").  But


struct Test2 { Empty empty; double a; };

is passed in a single floating-point register; the Test2::empty 
subobject is not passed anywhere, because its eightbyte is classified 
as NO_CLASS, because there is no actual data there.






I know nothing about the LoongArch psABI, but going out of your way to 
assign a register to an empty class seems like a mistake.


MIPS64 and ARM64 also allocate parameter registers for empty structs. 
https://godbolt.org/z/jT4cY3T5o


Our original intention is not to pass this empty structure member, but
to make the following two structures treat empty structure members
in the same way in the process of passing parameters.

struct st1
{
    struct empty {} e1;
    long a;
    long b;
};

struct st2
{
    struct empty {} e1;
    double f0;
    double f1;
};




> On Wed, 2023-05-24 at 14:45 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
> > > An empty struct type that is not non-trivial for the purposes of
> > > calls will be treated as though it were the following C type:
> > >
> > > struct {
> > >   char c;
> > > };
> > >
> > > Before this patch was added, a structure parameter containing an
> > > empty structure and less than three floating-point members was
> > > passed through one or two floating-point registers, while nested
> > > empty structures are ignored. Which did not conform to the
> > > calling convention.
> >
> > No, it's a deliberate decision I've made in
> > https://gcc.gnu.org/r12-8294. And we already agreed "the ABI needs to
> > be updated" when we applied r12-8294, but I've never improved my
> > English skill to revise the ABI myself :(.
> >
> > We are also using the same "de-facto" ABI throwing away the empty
> > struct for Clang++ (https://reviews.llvm.org/D132285). So we should
> > update the spec here, instead of changing every implementation.
> >
> > The C++ standard treats the empty struct as size 1 for ensuring the
> > semantics of pointer comparison operations.  When we pass it through
> > the registers, there is no need to really consider the empty field
> > because there is no pointers to registers.
> >
>
>



[Bug target/100106] [10 Regression] ICE in gen_movdi, at config/arm/arm.md:6187 since r10-2840-g70cdb21e

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100106

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Alexandre Oliva :

https://gcc.gnu.org/g:d6b756447cd58bcca20e6892790582308b869817

commit r14-1187-gd6b756447cd58bcca20e6892790582308b869817
Author: Alexandre Oliva 
Date:   Wed May 24 03:07:56 2023 -0300

[PR100106] Reject unaligned subregs when strict alignment is required

The testcase for pr100106, compiled with optimization for 32-bit
powerpc -mcpu=604 with -mstrict-align expands the initialization of a
union from a float _Complex value into a load from an SCmode
constant pool entry, aligned to 4 bytes, into a DImode pseudo,
requiring 8-byte alignment.

The patch that introduced the testcase modified simplify_subreg to
avoid changing the MEM to outermode, but simplify_gen_subreg still
creates a SUBREG or a MEM that would require stricter alignment than
MEM's, and lra_constraints appears to get confused by that, repeatedly
creating unsatisfiable reloads for the SUBREG until it exceeds the
insn count.

Avoiding the unaligned SUBREG, expand splits the DImode dest into
SUBREGs and loads each SImode word of the constant pool with the
proper alignment.
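
As a source-level analogy only (the actual change is in RTL expansion, not
in user code), the idea of reading an 8-byte value from storage that is
only guaranteed 4-byte alignment can be sketched as two aligned 4-byte
loads. The function name load_u64_from_words is an assumption made for
this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read an 8-byte value from storage that is only
   guaranteed to be 4-byte aligned, using two 4-byte accesses instead of
   one 8-byte access.  */
uint64_t
load_u64_from_words (const uint32_t *words)
{
  uint32_t parts[2];
  uint64_t value;

  parts[0] = words[0];  /* first aligned 4-byte load */
  parts[1] = words[1];  /* second aligned 4-byte load */

  /* Reassemble the two words in memory order, so the result does not
     depend on the target's endianness.  */
  memcpy (&value, parts, sizeof value);
  return value;
}
```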


for  gcc/ChangeLog

PR target/100106
* emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.

[Bug target/109933] __atomic_test_and_set is broken for BIG ENDIAN riscv targets

2023-05-24 Thread rory.bolt at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109933

--- Comment #9 from Rory Bolt  ---
Created attachment 55153
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55153&action=edit
patch

Tested fix for big endian, NOT tested on little endian

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|1                           |0
             Status|ASSIGNED                    |UNCONFIRMED
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #3 from Andrew Pinski  ---
Nope, not working. I even tried to figure out how to modify tree-ssa-reassoc.cc to
teach it about `(bool)a` being the same as `(a & 1) != 0`, but I could not
figure out how.

[V8][PATCH 0/2]Accept and Handle the case when a structure including a FAM nested in another structure

2023-05-24 Thread Qing Zhao via Gcc-patches
Hi,

This is the 8th version of the patch, rebased on the latest trunk.
This is an important patch needed by the Linux kernel security project.

Compared to the 7th version, the major changes are:
1. Update the documentation wording based on Joseph's suggestions.
2. Change the name of the new macro TYPE_INCLUDE_FLEXARRAY to
   TYPE_INCLUDES_FLEXARRAY.

Everything else is the same as in version 7.

The 7th version is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619033.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619034.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619036.html

Bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks a lot.

Qing



[PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-24 Thread Qing Zhao via Gcc-patches
on a structure with a C99 flexible array member being nested in
another structure.

"The GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

 There are two situations:

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is the last field of another structure,
 for example:

  struct flex  { int length; char data[]; };
  union union_flex { int others; struct flex f; };

  struct out_flex_struct { int m; struct flex flex_data; };
  struct out_flex_union { int n; union union_flex flex_data; };

 In the above, both 'out_flex_struct.flex_data.data[]' and
 'out_flex_union.flex_data.f.data[]' are considered as flexible
 arrays too.

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is not the last field of another structure,
 for example:

  struct flex  { int length; char data[]; };

  struct mid_flex { int m; struct flex flex_data; int n; };

 In the above, accessing a member of the array 'mid_flex.flex_data.data[]'
 might have undefined behavior.  Compilers do not handle such a case
 consistently.  Any code relying on this case should be modified to ensure
 that flexible array members only end up at the ends of structures.

 Please use the warning option '-Wflex-array-member-not-at-end' to
 identify all such cases in the source code and modify them.  This
 warning will be on by default starting from GCC 15.
"
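
A minimal sketch of the first situation (the nested structure is the last
field), assuming a compiler that accepts this GCC extension. The struct
definitions are taken from the text above; the helper make_out_flex is
hypothetical and only shows how storage for the trailing array would
typically be allocated.

```c
#include <stdlib.h>
#include <string.h>

struct flex { int length; char data[]; };

/* GCC extension: a structure with a flexible array member used as the
   last field of another structure.  */
struct out_flex_struct { int m; struct flex flex_data; };

/* Hypothetical helper: allocate an out_flex_struct with enough trailing
   storage to hold a copy of TEXT, including its terminating NUL.
   Returns NULL on allocation failure; the caller frees the result.  */
struct out_flex_struct *
make_out_flex (const char *text)
{
  size_t len = strlen (text);
  struct out_flex_struct *p = malloc (sizeof *p + len + 1);

  if (p == NULL)
    return NULL;

  p->m = 0;
  p->flex_data.length = (int) len;
  /* 'flex_data.data[]' is the last thing in the object, so the extra
     bytes requested from malloc belong to it.  */
  memcpy (p->flex_data.data, text, len + 1);
  return p;
}
```

The same allocation pattern would not be safe for 'mid_flex', because the
field 'n' follows the flexible array there.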

gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
 gcc/c-family/c.opt|  5 +++
 gcc/c/c-decl.cc   |  9 
 gcc/doc/extend.texi   | 44 ++-
 .../gcc.dg/variable-sized-type-flex-array.c   | 31 +
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..c26d9801b63 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
 C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
 Warn about calls to snprintf and similar functions that truncate output.
 
+Wflex-array-member-not-at-end
+C C++ Var(warn_flex_array_member_not_at_end) Warning
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+
 Wif-not-aligned
 C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
 Warn when the field in a struct is not aligned.
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e14f514cb6e..ecd10ebb69c 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9278,6 +9278,15 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
TYPE_INCLUDES_FLEXARRAY (t)
  = is_last_field && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x));
 
+  if (warn_flex_array_member_not_at_end
+ && !is_last_field
+ && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x))
+ && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x)))
+   warning_at (DECL_SOURCE_LOCATION (x),
+   OPT_Wflex_array_member_not_at_end,
+   "structure containing a flexible array member"
+   " is not at the end of another structure");
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f9d13b495ad..17ef80e75cc 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1751,7 +1751,49 @@ Flexible array members may only appear as the last 
member of a
 A structure containing a flexible array member, or a union containing
 such a structure (possibly recursively), may not be a member of a
 structure or an element of an array.  (However, these uses are
-permitted by GCC as extensions.)
+permitted by GCC as extensions, see details below.)
+@end itemize
+
+The GCC extension accepts a structure containing an ISO C99 @dfn{flexible array
+member}, or a union containing such a structure (possibly recursively)
+to be a member of a structure.
+
+There are two situations:
+
+@itemize @bullet
+@item
+A structure containing a C99 flexible array member, or a union containing
+such a structure, is the last field of another structure, for example:
+

[PATCH 1/2] Handle component_ref to a structure/union field including flexible array member [PR101832]

2023-05-24 Thread Qing Zhao via Gcc-patches
GCC extension accepts the case when a struct with a C99 flexible array member
is embedded into another struct or union (possibly recursively) as the last
field.
__builtin_object_size should treat such struct as flexible size.

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_includes_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_includes_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
 gcc/c/c-decl.cc   |  11 ++
 gcc/lto/lto-common.cc |   5 +-
 gcc/print-tree.cc |   5 +
 .../gcc.dg/builtin-object-size-pr101832.c | 134 ++
 gcc/tree-core.h   |   2 +
 gcc/tree-object-size.cc   |  23 ++-
 gcc/tree-streamer-in.cc   |   5 +-
 gcc/tree-streamer-out.cc  |   5 +-
 gcc/tree.h|   7 +-
 9 files changed, 192 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 1af51c4acfc..e14f514cb6e 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9267,6 +9267,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
   DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
 
+  /* Set TYPE_INCLUDES_FLEXARRAY for the context of x, t.
+when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+   TYPE_INCLUDES_FLEXARRAY (t)
+ = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDES_FLEXARRAY for the context of x, t
+when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+   TYPE_INCLUDES_FLEXARRAY (t)
+ = is_last_field && TYPE_INCLUDES_FLEXARRAY (TREE_TYPE (x));
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 537570204b3..f6b85bbc6f7 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1275,7 +1275,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
   if (AGGREGATE_TYPE_P (t1))
compare_values (TYPE_TYPELESS_STORAGE);
   compare_values (TYPE_EMPTY_P);
-  compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (FUNC_OR_METHOD_TYPE_P (t1))
+   compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (RECORD_OR_UNION_TYPE_P (t1))
+   compare_values (TYPE_INCLUDES_FLEXARRAY);
   compare_values (TYPE_PACKED);
   compare_values (TYPE_RESTRICT);
   compare_values (TYPE_USER_ALIGN);
diff --git a/gcc/print-tree.cc b/gcc/print-tree.cc
index ccecd3dc6a7..62451b6cf4e 100644
--- a/gcc/print-tree.cc
+++ b/gcc/print-tree.cc
@@ -632,6 +632,11 @@ print_node (FILE *file, const char *prefix, tree node, int 
indent,
  && TYPE_CXX_ODR_P (node))
fputs (" cxx-odr-p", file);
 
+  if ((code == RECORD_TYPE
+  || code == UNION_TYPE)
+ && TYPE_INCLUDES_FLEXARRAY (node))
+   fputs (" includes-flexarray", file);
+
   /* The transparent-union flag is used for different things in
 different nodes.  */
   if ((code == UNION_TYPE || code == RECORD_TYPE)
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
new file mode 100644
index 000..60078e11634
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
@@ -0,0 +1,134 @@
+/* PR 101832: 
+   GCC extension accepts the case when a struct with a C99 flexible array
+   member is embedded into another struct (possibly recursively).
+   __builtin_object_size will treat such struct as flexible size.

[Bug c++/109961] auto assigned from requires and lambda inside

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109961

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|storage size of 'variable   |auto assigned from requires
                   |name' isn't known           |and lambda inside
           Keywords|                            |c++-lambda, rejects-valid
   Last reconfirmed|                            |2023-05-25
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.
Reduced all the way:
```
auto a = requires{  []()  {}; };
```

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
   Last reconfirmed|                               |2023-05-25
             Status|UNCONFIRMED                    |ASSIGNED
     Ever confirmed|0                              |1
           Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
Or maybe extend recognize_single_bit_test to recognize that (bool)a != 0 is the same
as (a & 1) != 0.

Let me try that.

[Bug c++/109961] New: storage size of 'variable name' isn't known

2023-05-24 Thread Darrell.Wright at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109961

Bug ID: 109961
   Summary: storage size of 'variable name' isn't known
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Darrell.Wright at gmail dot com
  Target Milestone: ---

The following valid code fails to compile in gcc-trunk on
https://foo.godbolt.org/z/vGMGbv8oP 

auto a = requires{
    []( int b ) consteval {
        if( b ) {
            throw b;
        }
    }( 0 );
};

With the following error

:3:6: error: storage size of 'a' isn't known
3 | auto a = requires{
  |  ^
Compiler returned: 1

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

--- Comment #1 from Andrew Pinski  ---
We could have a pattern that does:

`(a & CST) != 0 ? 1: (bool)a` -> `a & (CST|1) != 0` to fix this I think.
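
A quick way to sanity-check the proposed pattern is a brute-force comparison over small values. This is only a sketch: it assumes that `(bool)a` in the comment means GIMPLE's truncation to the low bit (`a & 1`), not C's `a != 0`, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Left-hand side of the proposed pattern.  ASSUMPTION: (bool)a is
   modelled as the low bit of a, not as C's a != 0.  */
static bool pattern_before(unsigned a, unsigned cst)
{
  return (a & cst) != 0 ? true : (a & 1) != 0;
}

/* Right-hand side: a single test against the widened mask.  */
static bool pattern_after(unsigned a, unsigned cst)
{
  return (a & (cst | 1)) != 0;
}

/* Returns the number of (a, cst) pairs below 256 on which the two
   forms disagree.  */
unsigned count_mismatches(void)
{
  unsigned bad = 0;
  for (unsigned cst = 0; cst < 256; cst++)
    for (unsigned a = 0; a < 256; a++)
      if (pattern_before(a, cst) != pattern_after(a, cst))
        bad++;
  return bad;
}

/* Aborts if the two forms ever disagree on the sampled range.  */
void check_pattern(void)
{
  assert(count_mismatches() == 0);
}
```

Calling `check_pattern()` from a test driver is enough to exercise it; the exhaustive range is small on purpose so that it runs instantly.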

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |8.5.0
      Known to fail|                            |9.1.0
   Target Milestone|---                         |10.5

[Bug tree-optimization/109960] New: [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Bug ID: 109960
   Summary: [10/11/12/13/14 Regression] missing combining of
`(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take the following C++ code (reduced from stmt_can_terminate_bb_p):
```
static inline bool f1(unsigned *a)
{
  return (*a&1);
}
static inline bool f2(unsigned *a)
{
  return (*a&2);
}

bool f(int c, unsigned *a)
{
  if (c)
    return 0;
  return f2(a) || f1(a) ;
}
```

At -O1 we can produce:
```
movl    $0, %eax
testl   %edi, %edi
jne     .L1
testb   $3, (%rsi)
setne   %al
.L1:
ret
```
But at -O2 we get:
xorl    %eax, %eax
testl   %edi, %edi
jne     .L1
movl    (%rsi), %edx
movl    %edx, %eax
andl    $1, %eax
andl    $2, %edx
movl    $1, %edx
cmovne  %edx, %eax
.L1:
ret

Which is just so much worse.
This started in GCC 9.
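
For reference, the combination that the -O1 code performs can be written out by hand. The sketch below passes the value directly instead of through a pointer, and the names `f_spelled_out`, `f_combined` and `forms_agree` are invented for this illustration; it only shows that the two source forms are equivalent, not how the optimizer gets there.

```c
#include <assert.h>
#include <stdbool.h>

/* The two bit tests from the report, taking the value directly.  */
static bool bit0_set(unsigned v) { return (v & 1) != 0; }
static bool bit1_set(unsigned v) { return (v & 2) != 0; }

/* What the source spells out: two separate tests joined by ||.  */
bool f_spelled_out(int c, unsigned v)
{
  if (c)
    return false;
  return bit1_set(v) || bit0_set(v);
}

/* What the -O1 code effectively computes: one test against mask 3.  */
bool f_combined(int c, unsigned v)
{
  if (c)
    return false;
  return (v & 3) != 0;
}

/* Returns true if both forms agree for c in {0, 1} and v below 1024.  */
bool forms_agree(void)
{
  for (int c = 0; c <= 1; c++)
    for (unsigned v = 0; v < 1024; v++)
      if (f_spelled_out(c, v) != f_combined(c, v))
        return false;
  return true;
}

/* Aborts if the hand-combined form ever differs.  */
void check_forms(void)
{
  assert(forms_agree());
}
```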

RE: [PATCH] PR gcc/98350:Handle FMA friendly in reassoc pass

2023-05-24 Thread Cui, Lili via Gcc-patches
> > +rewrite_expr_tree_parallel (gassign *stmt, int width, bool has_fma,
> > +const vec
> > +)
> >  {
> >enum tree_code opcode = gimple_assign_rhs_code (stmt);
> >int op_num = ops.length ();
> > @@ -5483,10 +5494,11 @@ rewrite_expr_tree_parallel (gassign *stmt, int
> width,
> >int stmt_num = op_num - 1;
> >gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
> >int op_index = op_num - 1;
> > -  int stmt_index = 0;
> > -  int ready_stmts_end = 0;
> > -  int i = 0;
> > -  gimple *stmt1 = NULL, *stmt2 = NULL;
> > +  int width_count = width;
> > +  int i = 0, j = 0;
> > +  tree tmp_op[2], op1;
> > +  operand_entry *oe;
> > +  gimple *stmt1 = NULL;
> >tree last_rhs1 = gimple_assign_rhs1 (stmt);
> >
> >/* We start expression rewriting from the top statements.
> > @@ -5496,91 +5508,84 @@ rewrite_expr_tree_parallel (gassign *stmt, int
> width,
> >for (i = stmt_num - 2; i >= 0; i--)
> >  stmts[i] = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmts[i+1]));
> >
> > -  for (i = 0; i < stmt_num; i++)
> > +  /* Build parallel dependency chain according to width.  */  for (i
> > + = 0; i < width; i++)
> >  {
> > -  tree op1, op2;
> > -
> > -  /* Determine whether we should use results of
> > -already handled statements or not.  */
> > -  if (ready_stmts_end == 0
> > - && (i - stmt_index >= width || op_index < 1))
> > -   ready_stmts_end = i;
> > -
> > -  /* Now we choose operands for the next statement.  Non zero
> > -value in ready_stmts_end means here that we should use
> > -the result of already generated statements as new operand.  */
> > -  if (ready_stmts_end > 0)
> > -   {
> > - op1 = gimple_assign_lhs (stmts[stmt_index++]);
> > - if (ready_stmts_end > stmt_index)
> > -   op2 = gimple_assign_lhs (stmts[stmt_index++]);
> > - else if (op_index >= 0)
> > -   {
> > - operand_entry *oe = ops[op_index--];
> > - stmt2 = oe->stmt_to_insert;
> > - op2 = oe->op;
> > -   }
> > - else
> > -   {
> > - gcc_assert (stmt_index < i);
> > - op2 = gimple_assign_lhs (stmts[stmt_index++]);
> > -   }
> > +  /*   */
> 
> empty comment?

Added it, thanks.

> 
> > +  if (op_index > 1 && !has_fma)
> > +   swap_ops_for_binary_stmt (ops, op_index - 2);
> >
> > - if (stmt_index >= ready_stmts_end)
> > -   ready_stmts_end = 0;
> > -   }
> > -  else
> > +  for (j = 0; j < 2; j++)
> > {
> > - if (op_index > 1)
> > -   swap_ops_for_binary_stmt (ops, op_index - 2);
> > - operand_entry *oe2 = ops[op_index--];
> > - operand_entry *oe1 = ops[op_index--];
> > - op2 = oe2->op;
> > - stmt2 = oe2->stmt_to_insert;
> > - op1 = oe1->op;
> > - stmt1 = oe1->stmt_to_insert;
> > + gcc_assert (op_index >= 0);
> > + oe = ops[op_index--];
> > + tmp_op[j] = oe->op;
> > + /* If the stmt that defines operand has to be inserted, insert it
> > +before the use.  */
> > + stmt1 = oe->stmt_to_insert;
> > + if (stmt1)
> > +   insert_stmt_before_use (stmts[i], stmt1);
> > + stmt1 = NULL;
> > }
> > -
> > -  /* If we emit the last statement then we should put
> > -operands into the last statement.  It will also
> > -break the loop.  */
> > -  if (op_index < 0 && stmt_index == i)
> > -   i = stmt_num - 1;
> > +  stmts[i] = build_and_add_sum (TREE_TYPE (last_rhs1), tmp_op[1],
> tmp_op[0], opcode);
> > +  gimple_set_visited (stmts[i], true);
> >
> >if (dump_file && (dump_flags & TDF_DETAILS))
> > {
> > - fprintf (dump_file, "Transforming ");
> > + fprintf (dump_file, " into ");
> >   print_gimple_stmt (dump_file, stmts[i], 0);
> > }
> > +}
> >
> > -  /* If the stmt that defines operand has to be inserted, insert it
> > -before the use.  */
> > -  if (stmt1)
> > -   insert_stmt_before_use (stmts[i], stmt1);
> > -  if (stmt2)
> > -   insert_stmt_before_use (stmts[i], stmt2);
> > -  stmt1 = stmt2 = NULL;
> > -
> > -  /* We keep original statement only for the last one.  All
> > -others are recreated.  */
> > -  if (i == stmt_num - 1)
> > +  for (i = width; i < stmt_num; i++)
> > +{
> > +  /* We keep original statement only for the last one.  All others are
> > +recreated.  */
> > +  if ( op_index < 0)
> > {
> > - gimple_assign_set_rhs1 (stmts[i], op1);
> > - gimple_assign_set_rhs2 (stmts[i], op2);
> > - update_stmt (stmts[i]);
> > + if (width_count == 2)
> > +   {
> > +
> > + /* We keep original statement only for the last one.  All
> > +others are recreated.  */
> > + 

[PATCH] Handle FMA friendly in reassoc pass

2023-05-24 Thread Cui, Lili via Gcc-patches
From: Lili Cui 

Make some changes to the reassoc pass so that it is friendlier to the later
FMA pass.  Using FMA instead of mult + add reduces register pressure and the
number of instructions retired.

There are mainly two changes:
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more fma and reducing the loss of FMA when breaking
the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
chains according to the given correlation width, keeping the FMA chance as
much as possible.

With the patch applied

On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 0.60% for single-copy.
507.cactuBSSN_r: Improved by 1.10% for single-copy.
519.lbm_r      : Improved by 2.21% for single-copy.
No measurable changes for other benchmarks.

On aarch64:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 6.00% for single-copy.
No measurable changes for other benchmarks.

TEST1:

float
foo (float a, float b, float c, float d, float *e)
{
   return  *e  + a * b + c * d ;
}

For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmulss  %xmm3, %xmm2, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vaddss  (%rdi), %xmm0, %xmm0
ret

With this patch GCC generates:
vfmadd213ss   (%rdi), %xmm1, %xmm0
vfmadd231ss   %xmm2, %xmm3, %xmm0
ret

TEST2:

for (int i = 0; i < N; i++)
{
  a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i]
          + k[i] * l[i] + m[i] * o[i] + p[i];
}

For "-Ofast -mfpmath=sse -mfma"  GCC generates:
vmovapd e(%rax), %ymm4
vmulpd  d(%rax), %ymm4, %ymm3
addq    $32, %rax
vmovapd c-32(%rax), %ymm5
vmovapd j-32(%rax), %ymm6
vmulpd  h-32(%rax), %ymm6, %ymm2
vmovapd a-32(%rax), %ymm6
vaddpd  p-32(%rax), %ymm6, %ymm0
vmovapd g-32(%rax), %ymm7
vfmadd231pd b-32(%rax), %ymm5, %ymm3
vmovapd o-32(%rax), %ymm4
vmulpd  m-32(%rax), %ymm4, %ymm1
vmovapd l-32(%rax), %ymm5
vfmadd231pd f-32(%rax), %ymm7, %ymm2
vfmadd231pd k-32(%rax), %ymm5, %ymm1
vaddpd  %ymm3, %ymm0, %ymm0
vaddpd  %ymm2, %ymm0, %ymm0
vaddpd  %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq    $8192, %rax
jne     .L4
vzeroupper
ret

With this patch applied, GCC breaks the chain with width = 2 and generates 6 FMAs:

vmovapd a(%rax), %ymm2
vmovapd c(%rax), %ymm0
addq    $32, %rax
vmovapd e-32(%rax), %ymm1
vmovapd p-32(%rax), %ymm5
vmovapd g-32(%rax), %ymm3
vmovapd j-32(%rax), %ymm6
vmovapd l-32(%rax), %ymm4
vmovapd o-32(%rax), %ymm7
vfmadd132pd b-32(%rax), %ymm2, %ymm0
vfmadd132pd d-32(%rax), %ymm5, %ymm1
vfmadd231pd f-32(%rax), %ymm3, %ymm0
vfmadd231pd h-32(%rax), %ymm6, %ymm1
vfmadd231pd k-32(%rax), %ymm4, %ymm0
vfmadd231pd m-32(%rax), %ymm7, %ymm1
vaddpd  %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq    $8192, %rax
jne     .L2
vzeroupper
ret
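
The shape of the patched TEST2 code can be sketched in scalar C as two independent accumulator chains joined by one final add. This is only an illustration of the width = 2 split as read from the assembly above, not the reassoc implementation; the function names are invented, and in general the two forms may differ in the last bits, which is why this kind of rewrite needs -Ofast or similar.

```c
#include <assert.h>

/* One long dependency chain: each addition waits for the previous one.  */
double serial_sum(double a, double b, double c, double d, double e,
                  double f, double g, double h, double j, double k,
                  double l, double m, double o, double p)
{
  return a + b * c + d * e + f * g + h * j + k * l + m * o + p;
}

/* Two chains (width = 2).  Every statement is a multiply feeding an
   add, i.e. an FMA candidate; a single add joins the chains at the end.  */
double two_chain_sum(double a, double b, double c, double d, double e,
                     double f, double g, double h, double j, double k,
                     double l, double m, double o, double p)
{
  double chain0 = a + b * c;
  double chain1 = p + d * e;
  chain0 += f * g;
  chain1 += h * j;
  chain0 += k * l;
  chain1 += m * o;
  return chain0 + chain1;
}

/* Aborts if the two forms differ on small integer inputs, where every
   intermediate value is exactly representable.  */
void check_chains(void)
{
  double s = serial_sum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
  double t = two_chain_sum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
  assert(s == t);
}
```

The two chains can issue in parallel, so the critical path is three dependent FMAs plus one add rather than one serial run of additions.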

gcc/ChangeLog:

PR gcc/98350
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Rewrite this function.
(rank_ops_for_fma): New.
(reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:

PR gcc/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.
---
 gcc/testsuite/gcc.dg/pr98350-1.c |  31 
 gcc/testsuite/gcc.dg/pr98350-2.c |  11 ++
 gcc/tree-ssa-reassoc.cc  | 256 +--
 3 files changed, 215 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-2.c

diff --git a/gcc/testsuite/gcc.dg/pr98350-1.c b/gcc/testsuite/gcc.dg/pr98350-1.c
new file mode 100644
index 000..6bcf78a19ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast  -fdump-tree-widening_mul" } */
+
+/* Test that the compiler properly optimizes multiply and add 
+   to generate more FMA instructions.  */
+#define N 1024
+double a[N];
+double b[N];
+double c[N];
+double d[N];
+double e[N];
+double f[N];
+double g[N];
+double h[N];
+double j[N];
+double k[N];
+double l[N];
+double m[N];
+double o[N];
+double p[N];
+
+
+void
+foo (void)
+{
+  for (int i = 0; i < N; i++)
+  {
+a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * 
l[i] + m[i]* o[i] + p[i];
+  }
+}
+/* { dg-final { scan-tree-dump-times { = \.FMA \(} 6 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr98350-2.c b/gcc/testsuite/gcc.dg/pr98350-2.c
new file mode 100644
index 000..333d34f026a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-widening_mul" } */
+
+/* 

[Bug target/109927] Bootstrap fails for m68k in stage2 compilation of gimple-match.cc

2023-05-24 Thread userm57 at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109927

--- Comment #18 from Stan Johnson  ---
$ git clone git://gcc.gnu.org/git/gcc.git
$ cd gcc
$ git checkout master

I'm testing a manual bootstrap of "gcc version 14.0.0 20230524 (experimental)
(GCC)" now, accessed via git as shown above.

It will still take about 24 more hours for the bootstrap to finish (I'll send
an update if it fails), but with gimple-match.cc (and generic-match.cc, which
was not affected in my tests) split up, it looks like it will finish ok (it's
currently in about the middle of stage 2 and has successfully compiled all the
gimple-match-n.cc files).

Note that Gentoo's emerge of gcc-13 behaves a little differently than a manual
bootstrap. I don't know why, since I think I'm using Gentoo's ./configure
options in the manual bootstrap, but in Gentoo's emerge of gcc, they seem to
run cc1plus and "as" simultaneously for each compilation, perhaps aggravating
the memory issue for gimple-match.cc (or maybe not, since the problem is
virtual memory exhausted, not swap space exhausted).

Anyway, it looks like the solution was already close. Does anyone know whether
the change will be backported to gcc-12 or gcc-13 available from
ftp.gnu.org/pub/gnu/gcc?

Thanks to all of the GNU developers who continue to make modern tools available
for use on old hardware!

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Jeff Law




On 5/24/23 17:12, Vineet Gupta wrote:



On 5/24/23 15:13, Vineet Gupta wrote:


PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp completed in 60 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp ...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp ...

...



Never mind. Looks like I found the issue - with just trial and error and 
no idea of how this stuff works.

The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

Trial and error is how I think most of us deal with the TCL insanity.

I have found send_user (aka printf debugging) to be quite helpful 
through the years.  There's also verbosity and trace options, but they 
can be painfully hard to interpret.


jeff



[RFC] RISC-V: Eliminate extension after for *w instructions

2023-05-24 Thread Jivan Hakobyan via Gcc-patches
This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".

The main idea of it is to add SUBREG_PROMOTED fields during expanding.
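
The arithmetic fact this relies on can be modelled in plain C: the result of a *w instruction is already sign-extended from 32 bits, so a second sign extension cannot change it. The sketch below is only a model of that property, not of the RTL change; `sext_w`, `addw` and `extra_sext_is_redundant` are names invented here, and the unsigned-to-signed conversion is assumed to wrap, which is implementation-defined in ISO C but is what GCC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of sext.w: keep the low 32 bits and sign-extend them to 64.
   ASSUMPTION: the uint32_t -> int32_t conversion wraps.  */
int64_t sext_w(int64_t x)
{
  return (int64_t)(int32_t)(uint32_t)x;
}

/* Model of RV64 addw: a 32-bit add whose result is sign-extended.  */
int64_t addw(int64_t a, int64_t b)
{
  return sext_w((int64_t)((uint32_t)a + (uint32_t)b));
}

/* Returns 1 if an extra sext.w after addw never changes the value for
   any pair of the sampled inputs.  */
int extra_sext_is_redundant(void)
{
  static const int64_t samples[] = {
    0, 1, -1, INT32_MAX, INT32_MIN, 0x123456789LL
  };
  const size_t n = sizeof samples / sizeof samples[0];
  for (size_t i = 0; i < n; i++)
    for (size_t k = 0; k < n; k++)
      {
        int64_t r = addw(samples[i], samples[k]);
        if (sext_w(r) != r)
          return 0;
      }
  return 1;
}

/* Aborts if the redundancy claim fails on the samples.  */
void check_sext(void)
{
  assert(extra_sext_is_redundant());
}
```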

I have tested on SPEC2017; there is no regression.
Only the gcc.dg/pr30957-1.c test failed.
To solve that I made some changes in loop-iv.cc, but I am not sure that they
are suitable.


gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrdi3): New pattern.
(rotrsi3): Likewise.
(rotlsi3): Likewise.
* config/riscv/riscv-protos.h (riscv_emit_binary): New function
declaration
* config/riscv/riscv.cc (riscv_emit_binary): Removed static
* config/riscv/riscv.md (addsi3): New pattern
(subsi3): Likewise.
(negsi2): Likewise.
(mulsi3): Likewise.
(si3): New pattern for any_div.
(si3): New pattern for any_shift.
* loop-iv.cc (get_biv_step_1):  Process src of extension when it
PLUS

gcc/testsuite/ChangeLog:
* testsuite/gcc.target/riscv/shift-and-2.c: New test
* testsuite/gcc.target/riscv/shift-shift-2.c: New test
* testsuite/gcc.target/riscv/sign-extend.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 96d31d92670b27d495dc5a9fbfc07e8767f40976..0430af7c95b1590308648dc4d5aaea78ada71760 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -304,9 +304,9 @@
   [(set_attr "type" "bitmanip,load")
(set_attr "mode" "HI")])
 
-(define_expand "rotr3"
-  [(set (match_operand:GPR 0 "register_operand")
-	(rotatert:GPR (match_operand:GPR 1 "register_operand")
+(define_expand "rotrdi3"
+  [(set (match_operand:DI 0 "register_operand")
+	(rotatert:DI (match_operand:DI 1 "register_operand")
 		 (match_operand:QI 2 "arith_operand")))]
   "TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB"
 {
@@ -322,6 +322,26 @@
   "ror%i2%~\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_expand "rotrsi3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (rotatert:SI (match_operand:SI 1 "register_operand" "r")
+(match_operand:QI 2 "arith_operand" "rI")))]
+  "TARGET_ZBB || TARGET_ZBKB || TARGET_XTHEADBB"
+{
+  if (TARGET_XTHEADBB && !immediate_operand (operands[2], VOIDmode))
+FAIL;
+  if (TARGET_64BIT && register_operand(operands[2], QImode))
+{
+  rtx t = gen_reg_rtx (DImode);
+  emit_insn (gen_rotrsi3_sext (t, operands[1], operands[2]));
+  t = gen_lowpart (SImode, t);
+  SUBREG_PROMOTED_VAR_P (t) = 1;
+  SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+  emit_move_insn (operands[0], t);
+  DONE;
+}
+})
+
 (define_insn "*rotrdi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(rotatert:DI (match_operand:DI 1 "register_operand" "r")
@@ -330,7 +350,7 @@
   "ror%i2\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
-(define_insn "*rotrsi3_sext"
+(define_insn "rotrsi3_sext"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(sign_extend:DI (rotatert:SI (match_operand:SI 1 "register_operand" "r")
  (match_operand:QI 2 "arith_operand" "rI"]
@@ -338,7 +358,7 @@
   "ror%i2%~\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
-(define_insn "rotlsi3"
+(define_insn "*rotlsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(rotate:SI (match_operand:SI 1 "register_operand" "r")
 		   (match_operand:QI 2 "register_operand" "r")))]
@@ -346,6 +366,24 @@
   "rol%~\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_expand "rotlsi3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (rotate:SI (match_operand:SI 1 "register_operand" "r")
+  (match_operand:QI 2 "register_operand" "r")))]
+  "TARGET_ZBB || TARGET_ZBKB"
+{
+  if (TARGET_64BIT)
+{
+  rtx t = gen_reg_rtx (DImode);
+  emit_insn (gen_rotlsi3_sext (t, operands[1], operands[2]));
+  t = gen_lowpart (SImode, t);
+  SUBREG_PROMOTED_VAR_P (t) = 1;
+  SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+  emit_move_insn (operands[0], t);
+  DONE;
+}
+})
+
 (define_insn "rotldi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(rotate:DI (match_operand:DI 1 "register_operand" "r")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 36419c95bbd8eebcb499ae0e02ca7aafde6c879f..de16ffd607e8e004e9b98ee9e25e4f3693818762 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -61,6 +61,7 @@ extern const char *riscv_output_return ();
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
+extern rtx riscv_emit_binary (enum rtx_code code, rtx dest, rtx x, rtx y);
 #endif
 extern bool riscv_expand_conditional_move (rtx, rtx, rtx, rtx);
 extern rtx 

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Palmer Dabbelt

On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:



On 5/24/23 15:13, Vineet Gupta wrote:


PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp
completed in 60 seconds
Running
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
...
ERROR: tcl error sourcing
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.
ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"
line 42)
    invoked from within
"source
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp'
aborted due to Tcl error
testcase
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
completed in 0 seconds
Running
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp
...
...



Never mind. Looks like I found the issue - with just trial and error and
no idea of how this stuff works.
The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.


Thanks!



-Vineet


Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Vineet Gupta




On 5/24/23 15:13, Vineet Gupta wrote:


PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 60 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" 
line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' 
aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp 
...

...



Never mind. Looks like I found the issue - with just trial and error and 
no idea of how this stuff works.

The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

-Vineet


gcc-10-20230524 is now available

2023-05-24 Thread GCC Administrator via Gcc
Snapshot gcc-10-20230524 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20230524/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 40b2f2a62ce65c425df904b2307803b5ea203991

You'll find:

 gcc-10-20230524.tar.xz   Complete GCC

  SHA256=39ecfb226832768a85a4baac01b42b94ff426429f55d649bc93c265f9dbad0e8
  SHA1=3b59140e7c5d368cb31a8fcd114f8f28a11a712a

Diffs from 10-20230517 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #3 from Andrew Pinski  ---
Here is another related testcase; this is the one reduced exactly from
bitmap_single_bit_set_p:

```
_Bool f(unsigned a, int t)
{
  void g(void);
  if (t)
return 0;
  g();
  if (a > 1)
return 0;
  return a == 1;
}
```

this should be optimized down to:
```
_Bool f(unsigned a, int t)
{
  void g(void);
  if (t)
return 0;
  g();
  return a == 1;
}
```

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #2 from Andrew Pinski  ---
I should note I found this while looking at code generation of
bitmap_single_bit_set_p after a match pattern addition.

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|`(a > 1) ? 0 : (a == 1)` is |`(a > 1) ? 0 : (a == 1)` is
                   |not optimized when spelled  |not optimized when spelled
                   |out                         |out at -O2+

--- Comment #1 from Andrew Pinski  ---
I should say this is at -O2.

Part of the reason is that VRP changes `a == 1` to be `(bool)a` and then phiopt
comes along and decides to factor out the conversion (phiopt did that even
before my recent changes).

At -O1, it is actually optimized during reassoc1 (because the above is not
done) since GCC 7.

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Vineet Gupta

On 5/24/23 13:34, Thomas Schwinge wrote:

Yeah, at this point I'm not sure whether my recent changes really are
related/relevant here.


Apparently in addition to Kito's patch below, If I comment out the
additional torture options, failures go down drastically.

Meaning that *all* those ERRORs disappear?


No but they reduced significantly. Anyhow I think the issue should be 
simple enough for someone familiar with how the tcl stuff works...





diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp
b/gcc/testsuite/gcc.target/riscv/riscv.exp

-lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
+#lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}

@Thomas, do you have some thoughts on how to fix riscv.exp properly in
light of recent changes to exp files.

I'm trying to understand this, but so far don't.  Can I please see a
complete 'gcc.log' file where the ERRORs are visible?


So we are at bleeding edge gcc from today
 2023-05-24 ec2e86274427 Fortran: reject bad DIM argument of SIZE 
intrinsic in simplification [PR104350]


With an additional fix from Kito along the lines of..

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp


 dg-init
+torture-init

 # All done.
+torture-finish
 dg-finish

I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp 
which needs some love.



PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 60 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" 
line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' 
aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp 
...

...



[Bug tree-optimization/109959] New: `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

Bug ID: 109959
   Summary: `(a > 1) ? 0 : (a == 1)` is not optimized when spelled
out
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
_Bool f(unsigned a)
{
if (a > 1)
  return 0;
return a == 1;
}


_Bool f0(unsigned a)
{
  return (a > 1) ? 0 : (a == 1);
}
```
Both of these should just optimize to `return a == 1`; f0 currently is.
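
The equivalence itself is easy to confirm by brute force. In the sketch below `f` and `f0` are copied from the report, while `f_simple` and `all_agree` are helper names added for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The spelled-out form from the report.  */
bool f(unsigned a)
{
  if (a > 1)
    return 0;
  return a == 1;
}

/* The conditional-expression form from the report.  */
bool f0(unsigned a)
{
  return (a > 1) ? 0 : (a == 1);
}

/* The form both should reduce to.  */
bool f_simple(unsigned a)
{
  return a == 1;
}

/* Returns true if all three forms agree for a below 1000 and for the
   largest unsigned value.  */
bool all_agree(void)
{
  for (unsigned a = 0; a < 1000; a++)
    if (f(a) != f_simple(a) || f0(a) != f_simple(a))
      return false;
  return f(~0u) == f_simple(~0u) && f0(~0u) == f_simple(~0u);
}

/* Aborts if any form disagrees.  */
void check_forms(void)
{
  assert(all_agree());
}
```

When `a > 1` holds, `a == 1` is false anyway, so the early return of 0 adds nothing.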

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #7 from joseph at codesourcery dot com  ---
I suppose the question is how to interpret "the longest array (with the 
same element type) that would not make the structure larger than the 
object being accessed".  The difficulty of interpreting "make the 
structure larger" in terms of including post-array padding in the 
replacement structure is that there might not be a definition of what that 
post-array padding should be given the offset of the array need not be the 
same as the offset with literal replacement in the struct definition.

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #6 from joseph at codesourcery dot com  ---
For the standard, dynamically allocated case, you should only need to 
allocate enough memory to contain the initial part of the struct and the 
array members being accessed - not any padding after that array.  (There 
were wording problems before C99 TC2; see DR#282.)

[COMMITTED 2/4] - Make ssa_cache a range_query.

2023-05-24 Thread Andrew MacLeod via Gcc-patches
By having an ssa_cache inherit from a range_query, and then providing a 
range_of_expr routine which returns the current global value, we open up 
the possibility of folding statements and doing other interesting things 
with an ssa-cache.


In particular, you can now call fold_range() with an ssa-range cache 
and fold a stmt by retrieving the values which are stored in the cache.


This patch also provides a ranger object with a const_query() method 
which will allow access to the current global ranges ranger knows for 
folding. There are times where we use get_global_range_query(), but 
we'd actually get more accurate results if we have a ranger and use 
const_query(). const_query should be a superset of what 
get_global_range_query knows.
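
To illustrate the idea of a cache that doubles as a query, here is a minimal sketch with toy types. Nothing in it is the real GCC API: `toy_query`, `toy_cache`, `toy_range` and `fold_plus` are hypothetical stand-ins, and the "varying" fallback is an arbitrary wide range chosen for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-ins; none of these are the real GCC types.
using toy_range = std::pair<long, long>;          // inclusive [lo, hi]
const toy_range toy_varying{-1000000, 1000000};   // "nothing known"

// Minimal query interface, playing the role of range_query.
struct toy_query
{
  virtual ~toy_query() = default;
  virtual bool range_of_expr(toy_range &r, const std::string &name) = 0;
};

// A cache that is also a query: a lookup returns the stored value, or
// falls back to the varying range when the name has no entry.
struct toy_cache : toy_query
{
  std::map<std::string, toy_range> m_tab;

  void set_range(const std::string &name, toy_range r) { m_tab[name] = r; }

  bool range_of_expr(toy_range &r, const std::string &name) override
  {
    auto it = m_tab.find(name);
    r = (it != m_tab.end()) ? it->second : toy_varying;
    return true;
  }
};

// Fold "lhs = op1 + op2" using whatever query the caller hands in.
toy_range fold_plus(toy_query &q, const std::string &op1,
                    const std::string &op2)
{
  toy_range a, b;
  q.range_of_expr(a, op1);
  q.range_of_expr(b, op2);
  return {a.first + b.first, a.second + b.second};
}
```

Because `fold_plus` only sees the interface, the same folding code can run against a cache or any other query object, which is the interchangeability the patch is after.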


There is 0 performance impact.

Bootstraps on x86_64-pc-linux-gnu  with no regressions.  Pushed.

Andrew


From be6e6b93cc5d42a09a1f2be26dfdf7e3f897d296 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 24 May 2023 09:06:26 -0400
Subject: [PATCH 2/4] Make ssa_cache a range_query.

By providing range_of_expr as a range_query, we can fold and do other
interesting things using values from the global table.  Make ranger's
known globals available via const_query.

	* gimple-range-cache.cc (ssa_cache::range_of_expr): New.
	* gimple-range-cache.h (class ssa_cache): Inherit from range_query.
	(ranger_cache::const_query): New.
	* gimple-range.cc (gimple_ranger::const_query): New.
	* gimple-range.h (gimple_ranger::const_query): New prototype.
---
 gcc/gimple-range-cache.cc | 14 ++
 gcc/gimple-range-cache.h  |  5 -
 gcc/gimple-range.cc   |  8 
 gcc/gimple-range.h|  1 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index f25abaffd34..52165d2405b 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -545,6 +545,20 @@ ssa_cache::~ssa_cache ()
   delete m_range_allocator;
 }
 
+// Enable a query to evaluate staements/ramnges based on picking up ranges
+// from just an ssa-cache.
+
+bool
+ssa_cache::range_of_expr (vrange &r, tree expr, gimple *stmt)
+{
+  if (!gimple_range_ssa_p (expr))
+return get_tree_range (r, expr, stmt);
+
+  if (!get_range (r, expr))
+gimple_range_global (r, expr, cfun);
+  return true;
+}
+
 // Return TRUE if the global range of NAME has a cache entry.
 
 bool
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 4fc98230430..afcf8d7de7b 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -52,7 +52,7 @@ private:
 // has been visited during this incarnation.  Once the ranger evaluates
 // a name, it is typically not re-evaluated again.
 
-class ssa_cache
+class ssa_cache : public range_query
 {
 public:
   ssa_cache ();
@@ -63,6 +63,8 @@ public:
   virtual void clear_range (tree name);
   virtual void clear ();
   void dump (FILE *f = stderr);
+  virtual bool range_of_expr (vrange &r, tree expr, gimple *stmt);
+
 protected:
   vec m_tab;
   vrange_allocator *m_range_allocator;
@@ -103,6 +105,7 @@ public:
   bool get_global_range (vrange , tree name) const;
   bool get_global_range (vrange , tree name, bool _p);
   void set_global_range (tree name, const vrange , bool changed = true);
+  range_query &const_query () { return m_globals; }
 
   void propagate_updated_value (tree name, basic_block bb);
 
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 4fae3f95e6a..01e62d3ff39 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -70,6 +70,14 @@ gimple_ranger::~gimple_ranger ()
   m_stmt_list.release ();
 }
 
+// Return a range_query which accesses just the known global values.
+
+range_query &
+gimple_ranger::const_query ()
+{
+  return m_cache.const_query ();
+}
+
 bool
 gimple_ranger::range_of_expr (vrange , tree expr, gimple *stmt)
 {
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index e3aa9475f5e..6587e4923ff 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -64,6 +64,7 @@ public:
   bool fold_stmt (gimple_stmt_iterator *gsi, tree (*) (tree));
   void register_inferred_ranges (gimple *s);
   void register_transitive_inferred_ranges (basic_block bb);
+  range_query &const_query ();
 protected:
   bool fold_range_internal (vrange , gimple *s, tree name);
   void prefill_name (vrange , tree name);
-- 
2.40.1



[COMMITTED 4/4] - Gimple range PHI analyzer and testcases

2023-05-24 Thread Andrew MacLeod via Gcc-patches

This patch provides the framework for a gimple-range phi analyzer.

Currently, the primary purpose is to give better initial values for 
members of a "phi group".


A PHI group is defined as a group of PHI nodes whose arguments are all 
either members of the same PHI group, or one of 2 other values:

 - An initializer (typically a constant, but not necessarily),
 - A modifier, which is always of the form:   member_ssa = member_ssa 
OP op2


When the analyzer finds a group which matches this pattern, it tries to 
evaluate the modifier using the initial value and project a range for 
the entire group.


This initial version is fairly simplistic.  It looks for 2 things:

1) If there is a relation between LHS and the other ssa_name in the 
modifier, then we can project a range, i.e.,

    a_3 = a_2 + 1
If there is a relation generated by the stmt which says a_3 > a_2, and 
the initial value is 0, we can project a range of [0, +INF] as the 
modifier will cause the value to always increase, and not wrap.


Likewise, for a_3 = a_2 - 1, we can project a range of [-INF, 0] based 
on the "<" relationship between a_3 and a_2.


2) If there is no relationship, then we use the initial range and 
"simulate" the modifier statement a set number of times looking to see 
if the value converges.
Currently I have arbitrarily hard coded 10 attempts, but intend to 
change this down the road with a --param, as well as to perhaps 
influence it with any known values from SCEV regarding known iterations 
of the loop and possibly change it based on optimization levels.
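
The two evaluation strategies above can be sketched with toy integer ranges. This is an assumption-laden illustration, not the code in gimple-range-phi.cc: `toy_range`, `toy_relation`, `project_from_relation` and `simulate_modifier` are hypothetical names, and the convergence test (the union of ranges stops growing) is one plausible reading of "looking to see if the value converges".

```cpp
#include <cassert>
#include <climits>
#include <functional>

// Toy inclusive integer range; not GCC's vrange.
struct toy_range
{
  long lo, hi;
  bool operator== (const toy_range &o) const
  { return lo == o.lo && hi == o.hi; }
};

enum toy_relation { REL_NONE, REL_GT, REL_LT };

// Strategy 1: a known relation between the modifier's LHS and its member
// operand lets us project an open-ended range from the initial value.
toy_range project_from_relation(toy_range init, toy_relation rel)
{
  if (rel == REL_GT)
    return {init.lo, LONG_MAX};   // always increasing, assumed not to wrap
  if (rel == REL_LT)
    return {LONG_MIN, init.hi};   // always decreasing
  return {LONG_MIN, LONG_MAX};    // nothing known
}

// Strategy 2: no relation, so apply the modifier repeatedly to the
// accumulated range and see whether the union stops growing.
// Returns true and sets RESULT on convergence within MAX_ITER attempts.
bool simulate_modifier(toy_range init,
                       const std::function<toy_range(toy_range)> &modifier,
                       toy_range &result, int max_iter = 10)
{
  toy_range cur = init;
  for (int i = 0; i < max_iter; ++i)
    {
      toy_range next = modifier(cur);
      toy_range merged = {next.lo < cur.lo ? next.lo : cur.lo,
                          next.hi > cur.hi ? next.hi : cur.hi};
      if (merged == cur)
        {
          result = cur;
          return true;
        }
      cur = merged;
    }
  return false;
}
```

A masking modifier such as `x & 7` settles after a couple of rounds, whereas `x + 1` keeps widening the range and gives up after the iteration limit, which is where the relation-based projection is the better tool.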


I also suspect something like one more than the number of bits in the 
type might help with any bitmasking tricks.


There's a lot of additional things we can do to enhance this, but this 
framework provides a start.  These 2 initial evaluations fix 107822, and 
part of 107986.


There is about a 1.5% slowdown to VRP to invoke and utilize the 
analyzer in all 3 passes of VRP.  Overall compile time is 0.06% slower.


Bootstraps on x86_64-pc-linux-gnu  with no regressions.  Pushed.

Andrew




From 64e844c1182198e49d33f9fa138b9a782371225d Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 24 May 2023 09:52:26 -0400
Subject: [PATCH 4/4] Gimple range PHI analyzer and testcases

Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.

	PR tree-optimization/107822
	PR tree-optimization/107986
	gcc/
	* Makefile.in (OBJS): Add gimple-range-phi.o.
	* gimple-range-cache.h (ranger_cache::m_estimate): New
	phi_analyzer pointer member.
	* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
	phi_analyzer if no loop info is available.
	* gimple-range-phi.cc: New file.
	* gimple-range-phi.h: New file.
	* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

	gcc/testsuite/
	* gcc.dg/pr107822.c: New.
	* gcc.dg/pr107986-1.c: New.
---
 gcc/Makefile.in   |   1 +
 gcc/gimple-range-cache.h  |   2 +
 gcc/gimple-range-fold.cc  |  27 ++
 gcc/gimple-range-phi.cc   | 518 ++
 gcc/gimple-range-phi.h| 109 +++
 gcc/testsuite/gcc.dg/pr107822.c   |  20 ++
 gcc/testsuite/gcc.dg/pr107986-1.c |  16 +
 gcc/tree-vrp.cc   |   7 +-
 8 files changed, 699 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-range-phi.cc
 create mode 100644 gcc/gimple-range-phi.h
 create mode 100644 gcc/testsuite/gcc.dg/pr107822.c
 create mode 100644 gcc/testsuite/gcc.dg/pr107986-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bb63b5c501d..1d39e6dd3f8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1454,6 +1454,7 @@ OBJS = \
 	gimple-range-gori.o \
 	gimple-range-infer.o \
 	gimple-range-op.o \
+	gimple-range-phi.o \
 	gimple-range-trace.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-isolate-paths.o \
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index afcf8d7de7b..93d16294d2e 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "gimple-range-gori.h" 
 #include "gimple-range-infer.h"
+#include "gimple-range-phi.h"
 
 // This class manages a vector of pointers to ssa_block ranges.  It
 // provides the basis for the "range on entry" cache for all
@@ -136,6 +137,7 @@ private:
   void exit_range (vrange , tree expr, basic_block bb, enum rfd_mode);
   bool edge_range (vrange , edge e, tree name, enum rfd_mode);
 
+  phi_analyzer *m_estimate;
   vec m_workback;
   class update_list *m_update;
 };
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 4df065c8a6e..173d9f386c5 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -934,6 +934,7 @@ fold_using_range::range_of_phi (vrange , gphi *phi, fur_source )
 	  }
   }
 
+  bool loop_info_p = false;
   // If SCEV is available, query if this PHI has any 

[COMMITTED 3/4] Provide relation queries for a stmt.

2023-05-24 Thread Andrew MacLeod via Gcc-patches
This tweaks some of the fold_stmt routines and helpers, in particular 
the ones to which you provide a vector of ranges to satisfy any ssa-names.


Previously, once the vector was depleted, any remaining values were 
picked up from the default get_global_range_query() query. It is useful 
to be able to specify your own range_query to these routines, as most 
of the other fold_stmt routines allow.


This patch changes it so the default doesn't change, but you can 
optionally specify your own range_query to the routines.


It also provides a new routine:

    relation_trio fold_relations (gimple *s, range_query *q)

which, instead of folding a stmt, will return a relation trio based on 
folding the stmt with the range_query.  The relation trio will let you 
know if the statement causes a relation between LHS-OP1, LHS-OP2, or 
OP1-OP2.  So for something like

   a_3 = b_4 + 6
based on known ranges and types, we might get back (LHS > OP1).

It just provides  a generic interface into what relations a statement 
may provide based on what a range_query returns for values and the stmt 
itself.
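
As a rough illustration of what a relation trio conveys, here is a toy version for a single addition. It is a sketch under stated assumptions: `toy_trio`, `toy_rel`, `toy_range` and `relations_for_plus` are hypothetical and unrelated to the real relation_trio class, and the arithmetic is assumed to be signed with no wrapping.

```cpp
#include <cassert>
#include <climits>

enum toy_rel { TOY_VARYING, TOY_LT, TOY_GT, TOY_EQ };

// Toy counterpart of a relation trio: one relation per operand pair.
struct toy_trio
{
  toy_rel lhs_op1, lhs_op2, op1_op2;
};

struct toy_range { long lo, hi; };

// Relation between two ranges, if every pair of values agrees on it.
static toy_rel compare(toy_range a, toy_range b)
{
  if (a.lo > b.hi) return TOY_GT;
  if (a.hi < b.lo) return TOY_LT;
  if (a.lo == a.hi && b.lo == b.hi && a.lo == b.lo) return TOY_EQ;
  return TOY_VARYING;
}

// Relation between "kept + added" and "kept", provided the sum stays
// representable so the result cannot wrap.
static toy_rel rel_after_add(toy_range kept, toy_range added)
{
  if (added.lo > 0 && kept.hi <= LONG_MAX - added.hi) return TOY_GT;
  if (added.hi < 0 && kept.lo >= LONG_MIN - added.lo) return TOY_LT;
  if (added.lo == 0 && added.hi == 0) return TOY_EQ;
  return TOY_VARYING;
}

// Relations implied by "lhs = op1 + op2" for the given operand ranges.
toy_trio relations_for_plus(toy_range op1, toy_range op2)
{
  toy_trio t = {rel_after_add(op1, op2),   // LHS vs OP1
                rel_after_add(op2, op1),   // LHS vs OP2
                compare(op1, op2)};        // OP1 vs OP2
  return t;
}
```

For `a_3 = b_4 + 6` with `b_4` known to be small and non-negative, only the LHS/OP1 slot is informative, which matches the (LHS > OP1) answer described above.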


There is no performance impact.

Bootstraps on x86_64-pc-linux-gnu  with no regressions.  Pushed.

Andrew


From 933e14dc613269641ffe3613bf4792ac50590275 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 24 May 2023 09:17:32 -0400
Subject: [PATCH 3/4] Provide relation queries for a stmt.

Allow fur_list and fold_stmt to be provided a range_query rather than
always defaulting to NULL (which becomes a global query).
Also provide a fold_relations () routine which can provide a relation_trio
for an arbitrary statement using any range_query.

	* gimple-range-fold.cc (fur_list::fur_list): Add range_query param
	to constructors.
	(fold_range): Add range_query parameter.
	(fur_relation::fur_relation): New.
	(fur_relation::trio): New.
	(fur_relation::register_relation): New.
	(fold_relations): New.
	* gimple-range-fold.h (fold_range): Adjust prototypes.
	(fold_relations): New.
---
 gcc/gimple-range-fold.cc | 128 +++
 gcc/gimple-range-fold.h  |  11 +++-
 2 files changed, 124 insertions(+), 15 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 96cbd799488..4df065c8a6e 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -214,9 +214,9 @@ fur_depend::register_relation (edge e, relation_kind k, tree op1, tree op2)
 class fur_list : public fur_source
 {
 public:
-  fur_list (vrange );
-  fur_list (vrange , vrange );
-  fur_list (unsigned num, vrange **list);
+  fur_list (vrange , range_query *q = NULL);
+  fur_list (vrange , vrange , range_query *q = NULL);
+  fur_list (unsigned num, vrange **list, range_query *q = NULL);
   virtual bool get_operand (vrange , tree expr) override;
   virtual bool get_phi_operand (vrange , tree expr, edge e) override;
 private:
@@ -228,7 +228,7 @@ private:
 
 // One range supplied for unary operations.
 
-fur_list::fur_list (vrange ) : fur_source (NULL)
+fur_list::fur_list (vrange , range_query *q) : fur_source (q)
 {
   m_list = m_local;
   m_index = 0;
@@ -238,7 +238,7 @@ fur_list::fur_list (vrange ) : fur_source (NULL)
 
 // Two ranges supplied for binary operations.
 
-fur_list::fur_list (vrange , vrange ) : fur_source (NULL)
+fur_list::fur_list (vrange , vrange , range_query *q) : fur_source (q)
 {
   m_list = m_local;
   m_index = 0;
@@ -249,7 +249,8 @@ fur_list::fur_list (vrange , vrange ) : fur_source (NULL)
 
 // Arbitrary number of ranges in a vector.
 
-fur_list::fur_list (unsigned num, vrange **list) : fur_source (NULL)
+fur_list::fur_list (unsigned num, vrange **list, range_query *q)
+  : fur_source (q)
 {
   m_list = list;
   m_index = 0;
@@ -278,20 +279,20 @@ fur_list::get_phi_operand (vrange , tree expr, edge e ATTRIBUTE_UNUSED)
 // Fold stmt S into range R using R1 as the first operand.
 
 bool
-fold_range (vrange &r, gimple *s, vrange &r1)
+fold_range (vrange &r, gimple *s, vrange &r1, range_query *q)
 {
   fold_using_range f;
-  fur_list src (r1);
+  fur_list src (r1, q);
   return f.fold_stmt (r, s, src);
 }
 
 // Fold stmt S into range R using R1  and R2 as the first two operands.
 
 bool
-fold_range (vrange &r, gimple *s, vrange &r1, vrange &r2)
+fold_range (vrange &r, gimple *s, vrange &r1, vrange &r2, range_query *q)
 {
   fold_using_range f;
-  fur_list src (r1, r2);
+  fur_list src (r1, r2, q);
   return f.fold_stmt (r, s, src);
 }
 
@@ -299,10 +300,11 @@ fold_range (vrange , gimple *s, vrange , vrange )
 // operands encountered.
 
 bool
-fold_range (vrange &r, gimple *s, unsigned num_elements, vrange **vector)
+fold_range (vrange &r, gimple *s, unsigned num_elements, vrange **vector,
+	range_query *q)
 {
   fold_using_range f;
-  fur_list src (num_elements, vector);
+  fur_list src (num_elements, vector, q);
   return f.fold_stmt (r, s, src);
 }
 
@@ -326,6 +328,108 @@ fold_range (vrange , gimple *s, edge on_edge, range_query *q)
   return f.fold_stmt (r, s, src);
 }
 
+// Provide a fur_source which can be used 

[COMMITTED 1/4] - Make ssa_cache and ssa_lazy_cache virtual.

2023-05-24 Thread Andrew MacLeod via Gcc-patches
I originally implemented the lazy ssa cache by inheriting from an 
ssa_cache in protected mode and providing the required routines. This 
makes it a little awkward to do various things, and they also become not 
quite as interchangeable as I'd like.   Making the routines virtual and 
using proper inheritance will avoid an inevitable issue down the road, 
and allows me to remove the printing hack which provided a protected 
output routine.


Overall performance impact is pretty negligible, so let's just clean it up.
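
A small sketch of the design described above, with toy classes rather than the real ones: a base cache with virtual accessors, and a lazy variant that publicly inherits and tracks active entries in a side set. All names (`toy_cache`, `toy_lazy_cache`, `count_active`) are hypothetical, and a stored value of 0 is arbitrarily used to mean "no range".

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy base cache: a slot holding 0 means "no range recorded".
class toy_cache
{
public:
  explicit toy_cache(unsigned n) : m_tab(n, 0) {}
  virtual ~toy_cache() = default;

  virtual bool get_range(int &r, unsigned name) const
  {
    if (name >= m_tab.size() || m_tab[name] == 0)
      return false;
    r = m_tab[name];
    return true;
  }
  // Returns true if NAME already had a range.  NAME must be in bounds.
  virtual bool set_range(unsigned name, int r)
  {
    bool had = m_tab[name] != 0;
    m_tab[name] = r;
    return had;
  }
  virtual void clear_range(unsigned name) { m_tab[name] = 0; }

  // Shared dump-like helper: it goes through the virtual get_range, so
  // derived caches need no separate hook.
  unsigned count_active() const
  {
    unsigned n = 0;
    int r;
    for (unsigned i = 0; i < m_tab.size(); ++i)
      if (get_range(r, i))
        ++n;
    return n;
  }

protected:
  std::vector<int> m_tab;
};

// Lazy variant: tracks active names in a side set instead of relying on
// cleared slots, so clearing never has to touch the table.
class toy_lazy_cache : public toy_cache
{
public:
  using toy_cache::toy_cache;

  bool get_range(int &r, unsigned name) const override
  {
    if (!m_active.count(name))
      return false;
    return toy_cache::get_range(r, name);
  }
  bool set_range(unsigned name, int r) override
  {
    bool had = m_active.count(name) != 0;
    m_active.insert(name);
    toy_cache::set_range(name, r);
    return had;
  }
  void clear_range(unsigned name) override { m_active.erase(name); }

private:
  std::set<unsigned> m_active;
};
```

With public inheritance a `toy_lazy_cache` can be passed wherever a `toy_cache &` is expected, and the shared helper picks up the derived behaviour without a dedicated private virtual.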

Bootstraps on x86_64-pc-linux-gnu  with no regressions.  Pushed.

Andrew

From 3079056d0b779b907f8adc01d48a8aa495b8a661 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 24 May 2023 08:49:30 -0400
Subject: [PATCH 1/4] Make ssa_cache and ssa_lazy_cache virtual.

Making them virtual allows us to interchangeably use the caches.

	* gimple-range-cache.cc (ssa_cache::dump): Use get_range.
	(ssa_cache::dump_range_query): Delete.
	(ssa_lazy_cache::dump_range_query): Delete.
	(ssa_lazy_cache::get_range): Move from header file.
	(ssa_lazy_cache::clear_range): ditto.
	(ssa_lazy_cache::clear): Ditto.
	* gimple-range-cache.h (class ssa_cache): Virtualize.
	(class ssa_lazy_cache): Inherit and virtualize.
---
 gcc/gimple-range-cache.cc | 43 +++
 gcc/gimple-range-cache.h  | 37 ++---
 2 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index e069241bc9d..f25abaffd34 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -626,7 +626,7 @@ ssa_cache::dump (FILE *f)
   // Invoke dump_range_query which is a private virtual version of
   // get_range.   This avoids performance impacts on general queries,
   // but allows sharing of the dump routine.
-  if (dump_range_query (r, ssa_name (x)) && !r.varying_p ())
+  if (get_range (r, ssa_name (x)) && !r.varying_p ())
 	{
 	  if (print_header)
 	{
@@ -648,23 +648,14 @@ ssa_cache::dump (FILE *f)
 fputc ('\n', f);
 }
 
-// Virtual private get_range query for dumping.
+// Return true if NAME has an active range in the cache.
 
 bool
-ssa_cache::dump_range_query (vrange , tree name) const
+ssa_lazy_cache::has_range (tree name) const
 {
-  return get_range (r, name);
+  return bitmap_bit_p (active_p, SSA_NAME_VERSION (name));
 }
 
-// Virtual private get_range query for dumping.
-
-bool
-ssa_lazy_cache::dump_range_query (vrange , tree name) const
-{
-  return get_range (r, name);
-}
-
-
 // Set range of NAME to R in a lazy cache.  Return FALSE if it did not already
 // have a range.
 
@@ -684,6 +675,32 @@ ssa_lazy_cache::set_range (tree name, const vrange )
   return false;
 }
 
+// Return TRUE if NAME has a range, and return it in R.
+
+bool
+ssa_lazy_cache::get_range (vrange &r, tree name) const
+{
+  if (!bitmap_bit_p (active_p, SSA_NAME_VERSION (name)))
+return false;
+  return ssa_cache::get_range (r, name);
+}
+
+// Remove NAME from the active range list.
+
+void
+ssa_lazy_cache::clear_range (tree name)
+{
+  bitmap_clear_bit (active_p, SSA_NAME_VERSION (name));
+}
+
+// Remove all ranges from the active range list.
+
+void
+ssa_lazy_cache::clear ()
+{
+  bitmap_clear (active_p);
+}
+
 // --
 
 
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 871255a8116..4fc98230430 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -57,14 +57,13 @@ class ssa_cache
 public:
   ssa_cache ();
   ~ssa_cache ();
-  bool has_range (tree name) const;
-  bool get_range (vrange , tree name) const;
-  bool set_range (tree name, const vrange );
-  void clear_range (tree name);
-  void clear ();
+  virtual bool has_range (tree name) const;
+  virtual bool get_range (vrange , tree name) const;
+  virtual bool set_range (tree name, const vrange );
+  virtual void clear_range (tree name);
+  virtual void clear ();
   void dump (FILE *f = stderr);
 protected:
-  virtual bool dump_range_query (vrange , tree name) const;
   vec m_tab;
   vrange_allocator *m_range_allocator;
 };
@@ -72,35 +71,21 @@ protected:
 // This is the same as global cache, except it maintains an active bitmap
 // rather than depending on a zero'd out vector of pointers.  This is better
 // for sparsely/lightly used caches.
-// It could be made a fully derived class, but at this point there doesnt seem
-// to be a need to take the performance hit for it.
 
-class ssa_lazy_cache : protected ssa_cache
+class ssa_lazy_cache : public ssa_cache
 {
 public:
   inline ssa_lazy_cache () { active_p = BITMAP_ALLOC (NULL); }
   inline ~ssa_lazy_cache () { BITMAP_FREE (active_p); }
-  bool set_range (tree name, const vrange );
-  inline bool get_range (vrange , tree name) const;
-  inline void clear_range (tree name)
-{ bitmap_clear_bit (active_p, SSA_NAME_VERSION (name)); } ;
-  inline void clear () { bitmap_clear (active_p); }
-  inline 

[Bug tree-optimization/107986] [12/13/14 Regression] Bogus -Warray-bounds diagnostic with std::sort

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107986

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod 
Date:   Wed May 24 09:52:26 2023 -0400

Gimple range PHI analyzer and testcases

Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.

PR tree-optimization/107822
PR tree-optimization/107986
gcc/
* Makefile.in (OBJS): Add gimple-range-phi.o.
* gimple-range-cache.h (ranger_cache::m_estimate): New
phi_analyzer pointer member.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
phi_analyzer if no loop info is available.
* gimple-range-phi.cc: New file.
* gimple-range-phi.h: New file.
* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

gcc/testsuite/
* gcc.dg/pr107822.c: New.
* gcc.dg/pr107986-1.c: New.

[Bug tree-optimization/107822] [13/14/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107822

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod 
Date:   Wed May 24 09:52:26 2023 -0400

Gimple range PHI analyzer and testcases

Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.

PR tree-optimization/107822
PR tree-optimization/107986
gcc/
* Makefile.in (OBJS): Add gimple-range-phi.o.
* gimple-range-cache.h (ranger_cache::m_estimate): New
phi_analyzer pointer member.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
phi_analyzer if no loop info is available.
* gimple-range-phi.cc: New file.
* gimple-range-phi.h: New file.
* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

gcc/testsuite/
* gcc.dg/pr107822.c: New.
* gcc.dg/pr107986-1.c: New.

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread aemseemann at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

Martin Seemann  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Martin Seemann  ---
Thanks for the clarification! Now I am convinced that it is not a bug in
libstdc++ (although I still doubt that the side-effects were intended when the
committee formulated the "Effects" for monadic operations, but that's not
relevant here).

Marking as resolved and sorry for the noise.

Re: [PATCH] testsuite, analyzer: Fix testcases with fclose

2023-05-24 Thread David Malcolm via Gcc-patches
On Tue, 2023-05-23 at 09:34 +, Christophe Lyon wrote:
> The gcc.dg/analyzer/data-model-4.c and
> gcc.dg/analyzer/torture/conftest-1.c fail with recent glibc headers
> and succeed with older headers.
> 
> The new error message is:
> warning: use of possibly-NULL 'f' where non-null expected [CWE-690]
> [-Wanalyzer-possible-null-argument]
> 
> Like similar previous fixes in this area, this patch updates the
> testcase so that this warning isn't reported.

LGTM

Thanks
Dave

> 
> 2023-05-23  Christophe Lyon  
> 
> gcc/testsuite/
> * gcc.dg/analyzer/data-model-4.c: Exit if fopen returns NULL.
> * gcc.dg/analyzer/torture/conftest-1.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/analyzer/data-model-4.c   | 2 ++
>  gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
> b/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
> index 33f90871dfb..d41868d6dbc 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
> @@ -8,6 +8,8 @@ int
>  main ()
>  {
>    FILE *f = fopen ("conftest.out", "w");
> +  if (f == NULL)
> +    return 1;
>    return ferror (f) || fclose (f) != 0;
>  
>    ;
> diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
> b/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
> index 0cf85f0ebe1..9631bcf73e0 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
> @@ -3,6 +3,8 @@ int
>  main ()
>  {
>    FILE *f = fopen ("conftest.out", "w");
> +  if (f == NULL)
> +    return 1;
>    return ferror (f) || fclose (f) != 0;
>  
>    ;
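
For reference, the pattern the patch applies can be written as a small standalone helper. This is a sketch, not the testcase itself: the function name `write_check` and its path parameter are made up for illustration, and unlike the testcases it always closes the stream.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper mirroring the patched testcases: bail out early
// when fopen fails, so f is known to be non-null for the later calls.
// Returns 0 on success and 1 on any failure.
int write_check(const char *path)
{
  FILE *f = std::fopen(path, "w");
  if (f == nullptr)
    return 1;
  int failed = std::ferror(f) != 0;
  if (std::fclose(f) != 0)
    failed = 1;
  return failed;
}
```

With the early return in place, the analyzer has no path on which a possibly-NULL `f` reaches `ferror` or `fclose`.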



[Bug fortran/90504] Improved NORM2 algorithm

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90504

--- Comment #1 from anlauf at gcc dot gnu.org ---
(In reply to Janne Blomqvist from comment #0)
> Hanson, Hopkins, Remark on Algorithm 539: A Modern Fortran Reference
> Implementation for Carefully Computing the Euclidean Norm,
> https://dl.acm.org/citation.cfm?id=3134441
> 
> Above article tests different algorithms for NORM2 and tests performance and
> numerical accuracy.

This article is behind a paywall.

Is there a publicly available description?

[Bug fortran/87270] "FINAL" subroutine is called when compiled with "gfortran -O1", but not "gfortran -O0"

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87270

--- Comment #6 from anlauf at gcc dot gnu.org ---
All current compilers seem to give the same, apparently correct result,
even with different optimization level.

So can we close this finally?

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Thomas Schwinge via Gcc-patches
Hi!

On 2023-05-24T11:18:35-0700, Vineet Gupta  wrote:
> On 5/22/23 20:52, Vineet Gupta wrote:
>> On 5/22/23 02:17, Kito Cheng wrote:
>>> Ooops, seems still some issue around here,
>>
>> Yep still 5000 fails :-(
>>
>>>   but I found something might
>>> related this issue:
>>>
>>> https://github.com/gcc-mirror/gcc/commit/d6654a4be3ba44c0d57be7c8a51d76d9721345e1
>>>  
>>>
>>> https://github.com/gcc-mirror/gcc/commit/23c49bb8d09bc3bfce9a08be637cf32ac014de56
>>>  
>>>
>>
>> It seems both of these patches are essentially doing what yours did. 
>> So something else is amiss still.

Yeah, at this point I'm not sure whether my recent changes really are
related/relevant here.

> Apparently in addition to Kito's patch below, If I comment out the 
> additional torture options, failures go down drastically.

Meaning that *all* those ERRORs disappear?

> diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp 
> b/gcc/testsuite/gcc.target/riscv/riscv.exp
>
> -lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
> +#lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
>
> @Thomas, do you have some thoughts on how to fix riscv.exp properly in 
> light of recent changes to exp files.

I'm trying to understand this, but so far don't.  Can I please see a
complete 'gcc.log' file where the ERRORs are visible?


Grüße
 Thomas


>>> On Mon, May 22, 2023 at 2:42 PM Kito Cheng  
>>> wrote:
>>>> Hi Vineet:
>>>>
>>>> Could you help to test this patch, this could resolve that issue on our
>>>> machine, but I would like to also work for other env.
>>>>
>>>> Thanks :)
>>>>
>>>> ---
>>>>
>>>> We got bunch of following error message for multi-lib run:
>>>>
>>>> ERROR: torture-init: torture_without_loops is not empty as expected
>>>> ERROR: tcl error code NONE
>>>>
>>>> And seems we need torture-init and torture-finish around the test
>>>> loop.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>  * gcc.target/riscv/rvv/rvv.exp: Add torture-init and
>>>>  torture-finish.
>>>> ---
>>>>   gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
>>>> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>> index bc99cc0c3cf4..19179564361a 100644
>>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
>>>> @@ -39,6 +39,7 @@ if [istarget riscv32-*-*] then {
>>>>
>>>>   # Initialize `dg'.
>>>>   dg-init
>>>> +torture-init
>>>>
>>>>   # Main loop.
>>>>   set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
>>>> @@ -69,5 +70,7 @@ foreach op $AUTOVEC_TEST_OPTS {
>>>>   dg-runtest [lsort [glob -nocomplain 
>>>> $srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
>>>>  "-std=c99 -O3 -ftree-vectorize --param 
>>>> riscv-autovec-preference=fixed-vlmax" $CFLAGS
>>>>
>>>> +torture-finish
>>>> +
>>>>   # All done.
>>>>   dg-finish
>>>> -- 
>>>> 2.40.1
>>


[Bug c++/109876] [10/11/12/13/14 Regression] initializer_list not usable in constant expressions in a template

2023-05-24 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109876

--- Comment #9 from Jason Merrill  ---
(In reply to Marek Polacek from comment #8)
> > Instead, we should probably treat num as value-dependent even though it 
> > actually isn't.
> 
> An attempt to implement that:
> 
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -27969,6 +27969,12 @@ value_dependent_expression_p (tree expression)
>else if (TYPE_REF_P (TREE_TYPE (expression)))
> /* FIXME cp_finish_decl doesn't fold reference initializers.  */
> return true;
> +  else if (DECL_DECLARED_CONSTEXPR_P (expression)
> +  && TREE_STATIC (expression)

I'd expect we could get a similar issue with non-static constexprs.

> +  && !DECL_NAMESPACE_SCOPE_P (expression)

This seems an unnecessary optimization?

> +  && DECL_INITIAL (expression)

Perhaps we also want to return true if DECL_INITIAL is null?

> +  && TREE_CODE (DECL_INITIAL (expression)) == IMPLICIT_CONV_EXPR)

Maybe !TREE_CONSTANT?

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #5 from Martin Uecker  ---
Clang bug:
https://github.com/llvm/llvm-project/issues/62929

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

--- Comment #4 from Jonathan Wakely  ---
(In reply to Martin Seemann from comment #3)
> So it comes down to how to interpret the "Effects:" clause: Does "Equivalent
> to " mean  that all restrictions of
> `value()` apply transitively or is it merely an implementation hint?

The former.  The standard says:

Whenever the Effects element specifies that the semantics of some function F
are Equivalent to some code sequence, then the various elements are interpreted
as follows. If F’s semantics specifies any Constraints or Mandates elements,
then those requirements are logically imposed prior to the equivalent-to
semantics. Next, the semantics of the code sequence are determined by the
Constraints, Mandates, Preconditions, Effects, Synchronization, Postconditions,
Returns, Throws, Complexity, Remarks, and Error conditions specified for the
function invocations contained in the code sequence. The value returned from F
is specified by F’s Returns element, or if F has no Returns element, a non-void
return from F is specified by the return statements (8.7.4) in the code
sequence. If F’s semantics contains a Throws, Postconditions, or Complexity
element, then that supersedes any occurrences of that element in the code
sequence.


> (Strangely enough, in the "Effects:" clause of `value_or()&&` the expression
> `std::move(**this)` is used  instead of `std::move(value())`. Maybe this is
> an oversight/inconsistency of the standard.)

Yes. The specs were written by different people at different times.

[Bug c++/109876] [10/11/12/13/14 Regression] initializer_list not usable in constant expressions in a template

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109876

--- Comment #8 from Marek Polacek  ---
> Instead, we should probably treat num as value-dependent even though it 
> actually isn't.

An attempt to implement that:

--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27969,6 +27969,12 @@ value_dependent_expression_p (tree expression)
   else if (TYPE_REF_P (TREE_TYPE (expression)))
/* FIXME cp_finish_decl doesn't fold reference initializers.  */
return true;
+  else if (DECL_DECLARED_CONSTEXPR_P (expression)
+  && TREE_STATIC (expression)
+  && !DECL_NAMESPACE_SCOPE_P (expression)
+  && DECL_INITIAL (expression)
+  && TREE_CODE (DECL_INITIAL (expression)) == IMPLICIT_CONV_EXPR)
+   return true;
   return false;

 case DYNAMIC_CAST_EXPR:

Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.

2023-05-24 Thread Jason Merrill via Gcc-patches
On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, 24 May 2023 at 09:41, Xi Ruoyao  wrote:
>
> > Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++
> > expert here...
> >
> > Jonathan: AFAIK the standard and the Itanium ABI treat an empty class
> > as size 1
>
> Only as a complete object, not as a subobject.
>

Also as a data member subobject.


> > in order to guarantee unique address, so for the following:
> >
> > class Empty {};
> > class Test { Empty empty; double a, b; };
>
> There is no need to have a unique address here, so Test::empty and Test::a
> have the same address. It's a potentially-overlapping subobject.
>
> For the Itanium ABI, sizeof(Test) == 2 * sizeof(double).
>

That would be true if Test::empty were marked [[no_unique_address]], but
without that attribute, sizeof(Test) is actually 3 * sizeof(double).


> > When we pass "Test" via registers, we may only allocate the registers
> > for Test::a and Test::b, and completely ignore Test::empty because there
> > are no addresses of registers.  Is this correct or not?
>
> I think that's a decision for the loongarch psABI. In principle, there's no
> reason a register has to be used to pass Test::empty, since you can't read
> from it or write to it.
>

Agreed.  The Itanium C++ ABI has nothing to say about how registers are
allocated for parameter passing; this is a matter for the psABI.

And there is no need for a psABI to allocate a register for Test::empty
because it contains no data.

In the x86_64 psABI, Test above is passed in memory because of its size
("the size of the aggregate exceeds two eightbytes...").  But

struct Test2 { Empty empty; double a; };

is passed in a single floating-point register; the Test2::empty subobject
is not passed anywhere, because its eightbyte is classified as NO_CLASS,
because there is no actual data there.

I know nothing about the LoongArch psABI, but going out of your way to
assign a register to an empty class seems like a mistake.

> On Wed, 2023-05-24 at 14:45 +0800, Xi Ruoyao via Gcc-patches wrote:
> > > On Wed, 2023-05-24 at 14:04 +0800, Lulu Cheng wrote:
> > > > An empty struct type that is not non-trivial for the purposes of
> > > > calls
> > > > will be treated as though it were the following C type:
> > > >
> > > > struct {
> > > >   char c;
> > > > };
> > > >
> > > > Before this patch was added, a structure parameter containing an
> > > > empty structure and
> > > > less than three floating-point members was passed through one or two
> > > > floating-point
> > > > registers, while nested empty structures are ignored. Which did not
> > > > conform to the
> > > > calling convention.
> > >
> > > No, it's a deliberate decision I've made in
> > > https://gcc.gnu.org/r12-8294.  And we already agreed "the ABI needs to
> > > be updated" when we applied r12-8294, but I've never improved my
> > > English
> > > skill to revise the ABI myself :(.
> > >
> > > We are also using the same "de-facto" ABI throwing away the empty
> > > struct
> > > for Clang++ (https://reviews.llvm.org/D132285).  So we should update
> > > the
> > > spec here, instead of changing every implementation.
> > >
> > > The C++ standard treats the empty struct as size 1 for ensuring the
> > > semantics of pointer comparison operations.  When we pass it through
> > > the
> > > registers, there is no need to really consider the empty field because
> > > there are no pointers to registers.
> > >
> >
> >
>
>


Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
I'll look at the samples tomorrow, but just to address one thing:

钟居哲  writes:
>>> What gives the best code in these cases?  Is emitting a multiplication
>>> better?  Or is using a new IV better?
> Could you give me more detailed information about the "new refresh IV" approach?
> I'd like to try that.

By “using a new IV” I meant calling vect_set_loop_controls_directly
for every rgroup, not just the first.  So in the earlier example,
there would be one decrementing IV for x and one decrementing IV for y.

Thanks,
Richard




Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-24 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Wed, 24 May 2023 at 15:40, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
>> >  wrote:
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> > Hi Richard,
>> >> > Thanks for the suggestions. Does the attached patch look OK ?
>> >> > Bootstrap+test in progress on aarch64-linux-gnu.
>> >>
>> >> Like I say, please wait for the tests to complete before sending an RFA.
>> >> It saves a review cycle if the tests don't in fact pass.
>> > Right, sorry, will post patches after completion of testing henceforth.
>> >>
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
>> >> > b/gcc/config/aarch64/aarch64.cc
>> >> > index 29dbacfa917..e611a7cca25 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, 
>> >> > rtx vals, bool even_p)
>> >> >return gen_rtx_PARALLEL (new_mode, vec);
>> >> >  }
>> >> >
>> >> > +/* Return true if INSN is a scalar move.  */
>> >> > +
>> >> > +static bool
>> >> > +scalar_move_insn_p (const rtx_insn *insn)
>> >> > +{
>> >> > +  rtx set = single_set (insn);
>> >> > +  if (!set)
>> >> > +return false;
>> >> > +  rtx src = SET_SRC (set);
>> >> > +  rtx dest = SET_DEST (set);
>> >> > +  return is_a(GET_MODE (dest))
>> >> > +  && aarch64_mov_operand_p (src, GET_MODE (src));
>> >>
>> >> Formatting:
>> >>
>> >>   return (is_a(GET_MODE (dest))
>> >>   && aarch64_mov_operand_p (src, GET_MODE (src)));
>> >>
>> >> OK with that change if the tests pass, thanks.
>> > Unfortunately, the patch regressed vec-init-21.c:
>> >
>> > int8x16_t f_s8(int8_t x, int8_t y)
>> > {
>> >   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
>> >7, 8, 9, 10, 11, 12, 13, 14 };
>> > }
>> >
>> > -O3 code-gen trunk:
>> > f_s8:
>> > adrp x2, .LC0
>> > ldr q0, [x2, #:lo12:.LC0]
>> > ins v0.b[0], w0
>> > ins v0.b[1], w1
>> > ret
>> >
>> > -O3 code-gen patch:
>> > f_s8:
>> > adrp x2, .LC0
>> > ldr d31, [x2, #:lo12:.LC0]
>> > adrp x2, .LC1
>> > ldr d0, [x2, #:lo12:.LC1]
>> > ins v31.b[0], w0
>> > ins v0.b[0], w1
>> > zip1 v0.16b, v31.16b, v0.16b
>> > ret
>> >
>> > With trunk, it chooses the fallback sequence because both fallback
>> > and zip1 sequence had cost = 20, however with patch applied,
>> > we end up with zip1 sequence cost = 24 and fallback sequence
>> > cost = 28.
>> >
>> > This happens because of using insn_cost instead of
>> > set_rtx_cost for the following expression:
>> > (set (reg:QI 100)
>> > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
>> > set_rtx_cost returns 0 for above expression but insn_cost returns 4.
>>
>> Yeah, was wondering why you'd dropped the set_rtx_cost thing,
>> but decided not to question it since using insn_cost seemed
>> reasonable if it worked.
> The attached patch uses set_rtx_cost for single_set and insn_cost
> otherwise for non debug insns similar to seq_cost.

FWIW, I think with the aarch64_mov_operand fix, the old way of using
insn_cost for everything would have worked too.  But either way is fine.

>> > This expression template appears twice in fallback sequence, which raises
>> > the cost to 28 from 20, while it appears once in each half of zip1 
>> > sequence,
>> > which raises the cost to 24 from 20, and so it now prefers zip1 sequence
>> > instead.
>> >
>> > I assumed this expression would be ignored because it looks like a scalar 
>> > move,
>> > but that doesn't seem to be the case ?
>> > aarch64_classify_symbolic_expression returns
>> > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
>> > and thus aarch64_mov_operand_p returns false.
>>
>> Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
>> they're so different...
> Thanks, using aarch64_mov_operand worked.
>>
>> > Another issue with the zip1 sequence above is using same register x2
>> > for loading another half of constant in:
>> > adrp x2, .LC1
>> >
>> > I guess this will create an output dependency from adrp x2, .LC0 ->
>> > adrp x2, .LC1
>> > and anti-dependency from  ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1
>> > essentially forcing almost the entire sequence (except ins
>> > instructions) to execute sequentially ?
>>
>> I'd expect modern cores to handle that via renaming.
> Ah right, thanks for the clarification.
>
> For some reason, it seems git diff is not formatting the patch correctly :/
> Or perhaps I am doing something wrong.

No, I think it's fine.  It's just tabs vs. spaces.  A leading
"+" followed by a tab is still only indented 8 columns, whereas
"+" followed by 6 spaces is indented 7 columns.  So indentation
can look a bit weird in the diff.

I was accounting for that though. :)

> For example, it shows:
> +  return is_a(GET_MODE (dest))
> +&& aarch64_mov_operand (src, GET_MODE 

[Bug fortran/104350] ICE in gfc_array_dimen_size(): Bad dimension

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104350

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:ec2e86274427a402d2de2199ba550f7295ea9b5f

commit r14-1175-gec2e86274427a402d2de2199ba550f7295ea9b5f
Author: Harald Anlauf 
Date:   Wed May 24 21:04:43 2023 +0200

Fortran: reject bad DIM argument of SIZE intrinsic in simplification
[PR104350]

gcc/fortran/ChangeLog:

PR fortran/104350
* simplify.cc (simplify_size): Reject DIM argument of intrinsic SIZE
with error when out of valid range.

gcc/testsuite/ChangeLog:

PR fortran/104350
* gfortran.dg/size_dim_2.f90: New test.

[Bug fortran/103794] ICE in gfc_check_reshape, at fortran/check.c:4727

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103794

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:5fd5d8fb744fd9251d04e4b17d04f2340e6a283b

commit r14-1174-g5fd5d8fb744fd9251d04e4b17d04f2340e6a283b
Author: Harald Anlauf 
Date:   Sun May 21 22:25:29 2023 +0200

Fortran: checking and simplification of RESHAPE intrinsic [PR103794]

gcc/fortran/ChangeLog:

PR fortran/103794
* check.cc (gfc_check_reshape): Expand constant arguments SHAPE and
ORDER before checking.
* gfortran.h (gfc_is_constant_array_expr): Add prototype.
* iresolve.cc (gfc_resolve_reshape): Expand constant argument SHAPE.
* simplify.cc (is_constant_array_expr): If array is determined to be
constant, expand small array constructors if needed.
(gfc_is_constant_array_expr): Wrapper for is_constant_array_expr.
(gfc_simplify_reshape): Fix check for insufficient elements in SOURCE
when no padding specified.

gcc/testsuite/ChangeLog:

PR fortran/103794
* gfortran.dg/reshape_10.f90: New test.
* gfortran.dg/reshape_11.f90: New test.

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-24 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 24 May 2023 at 15:40, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi Richard,
> >> > Thanks for the suggestions. Does the attached patch look OK ?
> >> > Bootstrap+test in progress on aarch64-linux-gnu.
> >>
> >> Like I say, please wait for the tests to complete before sending an RFA.
> >> It saves a review cycle if the tests don't in fact pass.
> > Right, sorry, will post patches after completion of testing henceforth.
> >>
> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
> >> > b/gcc/config/aarch64/aarch64.cc
> >> > index 29dbacfa917..e611a7cca25 100644
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, 
> >> > rtx vals, bool even_p)
> >> >return gen_rtx_PARALLEL (new_mode, vec);
> >> >  }
> >> >
> >> > +/* Return true if INSN is a scalar move.  */
> >> > +
> >> > +static bool
> >> > +scalar_move_insn_p (const rtx_insn *insn)
> >> > +{
> >> > +  rtx set = single_set (insn);
> >> > +  if (!set)
> >> > +return false;
> >> > +  rtx src = SET_SRC (set);
> >> > +  rtx dest = SET_DEST (set);
> >> > +  return is_a(GET_MODE (dest))
> >> > +  && aarch64_mov_operand_p (src, GET_MODE (src));
> >>
> >> Formatting:
> >>
> >>   return (is_a(GET_MODE (dest))
> >>   && aarch64_mov_operand_p (src, GET_MODE (src)));
> >>
> >> OK with that change if the tests pass, thanks.
> > Unfortunately, the patch regressed vec-init-21.c:
> >
> > int8x16_t f_s8(int8_t x, int8_t y)
> > {
> >   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
> >7, 8, 9, 10, 11, 12, 13, 14 };
> > }
> >
> > -O3 code-gen trunk:
> > f_s8:
> > adrp x2, .LC0
> > ldr q0, [x2, #:lo12:.LC0]
> > ins v0.b[0], w0
> > ins v0.b[1], w1
> > ret
> >
> > -O3 code-gen patch:
> > f_s8:
> > adrp x2, .LC0
> > ldr d31, [x2, #:lo12:.LC0]
> > adrp x2, .LC1
> > ldr d0, [x2, #:lo12:.LC1]
> > ins v31.b[0], w0
> > ins v0.b[0], w1
> > zip1 v0.16b, v31.16b, v0.16b
> > ret
> >
> > With trunk, it chooses the fallback sequence because both fallback
> > and zip1 sequence had cost = 20, however with patch applied,
> > we end up with zip1 sequence cost = 24 and fallback sequence
> > cost = 28.
> >
> > This happens because of using insn_cost instead of
> > set_rtx_cost for the following expression:
> > (set (reg:QI 100)
> > (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
> > set_rtx_cost returns 0 for above expression but insn_cost returns 4.
>
> Yeah, was wondering why you'd dropped the set_rtx_cost thing,
> but decided not to question it since using insn_cost seemed
> reasonable if it worked.
[reposting because my reply got blocked for moderator approval]

The attached patch uses set_rtx_cost for single_set and insn_cost
otherwise for non debug insns similar to seq_cost.
>
> > This expression template appears twice in fallback sequence, which raises
> > the cost to 28 from 20, while it appears once in each half of zip1 sequence,
> > which raises the cost to 24 from 20, and so it now prefers zip1 sequence
> > instead.
> >
> > I assumed this expression would be ignored because it looks like a scalar 
> > move,
> > but that doesn't seem to be the case ?
> > aarch64_classify_symbolic_expression returns
> > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
> > and thus aarch64_mov_operand_p returns false.
>
> Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
> they're so different...
Thanks, using aarch64_mov_operand worked.
>
> > Another issue with the zip1 sequence above is using same register x2
> > for loading another half of constant in:
> > adrp x2, .LC1
> >
> > I guess this will create an output dependency from adrp x2, .LC0 ->
> > adrp x2, .LC1
> > and anti-dependency from  ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1
> > essentially forcing almost the entire sequence (except ins
> > instructions) to execute sequentially ?
>
> I'd expect modern cores to handle that via renaming.
Ah right, thanks for the clarification.

For some reason, it seems git diff is not formatting the patch correctly :/
Or perhaps I am doing something wrong.
For example, it shows:
+  return is_a(GET_MODE (dest))
+&& aarch64_mov_operand (src, GET_MODE (src));
but after applying the patch, it's formatted correctly with
"&&" right below is_a, both on column 10.

Similarly, for following hunk in seq_cost_ignoring_scalar_moves:
+if (NONDEBUG_INSN_P (seq)
+   && !scalar_move_insn_p (seq))
After applying patch, "&&" is below N, and not '('. Both N and "&&"
are on col 9.

And for the following just below:
+  {
+   if (rtx set = single_set (seq))

diff shows only one space difference between '{' and the following if,
but after applying the patch 

Re: [PATCH] Fortran: checking and simplification of RESHAPE intrinsic [PR103794]

2023-05-24 Thread Mikael Morin

Le 21/05/2023 à 22:48, Harald Anlauf via Fortran a écrit :

Dear all,

checking and simplification of the RESHAPE intrinsic could fail in
various ways for sufficiently complicated arguments, like array
constructors.  Debugging revealed that in these cases we determined
that the array arguments were constant but we did not properly
simplify and expand the constructors.

A possible solution is to extend the test for constant arrays -
which already does an expansion for initialization expressions -
to also perform an expansion for small constructors in the
non-initialization case.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK, thanks.


Re: [PATCH] Fortran: reject bad DIM argument of SIZE intrinsic in simplification [PR104350]

2023-05-24 Thread Mikael Morin

Le 24/05/2023 à 21:16, Harald Anlauf via Fortran a écrit :

Dear all,

the attached almost obvious patch fixes an ICE on invalid that may
occur when we attempt to simplify an initialization expression with
SIZE for an out-of-range DIM argument.  Returning gfc_bad_expr
allows for a more graceful error recovery.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK, thanks.


[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #13 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:2b502c3119c91fe3ba2313f0842a3bedd395bc91

commit r12-9651-g2b502c3119c91fe3ba2313f0842a3bedd395bc91
Author: Matthias Kretz 
Date:   Wed May 24 12:50:46 2023 +0200

libstdc++: Fix SFINAE for __is_intrinsic_type on ARM

On ARM NEON doesn't support double, so __is_intrinsic_type_v should say false (instead of being ill-formed).

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type and
__intrinsic_type in any case, but provide the member
type only with __aarch64__.

(cherry picked from commit aa8b363171a95b8f867a74f29c75f9577e9087e1)

[Bug libstdc++/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:ff7360dafe209b960535eaaa3efcfbaaa44daff9

commit r12-9652-gff7360dafe209b960535eaaa3efcfbaaa44daff9
Author: Matthias Kretz 
Date:   Wed May 24 16:43:07 2023 +0200

libstdc++: Fix type of first argument to vec_cntm call

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.

(cherry picked from commit efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9)

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #12 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:8be71168f7bbafa04f592a7524432351ffea71ba

commit r12-9650-g8be71168f7bbafa04f592a7524432351ffea71ba
Author: Matthias Kretz 
Date:   Tue May 23 23:48:49 2023 +0200

libstdc++: Add missing constexpr to simd_neon

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.

(cherry picked from commit b0a483b0a011f9cbc8b25053eae809c77dae2a12)

[Bug libstdc++/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Matthias Kretz :

https://gcc.gnu.org/g:efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9

commit r14-1173-gefd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9
Author: Matthias Kretz 
Date:   Wed May 24 16:43:07 2023 +0200

libstdc++: Fix type of first argument to vec_cntm call

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread aemseemann at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

--- Comment #3 from Martin Seemann  ---
Thanks for pointing me to the LWG issue. It makes sense that the error type
must be copyable for the `value()` overloads due to potentially throwing a
`bad_expected_access` with the error embedded.

However, the monadic operations will never throw this exception.
Consequently, the standard draft for the monadic operations
(https://eel.is/c++draft/expected.object.monadic) does not contain any
"Throws:" clause nor is copyability of the error type included in the
"Constraints:" clause.

So it comes down to how to interpret the "Effects:" clause: Does "Equivalent to
" mean  that all restrictions of
`value()` apply transitively or is it merely an implementation hint?

(Strangely enough, in the "Effects:" clause of `value_or()&&` the expression
`std::move(**this)` is used  instead of `std::move(value())`. Maybe this is an
oversight/inconsistency of the standard.)

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #4 from Martin Uecker  ---

The concern would be that a program relying on the size of an object being
larger may then have out of bounds accesses.  But rereading the standard, I am
also not seeing that this is required. (for the extension nothing is
required anyway, but it should be consistent with it).

Re: [PATCH v4] libgfortran: Replace mutex with rwlock

2023-05-24 Thread Thomas Koenig via Gcc-patches

Hi Lipeng,


May I know any comment or concern on this patch, thanks for your time 


Thanks for your patience in getting this reviewed.

A few remarks / questions.

Which strategy is used in this implementation, read-preferring or
write-preferring?  And if read-preferring is used, is there
a danger of deadlock if people do unreasonable things?
Maybe you could explain that, also in a comment in the code.

Can you add some sort of torture test case(s) which does a lot of
opening/closing/reading/writing, possibly with asynchronous
I/O and/or pthreads, to catch possible problems?  If there is a
system dependency or some race condition, chances are that regression
testers will catch this.

With this, the libgfortran parts are OK, unless somebody else has more
comments, so give this a couple of days.  I cannot approve the libgcc
parts, that would be somebody else (Jakub?)

Best regards

Thomas




[Bug fortran/104350] ICE in gfc_array_dimen_size(): Bad dimension

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104350

anlauf at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org

--- Comment #3 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2023-May/059322.html

[PATCH] Fortran: reject bad DIM argument of SIZE intrinsic in simplification [PR104350]

2023-05-24 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached almost obvious patch fixes an ICE on invalid that may
occur when we attempt to simplify an initialization expression with
SIZE for an out-of-range DIM argument.  Returning gfc_bad_expr
allows for a more graceful error recovery.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 738bdcce46bd760fcafd1eb56700c8824621266f Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 24 May 2023 21:04:43 +0200
Subject: [PATCH] Fortran: reject bad DIM argument of SIZE intrinsic in
 simplification [PR104350]

gcc/fortran/ChangeLog:

	PR fortran/104350
	* simplify.cc (simplify_size): Reject DIM argument of intrinsic SIZE
	with error when out of valid range.

gcc/testsuite/ChangeLog:

	PR fortran/104350
	* gfortran.dg/size_dim_2.f90: New test.
---
 gcc/fortran/simplify.cc  | 12 +++-
 gcc/testsuite/gfortran.dg/size_dim_2.f90 | 19 +++
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/size_dim_2.f90

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 3f77203e62e..81680117f70 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -7594,7 +7594,17 @@ simplify_size (gfc_expr *array, gfc_expr *dim, int k)
   if (dim->expr_type != EXPR_CONSTANT)
 	return NULL;

-  d = mpz_get_ui (dim->value.integer) - 1;
+  if (array->rank == -1)
+	return NULL;
+
+  d = mpz_get_si (dim->value.integer) - 1;
+  if (d < 0 || d > array->rank - 1)
+	{
+	  gfc_error ("DIM argument (%d) to intrinsic SIZE at %L out of range "
+		 "(1:%d)", d+1, >where, array->rank);
+	  return &gfc_bad_expr;
+	}
+
   if (!gfc_array_dimen_size (array, d, &size))
 	return NULL;
 }
diff --git a/gcc/testsuite/gfortran.dg/size_dim_2.f90 b/gcc/testsuite/gfortran.dg/size_dim_2.f90
new file mode 100644
index 000..27a71d90a47
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/size_dim_2.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! PR fortran/104350 - ICE with SIZE and bad DIM in initialization expression
+! Contributed by G. Steinmetz
+
+program p
+  implicit none
+  integer :: k
+  integer, parameter :: x(2,3) = 42
+  integer, parameter :: s(*) = [(size(x,dim=k),k=1,rank(x))]
+  integer, parameter :: t(*) = [(size(x,dim=k),k=1,3)]   ! { dg-error "out of range" }
+  integer, parameter :: u(*) = [(size(x,dim=k),k=0,3)]   ! { dg-error "out of range" }
+  integer, parameter :: v = product(shape(x))
+  integer, parameter :: w = product([(size(x,k),k=0,3)]) ! { dg-error "out of range" }
+  print *,([(size(x,dim=k),k=1,rank(x))])
+  print *, [(size(x,dim=k),k=1,rank(x))]
+  print *, [(size(x,dim=k),k=0,rank(x))]
+  print *, product([(size(x,dim=k),k=1,rank(x))])
+  print *, product([(size(x,dim=k),k=0,rank(x))])
+end
--
2.35.3



[Bug rtl-optimization/101188] [AVR] Miscompilation and function pointers

2023-05-24 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #6 from Georg-Johann Lay  ---
Created attachment 55152
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55152&action=edit
diff testcase by v4.9.2 vs v5.2.1

Code from v4.9.2 is correct, but from v5.2.1 is bogus:

--- fail1-4.9.2.sx  2023-05-24 17:20:46.508698338 +0200
+++ fail1-5.2.1.sx  2023-05-24 17:19:50.019976879 +0200
@@ -39,11 +39,11 @@
adiw r24,1   ;  13  addhi3_clobber/1[length = 1]
std Z+1,r25  ;  14  *movhi/4[length = 2]
st Z,r24
-   adiw r30,2   ;  15  *addhi3/3   [length = 1]
-   movw r14,r16 ;  39  *movhi/1[length = 1]
-   ldi r24,68   ;  16  addhi3_clobber/3[length = 3]
-   add r14,r24
+   movw r14,r16 ;  38  *movhi/1[length = 1]
+   ldi r31,68   ;  15  addhi3_clobber/3[length = 3]
+   add r14,r31
adc r15,__zero_reg__
+   adiw r30,2   ;  17  *addhi3/3   [length = 1]
ld __tmp_reg__,Z+;  18  *movhi/3[length = 3]
ld r31,Z
mov r30,__tmp_reg__

[Bug c++/109958] [10/11/12/13/14 Regression] ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ice-on-valid-code
           Priority|P3                          |P2
            Summary|ICE: in build_ptrmem_type,  |[10/11/12/13/14 Regression]
                   |at cp/decl.cc:11066 taking  |ICE: in build_ptrmem_type,
                   |the address of bound static |at cp/decl.cc:11066 taking
                   |member function brought     |the address of bound static
                   |into derived class by       |member function brought
                   |using-declaration           |into derived class by
                   |                            |using-declaration
   Target Milestone|---                         |10.5

[PATCH RFC] c++: use __cxa_call_terminate for MUST_NOT_THROW [PR97720]

2023-05-24 Thread Jason Merrill via Gcc-patches
Middle-end folks: any thoughts about how best to make the change described in
the last paragraph below?

Library folks: any thoughts on the changes to __cxa_call_terminate?

-- 8< --

[except.handle]/7 says that when we enter std::terminate due to a throw,
that is considered an active handler.  We already implemented that properly
for the case of not finding a handler (__cxa_throw calls __cxa_begin_catch
before std::terminate) and the case of finding a callsite with no landing
pad (the personality function calls __cxa_call_terminate which calls
__cxa_begin_catch), but for the case of a throw in a try/catch in a noexcept
function, we were emitting a cleanup that calls std::terminate directly
without ever calling __cxa_begin_catch to handle the exception.

A straightforward way to fix this seems to be calling __cxa_call_terminate
instead.  However, that requires exporting it from libstdc++, which we have
not previously done.  Despite the name, it isn't actually part of the ABI
standard.  Nor is __cxa_call_unexpected, as far as I can tell, but that one
is also used by clang.  For this case they use __clang_call_terminate; it
seems reasonable to me for us to stick with __cxa_call_terminate.

I also change __cxa_call_terminate to take void* for simplicity in the front
end (and consistency with __cxa_call_unexpected) but that isn't necessary if
it's undesirable for some reason.

This patch does not fix the issue that representing the noexcept as a
cleanup is wrong, and confuses the handler search; since it looks like a
cleanup in the EH tables, the unwinder keeps looking until it finds the
catch in main(), which it should never have gotten to.  Without the
try/catch in main, the unwinder would reach the end of the stack and say no
handler was found.  The noexcept is a handler, and should be treated as one,
as it is when the landing pad is omitted.

The best fix for that issue seems to me to be to represent an
ERT_MUST_NOT_THROW after an ERT_TRY in an action list as though it were an
ERT_ALLOWED_EXCEPTIONS (since indeed it is an exception-specification).  The
actual code generation shouldn't need to change (apart from the change made
by this patch), only the action table entry.

PR c++/97720

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Add CPTI_CALL_TERMINATE_FN.
(call_terminate_fn): New macro.
* cp-gimplify.cc (gimplify_must_not_throw_expr): Use it.
* except.cc (init_exception_processing): Set it.
(cp_protect_cleanup_actions): Return it.

gcc/ChangeLog:

* tree-eh.cc (lower_resx): Pass the exception pointer to the
failure_decl.
* except.h: Tweak comment.

libstdc++-v3/ChangeLog:

* libsupc++/eh_call.cc (__cxa_call_terminate): Take void*.
* config/abi/pre/gnu.ver: Add it.

gcc/testsuite/ChangeLog:

* g++.dg/eh/terminate2.C: New test.
---
 gcc/cp/cp-tree.h |  2 ++
 gcc/except.h |  2 +-
 gcc/cp/cp-gimplify.cc|  2 +-
 gcc/cp/except.cc |  5 -
 gcc/testsuite/g++.dg/eh/terminate2.C | 30 
 gcc/tree-eh.cc   | 16 ++-
 libstdc++-v3/libsupc++/eh_call.cc|  4 +++-
 libstdc++-v3/config/abi/pre/gnu.ver  |  7 +++
 8 files changed, 63 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/eh/terminate2.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index a1b882f11fe..a8465a988b5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -217,6 +217,7 @@ enum cp_tree_index
definitions.  */
 CPTI_ALIGN_TYPE,
 CPTI_TERMINATE_FN,
+CPTI_CALL_TERMINATE_FN,
 CPTI_CALL_UNEXPECTED_FN,
 
 /* These are lazily inited.  */
@@ -358,6 +359,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
 /* Exception handling function declarations.  */
 #define terminate_fn   cp_global_trees[CPTI_TERMINATE_FN]
 #define call_unexpected_fn cp_global_trees[CPTI_CALL_UNEXPECTED_FN]
+#define call_terminate_fn  cp_global_trees[CPTI_CALL_TERMINATE_FN]
 #define get_exception_ptr_fn   cp_global_trees[CPTI_GET_EXCEPTION_PTR_FN]
 #define begin_catch_fn cp_global_trees[CPTI_BEGIN_CATCH_FN]
 #define end_catch_fn   cp_global_trees[CPTI_END_CATCH_FN]
diff --git a/gcc/except.h b/gcc/except.h
index 5ecdbc0d1dc..378a9e4cb77 100644
--- a/gcc/except.h
+++ b/gcc/except.h
@@ -155,7 +155,7 @@ struct GTY(()) eh_region_d
 struct eh_region_u_must_not_throw {
   /* A function decl to be invoked if this region is actually reachable
 from within the function, rather than implementable from the runtime.
-The normal way for this to happen is for there to be a CLEANUP region
+The normal way for this to happen is for there to be a TRY region
 contained within this MUST_NOT_THROW region.  Note that if the
 runtime handles the MUST_NOT_THROW region, we have no 

[Bug c++/109958] ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |mpolacek at gcc dot gnu.org
   Last reconfirmed|                            |2023-05-24

--- Comment #1 from Marek Polacek ---
Confirmed.  r0-115460-g57910f3a9a81e9:

commit 57910f3a9a81e9ad122a814255197f6f24c6af08
Author: Jason Merrill 
Date:   Sat Mar 3 19:53:30 2012 -0500

class.c (add_method): Always build an OVERLOAD for using-decls.

* class.c (add_method): Always build an OVERLOAD for using-decls.
* search.c (lookup_member): Handle getting an OVERLOAD for a
single function.

From-SVN: r184873

[Bug c++/109958] New: ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread ed at catmur dot uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

            Bug ID: 109958
           Summary: ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking
                    the address of bound static member function brought
                    into derived class by using-declaration
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ed at catmur dot uk
  Target Milestone: ---

struct B { static int f(); };
struct D : B { using B::f; };
void f(D d) { &d.f; }

: In function 'void f(D)':
:3:18: error: ISO C++ forbids taking the address of a bound member
function to form a pointer to member function.  Say '::f' [-fpermissive]
3 | void f(D d) { &d.f; }
  |                ~~^
:3:18: internal compiler error: in build_ptrmem_type, at
cp/decl.cc:11066
3 | void f(D d) { &d.f; }
  |                  ^
0x23a0cee internal_error(char const*, ...)
???:0
0xa95fae fancy_abort(char const*, int, char const*)
???:0
0xd31f7f build_x_unary_op(unsigned int, tree_code, cp_expr, tree_node*, int)
???:0
0xc7ab2f c_parse_file()
???:0
0xdb9519 c_common_parse_file()
???:0

This appears to have been broken somewhere between 4.7.4 and 4.8.1.
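
As a point of comparison, here is a minimal sketch of the qualified spelling,
assuming the goal is simply to obtain the address of the static member
function.  The definition of B::f, the value 42, the alias fn_ptr and the
helper call_via_derived are additions for illustration and are not part of
the report.

```cpp
#include <cassert>
#include <type_traits>

struct B { static int f (); };
struct D : B { using B::f; };

// Definition added so the sketch links; the value 42 is arbitrary.
int B::f () { return 42; }

// A static member function has ordinary function type, so the qualified
// form yields a plain function pointer rather than a pointer to member.
using fn_ptr = int (*) ();
static_assert (std::is_same<decltype (&D::f), fn_ptr>::value,
	       "&D::f is a plain function pointer");
static_assert (std::is_same<decltype (&B::f), fn_ptr>::value,
	       "&B::f is a plain function pointer");

// Calls f through the pointer obtained via the using-declaration.
int call_via_derived ()
{
  fn_ptr p = &D::f;
  assert (p != nullptr);
  return p ();
}
```

This only shows that the name introduced by the using-declaration is usable
through a qualified name; it says nothing about how the bound form in the
testcase should be diagnosed.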

[COMMITTED] Remove deprecated vrange::kind().

2023-05-24 Thread Aldy Hernandez via Gcc-patches
gcc/ChangeLog:

* value-range.h (vrange::kind): Remove.
---
 gcc/value-range.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 936eb175062..b8cc2a0e76a 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -100,9 +100,6 @@ public:
   bool operator== (const vrange &) const;
   bool operator!= (const vrange &r) const { return !(*this == r); }
   void dump (FILE *) const;
-
-  enum value_range_kind kind () const; // DEPRECATED
-
 protected:
   vrange (enum value_range_discriminator d) : m_discriminator (d) { }
   ENUM_BITFIELD(value_range_kind) m_kind : 8;
-- 
2.40.1


