date:20221129

Re: [PATCH Rust front-end v3 38/46] gccrs: Add HIR to GCC GENERIC lowering entry point

2022-11-29 Thread Richard Biener via Gcc-patches

On Tue, Nov 29, 2022 at 7:10 PM Arthur Cohen  wrote:
>
> Hi Richard,
>
> (...)
>
>  +
>  +  unsigned HOST_WIDE_INT ltype_length
>  += wi::ext (wi::to_offset (TYPE_MAX_VALUE (ltype_domain))
>  +- wi::to_offset (TYPE_MIN_VALUE (ltype_domain)) + 1,
> >>>
> >>> TYPE_MIN_VALUE is not checked to be constant, also the correct
> >>> check would be to use TREE_CODE (..) == INTEGER_CST, in
> >>> the GCC middle-end an expression '1 + 2' (a PLUS_EXPR) would
> >>> be TREE_CONSTANT but wi::to_offset would ICE.
> >>>
>  +  TYPE_PRECISION (TREE_TYPE (ltype_domain)),
>  +  TYPE_SIGN (TREE_TYPE (ltype_domain)))
>  +   .to_uhwi ();
> >>>
> >>> .to_uhwi will just truncate if the value doesn't fit, the same result as
> >>> above is achieved with
> >>>
> >>>unsigned HOST_WIDE_INT ltype_length
> >>>   = TREE_INT_CST_LOW (TYPE_MAX_VALUE (..))
> >>> - TREE_INT_CST_LOW (TYPE_MIN_VALUE (...)) + 1;
> >>>
> >>> so it appears you wanted to be "more correct" here (but if I see
> >>> correctly you fail on that attempt)?
> >>>
>
> I've made the changes you proposed and noticed failure on our 32-bit CI.
>
> I've had a look at the values in detail, and it seems that truncating
> was the expected behavior.
>
> On our 64 bit CI, with a testcase containing an array of zero elements,
> we get the following values:
>
> TREE_INT_CST_LOW(TYPE_MAX_VALUE(...)) = 18446744073709551615;
> TREE_INT_CST_LOW(TYPE_MIN_VALUE(...)) = 0;
>
> Adding 1 to the result of the subtraction results in an overflow,
> wrapping back to zero.
>
> With the -m32 flag, we get the following values:
>
> TREE_INT_CST_LOW(TYPE_MAX_VALUE(...)) = 4294967295;
> TREE_INT_CST_LOW(TYPE_MIN_VALUE(...)) = 0;
>
> The addition of 1 does not overflow the unsigned HOST_WIDE_INT type and
> we end up with 4294967296 as the length of our array.
>
> I am not sure on how to fix this behavior, and whether or not it is the
> expected one, nor am I familiar enough with the tree API to reproduce
> the original behavior. Any input is welcome.
>
> In the meantime, I'll revert those changes and probably keep the
> existing code in the patches if that's okay with you.

Sure - take my comments as that the code needs comments explaining
what it tries to do.  Apparently I misunderstood the intent (and still don't
get it, but I don't remember the part in detail either).
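
For readers following the thread, the arithmetic Arthur describes can be
sketched in plain C.  This is only an illustration under assumptions: the
helper names are made up, the domain is taken to be unsigned, and masking to
the type's precision stands in for what the wi::ext (...).to_uhwi () step
achieves in that case.  It is not the gccrs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the truncating computation: form
   max - min + 1 and keep only the low PRECISION bits, so the result
   wraps the way arithmetic in the domain's own type would.  */
static uint64_t
length_in_precision (uint64_t max, uint64_t min, unsigned precision)
{
  uint64_t len = max - min + 1;  /* wraps modulo 2^64 */
  if (precision < 64)
    len &= (UINT64_C (1) << precision) - 1;
  return len;
}

/* The plain 64-bit computation suggested in review.  */
static uint64_t
length_plain (uint64_t max, uint64_t min)
{
  return max - min + 1;
}

static void
demo_zero_length_array (void)
{
  /* 64-bit target: max is 2^64 - 1, and both forms wrap to 0.  */
  assert (length_plain (UINT64_MAX, 0) == 0);
  assert (length_in_precision (UINT64_MAX, 0, 64) == 0);

  /* -m32: max is 2^32 - 1, so only the truncating form yields 0.  */
  assert (length_plain (UINT32_MAX, 0) == UINT64_C (4294967296));
  assert (length_in_precision (UINT32_MAX, 0, 32) == 0);
}
```

Under these assumptions the two forms agree only when the domain's precision
equals the width of HOST_WIDE_INT, which matches the 32-bit CI failure
reported above.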

> >>> Overall this part of the rust frontend looks OK.  Take the comments as
> >>> suggestions (for future
> >>> enhancements).
>
> Which seems to be the case :)
>
> The v4 of patches, which contains a lot of fixes for the issues you
> mentioned, will be sent soon.
>
> All the best,
>
> Arthur

Re: [PATCH v2] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-11-29 Thread Richard Biener via Gcc-patches

On Tue, Nov 29, 2022 at 11:05 AM Manolis Tsamis  wrote:
>
> When using SWAR (SIMD in a register) techniques a comparison operation within
> such a register can be made by using a combination of shifts, bitwise and and
> multiplication. If code using this scheme is vectorized then there is
> potential to replace all these operations with a single vector comparison, by
> reinterpreting the vector types to match the width of the SWAR register.
>
> For example, for the test function packed_cmp_16_32, the original generated 
> code is:
>
> ldr q0, [x0]
> add w1, w1, 1
> ushr v0.4s, v0.4s, 15
> and v0.16b, v0.16b, v2.16b
> shl v1.4s, v0.4s, 16
> sub v0.4s, v1.4s, v0.4s
> str q0, [x0], 16
> cmp w2, w1
> bhi .L20
>
> with this pattern the above can be optimized to:
>
> ldr q0, [x0]
> add w1, w1, 1
> cmlt v0.8h, v0.8h, #0
> str q0, [x0], 16
> cmp w2, w1
> bhi .L20
>
> The effect is similar for x86-64.
>
> Signed-off-by: Manolis Tsamis 
>
> gcc/ChangeLog:
>
> * match.pd: Simplify vector shift + bit_and + multiply in some cases.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
>
> ---
>
> Changes in v2:
> - Changed pattern to use vec_cond_expr.
> - Changed pattern to work with VLA vector.
> - Added more checks and comments.
>
>  gcc/match.pd  | 60 
>  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
>  2 files changed, 132 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 67a0a682f31..05e7fc79ba8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -301,6 +301,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (view_convert (bit_and:itype (view_convert @0)
>  (ne @1 { build_zero_cst (type); })))
>
> +/* In SWAR (SIMD in a register) code a signed comparison of packed data can
> +   be constructed with a particular combination of shift, bitwise and,
> +   and multiplication by constants.  If that code is vectorized we can
> +   convert this pattern into a more efficient vector comparison.  */
> +(simplify
> + (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
> +   uniform_integer_cst_p@2)
> +uniform_integer_cst_p@3)

Please use VECTOR_CST in the match instead of uniform_integer_cst_p
and instead ...

> + (with {
> +   tree rshift_cst = uniform_integer_cst_p (@1);
> +   tree bit_and_cst = uniform_integer_cst_p (@2);
> +   tree mult_cst = uniform_integer_cst_p (@3);
> +  }
> +  /* Make sure we're working with vectors and uniform vector constants.  */
> +  (if (VECTOR_TYPE_P (type)

... test for non-NULL *_cst here where you can use uniform_vector_p instead
of uniform_integer_cst_p.  You can elide the VECTOR_TYPE_P check then
and instead do INTEGRAL_TYPE_P (TREE_TYPE (type)).

> +   && tree_fits_uhwi_p (rshift_cst)
> +   && tree_fits_uhwi_p (mult_cst)
> +   && tree_fits_uhwi_p (bit_and_cst))
> +   /* Compute what constants would be needed for this to represent a packed
> +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> +   (with {
> + HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
> + poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
> + poly_int64 vec_bits = vec_elem_bits * vec_nelts;
> +
> + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> + target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
> +
> + mult_i = tree_to_uhwi (mult_cst);
> + bit_and_i = tree_to_uhwi (bit_and_cst);
> + target_bit_and_i = 0;
> +
> + /* The bit pattern in BIT_AND_I should be a mask for the least
> +significant bit of each packed element that is CMP_BITS wide.  */
> + for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
> +   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
> +}
> +(if ((exact_log2 (cmp_bits_i)) >= 0
> +&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
> +&& multiple_p (vec_bits, cmp_bits_i)
> +&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
> +&& target_mult_i == mult_i
> +&& target_bit_and_i == bit_and_i)
> + /* Compute the vector shape for the comparison and check if the target is
> +   able to expand the comparison with that type.  */
> + (with {
> +   /* We're doing a signed comparison.  */
> +   tree cmp_type = build_nonstandard_integer_type (cmp_bits_i, 0);
> +   poly_int64 vector_type_nelts = exact_div (vec_bits, cmp_bits_i);
> +   tree vector_cmp_type = build_vector_type (cmp_type, vector_type_nelts);
> +   tree zeros = build_zero_cst (vector_cmp_type);
>
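
The SWAR comparison described in the commit message above can be checked on
scalars.  The sketch below is an illustration, not part of the patch: the
function names are made up, and the constants (shift by 15, mask 0x00010001,
multiply by 0xffff) are the ones the pattern would accept for 16-bit lanes in
a 32-bit element, as in the packed_cmp_16_32 example.

```c
#include <assert.h>
#include <stdint.h>

/* SWAR form: two signed 16-bit lanes packed in one 32-bit word.
   The shift moves each lane's sign bit to the lane's lowest bit, the
   mask isolates it, and the multiply smears it across the lane.  */
static uint32_t
swar_lt_zero_16x2 (uint32_t x)
{
  return ((x >> 15) & UINT32_C (0x00010001)) * UINT32_C (0xffff);
}

/* Lane-by-lane reference: all-ones in every lane that is negative
   when read as a signed 16-bit value.  */
static uint32_t
lanewise_lt_zero_16x2 (uint32_t x)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 2; lane++)
    {
      uint32_t bits = (x >> (16 * lane)) & UINT32_C (0xffff);
      if (bits & UINT32_C (0x8000))  /* sign bit of the lane */
        result |= UINT32_C (0xffff) << (16 * lane);
    }
  return result;
}

static void
demo_swar_compare (void)
{
  const uint32_t inputs[] = { 0, 0x80001234, 0x7fff8000, 0xffffffff,
                              0x00018001 };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (swar_lt_zero_16x2 (inputs[i])
            == lanewise_lt_zero_16x2 (inputs[i]));
}
```

The lane-wise form is what a vector "compare less than zero" on 16-bit
elements computes, which is why the three-operation sequence can be replaced
by a single comparison once the vector is reinterpreted with narrower lanes.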

Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

2022-11-29 Thread Richard Biener via Gcc-patches

On Thu, Nov 24, 2022 at 4:27 PM Iskander Shakirzyanov
 wrote:
>
> How did you test the patch? If you bootstrapped it and ran the
> testsuite then it's OK.
>
> Yes, I bootstrapped it and ran the testsuite and everything seemed OK, but I
> missed the failures of gcc.dg/Warray-bounds-34.c and
> gcc.dg/Warray-bounds-43.c, so Franz is right. I have since fixed the regexps
> in the dg directives and now everything seems OK.
>
> I'm pretty sure the testsuite will have regressions, as I have a very similar 
> patch lying around that needs these testsuite changes.
>
> You are right, thank you. I missed this, attaching corrected version of patch.
>
> This also shows nicely why I don't like warnings with levels, what if I want 
> -Werror=array-bounds=2 + -Warray-bounds=1?
>
> I completely agree with you, because I also thought about using -Werror=opt=X
> + -Wopt=Y; this functionality looks useful. As far as I know, when gcc parses
> an option with the same OPT, it overwrites the old configuration of OPT.
>
> Because I think at least -Wuse-after-free= and -Wattributes= have the same
> problem.
>
> Yes, it looks that way; this should probably be fixed too.
>
> BTW, is the duplicated warning description "Warn if an array is accessed out 
> of bounds." needed or not with Alias()?
>
> According to other examples in common.opt, duplicated description is not 
> necessary, you are right.
>
> I've attached my patch, feel free to integrate the testsuite changes.
>
> Thanks, but it seems to me that duplicating the existing tests is redundant
> for testing the functionality of -Werror=array-bounds=X.
>
>
> From bf047e36392dab138db10be2ec257d08c376ada5 Mon Sep 17 00:00:00 2001
> From: Iskander Shakirzyanov 
> Date: Thu, 24 Nov 2022 14:26:59 +
> Subject: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]
>
> According to documentation the -Werror= option makes the specified warning
> into an error and also automatically implies this option. Then it seems that
> the behavior of the compiler when specifying -Werror=array-bounds=X should be
> the same as specifying "-Werror=array-bounds -Warray-bounds=X", so we expect
> to receive array-bounds pass triggers and they must be processed as errors.
> In practice, we observe that the array-bounds pass is indeed called, but
> its responses are processed as warnings, not errors.
> As I understand, this happens because Warray-bounds and Warray-bounds= are
> declared as 2 different options in common.opt, so when
> diagnostic_classify_diagnostic() is called, DK_ERROR is set for
> the Warray-bounds= option, but in diagnostic_report_diagnostic() through
> warning_at() passes opt_index of Warray-bounds, so information about
> DK_ERROR is lost. Fixed by using Alias() in declaration of
> Warray-bounds (similarly as in Wattribute-alias etc.)

OK if this passed bootstrap & regtest.

Thanks,
Richard.
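
The failure mode described in the commit message can be pictured with a toy
model.  Everything below is hypothetical (the enum, the table and the function
names are not GCC's data structures); it only shows why recording the
promotion under one option index and reporting under another loses the
"error" classification.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: two separate option indices, as before the patch, and a
   per-index "promoted to error" table.  */
enum toy_opt { TOY_Warray_bounds, TOY_Warray_bounds_eq, TOY_N_OPTS };

static bool toy_is_error[TOY_N_OPTS];

/* -Werror=array-bounds=X records the promotion under the "=" option.  */
static void
toy_classify_werror_array_bounds_eq (void)
{
  toy_is_error[TOY_Warray_bounds_eq] = true;
}

/* A warning site is reported as an error only if the index it passes
   is the one that was promoted.  */
static bool
toy_reported_as_error (enum toy_opt site_index)
{
  return toy_is_error[site_index];
}

static void
demo_lost_classification (void)
{
  toy_classify_werror_array_bounds_eq ();
  /* Sites that pass the plain index still produce a warning.  */
  assert (!toy_reported_as_error (TOY_Warray_bounds));
  /* Sites that pass the "=" index see the promotion.  */
  assert (toy_reported_as_error (TOY_Warray_bounds_eq));
}
```

In this picture the patch makes every warning site use the "=" index, and
turns the plain option into an alias of it, so that only one table entry is
ever consulted.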

> PR driver/107787
>
> Co-authored-by: Franz Sirl 
>
> gcc/ChangeLog:
>
> * common.opt (Warray-bounds): Turn into alias to
> -Warray-bounds=1.
> * builtins.cc (warn_array_bounds): Use OPT_Warray_bounds_
> instead of OPT_Warray_bounds.
> * diagnostic-spec.cc: Likewise.
> * gimple-array-bounds.cc: Likewise.
> * gimple-ssa-warn-restrict.cc: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/Warray-bounds-34.c: Correct the regular
> expression for -Warray-bounds=.
> * gcc.dg/Warray-bounds-43.c: Likewise.
> * gcc.dg/pr107787.c: New test.
>
> gcc/c-family/ChangeLog:
>
> * c-common.cc (warn_array_bounds): Use OPT_Warray_bounds_
> instead of OPT_Warray_bounds.
> ---
>  gcc/builtins.cc |  6 ++--
>  gcc/c-family/c-common.cc|  4 +--
>  gcc/common.opt  |  3 +-
>  gcc/diagnostic-spec.cc  |  1 -
>  gcc/gimple-array-bounds.cc  | 38 -
>  gcc/gimple-ssa-warn-restrict.cc |  2 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-34.c |  2 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-43.c |  6 ++--
>  gcc/testsuite/gcc.dg/pr107787.c | 13 +
>  9 files changed, 43 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr107787.c
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 4dc1ca672b2..02c4fefa86f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize)
>  {
>/* Suppress multiple warnings for propagated constant strings.  */
>if (only_value != 2
> -  && !warning_suppressed_p (arg, OPT_Warray_bounds)
> -  && warning_at (loc, OPT_Warray_bounds,
> +  && !warning_suppressed_p (arg, OPT_Warray_bounds_)
> +  && warning_at (loc, OPT_Warray_bounds_,
>   "offset %qwi outside bounds of constant string",
>   eltoff))
>   {
>if (decl)
>  inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl);
> -  suppress_warning (arg, OPT_Warray_bounds);
> +  suppress_warning (arg, OPT_Warray_bounds_);
>   }
>return NULL_TREE;
>  }
> diff --git

[PATCH 1/2 V2] Implement hwasan target_hook.

2022-11-29 Thread liuhongt via Gcc-patches

Update in V2:
Add documentation for -mlam={none,u48,u57} to x86 options in invoke.texi.

gcc/ChangeLog:

* doc/invoke.texi (x86 options): Document
-mlam={none,u48,u57}.
* config/i386/i386-opts.h (enum lam_type): New enum.
* config/i386/i386.c (ix86_memtag_can_tag_addresses): New.
(ix86_memtag_set_tag): Ditto.
(ix86_memtag_extract_tag): Ditto.
(ix86_memtag_add_tag): Ditto.
(ix86_memtag_tag_size): Ditto.
(ix86_memtag_untagged_pointer): Ditto.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
(TARGET_MEMTAG_ADD_TAG): Ditto.
(TARGET_MEMTAG_SET_TAG): Ditto.
(TARGET_MEMTAG_EXTRACT_TAG): Ditto.
(TARGET_MEMTAG_UNTAGGED_POINTER): Ditto.
(TARGET_MEMTAG_TAG_SIZE): Ditto.
(IX86_HWASAN_SHIFT): Ditto.
(IX86_HWASAN_TAG_SIZE): Ditto.
* config/i386/i386-expand.c (ix86_expand_call): Untag code
pointer.
* config/i386/i386-options.c (ix86_option_override_internal):
Error when enable -mlam=[u48|u57] for 32-bit code.
* config/i386/i386.opt: Add -mlam=[none|u48|u57].
* config/i386/i386-protos.h (ix86_memtag_untagged_pointer):
Declare.
(ix86_memtag_can_tag_addresses): Ditto.
---
 gcc/config/i386/i386-expand.cc  |  12 
 gcc/config/i386/i386-options.cc |   3 +
 gcc/config/i386/i386-opts.h |   6 ++
 gcc/config/i386/i386-protos.h   |   2 +
 gcc/config/i386/i386.cc | 123 
 gcc/config/i386/i386.opt|  16 +
 gcc/doc/invoke.texi |   9 ++-
 7 files changed, 170 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index d26e7e41445..0e94782165a 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -92,6 +92,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "i386-options.h"
 #include "i386-builtins.h"
 #include "i386-expand.h"
+#include "asan.h"
 
 /* Split one or more double-mode RTL references into pairs of half-mode
references.  The RTL can be REG, offsettable MEM, integer constant, or
@@ -9438,6 +9439,17 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
 }
 
+  /* PR100665: Hwasan may tag code pointer which is not supported by LAM,
+ mask off code pointers here.
+ TODO: also need to handle indirect jump.  */
+  if (ix86_memtag_can_tag_addresses () && !fndecl
+  && sanitize_flags_p (SANITIZE_HWADDRESS))
+{
+  rtx untagged_addr = ix86_memtag_untagged_pointer (XEXP (fnaddr, 0),
+   NULL_RTX);
+  fnaddr = gen_rtx_MEM (QImode, untagged_addr);
+}
+
   call = gen_rtx_CALL (VOIDmode, fnaddr, callarg1);
 
   if (retval)
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 44dcccb0a73..25f21ac2a49 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2033,6 +2033,9 @@ ix86_option_override_internal (bool main_args_p,
   if (TARGET_UINTR && !TARGET_64BIT)
 error ("%<-muintr%> not supported for 32-bit code");
 
+  if (ix86_lam_type && !TARGET_LP64)
+error ("%<-mlam=%> option: [u48|u57] not supported for 32-bit code");
+
   if (!opts->x_ix86_arch_string)
 opts->x_ix86_arch_string
   = TARGET_64BIT_P (opts->x_ix86_isa_flags)
diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 8f71e89fa9a..d3bfeed0af2 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -128,4 +128,10 @@ enum harden_sls {
   harden_sls_all = harden_sls_return | harden_sls_indirect_jmp
 };
 
+enum lam_type {
+  lam_none = 0,
+  lam_u48 = 1,
+  lam_u57
+};
+
 #endif
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index e136f6ec175..abd123c9efc 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -228,6 +228,8 @@ extern void ix86_expand_atomic_fetch_op_loop (rtx, rtx, rtx, enum rtx_code,
 extern void ix86_expand_cmpxchg_loop (rtx *, rtx, rtx, rtx, rtx, rtx,
  bool, rtx_code_label *);
 extern rtx ix86_expand_fast_convert_bf_to_sf (rtx);
+extern rtx ix86_memtag_untagged_pointer (rtx, rtx);
+extern bool ix86_memtag_can_tag_addresses (void);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 95babd93c9d..518cc9ffd1f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24274,6 +24274,111 @@ ix86_push_rounding (poly_int64 bytes)
   return ROUND_UP (bytes, UNITS_PER_WORD);
 }
 
+/* Use 8 bits metadata start from bit48 for LAM_U48,
+   6 bits metadat start from bit57 for LAM_U57.  */
+#define IX86_HWASAN_SHIFT (ix86_lam_type == lam_u48\
+  ? 48 \
+
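
The quoted patch is cut off above.  As a rough illustration of the tag layout
its comment describes (8 metadata bits starting at bit 48 for LAM_U48, 6 bits
starting at bit 57 for LAM_U57), the following sketch packs and unpacks a tag
on a plain 64-bit integer.  All names are hypothetical and the untagging rule
(simply clearing the tag field) is an assumption; this is not the code of the
ix86_memtag_* hooks.

```c
#include <assert.h>
#include <stdint.h>

enum toy_lam { TOY_LAM_U48, TOY_LAM_U57 };

/* Position and width of the tag field, taken from the patch comment.  */
static unsigned
toy_shift (enum toy_lam lam)
{
  return lam == TOY_LAM_U48 ? 48 : 57;
}

static unsigned
toy_tag_bits (enum toy_lam lam)
{
  return lam == TOY_LAM_U48 ? 8 : 6;
}

static uint64_t
toy_tag_mask (enum toy_lam lam)
{
  return ((UINT64_C (1) << toy_tag_bits (lam)) - 1) << toy_shift (lam);
}

/* Replace the tag field of PTR with TAG (excess tag bits are dropped).  */
static uint64_t
toy_set_tag (enum toy_lam lam, uint64_t ptr, uint8_t tag)
{
  uint64_t field = ((uint64_t) tag << toy_shift (lam)) & toy_tag_mask (lam);
  return (ptr & ~toy_tag_mask (lam)) | field;
}

static uint8_t
toy_extract_tag (enum toy_lam lam, uint64_t ptr)
{
  return (uint8_t) ((ptr & toy_tag_mask (lam)) >> toy_shift (lam));
}

/* Assumed untagging rule: clear the tag field.  */
static uint64_t
toy_untag (enum toy_lam lam, uint64_t ptr)
{
  return ptr & ~toy_tag_mask (lam);
}

static void
demo_lam_tags (void)
{
  const uint64_t ptr = UINT64_C (0x00007f1234567890);

  uint64_t t48 = toy_set_tag (TOY_LAM_U48, ptr, 0xab);
  assert (t48 == UINT64_C (0x00ab7f1234567890));
  assert (toy_extract_tag (TOY_LAM_U48, t48) == 0xab);
  assert (toy_untag (TOY_LAM_U48, t48) == ptr);

  uint64_t t57 = toy_set_tag (TOY_LAM_U57, ptr, 0x2a);
  assert (toy_extract_tag (TOY_LAM_U57, t57) == 0x2a);
  assert (toy_untag (TOY_LAM_U57, t57) == ptr);
}
```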

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-29 Thread Jiufu Guo via Gcc-patches

Thanks for your comment!
Date: Wed, 30 Nov 2022 12:30:02 +0800
Message-ID: <7ebkopxdx1@pike.rch.stglabs.ibm.com>

Segher Boessenkool  writes:

>> > +  else if ((ud4 == 0x && ud3 == 0x)
>> > + && ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000
>> > +{
>> > +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> > +
>> > +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
>> > +   : ((ud2 << 16) - 0x8000);
>
> We really should have some "hwi::sign_extend (ud1, 16)" helper function,
> heh.  Maybe there already is?  Ah, "sext_hwi".  Fixing that up
> everywhere in this function is preapproved.

I drafted a separate patch for this like below.  Maybe I could update other
code like "((v & 0xf..f) ^ 0x80..0) - 0x80..0" in rs6000.cc and rs6000.md
with sext_hwi too. 

BR,
Jeff (Jiufu)

The NFC patch below just uses sext_hwi to handle expressions like:
(xx ^ 0x80..0) - 0x80..0 in rs6000_emit_set_long_const.
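
As a sanity check of the equivalence the patch relies on, here is a small
standalone sketch.  toy_sext is a made-up stand-in for the idea behind
sext_hwi (sign-extend the low PREC bits), not GCC's implementation, and
xor_sub_16 is the open-coded idiom for the 16-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of SRC.  Assumes 0 < prec <= 64.  */
static int64_t
toy_sext (int64_t src, unsigned prec)
{
  if (prec == 64)
    return src;
  uint64_t sign = UINT64_C (1) << (prec - 1);
  uint64_t low = (uint64_t) src & ((sign << 1) - 1);
  /* Unsigned arithmetic wraps, so the subtraction is well defined; the
     final conversion assumes the usual two's complement behaviour.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* The open-coded idiom; only correct when UD has no bits above bit 15.  */
static int64_t
xor_sub_16 (uint64_t ud)
{
  return (int64_t) ((ud ^ 0x8000) - 0x8000);
}

static void
demo_sext_equivalence (void)
{
  const uint64_t chunks[] = { 0, 0x1234, 0x7fff, 0x8000, 0xffff };
  for (unsigned i = 0; i < sizeof chunks / sizeof chunks[0]; i++)
    assert (toy_sext ((int64_t) chunks[i], 16) == xor_sub_16 (chunks[i]));

  /* Unlike the idiom, the helper ignores bits above the precision.  */
  assert (toy_sext (0x12348000, 16) == -32768);
}
```

The two forms agree for every 16-bit chunk, which is the situation in
rs6000_emit_set_long_const where ud1 to ud4 are 16-bit pieces of the constant.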

---
 gcc/config/rs6000/rs6000.cc | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5efe9b22d8b..b03e059222b 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10242,7 +10242,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
-emit_move_insn (dest, GEN_INT ((ud1 ^ 0x8000) - 0x8000));
+emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
 
   else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
@@ -10250,7 +10250,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
 
   emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
- GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
+ GEN_INT (sext_hwi (ud2 << 16, 32)));
   if (ud1 != 0)
emit_move_insn (dest,
gen_rtx_IOR (DImode, copy_rtx (temp),
@@ -10261,8 +10261,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
 
   gcc_assert (ud2 & 0x8000);
-  emit_move_insn (copy_rtx (temp),
- GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
+  emit_move_insn (copy_rtx (temp), GEN_INT (sext_hwi (ud2 << 16, 32)));
   if (ud1 != 0)
emit_move_insn (copy_rtx (temp),
gen_rtx_IOR (DImode, copy_rtx (temp),
@@ -10273,7 +10272,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   HOST_WIDE_INT num = (ud2 << 16) | ud1;
-  rs6000_emit_set_long_const (temp, (num ^ 0x80000000) - 0x80000000);
+  rs6000_emit_set_long_const (temp, sext_hwi (num, 32));
   rtx one = gen_rtx_AND (DImode, temp, GEN_INT (0xffffffff));
   rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
   emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));
@@ -10283,8 +10282,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
 
-  emit_move_insn (copy_rtx (temp),
- GEN_INT (((ud3 << 16) ^ 0x80000000) - 0x80000000));
+  emit_move_insn (copy_rtx (temp), GEN_INT (sext_hwi (ud3 << 16, 32)));
   if (ud2 != 0)
emit_move_insn (copy_rtx (temp),
gen_rtx_IOR (DImode, copy_rtx (temp),
@@ -10336,8 +10334,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
 
-  emit_move_insn (copy_rtx (temp),
- GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000));
+  emit_move_insn (copy_rtx (temp), GEN_INT (sext_hwi (ud4 << 16, 32)));
   if (ud3 != 0)
emit_move_insn (copy_rtx (temp),
gen_rtx_IOR (DImode, copy_rtx (temp),
-- 
2.17.1

>
>> > +  else
>> > +  {
>> > +emit_move_insn (temp,
>> > +GEN_INT (((ud2 << 16) ^ 0x8000) - 0x8000));
>> > +if (ud1 != 0)
>> > +  emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT (ud1)));
>> > +emit_move_insn (dest,
>> > +gen_rtx_ZERO_EXTEND (DImode,
>> > + gen_lowpart (SImode, temp)));
>> > +  }
>
> Why this?  Please just write it in DImode, do not go via SImode?
>
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/powerpc/pr106708.h
>> > @@ -0,0 +1,9 @@
>> > +/* Test constants which can be built by li/lis + oris/xoris */
>> > +void  __attribute__ ((__noinline__, __noclone__)) foo (long long *arg)
>> > +{
>> > +  *arg++ = 0x98765432ULL;
>> > +  *arg++ = 0x7cdeab55ULL;
>> > +

RE: [PATCH 7/8]AArch64: Consolidate zero and sign extension patterns and add missing ones.

2022-11-29 Thread Tamar Christina via Gcc-patches

Ping.

> -Original Message-
> From: Tamar Christina 
> Sent: Monday, October 31, 2022 12:00 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 7/8]AArch64: Consolidate zero and sign extension patterns
> and add missing ones.
> 
> Hi All,
> 
> The target has various zero and sign extension patterns.  These however live
> in various locations around the MD file and almost all of them are split
> differently.  Due to the various patterns we also ended up missing valid
> extensions.  For instance smov is almost never generated.
> 
> This change tries to make this more manageable by consolidating the
> patterns as much as possible and in doing so fix the missing alternatives.
> 
> There were also some duplicate patterns.  Note that the
> zero_extend<*_ONLY:mode>2  patterns are nearly
> identical however QImode lacks an alternative that the others don't have, so
> I have left them as
> 3 different patterns next to each other.
> 
> In a lot of cases the wrong iterator was used leaving out cases that should
> exist.
> 
> I've also changed the masks used for zero extensions to hex instead of
> decimal as it's more clear what they do that way, and aligns better with
> output of other compilers.
> 
> This leaves the bulk of the extensions in just 3 patterns.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md
>   (*aarch64_get_lane_zero_extend):
> Changed to ...
>   (*aarch64_get_lane_zero_extend): ...
> This.
>   (*aarch64_get_lane_extenddi): New.
>   * config/aarch64/aarch64.md (sidi2, *extendsidi2_aarch64,
>   qihi2, *extendqihi2_aarch64, *zero_extendsidi2_aarch64):
> Remove
>   duplicate patterns.
>   (2,
>   *extend2_aarch64): Remove,
> consolidate
>   into ...
>   (extend2): ... This.
>   (*zero_extendqihi2_aarch64,
>   *zero_extend2_aarch64): Remove,
> consolidate into
>   ...
>   (zero_extend2,
>   zero_extend2,
>   zero_extend2):
>   (*ands_compare0): Renamed to ...
>   (*ands_compare0): ... This.
>   * config/aarch64/iterators.md (HI_ONLY, QI_ONLY): New.
>   (short_mask): Use hex rather than dec and add SI.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/ands_3.c: Update codegen.
>   * gcc.target/aarch64/sve/slp_1.c: Likewise.
>   * gcc.target/aarch64/tst_5.c: Likewise.
>   * gcc.target/aarch64/tst_6.c: Likewise.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 8a84a8560e982b8155b18541f5504801b3330124..d0b37c4dd48aeafd3d87c90dc3270e71af5a72b9 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4237,19 +4237,34 @@ (define_insn
> "*aarch64_get_lane_extend"
>[(set_attr "type" "neon_to_gp")]
>  )
> 
> -(define_insn
> "*aarch64_get_lane_zero_extend"
> +(define_insn "*aarch64_get_lane_extenddi"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (sign_extend:DI
> +   (vec_select:
> + (match_operand:VS 1 "register_operand" "w")
> + (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
> +  "TARGET_SIMD"
> +  {
> +operands[2] = aarch64_endian_lane_rtx (mode,
> +INTVAL (operands[2]));
> +return "smov\\t%x0, %1.[%2]";
> +  }
> +  [(set_attr "type" "neon_to_gp")]
> +)
> +
> +(define_insn
> "*aarch64_get_lane_zero_extend"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>   (zero_extend:GPI
> -   (vec_select:
> - (match_operand:VDQQH 1 "register_operand" "w")
> +   (vec_select:
> + (match_operand:VDQV_L 1 "register_operand" "w")
>   (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
>"TARGET_SIMD"
>{
> -operands[2] = aarch64_endian_lane_rtx (mode,
> +operands[2] = aarch64_endian_lane_rtx (mode,
>  INTVAL (operands[2]));
> -return "umov\\t%w0, %1.[%2]";
> +return "umov\\t%w0, %1.[%2]";
>}
> -  [(set_attr "type" "neon_to_gp")]
> +  [(set_attr "type" "neon_to_gp")]
>  )
> 
>  ;; Lane extraction of a value, neither sign nor zero extension
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 3ea16dbc2557c6a4f37104d44a49f77f768eb53d..09ae1118371f82ca63146fceb953eb9e820d05a4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1911,22 +1911,6 @@ (define_insn
> "storewb_pair_"
>  ;; Sign/Zero extension
>  ;; ---
> 
> -(define_expand "sidi2"
> -  [(set (match_operand:DI 0 "register_operand")
> - (ANY_EXTEND:DI (match_operand:SI 1 "nonimmediate_operand")))]
> -  ""
> -)
> -
> -(define_insn "*extendsidi2_aarch64"
> -  [(set

RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-29 Thread Tamar Christina via Gcc-patches

Ping x3

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, November 22, 2022 4:01 PM
> To: Tamar Christina ; Richard Sandiford
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> 
> Ping
> 
> > -Original Message-
> > From: Gcc-patches On Behalf Of Tamar Christina via Gcc-patches
> > Sent: Friday, November 11, 2022 2:40 PM
> > To: Richard Sandiford 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > ; Marcus Shawcroft
> > ; Kyrylo Tkachov
> 
> > Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> >
> > Hi,
> >
> >
> > > This name might cause confusion with the SVE iterators, where FULL
> > > means "every bit of the register is used".  How about something like
> > > VMOVE instead?
> > >
> > > With this change, I guess VALL_F16 represents "The set of all modes
> > > for which the vld1 intrinsics are provided" and VMOVE or whatever is
> > > "All Advanced SIMD modes suitable for moving, loading, and storing".
> > > That is, VMOVE extends VALL_F16 with modes that are not manifested
> > > via intrinsics.
> > >
> >
> > Done.
> >
> > > Where is the 2h used, and is it valid syntax in that context?
> > >
> > > Same for later instances of 2h.
> >
> > They are, but they weren't meant to be in this patch.  They belong in
> > a separate FP16 series that I won't get to finish for GCC 13 due to not
> > being able to finish writing all the tests.  I have moved them to that
> > patch series though.
> >
> > While the addp patch series has been killed, this patch is still good
> > standalone and improves codegen as shown in the updated testcase.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
> > (mov, movmisalign, aarch64_dup_lane,
> > aarch64_store_lane0, aarch64_simd_vec_set,
> > @aarch64_simd_vec_copy_lane, vec_set,
> > reduc__scal_, reduc__scal_,
> > aarch64_reduc__internal,
> > aarch64_get_lane,
> > vec_init, vec_extract): Support V2HF.
> > (aarch64_simd_dupv2hf): New.
> > * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
> > Add E_V2HFmode.
> > * config/aarch64/iterators.md (VHSDF_P): New.
> > (V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
> > Vel, q, vp): Add V2HF.
> > * config/arm/types.md (neon_fp_reduc_add_h): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/slp_1.c: Update testcase.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -19,10 +19,10 @@
> >  ;; .
> >
> >  (define_expand "mov"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -   (match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> > +   (match_operand:VMOVE 1 "general_operand"))]
> >"TARGET_SIMD"
> > -  "
> > +{
> >/* Force the operand into a register if it is not an
> >   immediate whose use can be replaced with xzr.
> >   If the mode is 16 bytes wide, then we will be doing
> > @@ -46,12 +46,11 @@ (define_expand "mov"
> >aarch64_expand_vector_init (operands[0], operands[1]);
> >DONE;
> >  }
> > -  "
> > -)
> > +})
> >
> >  (define_expand "movmisalign"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -(match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> > +(match_operand:VMOVE 1 "general_operand"))]
> >"TARGET_SIMD && !STRICT_ALIGNMENT"
> >  {
> >/* This pattern is not permitted to fail during expansion: if both arguments
> > @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
> >[(set_attr "type" "neon_dup, neon_from_gp")]
> >  )
> >
> > +(define_insn "aarch64_simd_dupv2hf"
> > +  [(set (match_operand:V2HF 0 "register_operand" "=w")
> > +   (vec_duplicate:V2HF
> > + (match_operand:HF 1 "register_operand" "0")))]
> > +  "TARGET_SIMD"
> > +  "@
> > +   sli\\t%d0, %d1, 16"
> > +  [(set_attr "type" "neon_shift_imm")]
> > +)
> > +
> >  (define_insn "aarch64_simd_dup"
> >[(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
> > (vec_duplicate:VDQF_F16
> > @@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
> >  )
> >
> >  (define_insn "aarch64_dup_lane"
> > -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> > -   (vec_duplicate:VALL_F16
> > +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> > +

Re: [RFA] src-release.sh: Fix gdb source tarball build failure due to libsframe

2022-11-29 Thread Joel Brobecker via Gcc-patches

> > "Joel" == Joel Brobecker via Gdb-patches writes:
> 
> Joel> ChangeLog:
> 
> Joel> * src-release.sh (GDB_SUPPORT_DIRS): Add libsframe.
> 
> Joel> Ok to apply to master?
> 
> Looks good to me.
> I think we recently agreed that gdb and binutils maintainers can approve
> patches like this... ?

Thanks Tom. Pushed to master.

FTR, I thought this script was also part of the GCC repository,
but discovered that this is not the case when I tried to apply
the same patch there.

-- 
Joel

[PATCH] NFC: use more readable pattern to clean high 32 bits

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi,

This patch just uses a more readable pattern for "rldicl x,x,0,32"
to clear the high 32 bits.
The old pattern looks like: r118:DI=zero_extend(r120:DI#0)
The new pattern looks like: r118:DI=r120:DI&0xffffffff
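
A quick scalar check of why the two RTL forms describe the same value.  This
is a sketch with made-up function names, assuming the AND mask is the low 32
bits (which is what clearing the high 32 bits of a DImode value amounts to):

```c
#include <assert.h>
#include <stdint.h>

/* Old form: take the low SImode part and zero-extend it to DImode.  */
static uint64_t
clear_high32_via_zero_extend (uint64_t x)
{
  return (uint64_t) (uint32_t) x;
}

/* New form: AND the DImode value with a low-32-bit mask.  */
static uint64_t
clear_high32_via_and (uint64_t x)
{
  return x & UINT64_C (0xffffffff);
}

static void
demo_clear_high32 (void)
{
  const uint64_t inputs[] = { 0, UINT64_C (0x98765432),
                              UINT64_C (0xffffffff00000001),
                              UINT64_MAX };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (clear_high32_via_zero_extend (inputs[i])
            == clear_high32_via_and (inputs[i]));
}
```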

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update
zero_extend(reg:DI#0) to reg:DI&0x

---
 gcc/config/rs6000/rs6000.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eb7ad5e954f..5efe9b22d8b 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10267,10 +10267,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
emit_move_insn (copy_rtx (temp),
gen_rtx_IOR (DImode, copy_rtx (temp),
 GEN_INT (ud1)));
-  emit_move_insn (dest,
- gen_rtx_ZERO_EXTEND (DImode,
-  gen_lowpart (SImode,
-   copy_rtx (temp;
+  emit_move_insn (dest, gen_rtx_AND (DImode, temp, GEN_INT (0x)));
 }
   else if (ud1 == ud3 && ud2 == ud4)
 {
-- 
2.17.1
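
The equivalence this NFC patch relies on can be checked on plain 64-bit integers. The following is only a sketch, not GCC code: it assumes the constant that is cut off in the archived message is the low-32-bit mask 0xffffffff (which is what "clean high 32 bits" suggests), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models zero_extend:DI of the SImode low part:
// truncate to 32 bits, then widen back to 64 bits.
constexpr std::uint64_t zero_extend_low32(std::uint64_t x) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(x));
}

// Models the AND form. The mask value is an assumption; the archived
// message truncates the constant after "0x".
constexpr std::uint64_t and_low32_mask(std::uint64_t x) {
  return x & UINT64_C(0xffffffff);
}

static_assert(zero_extend_low32(0x123456789abcdef0ULL) == 0x9abcdef0ULL,
              "high 32 bits are cleared by the zero-extend form");
static_assert(and_low32_mask(0x123456789abcdef0ULL) == 0x9abcdef0ULL,
              "high 32 bits are cleared by the AND form");

// Compares both forms on a few boundary values.
inline void check_low32_equivalence() {
  const std::uint64_t samples[] = {0ULL, 1ULL, 0xffffffffULL, 0x100000000ULL,
                                   0x123456789abcdef0ULL, ~0ULL};
  for (std::uint64_t v : samples)
    assert(zero_extend_low32(v) == and_low32_mask(v));
}
```

Both forms describe the same value, so the choice between them only affects how the RTL reads in dumps.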

Re: [PATCH] [x86] Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.

2022-11-29 Thread Hongtao Liu via Gcc-patches

On Wed, Nov 30, 2022 at 3:12 AM H.J. Lu  wrote:
>
> On Mon, Nov 28, 2022 at 11:04 PM Hongtao Liu  wrote:
> >
> > On Mon, Nov 28, 2022 at 9:06 PM liuhongt  wrote:
> > >
> > > > For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
> > > > !flag_signed_char, it's transformed to
> > > > __builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
> > > > and expanded to (const_int 255) in the rtl. But for immediate_operand,
> > > > it expects (const_int 255) to be sign-extended to
> > > > (const_int -1). The mismatch caused an unrecognizable insn error.
> > > >
> > > > expand_expr_real_1 generates (const_int 255) without considering the
> > > > target mode. I guess it's on purpose, so I'll leave that alone and only
> > > > change the expander in the backend. After applying convert_modes to
> > > > (const_int 255), it's transformed to (const_int -1), which fixes the issue.
> > >
> > > Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk(and backport to GCC-10/11/12 release branches)?
> > > Drop this patch since it's not a complete solution; there are also
> > > other QI builtins which are not handled.
>
> I checked the x86 backend.  __builtin_ia32_vec_set_v16qi is the
> only intrinsic with this issue.
Ok, I'll commit the patch.
>
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/107863
> > > * config/i386/i386-expand.cc (ix86_expand_vec_set_builtin):
> > > Convert op1 to target mode whenever mode mismatch.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr107863.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc   | 2 +-
> > >  gcc/testsuite/gcc.target/i386/pr107863.c | 8 
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr107863.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index 0373c3614a4..c639ee3a9f7 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -12475,7 +12475,7 @@ ix86_expand_vec_set_builtin (tree exp)
> > >op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
> > >elt = get_element_number (TREE_TYPE (arg0), arg2);
> > >
> > > -  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
> > > +  if (GET_MODE (op1) != mode1)
> > >  op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
> > >
> > >op0 = force_reg (tmode, op0);
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr107863.c 
> > > b/gcc/testsuite/gcc.target/i386/pr107863.c
> > > new file mode 100644
> > > index 000..99fd85d9765
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr107863.c
> > > @@ -0,0 +1,8 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-mavx2 -O" } */
> > > +
> > > +typedef char v16qi __attribute__((vector_size(16)));
> > > +
> > > +v16qi foo(v16qi a){
> > > +  return __builtin_ia32_vec_set_v16qi (a, -1, 2);
> > > +}
> > > --
> > > 2.27.0
> > >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> H.J.



-- 
BR,
Hongtao
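
The 255 versus -1 mismatch discussed in this thread comes down to sign-extending a constant from the width of its mode. The sketch below models that arithmetic on ordinary integers; it is not GCC's convert_modes, and canonicalize_for_width is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `bits` bits of `value` (1 <= bits <= 64).
// This mimics how an 8-bit (QImode) constant with all bits set is
// written as -1 rather than 255 once it is canonical for its mode.
inline std::int64_t canonicalize_for_width(std::int64_t value, unsigned bits) {
  const std::uint64_t mask =
      bits >= 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << bits) - 1;
  const std::uint64_t sign = std::uint64_t(1) << (bits - 1);
  const std::uint64_t low = static_cast<std::uint64_t>(value) & mask;
  // XOR then subtract the sign bit: the usual branch-free sign extension.
  return static_cast<std::int64_t>((low ^ sign) - sign);
}

inline void check_qimode_examples() {
  assert(canonicalize_for_width(255, 8) == -1);    // the case from the PR
  assert(canonicalize_for_width(128, 8) == -128);  // smallest 8-bit value
  assert(canonicalize_for_width(127, 8) == 127);   // positive values unchanged
  assert(canonicalize_for_width(-1, 8) == -1);     // already canonical
}
```

With an unsigned char type, the front end hands over 255; a pattern that expects the sign-extended form of the same bit pattern then fails to match unless a conversion step like this is applied.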

[committed] analyzer work on issues with flex-generated lexers [PR103546]

2022-11-29 Thread David Malcolm via Gcc-patches

PR analyzer/103546 tracks various false positives seen on
flex-generated lexers.

Whilst investigating them, I noticed an ICE with
-fanalyzer-call-summaries due to attempting to store sm-state
for an UNKNOWN svalue, which this patch fixes.

This patch also provides known_function implementations of all of the
external functions called by the lexer, reducing the number of false
positives.

The patch doesn't eliminate all false positives, but adds integration
tests to try to establish a baseline from which the remaining false
positives can be fixed.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4399-g78a17f4452db95.

gcc/analyzer/ChangeLog:
PR analyzer/103546
* analyzer.h (register_known_file_functions): New decl.
* program-state.cc (sm_state_map::replay_call_summary): Reject
attempts to store sm-state for caller_sval that can't have
associated state.
* region-model-impl-calls.cc (register_known_functions): Call
register_known_file_functions.
* sm-fd.cc (class kf_isatty): New.
(register_known_fd_functions): Register it.
* sm-file.cc (class kf_ferror): New.
(class kf_fileno): New.
(class kf_getc): New.
(register_known_file_functions): New.

gcc/ChangeLog:
PR analyzer/103546
* doc/invoke.texi (Static Analyzer Options): Add isatty, ferror,
fileno, and getc to the list of functions known to the analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/103546
* gcc.dg/analyzer/ferror-1.c: New test.
* gcc.dg/analyzer/fileno-1.c: New test.
* gcc.dg/analyzer/flex-with-call-summaries.c: New test.
* gcc.dg/analyzer/flex-without-call-summaries.c: New test.
* gcc.dg/analyzer/getc-1.c: New test.
* gcc.dg/analyzer/isatty-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |1 +
 gcc/analyzer/program-state.cc |2 +
 gcc/analyzer/region-model-impl-calls.cc   |1 +
 gcc/analyzer/sm-fd.cc |   79 +
 gcc/analyzer/sm-file.cc   |   53 +
 gcc/doc/invoke.texi   |4 +
 gcc/testsuite/gcc.dg/analyzer/ferror-1.c  |6 +
 gcc/testsuite/gcc.dg/analyzer/fileno-1.c  |6 +
 .../analyzer/flex-with-call-summaries.c   | 1683 +
 .../analyzer/flex-without-call-summaries.c| 1680 
 gcc/testsuite/gcc.dg/analyzer/getc-1.c|6 +
 gcc/testsuite/gcc.dg/analyzer/isatty-1.c  |   56 +
 12 files changed, 3577 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/ferror-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fileno-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/flex-with-call-summaries.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/flex-without-call-summaries.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/getc-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/isatty-1.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 4fbe092199f..35c71f3d69c 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -259,6 +259,7 @@ public:
 
 extern void register_known_functions (known_function_manager );
 extern void register_known_fd_functions (known_function_manager );
+extern void register_known_file_functions (known_function_manager );
 extern void register_varargs_builtins (known_function_manager );
 
 /* Passed by pointer to PLUGIN_ANALYZER_INIT callbacks.  */
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 037dbecb6f1..3942b5fdc18 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -821,6 +821,8 @@ sm_state_map::replay_call_summary (call_summary_replay ,
   const svalue *caller_sval = r.convert_svalue_from_summary (summary_sval);
   if (!caller_sval)
continue;
+  if (!caller_sval->can_have_associated_state_p ())
+   continue;
   const svalue *summary_origin = kv.second.m_origin;
   const svalue *caller_origin
= (summary_origin
diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 37cb09f9195..6d8c9f94138 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1662,6 +1662,7 @@ register_known_functions (known_function_manager )
 kfm.add ("putenv", make_unique ());
 
 register_known_fd_functions (kfm);
+register_known_file_functions (kfm);
   }
 
   /* glibc functions.  */
diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 8f8ec851bab..794733e55ca 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -2487,6 +2487,84 @@ public:
   }
 };
 
+/* Handler for "isatty"".
+   See e.g. https://man7.org/linux/man-pages/man3/isatty.3.html  */
+
+class kf_isatty : public known_function
+{
+  class outcome_of_isatty : public succeed_or_fail_call_info

[committed] analyzer: move stdio known fns to sm-file.cc

2022-11-29 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4400-g84046b192e568e.

gcc/analyzer/ChangeLog:
* region-model-impl-calls.cc (class kf_fgets): Move to sm-file.cc.
(kf_fgets::impl_call_pre): Likewise.
(class kf_fread): Likewise.
(kf_fread::impl_call_pre): Likewise.
(class kf_getchar): Likewise.
(class kf_stdio_output_fn): Likewise.
(register_known_functions): Move registration of
BUILT_IN_FPRINTF, BUILT_IN_FPRINTF_UNLOCKED, BUILT_IN_FPUTC,
BUILT_IN_FPUTC_UNLOCKED, BUILT_IN_FPUTS, BUILT_IN_FPUTS_UNLOCKED,
BUILT_IN_FWRITE, BUILT_IN_FWRITE_UNLOCKED, BUILT_IN_PRINTF,
BUILT_IN_PRINTF_UNLOCKED, BUILT_IN_PUTC, BUILT_IN_PUTCHAR,
BUILT_IN_PUTCHAR_UNLOCKED, BUILT_IN_PUTC_UNLOCKED, BUILT_IN_PUTS,
BUILT_IN_PUTS_UNLOCKED, BUILT_IN_VFPRINTF, BUILT_IN_VPRINTF,
"getchar", "fgets", "fgets_unlocked", and "fread" to
register_known_file_functions.
* sm-file.cc (class kf_stdio_output_fn): Move here from
region-model-impl-calls.cc.
(class kf_fgets): Likewise.
(class kf_fread): Likewise.
(class kf_getchar): Likewise.
(register_known_file_functions): Move registration of
BUILT_IN_FPRINTF, BUILT_IN_FPRINTF_UNLOCKED, BUILT_IN_FPUTC,
BUILT_IN_FPUTC_UNLOCKED, BUILT_IN_FPUTS, BUILT_IN_FPUTS_UNLOCKED,
BUILT_IN_FWRITE, BUILT_IN_FWRITE_UNLOCKED, BUILT_IN_PRINTF,
BUILT_IN_PRINTF_UNLOCKED, BUILT_IN_PUTC, BUILT_IN_PUTCHAR,
BUILT_IN_PUTCHAR_UNLOCKED, BUILT_IN_PUTC_UNLOCKED, BUILT_IN_PUTS,
BUILT_IN_PUTS_UNLOCKED, BUILT_IN_VFPRINTF, BUILT_IN_VPRINTF,
"fgets", "fgets_unlocked", "fread", and "getchar" to here from
register_known_functions.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-impl-calls.cc | 111 
 gcc/analyzer/sm-file.cc | 106 ++
 2 files changed, 106 insertions(+), 111 deletions(-)

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 6d8c9f94138..8ba644c33cd 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -704,66 +704,6 @@ kf_error::impl_call_pre (const call_details ) const
   ctxt->terminate_path ();
 }
 
-/* Handler for "fgets" and "fgets_unlocked".  */
-
-class kf_fgets : public known_function
-{
-public:
-  bool matches_call_types_p (const call_details ) const final override
-  {
-return (cd.num_args () == 3
-   && cd.arg_is_pointer_p (0)
-   && cd.arg_is_pointer_p (2));
-  }
-
-  void impl_call_pre (const call_details ) const final override;
-};
-
-void
-kf_fgets::impl_call_pre (const call_details ) const
-{
-  /* Ideally we would bifurcate state here between the
- error vs no error cases.  */
-  region_model *model = cd.get_model ();
-  const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region *reg = ptr_sval->maybe_get_region ())
-{
-  const region *base_reg = reg->get_base_region ();
-  const svalue *new_sval = cd.get_or_create_conjured_svalue (base_reg);
-  model->set_value (base_reg, new_sval, cd.get_ctxt ());
-}
-}
-
-/* Handler for "fread"".  */
-
-class kf_fread : public known_function
-{
-public:
-  bool matches_call_types_p (const call_details ) const final override
-  {
-return (cd.num_args () == 4
-   && cd.arg_is_pointer_p (0)
-   && cd.arg_is_size_p (1)
-   && cd.arg_is_size_p (2)
-   && cd.arg_is_pointer_p (3));
-  }
-
-  void impl_call_pre (const call_details ) const final override;
-};
-
-void
-kf_fread::impl_call_pre (const call_details ) const
-{
-  region_model *model = cd.get_model ();
-  const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region *reg = ptr_sval->maybe_get_region ())
-{
-  const region *base_reg = reg->get_base_region ();
-  const svalue *new_sval = cd.get_or_create_conjured_svalue (base_reg);
-  model->set_value (base_reg, new_sval, cd.get_ctxt ());
-}
-}
-
 /* Handler for "free", after sm-handling.
 
If the ptr points to an underlying heap region, delete the region,
@@ -803,20 +743,6 @@ kf_free::impl_call_post (const call_details ) const
 }
 }
 
-/* Handler for "getchar"".  */
-
-class kf_getchar : public known_function
-{
-public:
-  bool matches_call_types_p (const call_details ) const final override
-  {
-return cd.num_args () == 0;
-  }
-
-  /* Empty.  No side-effects (tracking stream state is out-of-scope
- for the analyzer).  */
-};
-
 /* Handle the on_call_pre part of "malloc".  */
 
 class kf_malloc : public known_function
@@ -1455,21 +1381,6 @@ public:
   /* Currently a no-op.  */
 };
 
-/* Handler for various stdio-related builtins that merely have external
-   effects that are out of scope for the analyzer: we only want to model
-   the effects on the return value.  */
-
-class kf_stdio_output_fn :

[committed] analyzer: fix folding of '(PTR + 0) => PTR' [PR105784]

2022-11-29 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4398-g3a32fb2eaa761a.

gcc/analyzer/ChangeLog:
PR analyzer/105784
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): For POINTER_PLUS_EXPR,
PLUS_EXPR and MINUS_EXPR, eliminate requirement that the final
type matches that of arg0 in favor of a cast.

gcc/testsuite/ChangeLog:
PR analyzer/105784
* gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model-manager.cc  |  8 ++--
 .../torture/fold-ptr-arith-pr105784.c | 43 +++
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index d9a7ae91a35..ae63c664ae5 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -613,13 +613,13 @@ region_model_manager::maybe_fold_binop (tree type, enum 
tree_code op,
 case POINTER_PLUS_EXPR:
 case PLUS_EXPR:
   /* (VAL + 0) -> VAL.  */
-  if (cst1 && zerop (cst1) && type == arg0->get_type ())
-   return arg0;
+  if (cst1 && zerop (cst1))
+   return get_or_create_cast (type, arg0);
   break;
 case MINUS_EXPR:
   /* (VAL - 0) -> VAL.  */
-  if (cst1 && zerop (cst1) && type == arg0->get_type ())
-   return arg0;
+  if (cst1 && zerop (cst1))
+   return get_or_create_cast (type, arg0);
   break;
 case MULT_EXPR:
   /* (VAL * 0).  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c
new file mode 100644
index 000..5e5a2bf79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c
@@ -0,0 +1,43 @@
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
+
+#include "../analyzer-decls.h"
+
+extern _Bool quit_flag;
+extern void char_charset (int);
+
+static void
+__analyzer_ccl_driver (int *source, int src_size)
+{
+  int *src = source, *src_end = src + src_size;
+  int i = 0;
+
+  while (!quit_flag)
+{
+  if (src < src_end)
+   {
+ __analyzer_dump_path (); /* { dg-message "path" } */
+ i = *src++; /* { dg-bogus "uninit" } */
+   }
+  char_charset (i);
+}
+}
+
+void
+Fccl_execute_on_string (char *str, long str_bytes)
+{
+  while (1)
+{
+  char *p = str;
+  char *endp = str + str_bytes;
+  int source[1024];
+  int src_size = 0;
+
+  while (src_size < 1024 && p < endp)
+   {
+ __analyzer_dump_path (); /* { dg-message "path" } */
+ source[src_size++] = *p++;
+   }
+
+  __analyzer_ccl_driver (source, src_size);
+}
+}
-- 
2.26.3
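
The change to maybe_fold_binop can be illustrated with a toy model. This is a sketch under stated assumptions, not the analyzer's data structures: Type, Value, make_cast and both fold functions are hypothetical, and a symbolic value is reduced to a type plus a printable string.

```cpp
#include <cassert>
#include <string>

enum class Type { IntPtr, CharPtr };

// A symbolic value: its static type and a printable form.
struct Value {
  Type type;
  std::string text;
};

// Returns v unchanged if it already has type `to`, else wraps it in a cast.
inline Value make_cast(Type to, const Value& v) {
  if (v.type == to)
    return v;
  return Value{to, std::string(to == Type::IntPtr ? "(int *)" : "(char *)") + v.text};
}

// Old rule: fold (VAL + 0) only when the result type equals VAL's type.
inline bool fold_add_zero_strict(Type result, const Value& arg0, long cst1, Value* out) {
  if (cst1 == 0 && result == arg0.type) {
    *out = arg0;
    return true;
  }
  return false;
}

// New rule: always fold (VAL + 0), casting VAL to the result type.
inline bool fold_add_zero_with_cast(Type result, const Value& arg0, long cst1, Value* out) {
  if (cst1 == 0) {
    *out = make_cast(result, arg0);
    return true;
  }
  return false;
}

inline void example_usage() {
  const Value p{Type::CharPtr, "p"};
  Value folded{Type::CharPtr, ""};
  assert(!fold_add_zero_strict(Type::IntPtr, p, 0, &folded));   // fold missed
  assert(fold_add_zero_with_cast(Type::IntPtr, p, 0, &folded)); // fold applied
  assert(folded.type == Type::IntPtr && folded.text == "(int *)p");
}
```

The point of the cast is that the fold no longer depends on the operand and result types being identical, while the folded value still carries the type the caller asked for.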

Re: [PATCH v2] libgo: Don't rely on GNU-specific strerror_r variant on Linux

2022-11-29 Thread Ian Lance Taylor via Gcc-patches

On Tue, Nov 29, 2022 at 9:54 AM  wrote:
>
> From: Sören Tempel 
>
> On glibc, there are two versions of strerror_r: An XSI-compliant and a
> GNU-specific version. The latter is only available on glibc. In order
> to avoid duplicating the post-processing code of error messages, this
> commit provides a separate strerror_go symbol which always refers to the
> XSI-compliant version of strerror_r (even on glibc) by selectively
> undefining the corresponding feature test macro.
>
> Previously, gofrontend assumed that the GNU-specific version of
> strerror_r was always available on Linux (which isn't the case when
> using musl as the libc, for example). This commit thereby improves
> compatibility with Linux systems that are not using glibc.
>
> Tested on x86_64 Alpine Linux Edge and Arch Linux (glibc 2.36).

Thanks.  I committed a version of this, as attached.

Ian
b6c6a3d64f2e4e9347733290aca3c75898c44b2e
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 7e531c3f90b..984d8324004 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-5e658f4659c551330ea68f5667e4f951b218f32d
+fef6aa3c1678cdbe7dca454b2cebb369d8ba81bf
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index b03e6553e90..207d5a98127 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -465,6 +465,7 @@ runtime_files = \
runtime/go-nanotime.c \
runtime/go-now.c \
runtime/go-nosys.c \
+   runtime/go-strerror.c \
runtime/go-reflect-call.c \
runtime/go-setenv.c \
runtime/go-signal.c \
diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go
index 59f7a82c6d7..9f688e2a0c7 100644
--- a/libgo/go/syscall/errstr.go
+++ b/libgo/go/syscall/errstr.go
@@ -4,23 +4,19 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build !hurd && !linux
-// +build !hurd,!linux
-
 package syscall
 
-//sysnbstrerror_r(errnum int, buf []byte) (err Errno)
-//strerror_r(errnum _C_int, buf *byte, buflen Size_t) _C_int
+import "internal/bytealg"
+
+//extern go_strerror
+func go_strerror(_C_int, *byte, Size_t) _C_int
 
 func Errstr(errnum int) string {
-   for len := 128; ; len *= 2 {
-   b := make([]byte, len)
-   errno := strerror_r(errnum, b)
+   for size := 128; ; size *= 2 {
+   b := make([]byte, size)
+   errno := go_strerror(_C_int(errnum), [0], Size_t(len(b)))
if errno == 0 {
-   i := 0
-   for b[i] != 0 {
-   i++
-   }
+   i := bytealg.IndexByte(b, 0)
// Lowercase first letter: Bad -> bad, but
// STREAM -> STREAM.
if i > 1 && 'A' <= b[0] && b[0] <= 'Z' && 'a' <= b[1] 
&& b[1] <= 'z' {
@@ -29,7 +25,7 @@ func Errstr(errnum int) string {
return string(b[:i])
}
if errno != ERANGE {
-   return "errstr failure"
+   return "strerror_r failure"
}
}
 }
diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go
deleted file mode 100644
index 03a327dbc90..000
--- a/libgo/go/syscall/errstr_glibc.go
+++ /dev/null
@@ -1,34 +0,0 @@
-// errstr_glibc.go -- GNU/Linux and GNU/Hurd specific error strings.
-
-// Copyright 2010 The Go Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-
-// We use this rather than errstr.go because on GNU/Linux sterror_r
-// returns a pointer to the error message, and may not use buf at all.
-
-//go:build hurd || linux
-// +build hurd linux
-
-package syscall
-
-import "unsafe"
-
-//sysnbstrerror_r(errnum int, b []byte) (errstr *byte)
-//strerror_r(errnum _C_int, b *byte, len Size_t) *byte
-
-func Errstr(errnum int) string {
-   a := make([]byte, 128)
-   p := strerror_r(errnum, a)
-   b := (*[1000]byte)(unsafe.Pointer(p))
-   i := 0
-   for b[i] != 0 {
-   i++
-   }
-   // Lowercase first letter: Bad -> bad, but STREAM -> STREAM.
-   if i > 1 && 'A' <= b[0] && b[0] <= 'Z' && 'a' <= b[1] && b[1] <= 'z' {
-   c := b[0] + 'a' - 'A'
-   return string(c) + string(b[1:i])
-   }
-   return string(b[:i])
-}
diff --git a/libgo/runtime/go-strerror.c b/libgo/runtime/go-strerror.c
new file mode 100644
index 000..13d1d91df84
--- /dev/null
+++ b/libgo/runtime/go-strerror.c
@@ -0,0 +1,37 @@
+/* go-strerror.c -- wrapper around XSI-compliant strerror_r.
+
+   Copyright 2022 The Go Authors. All rights reserved.
+   Use of this source code is governed by a BSD-style
+   license that can be found in the LICENSE
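
For readers unfamiliar with the two strerror_r variants: the XSI version returns an int and fills the caller's buffer, while the GNU version returns a char * that may not point into the buffer at all. The committed patch handles this with a C wrapper that selects the XSI variant through feature test macros. The sketch below uses a different technique, C++ overloading on the return type, purely to show the two contracts side by side; it is not what libgo does, and message_from, lowercase_first and error_string are illustrative names.

```cpp
#include <cassert>
#include <cerrno>
#include <string.h>
#include <string>

// XSI contract: int result, 0 on success, message written into buf.
inline const char* message_from(int rc, const char* buf) {
  return rc == 0 ? buf : nullptr;
}

// GNU contract: the returned pointer is the message; buf may be unused.
inline const char* message_from(const char* msg, const char*) {
  return msg;
}

// Same post-processing as Errstr: "Bad" -> "bad", but "STREAM" stays as is.
inline std::string lowercase_first(std::string s) {
  if (s.size() > 1 && 'A' <= s[0] && s[0] <= 'Z' && 'a' <= s[1] && s[1] <= 'z')
    s[0] = static_cast<char>(s[0] + ('a' - 'A'));
  return s;
}

// Overload resolution picks the right message_from for whichever
// strerror_r declaration the C library headers expose.
inline std::string error_string(int errnum) {
  char buf[256] = {0};
  const char* msg = message_from(strerror_r(errnum, buf, sizeof buf), buf);
  return msg ? lowercase_first(msg) : std::string("strerror_r failure");
}

inline void example_usage() {
  assert(lowercase_first("Bad") == "bad");
  assert(lowercase_first("STREAM") == "STREAM");
  assert(!error_string(ENOENT).empty());
}
```

A Go caller cannot use this trick, which is why a single C symbol with a fixed signature is the more practical choice for libgo.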

Re: [PATCH] c++: Incremental fix for g++.dg/gomp/for-21.C [PR84469]

2022-11-29 Thread Jakub Jelinek via Gcc-patches

On Tue, Nov 29, 2022 at 04:38:50PM -0500, Jason Merrill wrote:
> > --- gcc/testsuite/g++.dg/gomp/for-21.C.jj   2020-01-12 11:54:37.178401867 
> > +0100
> > +++ gcc/testsuite/g++.dg/gomp/for-21.C  2022-11-29 13:06:59.038410557 
> > +0100
> > @@ -54,9 +54,9 @@ void
> >   f6 (S ()[10])
> >   {
> > #pragma omp for collapse (2)
> > -  for (auto [i, j, k] : a) // { dg-error "use of 'i' 
> > before deduction of 'auto'" "" { target *-*-* } .-1 }
> > +  for (auto [i, j, k] : a) // { dg-error "use of 'i' 
> > before deduction of 'auto'" }
> >   for (int l = i; l < j; l += k)// { dg-error "use of 
> > 'j' before deduction of 'auto'" }
> > -  ;// { dg-error "use of 
> > 'k' before deduction of 'auto'" "" { target *-*-* } .-3 }
> > +  ;// { dg-error "use of 
> > 'k' before deduction of 'auto'" "" { target *-*-* } .-1 }
> 
> Hmm, this error is surprising: since the initializer is non-dependent, we
> should have deduced immediately.  I'd expect the same error as in the
> non-structured-binding cases, "* expression refers to iteration variable".

The reason was just to be consistent with what is (unfortunately) emitted
in the other cases (!processing_template_decl or type dependent).
I guess I could check how much work it would be to deduce it sooner, but
generally it is a pretty rare corner case; people rarely do this in OpenMP code.

Jakub

Fwd: [PATCH] libstdc++: Add error handler for

2022-11-29 Thread Björn Schäpers






 Forwarded Message 
Subject: [PATCH] libstdc++: Add error handler for 
Date: Tue, 29 Nov 2022 22:41:07 +0100
From: Björn Schäpers 
To: gcc-patc...@gc.gnu.org, libstd...@gcc.gnu.org

From: Björn Schäpers 

Not providing an error handler results in a null pointer dereference when
an error occurs.

libstdc++-v3/ChangeLog

* include/std/stacktrace: Add __backtrace_error_handler and use
it in all calls to libbacktrace.
---
 libstdc++-v3/include/std/stacktrace | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace

index e7cbbee5638..b786441cbad 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -85,6 +85,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  #define __cpp_lib_stacktrace 202011L
 +  inline void
+  __backtrace_error_handler(void*, const char*, int) {}
+
   // [stacktrace.entry], class stacktrace_entry
   class stacktrace_entry
   {
@@ -159,7 +162,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _S_init()
 {
   static __glibcxx_backtrace_state* __state
-   = __glibcxx_backtrace_create_state(nullptr, 1, nullptr, nullptr);
+   = __glibcxx_backtrace_create_state(nullptr, 1,
+  __backtrace_error_handler, nullptr);
   return __state;
 }
 @@ -192,7 +196,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return __function != nullptr;
   };
   const auto __state = _S_init();
-  if (::__glibcxx_backtrace_pcinfo(__state, _M_pc, +__cb, nullptr, 
&__data))
+  if (::__glibcxx_backtrace_pcinfo(__state, _M_pc, +__cb,
+  __backtrace_error_handler, &__data))
return true;
   if (__desc && __desc->empty())
{
@@ -201,8 +206,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (__symname)
*static_cast<_Data*>(__data)->_M_desc = _S_demangle(__symname);
  };
- if (::__glibcxx_backtrace_syminfo(__state, _M_pc, +__cb2, nullptr,
-   &__data))
+ if (::__glibcxx_backtrace_syminfo(__state, _M_pc, +__cb2,
+   __backtrace_error_handler, &__data))
return true;
}
   return false;
@@ -252,7 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (auto __cb = __ret._M_prepare()) [[likely]]
  {
auto __state = stacktrace_entry::_S_init();
-   if (__glibcxx_backtrace_simple(__state, 1, __cb, nullptr,
+   if (__glibcxx_backtrace_simple(__state, 1, __cb,
+  __backtrace_error_handler,
   std::__addressof(__ret)))
  __ret._M_clear();
  }
@@ -270,7 +276,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (auto __cb = __ret._M_prepare()) [[likely]]
  {
auto __state = stacktrace_entry::_S_init();
-   if (__glibcxx_backtrace_simple(__state, __skip + 1, __cb, nullptr,
+   if (__glibcxx_backtrace_simple(__state, __skip + 1, __cb,
+  __backtrace_error_handler,
   std::__addressof(__ret)))
  __ret._M_clear();
  }
@@ -294,7 +301,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  {
auto __state = stacktrace_entry::_S_init();
int __err = __glibcxx_backtrace_simple(__state, __skip + 1, __cb,
-  nullptr,
+  __backtrace_error_handler,
   std::__addressof(__ret));
if (__err < 0)
  __ret._M_clear();
--
2.38.1
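
The failure mode described in this patch is easy to reproduce in miniature. The sketch below is a toy stand-in, not libbacktrace: toy_backtrace_call is a hypothetical function that, like the library according to the patch description, invokes its error callback without checking it for null. Only the callback signature (data pointer, message, error number) mirrors the handler added by the patch.

```cpp
#include <cassert>

// Same shape as the handler added by the patch: (data, message, errnum).
using error_callback = void (*)(void* data, const char* msg, int errnum);

// Hypothetical library entry point. Returns nonzero on failure and reports
// the failure through on_error, which it calls unconditionally.
inline int toy_backtrace_call(bool fail, error_callback on_error, void* data) {
  if (!fail)
    return 0;
  on_error(data, "simulated failure", -1);  // crashes if on_error is null
  return 1;
}

// No-op handler, in the spirit of __backtrace_error_handler.
inline void ignore_error(void*, const char*, int) {}

// Handler that counts how many errors were reported.
inline void count_error(void* data, const char*, int) {
  ++*static_cast<int*>(data);
}

inline void example_usage() {
  // With a no-op handler the failure is reported via the return value only.
  assert(toy_backtrace_call(true, ignore_error, nullptr) == 1);
  assert(toy_backtrace_call(false, ignore_error, nullptr) == 0);

  int errors = 0;
  toy_backtrace_call(true, count_error, &errors);
  assert(errors == 1);
}
```

Passing a do-nothing function instead of nullptr keeps the error path safe while leaving the caller free to react to the return value, which is what the stacktrace code already does.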

Re: [PATCH] c++: Incremental fix for g++.dg/gomp/for-21.C [PR84469]

2022-11-29 Thread Jason Merrill via Gcc-patches


On 11/29/22 07:32, Jakub Jelinek wrote:

Hi!

The PR84469 patch I've just posted regresses the for-21.C testcase.
When an OpenMP loop has at least 2 associated loops and, in a template,
an outer structured binding with a non-type-dependent expression is used
in the expressions of some inner loop, we no longer diagnose those uses,
because the (weirdly worded) diagnostic was only issued during
finish_id_expression -> mark_used, which for the inner loop expressions
happens before the structured bindings are finalized.  In templates,
mark_used doesn't diagnose uses of non-deduced variables, and if the
range for expression is type dependent, the use is similarly diagnosed
during instantiation.  But with the PR84469 fix, if the range for
expression is not type dependent, nothing diagnoses it, because during
instantiation the structured bindings are already deduced.

The following patch diagnoses it in that case during finish_omp_for (for
consistency with the same weird message).

I'll commit this to trunk if the other patch is approved and it passes
bootstrap/regtest.

2022-11-29  Jakub Jelinek  

PR c++/84469
* semantics.cc: Define INCLUDE_MEMORY before including system.h.
(struct finish_omp_for_data): New type.
(finish_omp_for_decomps_r): New function.
(finish_omp_for): Diagnose uses of non-type-dependent range for
loop decompositions in inner OpenMP associated loops in templates.

* g++.dg/gomp/for-21.C (f6): Adjust lines of expected diagnostics.
* g++.dg/gomp/for-22.C: New test.

--- gcc/cp/semantics.cc.jj  2022-11-19 09:21:14.897436616 +0100
+++ gcc/cp/semantics.cc 2022-11-29 12:58:36.165771985 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
  .  */
  
  #include "config.h"

+#define INCLUDE_MEMORY
  #include "system.h"
  #include "coretypes.h"
  #include "target.h"
@@ -10401,6 +10402,47 @@ handle_omp_for_class_iterator (int i, lo
return false;
  }
  
+struct finish_omp_for_data {

+  std::unique_ptr> decomps;
+  bool fail;
+  location_t loc;
+};
+
+/* Helper function for finish_omp_for.  Diagnose uses of structured
+   bindings of OpenMP collapsed loop range for loops in the associated
+   loops.  If not processing_template_decl, this is diagnosed by
+   finish_id_expression -> mark_used before the range for is deduced.
+   And if processing_template_decl and the range for expression is
+   type dependent, it is similarly diagnosed during instantiation.
+   Only when processing_template_decl and range for expression is
+   not type dependent, we wouldn't diagnose it at all, so do it
+   from finish_omp_for in that case.  */
+
+static tree
+finish_omp_for_decomps_r (tree *tp, int *, void *d)
+{
+  if (VAR_P (*tp)
+  && DECL_DECOMPOSITION_P (*tp)
+  && !type_dependent_expression_p (*tp)
+  && DECL_HAS_VALUE_EXPR_P (*tp))
+{
+  tree v = DECL_VALUE_EXPR (*tp);
+  if (TREE_CODE (v) == ARRAY_REF
+ && VAR_P (TREE_OPERAND (v, 0))
+ && DECL_DECOMPOSITION_P (TREE_OPERAND (v, 0)))
+   {
+ finish_omp_for_data *data = (finish_omp_for_data *) d;
+ if (data->decomps->contains (TREE_OPERAND (v, 0)))
+   {
+ error_at (data->loc, "use of %qD before deduction of %",
+   *tp);
+ data->fail = true;
+   }
+   }
+}
+  return NULL_TREE;
+}
+
  /* Build and validate an OMP_FOR statement.  CLAUSES, BODY, COND, INCR
 are directly for their associated operands in the statement.  DECL
 and INIT are a combo; if DECL is NULL then INIT ought to be a
@@ -10419,6 +10461,7 @@ finish_omp_for (location_t locus, enum t
int i;
int collapse = 1;
int ordered = 0;
+  finish_omp_for_data data;
  
gcc_assert (TREE_VEC_LENGTH (declv) == TREE_VEC_LENGTH (initv));

gcc_assert (TREE_VEC_LENGTH (declv) == TREE_VEC_LENGTH (condv));
@@ -10479,7 +10522,25 @@ finish_omp_for (location_t locus, enum t
elocus = EXPR_LOCATION (init);
  
if (cond == global_namespace)

-   continue;
+   {
+ gcc_assert (processing_template_decl);
+ if (TREE_VEC_LENGTH (declv) > 1
+ && VAR_P (decl)
+ && DECL_DECOMPOSITION_P (decl)
+ && !type_dependent_expression_p (decl))
+   {
+ gcc_assert (DECL_HAS_VALUE_EXPR_P (decl));
+ tree v = DECL_VALUE_EXPR (decl);
+ gcc_assert (TREE_CODE (v) == ARRAY_REF
+ && VAR_P (TREE_OPERAND (v, 0))
+ && DECL_DECOMPOSITION_P (TREE_OPERAND (v, 0)));
+ if (!data.decomps)
+   data.decomps
+ = std::unique_ptr> (new hash_set);
+ data.decomps->add (TREE_OPERAND (v, 0));
+   }
+ continue;
+   }
  
if (cond == NULL)

{
@@ -10497,6 +10558,37 @@ finish_omp_for (location_t locus, enum t
TREE_VEC_ELT (initv, i) = init;

Re: [PATCH] c++: ICE with <=> of incompatible pointers [PR107542]

2022-11-29 Thread Jason Merrill via Gcc-patches


On 11/29/22 15:03, Patrick Palka wrote:

In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible, but the SPACESHIP_EXPR case of
cp_build_binary_op wasn't prepared to handle error_mark_node, which led
to an ICE (from spaceship_comp_cat) for the below testcase where we form
a <=> with incompatible pointer operands.

(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so no ICE.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?


OK.


PR c++/107542

gcc/cp/ChangeLog:

* typeck.cc (cp_build_binary_op): Handle result_type being
error_mark_node in the SPACESHIP_EXPR case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-sfinae2.C: New test.
---
  gcc/cp/typeck.cc  |  5 ++--
  .../g++.dg/cpp2a/spaceship-sfinae2.C  | 29 +++
  2 files changed, 32 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f0e7452f3a0..10b7ed020f7 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -6215,8 +6215,9 @@ cp_build_binary_op (const op_location_t ,
tree_code orig_code0 = TREE_CODE (orig_type0);
tree orig_type1 = TREE_TYPE (orig_op1);
tree_code orig_code1 = TREE_CODE (orig_type1);
-  if (!result_type)
-   /* Nope.  */;
+  if (!result_type || result_type == error_mark_node)
+   /* Nope.  */
+   result_type = NULL_TREE;
else if ((orig_code0 == BOOLEAN_TYPE) != (orig_code1 == BOOLEAN_TYPE))
/* "If one of the operands is of type bool and the other is not, the
   program is ill-formed."  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C 
b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C
new file mode 100644
index 000..52ff038b36f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C
@@ -0,0 +1,29 @@
+// PR c++/107542
+// { dg-do compile { target c++20 } }
+
+#include 
+
+template
+concept same_as = __is_same(T, U);
+
+template
+concept Ord = requires(const Lhs& lhs, const Rhs& rhs) {
+  { lhs <=> rhs } -> same_as;
+};
+
+static_assert(Ord);  // Works.
+static_assert(!Ord); // ICE.
+
+template
+struct S {
+  T* p;
+};
+
+template
+  requires(Ord)
+constexpr inline auto operator<=>(const S& l, const S& r) noexcept {
+  return l.p <=> r.p;
+}
+
+static_assert(Ord, S>);   // Works.
+static_assert(!Ord, S>); // ICE.

Re: Ping: [PATCH] maintainer-scripts: Add gdc to update_web_docs_git

2022-11-29 Thread Gerald Pfeifer

Hi Iain,

On Tue, 29 Nov 2022, Iain Buclaw via Gcc-patches wrote:
> This looks obvious, however I don't know how things are generated for
> the online documentation site in order to say this won't cause any
> problems for whatever process is building these pages.

>> maintainer-scripts/ChangeLog:
>> 
>>  * update_web_docs_git: Add gdc to MANUALS.

please go ahead and let me know when done. I'll see how I can help.

Gerald

[PATCH] c++: ICE with <=> of incompatible pointers [PR107542]

2022-11-29 Thread Patrick Palka via Gcc-patches

In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible, but the SPACESHIP_EXPR case of
cp_build_binary_op wasn't prepared to handle error_mark_node, which led
to an ICE (from spaceship_comp_cat) for the below testcase where we form
a <=> with incompatible pointer operands.

(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so no ICE.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?

PR c++/107542

gcc/cp/ChangeLog:

* typeck.cc (cp_build_binary_op): Handle result_type being
error_mark_node in the SPACESHIP_EXPR case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-sfinae2.C: New test.
---
 gcc/cp/typeck.cc  |  5 ++--
 .../g++.dg/cpp2a/spaceship-sfinae2.C  | 29 +++
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f0e7452f3a0..10b7ed020f7 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -6215,8 +6215,9 @@ cp_build_binary_op (const op_location_t ,
   tree_code orig_code0 = TREE_CODE (orig_type0);
   tree orig_type1 = TREE_TYPE (orig_op1);
   tree_code orig_code1 = TREE_CODE (orig_type1);
-  if (!result_type)
-   /* Nope.  */;
+  if (!result_type || result_type == error_mark_node)
+   /* Nope.  */
+   result_type = NULL_TREE;
   else if ((orig_code0 == BOOLEAN_TYPE) != (orig_code1 == BOOLEAN_TYPE))
/* "If one of the operands is of type bool and the other is not, the
   program is ill-formed."  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C
new file mode 100644
index 000..52ff038b36f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-sfinae2.C
@@ -0,0 +1,29 @@
+// PR c++/107542
+// { dg-do compile { target c++20 } }
+
+#include 
+
+template
+concept same_as = __is_same(T, U);
+
+template
+concept Ord = requires(const Lhs& lhs, const Rhs& rhs) {
+  { lhs <=> rhs } -> same_as;
+};
+
+static_assert(Ord);  // Works.
+static_assert(!Ord); // ICE.
+
+template
+struct S {
+  T* p;
+};
+
+template
+  requires(Ord)
+constexpr inline auto operator<=>(const S& l, const S& r) noexcept {
+  return l.p <=> r.p;
+}
+
+static_assert(Ord, S>);   // Works.
+static_assert(!Ord, S>); // ICE.
-- 
2.39.0.rc0.49.g083e01275b

Re: [PATCH] [x86] Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.

2022-11-29 Thread H.J. Lu via Gcc-patches

On Mon, Nov 28, 2022 at 11:04 PM Hongtao Liu  wrote:
>
> On Mon, Nov 28, 2022 at 9:06 PM liuhongt  wrote:
> >
> > For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
> > !flag_signed_char, it's transformed to
> > __builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
> > and expanded to (const_int 255) in the rtl. But for immediate_operand,
> > it expects (const_int 255) to be sign-extended to
> > (const_int -1). The mismatch caused an unrecognizable insn error.
> >
> > expand_expr_real_1 generates (const_int 255) without considering the
> > target mode. I guess it's on purpose, so I'll leave that alone and only
> > change the expander in the backend. After applying convert_modes to
> > (const_int 255), it's transformed to (const_int -1), which fixes the issue.
> >
> > Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk (and backport to GCC-10/11/12 release branches)?
> Drop this patch since it's not a complete solution; there are also
> other QI builtins which are not handled.

I checked the x86 backend.  __builtin_ia32_vec_set_v16qi is the
only intrinsic with this issue.
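
For reference, the (const_int 255) versus (const_int -1) mismatch described in the
quoted patch comes down to sign-extending the low 8 bits of the constant. A minimal
sketch, assuming plain host integers; sign_extend_qi is a made-up helper for
illustration and not a GCC function:

```cpp
#include <cassert>

// Hypothetical helper: the value an 8-bit (QImode) constant has once its
// low 8 bits are sign-extended to a wider host integer.
inline long long sign_extend_qi(long long value) {
  const unsigned low = static_cast<unsigned>(value & 0xff);
  // Bit 7 set means the 8-bit value is negative.
  return (low & 0x80u) ? static_cast<long long>(low) - 256
                       : static_cast<long long>(low);
}

inline void check_sign_extend_qi() {
  assert(sign_extend_qi(255) == -1);   // the case from the PR
  assert(sign_extend_qi(-1) == -1);    // already canonical
  assert(sign_extend_qi(127) == 127);  // positive values are unchanged
  assert(sign_extend_qi(128) == -128);
}
```

So 255 and -1 name the same 8-bit pattern, but only -1 is the sign-extended form.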

> >
> > gcc/ChangeLog:
> >
> > PR target/107863
> > * config/i386/i386-expand.cc (ix86_expand_vec_set_builtin):
> > Convert op1 to target mode whenever mode mismatch.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr107863.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc   | 2 +-
> >  gcc/testsuite/gcc.target/i386/pr107863.c | 8 
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr107863.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index 0373c3614a4..c639ee3a9f7 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -12475,7 +12475,7 @@ ix86_expand_vec_set_builtin (tree exp)
> >op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
> >elt = get_element_number (TREE_TYPE (arg0), arg2);
> >
> > -  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
> > +  if (GET_MODE (op1) != mode1)
> >  op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
> >
> >op0 = force_reg (tmode, op0);
> > diff --git a/gcc/testsuite/gcc.target/i386/pr107863.c b/gcc/testsuite/gcc.target/i386/pr107863.c
> > new file mode 100644
> > index 000..99fd85d9765
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr107863.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O" } */
> > +
> > +typedef char v16qi __attribute__((vector_size(16)));
> > +
> > +v16qi foo(v16qi a){
> > +  return __builtin_ia32_vec_set_v16qi (a, -1, 2);
> > +}
> > --
> > 2.27.0
> >
>
>
> --
> BR,
> Hongtao



-- 
H.J.

Ping: [PATCH] maintainer-scripts: Add gdc to update_web_docs_git

2022-11-29 Thread Iain Buclaw via Gcc-patches

Ping.

This looks obvious, however I don't know how things are generated for
the online documentation site in order to say this won't cause any
problems for whatever process is building these pages.

Excerpts from Iain Buclaw's message of November 21, 2022 11:29 am:
> Hi,
> 
> This patch adds gdc to maintainer-scripts/update_web_docs_git so that
> it's built and uploaded to gcc.gnu.org/onlinedocs.
> 
> One half of re-adding the gdc docs that were taken down after the revert
> to the Sphinx conversion.
> 
> OK?
> 
> Regards,
> Iain.
> 
> ---
>   PR web/107749
> 
> maintainer-scripts/ChangeLog:
> 
>   * update_web_docs_git: Add gdc to MANUALS.
> ---
>  maintainer-scripts/update_web_docs_git | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/maintainer-scripts/update_web_docs_git b/maintainer-scripts/update_web_docs_git
> index 6c38e213562..dee9b1d3b5e 100755
> --- a/maintainer-scripts/update_web_docs_git
> +++ b/maintainer-scripts/update_web_docs_git
> @@ -21,6 +21,7 @@ MANUALS="cpp
>gccgo
>gccint
>gcj
> +  gdc
>gfortran
>gfc-internals
>gnat_ugn
> -- 
> 2.37.2
> 
>

[Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)

2022-11-29 Thread Tobias Burnus


Hi PA, hi Andrew, hi Jakub, hi all,

On 29.11.22 16:56, Paul-Antoine Arras wrote:

> This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
> context selectors, [...]


I think this should be documented somewhere. We have
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html

For GCN's 'isa' trait, that page only refers to -march=, whereas gfx803
is accepted only as a context selector. Hence:

How about the attached patch?

Tobias
libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'gfx803' to gcn's isa.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 85cae742cd4..0066d41fdc5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4378,5 +4378,6 @@ offloading devices (it's not clear if they should be):
 @item @code{amdgcn}, @code{gcn}
   @tab @code{gpu}
-  @tab See @code{-march=} in ``AMD GCN Options''
+  @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally
+  supported is @code{gfx803} as an alias for @code{fiji}.}
 @item @code{nvptx}
   @tab @code{gpu}

Re: [PATCH Rust front-end v3 38/46] gccrs: Add HIR to GCC GENERIC lowering entry point

2022-11-29 Thread Arthur Cohen


Hi Richard,

(...)


>> +
>> +  unsigned HOST_WIDE_INT ltype_length
>> += wi::ext (wi::to_offset (TYPE_MAX_VALUE (ltype_domain))
>> +- wi::to_offset (TYPE_MIN_VALUE (ltype_domain)) + 1,
>
> TYPE_MIN_VALUE is not checked to be constant, also the correct
> check would be to use TREE_CODE (..) == INTEGER_CST, in
> the GCC middle-end an expression '1 + 2' (a PLUS_EXPR) would
> be TREE_CONSTANT but wi::to_offset would ICE.
>
>> +  TYPE_PRECISION (TREE_TYPE (ltype_domain)),
>> +  TYPE_SIGN (TREE_TYPE (ltype_domain)))
>> +   .to_uhwi ();
>
> .to_uhwi will just truncate if the value doesn't fit, the same result as
> above is achieved with
>
>    unsigned HOST_WIDE_INT ltype_length
>   = TREE_INT_CST_LOW (TYPE_MAX_VALUE (..))
> - TREE_INT_CST_LOW (TYPE_MIN_VALUE (...)) + 1;
>
> so it appears you wanted to be "more correct" here (but if I see
> correctly you fail on that attempt)?



I've made the changes you proposed and noticed failure on our 32-bit CI.

I've had a look at the values in detail, and it seems that truncating 
was the expected behavior.


On our 64 bit CI, with a testcase containing an array of zero elements, 
we get the following values:


TREE_INT_CST_LOW(TYPE_MAX_VALUE(...)) = 18446744073709551615;
TREE_INT_CST_LOW(TYPE_MIN_VALUE(...)) = 0;

Adding 1 to the result of the subtraction results in an overflow, 
wrapping back to zero.


With the -m32 flag, we get the following values:

TREE_INT_CST_LOW(TYPE_MAX_VALUE(...)) = 4294967295;
TREE_INT_CST_LOW(TYPE_MIN_VALUE(...)) = 0;

The addition of 1 does not overflow the unsigned HOST_WIDE_INT type and 
we end up with 4294967296 as the length of our array.
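
The two cases above can be reproduced outside the compiler. A minimal sketch,
assuming a 64-bit unsigned HOST_WIDE_INT and an unsigned domain type; the helper
names are made up for illustration and are not part of the tree or wide-int API:

```cpp
#include <cassert>
#include <cstdint>

// Length of the domain [min, max] computed directly in a 64-bit host
// integer, as in the TREE_INT_CST_LOW-based expression.
inline std::uint64_t length_untruncated(std::uint64_t max, std::uint64_t min) {
  return max - min + 1;  // unsigned arithmetic wraps modulo 2^64
}

// Same computation, then reduced to the low `precision` bits (1..64).
inline std::uint64_t length_truncated(std::uint64_t max, std::uint64_t min,
                                      unsigned precision) {
  const std::uint64_t len = max - min + 1;
  if (precision >= 64)
    return len;
  return len & ((std::uint64_t{1} << precision) - 1);
}

inline void check_lengths() {
  // 64-bit target: the +1 wraps, so the zero-element array gets length 0.
  assert(length_untruncated(UINT64_MAX, 0) == 0);
  // -m32 target: nothing wraps in the 64-bit host integer.
  assert(length_untruncated(UINT32_MAX, 0) == 4294967296ULL);
  // Truncating to the 32-bit precision of the type gives 0 again.
  assert(length_truncated(UINT32_MAX, 0, 32) == 0);
}
```

This only illustrates the arithmetic; whether truncating to the type's precision
is the right fix in the front-end is the open question.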


I am not sure how to fix this behavior, or whether it is the 
expected one, and I am not familiar enough with the tree API to reproduce 
the original behavior. Any input is welcome.


In the meantime, I'll revert those changes and probably keep the 
existing code in the patches if that's okay with you.



> Overall this part of the rust frontend looks OK.  Take the comments as
> suggestions (for future enhancements).


Which seems to be the case :)

The v4 of patches, which contains a lot of fixes for the issues you 
mentioned, will be sent soon.


All the best,

Arthur



[PATCH v2] libgo: Don't rely on GNU-specific strerror_r variant on Linux

2022-11-29 Thread soeren--- via Gcc-patches

From: Sören Tempel 

On glibc, there are two versions of strerror_r: An XSI-compliant and a
GNU-specific version. The latter is only available on glibc. In order
to avoid duplicating the post-processing code of error messages, this
commit provides a separate strerror_go symbol which always refers to the
XSI-compliant version of strerror_r (even on glibc) by selectively
undefining the corresponding feature test macro.

Previously, gofrontend assumed that the GNU-specific version of
strerror_r was always available on Linux (which isn't the case when
using musl as the libc, for example). This commit thereby improves
compatibility with Linux systems that are not using glibc.
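
To illustrate the difference between the two variants: the XSI version returns
an int and fills the caller's buffer, while the GNU version returns a char*
that may or may not point into that buffer. The sketch below is not what this
patch does (the patch undefines the feature test macro in a C file); it is a
different, C++-only technique that accepts either variant via overloading. The
names describe_errno and pick are made up for illustration, and the fixed-size
buffer is a simplification compared with the growing buffer in Errstr:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <string.h>

namespace errstr_detail {
// Selected when strerror_r is the XSI variant: int result, message in buf.
inline const char* pick(int rc, const char* buf) {
  return rc == 0 ? buf : "unknown error";
}
// Selected when strerror_r is the GNU variant: the result is the message.
inline const char* pick(const char* msg, const char* /*buf*/) {
  return msg != nullptr ? msg : "unknown error";
}
}  // namespace errstr_detail

inline std::string describe_errno(int errnum) {
  char buf[256] = {0};
  return errstr_detail::pick(strerror_r(errnum, buf, sizeof buf), buf);
}

inline void check_describe_errno() {
  assert(!describe_errno(ENOENT).empty());
}
```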

Tested on x86_64 Alpine Linux Edge and Arch Linux (glibc 2.36).

Alternative: Use a GNU autoconf macro to detect which version is in
use. However, this requires moving the allocations and post-processing
logic from Go to C.

Signed-off-by: Sören Tempel 
---
Changes since v1: Fixed a typo in the Makefile.

 libgo/Makefile.am|  1 +
 libgo/Makefile.in|  6 +-
 libgo/go/syscall/errstr.go   |  9 +++--
 libgo/go/syscall/errstr_glibc.go | 34 
 libgo/runtime/go-strerror.c  | 30 
 5 files changed, 39 insertions(+), 41 deletions(-)
 delete mode 100644 libgo/go/syscall/errstr_glibc.go
 create mode 100644 libgo/runtime/go-strerror.c

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index b03e6553..207d5a98 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -465,6 +465,7 @@ runtime_files = \
runtime/go-nanotime.c \
runtime/go-now.c \
runtime/go-nosys.c \
+   runtime/go-strerror.c \
runtime/go-reflect-call.c \
runtime/go-setenv.c \
runtime/go-signal.c \
diff --git a/libgo/Makefile.in b/libgo/Makefile.in
index 16ed62a8..0407e09c 100644
--- a/libgo/Makefile.in
+++ b/libgo/Makefile.in
@@ -247,7 +247,7 @@ am__objects_4 = runtime/aeshash.lo runtime/go-assert.lo \
runtime/go-fieldtrack.lo runtime/go-matherr.lo \
runtime/go-memclr.lo runtime/go-memmove.lo \
runtime/go-memequal.lo runtime/go-nanotime.lo \
-   runtime/go-now.lo runtime/go-nosys.lo \
+   runtime/go-now.lo runtime/go-nosys.lo runtime/go-strerror.lo \
runtime/go-reflect-call.lo runtime/go-setenv.lo \
runtime/go-signal.lo runtime/go-unsafe-pointer.lo \
runtime/go-unsetenv.lo runtime/go-unwind.lo \
@@ -917,6 +917,7 @@ runtime_files = \
runtime/go-nanotime.c \
runtime/go-now.c \
runtime/go-nosys.c \
+   runtime/go-strerror.c \
runtime/go-reflect-call.c \
runtime/go-setenv.c \
runtime/go-signal.c \
@@ -1390,6 +1391,8 @@ runtime/go-now.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-nosys.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
+runtime/go-strerror.lo: runtime/$(am__dirstamp) \
+   runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-reflect-call.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-setenv.lo: runtime/$(am__dirstamp) \
@@ -1453,6 +1456,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-memmove.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nanotime.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nosys.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-strerror.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-now.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-reflect-call.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-setenv.Plo@am__quote@
diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go
index 59f7a82c..6cc73853 100644
--- a/libgo/go/syscall/errstr.go
+++ b/libgo/go/syscall/errstr.go
@@ -4,18 +4,15 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build !hurd && !linux
-// +build !hurd,!linux
-
 package syscall
 
-//sysnb	strerror_r(errnum int, buf []byte) (err Errno)
-//strerror_r(errnum _C_int, buf *byte, buflen Size_t) _C_int
+//sysnb	strerror_go(errnum int, buf []byte) (err Errno)
+//strerror_go(errnum _C_int, buf *byte, buflen Size_t) _C_int
 
 func Errstr(errnum int) string {
for len := 128; ; len *= 2 {
b := make([]byte, len)
-   errno := strerror_r(errnum, b)
+   errno := strerror_go(errnum, b)
if errno == 0 {
i := 0
for b[i] != 0 {
diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go
deleted file mode 100644
index 03a327db..
--- a/libgo/go/syscall/errstr_glibc.go
+++ /dev/null
@@ -1,34 +0,0 @@
-// errstr_glibc.go -- GNU/Linux and GNU/Hurd specific error strings.
-
-// Copyright

Re: [PATCH] amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors

2022-11-29 Thread Andrew Stubbs


On 29/11/2022 15:56, Paul-Antoine Arras wrote:

> Hi all,
>
> This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
> context selectors, so as to be consistent with LLVM. It also adds test
> cases checking all supported AMD ISAs are properly recognised when used
> in a 'declare variant' construct.
>
> Is it OK for mainline?


OK

Andrew

Ping #3: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.

2022-11-29 Thread Michael Meissner via Gcc-patches

Can we get the three patches in this patch set reviewed?  Without them, GCC 13
can't be built on Fedora 37 which defaults to IEEE 128-bit long double.

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-11-29 Thread Michael Meissner via Gcc-patches

Can we get the three patches in this patch set reviewed?  Without the patches
applied, GCC 13 will not build on Fedora 37, which uses long double defaulting
to IEEE 128-bit.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH] libgo: Don't rely on GNU-specific strerror_r variant on Linux

2022-11-29 Thread soeren--- via Gcc-patches

From: Sören Tempel 

On glibc, there are two versions of strerror_r: An XSI-compliant and a
GNU-specific version. The latter is only available on glibc. In order
to avoid duplicating the post-processing code of error messages, this
commit provides a separate strerror_go symbol which always refers to the
XSI-compliant version of strerror_r (even on glibc) by selectively
undefining the corresponding feature test macro.

Previously, gofrontend assumed that the GNU-specific version of
strerror_r was always available on Linux (which isn't the case when
using musl as the libc, for example). This commit thereby improves
compatibility with Linux systems that are not using glibc.

Tested on x86_64 Alpine Linux Edge and Arch Linux (glibc 2.36).

Alternative: Use a GNU autoconf macro to detect which version is in
use. However, this requires moving the allocations and post-processing
logic from Go to C.

Signed-off-by: Sören Tempel 
---
I previously experimented a bit with the aforementioned GNU autoconf
macro, however, I believe this is the more elegant and portable solution
as it doesn't require dealing with memory allocation for XSI strerror_r
buffers etc.

 libgo/Makefile.am|  1 +
 libgo/Makefile.in|  6 +-
 libgo/go/syscall/errstr.go   |  9 +++--
 libgo/go/syscall/errstr_glibc.go | 34 
 libgo/runtime/go-strerror.c  | 30 
 5 files changed, 39 insertions(+), 41 deletions(-)
 delete mode 100644 libgo/go/syscall/errstr_glibc.go
 create mode 100644 libgo/runtime/go-strerror.c

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index b03e6553..207d5a98 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -465,6 +465,7 @@ runtime_files = \
runtime/go-nanotime.c \
runtime/go-now.c \
runtime/go-nosys.c \
+   runtime/go-strerror.c \
runtime/go-reflect-call.c \
runtime/go-setenv.c \
runtime/go-signal.c \
diff --git a/libgo/Makefile.in b/libgo/Makefile.in
index 16ed62a8..ab9fa376 100644
--- a/libgo/Makefile.in
+++ b/libgo/Makefile.in
@@ -247,7 +247,7 @@ am__objects_4 = runtime/aeshash.lo runtime/go-assert.lo \
runtime/go-fieldtrack.lo runtime/go-matherr.lo \
runtime/go-memclr.lo runtime/go-memmove.lo \
runtime/go-memequal.lo runtime/go-nanotime.lo \
-   runtime/go-now.lo runtime/go-nosys.lo \
+   runtime/go-now.lo runtime/go-nosys.lo runtime/go-strerror.lo \
runtime/go-reflect-call.lo runtime/go-setenv.lo \
runtime/go-signal.lo runtime/go-unsafe-pointer.lo \
runtime/go-unsetenv.lo runtime/go-unwind.lo \
@@ -917,6 +917,7 @@ runtime_files = \
runtime/go-nanotime.c \
runtime/go-now.c \
runtime/go-nosys.c \
+   runtime/go-strerror.c \
runtime/go-reflect-call.c \
runtime/go-setenv.c \
runtime/go-signal.c \
@@ -1390,6 +1391,8 @@ runtime/go-now.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-nosys.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
+runtime/strerror.lo: runtime/$(am__dirstamp) \
+   runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-reflect-call.lo: runtime/$(am__dirstamp) \
runtime/$(DEPDIR)/$(am__dirstamp)
 runtime/go-setenv.lo: runtime/$(am__dirstamp) \
@@ -1453,6 +1456,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-memmove.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nanotime.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-nosys.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-strerror.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-now.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-reflect-call.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@runtime/$(DEPDIR)/go-setenv.Plo@am__quote@
diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go
index 59f7a82c..6cc73853 100644
--- a/libgo/go/syscall/errstr.go
+++ b/libgo/go/syscall/errstr.go
@@ -4,18 +4,15 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build !hurd && !linux
-// +build !hurd,!linux
-
 package syscall
 
-//sysnb	strerror_r(errnum int, buf []byte) (err Errno)
-//strerror_r(errnum _C_int, buf *byte, buflen Size_t) _C_int
+//sysnb	strerror_go(errnum int, buf []byte) (err Errno)
+//strerror_go(errnum _C_int, buf *byte, buflen Size_t) _C_int
 
 func Errstr(errnum int) string {
for len := 128; ; len *= 2 {
b := make([]byte, len)
-   errno := strerror_r(errnum, b)
+   errno := strerror_go(errnum, b)
if errno == 0 {
i := 0
for b[i] != 0 {
diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go
deleted file mode 100644
index

Ping #2: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-11-29 Thread Michael Meissner via Gcc-patches

Can we please get this patch reviewed?  GCC 13 won't build on Fedora 37 (which
defaults to long double being IEEE 128-bit) without the 3 patches in this set.

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[committed] libstdc++: Avoid bogus warning in std::vector::insert [PR107852]

2022-11-29 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

GCC assumes that any global variable might be modified by operator new,
and so in the testcase for this PR all data members get reloaded after
allocating new storage. By making local copies of the _M_start and
_M_finish members we avoid that, and then the compiler has enough info
to remove the dead branches that trigger bogus -Warray-bounds warnings.
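
The pattern can be sketched on a toy type. This is a simplified illustration,
not the vector.tcc code: Buffer, allocate and grow are hypothetical names, and
the sketch only shows where the local copies are taken relative to the
allocation call:

```cpp
#include <cassert>
#include <cstddef>

struct Buffer {
  int* start;
  int* finish;
};

// Stand-in for an allocation the compiler cannot see through.
inline int* allocate(std::size_t n) { return new int[n]; }

// Reallocate with room for `extra` more elements, keeping the contents.
inline void grow(Buffer& b, std::size_t extra) {
  // Copy the members into locals *before* allocating, so later uses do
  // not need to re-read them through `b` after the opaque call.
  int* const old_start = b.start;
  int* const old_finish = b.finish;
  const std::size_t old_size = static_cast<std::size_t>(old_finish - old_start);

  int* const new_start = allocate(old_size + extra);
  for (std::size_t i = 0; i < old_size; ++i)
    new_start[i] = old_start[i];

  delete[] old_start;
  b.start = new_start;
  b.finish = new_start + old_size;
}

inline void check_grow() {
  Buffer b{allocate(2), nullptr};
  b.start[0] = 10;
  b.start[1] = 20;
  b.finish = b.start + 2;
  grow(b, 4);
  assert(b.finish - b.start == 2);
  assert(b.start[0] == 10 && b.start[1] == 20);
  delete[] b.start;
}
```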

libstdc++-v3/ChangeLog:

PR libstdc++/107852
PR libstdc++/106199
PR libstdc++/100366
* include/bits/vector.tcc (vector::_M_fill_insert): Copy
_M_start and _M_finish members before allocating.
(vector::_M_default_append): Likewise.
(vector::_M_range_insert): Likewise.
---
 libstdc++-v3/include/bits/vector.tcc | 63 
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 33faabf2eae..27ef1a4ee7f 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -539,9 +539,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  if (__elems_after > __n)
{
  _GLIBCXX_ASAN_ANNOTATE_GROW(__n);
- std::__uninitialized_move_a(this->_M_impl._M_finish - __n,
- this->_M_impl._M_finish,
- this->_M_impl._M_finish,
+ std::__uninitialized_move_a(__old_finish - __n,
+ __old_finish,
+ __old_finish,
  _M_get_Tp_allocator());
  this->_M_impl._M_finish += __n;
  _GLIBCXX_ASAN_ANNOTATE_GREW(__n);
@@ -554,7 +554,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
{
  _GLIBCXX_ASAN_ANNOTATE_GROW(__n);
  this->_M_impl._M_finish =
-   std::__uninitialized_fill_n_a(this->_M_impl._M_finish,
+   std::__uninitialized_fill_n_a(__old_finish,
  __n - __elems_after,
  __x_copy,
  _M_get_Tp_allocator());
@@ -569,9 +569,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
}
  else
{
+ // Make local copies of these members because the compiler thinks
+ // the allocator can alter them if 'this' is globally reachable.
+ pointer __old_start = this->_M_impl._M_start;
+ pointer __old_finish = this->_M_impl._M_finish;
+ const pointer __pos = __position.base();
+
  const size_type __len =
_M_check_len(__n, "vector::_M_fill_insert");
- const size_type __elems_before = __position - begin();
+ const size_type __elems_before = __pos - __old_start;
  pointer __new_start(this->_M_allocate(__len));
  pointer __new_finish(__new_start);
  __try
@@ -584,15 +590,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
  __new_finish
= std::__uninitialized_move_if_noexcept_a
-   (this->_M_impl._M_start, __position.base(),
-__new_start, _M_get_Tp_allocator());
+   (__old_start, __pos, __new_start, _M_get_Tp_allocator());
 
  __new_finish += __n;
 
  __new_finish
= std::__uninitialized_move_if_noexcept_a
-   (__position.base(), this->_M_impl._M_finish,
-__new_finish, _M_get_Tp_allocator());
+   (__pos, __old_finish, __new_finish, _M_get_Tp_allocator());
}
  __catch(...)
{
@@ -606,12 +610,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  _M_deallocate(__new_start, __len);
  __throw_exception_again;
}
- std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
-   _M_get_Tp_allocator());
+ std::_Destroy(__old_start, __old_finish, _M_get_Tp_allocator());
  _GLIBCXX_ASAN_ANNOTATE_REINIT;
- _M_deallocate(this->_M_impl._M_start,
-   this->_M_impl._M_end_of_storage
-   - this->_M_impl._M_start);
+ _M_deallocate(__old_start,
+   this->_M_impl._M_end_of_storage - __old_start);
  this->_M_impl._M_start = __new_start;
  this->_M_impl._M_finish = __new_finish;
  this->_M_impl._M_end_of_storage = __new_start + __len;
@@ -645,6 +647,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
}
  else
{
+ // Make local copies of these members because the compiler thinks
+ // the allocator can alter them if 'this'

[committed] libstdc++: Remove unnecessary tag dispatching in std::vector

2022-11-29 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

There's no need to call a _M_xxx_dispatch function with a
statically-known __false_type tag, we can just directly call the
function that should be dispatched to. This will compile a tiny bit
faster and save a function call with optimization or inlining turned
off.

Also add the always_inline attribute to the __iterator_category helper
used for dispatching on the iterator category.
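
As an illustration of the idea (not the libstdc++ code), the sketch below
dispatches on the iterator category and also has an extra layer keyed on a
statically-known false tag. All names (Pass, classify, classify_aux,
classify_dispatch) are made up for the example; calling classify directly gives
the same result as going through the extra layer:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <sstream>
#include <type_traits>
#include <vector>

enum class Pass { single, multi };

// Overloads selected by the iterator category tag.
template <typename It>
Pass classify_aux(It, It, std::input_iterator_tag) { return Pass::single; }

template <typename It>
Pass classify_aux(It, It, std::forward_iterator_tag) { return Pass::multi; }

// Direct call: compute the category and pick the overload.
template <typename It>
Pass classify(It first, It last) {
  using Category = typename std::iterator_traits<It>::iterator_category;
  return classify_aux(first, last, Category());
}

// Extra layer keyed on a tag that is already known to be false here.
template <typename It>
Pass classify_dispatch(It first, It last, std::false_type) {
  return classify(first, last);
}

inline void check_classify() {
  std::vector<int> v{1, 2, 3};
  std::list<int> l{1, 2, 3};
  std::istringstream in("1 2 3");
  std::istream_iterator<int> is_first(in), is_last;

  assert(classify(v.begin(), v.end()) == Pass::multi);
  assert(classify(l.begin(), l.end()) == Pass::multi);
  assert(classify(is_first, is_last) == Pass::single);
  // The extra layer adds nothing when the tag is statically known.
  assert(classify_dispatch(v.begin(), v.end(), std::false_type()) ==
         classify(v.begin(), v.end()));
}
```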

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator_base_types.h (__iterator_category):
Add always_inline attribute.
* include/bits/stl_vector.h (assign(Iter, Iter)): Call
_M_assign_aux directly, instead of _M_assign_dispatch.
(insert(const_iterator, Iter, Iter)): Call _M_range_insert
directly instead of _M_insert_dispatch.
---
 libstdc++-v3/include/bits/stl_iterator_base_types.h | 1 +
 libstdc++-v3/include/bits/stl_vector.h  | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator_base_types.h b/libstdc++-v3/include/bits/stl_iterator_base_types.h
index 9eecd1dd855..5d90c0d8ea7 100644
--- a/libstdc++-v3/include/bits/stl_iterator_base_types.h
+++ b/libstdc++-v3/include/bits/stl_iterator_base_types.h
@@ -233,6 +233,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  sugar for internal library use only.
   */
   template
+__attribute__((__always_inline__))
 inline _GLIBCXX_CONSTEXPR
 typename iterator_traits<_Iter>::iterator_category
 __iterator_category(const _Iter&)
diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index b4ff3989a5d..e87fab0e51c 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -821,7 +821,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX20_CONSTEXPR
void
assign(_InputIterator __first, _InputIterator __last)
-   { _M_assign_dispatch(__first, __last, __false_type()); }
+   { _M_assign_aux(__first, __last, std::__iterator_category(__first)); }
 #else
   template
void
@@ -1478,8 +1478,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _InputIterator __last)
{
  difference_type __offset = __position - cbegin();
- _M_insert_dispatch(begin() + __offset,
-__first, __last, __false_type());
+ _M_range_insert(begin() + __offset, __first, __last,
+ std::__iterator_category(__first));
  return begin() + __offset;
}
 #else
-- 
2.38.1

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-11-29 Thread Prathamesh Kulkarni via Gcc-patches

On Tue, 29 Nov 2022 at 20:43, Andrew Pinski  wrote:
>
> On Tue, Nov 29, 2022 at 6:40 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi,
> > For the following test-case:
> >
> > int16x8_t foo(int16_t x, int16_t y)
> > {
> >   return (int16x8_t) { x, y, x, y, x, y, x, y };
> > }
>
> (Not to block this patch)
> Seems like this trick can be done even with less than perfect initializer too:
> e.g.
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, 0 };
> }
>
> Which should generate something like:
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
> ins v0.h[7], wzr
Hi Andrew,
Nice catch, thanks for the suggestions!
More generally, code-gen with constants involved seems to be sub-optimal.
For example:
int16x8_t foo(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

results in:
foo:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

which I suppose could instead be the following ?
foo:
        dup     v0.8h, w0
        mov     w1, 0x1
        ins     v0.h[7], w1
        ret

I will try to address this in follow up patch.
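
For reference, the equivalence this thread relies on can be checked with a
scalar model. A minimal sketch, assuming 8 lanes of int16_t; V8, dup, zip1 and
ins are plain C++ stand-ins for the instructions, not intrinsics:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8 = std::array<std::int16_t, 8>;

// dup: broadcast one scalar to every lane.
inline V8 dup(std::int16_t x) {
  V8 v{};
  v.fill(x);
  return v;
}

// zip1: interleave the low halves of a and b.
inline V8 zip1(const V8& a, const V8& b) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// ins: overwrite a single lane.
inline V8 ins(V8 v, std::size_t lane, std::int16_t x) {
  v[lane] = x;
  return v;
}

inline void check_interleave() {
  const std::int16_t x = 3, y = 7;
  // { x, y, x, y, x, y, x, y } via dup + dup + zip1.
  assert((zip1(dup(x), dup(y)) == V8{x, y, x, y, x, y, x, y}));
  // { x, x, x, x, x, x, x, 1 } via dup + one ins.
  assert((ins(dup(x), 7, 1) == V8{x, x, x, x, x, x, x, 1}));
}
```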

Thanks,
Prathamesh

>
> Thanks,
> Andrew Pinski
>
>
> >
> > Code gen at -O3:
> > foo:
> >         dup     v0.8h, w0
> >         ins     v0.h[1], w1
> >         ins     v0.h[3], w1
> >         ins     v0.h[5], w1
> >         ins     v0.h[7], w1
> >         ret
> >
> > For 16 elements, it results in 8 ins instructions which might not be
> > optimal perhaps.
> > I guess, the above code-gen would be equivalent to the following ?
> > dup v0.8h, w0
> > dup v1.8h, w1
> > zip1 v0.8h, v0.8h, v1.8h
> >
> > I have attached patch to do the same, if number of elements >= 8,
> > which should be possibly better compared to current code-gen ?
> > Patch passes bootstrap+test on aarch64-linux-gnu.
> > Does the patch look OK ?
> >
> > Thanks,
> > Prathamesh

[PATCH] amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors

2022-11-29 Thread Paul-Antoine Arras


Hi all,

This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP 
context selectors, so as to be consistent with LLVM. It also adds test 
cases checking all supported AMD ISAs are properly recognised when used 
in a 'declare variant' construct.


Is it OK for mainline?

Thanks,
--
PA
From 2523122f7fff806aca7f7f03109668064969aa2d Mon Sep 17 00:00:00 2001
From: Paul-Antoine Arras 
Date: Tue, 29 Nov 2022 16:22:07 +0100
Subject: [PATCH] amdgcn: Support AMD-specific 'isa' traits in OpenMP context
 selectors

Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
* config/gcn/t-omp-device: Add gfx803.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.
---
 gcc/config/gcn/gcn.cc |  2 +-
 gcc/config/gcn/t-omp-device   |  2 +-
 .../libgomp.c/declare-variant-4-fiji.c|  8 +++
 .../libgomp.c/declare-variant-4-gfx803.c  |  7 +++
 .../libgomp.c/declare-variant-4-gfx900.c  |  7 +++
 .../libgomp.c/declare-variant-4-gfx906.c  |  7 +++
 .../libgomp.c/declare-variant-4-gfx908.c  |  7 +++
 .../libgomp.c/declare-variant-4-gfx90a.c  |  7 +++
 .../testsuite/libgomp.c/declare-variant-4.h   | 63 +++
 9 files changed, 108 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-gfx803.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-gfx900.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-gfx906.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-gfx908.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4-gfx90a.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-4.h

diff --git gcc/config/gcn/gcn.cc gcc/config/gcn/gcn.cc
index c74fa007a21..39e93aeaeef 100644
--- gcc/config/gcn/gcn.cc
+++ gcc/config/gcn/gcn.cc
@@ -2985,7 +2985,7 @@ gcn_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 case omp_device_arch:
   return strcmp (name, "amdgcn") == 0 || strcmp (name, "gcn") == 0;
 case omp_device_isa:
-  if (strcmp (name, "fiji") == 0)
+  if (strcmp (name, "fiji") == 0 || strcmp (name, "gfx803") == 0)
return gcn_arch == PROCESSOR_FIJI;
   if (strcmp (name, "gfx900") == 0)
return gcn_arch == PROCESSOR_VEGA10;
diff --git gcc/config/gcn/t-omp-device gcc/config/gcn/t-omp-device
index 27d36db894b..538624f7ec7 100644
--- gcc/config/gcn/t-omp-device
+++ gcc/config/gcn/t-omp-device
@@ -1,4 +1,4 @@
 omp-device-properties-gcn: $(srcdir)/config/gcn/gcn.cc
echo kind: gpu > $@
echo arch: amdgcn gcn >> $@
-   echo isa: fiji gfx900 gfx906 gfx908 gfx90a >> $@
+   echo isa: fiji gfx803 gfx900 gfx906 gfx908 gfx90a >> $@
diff --git libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c
new file mode 100644
index 000..ae2af1cc00c
--- /dev/null
+++ libgomp/testsuite/libgomp.c/declare-variant-4-fiji.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target { offload_target_amdgcn } } } */
+/* { dg-skip-if "fiji/gfx803 only" { ! amdgcn-*-* } { "*" } { "-foffload=-march=fiji" } } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#define USE_FIJI_FOR_GFX803
+#include "declare-variant-4.h"
+
+/* { dg-final { scan-offload-tree-dump "= gfx803 \\(\\);" "optimized" } } */
diff --git libgomp/testsuite/libgomp.c/declare-variant-4-gfx803.c libgomp/testsuite/libgomp.c/declare-variant-4-gfx803.c
new file mode 100644
index 000..e0437a04d65
--- /dev/null
+++ libgomp/testsuite/libgomp.c/declare-variant-4-gfx803.c
@@ -0,0 +1,7 @@
+/* { dg-do run { target { offload_target_amdgcn } } } */
+/* { dg-skip-if "fiji/gfx803 only" { ! amdgcn-*-* } { "*" } { "-foffload=-march=fiji" } } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { scan-offload-tree-dump "= gfx803 \\(\\);" "optimized" } } */
diff --git libgomp/testsuite/libgomp.c/declare-variant-4-gfx900.c libgomp/testsuite/libgomp.c/declare-variant-4-gfx900.c
new file mode 100644
index 000..8de03725dec
--- /dev/null
+++ libgomp/testsuite/libgomp.c/declare-variant-4-gfx900.c
@@ -0,0 +1,7 @@
+/* { dg-do run { target { offload_target_amdgcn } } } */
+/* { dg-skip-if "gfx900 only" { ! amdgcn-*-* } { "*" } { "-foffload=-march=gfx900" } } */
+/* {

[committed] libstdc++: Do not use used or packed as identifiers

2022-11-29 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

These names (and __unused) are defined as macros by newlib.

libstdc++-v3/ChangeLog:

* include/std/format: Rename all variables called __used or
__packed.
* testsuite/17_intro/badnames.cc: Add no_pch options.
* testsuite/17_intro/names.cc: Check __packed, __unused and
__used.
---
 libstdc++-v3/include/std/format | 30 ++---
 libstdc++-v3/testsuite/17_intro/badnames.cc |  1 +
 libstdc++-v3/testsuite/17_intro/names.cc| 10 +--
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23ffbdabed8..cb5ce40dece 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2472,21 +2472,21 @@ namespace __format
   void
   _M_overflow() override
   {
-   auto __used = this->_M_used();
+   auto __s = this->_M_used();
if (_M_max < 0) // No maximum.
- _M_out = ranges::copy(__used, std::move(_M_out)).out;
+ _M_out = ranges::copy(__s, std::move(_M_out)).out;
else if (_M_count < size_t(_M_max))
  {
auto __max = _M_max - _M_count;
span<_CharT> __first;
-   if (__max < __used.size())
- __first = __used.first(__max);
+   if (__max < __s.size())
+ __first = __s.first(__max);
else
- __first = __used;
+ __first = __s;
_M_out = ranges::copy(__first, std::move(_M_out)).out;
  }
this->_M_rewind();
-   _M_count += __used.size();
+   _M_count += __s.size();
   }
 
 public:
@@ -2529,8 +2529,8 @@ namespace __format
   void
   _M_overflow()
   {
-   auto __used = this->_M_used();
-   _M_count += __used.size();
+   auto __s = this->_M_used();
+   _M_count += __s.size();
 
if (_M_max >= 0)
  {
@@ -2544,7 +2544,7 @@ namespace __format
  {
// No maximum character count. Just extend the span to allow
// writing more characters to it.
-   this->_M_reset({__used.data(), __used.size() + 1024}, __used.end());
+   this->_M_reset({__s.data(), __s.size() + 1024}, __s.end());
  }
   }
 
@@ -2594,12 +2594,12 @@ namespace __format
   {
_Iter_sink::_M_overflow();
iter_difference_t<_OutIter> __count(_M_count);
-   auto __used = this->_M_used();
+   auto __s = this->_M_used();
auto __last = _M_first;
-   if (__used.data() == _M_buf) // Wrote at least _M_max characters.
+   if (__s.data() == _M_buf) // Wrote at least _M_max characters.
  __last += _M_max;
else
- __last += iter_difference_t<_OutIter>(__used.size());
+ __last += iter_difference_t<_OutIter>(__s.size());
return { __last, __count };
   }
 };
@@ -3119,10 +3119,10 @@ namespace __format
 constexpr auto
 __pack_arg_types(const array<_Arg_t, _Nm>& __types)
 {
-  __UINT64_TYPE__ __packed = 0;
+  __UINT64_TYPE__ __packed_types = 0;
   for (auto __i = __types.rbegin(); __i != __types.rend(); ++__i)
-   __packed = (__packed << _Bits) | *__i;
-  return __packed;
+   __packed_types = (__packed_types << _Bits) | *__i;
+  return __packed_types;
 }
 } // namespace __format
 /// @endcond
diff --git a/libstdc++-v3/testsuite/17_intro/badnames.cc b/libstdc++-v3/testsuite/17_intro/badnames.cc
index 63c955e8277..5f1be094515 100644
--- a/libstdc++-v3/testsuite/17_intro/badnames.cc
+++ b/libstdc++-v3/testsuite/17_intro/badnames.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-do compile { target x86_64-*-linux* } }
+// { dg-add-options no_pch }
 
 // Names taken from coding_style.bad_identifiers in the libstdc++ manual.
 // We can't test this on all targets, because these names are used in
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 6490cd63307..ffbb199797b 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -129,8 +129,11 @@
 #define ptr (
 #endif
 
-// This clashes with newlib so don't use it.
+// These clash with newlib so don't use them.
 # define __lockable cannot be used as an identifier
+# define __packed   cannot be used as an identifier
+# define __unused   cannot be used as an identifier
+# define __used     cannot be used as an identifier
 
 #ifndef __APPLE__
 #define __weak   predefined qualifier on darwin
@@ -239,8 +242,11 @@
 #endif
 
 #if __has_include()
-// newlib's  defines __lockable as a macro.
+// newlib's  defines these as macros.
 #undef __lockable
+#undef __packed
+#undef __unused
+#undef __used
 // newlib's  defines __tzrule_type with these members.
 #undef d
 #undef m
-- 
2.38.1

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-11-29 Thread Andrew Pinski via Gcc-patches

On Tue, Nov 29, 2022 at 6:40 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi,
> For the following test-case:
>
> int16x8_t foo(int16_t x, int16_t y)
> {
>   return (int16x8_t) { x, y, x, y, x, y, x, y };
> }

(Not to block this patch)
It seems this trick can also be applied to a less-than-perfect initializer:
e.g.
int16x8_t foo(int16_t x, int16_t y)
{
  return (int16x8_t) { x, y, x, y, x, y, x, 0 };
}

Which should generate something like:
dup v0.8h, w0
dup v1.8h, w1
zip1 v0.8h, v0.8h, v1.8h
ins v0.h[7], wzr

Thanks,
Andrew Pinski


>
> Code gen at -O3:
> foo:
> dup v0.8h, w0
> ins v0.h[1], w1
> ins v0.h[3], w1
> ins v0.h[5], w1
> ins v0.h[7], w1
> ret
>
> For 16 elements, it results in 8 ins instructions which might not be
> optimal perhaps.
> I guess, the above code-gen would be equivalent to the following ?
> dup v0.8h, w0
> dup v1.8h, w1
> zip1 v0.8h, v0.8h, v1.8h
>
> I have attached patch to do the same, if number of elements >= 8,
> which should be possibly better compared to current code-gen ?
> Patch passes bootstrap+test on aarch64-linux-gnu.
> Does the patch look OK ?
>
> Thanks,
> Prathamesh

[aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-11-29 Thread Prathamesh Kulkarni via Gcc-patches

Hi,
For the following test-case:

int16x8_t foo(int16_t x, int16_t y)
{
  return (int16x8_t) { x, y, x, y, x, y, x, y };
}

Code gen at -O3:
foo:
dup v0.8h, w0
ins v0.h[1], w1
ins v0.h[3], w1
ins v0.h[5], w1
ins v0.h[7], w1
ret

For 16 elements, this results in 8 ins instructions, which is probably
not optimal.
I guess the above code-gen would be equivalent to the following?
dup v0.8h, w0
dup v1.8h, w1
zip1 v0.8h, v0.8h, v1.8h
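
As a sanity check on that equivalence, here is a small scalar model of the two instructions. It is only a sketch: the helper names are made up for illustration and plain arrays stand in for the vector registers, so it says nothing about performance, only about the lane values produced.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Model of "dup vN.8h, wM": broadcast one scalar to every lane.  */
static void dup_8h(int16_t out[LANES], int16_t w)
{
  for (int i = 0; i < LANES; i++)
    out[i] = w;
}

/* Model of "zip1 vD.8h, vA.8h, vB.8h": interleave the low halves of
   a and b, giving {a0, b0, a1, b1, a2, b2, a3, b3}.  A temporary is
   used so that out may alias a or b, as in the sequence above.  */
static void zip1_8h(int16_t out[LANES], const int16_t a[LANES],
                    const int16_t b[LANES])
{
  int16_t tmp[LANES];
  for (int i = 0; i < LANES / 2; i++)
    {
      tmp[2 * i] = a[i];
      tmp[2 * i + 1] = b[i];
    }
  for (int i = 0; i < LANES; i++)
    out[i] = tmp[i];
}

/* dup + dup + zip1, as in the proposed sequence.  */
static void interleave_init(int16_t out[LANES], int16_t x, int16_t y)
{
  int16_t v0[LANES], v1[LANES];
  dup_8h(v0, x);
  dup_8h(v1, y);
  zip1_8h(out, v0, v1);
}

/* Self-check: the result matches (int16x8_t) { x, y, x, y, ... }.  */
static void check_interleave(void)
{
  int16_t v[LANES];
  interleave_init(v, 1, 2);
  for (int i = 0; i < LANES; i++)
    assert(v[i] == (i % 2 == 0 ? 1 : 2));
}
```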

I have attached a patch that does this when the number of elements is >= 8,
which should be better than the current code-gen.
The patch passes bootstrap+test on aarch64-linux-gnu.
Does the patch look OK?

Thanks,
Prathamesh
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c91df6f5006..e5dea70e363 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22028,6 +22028,39 @@ aarch64_expand_vector_init (rtx target, rtx vals)
   return;
 }
 
+  /* Check for interleaving case.
+ For eg if initializer is (int16x8_t) {x, y, x, y, x, y, x, y}.
+ Generate following code:
+ dup v0.h, x
+ dup v1.h, y
+ zip1 v0.h, v0.h, v1.h
+ for "large enough" initializer.  */
+
+  if (n_elts >= 8)
+{
+  int i;
+  for (i = 2; i < n_elts; i++)
+   if (!rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, i % 2)))
+ break;
+
+  if (i == n_elts)
+   {
+ machine_mode mode = GET_MODE (target);
+ rtx dest[2];
+
+ for (int i = 0; i < 2; i++)
+   {
+ rtx x = copy_to_mode_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, i));
+ dest[i] = gen_reg_rtx (mode);
+ aarch64_emit_move (dest[i], gen_vec_duplicate (mode, x));
+   }
+
+ rtvec v = gen_rtvec (2, dest[0], dest[1]);
+ emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
+ return;
+   }
+}
+
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
new file mode 100644
index 000..ee775048589
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/interleave-init-1.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+
+/*
+** foo:
+** ...
+** dup v[0-9]+\.8h, w[0-9]+
+** dup v[0-9]+\.8h, w[0-9]+
+** zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
+** ...
+** ret
+*/
+
+int16x8_t foo(int16_t x, int y)
+{
+  int16x8_t v = (int16x8_t) {x, y, x, y, x, y, x, y}; 
+  return v;
+}
+
+/*
+** foo2:
+** ...
+** dup v[0-9]+\.8h, w[0-9]+
+** movi v[0-9]+\.8h, 0x1
+** zip1 v[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h
+** ...
+** ret
+*/
+
+int16x8_t foo2(int16_t x) 
+{
+  int16x8_t v = (int16x8_t) {x, 1, x, 1, x, 1, x, 1}; 
+  return v;
+}

Re: Implement a build-in function with floating point exceptions.

2022-11-29 Thread Xi Ruoyao via Gcc-patches

On Tue, 2022-11-29 at 09:52 +, 陈 小龙 wrote:
> +(define_insn "2"
> +  [(set (match_operand:ANYF 0 "register_operand" "=f")
> + (unspec:ANYF[(match_operand:ANYF 1 "register_operand" "f")]
> + FLOAT))]
> +  "TARGET_HARD_FLOAT&&(flag_fp_int_builtin_inexact ||
> !flag_trapping_math)"
> + 
> "ftint..\t%0,%1\n\tffint..\t%0,%0

It's wrong.  For example, consider ceil(0x1.0p+100).  0x1.0p+100 cannot
be represented by any 64-bit integer, so ftintrp.l.d will raise an
overflow exception and provide "some value" as the output.  Then ffint
can't produce the correct result.
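
To make the range problem concrete, here is a plain C sketch (not LoongArch code, and the helper names are hypothetical). A round trip through a 64-bit integer can only implement ceil when the input is representable as int64_t, so any such expansion needs a range check that the pattern above lacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x can be converted to int64_t without overflow, that is
   -2^63 <= x < 2^63.  NaN fails both comparisons.  */
static bool fits_in_int64(double x)
{
  return x >= -0x1.0p+63 && x < 0x1.0p+63;
}

/* ceil via an integer round trip.  Returns false, leaving *result
   untouched, when the input is out of range and the round trip
   cannot give the right answer.  */
static bool ceil_via_int64(double x, double *result)
{
  if (!fits_in_int64(x))
    return false;
  int64_t i = (int64_t) x;      /* truncates toward zero */
  double d = (double) i;
  if (d < x)                    /* positive non-integers round up */
    d += 1.0;
  *result = d;
  return true;
}

/* Self-check: 0x1.0p+100 is already an integer value, so ceil should
   return it unchanged, but it does not fit in 64 bits.  */
static void check_ceil_range(void)
{
  double r = 0.0;
  assert(ceil_via_int64(2.5, &r) && r == 3.0);
  assert(ceil_via_int64(-2.5, &r) && r == -2.0);
  assert(!ceil_via_int64(0x1.0p+100, &r));
}
```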

Nacked-by: Xi Ruoyao 

I'd suggest adding frint.{w,l}.{s,d}.{rm,rp,rz} etc. (BTW, also the
variants without raising inexact exceptions) into a future LoongArch ISA
version for this.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH][AArch64] Cleanup move immediate code

2022-11-29 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> Just to make sure I understand: isn't it really just MOVN?  I would have
> expected a 32-bit MOVZ to be equivalent to (and add no capabilities over)
> a 64-bit MOVZ.

The 32-bit MOVZ immediates are equivalent, MOVN never overlaps, and
MOVI has some overlaps. Since we allow all 3 variants, the 2 alternatives
in the movdi pattern are overlapping for MOVZ and MOVI immediates.

> I agree the ctz trick is more elegant than (and an improvement over)
> the current approach to testing for movz.  But I think the overall logic
> is harder to follow than it was in the original version.  Initially
> canonicalising val2 based on the sign bit seems unintuitive since we
> still need to handle all four combinations of (top bit set, top bit clear)
> x (low 48 bits set, low 48 bits clear).  I preferred the original
> approach of testing once with the original value (for MOVZ) and once
> with the inverted value (for MOVN).

Yes, the canonicalization on the sign ends up requiring 2 special cases.
Handling the MOVZ case first and then MOVN does avoid that, and makes
things simpler overall, so I've used that approach in v2.
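
A rough sketch of that ordering in plain C, for readers following along. This is not the code from the patch: the function names are hypothetical, and bitmask (MOVI) immediates are deliberately left out. It tests MOVZ on the original value, MOVN on the inverted value, and then the 32-bit MOVN form, whose result is zero-extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit MOVZ: at most one non-zero 16-bit chunk, at shift 0/16/32/48.
   32-bit MOVZ values are a subset of these.  */
static bool is_movz64(uint64_t val)
{
  for (int shift = 0; shift < 64; shift += 16)
    if ((val & ~(UINT64_C(0xffff) << shift)) == 0)
      return true;
  return false;
}

/* 64-bit MOVN: the inverted value is MOVZ-encodable.  */
static bool is_movn64(uint64_t val)
{
  return is_movz64(~val);
}

/* 32-bit MOVN: writing a W register clears the upper 32 bits, so the
   upper half must be zero and the inverted low word MOVZ-encodable.  */
static bool is_movn32(uint64_t val)
{
  return (val >> 32) == 0 && is_movz64(~val & UINT64_C(0xffffffff));
}

/* MOVZ first, then MOVN, then the 32-bit MOVN case.  */
static bool is_simple_move_imm(uint64_t val)
{
  return is_movz64(val) || is_movn64(val) || is_movn32(val);
}

static void check_move_imm(void)
{
  assert(is_movz64(UINT64_C(0x12340000)));
  assert(is_movn64(UINT64_C(0xffffffffffff1234)));
  /* Only reachable through the 32-bit MOVN form.  */
  assert(!is_movz64(UINT64_C(0xffff1234)));
  assert(!is_movn64(UINT64_C(0xffff1234)));
  assert(is_movn32(UINT64_C(0xffff1234)));
  assert(!is_simple_move_imm(UINT64_C(0x12345678)));
}
```

In this model no value satisfies both is_movn64 and is_movn32, which matches the remark that the MOVN forms never overlap.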

> Don't the new cases boil down to: if mode is DImode and the upper 32 bits
> are clear, we can test based on SImode instead?  In other words, couldn't
> the "(val >> 32) == 0" part of the final test be done first, with the
> effect of changing the mode to SImode?  Something like:

Yes that works. I used masking of the top bits to avoid repeatedly testing the
same condition. The new version removes most special cases and ends up
both smaller and simpler:


v2: Simplify the special cases in aarch64_move_imm, use aarch64_is_movz.

Simplify, refactor and improve various move immediate functions.
Allow 32-bit MOVZ/I/N as a valid 64-bit immediate which removes special
cases in aarch64_internal_mov_immediate.  Add new constraint so the movdi
pattern only needs a single alternative for move immediate.

Passes bootstrap and regress, OK for commit?

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type.
(aarch64_zeroextended_move_imm): New function.
(aarch64_move_imm): Refactor, assert mode is SImode or DImode.
(aarch64_internal_mov_immediate): Assert mode is SImode or DImode.
Simplify special cases.
(aarch64_uimm12_shift): Simplify code.
(aarch64_clamp_to_uimm12_shift): Likewise.
(aarch64_movw_imm): Rename to aarch64_is_movz.
(aarch64_float_const_rtx_p): Pass either SImode or DImode to
aarch64_internal_mov_immediate.
(aarch64_rtx_costs): Likewise.
* config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
constraints into single 'O'.
(mov_aarch64): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
(aarch64_bitmask_imm): Likewise.
(aarch64_uimm12_shift): Likewise.
(aarch64_zeroextended_move_imm): New prototype.
* config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
limit 'N' to 64-bit only moves.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4be93c93c26e091f878bc8e4cf06e90888405fb2..8bce6ec7599edcc2e6a1d8006450f35c0ce7f61f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -756,7 +756,7 @@ void aarch64_post_cfi_startproc (void);
 poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_address_valid_for_prefetch_p (rtx, bool);
-bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
 bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
@@ -793,7 +793,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
-bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 machine_mode aarch64_sve_pred_mode (machine_mode);
@@ -843,8 +843,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
 bool aarch64_sve_float_mul_immediate_p (rtx);
 bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
-bool aarch64_uimm12_shift (HOST_WIDE_INT);
+bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
 int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
+bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
 bool

[PATCH v2 1/2] aarch64: fix warning emission for ABI break since GCC 9.1

2022-11-29 Thread Christophe Lyon via Gcc-patches

While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be.  This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1.  This way, patch #1 can be back-ported
to release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg
can emit warnings for callees as well as callers.  This removes the
need for aarch64_function_arg_boundary to warn (with its incomplete
information).  However, in the first patch there are still cases where
we emit warnings where we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.
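
A small standalone illustration of why that matters, assuming the old condition had the shape `abi_break & warn_psabi` (the function names below are hypothetical). Once abi_break holds an alignment such as 64 rather than 0 or 1, the bitwise form tests a single bit and can be false even though both operands are non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Bitwise form: only correct while abi_break is 0 or 1.  */
static bool should_warn_bitwise(unsigned int abi_break, int warn_psabi)
{
  return (abi_break & warn_psabi) != 0;
}

/* Logical form: correct for any non-zero abi_break.  */
static bool should_warn_logical(unsigned int abi_break, int warn_psabi)
{
  return abi_break && warn_psabi;
}

static void check_warn_condition(void)
{
  /* Boolean-style value: both forms agree.  */
  assert(should_warn_bitwise(1, 1));
  assert(should_warn_logical(1, 1));
  /* Alignment-style value: 64 & 1 == 0, so the warning is lost.  */
  assert(!should_warn_bitwise(64, 1));
  assert(should_warn_logical(64, 1));
}
```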

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

2022-11-28  Christophe Lyon  
Richard Sandiford  

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
comment.
(aarch64_layout_arg): Factorize warning conditions.
(aarch64_function_arg_boundary): Fix typo.
* function.cc (currently_expanding_function_start): New variable.
(expand_function_start): Handle
currently_expanding_function_start.
* function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning.h: New test.

warning improvement
---
 gcc/config/aarch64/aarch64.cc |  28 +++-
 gcc/function.cc   |   5 +
 gcc/function.h|   2 +
 .../bitfield-abi-warning-align16-O2-extra.c   |  86 
 .../aarch64/bitfield-abi-warning-align16-O2.c |  87 
 .../bitfield-abi-warning-align32-O2-extra.c   | 119 +
 .../aarch64/bitfield-abi-warning-align32-O2.c | 124 +
 .../aarch64/bitfield-abi-warning-align8-O2.c  |  16 +++
 .../gcc.target/aarch64/bitfield-abi-warning.h | 125 ++
 9 files changed, 585 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align16-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align32-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align8-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning.h

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ab78b11b158..3623df5bd94 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7264,9 +7264,9 @@ aarch64_vfp_is_call_candidate (cumulative_args_t pcum_v, machine_mode mode,
 /* Given MODE and TYPE of a function argument, return the alignment in
bits.  The idea is to suppress any stronger alignment requested by
the user and opt for the natural alignment (specified in AAPCS64 \S
-   4.1).  ABI_BREAK is set to true if the alignment was incorrectly
-   calculated in versions of GCC prior to GCC-9.  This is a helper
-   function for local use only.  */
+   4.1).  ABI_BREAK is set to the old alignment if the alignment was
+   incorrectly calculated in versions of GCC prior to GCC-9.  This is
+   a helper function for local use only.  */
 
 static unsigned int
 aarch64_function_arg_alignment (machine_mode mode, const_tree type,
@@ -7342,11 +7342,24 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
   if (pcum->aapcs_arg_processed)
 return;
 
+  bool warn_pcs_change
+= (warn_psabi
+   && !pcum->silent_p
+   && (currently_expanding_function_start
+  || currently_expanding_gimple_stmt));
+
+  unsigned int alignment
+= aarch64_function_arg_alignment (mode, type, &abi_break);
+  gcc_assert (!alignment || abi_break < alignment);
+
   pcum->aapcs_arg_processed = true;
 
   pure_scalable_type_info

[PATCH v2 2/2] aarch64: Fix bit-field alignment in param passing [PR105549]

2022-11-29 Thread Christophe Lyon via Gcc-patches

While working on enabling DFP for AArch64, I noticed new failures in
gcc.dg/compat/struct-layout-1.exp (t028) which were not actually
caused by DFP types handling. These tests are generated during 'make
check' and enabling DFP made generation different (not sure if new
non-DFP tests are generated, or if existing ones are generated
differently, the tests in question are huge and difficult to compare).

Anyway, I reduced the problem to what I attach at the end of the new
gcc.target/aarch64/aapcs64/va_arg-17.c test and rewrote it in the same
scheme as other va_arg* AArch64 tests.  Richard Sandiford further
reduced this to a non-vararg function, added as a second testcase.

This is a tough case mixing bit-fields and alignment, where
aarch64_function_arg_alignment did not follow what its descriptive
comment says: we want to use the natural alignment of the bit-field
type only if the user didn't reduce the alignment for the bit-field
itself.

The patch also adds a comment and assert that would help someone who
has to look at this area again.

The fix would be very small, except that this introduces a new ABI
break, and we have to warn about that.  Since this actually fixes a
problem introduced in GCC 9.1, we keep the old computation to detect
when we now behave differently.

This patch adds two new tests (va_arg-17.c and
pr105549.c). va_arg-17.c contains the reduced offending testcase from
struct-layout-1.exp for reference.  We update some tests introduced by
the previous patch, where parameters with bit-fields and packed
attribute now emit a different warning.

2022-11-28  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/105549
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment):
Check DECL_PACKED for bitfield.
(aarch64_layout_arg): Warn when parameter passing ABI changes.
(aarch64_function_arg_boundary): Do not warn here.
(aarch64_gimplify_va_arg_expr): Warn when parameter passing ABI
changes.

gcc/testsuite/
PR target/105549
* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: Update.
* gcc.target/aarch64/aapcs64/va_arg-17.c: New test.
* gcc.target/aarch64/pr105549.c: New test.
---
 gcc/config/aarch64/aarch64.cc | 148 ++
 .../gcc.target/aarch64/aapcs64/va_arg-17.c| 105 +
 .../bitfield-abi-warning-align16-O2-extra.c   |  64 
 .../aarch64/bitfield-abi-warning-align16-O2.c |  48 +++---
 .../bitfield-abi-warning-align32-O2-extra.c   | 131 +++-
 .../aarch64/bitfield-abi-warning-align32-O2.c | 141 -
 gcc/testsuite/gcc.target/aarch64/pr105549.c   |  12 ++
 7 files changed, 414 insertions(+), 235 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-17.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr105549.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3623df5bd94..a6d95dd85bf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7265,14 +7265,18 @@ aarch64_vfp_is_call_candidate (cumulative_args_t pcum_v, machine_mode mode,
bits.  The idea is to suppress any stronger alignment requested by
the user and opt for the natural alignment (specified in AAPCS64 \S
4.1).  ABI_BREAK is set to the old alignment if the alignment was
-   incorrectly calculated in versions of GCC prior to GCC-9.  This is
-   a helper function for local use only.  */
+   incorrectly calculated in versions of GCC prior to GCC-9.
+   ABI_BREAK_PACKED is set to the old alignment if it was incorrectly
+   calculated in versions between GCC-9 and GCC-13.  This is a helper
+   function for local use only.  */
 
 static unsigned int
 aarch64_function_arg_alignment (machine_mode mode, const_tree type,
-   unsigned int *abi_break)
+   unsigned int *abi_break,
+   unsigned int *abi_break_packed)
 {
   *abi_break = 0;
+  *abi_break_packed = 0;
   if (!type)
 return GET_MODE_ALIGNMENT (mode);
 
@@ -7288,6 +7292,7 @@ aarch64_function_arg_alignment (machine_mode mode, const_tree type,
 return TYPE_ALIGN (TREE_TYPE (type));
 
   unsigned int alignment = 0;
+  unsigned int bitfield_alignment_with_packed = 0;
   unsigned int bitfield_alignment = 0;
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 if (TREE_CODE (field) == FIELD_DECL)
@@ -7307,11 +7312,30 @@ aarch64_function_arg_alignment (machine_mode mode, const_tree type,
   but gains 8-byte alignment and size thanks to "e".  */
alignment = std::max (alignment, DECL_ALIGN (field));
if (DECL_BIT_FIELD_TYPE (field))
-

[PATCH 3/3] Testcases for move sub blocks on param and ret

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi,

This patch just adds test cases; it was tested on ppc64{,le}.

With the previous patches in this series applied, bootstrap and regtest
pass on ppc64{,le} and x86_64.

Is this ok for trunk?

BR,
Jeff (Jiufu)

PR target/65421

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421.c: New test.

---
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 25 
 gcc/testsuite/gcc.target/powerpc/pr65421.c   | 22 +
 2 files changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 000..a5ea675008e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,25 @@
+/* PR target/65421 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+
+typedef struct SA
+{
+  double a[3];
+  long l;
+} A;
+
+/* 2 vec load, 2 vec store.  */
+A ret_arg_pt (A *a){return *a;}
+
+/* 4 std */
+A ret_arg (A a) {return a;}
+
+/* 4 std */
+void st_arg (A a, A *p) {*p = a;}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstd\M} 8 } } */
+
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 000..27f69e24e29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,22 @@
+/* PR target/65421 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+
+typedef struct SA
+{
+  double a[3];
+} A;
+
+/* 3 lfd */
+A ret_arg_pt (A *a){return *a;}
+
+/* blr */
+A ret_arg (A a) {return a;}
+
+/* 3 stfd */
+void st_arg (A a, A *p) {*p = a;}
+
+/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
-- 
2.17.1

[PATCH 2/3] Use sub mode to move block for struct returns

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether an assignment is a block copy to a return
variable; if the function returns through registers, it uses the
register mode to move sub-blocks for the assignment.
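
The idea can be modelled outside the compiler as copying an aggregate in register-sized pieces instead of as one opaque block. The sketch below is only an illustration in plain C: copy_in_sub_blocks and the struct are hypothetical stand-ins, not the move_sub_blocks function added by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the struct in pr65421-1.c, with a fixed-width integer
   so the size is 32 bytes on common 64-bit targets.  */
typedef struct
{
  double a[3];
  int64_t l;
} A;

/* Copy SIZE bytes from SRC to DST in CHUNK-byte pieces, each piece
   going through a temporary that stands in for one register.
   Returns the number of piece moves performed.  */
static size_t copy_in_sub_blocks(void *dst, const void *src,
                                 size_t size, size_t chunk)
{
  unsigned char reg[16];
  size_t moves = 0;

  assert(chunk > 0 && chunk <= sizeof reg && size % chunk == 0);
  for (size_t off = 0; off < size; off += chunk)
    {
      memcpy(reg, (const unsigned char *) src + off, chunk);
      memcpy((unsigned char *) dst + off, reg, chunk);
      moves++;
    }
  return moves;
}

/* Self-check: a 32-byte struct moved in 8-byte pieces takes 4 moves.  */
static void check_sub_block_copy(void)
{
  A src = { { 1.0, 2.0, 3.0 }, 4 };
  A dst;
  size_t moves = copy_in_sub_blocks(&dst, &src, sizeof src, 8);
  assert(moves == sizeof src / 8);
  assert(dst.a[0] == 1.0 && dst.a[2] == 3.0 && dst.l == 4);
}
```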

Bootstrapped and regtested on ppc{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

PR target/65421

gcc/ChangeLog:

* cfgexpand.cc (expand_used_vars): Mark DECL_USEDBY_RETURN_P for return
vars.
* expr.cc (expand_assignment): Call move_sub_blocks for assigning to a
struct return variable.
* tree-core.h (struct tree_decl_common): Comment DECL_USEDBY_RETURN_P.
* tree.h (DECL_USEDBY_RETURN_P): New define.

---
 gcc/cfgexpand.cc | 14 ++
 gcc/expr.cc  | 13 +
 gcc/tree-core.h  |  3 ++-
 gcc/tree.h   |  4 
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29c03..0783cb27a59 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
 frame_phase = off ? align - off : 0;
   }
 
+  /* Mark VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+   if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+ {
+   tree val = gimple_return_retval (ret);
+   if (val && VAR_P (val))
+ DECL_USEDBY_RETURN_P (val) = 1;
+ }
+}
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
 TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 201fee6fd9a..9be75d6733f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6115,6 +6115,19 @@ expand_assignment (tree to, tree from, bool nontemporal)
   return;
 }
 
+  /* If it is assigning to a struct var which will be returned, and the
+   function is returning via registers, it would be better to use the
+   register's mode to move sub-blocks for the assignment.  */
+  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
+  && TREE_CODE (from) != CONSTRUCTOR
+  && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
+{
+  rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
+  machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
+  move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+  return;
+}
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e146b133dbd..de4acca9ba8 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1808,7 +1808,8 @@ struct GTY(()) tree_decl_common {
  In VAR_DECL, PARM_DECL and RESULT_DECL, this is
  DECL_HAS_VALUE_EXPR_P.  */
   unsigned decl_flag_2 : 1;
-  /* In FIELD_DECL, this is DECL_PADDING_P.  */
+  /* In FIELD_DECL, this is DECL_PADDING_P
+ In VAR_DECL, this is DECL_USEDBY_RETURN_P.  */
   unsigned decl_flag_3 : 1;
   /* Logically, these two would go in a theoretical base shared by var and
  parm decl. */
diff --git a/gcc/tree.h b/gcc/tree.h
index 4a19de1c94d..b4fbf226ffc 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3007,6 +3007,10 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_PADDING_P(NODE) \
   (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
 
+/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
+#define DECL_USEDBY_RETURN_P(NODE) \
+  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
 /* Used in a FIELD_DECL to indicate whether this field is not a flexible
array member. This is only valid for the last array type field of a
structure.  */
-- 
2.17.1

[PATCH 1/3] Use sub mode to move block for struct parameter

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether the "from" of an assignment is a parameter;
if the parameter may be passed through registers, the register
mode is used to move sub-blocks for the assignment.

Bootstrapped and regtested on ppc{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* expr.cc (move_sub_blocks): New function.
(expand_assignment): Call move_sub_blocks for assigning from parameter.

---
 gcc/expr.cc | 70 +
 1 file changed, 70 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..201fee6fd9a 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5559,6 +5559,57 @@ mem_ref_refers_to_non_mem_p (tree ref)
   return non_mem_decl_p (base);
 }
 
+/* Sub routine of expand_assignment, invoked when assigning from a
+   parameter or assigning to a return val on struct type which may
+   be passed through registers.  The mode of register is used to
+   move the content for the assignment.
+
+   This routine generates code for expression FROM, which is BLKmode,
+   and moves the generated content to TO_RTX by sub-blocks in SUB_MODE.  */
+
+static void
+move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool nontemporal)
+{
+  HOST_WIDE_INT size, sub_size;
+  int len;
+
+  gcc_assert (MEM_P (to_rtx));
+
+  size = MEM_SIZE (to_rtx).to_constant ();
+  sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
+  len = size / sub_size;
+
+  /* The target ABI provides only a limited number of registers for
+ passing parameters or returning values, so moving through sub-modes
+ is not profitable when the size does not match the registers.  */
+  int heuristic_num = 8;
+  if (!size || (size % sub_size) != 0 || len < 2 || len > heuristic_num)
+{
+  push_temp_slots ();
+  rtx result = store_expr (from, to_rtx, 0, nontemporal, false);
+  preserve_temp_slots (result);
+  pop_temp_slots ();
+  return;
+}
+
+  push_temp_slots ();
+
+  rtx from_rtx;
+  from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+  for (int i = 0; i < len; i++)
+{
+  rtx temp = gen_reg_rtx (sub_mode);
+  rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
+  rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
+  emit_move_insn (temp, src);
+  emit_move_insn (dest, temp);
+}
+
+  preserve_temp_slots (to_rtx);
+  pop_temp_slots ();
+  return;
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
is true, try generating a nontemporal store.  */
 
@@ -6045,6 +6096,25 @@ expand_assignment (tree to, tree from, bool nontemporal)
   return;
 }
 
+  /* If it is assigning from a struct param which may be passed via registers,
+ It would be better to use the register's mode to move sub-blocks for the
+ assignment.  */
+  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
+  && DECL_INCOMING_RTL (from)
+  && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+ || REG_P (DECL_INCOMING_RTL (from))))
+{
+  rtx parm = DECL_INCOMING_RTL (from);
+  machine_mode sub_mode;
+  if (REG_P (parm))
+   sub_mode = word_mode;
+  else
+   sub_mode = GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
+
+  move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+  return;
+}
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-- 
2.17.1

[PATCH V3] Use sub-mode to move block on struct param and ret

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi,

When assigning a parameter to a variable, or assigning a variable to a
return value of struct type, a "block move" is used to expand
the assignment. It would be better to use the register mode according
to the target/ABI to move the blocks. This would also create more
opportunities for other optimization passes (cse/dse/xprop).

Take this example (similar to the code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a){return *a;} // on ppc64le, expect only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

This patch series checks the "from" and "to" of an assignment in
"expand_assignment"; if it involves a param/ret which may be passed via
registers, the register mode is used to move sub-blocks for the
assignment.
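
The idea of moving a struct in register-sized pieces, with a fallback to a plain block copy, can be sketched in ordinary C. This is only an illustration and not the RTL expansion itself: copy_by_sub_blocks and MAX_SUB_BLOCKS are hypothetical names, and the limit of 8 pieces is assumed to mirror the heuristic used in the first patch of the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit on the number of register-sized pieces.  */
enum { MAX_SUB_BLOCKS = 8 };

/* Copy SIZE bytes from SRC to DST.  When SIZE splits evenly into between
   2 and MAX_SUB_BLOCKS pieces of SUB_SIZE bytes, move piece by piece
   through a temporary (standing in for a register); otherwise fall back
   to a single block copy.  Returns the number of pieces moved, or 0 when
   the fallback was taken.  */
static int
copy_by_sub_blocks (void *dst, const void *src, size_t size, size_t sub_size)
{
  size_t len = sub_size ? size / sub_size : 0;

  if (size == 0 || sub_size == 0 || sub_size > sizeof (uint64_t)
      || size % sub_size != 0 || len < 2 || len > MAX_SUB_BLOCKS)
    {
      memcpy (dst, src, size);
      return 0;
    }

  for (size_t i = 0; i < len; i++)
    {
      uint64_t temp = 0;	/* plays the role of the pseudo register */
      memcpy (&temp, (const char *) src + i * sub_size, sub_size);
      memcpy ((char *) dst + i * sub_size, &temp, sub_size);
    }
  return (int) len;
}
```

For the struct A above (three doubles), calling this with sub_size equal to sizeof (double) moves three pieces, which corresponds to the three lfd/stfd instructions expected on ppc64le.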

This series is based on the discussion of the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606498.html

When drafting this patch, I also investigated updating gimplify/nrv
to replace "return D.xxx;" with "return <retval>;". There is
one issue, though: "<retval>" with PARALLEL code cannot be accessed through
address/component_ref.

In this patch series, the first patch adds support for assigning
from parameters; the second supports assigning to return values; the last
one adds test cases.

BR,
Jeff (Jiufu)

[PATCH] tree-optimization/107852 - missed optimization with PHIs

2022-11-29 Thread Richard Biener via Gcc-patches

The following deals with the situation where we have

 [local count: 1073741824]:
_5 = bytes.D.25336._M_impl.D.24643._M_start;
_6 = bytes.D.25336._M_impl.D.24643._M_finish;
pretmp_66 = bytes.D.25336._M_impl.D.24643._M_end_of_storage;
if (_5 != _6)
  goto ; [70.00%]
else
  goto ; [30.00%]

...

 [local count: 329045359]:
_89 = operator new (4);
_43 = bytes.D.25336._M_impl.D.24643._M_start;
_Num_44 = _137 - _43;
if (_Num_44 != 0)

but fail to see that _137 is equal to _5, and thus that _Num_44 would
eventually be zero if operator new could not possibly clobber the
global bytes variable.

The following resolves this in value-numbering by using the
predicated values for _5 == _6 recorded for the dominating
condition.
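
As a source-level illustration (not the GIMPLE from the PR), the kind of redundant PHI this targets has the shape below. The function name pick is hypothetical; the point is only that on the edge where the dominating condition i != j is false, i and j are known to be equal, so both arms deliver the same value.

```c
/* On the else edge the condition i != j is false, so j equals i there.
   Both arms therefore assign the value of i, and the merge of k at the
   join point (a PHI in SSA form) is redundant.  */
static int
pick (int i, int j)
{
  int k;
  if (i != j)
    k = i;
  else
    k = j;	/* i == j holds here, so this is the same value as i */
  return k;	/* always equal to i */
}
```

With the equivalence recorded as a predicated value, value numbering can treat k as i without looking at which arm was taken.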

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107852
* tree-ssa-sccvn.cc (visit_phi): Use equivalences recorded
as predicated values to elide more redundant PHIs.

* gcc.dg/tree-ssa/ssa-fre-101.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c | 47 +++
 gcc/tree-ssa-sccvn.cc   | 51 -
 2 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
new file mode 100644
index 000..c67f211dcf6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1-details" } */
+
+int test1 (int i, int j)
+{
+  int k;
+  if (i != j)
+k = i;
+  else
+k = j;
+  return k;
+}
+
+int test2 (int i, int j)
+{
+  int k;
+  if (i != j)
+k = j;
+  else
+k = i;
+  return k;
+}
+
+int test3 (int i, int j)
+{
+  int k;
+  if (i == j)
+k = j;
+  else
+k = i;
+  return k;
+}
+
+int test4 (int i, int j)
+{
+  int k;
+  if (i == j)
+k = i;
+  else
+k = j;
+  return k;
+}
+
+/* We'd expect 4 hits but since we only keep one forwarder the
+   VN predication machinery cannot record something for the entry
+   block since it doesn't work on edges but on their source.  */
+/* { dg-final { scan-tree-dump-times "equal on edge" 2 "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 1f9c6c53b52..6895ae84d13 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5814,7 +5814,8 @@ visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
 
   /* See if all non-TOP arguments have the same value.  TOP is
  equivalent to everything, so we can ignore it.  */
-  FOR_EACH_EDGE (e, ei, gimple_bb (phi)->preds)
+  basic_block bb = gimple_bb (phi);
+  FOR_EACH_EDGE (e, ei, bb->preds)
 if (e->flags & EDGE_EXECUTABLE)
   {
tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
@@ -5859,6 +5860,54 @@ visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
 && known_eq (soff, doff))
  continue;
  }
+   /* There's also the possibility to use equivalences.  */
+   if (!FLOAT_TYPE_P (TREE_TYPE (def)))
+ {
+   vn_nary_op_t vnresult;
+   tree ops[2];
+   ops[0] = def;
+   ops[1] = sameval;
+   tree val = vn_nary_op_lookup_pieces (2, EQ_EXPR,
+boolean_type_node,
+ops, &vnresult);
+   if (! val && vnresult && vnresult->predicated_values)
+ {
+   val = vn_nary_op_get_predicated_value (vnresult, e->src);
+   if (val && integer_truep (val))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Predication says ");
+   print_generic_expr (dump_file, def, TDF_NONE);
+   fprintf (dump_file, " and ");
+   print_generic_expr (dump_file, sameval, TDF_NONE);
+   fprintf (dump_file, " are equal on edge %d -> %d\n",
+e->src->index, e->dest->index);
+ }
+   continue;
+ }
+   /* If on all previous edges the value was equal to def
+  we can change sameval to def.  */
+   if (EDGE_COUNT (bb->preds) == 2
+   && (val = vn_nary_op_get_predicated_value
+   (vnresult, EDGE_PRED (bb, 0)->src))
+   && integer_truep (val))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Predication says ");
+   print_generic_expr (dump_file, def, TDF_NONE);
+

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi Segher,

Thanks for your comment!

Segher Boessenkool  writes:

> On Mon, Nov 28, 2022 at 03:51:59PM +0800, Jiufu Guo wrote:
>> Jiufu Guo via Gcc-patches  writes:
>> > Segher Boessenkool  writes:
>> >>> > +  else
>> >>> > +  {
>> >>> > +emit_move_insn (temp,
>> >>> > +GEN_INT (((ud2 << 16) ^ 0x8000) - 
>> >>> > 0x8000));
>> >>> > +if (ud1 != 0)
>> >>> > +  emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT 
>> >>> > (ud1)));
>> >>> > +emit_move_insn (dest,
>> >>> > +gen_rtx_ZERO_EXTEND (DImode,
>> >>> > + gen_lowpart (SImode, 
>> >>> > temp)));
>> >>> > +  }
>> >>
>> >> Why this?  Please just write it in DImode, do not go via SImode?
>> > Thanks for catch this. Yes, gen_lowpart with DImode would be ok.
>> Oh, Sorry. DImode can not be used here.  The genreated pattern with
>> DImode can not be recognized.  Using SImode is to match 'rlwxx'.
>
> There are patterns that accept DImode for rlwinm just fine.  Please use
>   (and:DI (const_int 0xffffffff) (x:DI))
> not the obfuscated
>   (zero_extend:DI (subreg:SI (x:DI) LOWBYTE))
>
Agreed, 'and 0xffffffff' would be easier to read. Here is a small patch
for it.  I believe it should cause no regressions. :-) To make sure, I will
do more bootstraps and regtests, and then submit it.
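
The equivalence between the two RTL forms can be checked at the C level. This is only a sketch: the function names are hypothetical, and the mask is assumed to be 0xffffffff, which is what clearing the high 32 bits of a 64-bit value implies.

```c
#include <stdint.h>

/* Old form: take the low 32 bits and zero-extend them to 64 bits.  */
static uint64_t
clear_high_zext (uint64_t x)
{
  return (uint64_t) (uint32_t) x;
}

/* New form: mask the 64-bit value directly (mask value is an assumption
   based on "clear the high 32 bits").  */
static uint64_t
clear_high_and (uint64_t x)
{
  return x & UINT64_C (0xffffffff);
}
```

Both functions return the same value for every input, so the change only affects how readable the generated pattern is.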


BR,
Jeff (Jiufu)

NFC: use more readable pattern to clean high bits

This patch just uses a more readable pattern for "rldicl x,x,0,32"
to clear the high 32 bits.
The old pattern looks like: r118:DI=zero_extend(r120:DI#0)
The new pattern looks like: r118:DI=r120:DI&0xffffffff

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update
zero_extend(reg:DI#0) to reg:DI&0xffffffff

---
 gcc/config/rs6000/rs6000.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eb7ad5e954f..5efe9b22d8b 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10267,10 +10267,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
emit_move_insn (copy_rtx (temp),
gen_rtx_IOR (DImode, copy_rtx (temp),
 GEN_INT (ud1)));
-  emit_move_insn (dest,
- gen_rtx_ZERO_EXTEND (DImode,
-  gen_lowpart (SImode,
-   copy_rtx (temp))));
+  emit_move_insn (dest, gen_rtx_AND (DImode, temp, GEN_INT (0xffffffff)));
 }
   else if (ud1 == ud3 && ud2 == ud4)
 {
-- 
2.17.1

>
> Segher

[PATCH] c++: Incremental fix for g++.dg/gomp/for-21.C [PR84469]

2022-11-29 Thread Jakub Jelinek via Gcc-patches

Hi!

The PR84469 patch I've just posted regresses the for-21.C testcase.
When an OpenMP loop has at least 2 associated loops and, in a template,
an outer structured binding with a non-type-dependent expression is used
in the expressions of some inner loop, we no longer diagnose those uses,
as the (weirdly worded) diagnostic was only issued during
finish_id_expression -> mark_used, which for the inner loop expressions
happens before the structured bindings are finalized.  In templates,
mark_used doesn't diagnose uses of non-deduced variables, and if the
range-for expression is type dependent, the use is similarly diagnosed
during instantiation.  But with the PR84469 fix, if the range-for expression
is not type dependent, there is no place that would diagnose it, as during
instantiation the structured bindings are already deduced.

The following patch diagnoses it in that case during finish_omp_for (for
consistency with the same weird message).

I'll commit this to trunk if the other patch is approved and it passes
bootstrap/regtest.

2022-11-29  Jakub Jelinek  

PR c++/84469
* semantics.cc: Define INCLUDE_MEMORY before including system.h.
(struct finish_omp_for_data): New type.
(finish_omp_for_decomps_r): New function.
(finish_omp_for): Diagnose uses of non-type-dependent range for
loop decompositions in inner OpenMP associated loops in templates.

* g++.dg/gomp/for-21.C (f6): Adjust lines of expected diagnostics.
* g++.dg/gomp/for-22.C: New test.

--- gcc/cp/semantics.cc.jj  2022-11-19 09:21:14.897436616 +0100
+++ gcc/cp/semantics.cc 2022-11-29 12:58:36.165771985 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -10401,6 +10402,47 @@ handle_omp_for_class_iterator (int i, lo
   return false;
 }
 
+struct finish_omp_for_data {
+  std::unique_ptr<hash_set<tree>> decomps;
+  bool fail;
+  location_t loc;
+};
+
+/* Helper function for finish_omp_for.  Diagnose uses of structured
+   bindings of OpenMP collapsed loop range for loops in the associated
+   loops.  If not processing_template_decl, this is diagnosed by
+   finish_id_expression -> mark_used before the range for is deduced.
+   And if processing_template_decl and the range for expression is
+   type dependent, it is similarly diagnosed during instantiation.
+   Only when processing_template_decl and range for expression is
+   not type dependent, we wouldn't diagnose it at all, so do it
+   from finish_omp_for in that case.  */
+
+static tree
+finish_omp_for_decomps_r (tree *tp, int *, void *d)
+{
+  if (VAR_P (*tp)
+  && DECL_DECOMPOSITION_P (*tp)
+  && !type_dependent_expression_p (*tp)
+  && DECL_HAS_VALUE_EXPR_P (*tp))
+{
+  tree v = DECL_VALUE_EXPR (*tp);
+  if (TREE_CODE (v) == ARRAY_REF
+ && VAR_P (TREE_OPERAND (v, 0))
+ && DECL_DECOMPOSITION_P (TREE_OPERAND (v, 0)))
+   {
+ finish_omp_for_data *data = (finish_omp_for_data *) d;
+ if (data->decomps->contains (TREE_OPERAND (v, 0)))
+   {
+ error_at (data->loc, "use of %qD before deduction of %<auto%>",
+   *tp);
+ data->fail = true;
+   }
+   }
+}
+  return NULL_TREE;
+}
+
 /* Build and validate an OMP_FOR statement.  CLAUSES, BODY, COND, INCR
are directly for their associated operands in the statement.  DECL
and INIT are a combo; if DECL is NULL then INIT ought to be a
@@ -10419,6 +10461,7 @@ finish_omp_for (location_t locus, enum t
   int i;
   int collapse = 1;
   int ordered = 0;
+  finish_omp_for_data data;
 
   gcc_assert (TREE_VEC_LENGTH (declv) == TREE_VEC_LENGTH (initv));
   gcc_assert (TREE_VEC_LENGTH (declv) == TREE_VEC_LENGTH (condv));
@@ -10479,7 +10522,25 @@ finish_omp_for (location_t locus, enum t
elocus = EXPR_LOCATION (init);
 
   if (cond == global_namespace)
-   continue;
+   {
+ gcc_assert (processing_template_decl);
+ if (TREE_VEC_LENGTH (declv) > 1
+ && VAR_P (decl)
+ && DECL_DECOMPOSITION_P (decl)
+ && !type_dependent_expression_p (decl))
+   {
+ gcc_assert (DECL_HAS_VALUE_EXPR_P (decl));
+ tree v = DECL_VALUE_EXPR (decl);
+ gcc_assert (TREE_CODE (v) == ARRAY_REF
+ && VAR_P (TREE_OPERAND (v, 0))
+ && DECL_DECOMPOSITION_P (TREE_OPERAND (v, 0)));
+ if (!data.decomps)
+   data.decomps
+ = std::unique_ptr<hash_set<tree>> (new hash_set<tree>);
+ data.decomps->add (TREE_OPERAND (v, 0));
+   }
+ continue;
+   }
 
   if (cond == NULL)
{
@@ -10497,6 +10558,37 @@ finish_omp_for (location_t locus, enum t
   TREE_VEC_ELT (initv, i) = init;
 }
 
+  if (data.decomps)
+{
+  data.fail = false;
+

[PATCH] c++: Deduce range for structured bindings if expression is not type dependent [PR84469]

2022-11-29 Thread Jakub Jelinek via Gcc-patches

Hi!

As shown by the decomp56.C testcase, if the range-for expression
when using structured bindings is not type dependent, we deduce and
finish the structured binding types only when not in a template
(cp_convert_range_for takes care of that); in templates,
do_range_for_auto_deduction is called instead and it doesn't handle
structured bindings.  During instantiation they are handled later,
but during parsing, keeping the structured bindings type
dependent when they shouldn't be changes behavior.
The following patch calls cp_finish_decomp even from
do_range_for_auto_deduction.
The patch regresses the OpenMP g++.dg/gomp/for-21.C test (3 errors
are gone); I'll post an incremental patch for it momentarily.

Otherwise bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2022-11-29  Jakub Jelinek  

PR c++/84469
* parser.cc (do_range_for_auto_deduction): Add DECOMP_FIRST_NAME
and DECOMP_CNT arguments.  Call cp_finish_decomp if DECL
is a structured binding.
(cp_parser_range_for): Adjust do_range_for_auto_deduction caller.
(cp_convert_omp_range_for): Likewise.

* g++.dg/cpp1z/decomp56.C: New test.
* g++.dg/gomp/pr84469.C: New test.

--- gcc/cp/parser.cc.jj 2022-11-19 09:21:14.0 +0100
+++ gcc/cp/parser.cc 2022-11-25 15:39:15.326262120 +0100
@@ -2342,7 +2342,7 @@ static tree cp_parser_c_for
 static tree cp_parser_range_for
   (cp_parser *, tree, tree, tree, bool, unsigned short, bool);
 static void do_range_for_auto_deduction
-  (tree, tree);
+  (tree, tree, tree, unsigned int);
 static tree cp_parser_perform_range_for_lookup
   (tree, tree *, tree *);
 static tree cp_parser_range_for_member_function
@@ -13668,7 +13668,8 @@ cp_parser_range_for (cp_parser *parser,
   if (!type_dependent_expression_p (range_expr)
  /* do_auto_deduction doesn't mess with template init-lists.  */
  && !BRACE_ENCLOSED_INITIALIZER_P (range_expr))
-   do_range_for_auto_deduction (range_decl, range_expr);
+   do_range_for_auto_deduction (range_decl, range_expr, decomp_first_name,
+decomp_cnt);
 }
   else
 {
@@ -13707,7 +13708,8 @@ build_range_temp (tree range_expr)
a shortcut version of cp_convert_range_for.  */
 
 static void
-do_range_for_auto_deduction (tree decl, tree range_expr)
+do_range_for_auto_deduction (tree decl, tree range_expr,
+tree decomp_first_name, unsigned int decomp_cnt)
 {
   tree auto_node = type_uses_auto (TREE_TYPE (decl));
   if (auto_node)
@@ -13727,6 +13729,8 @@ do_range_for_auto_deduction (tree decl,
iter_decl, auto_node,
tf_warning_or_error,
adc_variable_type);
+ if (VAR_P (decl) && DECL_DECOMPOSITION_P (decl))
+   cp_finish_decomp (decl, decomp_first_name, decomp_cnt);
}
 }
 }
@@ -42981,15 +42985,21 @@ cp_convert_omp_range_for (tree _pre
  && !BRACE_ENCLOSED_INITIALIZER_P (init))
{
  tree d = decl;
+ tree decomp_first_name = NULL_TREE;
+ unsigned decomp_cnt = 0;
  if (decl != error_mark_node && DECL_HAS_VALUE_EXPR_P (decl))
{
  tree v = DECL_VALUE_EXPR (decl);
  if (TREE_CODE (v) == ARRAY_REF
  && VAR_P (TREE_OPERAND (v, 0))
  && DECL_DECOMPOSITION_P (TREE_OPERAND (v, 0)))
-   d = TREE_OPERAND (v, 0);
+   {
+ d = TREE_OPERAND (v, 0);
+ decomp_cnt = tree_to_uhwi (TREE_OPERAND (v, 1)) + 1;
+ decomp_first_name = decl;
+   }
}
- do_range_for_auto_deduction (d, init);
+ do_range_for_auto_deduction (d, init, decomp_first_name, decomp_cnt);
}
   cond = global_namespace;
   incr = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp1z/decomp56.C.jj 2022-11-25 15:55:27.673217565 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp56.C   2022-11-25 16:08:28.238930284 +0100
@@ -0,0 +1,29 @@
+// PR c++/84469
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct A {
+  template 
+  void bar () const {}
+  template 
+  void baz () const {}
+};
+struct B { A a; };
+
+template 
+void
+foo ()
+{
+  A a[1][1];
+  for (auto const& [b]: a) // { dg-warning "structured bindings only available with" "" { target c++14_down } }
+b.bar ();
+  B c;
+  auto const& [d] = c; // { dg-warning "structured bindings only available with" "" { target c++14_down } }
+  d.baz ();
+}
+
+int
+main ()
+{
+  foo ();
+}
--- gcc/testsuite/g++.dg/gomp/pr84469.C.jj  2022-11-25 15:57:25.805510359 +0100
+++ gcc/testsuite/g++.dg/gomp/pr84469.C 2022-11-25 16:08:40.123758315 +0100
@@ -0,0 +1,24 @@
+// PR c++/84469
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct A {
+  template 
+  void bar () const {}

[PATCH] tree-optimization/106995 - if-conversion and vanishing loops

2022-11-29 Thread Richard Biener via Gcc-patches

When we version loops for vectorization during if-conversion it
can happen that either loop vanishes because we run some VN and
CFG cleanup.  If the to-be vectorized part vanishes we already
redirect the versioning condition to the original loop.  The following
does the same in case the original loop vanishes as happened
for the testcase in the bug in the past (but no longer).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106995
* tree-if-conv.cc (pass_if_conversion::execute): Also redirect the
versioning condition to the original loop if this very loop
vanished during CFG cleanup.
---
 gcc/tree-if-conv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 34bb507ff3b..64b20b4a9e1 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3761,7 +3761,8 @@ pass_if_conversion::execute (function *fun)
   if (!gimple_bb (g))
continue;
   unsigned ifcvt_loop = tree_to_uhwi (gimple_call_arg (g, 0));
-  if (!get_loop (fun, ifcvt_loop))
+  unsigned orig_loop = tree_to_uhwi (gimple_call_arg (g, 1));
+  if (!get_loop (fun, ifcvt_loop) || !get_loop (fun, orig_loop))
{
  if (dump_file)
fprintf (dump_file, "If-converted loop vanished\n");
-- 
2.35.3

Fix PR ada/107810

2022-11-29 Thread Eric Botcazou via Gcc-patches

This just makes the pattern matching more robust.

Tested on SPARC64/Solaris and x86-64/Linux, applied on the mainline.


2022-11-29  Eric Botcazou  

PR ada/107810
* gnat.dg/unchecked_convert9.adb: Adjust pattern.

-- 
Eric Botcazou

diff --git a/gcc/testsuite/gnat.dg/unchecked_convert9.adb b/gcc/testsuite/gnat.dg/unchecked_convert9.adb
index a01584f704d..5d12d623a9c 100644
--- a/gcc/testsuite/gnat.dg/unchecked_convert9.adb
+++ b/gcc/testsuite/gnat.dg/unchecked_convert9.adb
@@ -11,4 +11,4 @@ package body Unchecked_Convert9 is
 
 end Unchecked_Convert9;
 
--- { dg-final { scan-rtl-dump-times "set \\(mem/v" 1 "final" } }
+-- { dg-final { scan-rtl-dump-times "set \\(mem/v/c" 1 "final" } }

Re: [PATCH] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-11-29 Thread Manolis Tsamis

Hi all,

Based on everyone's comments, I have sent a v2 of this patch, which can
be found here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607472.html

As per Richard's comments, the pattern now uses vec_cond_expr instead and
includes the other fixes that were requested.

Also, based on Tamar's suggestion, I have made it work with poly_int instead of
aborting for VLA vectors.

I would appreciate any further feedback on the new version.

Manolis

On Tue, Nov 22, 2022 at 12:57 PM Tamar Christina
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Tuesday, November 22, 2022 10:35 AM
> > To: Tamar Christina 
> > Cc: Richard Biener ; mtsamis
> > ; GCC Patches ;
> > jiangning@amperecomputing.com
> > Subject: Re: [PATCH] Add pattern to convert vector shift + bitwise and +
> > multiply to vector compare in some cases.
> >
> > Richard & Tamar,
> >
> > On Fri, 26 Aug 2022 at 15:29, Tamar Christina 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  > > > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> > > > Biener via Gcc-patches
> > > > Sent: Friday, August 26, 2022 10:08 AM
> > > > To: mtsamis 
> > > > Cc: GCC Patches ;
> > > > jiangning@amperecomputing.com; Philipp Tomsich
> > > > 
> > > > Subject: Re: [PATCH] Add pattern to convert vector shift + bitwise
> > > > and + multiply to vector compare in some cases.
> > > >
> > > > On Sat, Aug 13, 2022 at 11:59 AM mtsamis 
> > wrote:
> > > > >
> > > > > When using SWAR (SIMD in a register) techniques a comparison
> > > > > operation within such a register can be made by using a
> > > > > combination of shifts, bitwise and and multiplication. If code
> > > > > using this scheme is vectorized then there is potential to replace
> > > > > all these operations with a single vector comparison, by
> > > > > reinterpreting the vector types to
> > > > match the width of the SWAR register.
> > > > >
> > > > > For example, for the test function packed_cmp_16_32, the original
> > > > generated code is:
> > > > >
> > > > > ldr q0, [x0]
> > > > > add w1, w1, 1
> > > > > ushrv0.4s, v0.4s, 15
> > > > > and v0.16b, v0.16b, v2.16b
> > > > > shl v1.4s, v0.4s, 16
> > > > > sub v0.4s, v1.4s, v0.4s
> > > > > str q0, [x0], 16
> > > > > cmp w2, w1
> > > > > bhi .L20
> > > > >
> > > > > with this pattern the above can be optimized to:
> > > > >
> > > > > ldr q0, [x0]
> > > > > add w1, w1, 1
> > > > > cmltv0.8h, v0.8h, #0
> > > > > str q0, [x0], 16
> > > > > cmp w2, w1
> > > > > bhi .L20
> > > > >
> > > > > The effect is similar for x86-64.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * match.pd: Simplify vector shift + bit_and + multiply in 
> > > > > some cases.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
> > > > >
> > > > > Signed-off-by: mtsamis 
> > > > > ---
> > > > >  gcc/match.pd  | 57 +++
> > > > >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72
> > > > +++
> > > > >  2 files changed, 129 insertions(+)  create mode 100644
> > > > > gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > > > 8bbc0dbd5cd..5c768a94846 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -301,6 +301,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >  (view_convert (bit_and:itype (view_convert @0)
> > > > >  (ne @1 { build_zero_cst (type);
> > > > > })))
> > > > >
> > > > > +/* In SWAR (SIMD in a register) code a comparison of packed data can
> > > > > +   be consturcted with a particular combination of shift, bitwise 
> > > > > and,
> > > > > +   and multiplication by constants.  If that code is vectorized we 
> > > > > can
> > > > > +   convert this pattern into a more efficient vector comparison.
> > > > > +*/ (simplify  (mult (bit_and (rshift @0 @1) @2) @3)
> > > >
> > > > You should restrict the pattern a bit more, below you use
> > > > uniform_integer_cst_p and also require a vector type thus
> > > >
> > > >   (simplify
> > > >(mult (bit_and (rshift @0 VECTOR_CST@1) VECTOR_CST@2)
> > > > VECTOR_CST@3)
> > > >
> > > >
> > > > > + (with {
> > > > > +   tree op_type = TREE_TYPE (@0);
> > > >
> > > > that's the same as 'type' which is already available.
> > > >
> > > > > +   tree rshift_cst = NULL_TREE;
> > > > > +   tree bit_and_cst = NULL_TREE;
> > > > > +   tree mult_cst = NULL_TREE;
> > > > > +  }
> > > > > +  /* Make sure we're working with vectors and uniform vector
> > > > > + constants.  */  (if (VECTOR_TYPE_P (op_type)
> > > > > +   && (rshift_cst = uniform_integer_cst_p (@1))
> > > > > +   && (bit_and_cst = uniform_integer_cst_p

[PATCH v2] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-11-29 Thread Manolis Tsamis

When using SWAR (SIMD in a register) techniques, a comparison operation within
such a register can be made by using a combination of shifts, bitwise AND, and
multiplication. If code using this scheme is vectorized, then there is potential
to replace all these operations with a single vector comparison, by
reinterpreting the vector types to match the width of the SWAR register.

For example, for the test function packed_cmp_16_32, the original generated
code is:

ldr q0, [x0]
add w1, w1, 1
ushr v0.4s, v0.4s, 15
and v0.16b, v0.16b, v2.16b
shl v1.4s, v0.4s, 16
sub v0.4s, v1.4s, v0.4s
str q0, [x0], 16
cmp w2, w1
bhi .L20

with this pattern the above can be optimized to:

ldr q0, [x0]
add w1, w1, 1
cmlt v0.8h, v0.8h, #0
str q0, [x0], 16
cmp w2, w1
bhi .L20

The effect is similar for x86-64.
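
To make the SWAR idiom concrete, here is a scalar sketch for two 16-bit lanes packed in a 32-bit word. The function names are hypothetical and this is not the body of the packed_cmp_16_32 test; the constants follow the relation the pattern checks (shift by 15, a mask with the lowest bit of each 16-bit lane set, and a multiplier of 2^16 - 1).

```c
#include <stdint.h>

/* For each 16-bit lane packed in X, produce 0xffff if the lane's top
   (sign) bit is set and 0 otherwise, using only shift, AND and multiply.  */
static uint32_t
swar_lt0_16_32 (uint32_t x)
{
  return ((x >> 15) & UINT32_C (0x00010001)) * UINT32_C (0xffff);
}

/* Reference: the same result computed lane by lane with explicit tests
   of each lane's sign bit.  */
static uint32_t
swar_lt0_16_32_ref (uint32_t x)
{
  uint32_t lo = (x & UINT32_C (0x00008000)) ? UINT32_C (0xffff) : 0;
  uint32_t hi = (x & UINT32_C (0x80000000)) ? UINT32_C (0xffff) : 0;
  return (hi << 16) | lo;
}
```

The shift moves each lane's sign bit to the lane's lowest position, the mask keeps only those bits, and the multiplication spreads each kept bit across its whole lane. That is the same result as a per-lane "less than zero" comparison, which is what the cmlt instruction computes on 16-bit elements.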

Signed-off-by: Manolis Tsamis 

gcc/ChangeLog:

* match.pd: Simplify vector shift + bit_and + multiply in some cases.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/swar_to_vec_cmp.c: New test.

---

Changes in v2:
- Changed pattern to use vec_cond_expr.
- Changed pattern to work with VLA vector.
- Added more checks and comments.

 gcc/match.pd  | 60 
 .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
 2 files changed, 132 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 67a0a682f31..05e7fc79ba8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -301,6 +301,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (view_convert (bit_and:itype (view_convert @0)
 (ne @1 { build_zero_cst (type); })))
 
+/* In SWAR (SIMD in a register) code a signed comparison of packed data can
+   be constructed with a particular combination of shift, bitwise and,
+   and multiplication by constants.  If that code is vectorized we can
+   convert this pattern into a more efficient vector comparison.  */
+(simplify
+ (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
+   uniform_integer_cst_p@2)
+uniform_integer_cst_p@3)
+ (with {
+   tree rshift_cst = uniform_integer_cst_p (@1);
+   tree bit_and_cst = uniform_integer_cst_p (@2);
+   tree mult_cst = uniform_integer_cst_p (@3);
+  }
+  /* Make sure we're working with vectors and uniform vector constants.  */
+  (if (VECTOR_TYPE_P (type)
+   && tree_fits_uhwi_p (rshift_cst)
+   && tree_fits_uhwi_p (mult_cst)
+   && tree_fits_uhwi_p (bit_and_cst))
+   /* Compute what constants would be needed for this to represent a packed
+  comparison based on the shift amount denoted by RSHIFT_CST.  */
+   (with {
+ HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
+ poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
+ poly_int64 vec_bits = vec_elem_bits * vec_nelts;
+
+ unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
+ unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
+ cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
+ target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
+
+ mult_i = tree_to_uhwi (mult_cst);
+ bit_and_i = tree_to_uhwi (bit_and_cst);
+ target_bit_and_i = 0;
+
+ /* The bit pattern in BIT_AND_I should be a mask for the least
+significant bit of each packed element that is CMP_BITS wide.  */
+ for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
+   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
+}
+(if ((exact_log2 (cmp_bits_i)) >= 0
+&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
+&& multiple_p (vec_bits, cmp_bits_i)
+&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
+&& target_mult_i == mult_i
+&& target_bit_and_i == bit_and_i)
+ /* Compute the vector shape for the comparison and check if the target is
+   able to expand the comparison with that type.  */
+ (with {
+   /* We're doing a signed comparison.  */
+   tree cmp_type = build_nonstandard_integer_type (cmp_bits_i, 0);
+   poly_int64 vector_type_nelts = exact_div (vec_bits, cmp_bits_i);
+   tree vector_cmp_type = build_vector_type (cmp_type, vector_type_nelts);
+   tree zeros = build_zero_cst (vector_cmp_type);
+   tree ones = build_all_ones_cst (vector_cmp_type);
+  }
+  (if (expand_vec_cmp_expr_p (vector_cmp_type, vector_cmp_type, LT_EXPR))
+   (view_convert:type (vec_cond (lt (view_convert:vector_cmp_type @0)
+{ zeros; })
+  { ones; } { zeros; })
+
 (for cmp (gt ge lt le)
  outp (convert convert negate negate)
  outn (negate negate convert convert)
diff --git a/gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c 
b/gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
new file
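
For readers unfamiliar with the idiom, here is a minimal sketch in plain C of the scalar SWAR computation the match.pd pattern above looks for, using 8-bit lanes packed in a 32-bit word (shift 7, mask 0x01010101, multiplier 0xFF, i.e. CMP_BITS = 8).  The function names are made up for illustration and are not taken from the patch or its testcase.

```c
#include <stdint.h>

/* SWAR form: move each lane's sign bit to the lane's lowest bit,
   mask everything else away, then multiply by 0xFF to smear the
   bit across the whole lane.  0x01 * 0xFF fits in 8 bits, so no
   carry leaks into the neighbouring lane.  */
static uint32_t
swar_lt0 (uint32_t x)
{
  return ((x >> 7) & 0x01010101u) * 0xFFu;
}

/* Reference form: per-lane "signed value < 0 ? all-ones : zero".  */
static uint32_t
ref_lt0 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    {
      uint8_t lane = (uint8_t) (x >> (8 * i));
      /* lane >= 0x80 means the lane is negative when viewed as int8_t.  */
      if (lane >= 0x80)
        r |= (uint32_t) 0xFF << (8 * i);
    }
  return r;
}
```

Both functions return the same word for every input, which is why the multiply/and/shift chain can be replaced by a vector `lt` against zero followed by a select between all-ones and zero.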

[PATCH] range-op-float: Fix up multiplication and division reverse operation [PR107879]

2022-11-29 Thread Jakub Jelinek via Gcc-patches

Hi!

While for the normal cases it seems to be correct to implement
reverse multiplication (op1_range/op2_range) through division
with float_binary_op_range_finish, reverse division (op1_range)
through multiplication with float_binary_op_range_finish or
(op2_range) through division with float_binary_op_range_finish,
as e.g. the following testcase shows, it is incorrect for the
corner cases.
Say on the testcase we are doing reverse multiplication, we
have [-0., 0.] range (no NAN) on lhs and VARYING on op1 (or op2).
We implement that through division, because x from
lhs = x * op2
is
x = lhs / op2
For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
0 and 0 / 0 is NAN and ditto 0 / NAN.  And then we just call
float_binary_op_range_finish, which figures out that because lhs
can't be NAN, neither operand can be NAN.  So, the end range is
[-0., 0.].  But that is not correct for the reverse multiplication.
When the result is 0, if op2 can be zero, then x can be anything
(VARYING), to be precise anything but INF (unless result can be NAN),
because anything * 0 is 0 (or NAN for INF).  While if op2 must be
non-zero, then x must be 0.  Of course the sign logic
(signbit(x) = signbit(lhs) ^ signbit(op2)) still holds, so it actually
isn't full VARYING if both lhs and op2 have known sign bits.
And going through other corner cases one by one shows other differences
between what we compute for the corresponding forward operation and
what we should compute for the reverse operations.
The following patch is slightly conservative and includes INF
(in case of result including 0 and not NAN) in the ranges or
0 in the ranges (in case of result including INF and not NAN).
The latter is what happens anyway because we flush denormals to 0,
and the former just not to deal with all the corner cases.
So, the end test is that for reverse multiplication and division
op2_range the cases we need to adjust to VARYING or VARYING positive
or VARYING negative are if lhs and op? ranges both contain 0,
or both contain some infinity, while for division op1_range the
corner case is if lhs range contains 0 and op2 range contains INF or vice
versa.  Otherwise I believe ranges from the corresponding operation
are ok, or could be slightly more conservative (e.g. for
reverse multiplication, if op? range is singleton INF and lhs
range doesn't include any INF, then x's range should be UNDEFINED or
known NAN (depending on if lhs can be NAN), while the division computes
[-0., 0.] +-NAN; or similarly if op? range is only 0 and lhs range
doesn't include 0, division would compute +INF +-NAN, or -INF +-NAN,
or (for lack of multipart franges that could express -INF +INF +-NAN)
just VARYING +-NAN, while again it is UNDEFINED or known NAN).
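
The corner case described above can be checked with ordinary host arithmetic.  The following is a small sketch, assuming IEEE 754 doubles and no -ffast-math; the helper name is hypothetical and has nothing to do with the frange implementation.  It asks whether a given x is a possible op1 for lhs = x * op2.

```c
#include <math.h>
#include <stdbool.h>

/* Return true if X * OP2 yields exactly LHS, including the sign of
   zero, i.e. if X is a value op1 could have had.  */
static bool
mult_consistent (double x, double op2, double lhs)
{
  double p = x * op2;
  if (isnan (lhs))
    return isnan (p);
  return p == lhs && !!signbit (p) == !!signbit (lhs);
}
```

With lhs = +0.0 and op2 = +0.0, x = 42.0 is consistent, so narrowing op1 to [-0., 0.] would be wrong.  x = INFINITY is not (the product is NAN), and x = 42.0 with op2 = -0.0 is not either, because the sign rule signbit(x) = signbit(lhs) ^ signbit(op2) is violated.  With op2 = 1.0 only a zero x is consistent.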

Oh, and by code inspection I found a wrong condition for the division's
known NAN result; due to a thinko it would trigger not just when
both operands are known to be 0 or both are known to be INF, but
when either both are known to be 0, or at least one is known to be INF.
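
As a quick sanity check of which operand combinations really give a NAN quotient, here is a tiny sketch on host doubles (assuming IEEE 754 semantics; the helper name is made up for illustration):

```c
#include <math.h>
#include <stdbool.h>

/* True if A / B is NAN under IEEE 754 arithmetic.  */
static bool
quotient_is_nan (double a, double b)
{
  return isnan (a / b);
}
```

0 / 0 and INF / INF are NAN, but INF / 1 is INF and 1 / INF is 0, so a single infinite operand is not enough to conclude a known NAN result.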

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-29  Jakub Jelinek  

PR tree-optimization/107879
* range-op-float.cc (foperator_mult::op1_range): If both
lhs and op2 ranges contain zero or both ranges contain
some infinity, set r range to zero_to_inf_range depending on
signbit_known_p.
(foperator_div::op2_range): Similarly for lhs and op1 ranges.
(foperator_div::op1_range): If lhs range contains zero and op2
range contains some infinity or vice versa, set r range to
zero_to_inf_range depending on signbit_known_p.
(foperator_div::rv_fold): Fix up condition for returning known NAN.

--- gcc/range-op-float.cc.jj   2022-11-18 09:00:44.371322999 +0100
+++ gcc/range-op-float.cc   2022-11-28 19:45:50.347869350 +0100
@@ -2143,8 +2143,30 @@ public:
 range_op_handler rdiv (RDIV_EXPR, type);
 if (!rdiv)
   return false;
-return float_binary_op_range_finish (rdiv.fold_range (r, type, lhs, op2),
-r, type, lhs);
+bool ret = rdiv.fold_range (r, type, lhs, op2);
+if (ret == false)
+  return false;
+const REAL_VALUE_TYPE &lhs_lb = lhs.lower_bound ();
+const REAL_VALUE_TYPE &lhs_ub = lhs.upper_bound ();
+const REAL_VALUE_TYPE &op2_lb = op2.lower_bound ();
+const REAL_VALUE_TYPE &op2_ub = op2.upper_bound ();
+if ((contains_zero_p (lhs_lb, lhs_ub) && contains_zero_p (op2_lb, op2_ub))
+   || ((real_isinf (&lhs_lb) || real_isinf (&lhs_ub))
+   && (real_isinf (&op2_lb) || real_isinf (&op2_ub))))
+  {
+   // If both lhs and op2 could be zeros or both could be infinities,
+   // we don't know anything about op1 except maybe for the sign
+   // and perhaps if it can be NAN or not.
+   REAL_VALUE_TYPE lb, ub;
+   int signbit_known = signbit_known_p (lhs_lb, lhs_ub, op2_lb, op2_ub);
+   zero_to_inf_range (lb, ub, signbit_known);
+

[PATCH] ipa/107897 - avoid property verification ICE after error

2022-11-29 Thread Richard Biener via Gcc-patches

The target clone pass is the only small IPA pass that doesn't disable
itself after errors but has properties whose verification can fail
because we cut off build SSA passes after errors.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR ipa/107897
* multiple_target.cc (pass_target_clone::gate): Disable
after errors.
---
 gcc/multiple_target.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index 77e0f21dd05..fd88c22b002 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -539,7 +539,8 @@ public:
 bool
 pass_target_clone::gate (function *)
 {
-  return true;
+  /* If there were any errors avoid pass property verification errors.  */
+  return !seen_error ();
 }
 
 } // anon namespace
-- 
2.35.3

[PATCH] tree-optimization/107898 - ICE with -Walloca-larger-than

2022-11-29 Thread Richard Biener via Gcc-patches

The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than, which uses irange for its checks and doesn't
like mismatched types.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107898
* gimple-ssa-warn-alloca.cc (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.
---
 gcc/gimple-ssa-warn-alloca.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/gimple-ssa-warn-alloca.cc b/gcc/gimple-ssa-warn-alloca.cc
index 83a241a3a4b..dcc62ca77bf 100644
--- a/gcc/gimple-ssa-warn-alloca.cc
+++ b/gcc/gimple-ssa-warn-alloca.cc
@@ -217,6 +217,7 @@ alloca_call_type (gimple *stmt, bool is_vla)
   int_range_max r;
   if (warn_limit_specified_p (is_vla)
   && TREE_CODE (len) == SSA_NAME
+  && types_compatible_p (TREE_TYPE (len), size_type_node)
   && get_range_query (cfun)->range_of_expr (r, len, stmt)
   && !r.varying_p ())
 {
-- 
2.35.3

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-29 Thread Jiufu Guo via Gcc-patches

Hi Segher,

Thanks for your review!

Segher Boessenkool  writes:

> On Mon, Nov 28, 2022 at 11:37:34AM +0800, Jiufu Guo wrote:
>> Segher Boessenkool  writes:
>> > On Fri, Nov 25, 2022 at 04:11:49PM +0800, Kewen.Lin wrote:
>> >> on 2022/10/26 19:40, Jiufu Guo wrote:
>> >> for "li/lis + oris/xoris", I interpreted it into four combinations:
>> >> 
>> >>li + oris, lis + oris, li + xoris, lis + xoris.
>> >> 
>> >> not sure just me interpreting like that, but the actual combinations
>> >> which this patch adopts are:
>> >> 
>> >>li + oris, li + xoris, lis + xoris.
>> >> 
>> >> It's a bit off, but not a big deal, up to you to reword it or not.  :)
>> >
>> > The first two are obvious, but the last one is almost never a good idea,
>> > there usually are better ways to do the same.  I cannot even think of
>> > any case where this is best?  A lis;rl* is always preferred (it can
>> > optimise better, be combined with other insns).
>> I understand your point here.  The first two: 'li' for the lowest 16 bits,
>> 'oris/xoris' for the next 16 bits.
>> 
>> While for 'lis + xoris', it may not be obvious, because both 'lis' and
>> 'xoris' operate on bits 17-31.
>> 'lis + xoris' is for the case "32(1) || 1(0) || 15(x) || 16(0)". xoris is
>> used to clear bit 31.  This case seems hard to support with 'rlxx'.
>
> Please put that in a separate patch?  First do a patch with just
> lis;x?oris.  They are unrelated and different in almost every way.

Sure, Thanks for the advice!
>
>> I happened to find this case when analyzing what kinds of constants can be
>> built by two instructions.  I checked the possible combinations:
>> "addi/addis" + "neg/ori/../xoris/rldX/rlwX/../sradi/extswsli" (those
>> instructions which accept one register and one immediate).
>> 
>> I also drafted the patch to use "li/lis+rlxx" to build constants.
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601277.html
>
> Those seem to do many things in one patch as well :-(  It is very hard
> to review such things, it takes many hours each to do properly.
Sorry, I will try to separate them into smaller pieces!

BR,
Jeff (Jiufu)
>
>
> Segher
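
To make the constant shapes discussed in this thread concrete, here is a small C model of the four instructions on a 64-bit register.  It is a sketch of the architectural semantics only (li/lis sign-extend their 16-bit immediate, oris/xoris use a zero-extended immediate shifted left by 16); the helper names are hypothetical and this is not code from the patch.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 64 bits.  */
static int64_t
sext16 (uint16_t v)
{
  return (int64_t) (v ^ 0x8000u) - 0x8000;
}

/* li rD,SI: load the sign-extended immediate.  */
static uint64_t
ppc_li (uint16_t si)
{
  return (uint64_t) sext16 (si);
}

/* lis rD,SI: load the immediate shifted left by 16, sign-extended.  */
static uint64_t
ppc_lis (uint16_t si)
{
  return (uint64_t) (sext16 (si) * 65536);
}

/* oris rA,rS,UI: OR with the zero-extended immediate << 16.  */
static uint64_t
ppc_oris (uint64_t r, uint16_t ui)
{
  return r | ((uint64_t) ui << 16);
}

/* xoris rA,rS,UI: XOR with the zero-extended immediate << 16.  */
static uint64_t
ppc_xoris (uint64_t r, uint16_t ui)
{
  return r ^ ((uint64_t) ui << 16);
}
```

For example, ppc_xoris (ppc_lis (0x9234), 0x8000) gives 0xFFFFFFFF12340000: lis produces the 32 leading ones by sign extension and xoris then clears bit 31, which is the "32(1) || 1(0) || 15(x) || 16(0)" shape mentioned above.  ppc_oris (ppc_li (0x1234), 0x5678) gives 0x56781234, the plain "li + oris" case.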

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

2022-11-29 Thread Kito Cheng via Gcc-patches

> >>> Yeah, I personally want to support RVV intrinsics in GCC 13, as the RVV
> >>> intrinsics are going to be released next week.
> >>
> >> OK, that's fine with me -- I was leaning that way, and I think Jeff only
> >> had a weak opposition.  Are there any more changes required outside the
> >> RISC-V backend?  Those would be the most controversial and are already
> >> late, but if it's only backend stuff at this point then I'm OK taking
> >> the risk for a bit longer.
> >>
> >> Jeff?
> > It's not ideal, but I can live with the bits going into gcc-13 as long
> > as they don't bleed out of the RISC-V port.
>
> Ya, that's kind of what happens every release though (and not just in
> GCC, it's that way for everything).  Maybe for gcc-14 we can commit to
> taking the stage1/stage3 split seriously in RISC-V land?
>
> It's early enough that nobody should be surprised, and even if we don't
> need to do it as per the GCC rules we're going to go crazy if we keep
> letting things go until the last minute like this.  I think the only
> real fallout we've had so far was the B stuff in binutils, but we've
> been exceedingly close to broken releases way too many times and it's
> going to bite us at some point.

I hope we can follow the GCC development rules in GCC 14 too.  We didn't have
enough engineering resources and community in RISC-V GNU land before, but now
more people have joined the development and review work, so I believe that
could be improved next year.



Hi Jeff:

Thanksgiving holiday is over, but I guess it's never too late to say thanks.
Thank you for joining the RISC-V world and helping review lots of patches :)

[PATCH (pushed)] re-run configure

2022-11-29 Thread Martin Liška


gcc/ChangeLog:

* configure: Regenerate.
---
 gcc/configure | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index aa0960991c9..a742d4406a8 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7843,7 +7843,8 @@ if test x${enable_multiarch} = xauto; then
   fi
 fi
 if test x${enable_multiarch} = xyes; then
-  $as_echo "#define ENABLE_MULTIARCH 1" >>confdefs.h
+
+$as_echo "#define ENABLE_MULTIARCH 1" >>confdefs.h
 
 fi

 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for multiarch configuration" >&5
@@ -19710,7 +19711,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19713 "configure"
+#line 19714 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

@@ -19816,7 +19817,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19819 "configure"
+#line 19820 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

--
2.38.1

Re: Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-29 Thread Richard Biener via Gcc-patches

On Tue, Nov 29, 2022 at 2:15 AM HAO CHEN GUI  wrote:
>
> Hi Richard,
>
> On 2022/11/29 2:46, Richard Biener wrote:
> > Anyhow - my question still stands - what's the fallback for the callers
> > that do not check for failure?  How are we sure we're not running into
> > these when relaxing the requirement that a MODE_CC prepare_cmp_insn
> > must not fail?
>
> I examined the code and found that currently callers should be fine with
> returning a NULL_RTX for MODE_CC processing. prepare_cmp_insn is called
> by the following callers.
>
> 1 gen_cond_trap, which doesn't use MODE_CC
> 2 prepare_cmp_insn itself, where the call comes after MODE_CC processing,
> so it never hits MODE_CC
> 3 emit_cmp_and_jump_insns, which doesn't use MODE_CC
> 4 emit_conditional_move, which checks whether the output is null
> 5 emit_conditional_add, which checks whether the output is null

Thanks for checking.

> Not sure if I missed something. Looking forward to your advice.

I'd then say the non-presence of the optab should be handled the same
as a mismatching predicate as the other comment on the patch indicates.

thanks,
Richard.

> Thanks a lot
> Gui Haochen
>

Re: [PATCH] Fortran: intrinsic MERGE shall use all its arguments [PR107874]

2022-11-29 Thread Paul Richard Thomas via Gcc-patches

Hi Harald,

It looks good to me.

Thanks to you and Steve for the patch.

Paul


On Mon, 28 Nov 2022 at 20:05, Harald Anlauf via Fortran 
wrote:

> Dear all,
>
> as reported, the Fortran standard requires all actual argument
> expressions to be evaluated (e.g. F2018:15.5.3).
>
> There were two cases for intrinsic MERGE where we failed to do so:
>
> - non-constant mask; Steve provided the patch
>
> - constant scalar mask; we need to be careful to simplify only if
>   the argument on the "other" path is known to be constant so that
>   it does not have side-effects and can be immediately removed.
>
> The latter change needed a correction of a sub-test of testcase
> merge_init_expr_2.f90, which should not have been simplified
> the way the original author assumed.  I decided to modify the
> test in such a way that simplification is valid and provides
> the expected pattern.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein
