[testsuite] note pitfall in how outputs.exp sets gld

2023-06-22 Thread Alexandre Oliva via Gcc-patches


This patch documents a glitch in gcc.misc-tests/outputs.exp: it checks
whether the linker is GNU ld, and uses that to decide whether to
expect collect2 to create .ld1_args files under -save-temps, but
collect2 bases that decision on whether HAVE_GNU_LD is set, which may
be false zero if the linker in use is GNU ld.  Configuring
--with-gnu-ld fixes this misalignment.  Without that, atsave tests are
likely to fail, because without HAVE_GNU_LD, collect2 won't use @file
syntax to run the linker (so it won't create .ld1_args files).

Long version: HAVE_GNU_LD is set when (i) DEFAULT_LINKER is set during
configure, pointing at GNU ld; (ii) --with-gnu-ld is passed to
configure; or (iii) config.gcc sets gnu_ld=yes.  If a port doesn't set
gnu_ld, and the toolchain isn't configured so as to assume GNU ld,
configure and thus collect2 conservatively assume the linker doesn't
support @file arguments.

But outputs.exp can't see how configure set HAVE_GNU_LD (it may be
used to test an installed compiler), and upon finding that the linker
used by the compiler is GNU ld, it will expect collect2 to use @file
arguments when running the linker.  If that assumption doesn't hold,
atsave tests will fail.

Does it make sense to put this in?  I'd like to preserve this knowledge
somehow, and I suppose this would be most useful for someone observing
these failures and trying to figure out why they come about, so this
seems the best place for them.  Ok to install?


for  gcc/testsuite/ChangeLog

* outputs.exp (gld): Note a known mismatch and record a
workaround.
---
 gcc/testsuite/gcc.misc-tests/outputs.exp |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp 
b/gcc/testsuite/gcc.misc-tests/outputs.exp
index 7ee355034ccaf..9f44cbdb0b5d5 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -50,7 +50,15 @@ if !$skip_lto {
 set ltop [check_linker_plugin_available]
 }
 
-# Check for GNU LD.  Some files like .ld1_args depend on this.
+# Check for GNU LD.  Some files like .ld1_args depend on this.  This
+# should really be testing whether HAVE_GNU_LD was set by configure.
+# If we find GNU ld here, but the compiler wasn't configured
+# --with-gnu-ld or with DEFAULT_LINKER pointing at GNU ld, on a target
+# that doesn't set gnu_ld=yes unconditionally, configure and thus
+# collect2 will conservatively assume there's no support for @file in
+# the linker, but our atfile tests will expect ld1_args files to be
+# created, and thus fail.  Configuring the compiler --with-gnu-ld
+# fixes this.
 set gld [check_effective_target_gld]
 
 # Prepare additional options to be used for linking.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-22 Thread Andrew Pinski via Gcc-patches
On Thu, Jun 22, 2023 at 4:12 AM Philipp Tomsich
 wrote:
>
> From: Manolis Tsamis 
>
> Fixes: 6a2e8dcbbd4bab3
>
> Propagation for the stack pointer in regcprop was enabled in
> 6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
> stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).
>
> This fix adds special handling for stack_pointer_rtx in the places
> where maybe_mode_change is called. This also adds an check in
> maybe_mode_change to return the stack pointer only when the requested
> mode matches the mode of stack_pointer_rtx.

The only comment I have is on the testcase.

>
> PR 110308
>
> gcc/ChangeLog:
>
> * regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
> (find_oldest_value_reg): Special handling of stack_pointer_rtx.
> (copyprop_hardreg_forward_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr110308.C: New test.
>
> Signed-off-by: Manolis Tsamis 
> Signed-off-by: Philipp Tomsich 
>
> ---
> This addresses both the PRs (110308 and 110313) and was confirmed to
> resolve the AArch64 bootstrap issue reported by Thiago.
>
> OK for trunk?
>
>  gcc/regcprop.cc | 43 +
>  gcc/testsuite/g++.dg/torture/pr110308.C | 30 +
>  2 files changed, 60 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C
>
> diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
> index 6cbfadb181f..fe75b7f1fa0 100644
> --- a/gcc/regcprop.cc
> +++ b/gcc/regcprop.cc
> @@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
> copy_mode,
>   It's unclear if we need to do the same for other special registers.  */
>if (regno == STACK_POINTER_REGNUM)
>  {
> -  if (orig_mode == new_mode)
> +  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
> return stack_pointer_rtx;
>else
> return NULL_RTX;
> @@ -487,9 +487,14 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, 
> struct value_data *vd)
>new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, 
> regno);
>if (new_rtx)
> {
> - ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
> - REG_ATTRS (new_rtx) = REG_ATTRS (reg);
> - REG_POINTER (new_rtx) = REG_POINTER (reg);
> + if (new_rtx != stack_pointer_rtx)
> +   {
> + ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
> + REG_ATTRS (new_rtx) = REG_ATTRS (reg);
> + REG_POINTER (new_rtx) = REG_POINTER (reg);
> +   }
> + else if (REG_POINTER (new_rtx) != REG_POINTER (reg))
> +   return NULL_RTX;
>   return new_rtx;
> }
>  }
> @@ -965,15 +970,27 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
> value_data *vd)
>
>   if (validate_change (insn, _SRC (set), new_rtx, 0))
> {
> - ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
> - REG_ATTRS (new_rtx) = REG_ATTRS (src);
> - REG_POINTER (new_rtx) = REG_POINTER (src);
> - if (dump_file)
> -   fprintf (dump_file,
> -"insn %u: replaced reg %u with %u\n",
> -INSN_UID (insn), regno, REGNO (new_rtx));
> - changed = true;
> - goto did_replacement;
> + bool can_change;
> + if (new_rtx != stack_pointer_rtx)
> +   {
> + ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
> + REG_ATTRS (new_rtx) = REG_ATTRS (src);
> + REG_POINTER (new_rtx) = REG_POINTER (src);
> + can_change = true;
> +   }
> + else
> +   can_change
> + = (REG_POINTER (new_rtx) == REG_POINTER (src));
> +
> + if (can_change)
> +   {
> + if (dump_file)
> +   fprintf (dump_file,
> +"insn %u: replaced reg %u with %u\n",
> +INSN_UID (insn), regno, REGNO (new_rtx));
> + changed = true;
> + goto did_replacement;
> +   }
> }
>   /* We need to re-extract as validate_change clobbers
>  recog_data.  */
> diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
> b/gcc/testsuite/g++.dg/torture/pr110308.C
> new file mode 100644
> index 000..ddd30d4fc3f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr110308.C
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g2 -O2" } */

Since this is in `g++.dg/torture`, you don't need dg-options as the

Re: [PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-22 Thread Ian Lance Taylor via Gcc-patches
On Thu, Jun 22, 2023, 4:47 PM Peter Bergner  wrote:

> On 6/22/23 6:37 PM, Peter Bergner via Gcc-patches wrote:
> > On 6/16/23 12:01 PM, Ian Lance Taylor via Gcc-patches wrote:
> >> On Fri, Jun 16, 2023 at 9:00 AM Paul E. Murphy via Gcc-patches
> >>  wrote:
> >>>
> >>> TARGET_AIX is defined to a non-zero value on linux and maybe other
> >>> powerpc64le targets.  This leads to unexpected behavior such as
> >>> dropping the .go_export section when linking a shared library
> >>> on linux/powerpc64le.
> >>>
> >>> Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
> >>>
> >>> Fixes golang/go#60798.
> >>>
> >>> gcc/go/ChangeLog:
> >>>
> >>> * go-backend.cc [TARGET_AIX]: Rename and update usage to
> >>> TARGET_AIX_OS.
> >>> * go-lang.cc: Likewise.
> >>
> >> This is OK.
> >>
> >> Thanks.
> >>
> >> Ian
> >
> > I pushed this to trunk for Paul.
>
> I see this is broken on the release branches too.  Are backports ok
> after some burn-in on trunk?
>

Yes.  Thanks.

Ian

>


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-22 Thread Ben Boeckel via Gcc-patches
On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
> Why isn't this covered by the existing code in preprocessed_module?

It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name
in v7 once I finish up testing). This logic cannot be used for p1689
output because it assumes the location and names of CMI files (`.c++m`)
that will be necessary (it is related to the `CXX_IMPORTS +=` GNU
make/libcody extensions that will, e.g., cause `ninja` to choke if it is
read from `-MF` output as it uses "fancier" Makefile syntax than tools
that are not actually `make` are going to be willing to support). This
codepath is the *actual* filename being read at compile time and is
relevant at all times; it may duplicate what `preprocessed_module` sets
up.

I'm also realizing that this is why I need to pass `-fdeps-format=p1689`
when compiling…there may need to be another, more idiomatic, way to
disable this additional syntax usage in `-MF` output.

--Ben


Introduce -finline-stringops (was: Re: [RFC] Introduce -finline-memset-loops)

2023-06-22 Thread Alexandre Oliva via Gcc-patches
On Jun  2, 2023, Alexandre Oliva  wrote:

> Introduce -finline-stringops

Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620472.html

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH v5] tree-ssa-sink: Improve code sinking pass

2023-06-22 Thread Peter Bergner via Gcc-patches
On 6/1/23 11:54 PM, Ajit Agarwal via Gcc-patches wrote:
> 
> 
> On 01/06/23 2:06 pm, Bernhard Reutner-Fischer wrote:
>> On 1 June 2023 09:20:08 CEST, Ajit Agarwal  wrote:
>>> Hello All:
>>>
>>> This patch improves code sinking pass to sink statements before call to 
>>> reduce
>>> register pressure.
>>> Review comments are incorporated.
>>
>> Hi Ajit!
>>
>> I had two comments for v4 that you did not address in v5 or followed up.
>> thanks,
> 
> Which comments I didn't address. Please let me know.

I believe he's referring to these two comments:

  > + && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb))
  > +   {
  > + if (def_use_same_block (use))
  > +   return best_bb;
  > +
  > + return new_best_bb;
  > +   }
  > +   return best_bb;
  > +}
  >  

  Many returns.
  I'd have said
  && !def_use_same_block (use)
return new_best_bb;
  else
return best_bb;

  and rephrase the comment above list of Things to consider accordingly.


I agree with Bernhard's comment that it could be rewritten to be clearer.
Although, the "else" isn't really required.  So Bernhard's version would
look like:

  if (new_best_bb
  && use
  && new_best_bb != best_bb
  && new_best_bb != early_bb
  && !is_gimple_call (stmt)
  && gsi_end_p (gsi_start_phis (new_best_bb))
  && gimple_bb (use) != early_bb
  && !is_gimple_call (use)
  && dominated_by_p (CDI_POST_DOMINATORS, new_best_bb, gimple_bb (use))
  && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb)
  && !def_use_same_block (use))
return new_best_bb;
  else
return best_bb;

...or just:

  if (new_best_bb
  && use
  && new_best_bb != best_bb
  && new_best_bb != early_bb
  && !is_gimple_call (stmt)
  && gsi_end_p (gsi_start_phis (new_best_bb))
  && gimple_bb (use) != early_bb
  && !is_gimple_call (use)
  && dominated_by_p (CDI_POST_DOMINATORS, new_best_bb, gimple_bb (use))
  && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb)
  && !def_use_same_block (use))
return new_best_bb;

  return best_bb;


Either works.


Peter



[PATCH] c++: Add support for -std={c,gnu}++2{c,6}

2023-06-22 Thread Marek Polacek via Gcc-patches
It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

I was puzzled to see that -std=c++23 was marked Undocumented but
-std=c++2b wasn't.  I think it should be the other way round, like
the earlier modes.

As for __cplusplus, I've arbitrarily chosen 202600L:

  $ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
  #define __cplusplus 202600L

I've verified the patch with a simple test, exercising the new
directives.  Don't forget to update your GXX_TESTSUITE_STDS!

This patch does not add -Wc++26-extensions.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.

gcc/ChangeLog:

* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.

libcpp/ChangeLog:

* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202600L for C++26.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp26/cplusplus.C: New test.
---
 gcc/c-family/c-common.h|  4 +++-
 gcc/c-family/c-opts.cc | 28 +++-
 gcc/c-family/c.opt | 24 +
 gcc/doc/cpp.texi   |  5 +
 gcc/doc/invoke.texi| 12 +++
 gcc/dwarf2out.cc   |  5 -
 gcc/testsuite/g++.dg/cpp26/cplusplus.C |  5 +
 gcc/testsuite/lib/target-supports.exp  | 30 +-
 libcpp/include/cpplib.h|  2 +-
 libcpp/init.cc | 13 ---
 10 files changed, 116 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp26/cplusplus.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 336a09f4a40..b5ef5ff6b2c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -740,7 +740,9 @@ enum cxx_dialect {
   /* C++20 */
   cxx20,
   /* C++23 */
-  cxx23
+  cxx23,
+  /* C++26 */
+  cxx26
 };
 
 /* The C++ dialect being used. C++98 is the default.  */
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index c68a2a27469..af19140e382 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -111,6 +111,7 @@ static void set_std_cxx14 (int);
 static void set_std_cxx17 (int);
 static void set_std_cxx20 (int);
 static void set_std_cxx23 (int);
+static void set_std_cxx26 (int);
 static void set_std_c89 (int, int);
 static void set_std_c99 (int);
 static void set_std_c11 (int);
@@ -663,6 +664,12 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
set_std_cxx23 (code == OPT_std_c__23 /* ISO */);
   break;
 
+case OPT_std_c__26:
+case OPT_std_gnu__26:
+  if (!preprocessing_asm_p)
+   set_std_cxx26 (code == OPT_std_c__26 /* ISO */);
+  break;
+
 case OPT_std_c90:
 case OPT_std_iso9899_199409:
   if (!preprocessing_asm_p)
@@ -1032,7 +1039,8 @@ c_common_post_options (const char **pfilename)
warn_narrowing = 1;
 
   /* Unless -f{,no-}ext-numeric-literals has been used explicitly,
-for -std=c++{11,14,17,20,23} default to -fno-ext-numeric-literals.  */
+for -std=c++{11,14,17,20,23,26} default to
+-fno-ext-numeric-literals.  */
   if (flag_iso && !OPTION_SET_P (flag_ext_numeric_literals))
cpp_opts->ext_numeric_literals = 0;
 }
@@ -1820,6 +1828,24 @@ set_std_cxx23 (int iso)
   lang_hooks.name = "GNU C++23";
 }
 
+/* Set the C++ 2026 standard (without GNU extensions if ISO).  */
+static void
+set_std_cxx26 (int iso)
+{
+  cpp_set_lang (parse_in, iso ? CLK_CXX26: CLK_GNUCXX26);
+  flag_no_gnu_keywords = iso;
+  flag_no_nonansi_builtin = iso;
+  flag_iso = iso;
+  /* C++26 includes the C11 standard library.  */
+  flag_isoc94 = 1;
+  flag_isoc99 = 1;
+  flag_isoc11 = 1;
+  /* C++26 includes coroutines.  */
+  flag_coroutines = true;
+  cxx_dialect = cxx26;
+  lang_hooks.name = "GNU C++26";
+}
+
 /* Args to -d specify what to dump.  Silently ignore
unrecognized options; they may be aimed at 

Re: [PATCH 2/2] rust: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-22 Thread Peter Bergner via Gcc-patches
On 6/21/23 11:20 AM, Paul E Murphy via Gcc-patches wrote:
> 
> 
> On 6/19/23 3:39 AM, Thomas Schwinge wrote:
>> Hi Paul!
>>
>> On 2023-06-16T11:00:02-0500, "Paul E. Murphy via Gcc-patches" 
>>  wrote:
>>> This was noticed when fixing the gccgo usage of the macro, the
>>> rust usage is very similar.
>>>
>>> TARGET_AIX is defined as a non-zero value on linux/powerpc64le
>>> which may cause unexpected behavior.  TARGET_AIX_OS should be
>>> used to toggle AIX specific behavior.
>>>
>>> gcc/rust/ChangeLog:
>>>
>>>    * rust-object-export.cc [TARGET_AIX]: Rename and update
>>>    usage to TARGET_AIX_OS.
>>
>> I don't have rights to formally approve this GCC/Rust change, but I'll
>> note that it follows "as obvious" (see
>> , "Obvious fixes") to the
>> corresponding GCC/Go change, which has been approved:
>> ,
>> and which is where this GCC/Rust code has been copied from, so I suggest
>> you push both patches at once.
>>
>>
>> Grüße
>>   Thomas
> 
> Hi Thomas,
> 
> Thank you for reviewing.  I do not have commit access, so I cannot push this 
> myself.  If this is OK, could one of the rust maintainers push this patch?
> 
> Thanks,
> Paul

I pushed this to trunk for Paul.

Peter



[PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Address comments from Richard and Bernhard from V5 patch.
V6 fixed all issues according their comments.

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
LEN_MASK_STORE.
(internal_load_fn_p): Add LEN_MASK_LOAD.
(internal_store_fn_p): Add LEN_MASK_STORE.
(internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
(internal_fn_stored_value_index): Add LEN_MASK_STORE.
(internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
* optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
LEN_MASK_{LOAD,STORE}.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  37 ++-
 gcc/optabs-tree.cc |  86 ++--
 gcc/optabs-tree.h  |   6 +-
 gcc/tree-vect-stmts.cc | 221 +
 4 files changed, 267 insertions(+), 83 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..1c2fd487e2a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_SCATTER_STORE:
   return 4;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 default:
   return (conditional_internal_fn_code (fn) != ERROR_MARK
  || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,33 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_opno = 3;
+
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode;
+  if (!targetm.vectorize.get_mask_mode (mode).exists (_mode))
+   return VECT_PARTIAL_BIAS_UNSUPPORTED;
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_opno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 77bf745ae40..e6ae15939d3 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -543,19 +543,50 @@ target_supports_op_p (tree type, enum tree_code code,
  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
 }
 
-/* Return true if target supports vector masked load/store for mode.  */
+/* Return true if the target has support for masked load/store.
+   We can support masked load/store by either mask{load,store}
+   or len_mask{load,store}.
+   This helper function checks whether target supports masked
+   load/store and return corresponding IFN in the last argument
+   (IFN_MASK_{LOAD,STORE} or IFN_LEN_MASK_{LOAD,STORE}).  */
+
+static bool

Re: [PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-22 Thread Peter Bergner via Gcc-patches
On 6/22/23 6:37 PM, Peter Bergner via Gcc-patches wrote:
> On 6/16/23 12:01 PM, Ian Lance Taylor via Gcc-patches wrote:
>> On Fri, Jun 16, 2023 at 9:00 AM Paul E. Murphy via Gcc-patches
>>  wrote:
>>>
>>> TARGET_AIX is defined to a non-zero value on linux and maybe other
>>> powerpc64le targets.  This leads to unexpected behavior such as
>>> dropping the .go_export section when linking a shared library
>>> on linux/powerpc64le.
>>>
>>> Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
>>>
>>> Fixes golang/go#60798.
>>>
>>> gcc/go/ChangeLog:
>>>
>>> * go-backend.cc [TARGET_AIX]: Rename and update usage to
>>> TARGET_AIX_OS.
>>> * go-lang.cc: Likewise.
>>
>> This is OK.
>>
>> Thanks.
>>
>> Ian
> 
> I pushed this to trunk for Paul.

I see this is broken on the release branches too.  Are backports ok
after some burn-in on trunk?

Peter





Re: [PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-22 Thread Peter Bergner via Gcc-patches
On 6/16/23 12:01 PM, Ian Lance Taylor via Gcc-patches wrote:
> On Fri, Jun 16, 2023 at 9:00 AM Paul E. Murphy via Gcc-patches
>  wrote:
>>
>> TARGET_AIX is defined to a non-zero value on linux and maybe other
>> powerpc64le targets.  This leads to unexpected behavior such as
>> dropping the .go_export section when linking a shared library
>> on linux/powerpc64le.
>>
>> Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
>>
>> Fixes golang/go#60798.
>>
>> gcc/go/ChangeLog:
>>
>> * go-backend.cc [TARGET_AIX]: Rename and update usage to
>> TARGET_AIX_OS.
>> * go-lang.cc: Likewise.
> 
> This is OK.
> 
> Thanks.
> 
> Ian

I pushed this to trunk for Paul.

Peter



Re: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread 钟居哲
Thanks Richard! Will send V6 patch to address your comments.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-23 02:27
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
>
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  85 +---
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 265 insertions(+), 82 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
>|| get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
 
We can't require this to succeed, since the query is speculative.
I think it should instead be:
 
  machine_mode mask_mode
  if (!targetm.vectorize.get_mask_mode (mode).exists (_mode))
return VECT_PARTIAL_BIAS_UNSUPPORTED;
 
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>  return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>  return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..ab9514fc8e0 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,19 +543,49 @@ target_supports_op_p (tree type, enum 

Re: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread 钟居哲
Thanks so much. Will send V6 patch.




juzhe.zh...@rivai.ai
 
From: Bernhard Reutner-Fischer
Date: 2023-06-23 00:05
To: juzhe.zhong
CC: rep.dot.nop; gcc-patches; rguenther; richard.sandiford
Subject: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
On Thu, 22 Jun 2023 21:53:48 +0800
juzhe.zh...@rivai.ai wrote:
 
> +static bool
> +target_supports_mask_load_store_p (machine_mode mode, machine_mode mask_mode,
> +bool is_load, internal_fn *ifn)
> +{
> +  optab op = is_load ? maskload_optab : maskstore_optab;
> +  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
> +  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
> +{
> +  if (ifn)
> + *ifn = is_load ? IFN_MASK_LOAD : IFN_MASK_STORE;
> +  return true;
> +}
> +  else if (convert_optab_handler (len_op, mode, mask_mode) != 
> CODE_FOR_nothing)
> +{
> +  *ifn = is_load ? IFN_LEN_MASK_LOAD : IFN_LEN_MASK_STORE;
 
It feels inconsistent that you do not check ifn here.
 
> +  return true;
> +}
> +  return false;
> +}
 
> +/* Return true if the target has support for len load/store.
> +   We can support len load/store by either len_{load,store}
> +   or len_mask{load,store}.
> +   This helper function checks whether target supports len
> +   load/store and return corresponding IFN in the last argument
> +   (IFN_LEN_{LOAD,STORE} or IFN_LEN_MASK_{LOAD,STORE}).  */
> +
> +static bool
> +target_supports_len_load_store_p (machine_mode mode, bool is_load,
> +   internal_fn *ifn)
> +{
> +  optab op = is_load ? len_load_optab : len_store_optab;
> +  optab masked_op = is_load ? len_maskload_optab : len_maskstore_optab;
> +
> +  if (direct_optab_handler (op, mode))
> +{
> +  if (ifn)
> + *ifn = is_load ? IFN_LEN_LOAD : IFN_LEN_STORE;
> +  return true;
> +}
> +  machine_mode mask_mode;
> +  if (targetm.vectorize.get_mask_mode (mode).exists (_mode)
> +  && convert_optab_handler (masked_op, mode, mask_mode) != 
> CODE_FOR_nothing)
> +{
> +  if (ifn)
> + *ifn = is_load ? IFN_LEN_MASK_LOAD : IFN_LEN_MASK_STORE;
> +  return true;
> +}
> +  return false;
> +}
> +
>  /* If target supports vector load/store with length for vector mode MODE,
> return the corresponding vector mode, otherwise return opt_machine_mode 
> ().
> There are two flavors for vector load/store with length, one is to measure
> length with bytes, the other is to measure length with lanes.
> As len_{load,store} optabs point out, for the flavor with bytes, we use
> -   VnQI to wrap the other supportable same size vector modes.  */
> +   VnQI to wrap the other supportable same size vector modes.
> +   An additional output in the last argumennt which is the IFN pointer.
> +   We set IFN as LEN_{LOAD,STORE} or LEN_MASK_{LOAD,STORE} according
> +   which optab is supported in the targe.  */
 
s/argumennt/argument/
s/targe\./target./
 
thanks,
 


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-22 Thread Jason Merrill via Gcc-patches

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

Signed-off-by: Ben Boeckel 
---
  gcc/cp/module.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ebd30f63d81..dbd1b721616 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
dump () && dump ("CMI is %s", file);
if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  deps_add_dep (cpp_get_deps (reader), file);
fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
e = errno;
  }




Re: [PATCH] Introduce hardbool attribute for C

2023-06-22 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 21 Jun 2023 22:08:55 -0300
Alexandre Oliva  wrote:

> Thanks for the test.
> 
> Did you mean for me to incorporate it into the patch, or do you mean to
> contribute it separately, if the feature happens to be accepted?

These were your tests that i quoted but i or my MUA botched to add one
level of quotes -- sorry for that.

> 
> On Jun 19, 2023, Bernhard Reutner-Fischer  wrote:
> 
> > I don't see explicit tests with _Complex nor __complex__. Would we
> > want to check these here, or are they handled thought the "underlying"
> > tests above?  
> 
> Good question.  The notion of using complex types to hold booleans
> hadn't even crossed my mind.

Maybe it is not real, it just sparkled through somehow.

> On the one hand, there doesn't seem to be reason to rule them out, and
> that could go for literally any other type.
> 
> On the other, there doesn't seem to be any useful case for them.  Can
> anyone think of one?

We could either not reject other such uses and wait or we could reject
them and equally wait for complaints. I would not dare to bet who pops
up first, fuzzers or users, though arguments of the latter would
probably be interesting.. I don't have an opinion (nor a use-case),
really, it was just a thought (i mentioned tinfoil hat, did i ;).

> 
> > I'd welcome a fortran interop note in the docs  
> 
> Is there any good place for such interop notes?  I'm not sure I'm
> best-suited to write them up, since Fortran is not a language I'm
> very familiar with, but I suppose I could give it a try.
> 

I'd append to your extend.texi hunk below the para about uninitialized a
note to the effect of:
Note: Types annotated with this attribute may not be Fortran
interoperable.

I would not go into too much detail about C_BOOL nor LOGICAL for i
reckon anybody sensibilised to either two of that attribute, C and
Fortran will draw her conclusions.
Didn't really think how easy it would be to handle this on the user
side, but i fear the modern iso_c_binding way would need help from the
compiler for the lazy. I'd expect a user to be able to trivially
translate this in wrappers done the old way though, which is a pity
from an educational and modernisation POV. Didn't look closely, so this
guesstimate might be all wrong, of course.

thanks,


Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-22 Thread Thiago Jung Bauermann via Gcc-patches


Hello,

liuhongt via Gcc-patches  writes:

> I notice there's some refactor in vectorizable_conversion
> for code_helper,so I've adjusted my patch to that.
> Here's the patch I'm going to commit.
>
> We have already use intermidate type in case WIDEN, but not for NONE,
> this patch extended that.
>
> gcc/ChangeLog:
>
>   PR target/110018
>   * tree-vect-stmts.cc (vectorizable_conversion): Use
>   intermiediate integer type for float_expr/fix_trunc_expr when
>   direct optab is not existed.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/i386/pr110018-1.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++
>  gcc/tree-vect-stmts.cc | 66 ++-
>  2 files changed, 158 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c

Our CI detected regressions on aarch64-linux-gnu with this commit, in
some aarch64-sve and gfortran tests. I filed the following bug report
with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

Could you please check?

-- 
Thiago


[PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

2023-06-22 Thread Harald Anlauf via Gcc-patches
Dear all,

gfortran's ABI specifies that actual arguments to CHARACTER(LEN=1),VALUE
dummy arguments are passed by value in the scalar case.  That did work
for constant strings being passed, but not in several other cases, where
pointers were passed, resulting in subsequent random junk...

The attached patch fixes this for the case of a non-constant string
argument.

It does not touch the character,value bind(c) case - this is a different
thing and may need separate work, as Mikael pointed out - and there is
a missed optimization for the case of actual constant string arguments
of length larger than 1: it appears that the full string is pushed to
the stack.  I did not address that, as the primary aim here is to get
correctly working code.  (I added a TODO in a comment.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From bea1e14490e4abc4b67bae8fdca5196bb93acd2d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 22 Jun 2023 22:07:41 +0200
Subject: [PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument
 [PR110360]

gcc/fortran/ChangeLog:

	PR fortran/110360
	* trans-expr.cc (gfc_conv_procedure_call): Pass actual argument
	to scalar CHARACTER(1),VALUE dummy argument by value.

gcc/testsuite/ChangeLog:

	PR fortran/110360
	* gfortran.dg/value_9.f90: New test.
---
 gcc/fortran/trans-expr.cc | 19 +++
 gcc/testsuite/gfortran.dg/value_9.f90 | 78 +++
 2 files changed, 97 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/value_9.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 3c209bcde97..c92fccd0be2 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6392,6 +6392,25 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		  else
 		{
 		gfc_conv_expr (, e);
+
+		/* ABI: actual arguments to CHARACTER(len=1),VALUE
+		   dummy arguments are actually passed by value.
+		   The BIND(C) case is handled elsewhere.
+		   TODO: truncate constant strings to length 1.  */
+		if (fsym->ts.type == BT_CHARACTER
+			&& !fsym->ts.is_c_interop
+			&& fsym->ts.u.cl->length->expr_type == EXPR_CONSTANT
+			&& fsym->ts.u.cl->length->ts.type == BT_INTEGER
+			&& (mpz_cmp_ui
+			(fsym->ts.u.cl->length->value.integer, 1) == 0)
+			&& e->expr_type != EXPR_CONSTANT)
+		  {
+			parmse.expr = gfc_string_to_single_character
+			  (build_int_cst (gfc_charlen_type_node, 1),
+			   parmse.expr,
+			   e->ts.kind);
+		  }
+
 		if (fsym->attr.optional
 			&& fsym->ts.type != BT_CLASS
 			&& fsym->ts.type != BT_DERIVED)
diff --git a/gcc/testsuite/gfortran.dg/value_9.f90 b/gcc/testsuite/gfortran.dg/value_9.f90
new file mode 100644
index 000..f6490645e27
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/value_9.f90
@@ -0,0 +1,78 @@
+! { dg-do run }
+! PR fortran/110360 - ABI for scalar character(len=1),value dummy argument
+
+program p
+  implicit none
+  character,   allocatable :: ca
+  character,   pointer :: cp
+  character(len=:),allocatable :: cd
+  character  (kind=4), allocatable :: ca4
+  character  (kind=4), pointer :: cp4
+  character(len=:,kind=4), allocatable :: cd4
+  integer :: a = 65
+  allocate (ca, cp, ca4, cp4)
+
+  ! Check len=1 actual argument cases first
+  ca  =   "a"; cp  =   "b"; cd  =   "c"
+  ca4 = 4_"d"; cp4 = 4_"e"; cd4 = 4_"f"
+  call val  ("B","B")
+  call val  ("A",char(65))
+  call val  ("A",char(a))
+  call val  ("A",mychar(65))
+  call val  ("A",mychar(a))
+  call val4 (4_"C",4_"C")
+  call val4 (4_"A",char(65,kind=4))
+  call val4 (4_"A",char(a, kind=4))
+  call val  (ca,ca)
+  call val  (cp,cp)
+  call val  (cd,cd)
+  call val4 (ca4,ca4)
+  call val4 (cp4,cp4)
+  call val4 (cd4,cd4)
+  call sub  ("S")
+  call sub4 (4_"T")
+
+  ! Check that always the first character of the string is finally used
+  call val  (  "U++",  "U--")
+  call val4 (4_"V**",4_"V//")
+  call sub  (  "WTY")
+  call sub4 (4_"ZXV")
+  cd = "gkl"; cd4 = 4_"hmn"
+  call val  (cd,cd)
+  call val4 (cd4,cd4)
+  call sub  (cd)
+  call sub4 (cd4)
+  deallocate (ca, cp, ca4, cp4, cd, cd4)
+contains
+  subroutine val (x, c)
+character(kind=1), intent(in) :: x  ! control: pass by reference
+character(kind=1), value  :: c
+print *, "by value(kind=1): ", c
+if (c /= x)   stop 1
+c = "*"
+if (c /= "*") stop 2
+  end
+
+  subroutine val4 (x, c)
+character(kind=4), intent(in) :: x  ! control: pass by reference
+character(kind=4), value  :: c
+print *, "by value(kind=4): ", c
+if (c /= x) stop 3
+c = 4_"#"
+if (c /= 4_"#") stop 4
+  end
+
+  subroutine sub (s)
+character(*), intent(in) :: s
+call val (s, s)
+  end
+  subroutine sub4 (s)
+character(kind=4,len=*), intent(in) :: s
+call val4 (s, s)
+  end
+
+  character function mychar (i)
+integer, intent(in) :: i
+mychar = char (i)
+  end
+end
--
2.35.3



[PATCH] analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [pr110198]

2023-06-22 Thread Benjamin Priour via Gcc-patches
From: benjamin priour 

Resend with proper subject line ...

Hi,

Below is the fix to regression bug 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198
Was bootstrapped and regtested successfully on x86_64-linux-gnu
Considering mishap from last patch, I'd would appreciate if you could
also regtest it, to be sure :)

Thanks,
Benjamin.


g++.dg/analyzer/pr100244.C was failing after a patch of PR109439.
The reason was a spurious preemptive return of get_store_value upon 
out-of-bounds read that
was preventing further checks. Now instead, a boolean value check_poisoned goes 
to false when
a OOB is detected, and is later on given to get_or_create_initial_value.

gcc/analyzer/ChangeLog:

* region-model-manager.cc 
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer
returns on OOB, but rather gives a boolean to 
get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.

Signed-off-by: benjamin priour 
---
 gcc/analyzer/region-model-manager.cc |  5 +++--
 gcc/analyzer/region-model-manager.h  |  3 ++-
 gcc/analyzer/region-model.cc | 15 ---
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 1453acf7bc9..4f11ef4bd29 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -293,9 +293,10 @@ region_model_manager::create_unique_svalue (tree type)
necessary.  */
 
 const svalue *
-region_model_manager::get_or_create_initial_value (const region *reg)
+region_model_manager::get_or_create_initial_value (const region *reg,
+  bool check_poisoned)
 {
-  if (!reg->can_have_initial_svalue_p ())
+  if (!reg->can_have_initial_svalue_p () && check_poisoned)
 return get_or_create_poisoned_svalue (POISON_KIND_UNINIT,
  reg->get_type ());
 
diff --git a/gcc/analyzer/region-model-manager.h 
b/gcc/analyzer/region-model-manager.h
index 3340c3ebd1e..ff5333bf07c 100644
--- a/gcc/analyzer/region-model-manager.h
+++ b/gcc/analyzer/region-model-manager.h
@@ -49,7 +49,8 @@ public:
 tree type);
   const svalue *get_or_create_poisoned_svalue (enum poison_kind kind,
   tree type);
-  const svalue *get_or_create_initial_value (const region *reg);
+  const svalue *get_or_create_initial_value (const region *reg,
+bool check_poisoned = true);
   const svalue *get_ptr_svalue (tree ptr_type, const region *pointee);
   const svalue *get_or_create_unaryop (tree type, enum tree_code op,
   const svalue *arg);
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6bc60f89f3d..187013a37cc 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2373,8 +2373,9 @@ region_model::get_store_value (const region *reg,
   if (reg->empty_p ())
 return m_mgr->get_or_create_unknown_svalue (reg->get_type ());
 
+  bool check_poisoned = true;
   if (check_region_for_read (reg, ctxt))
-return m_mgr->get_or_create_unknown_svalue(reg->get_type());
+check_poisoned = false;
 
   /* Special-case: handle var_decls in the constant pool.  */
   if (const decl_region *decl_reg = reg->dyn_cast_decl_region ())
@@ -2427,7 +2428,7 @@ region_model::get_store_value (const region *reg,
   == RK_GLOBALS)
 return get_initial_value_for_global (reg);
 
-  return m_mgr->get_or_create_initial_value (reg);
+  return m_mgr->get_or_create_initial_value (reg, check_poisoned);
 }
 
 /* Return false if REG does not exist, true if it may do.
@@ -2790,7 +2791,7 @@ region_model::get_string_size (const region *reg) const
 
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
using DIR to determine if this access is a read or write.
-   Return TRUE if an UNKNOWN_SVALUE needs be created.
+   Return TRUE if an OOB access was detected.
If SVAL_HINT is non-NULL, use it as a hint in diagnostics
about the value that would be written to REG.  */
 
@@ -2804,10 +2805,10 @@ region_model::check_region_access (const region *reg,
   if (!ctxt)
 return false;
 
-  bool need_unknown_sval = false;
+  bool oob_access_detected = false;
   check_region_for_taint (reg, dir, ctxt);
   if (!check_region_bounds (reg, dir, sval_hint, ctxt))
-need_unknown_sval = true;
+oob_access_detected = true;
 
   switch (dir)
 {
@@ -2820,7 +2821,7 @@ region_model::check_region_access (const region *reg,
   check_for_writable_region (reg, ctxt);
   break;
 }

[no subject]

2023-06-22 Thread Benjamin Priour via Gcc-patches
Hi,

Below is the fix to regression bug 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110198
Was bootstrapped and regtested successfully on x86_64-linux-gnu
Considering mishap from last patch, I'd would appreciate if you could
also regtest it, to be sure :)

Thanks,
Benjamin.

>From 04186f04a3f172d7ccf9824cc71faca489eb39af Mon Sep 17 00:00:00 2001
From: benjamin priour 
Date: Thu, 22 Jun 2023 21:39:05 +0200
Subject: [PATCH] [PATCH] analyzer: Fix regression bug after
 r14-1632-g9589a46ddadc8b [pr110198]

g++.dg/analyzer/pr100244.C was failing after a patch of PR109439.
The reason was a spurious preemptive return of get_store_value upon 
out-of-bounds read that
was preventing further checks. Now instead, a boolean value check_poisoned goes 
to false when
a OOB is detected, and is later on given to get_or_create_initial_value.

gcc/analyzer/ChangeLog:

* region-model-manager.cc 
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer
returns on OOB, but rather gives a boolean to 
get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.

Signed-off-by: benjamin priour 
---
 gcc/analyzer/region-model-manager.cc |  5 +++--
 gcc/analyzer/region-model-manager.h  |  3 ++-
 gcc/analyzer/region-model.cc | 15 ---
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 1453acf7bc9..4f11ef4bd29 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -293,9 +293,10 @@ region_model_manager::create_unique_svalue (tree type)
necessary.  */
 
 const svalue *
-region_model_manager::get_or_create_initial_value (const region *reg)
+region_model_manager::get_or_create_initial_value (const region *reg,
+  bool check_poisoned)
 {
-  if (!reg->can_have_initial_svalue_p ())
+  if (!reg->can_have_initial_svalue_p () && check_poisoned)
 return get_or_create_poisoned_svalue (POISON_KIND_UNINIT,
  reg->get_type ());
 
diff --git a/gcc/analyzer/region-model-manager.h 
b/gcc/analyzer/region-model-manager.h
index 3340c3ebd1e..ff5333bf07c 100644
--- a/gcc/analyzer/region-model-manager.h
+++ b/gcc/analyzer/region-model-manager.h
@@ -49,7 +49,8 @@ public:
 tree type);
   const svalue *get_or_create_poisoned_svalue (enum poison_kind kind,
   tree type);
-  const svalue *get_or_create_initial_value (const region *reg);
+  const svalue *get_or_create_initial_value (const region *reg,
+bool check_poisoned = true);
   const svalue *get_ptr_svalue (tree ptr_type, const region *pointee);
   const svalue *get_or_create_unaryop (tree type, enum tree_code op,
   const svalue *arg);
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6bc60f89f3d..187013a37cc 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2373,8 +2373,9 @@ region_model::get_store_value (const region *reg,
   if (reg->empty_p ())
 return m_mgr->get_or_create_unknown_svalue (reg->get_type ());
 
+  bool check_poisoned = true;
   if (check_region_for_read (reg, ctxt))
-return m_mgr->get_or_create_unknown_svalue(reg->get_type());
+check_poisoned = false;
 
   /* Special-case: handle var_decls in the constant pool.  */
   if (const decl_region *decl_reg = reg->dyn_cast_decl_region ())
@@ -2427,7 +2428,7 @@ region_model::get_store_value (const region *reg,
   == RK_GLOBALS)
 return get_initial_value_for_global (reg);
 
-  return m_mgr->get_or_create_initial_value (reg);
+  return m_mgr->get_or_create_initial_value (reg, check_poisoned);
 }
 
 /* Return false if REG does not exist, true if it may do.
@@ -2790,7 +2791,7 @@ region_model::get_string_size (const region *reg) const
 
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
using DIR to determine if this access is a read or write.
-   Return TRUE if an UNKNOWN_SVALUE needs be created.
+   Return TRUE if an OOB access was detected.
If SVAL_HINT is non-NULL, use it as a hint in diagnostics
about the value that would be written to REG.  */
 
@@ -2804,10 +2805,10 @@ region_model::check_region_access (const region *reg,
   if (!ctxt)
 return false;
 
-  bool need_unknown_sval = false;
+  bool oob_access_detected = false;
   check_region_for_taint (reg, dir, ctxt);
   if (!check_region_bounds (reg, dir, sval_hint, ctxt))
-need_unknown_sval = true;
+oob_access_detected = true;

Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
>
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  85 +---
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 265 insertions(+), 82 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
> || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();

We can't require this to succeed, since the query is speculative.
I think it should instead be:

  machine_mode mask_mode
  if (!targetm.vectorize.get_mask_mode (mode).exists (_mode))
return VECT_PARTIAL_BIAS_UNSUPPORTED;

> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>   return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>   return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..ab9514fc8e0 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,19 +543,49 @@ target_supports_op_p (tree type, enum tree_code code,
> && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
>  }
>  
> -/* Return true if target supports vector masked load/store for mode.  */
> +/* Return true if the target has 

Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-06-22 Thread Andre Vieira (lists) via Gcc-patches
Some comments below, all quite minor. I'll continue to review tomorrow, 
I need a fresher brain for arm_mve_check_df_chain_back_for_implic_predic 
 ;)


+static int
+arm_mve_get_vctp_lanes (rtx x)
+{
+  if (GET_CODE (x) == SET && GET_CODE (XEXP (x, 1)) == UNSPEC
+  && (XINT (XEXP (x, 1), 1) == VCTP || XINT (XEXP (x, 1), 1) == 
VCTP_M))

+{
+  switch (GET_MODE (XEXP (x, 1)))
+   {
+ case V16BImode:
+   return 16;
+ case V8BImode:
+   return 8;
+ case V4BImode:
+   return 4;
+ case V2QImode:
+   return 2;
+ default:
+   break;
+   }
+}
+  return 0;
+}

I think you can replace the switch with something along the lines of:
machine_mode mode = GET_MODE (XEXP (x, 1));
return VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 0;


+/* Check if an insn requires the use of the VPR_REG, if it does, return the
+   sub-rtx of the VPR_REG.  The `type` argument controls whether
+   this function should:
+   * For type == 0, check all operands, including the OUT operands,
+ and return the first occurance of the VPR_REG.

s/occurance/occurrence/

+ bool requires_vpr;
+  extract_constrain_insn (insn);

indent of requires_vpr is off.

+  if (type == 1 && (recog_data.operand_type[op] == OP_OUT
+   || recog_data.operand_type[op] == OP_INOUT))
+   continue;
+  else if (type == 2 && (recog_data.operand_type[op] == OP_IN
+|| recog_data.operand_type[op] == OP_INOUT))
+   continue;

Why skip INOUT? I guess this will become clear when I see the uses, but 
I'm wondering whether 'only check the input operands.' is clear enough. 
Maybe 'check operands that are input only.' would be more accurate?


+ /* Fetch the reg_class for each entry and check it against the
+  * VPR_REG reg_class.  */

Remove leading * on the second line.

+
+/* Wrapper function of arm_get_required_vpr_reg with type == 1, so return
+   something only if the VPR reg is an input operand to the insn.  */

When talking about a function parameter in comments capitalize (INSN) 
the name. Same for:


+/* Wrapper function of arm_get_required_vpr_reg with type == 2, so return
+   something only if the VPR reg is the retrurn value, an output of, or is
+   clobbered by the insn.  */

+/* Return true if an insn is an MVE instruction that VPT-predicable, but in
+   its unpredicated form, or if it is predicated, but on a predicate other
+   than vpr_reg.  */

In this one also 'is a MVE instruction that is VPT-predicable' would be 
better I think.



On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
>  Hi all,
>
>  This is the 2/2 patch that contains the functional changes needed
>  for MVE Tail Predicated Low Overhead Loops.  See my previous email
>  for a general introduction of MVE LOLs.
>
>  This support is added through the already existing loop-doloop
>  mechanisms that are used for non-MVE dls/le looping.
>
>  Mid-end changes are:
>
>  1) Relax the loop-doloop mechanism in the mid-end to allow for
> decrement numbers other that -1 and for `count` to be an
> rtx containing a simple REG (which in this case will contain
> the number of elements to be processed), rather
> than an expression for calculating the number of iterations.
>  2) Added a new df utility function: `df_bb_regno_only_def_find` that
> will return the DEF of a REG only if it is DEF-ed once within the
> basic block.
>
>  And many things in the backend to implement the above optimisation:
>
>  3)  Implement the `arm_predict_doloop_p` target hook to instruct the
>  mid-end about Low Overhead Loops (MVE or not), as well as
>  `arm_loop_unroll_adjust` which will prevent unrolling of any 
loops
>  that are valid for becoming MVE Tail_Predicated Low Overhead 
Loops
>  (unrolling can transform a loop in ways that invalidate the 
dlstp/

>  letp tranformation logic and the benefit of the dlstp/letp loop
>  would be considerably higher than that of unrolling)
>  4)  Appropriate changes to the define_expand of doloop_end, new
>  patterns for dlstp and letp, new iterators,  unspecs, etc.
>  5) `arm_mve_loop_valid_for_dlstp` and a number of checking 
functions:

> * `arm_mve_dlstp_check_dec_counter`
> * `arm_mve_dlstp_check_inc_counter`
> * `arm_mve_check_reg_origin_is_num_elems`
> * `arm_mve_check_df_chain_back_for_implic_predic`
> * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
> This all, in smoe way or another, are running checks on the loop
> structure in order to determine if the loop is valid for 
dlstp/letp

> transformation.
>  6) `arm_attempt_dlstp_transform`: (called from the define_expand of
>  doloop_end) this function re-checks for the loop's 
suitability for


Re: PING: Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-22 Thread Marek Polacek via Gcc-patches
On Wed, Jun 21, 2023 at 04:44:00PM -0400, David Malcolm via Gcc-patches wrote:
> I'd like to ping this C++ FE patch for review:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621779.html

Not an approval, but LGTM, though some nits below:
 
> On Wed, 2023-06-14 at 20:28 -0400, David Malcolm wrote:
> > PR c++/110164 notes that in cases where we have a forward decl
> > of a std library type such as:
> > 
> > std::array x;
> > 
> > we omit this diagnostic:
> > 
> > error: aggregate ‘std::array x’ has incomplete type and cannot be 
> > defined
> > 
> > This patch adds this hint to the diagnostic:
> > 
> > note: ‘std::array’ is defined in header ‘’; this is probably fixable 
> > by adding ‘#include ’
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > OK for trunk?
> > 
> > gcc/cp/ChangeLog:
> > PR c++/110164
> > * cp-name-hint.h (maybe_suggest_missing_header): New decl.
> > * decl.cc: Define INCLUDE_MEMORY.  Add include of
> > "cp/cp-name-hint.h".
> > (start_decl_1): Call maybe_suggest_missing_header.
> > * name-lookup.cc (maybe_suggest_missing_header): Remove "static".
> > 
> > gcc/testsuite/ChangeLog:
> > PR c++/110164
> > * g++.dg/missing-header-pr110164.C: New test.
> > ---
> >  gcc/cp/cp-name-hint.h  |  3 +++
> >  gcc/cp/decl.cc | 10 ++
> >  gcc/cp/name-lookup.cc  |  2 +-
> >  gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
> >  4 files changed, 24 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C
> > 
> > diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
> > index bfa7c53c8f6..e2387e23d1f 100644
> > --- a/gcc/cp/cp-name-hint.h
> > +++ b/gcc/cp/cp-name-hint.h
> > @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
> >  
> >  extern name_hint suggest_alternatives_for (location_t, tree, bool);
> >  extern name_hint suggest_alternatives_in_other_namespaces (location_t, 
> > tree);
> > +extern name_hint maybe_suggest_missing_header (location_t location,
> > +  tree name,
> > +  tree scope);

The enclosing decls omit the parameter names; if you do that, it may
fit on one line.

> >  extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
> > tree);
> >  extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
> >  
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index a672e4844f1..504b08ec250 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
> >     line numbers.  For example, the CONST_DECLs for enum values.  */
> >  
> >  #include "config.h"
> > +#define INCLUDE_MEMORY
> >  #include "system.h"
> >  #include "coretypes.h"
> >  #include "target.h"
> > @@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "c-family/c-objc.h"
> >  #include "c-family/c-pragma.h"
> >  #include "c-family/c-ubsan.h"
> > +#include "cp/cp-name-hint.h"
> >  #include "debug.h"
> >  #include "plugin.h"
> >  #include "builtins.h"
> > @@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
> > ;   /* An auto type is ok.  */
> >    else if (TREE_CODE (type) != ARRAY_TYPE)
> > {
> > + auto_diagnostic_group d;
> >   error ("variable %q#D has initializer but incomplete type", decl);
> > + maybe_suggest_missing_header (input_location,
> > +   TYPE_IDENTIFIER (type),
> > +   TYPE_CONTEXT (type));

Maybe CP_TYPE_CONTEXT?

> >   type = TREE_TYPE (decl) = error_mark_node;
> > }
> >    else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type
> > @@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
> > gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
> >    else
> > {
> > + auto_diagnostic_group d;
> >   error ("aggregate %q#D has incomplete type and cannot be defined",
> >  decl);
> > + maybe_suggest_missing_header (input_location,
> > +   TYPE_IDENTIFIER (type),
> > +   TYPE_CONTEXT (type));

Here as well.

> >   /* Change the type so that assemble_variable will give
> >  DECL an rtl we can live with: (mem (const_int 0)).  */
> >   type = TREE_TYPE (decl) = error_mark_node;
> > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > index 6ac58a35b56..917b481c163 100644
> > --- a/gcc/cp/name-lookup.cc
> > +++ b/gcc/cp/name-lookup.cc
> > @@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t 
> > location, tree name)
> >     for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't

RE: [PATCH v3] Streamer: Fix out of range memory access of machine mode

2023-06-22 Thread Li, Pan2 via Gcc-patches
Looks Jivan notices this issue too, cc Jivan for awareness.

> Re: LTO: buffer overflow in lto_output_init_mode_table
> Hi Robbin.

> Thank you for responding.
> I will defer my thread.

> On Thu, Jun 22, 2023 at 3:42 PM Robin Dapp  wrote:

>> Hi Jivan,
>>
>> I think Pan is already on this problem.  Please see this thread:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622129.html
>>
>> Regards
>>  Robin

Pan


-Original Message-
From: Li, Pan2  
Sent: Wednesday, June 21, 2023 3:58 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; rdapp@gmail.com; jeffreya...@gmail.com; Li, Pan2 
; Wang, Yanzhang ; 
kito.ch...@gmail.com; rguent...@suse.de; ja...@redhat.com
Subject: [PATCH v3] Streamer: Fix out of range memory access of machine mode

From: Pan Li 

We extend the machine mode from 8 to 16 bits already. But there still
one placing missing from the streamer. It has one hard coded array
for the machine code like size 256.

In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
value of the MAX_MACHINE_MODE will grow as more and more modes are
added. While the machine mode array in tree-streamer still leave 256 as is.

Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
lto_output_init_mode_table will touch the memory out of range unexpected.

This patch would like to take the MAX_MACHINE_MODE as the size of the
array in streamer, to make sure there is no potential unexpected
memory access in future. Meanwhile, this patch also adjust some place
which has MAX_MACHINE_MODE <= 256 assumption.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
bits for machine mode table.
* lto-streamer-out.cc (lto_write_mode_table): Stream out the
HOST machine mode bits.
* lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
as the table size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
as the packing limit.
(bp_unpack_machine_mode): Ditto.
---
 gcc/lto-streamer-in.cc  | 12 
 gcc/lto-streamer-out.cc | 11 ---
 gcc/lto-streamer.h  |  2 ++
 gcc/tree-streamer.cc|  2 +-
 gcc/tree-streamer.h | 14 +-
 5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2cb83406db5..2a0720b4e6f 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
*file_data)
 internal_error ("cannot read LTO mode table from %s",
file_data->file_name);
 
-  unsigned char *table = ggc_cleared_vec_alloc (1 << 8);
-  file_data->mode_table = table;
   const struct lto_simple_header_with_strings *header
 = (const struct lto_simple_header_with_strings *) data;
   int string_offset;
@@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
*file_data)
header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack ();
 
+  unsigned mode_bits = bp_unpack_value (, 5);
+  unsigned char *table = ggc_cleared_vec_alloc (1 << mode_bits);
+
+  file_data->mode_table = table;
+  file_data->mode_bits = mode_bits;
+
   table[VOIDmode] = VOIDmode;
   table[BLKmode] = BLKmode;
   unsigned int m;
-  while ((m = bp_unpack_value (, 8)) != VOIDmode)
+  while ((m = bp_unpack_value (, mode_bits)) != VOIDmode)
 {
   enum mode_class mclass
= bp_unpack_enum (, mode_class, MAX_MODE_CLASS);
   poly_uint16 size = bp_unpack_poly_value (, 16);
   poly_uint16 prec = bp_unpack_poly_value (, 16);
-  machine_mode inner = (machine_mode) bp_unpack_value (, 8);
+  machine_mode inner = (machine_mode) bp_unpack_value (, mode_bits);
   poly_uint16 nunits = bp_unpack_poly_value (, 16);
   unsigned int ibit = 0, fbit = 0;
   unsigned int real_fmt_len = 0;
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ab2eb4301e..36899283ded 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3196,6 +3196,11 @@ lto_write_mode_table (void)
if (inner_m != m)
  streamer_mode_table[(int) inner_m] = 1;
   }
+
+  /* Pack the mode_bits value within 5 bits (up to 31) in the beginning.  */
+  unsigned mode_bits = ceil_log2 (MAX_MACHINE_MODE);
+  bp_pack_value (, mode_bits, 5);
+
   /* First stream modes that have GET_MODE_INNER (m) == m,
  so that we can refer to them afterwards.  */
   for (int pass = 0; pass < 2; pass++)
@@ -3205,11 +3210,11 @@ lto_write_mode_table (void)
  machine_mode m = (machine_mode) i;
  if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
continue;
- bp_pack_value (, m, 8);
+ bp_pack_value (, m, mode_bits);
  bp_pack_enum (, mode_class, MAX_MODE_CLASS, 

Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread Robin Dapp via Gcc-patches
> Just curious about the combine pass you mentioned, not very sure my
> understand is correct but it looks like the combine pass totally
> ignore the iterator requirement?
> 
> It is sort of surprise to me as the combine pass may also need the
> information of iterators.

combine tries to match instructions (with fitting modes of course).
It does not look at the insn constraints that reload/lra later can
use to switch between alternatives depending on the register situation
and other factors.

We e.g. have an instruction
 (define_insn "bla"
   (set (match_operand:VF 1   "=vd")
(match_operand:VF 2   "vr"))
   ...
and implicitly
  [(set_attr "enabled" "true")]

This instruction gets multiplexed via the VF iterator into (among others)
  (define_insn "bla"
(set (match_operand:VNx4HF 1   "=vd")
 (match_operand:VNx4HF 2   "vr"))
...
  [(set_attr "enabled" "true")]

When we set "enabled" to "false" via "fp_vector_disabled", we have:
  (define_insn "bla"
(set (match_operand:VNx4HF 1   "=vd")
 (match_operand:VNx4HF 2   "vr"))
...
  [(set_attr "enabled" "false")]

This means the only available alternative is disabled but the insn
itself is still there, particularly for combine which does not look
into the constraints.

So in our case the iterator "allowed" the instruction (leading combine
to think it is available) and we later masked it out with "enabled = false".
Now we could argue that combine's behavior should change here and an
insn without any alternatives is not actually available but that's not
a battle I'm willing to fight :D

Regards
 Robin


Re: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-22 Thread Philipp Tomsich
Richard,

OK for backport to GCC-13?

Thanks,
Philipp.

On Thu, 22 Jun 2023 at 16:18, Richard Sandiford via Gcc-patches
 wrote:
>
> Di Zhao OS via Gcc-patches  writes:
> > This patch enables reassociation of floating-point additions on ampere1.
> > This brings about 1% overall benefit on spec2017 fprate cases. (There
> > are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
> >
> > Thanks,
> > Di Zhao
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
>
> Thanks, pushed to trunk.
>
> Richard
>
> > ---
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index d16565b5581..301c9f6c0cd 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
> >"32:12",   /* loop_align.  */
> >2, /* int_reassoc_width.  */
> >4, /* fp_reassoc_width.  */
> > -  1, /* fma_reassoc_width.  */
> > +  4, /* fma_reassoc_width.  */
> >2, /* vec_reassoc_width.  */
> >2, /* min_div_recip_mul_sf.  */
> >2, /* min_div_recip_mul_df.  */


RE: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread Li, Pan2 via Gcc-patches
Just curious about the combine pass you mentioned, not very sure my understand 
is correct but it looks like the combine pass totally ignore the iterator 
requirement?

It is sort of surprise to me as the combine pass may also need the information 
of iterators.

Pan


From: 钟居哲 
Sent: Thursday, June 22, 2023 9:37 PM
To: rdapp.gcc ; gcc-patches ; 
palmer ; kito.cheng ; Li, Pan2 
; Jeff Law 
Cc: rdapp.gcc 
Subject: Re: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

Oh. I see. I think I am wrong.  Sorry for that :).
load/store are using 'V' iterators.

This patch looks reasonable to me now.

Thanks for catching this.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-06-22 21:32
To: 钟居哲; 
gcc-patches; palmer; 
kito.cheng; pan2.li; 
Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).
> I don't understand why it is necessary to bother "VF". "VF” should
> not be changed since intrinsic stuff is quite stable and any
> unreasonable changes are unacceptable.

Ok, I hear your concern.  My argument is: Currently our mechanism
of disabling instructions is incorrect and if any of the VF instructions
were to be created by combine, fwprop or other passes we'd potentially
ICE in reload.  The other option is to leave VF unchanged and duplicate
all patterns for VHF.  Those can have a TARGET_ZVFH then.

> vle/vse/vluxei/vloxei/vsuxei/vsoxei/vlse/vsse.

These are all V/VT and not VF? (apart from vlse which I adjusted)

Regards
Robin



Re: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-22 Thread Richard Sandiford via Gcc-patches
Di Zhao OS via Gcc-patches  writes:
> This patch enables reassociation of floating-point additions on ampere1.
> This brings about 1% overall benefit on spec2017 fprate cases. (There
> are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
>
> Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
>
> Thanks,
> Di Zhao
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1

Thanks, pushed to trunk.

Richard

> ---
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d16565b5581..301c9f6c0cd 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
>"32:12",   /* loop_align.  */
>2, /* int_reassoc_width.  */
>4, /* fp_reassoc_width.  */
> -  1, /* fma_reassoc_width.  */
> +  4, /* fma_reassoc_width.  */
>2, /* vec_reassoc_width.  */
>2, /* min_div_recip_mul_sf.  */
>2, /* min_div_recip_mul_df.  */


Re: Re: [PATCH V4] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread 钟居哲
Hi, Richi. Thanks so much for the review && comments.
I have sent V5 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622539.html 
with addressing your comments.

Would you mind taking a look at it? This only difference between V5 and V4 is
optabs-tree.cc

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-22 20:47
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V4] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
On Wed, 21 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
> 
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  60 +--
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 243 insertions(+), 79 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
>|| get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>  return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>  return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..e90e1b62ebc 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -548,14 +548,29 @@ target_supports_op_p (tree type, enum 

[PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
LEN_MASK_STORE.
(internal_load_fn_p): Add LEN_MASK_LOAD.
(internal_store_fn_p): Add LEN_MASK_STORE.
(internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
(internal_fn_stored_value_index): Add LEN_MASK_STORE.
(internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
* optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
LEN_MASK_{LOAD,STORE}.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  36 ++-
 gcc/optabs-tree.cc |  85 +---
 gcc/optabs-tree.h  |   6 +-
 gcc/tree-vect-stmts.cc | 220 +
 4 files changed, 265 insertions(+), 82 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..b90bd85df2c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_SCATTER_STORE:
   return 4;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 default:
   return (conditional_internal_fn_code (fn) != ERROR_MARK
  || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_opno = 3;
+
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+   = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_opno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 77bf745ae40..ab9514fc8e0 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -543,19 +543,49 @@ target_supports_op_p (tree type, enum tree_code code,
  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
 }
 
-/* Return true if target supports vector masked load/store for mode.  */
+/* Return true if the target has support for masked load/store.
+   We can support masked load/store by either mask{load,store}
+   or len_mask{load,store}.
+   This helper function checks whether target supports masked
+   load/store and return corresponding IFN in the last argument
+   (IFN_MASK_{LOAD,STORE} or IFN_LEN_MASK_{LOAD,STORE}).  */
+
+static bool
+target_supports_mask_load_store_p (machine_mode mode, machine_mode mask_mode,
+  bool is_load, internal_fn *ifn)
+{
+  optab op = is_load ? 

Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-22 Thread Jan Hubicka via Gcc-patches
> 
> 
> On 6/22/23 00:31, Richard Biener wrote:
> > I think there's a difference in that __builtin_trap () is observable
> > while __builtin_unreachable () is not and reaching __builtin_unreachable
> > () invokes undefined behavior while reaching __builtin_trap () does not.
> > 
> > So the isolation code marking the trapping code volatile should be
> > enough and the trap () is just there to end the basic block
> > (and maybe be on the safe side to really trap).
> Agreed WRT observability -- but that's not really the point of the trap and
> if we wanted we could change that behavior.
> 
> The trap is there to halt execution immediately rather than letting it keep
> running.  That was a design decision from a security standpoint. If we've
> detected that we're executing undefined behavior, stop rather than
> potentially letting a malicious actor turn a bug into an exploit.

Also as discussed some time ago, the volatile loads between traps has
effect of turning previously pure/const functions into non-const which
is somewhat sad, so it is still on my todo list to change it this stage1
to something more careful.   We discussed internal functions trap_store
and trap_load which will expand to load/store + trap but will make it
clear that side effect does not count to modref.

I wanted to give it some time if I can come with something better, but
didn't so far.

Honza


Re: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread 钟居哲
Oh. I see. I think I am wrong.  Sorry for that :).
load/store are using 'V' iterators. 

This patch looks reasonable to me now.

Thanks for catching this.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-22 21:32
To: 钟居哲; gcc-patches; palmer; kito.cheng; pan2.li; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).
> I don't understand why it is necessary to bother "VF". "VF” should
> not be changed since intrinsic stuff is quite stable and any
> unreasonable changes are unacceptable.
 
Ok, I hear your concern.  My argument is: Currently our mechanism
of disabling instructions is incorrect and if any of the VF instructions
were to be created by combine, fwprop or other passes we'd potentially
ICE in reload.  The other option is to leave VF unchanged and duplicate
all patterns for VHF.  Those can have a TARGET_ZVFH then.
 
> vle/vse/vluxei/vloxei/vsuxei/vsoxei/vlse/vsse.
 
These are all V/VT and not VF? (apart from vlse which I adjusted)
 
Regards
Robin
 


Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-22 Thread Jeff Law via Gcc-patches




On 6/22/23 00:31, Richard Biener wrote:

I think there's a difference in that __builtin_trap () is observable
while __builtin_unreachable () is not and reaching __builtin_unreachable
() invokes undefined behavior while reaching __builtin_trap () does not.

So the isolation code marking the trapping code volatile should be
enough and the trap () is just there to end the basic block
(and maybe be on the safe side to really trap).
Agreed WRT observability -- but that's not really the point of the trap 
and if we wanted we could change that behavior.


The trap is there to halt execution immediately rather than letting it 
keep running.  That was a design decision from a security standpoint. 
If we've detected that we're executing undefined behavior, stop rather 
than potentially letting a malicious actor turn a bug into an exploit.


jeff


Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread Robin Dapp via Gcc-patches
> I don't understand why it is necessary to bother "VF". "VF” should
> not be changed since intrinsic stuff is quite stable and any
> unreasonable changes are unacceptable.

Ok, I hear your concern.  My argument is: Currently our mechanism
of disabling instructions is incorrect and if any of the VF instructions
were to be created by combine, fwprop or other passes we'd potentially
ICE in reload.  The other option is to leave VF unchanged and duplicate
all patterns for VHF.  Those can have a TARGET_ZVFH then.

> vle/vse/vluxei/vloxei/vsuxei/vsoxei/vlse/vsse.

These are all V/VT and not VF? (apart from vlse which I adjusted)

Regards
 Robin


Re: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread 钟居哲
I don't understand why it is necessary to bother "VF".
"VF” should not be changed since intrinsic stuff is quite stable and any 
unreasonable changes
are unacceptable.

>> What are the many instructions that are valid in TARGET_ZVFHMIN?
vle/vse/vluxei/vloxei/vsuxei/vsoxei/vlse/vsse.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-22 21:22
To: 钟居哲; gcc-patches; palmer; kito.cheng; pan2.li; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).
> You change "VF" constraint as "TARGET_ZVFH" which is incorrect since
> we a lot of instructions are valid in "TARGET_ZVFHMIN"  in vector.md
> but you disabled them in this patch. You disabled them unexpectedly.
 
Yes that was kind of the point :)  IMHO all the :VF insns are actually
only valid in a TARGET_ZVFH setting with the exception of pred_broadcast
which I changed to VF_ZVFHMIN (vfmv.v.f and vfmv.s.f will be "masked out"
by the "enabled" attribute).  Now I'm not saying I might have missed/mixed
up some insns in this patch but e.g. the binops/unops shouldn't be enabled
(and their alternatives are already disabled as of now).
 
What are the many instructions that are valid in TARGET_ZVFHMIN?
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread Robin Dapp via Gcc-patches
> You change "VF" constraint as "TARGET_ZVFH" which is incorrect since
> we a lot of instructions are valid in "TARGET_ZVFHMIN"  in vector.md
> but you disabled them in this patch. You disabled them unexpectedly.

Yes that was kind of the point :)  IMHO all the :VF insns are actually
only valid in a TARGET_ZVFH setting with the exception of pred_broadcast
which I changed to VF_ZVFHMIN (vfmv.v.f and vfmv.s.f will be "masked out"
by the "enabled" attribute).  Now I'm not saying I might have missed/mixed
up some insns in this patch but e.g. the binops/unops shouldn't be enabled
(and their alternatives are already disabled as of now).

What are the many instructions that are valid in TARGET_ZVFHMIN?

Regards
 Robin



Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread 钟居哲
You change "VF" constraint as "TARGET_ZVFH" which is incorrect since we
a lot of instructions are valid in "TARGET_ZVFHMIN"  in vector.md but you 
disabled them in this patch.
You disabled them unexpectedly.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-22 21:03
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; Li, Pan2; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Split VF iterators for Zvfh(min).
Hi,
 
when working on FP widening/narrowing I realized the Zvfhmin handling
is not ideal right now:  We use the "enabled" insn attribute to disable
instructions not available with Zvfhmin (but only with Zvfh).
 
However, "enabled == 0" only disables insn alternatives, in our case all
of them when the mode is a HFmode.  The insn itself remains available
(e.g. for combine to match) and we end up with an insn without alternatives
that reload cannot handle --> ICE.
 
The proper solution is to disable the instruction for the respective
mode altogether.  This patch achieves this by splitting the VF as well
as VWEXTF iterators into variants with TARGET_ZVFH and
TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN are true).  Also, VWCONVERTI, VHF and VHF_LMUL1 need
adjustments.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: VF_AUTO -> VF.
* config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
VHF_LMUL1.
* config/riscv/vector.md: Use new iterators.
---
gcc/config/riscv/autovec.md  | 28 ++---
gcc/config/riscv/vector-iterators.md | 63 ++--
gcc/config/riscv/vector.md   | 20 -
3 files changed, 64 insertions(+), 47 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1641d7e1ea..7f0b2befd6b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -539,9 +539,9 @@ (define_expand "abs2"
;; - vfneg.v/vfabs.v
;; 
---
(define_expand "2"
-  [(set (match_operand:VF_AUTO 0 "register_operand")
-(any_float_unop_nofrm:VF_AUTO
- (match_operand:VF_AUTO 1 "register_operand")))]
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop_nofrm:VF
+ (match_operand:VF 1 "register_operand")))]
   "TARGET_VECTOR"
{
   insn_code icode = code_for_pred (, mode);
@@ -556,9 +556,9 @@ (define_expand "2"
;; - vfsqrt.v
;; 
---
(define_expand "2"
-  [(set (match_operand:VF_AUTO 0 "register_operand")
-(any_float_unop:VF_AUTO
- (match_operand:VF_AUTO 1 "register_operand")))]
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop:VF
+ (match_operand:VF 1 "register_operand")))]
   "TARGET_VECTOR"
{
   insn_code icode = code_for_pred (, mode);
@@ -777,10 +777,10 @@ (define_expand "vec_extract"
;; - vfadd.vf/vfsub.vf/...
;; -
(define_expand "3"
-  [(match_operand:VF_AUTO 0 "register_operand")
-   (any_float_binop:VF_AUTO
-(match_operand:VF_AUTO 1 "register_operand")
-(match_operand:VF_AUTO 2 "register_operand"))]
+  [(match_operand:VF 0 "register_operand")
+   (any_float_binop:VF
+(match_operand:VF 1 "register_operand")
+(match_operand:VF 2 "register_operand"))]
   "TARGET_VECTOR"
{
   riscv_vector::emit_vlmax_fp_insn (code_for_pred (, mode),
@@ -794,10 +794,10 @@ (define_expand "3"
;; - vfmin.vf/vfmax.vf
;; -
(define_expand "3"
-  [(match_operand:VF_AUTO 0 "register_operand")
-   (any_float_binop_nofrm:VF_AUTO
-(match_operand:VF_AUTO 1 "register_operand")
-(match_operand:VF_AUTO 2 "register_operand"))]
+  [(match_operand:VF 0 "register_operand")
+   (any_float_binop_nofrm:VF
+(match_operand:VF 1 "register_operand")
+(match_operand:VF 2 "register_operand"))]
   "TARGET_VECTOR"
{
   riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 6ca1c54c709..ae9f44b5f78 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -271,7 +271,7 @@ (define_mode_iterator VWI [
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
])
-(define_mode_iterator VF [
+(define_mode_iterator VF_ZVFHMIN [
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
@@ -295,11 +295,12 @@ (define_mode_iterator VF [
;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
-;; TARGET_ZVFHMIN while we actually disable all instructions apart from
-;; load, store and convert for it.
-;; Consequently the autovec 

[committed] libgomp.texi: Improve OpenMP ICV description

2023-06-22 Thread Tobias Burnus

Committed as r14-2032-g2cd0689a79498d

Nonetheless, comments are highly welcome. https://gcc.gnu.org/PR110364
lists some follow-up tasks related to environment variables, both in
terms of code handling and in terms of documentation.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 2cd0689a79498dcaaadc8cc5c1c4d0a452a4fb09
Author: Tobias Burnus 
Date:   Thu Jun 22 14:57:54 2023 +0200

libgomp.texi: Improve OpenMP ICV description

Use @var{} instead of @emph{} - for semantic texinfo formatting; the result
is similar: slanted instead of italic in PDF, still italic in HTML, albeit
in info is is now uppercase instead of '_' as pre/suffix.

The patch also documents the newer _ALL/_DEV/_DEV_ env var suffixes
and as it refers to the ICV vars and their scope, those were added to the
OMP_ env vars for reference. For OMP_NESTING, a note that those were
deprecated was added plus a bunch of cross references. For OMP_ALLOCATOR,
add note about the lack of per-device env vars support.

A new section, consisting mostly of cross references was added to document
the implementation-defined ICV initialization, especially as OpenMP demands
that implementations document what they do for 'implementation defined'.

For nvptx, the implementation-defined used stack size was documented

libgomp/
* libgomp.texi: Use @var for ICV vars.
(OpenMP Environment Variables): Mention _ALL/_DEV/_DEV_ variants,
document which ICV is set and which scope the ICV has; extend/cleanup
some @ref.
(Implementation-defined ICV Initialization): New.
(nvptx): Document the implementation-defined used per-warp stack size.
---
 libgomp/libgomp.texi | 129 ++-
 1 file changed, 117 insertions(+), 12 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index db8b1f1427e..7d27cc50df5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -188,9 +188,9 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{metadirective} directive @tab N @tab
 @item @code{declare variant} directive
   @tab P @tab @emph{simd} traits not handled correctly
-@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
+@item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
-@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
+@item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
   @tab complete but no non-host devices provides @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
@@ -364,7 +364,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 
 @multitable @columnfractions .60 .10 .25
 @headitem Description @tab Status @tab Comments
-@item @code{omp_in_explicit_task} routine and @emph{explicit-task-var} ICV
+@item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
   @tab Y @tab
 @item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
   namespaces @tab N/A
@@ -422,7 +422,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item For Fortran, optional comma between directive and clause @tab N @tab
 @item Conforming device numbers and @code{omp_initial_device} and
   @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
-@item Initial value of @emph{default-device-var} ICV with
+@item Initial value of @var{default-device-var} ICV with
   @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
 @item @emph{interop_types} in any position of the modifier list for the @code{init} clause
   of the @code{interop} construct @tab N @tab
@@ -912,6 +912,9 @@ Nested parallel regions can be enabled or disabled at runtime using
 regions with @code{omp_set_max_active_levels} to one to disable, or
 above one to enable.
 
+Note that the @code{omp_get_nested} API routine was deprecated
+in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@@ -923,7 +926,7 @@ above one to enable.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
+@ref{omp_get_max_active_levels}, @ref{omp_set_nested},
 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
 
 @item @emph{Reference}:
@@ -1416,6 +1419,9 @@ Enabling nested parallel regions will also set the maximum number of
 active nested regions to the maximum supported.  Disabling nested parallel
 regions will set the maximum 

[PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-22 Thread Robin Dapp via Gcc-patches
Hi,

when working on FP widening/narrowing I realized the Zvfhmin handling
is not ideal right now:  We use the "enabled" insn attribute to disable
instructions not available with Zvfhmin (but only with Zvfh).

However, "enabled == 0" only disables insn alternatives, in our case all
of them when the mode is a HFmode.  The insn itself remains available
(e.g. for combine to match) and we end up with an insn without alternatives
that reload cannot handle --> ICE.

The proper solution is to disable the instruction for the respective
mode altogether.  This patch achieves this by splitting the VF as well
as VWEXTF iterators into variants with TARGET_ZVFH and
TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN are true).  Also, VWCONVERTI, VHF and VHF_LMUL1 need
adjustments.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md: VF_AUTO -> VF.
* config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
VHF_LMUL1.
* config/riscv/vector.md: Use new iterators.
---
 gcc/config/riscv/autovec.md  | 28 ++---
 gcc/config/riscv/vector-iterators.md | 63 ++--
 gcc/config/riscv/vector.md   | 20 -
 3 files changed, 64 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1641d7e1ea..7f0b2befd6b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -539,9 +539,9 @@ (define_expand "abs2"
 ;; - vfneg.v/vfabs.v
 ;; 
---
 (define_expand "2"
-  [(set (match_operand:VF_AUTO 0 "register_operand")
-(any_float_unop_nofrm:VF_AUTO
- (match_operand:VF_AUTO 1 "register_operand")))]
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop_nofrm:VF
+ (match_operand:VF 1 "register_operand")))]
   "TARGET_VECTOR"
 {
   insn_code icode = code_for_pred (, mode);
@@ -556,9 +556,9 @@ (define_expand "2"
 ;; - vfsqrt.v
 ;; 
---
 (define_expand "2"
-  [(set (match_operand:VF_AUTO 0 "register_operand")
-(any_float_unop:VF_AUTO
- (match_operand:VF_AUTO 1 "register_operand")))]
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop:VF
+ (match_operand:VF 1 "register_operand")))]
   "TARGET_VECTOR"
 {
   insn_code icode = code_for_pred (, mode);
@@ -777,10 +777,10 @@ (define_expand "vec_extract"
 ;; - vfadd.vf/vfsub.vf/...
 ;; -
 (define_expand "3"
-  [(match_operand:VF_AUTO 0 "register_operand")
-   (any_float_binop:VF_AUTO
-(match_operand:VF_AUTO 1 "register_operand")
-(match_operand:VF_AUTO 2 "register_operand"))]
+  [(match_operand:VF 0 "register_operand")
+   (any_float_binop:VF
+(match_operand:VF 1 "register_operand")
+(match_operand:VF 2 "register_operand"))]
   "TARGET_VECTOR"
 {
   riscv_vector::emit_vlmax_fp_insn (code_for_pred (, mode),
@@ -794,10 +794,10 @@ (define_expand "3"
 ;; - vfmin.vf/vfmax.vf
 ;; -
 (define_expand "3"
-  [(match_operand:VF_AUTO 0 "register_operand")
-   (any_float_binop_nofrm:VF_AUTO
-(match_operand:VF_AUTO 1 "register_operand")
-(match_operand:VF_AUTO 2 "register_operand"))]
+  [(match_operand:VF 0 "register_operand")
+   (any_float_binop_nofrm:VF
+(match_operand:VF 1 "register_operand")
+(match_operand:VF 2 "register_operand"))]
   "TARGET_VECTOR"
 {
   riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 6ca1c54c709..ae9f44b5f78 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -271,7 +271,7 @@ (define_mode_iterator VWI [
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
 ])
 
-(define_mode_iterator VF [
+(define_mode_iterator VF_ZVFHMIN [
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
@@ -295,11 +295,12 @@ (define_mode_iterator VF [
 
 ;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
 ;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
-;; TARGET_ZVFHMIN while we actually disable all instructions apart from
-;; load, store and convert for it.
-;; Consequently the autovec expanders should also only be enabled with
-;; TARGET_ZVFH.
-(define_mode_iterator VF_AUTO [
+;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
+;; from load, store and convert for it.
+;; It is not enough to set the "enabled" attribute to false
+;; since this will only disable insn alternatives in reload but still
+;; allow the instruction 

Re: [PATCH V4] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-22 Thread Richard Biener via Gcc-patches
On Wed, 21 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
> 
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  60 +--
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 243 insertions(+), 79 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
> || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>   return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>   return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..e90e1b62ebc 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -548,14 +548,29 @@ target_supports_op_p (tree type, enum tree_code code,
>  bool
>  can_vec_mask_load_store_p (machine_mode mode,
>  machine_mode mask_mode,
> -bool is_load)
> +bool is_load,
> +internal_fn *ifn)

Please update the function comment.

>  {
>optab op = is_load ? maskload_optab : maskstore_optab;
> +  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
>  

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-22 Thread Richard Biener via Gcc-patches
On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
 wrote:
>
> On Tue, 20 Jun 2023 at 16:47, Richard Biener  
> wrote:
> >
> > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > Hi Richard,
> > > For the following reduced test-case taken from PR:
> > >
> > > #include "arm_sve.h"
> > > svuint32_t l() {
> > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > }
> > >
> > > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > > during GIMPLE pass: fre
> > > pr110280.c: In function 'l':
> > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > > tree-ssa-sccvn.cc:6890
> > > 5 | }
> > >   | ^
> > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > > gimple_stmt_iterator*)
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > ../../gcc/gcc/domwalk.cc:311
> > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > 0x1214664 do_rpo_vn_1
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > 0x1215ba5 execute
> > > ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > >
> > > cc1 simplifies:
> > >   lanes[0] = 0;
> > >   lanes[1] = 0;
> > >   lanes[2] = 0;
> > >   lanes[3] = 0;
> > >   _1 = { -1, ... };
> > >   _7 = svld1rq_u32 (_1, );
> > >
> > > to:
> > >   _9 = MEM  [(unsigned int * {ref-all})];
> > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > >
> > > and then fre1 dump shows:
> > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > > 0, 0, 0, 0 }
> > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 
> > > 0 }
> > >
> > > The issue seems to be with the following pattern:
> > > (simplify
> > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > >  @0)
> > >
> > > which simplifies above VEC_PERM_EXPR to:
> > > _7 = {0, 0, 0, 0}
> > > which is incorrect since _9 and mask have different vector lengths.
> > >
> > > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > > only if operand and mask have same number of elements, which seems to fix
> > > the issue, and we're left with the following in .optimized dump:
> > >[local count: 1073741824]:
> > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... 
> > > }>;
> >
> > it would be nice to have this optimized.
> >
> > -
> >  (simplify
> >   (vec_perm vec_same_elem_p@0 @0 @1)
> > - @0)
> > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1
> > +  @0))
> >
> > that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE 
> > (@1)
> > since that's more obviously the return type in which case
> >
> >   (if (types_match (type, TREE_TYPE (@0))
> >
> > would be more to the point.
> >
> > But can't you to simplify this in the !known_eq case do a simple
> >
> >   { build_vector_from_val (type, the-element); }
> >
> > ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
> >
> >  (with { tree el = uniform_vector_p (@0); }
> >   (if (el)
> >{ build_vector_from_val (type, el); })))
> >
> > would be the cheapest workaround.
> Hi Richard,
> Thanks for the suggestions. Using build_vector_from_val simplifies it to:
>[local count: 1073741824]:
>   return { 0, ... };
>
> Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> x86_64-linux-gnu.
> OK to commit ?

Can you retain the case of matching type?  Like

  (if (types_match (type, TREE_TYPE (@0))
   @0
   (with
{
   tree elem = uniform_vector_p (@0);
}
   (if (elem)
{ build_vector_from_val (type, elem); }

?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p ...)

OK with that change.

Richard.


>
> Thanks,
> Prathamesh
> >
> > >   return _2;
> > >
> > > code-gen:
> > > l:
> > > mov z0.b, #0
> > > ret
> > >
> > > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > > OK to commit ?
> > >
> > > Thanks,
> > > Prathamesh


Re: LTO: buffer overflow in lto_output_init_mode_table

2023-06-22 Thread Jivan Hakobyan via Gcc-patches
Hi Robbin.

Thank you for responding.
I will defer my thread.


On Thu, Jun 22, 2023 at 3:42 PM Robin Dapp  wrote:

> Hi Jivan,
>
> I think Pan is already on this problem.  Please see this thread:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622129.html
>
> Regards
>  Robin
>


-- 
With the best regards
Jivan Hakobyan


Re: LTO: buffer overflow in lto_output_init_mode_table

2023-06-22 Thread Robin Dapp via Gcc-patches
Hi Jivan,

I think Pan is already on this problem.  Please see this thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622129.html

Regards
 Robin


LTO: buffer overflow in lto_output_init_mode_table

2023-06-22 Thread Jivan Hakobyan via Gcc-patches
In the case when enabled -flto=N GCC aborted compilation.
The reason is the overflowing streamer_mode_table buffer.
It has 1 << 8 bytes but lto_output_init_mode_table() tries to fill
with MAX_MACHINE_MODE bytes.

gcc/ChangeLog:
   * tree-streamer.h (streamer_mode_table): Changed buffer size
   * tree-streamer.cc (streamer_mode_table): Likewise.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/tree-streamer.cc b/gcc/tree-streamer.cc
index ed65a7692e3..a28ef9c7920 100644
--- a/gcc/tree-streamer.cc
+++ b/gcc/tree-streamer.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
During streaming in, we translate the on the disk mode using this
table.  For normal LTO it is set to identity, for ACCEL_COMPILER
depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
+unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 
 /* Check that all the TS_* structures handled by the streamer_write_* and
streamer_read_* routines are exactly ALL the structures defined in
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 170d61cf20b..51a292c8d80 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -75,7 +75,7 @@ void streamer_write_tree_body (struct output_block *, tree);
 void streamer_write_integer_cst (struct output_block *, tree);
 
 /* In tree-streamer.cc.  */
-extern unsigned char streamer_mode_table[1 << 8];
+extern unsigned char streamer_mode_table[MAX_MACHINE_MODE];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
  hashval_t, unsigned *);


[PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-22 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds an check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR 110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(find_oldest_value_reg): Special handling of stack_pointer_rtx.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---
This addresses both the PRs (110308 and 110313) and was confirmed to
resolve the AArch64 bootstrap issue reported by Thiago.

OK for trunk?

 gcc/regcprop.cc | 43 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 30 +
 2 files changed, 60 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..fe75b7f1fa0 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -487,9 +487,14 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
+ REG_ATTRS (new_rtx) = REG_ATTRS (reg);
+ REG_POINTER (new_rtx) = REG_POINTER (reg);
+   }
+ else if (REG_POINTER (new_rtx) != REG_POINTER (reg))
+   return NULL_RTX;
  return new_rtx;
}
 }
@@ -965,15 +970,27 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
  if (validate_change (insn, _SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ bool can_change;
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
+ REG_ATTRS (new_rtx) = REG_ATTRS (src);
+ REG_POINTER (new_rtx) = REG_POINTER (src);
+ can_change = true;
+   }
+ else
+   can_change
+ = (REG_POINTER (new_rtx) == REG_POINTER (src));
+
+ if (can_change)
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..ddd30d4fc3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-g2 -O2" } */
+
+int channelCount, decodeBlock_outputLength;
+struct BlockCodec {
+  virtual int decodeBlock(const unsigned char *, short *);
+};
+struct ms_adpcm_state {
+  char predictorIndex;
+  int sample1;
+  ms_adpcm_state();
+};
+bool decodeBlock_ok;
+void encodeBlock() { ms_adpcm_state(); }
+struct MSADPCM : BlockCodec {
+  int decodeBlock(const unsigned char *, short *);
+};
+void decodeSample(ms_adpcm_state, bool 

[PATCH, OpenACC 2.7] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2023-06-22 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
This patch adjusts the implementation of acc_map_data/acc_unmap_data API library
routines to more fit the description in the OpenACC 2.7 specification.

Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings, and allow adjustment of
dynamic_refcount of such mappings by other constructs. Enforcing of an initial
value of 1 for such mappings, and only allowing acc_unmap_data to delete such
mappings, is implemented as specified.

Actually, there is no real change (or improvement) in behavior of the API (thus
no new tests) I've looked at the related OpenACC spec issues, and it seems that
this part of the 2.7 spec change is mostly a clarification (see no downside in
current REFCOUNT_INFINITY based implementation either).
But this patch does make the internals more close to the spec description.

Tested without regressions using powerpc64le-linux/nvptx, okay for trunk?

Thanks,
Chung-Lin

2023-06-22  Chung-Lin Tang  

libgomp/ChangeLog:

* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
* target.c (gomp_increment_refcount): Add REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Add REFCOUNT_ACC_MAP_DATA case, force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 4d2bfab4b71..fb8ef651dfb 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1166,6 +1166,8 @@ struct target_mem_desc;
 /* Special value for refcount - tgt_offset contains target address of the
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1)
+/* Special value for refcount - created through acc_map_data.  */
+#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2)
 
 /* Special value for refcount - structure element sibling list items.
All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index fe632740769..2a782ac22c1 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s)
   assert (n->refcount == 1);
   assert (n->dynamic_refcount == 0);
   /* Special reference counting behavior.  */
-  n->refcount = REFCOUNT_INFINITY;
+  n->refcount = REFCOUNT_ACC_MAP_DATA;
+  n->dynamic_refcount = 1;
 
   if (profiling_p)
{
@@ -460,7 +461,7 @@ acc_unmap_data (void *h)
  the different 'REFCOUNT_INFINITY' cases, or simply separate
  'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
  etc.)?  */
-  else if (n->refcount != REFCOUNT_INFINITY)
+  else if (n->refcount != REFCOUNT_ACC_MAP_DATA)
 {
   gomp_mutex_unlock (_dev->lock);
   gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped"
@@ -519,7 +520,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, 
void *hostaddr,
 }
 
   assert (n->refcount != REFCOUNT_LINK);
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+  && n->refcount != REFCOUNT_ACC_MAP_DATA)
 n->refcount++;
   n->dynamic_refcount++;
 
@@ -683,6 +685,7 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void 
*h, size_t s,
 
   assert (n->refcount != REFCOUNT_LINK);
   if (n->refcount != REFCOUNT_INFINITY
+  && n->refcount != REFCOUNT_ACC_MAP_DATA
   && n->refcount < n->dynamic_refcount)
 {
   gomp_mutex_unlock (_dev->lock);
@@ -691,15 +694,27 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, 
void *h, size_t s,
 
   if (finalize)
 {
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+ && n->refcount != REFCOUNT_ACC_MAP_DATA)
n->refcount -= n->dynamic_refcount;
-  n->dynamic_refcount = 0;
+
+  if (n->refcount == REFCOUNT_ACC_MAP_DATA)
+   /* Mappings created by acc_map_data are returned to initial
+  dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
+   n->dynamic_refcount = 1;
+  else
+   n->dynamic_refcount = 0;
 }
   else if (n->dynamic_refcount)
 {
-  if (n->refcount != REFCOUNT_INFINITY)
+  if (n->refcount != REFCOUNT_INFINITY
+ && n->refcount != REFCOUNT_ACC_MAP_DATA)
n->refcount--;
-  n->dynamic_refcount--;
+
+  /* When mapping is created by acc_map_data, 

Re: [PATCH] Update array address space in c_build_qualified_type

2023-06-22 Thread SenthilKumar.Selvaraj--- via Gcc-patches
On Wed, 2023-06-21 at 18:37 +, Joseph Myers wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:
> 
> > >   This patch sets the address space of the array type to that of the
> > >   element type.
> > > 
> > >   Regression tests for avr look ok. Ok for trunk?
> > 
> > The patch looks OK to me but please let a C frontend maintainer
> > double-check (I've CCed Joseph for this).
> 
> The question would be whether there are any TYPE_QUALS uses in the C front
> end that behave incorrectly given TYPE_ADDR_SPACE (acting as qualifiers)
> being set on an array type - conceptually, before C2x, array types are
> unqualified, only the element types are qualified.

Hmm, but tree.cc:build_array_type_1 sets the address space of the element
type to the array type, and the relevant commit's ChangeLog entry (from 2009)
says "Inherit array address space from element type."

On the avr target, for an array like

const __flash int arr[] = {1,2,3};

I can see that the array type gets the address space of the element type
already, even before this patch. Surely that must have caused any code that
doesn't expect address space in array type quals to be broken then? 

Regards
Senthil


Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-22 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 20 Jun 2023 at 16:47, Richard Biener  wrote:
>
> On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi Richard,
> > For the following reduced test-case taken from PR:
> >
> > #include "arm_sve.h"
> > svuint32_t l() {
> >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> >   return svld1rq_u32(svptrue_b8(), lanes);
> > }
> >
> > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > during GIMPLE pass: fre
> > pr110280.c: In function 'l':
> > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > tree-ssa-sccvn.cc:6890
> > 5 | }
> >   | ^
> > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > gimple_stmt_iterator*)
> > ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > ../../gcc/gcc/domwalk.cc:311
> > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > 0x1214664 do_rpo_vn_1
> > ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > 0x1215ba5 execute
> > ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> >
> > cc1 simplifies:
> >   lanes[0] = 0;
> >   lanes[1] = 0;
> >   lanes[2] = 0;
> >   lanes[3] = 0;
> >   _1 = { -1, ... };
> >   _7 = svld1rq_u32 (_1, );
> >
> > to:
> >   _9 = MEM  [(unsigned int * {ref-all})];
> >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> >
> > and then fre1 dump shows:
> > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > 0, 0, 0, 0 }
> > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 0, 0 }
> >
> > The issue seems to be with the following pattern:
> > (simplify
> >  (vec_perm vec_same_elem_p@0 @0 @1)
> >  @0)
> >
> > which simplifies above VEC_PERM_EXPR to:
> > _7 = {0, 0, 0, 0}
> > which is incorrect since _9 and mask have different vector lengths.
> >
> > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > only if operand and mask have same number of elements, which seems to fix
> > the issue, and we're left with the following in .optimized dump:
> >[local count: 1073741824]:
> >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... }>;
>
> it would be nice to have this optimized.
>
> -
>  (simplify
>   (vec_perm vec_same_elem_p@0 @0 @1)
> - @0)
> + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1
> +  @0))
>
> that looks good I think.  Maybe even better use 'type' instead of TREE_TYPE 
> (@1)
> since that's more obviously the return type in which case
>
>   (if (types_match (type, TREE_TYPE (@0))
>
> would be more to the point.
>
> But can't you to simplify this in the !known_eq case do a simple
>
>   { build_vector_from_val (type, the-element); }
>
> ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
>
>  (with { tree el = uniform_vector_p (@0); }
>   (if (el)
>{ build_vector_from_val (type, el); })))
>
> would be the cheapest workaround.
Hi Richard,
Thanks for the suggestions. Using build_vector_from_val simplifies it to:
   [local count: 1073741824]:
  return { 0, ... };

Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
x86_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
>
> >   return _2;
> >
> > code-gen:
> > l:
> > mov z0.b, #0
> > ret
> >
> > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
using build_vector_from_val with the element of input operand, and
mask's type.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pr110280.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..76a37297d3c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8672,7 +8672,12 @@ and,
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (with
+  {
+tree elem = uniform_vector_p (@0);
+  }
+  (if (elem)
+   { build_vector_from_val (type, elem); })))
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 000..d3279f38362
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
+
+/* 

Re: [COMMITTED] ada: Add CHERI intrinsic bindings and helper functions.

2023-06-22 Thread Marc Poulhiès via Gcc-patches


Hi Alex,

> On 20/06/2023 15:47, Marc Poulhiès wrote:
>> Hi,
>>
>> >> The package Interfaces.CHERI provides intrinsic bindings and
>> >> helper functions to allow software to query, create, and
>> >> manipulate CHERI capabilities.
>> >
>> > I'm curious what the motivation for these intrinsic wrappers is, given that
>> > GCC trunk doesn't currently support them. Out of interest, can you share 
>> > what
>> > the use case for these is?
>>
>> We share the same Ada frontend with different GCC compilers and
>> contribute it in GCC's master branch.
>>
>> You're correct that this particular change is not useful (yet) with
>> master, but we are testing/using it with a CHERI-aware GCC.
>
> Interesting, I was only curious because we (Arm) are maintaining a branch 
> with CHERI
> and Morello support in the ARM/morello vendor branch:
> https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/morello
>
> is this a different CHERI GCC port that you're referring to?

No, we have used your morello branch as the starting point for our port
which consists in having the latest Ada frontend/runtime work there.

Marc


[PATCH] tree-optimization/110332 - fix ICE with phiprop

2023-06-22 Thread Richard Biener via Gcc-patches
The following fixes an ICE that occurs when we visit an edge
inserted load from the code validating correctness for inserting
an aggregate copy there.  We can simply skip those loads here.

Boostrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110332
* tree-ssa-phiprop.cc (propagate_with_phi): Always
check aliasing with edge inserted loads.

* g++.dg/torture/pr110332.C: New testcase.
* gcc.dg/torture/pr110332-1.c: Likewise.
* gcc.dg/torture/pr110332-2.c: Likewise.
---
 gcc/testsuite/g++.dg/torture/pr110332.C   | 16 
 gcc/testsuite/gcc.dg/torture/pr110332-1.c | 13 +
 gcc/testsuite/gcc.dg/torture/pr110332-2.c | 10 ++
 gcc/tree-ssa-phiprop.cc   | 10 +++---
 4 files changed, 46 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110332.C
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110332-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110332-2.c

diff --git a/gcc/testsuite/g++.dg/torture/pr110332.C 
b/gcc/testsuite/g++.dg/torture/pr110332.C
new file mode 100644
index 000..31dc93ecee4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110332.C
@@ -0,0 +1,16 @@
+// { dg-do compile }
+
+struct SlotIndex { int lie; };
+SlotIndex si7, si8;
+
+unsigned u9, u6;
+bool b3, b4;
+unsigned () {
+  return b4 ? u6 : u9;
+}
+void transferValues() {
+  unsigned RegIdx;
+  SlotIndex End;
+  RegIdx = value();
+  End = b3 ? si7 : si8;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr110332-1.c 
b/gcc/testsuite/gcc.dg/torture/pr110332-1.c
new file mode 100644
index 000..438993e8daf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110332-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+struct s { int lie; };
+struct s si7, si8;
+unsigned u9, u6;
+_Bool b3, b4;
+unsigned transferValues(struct s *End) {
+  unsigned RegIdx;
+  unsigned *t = b4 ?  : 
+  RegIdx = *t;
+  *End = *(b3 ?  : );
+  return RegIdx;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr110332-2.c 
b/gcc/testsuite/gcc.dg/torture/pr110332-2.c
new file mode 100644
index 000..18b656ffb2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110332-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+_Bool a;
+struct s { int t; } c, d;
+unsigned e, f;
+unsigned transferValues(struct s *End) {
+  unsigned RegIdx = *(a ?  : );
+  *End = *(a ?  : );
+  return RegIdx;
+}
diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 21a349a25e2..8c9ce903472 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -399,14 +399,18 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
 there are no statements that could read from memory
 aliasing the lhs in between the start of bb and use_stmt.
 As we require use_stmt to have a VDEF above, loads after
-use_stmt will use a different virtual SSA_NAME.  */
+use_stmt will use a different virtual SSA_NAME.  When
+we reach an edge inserted load the constraints we place
+on processing guarantees that program order is preserved
+so we can avoid checking those.  */
  FOR_EACH_IMM_USE_FAST (vuse_p, vui, vuse)
{
  vuse_stmt = USE_STMT (vuse_p);
  if (vuse_stmt == use_stmt)
continue;
- if (!dominated_by_p (CDI_DOMINATORS,
-  gimple_bb (vuse_stmt), bb))
+ if (!gimple_bb (vuse_stmt)
+ || !dominated_by_p (CDI_DOMINATORS,
+ gimple_bb (vuse_stmt), bb))
continue;
  if (ref_maybe_used_by_stmt_p (vuse_stmt,
gimple_assign_lhs (use_stmt)))
-- 
2.35.3


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-22 Thread Philipp Tomsich
This should be covered by PR110308 (proposed fix attached there) and PR110313.
Our bootstrap runs are still in progress to confirm.


On Thu, 22 Jun 2023 at 09:40, Richard Biener  wrote:
>
> On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
>  wrote:
> >
> >
> > Hello,
> >
> > Jeff Law  writes:
> >
> > > On 6/19/23 22:52, Tamar Christina wrote:
> > >
> > >>> It's a bit hackish, but could we reject the stack pointer for operand1 
> > >>> in the
> > >>> stack-tie?  And if we do so, does it help?
> > >> Yeah this one I had to defer until later this week to look at closer 
> > >> because what I'm
> > >> wondering about is whether the optimization should apply to frame related
> > >> RTX as well.
> > >> Looking at the description of RTX_FRAME_RELATED_P that this optimization 
> > >> may
> > >> end up de-optimizing RISC targets by creating an offset that is larger 
> > >> than offset
> > >> which can be used from a SP making reload having to spill.  i.e. 
> > >> sometimes the
> > >> move was explicitly done. So perhaps it should not apply it to
> > >> RTX_FRAME_RELATED_P in find_oldest_value_reg and 
> > >> copyprop_hardreg_forward_1?
> > >> Other parts of this pass already seems to bail out in similar 
> > >> situations. So I needed
> > >> to
> > >> write some testcases to check what would happen in these cases hence the 
> > >> deferral.
> > >> to later in the week.
> > > Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
> > > better in general to
> > > me.  The cases where we're looking to clean things up aren't really in the
> > > prologue/epilogue, but instead in the main body after register 
> > > elimination has turned fp
> > > into sp + offset, thus making all kinds of things no longer valid.
> >
> > The problems I reported were fixed by commits:
> >
> > 580b74a79146 "aarch64: Robustify stack tie handling"
> > 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
> >
> > Thanks!
> >
> > But unfortunately I'm still seeing bootstrap failures (ICE segmentation
> > fault) in today's trunk with build config bootstrap-lto in both
> > armv8l-linux-gnueabihf and aarch64-linux-gnu.
>
> If there's not yet a bugreport for this please make sure to open one so
> this issue doesn't get lost.
>
> > If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> > the stack pointer if possible" from trunk then both bootstraps succeed.
> >
> > Here's the command I'm using to build on armv8l:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --with-float=hard \
> > --with-fpu=neon-fp-armv8 \
> > --with-mode=thumb \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > --enable-languages=c,c++,fortran,lto \
> > && make \
> > profiledbootstrap \
> > SHELL=/bin/bash \
> > -w \
> > -j 40 \
> > CFLAGS_FOR_BUILD="-pipe -g -O2" \
> > CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> > LDFLAGS_FOR_BUILD="-static-libgcc" \
> > MAKEINFOFLAGS=--force \
> > BUILD_INFO="" \
> > MAKEINFO=echo
> >
> > And here's the slightly different one for aarch64-linux:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --enable-fix-cortex-a53-835769 \
> > --enable-fix-cortex-a53-843419 \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > 

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-22 Thread Richard Biener via Gcc-patches
On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
 wrote:
>
>
> Hello,
>
> Jeff Law  writes:
>
> > On 6/19/23 22:52, Tamar Christina wrote:
> >
> >>> It's a bit hackish, but could we reject the stack pointer for operand1 in 
> >>> the
> >>> stack-tie?  And if we do so, does it help?
> >> Yeah this one I had to defer until later this week to look at closer 
> >> because what I'm
> >> wondering about is whether the optimization should apply to frame related
> >> RTX as well.
> >> Looking at the description of RTX_FRAME_RELATED_P that this optimization 
> >> may
> >> end up de-optimizing RISC targets by creating an offset that is larger 
> >> than offset
> >> which can be used from a SP making reload having to spill.  i.e. sometimes 
> >> the
> >> move was explicitly done. So perhaps it should not apply it to
> >> RTX_FRAME_RELATED_P in find_oldest_value_reg and 
> >> copyprop_hardreg_forward_1?
> >> Other parts of this pass already seems to bail out in similar situations. 
> >> So I needed
> >> to
> >> write some testcases to check what would happen in these cases hence the 
> >> deferral.
> >> to later in the week.
> > Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably better 
> > in general to
> > me.  The cases where we're looking to clean things up aren't really in the
> > prologue/epilogue, but instead in the main body after register elimination 
> > has turned fp
> > into sp + offset, thus making all kinds of things no longer valid.
>
> The problems I reported were fixed by commits:
>
> 580b74a79146 "aarch64: Robustify stack tie handling"
> 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
>
> Thanks!
>
> But unfortunately I'm still seeing bootstrap failures (ICE segmentation
> fault) in today's trunk with build config bootstrap-lto in both
> armv8l-linux-gnueabihf and aarch64-linux-gnu.

If there's not yet a bugreport for this please make sure to open one so
this issue doesn't get lost.

> If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> the stack pointer if possible" from trunk then both bootstraps succeed.
>
> Here's the command I'm using to build on armv8l:
>
> ~/src/configure \
> SHELL=/bin/bash \
> --with-gnu-as \
> --with-gnu-ld \
> --disable-libmudflap \
> --enable-lto \
> --enable-shared \
> --without-included-gettext \
> --enable-nls \
> --with-system-zlib \
> --disable-sjlj-exceptions \
> --enable-gnu-unique-object \
> --enable-linker-build-id \
> --disable-libstdcxx-pch \
> --enable-c99 \
> --enable-clocale=gnu \
> --enable-libstdcxx-debug \
> --enable-long-long \
> --with-cloog=no \
> --with-ppl=no \
> --with-isl=no \
> --disable-multilib \
> --with-float=hard \
> --with-fpu=neon-fp-armv8 \
> --with-mode=thumb \
> --with-arch=armv8-a \
> --enable-threads=posix \
> --enable-multiarch \
> --enable-libstdcxx-time=yes \
> --enable-gnu-indirect-function \
> --disable-werror \
> --enable-checking=yes \
> --enable-bootstrap \
> --with-build-config=bootstrap-lto \
> --enable-languages=c,c++,fortran,lto \
> && make \
> profiledbootstrap \
> SHELL=/bin/bash \
> -w \
> -j 40 \
> CFLAGS_FOR_BUILD="-pipe -g -O2" \
> CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> LDFLAGS_FOR_BUILD="-static-libgcc" \
> MAKEINFOFLAGS=--force \
> BUILD_INFO="" \
> MAKEINFO=echo
>
> And here's the slightly different one for aarch64-linux:
>
> ~/src/configure \
> SHELL=/bin/bash \
> --with-gnu-as \
> --with-gnu-ld \
> --disable-libmudflap \
> --enable-lto \
> --enable-shared \
> --without-included-gettext \
> --enable-nls \
> --with-system-zlib \
> --disable-sjlj-exceptions \
> --enable-gnu-unique-object \
> --enable-linker-build-id \
> --disable-libstdcxx-pch \
> --enable-c99 \
> --enable-clocale=gnu \
> --enable-libstdcxx-debug \
> --enable-long-long \
> --with-cloog=no \
> --with-ppl=no \
> --with-isl=no \
> --disable-multilib \
> --enable-fix-cortex-a53-835769 \
> --enable-fix-cortex-a53-843419 \
> --with-arch=armv8-a \
> --enable-threads=posix \
> --enable-multiarch \
> --enable-libstdcxx-time=yes \
> --enable-gnu-indirect-function \
> --disable-werror \
> --enable-checking=yes \
> --enable-bootstrap \
> --with-build-config=bootstrap-lto \
> --enable-languages=c,c++,fortran,lto \
> && make \
> profiledbootstrap \
> SHELL=/bin/bash \
> -w \
> -j 40 \
> LDFLAGS_FOR_TARGET="-Wl,-fix-cortex-a53-843419" \
> CFLAGS_FOR_BUILD="-pipe -g -O2" \
> CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> LDFLAGS_FOR_BUILD="-static-libgcc" \
> MAKEINFOFLAGS=--force \
> BUILD_INFO="" \
> MAKEINFO=echo
>
> --
> Thiago


Re: [PATCH] [vect]Use intermiediate integer type for float_expr/fix_trunc_expr when direct optab is not existed.

2023-06-22 Thread Richard Biener via Gcc-patches
On Wed, Jun 21, 2023 at 10:39 PM Joseph Myers  wrote:
>
> On Wed, 21 Jun 2023, Richard Biener via Gcc-patches wrote:
>
> > > > int32_t x = (int32_t)0x1.0p32;
> > > > int32_t y = (int32_t)(int64_t)0x1.0p32;
> > > >
> > > > sets x to 2147483647 and y to 0.
> >
> > Hmm, good question.  GENERIC has a direct truncation to unsigned char
> > for example, the C standard generally says if the integral part cannot
> > be represented then the behavior is undefined.  So I think we should be
> > safe here (0x1.0p32 doesn't fit an int).
>
> We should be following Annex F (unspecified value plus "invalid" exception
> for out-of-range floating-to-integer conversions rather than undefined
> behavior).  But we don't achieve that very well at present (see bug 93806
> comments 27-29 for examples of how such conversions produce wobbly
> values).

That would mean guarding this with !flag_trapping_math would be the appropriate
thing to do.

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-06-22 Thread Richard Biener via Gcc-patches
On Wed, 21 Jun 2023, Jeff Law wrote:

> 
> 
> On 6/21/23 01:49, Richard Biener via Gcc-patches wrote:
> > The following addresses a miscompilation by RTL scheduling related
> > to the representation of masked stores.  For that we have
> > 
> > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ]
> > [90])
> >  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >  )
> >  (const_int -4 [0xfffc] [1 MEM
> >   [(int *)vectp_b.12_28]+0 S64 A32])
> >  (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
> >  (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
> >  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >  )
> >  (const_int -4 [0xfffc] [1 MEM
> >   [(int *)vectp_b.12_28]+0 S64
> >  A32])
> > 
> > and specifically the memory attributes
> > 
> >[1 MEM  [(int *)vectp_b.12_28]+0 S64 A32]
> > 
> > are problematic.  They tell us the instruction stores and reads a full
> > vector which it if course does not.  There isn't any good MEM_EXPR
> > we can use here (we lack a way to just specify a pointer and restrict
> > info for example), and since the MEMs have a vector mode it's
> > difficult in general as passes do not need to look at the memory
> > attributes at all.
> > 
> > The easiest way to avoid running into the alias analysis problem is
> > to scrap the MEM_EXPR when we expand the internal functions for
> > partial loads/stores.  That avoids the disambiguation we run into
> > which is realizing that we store to an object of less size as
> > the size of the mode we appear to store.
> > 
> > After the patch we see just
> > 
> >[1  S64 A32]
> > 
> > so we preserve the alias set, the alignment and the size (the size
> > is redundant if the MEM insn't BLKmode).  That's still not good
> > in case the RTL alias oracle would implement the same
> > disambiguation but it fends off the gimple one.
> > 
> > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> > and --param=vect-partial-vector-usage=1.
> > 
> > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> > side we might instead want to use a BLKmode MEM?  Any better
> > ideas here?
> I'd expect that using BLKmode will fend off the RTL aliasing code.

I suspect there's no way to specify the desired semantics?  OTOH
code that looks at the MEM operand only and not the insn (which
should have some UNSPEC wrapped) needs to be conservative, so maybe
the alias code shouldn't assume that a (mem:V16SI ..) actually
performs an access of the size of V16SI at the specified location?

Anyway, any opinion on the actual patch?  It's enough to fix the
observed miscompilation.

Thanks,
Richard.


Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-22 Thread Richard Biener via Gcc-patches
On Wed, 21 Jun 2023, Jeff Law wrote:

> 
> 
> On 6/21/23 00:41, Richard Biener wrote:
> >> I thought during the introduction of erroneous path isolation that we
> >> concluded stores, calls and such had observable side effects that must be
> >> preserved, even when we hit a block that leads to __builtin_unreachable.
> > 
> > Indeed, I remember we repeatedly hit this in the past.  But
> > double-checking I see that we instrument
> > 
> >if (x)
> >  *(int *)0 = 0;
> > 
> > as
> > 
> > [local count: 1073741824]:
> >if (x_2(D) != 0)
> >  goto ; [50.00%]
> >else
> >  goto ; [50.00%]
> > 
> > [local count: 536870913]:
> >MEM[(int *)0B] ={v} 0;
> >__builtin_trap ();
> > 
> > path isolation doesn't seem to use __builtin_unreachable ().  I did
> > not add __builtin_trap () as possible sink (but I did want to treat
> > __builtin_unreachable () and __builtin_unreachable_trap () the same
> > way).  The pass also marks the offending store as volatile.
> Nuts.  I mixed up trap vs unreachable in my own head.  Though I think for the
> purposes of this issue they should be treated the same.  The only difference
> is one actively halts the code, the other silently lets it keep going.

I think there's a difference in that __builtin_trap () is observable
while __builtin_unreachable () is not and reaching __builtin_unreachable 
() invokes undefined behavior while reaching __builtin_trap () does not.

So the isolation code marking the trapping code volatile should be
enough and the trap () is just there to end the basic block
(and maybe be on the safe side to really trap).

Richard.

> > So yes, I think preserving the original trap kind (if there is any)
> > is important and it still seems to work.  I don't remember whether
> > we have any test coverage for that though.  I'll also note that
> > __builtin_trap () has virtual operands (def and use) while
> > __builtin_unreachable[_trap] () are 'const'.  Honza correctly
> > says they should probably be novops instead of 'const' preserving
> > the fact that they have side-effects.
> If we have test coverage it's probably minimal -- a few things to validate
> proper behavior around builtin_trap plus whatever regressions came up.
> 
> 
> > I think it's desirable for assertions.  Since we elide plain
> > __builtin_unreachable () and fall thru whereever it leads that
> > shouldn't be an issue.
> > 
> > If I manually add a __builtin_unreachable () to the above case
> > I see the *(int *)0 = 0; store DSEd.  Maybe we should avoid
> > removing stores that might trap here?  POSIX wise such a trap
> > could be a way to jump out of the path leading to unreachable ()
> > via siglongjmp ...
> Yea, removing the store seemswrong .  As you note, someone could have a
> handler to catch the store, then longjump elsewhere to continue some kind of
> sensible execution.
> 
> The erroneous path isolation bits have some code to try and clean up the bogus
> path a bit.  Ideally leaving a single statement with undefined beahvior and
> the trap.  I wonder if there's any code you could re-use from that effort.
> 
> Essentially a mini pass that cleans up paths post-dominated by a
> builtin_unreachable or builtin_trap.


Re: [Patch, fortran] PR107900 Select type with intrinsic type inside associate causes ICE / Segmenation fault

2023-06-22 Thread Paul Richard Thomas via Gcc-patches
Hi Both,

> while I only had a minor question regarding gfc_is_ptr_fcn(),
> can you still try to enlighten me why that second part
> was necessary?  (I believed it to be redundant and may have
> overlooked the obvious.)

Blast! I forgot about checking that. Lurking in the back of my mind
and going back to the first days of OOP in gfortran is a distinction
between a class entity with the pointer attribute and one whose data
component has the class_pointer attribute. I'll check it out and do
whatever is needed.


> > +  else if (context && gfc_is_ptr_fcn (assoc->target))
> > + {
> > +   if (!gfc_notify_std (GFC_STD_F2018, "%qs at %L associated to "
> > +"pointer function target being used in a "
> > +"variable definition context (%s)", name,
> > +>where, context))
>
> I'm curious why you decided to put context in braces and not simply use
> quotes as per %qs?

That's the way it's done in the preceding errors. I had to keep the
context in context, so to speak.

> > +/* Build the associate name  */
> > +static int
> > +build_associate_name (const char *name, gfc_expr **e1, gfc_expr **e2)
> > +{
>
> > +return 1;
>
> > +  return 0;
> > +}
>
> I've gone through the frontend recently and changed several such
> boolean functions to use bool where appropriate. May i ask folks to use
> narrower types in new code, please?
> Iff later in the pipeline it is considered appropriate or benefical to
> promote types, these will eventually be promoted.
>
> > diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
> > index e6a4337c0d2..18589e17843 100644
> > --- a/gcc/fortran/trans-decl.cc
> > +++ b/gcc/fortran/trans-decl.cc
>
> > @@ -1906,6 +1915,7 @@ gfc_get_symbol_decl (gfc_symbol * sym)
> >   gcc_assert (!sym->value || sym->value->expr_type == EXPR_NULL);
> >  }
> >
> > +
>

'twas accidental. There had previously been another version of the fix
that I commented out and the extra line crept in when I deleted it.
Thanks for the spot.

>
> Please kindly excuse my comment and, again, thanks!
>
> >gfc_finish_var_decl (decl, sym);
> >
> >if (sym->ts.type == BT_CHARACTER)

Regards

Paul