Re: [PATCH 2/2] xtensa: Fix conflicting hard regno between indirect sibcall fixups and EH_RETURN_STACKADJ_RTX

2022-07-29 Thread Max Filippov via Gcc-patches
On Fri, Jul 29, 2022 at 12:34 PM Takayuki 'January June' Suwa
 wrote:
>
> The hard register A10 was already allocated for EH_RETURN_STACKADJ_RTX.
> (Although exception handling and sibling calls may not occur at the same
>  time, the change is made for safety.)
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md: Change hard register number used in
> the split patterns for indirect sibling call fixups from 10 to 11,
> the last free one for the CALL0 ABI.
> ---
>  gcc/config/xtensa/xtensa.md | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 1/2] xtensa: Add RTX costs for if_then_else

2022-07-29 Thread Max Filippov via Gcc-patches
On Fri, Jul 29, 2022 at 12:34 PM Takayuki 'January June' Suwa
 wrote:
>
> It takes one machine instruction for both the conditional branch and the move.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_rtx_costs):
> Add new case for IF_THEN_ELSE.
> ---
>  gcc/config/xtensa/xtensa.cc | 1 +
>  1 file changed, 1 insertion(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH v3] LoongArch: add addr_global attribute

2022-07-29 Thread Lulu Cheng



On 2022/7/30 at 01:17, Xi Ruoyao wrote:

Change v2 to v3:
- Disable section anchor for addr_global symbols.
- Use -O2 in test to make sure section anchor is disabled.

--

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?


I think the name could be changed to "get_through_got". What do you think?



Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider them global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we can work around such problems.



The others look OK to me.



Re: [PATCH] analyzer: support for creat, dup, dup2 and dup3 in sm-fd.cc [PR106300]

2022-07-29 Thread David Malcolm via Gcc-patches
On Fri, 2022-07-29 at 21:59 +0530, Immad Mir wrote:
> This patch extends the state machine in sm-fd.cc to support
> creat, dup, dup2 and dup3 functions.

Thanks for the patch.

Please can you use PR 106298 for this (in the ChangeLog and subject
line), rather than 106300, as it's more specific.

It's always a battle to keep the subject line a reasonable length,
especially with an "analyzer: " prefix and a " [PRnn]" suffix.  I
think you can leave off the " in sm-fd.cc" from the subject line, which
will help.

I think the patch is close to ready; review comments inline...

> 
> Lightly tested on x86_64 Linux.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106300
> * sm-fd.cc (fd_state_machine::on_open): Add
> creat, dup, dup2 and dup3 functions.
> (enum dup): New.
> (fd_state_machine::valid_to_unchecked_state): New.
> (fd_state_machine::on_creat): New.
> (fd_state_machine::on_dup): New.
> 
> gcc/testsuite/ChangeLog:
> PR analyzer/106300
> * gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
> * gcc.dg/analyzer/fd-2.c: Likewise.
> * gcc.dg/analyzer/fd-4.c: Likewise.
> * gcc.dg/analyzer/fd-6.c: New tests.

At some point we'll want to add documentation to invoke.texi about what
functions we're special-casing in -fanalyzer, but you can postpone the
sm-fd.cc part of that to a follow-up patch if you like.

[...snip...]

> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -69,6 +69,13 @@ enum access_directions
>    DIRS_WRITE
>  };
>  
> +enum dup
> +{
> +  DUP_1,
> +  DUP_2,
> +  DUP_3
> +};

Please add a comment documenting this enum.

[...snip...]

> +void
> +fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
> +				 const gimple *stmt, const gcall *call,
> +				 const tree callee_fndecl, enum dup kind) const
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  tree arg_1 = gimple_call_arg (call, 0);
> +  state_t state_arg_1 = sm_ctxt->get_state (stmt, arg_1);
> +  tree diag_arg_1 = sm_ctxt->get_diagnostic_tree (arg_1);
> +  if (state_arg_1 == m_stop)
> +    return;
> +  if (!(is_constant_fd_p (state_arg_1) || is_valid_fd_p (state_arg_1)))
> +    {
> +      sm_ctxt->warn (node, stmt, arg_1,
> +		     new fd_use_without_check (*this, diag_arg_1, callee_fndecl));
> +      if (kind == DUP_1)
> +	return;
> +    }

I don't see test coverage for dup of a closed fd; I think we'd want to
warn for that with an fd_use_after_close.  That ought to be tested for
here, but if I'm reading it right, the check is missing.  Can you reuse
fd_state_machine::check_for_open_fd here, rather than duplicating the
logic for use-without-check and for use-after-close ? (since I think
the code is almost the same, apart, perhaps from that early return when
kind is DUP_1).

> +  switch (kind)
> +    {
> +    case DUP_1:
> +      if (!is_constant_fd_p (state_arg_1))
> +	if (lhs)
> +	  sm_ctxt->set_next_state (stmt, lhs,
> +				   valid_to_unchecked_state (state_arg_1));
> +      break;

What happens on dup() of an integer constant?
My understanding is that:
* in terms of POSIX: dup will return either a new FD, or -1.
* as for this patch, the LHS won't be checked (for validity, or leaks).
Shouldn't we transition the LHS to m_unchecked_read_write?  Or am I
missing something?

I don't see test coverage for dup of integer constants - please add
some.


> +
> +    case DUP_2:
> +    case DUP_3:
> +      tree arg_2 = gimple_call_arg (call, 1);
> +      state_t state_arg_2 = sm_ctxt->get_state (stmt, arg_2);
> +      tree diag_arg_2 = sm_ctxt->get_diagnostic_tree (arg_2);
> +      if (state_arg_2 == m_stop)
> +	return;
> +      if (!(is_constant_fd_p (state_arg_2) || is_valid_fd_p (state_arg_2)))
> +	{
> +	  sm_ctxt->warn (node, stmt, arg_2,
> +			 new fd_use_without_check (*this, diag_arg_2, callee_fndecl));
> +	  return;
> +	}
> +
> +      if (!is_constant_fd_p (state_arg_2))
> +	{
> +	  if (lhs)
> +	    sm_ctxt->set_next_state (stmt, lhs,
> +				     valid_to_unchecked_state (state_arg_2));

I got a bit confused by the logic here, but I think it's correct.
Maybe add a comment clarifying that we want to make sure we don't pass
-1 as the 2nd argument, and therefore we want to check for
is_valid_fd_p.

We're not yet modeling that lhs == arg_2 on success, but maybe we don't
need to.  Maybe add a comment to that effect.

[...snip...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-6.c

I get a bit lost when there are lots of numbered test files.  Perhaps
you could rename this, say, to fd-dup-1.c, to reflect the theme of
what's being tested.  But that's up to you.

[...snip...]

Thanks again for the patch.

Dave



[PATCH, v3] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 29.07.22 at 22:36, Mikael Morin wrote:

Indeed, I overlooked that, but my opinion remains that we shouldn’t play
with fixed vs free form considerations here.
So the options I can see are:
  - handle the locus in get_kind; we do it a lot already in matching
functions, so it wouldn’t be different here.
  - implement a variant of gfc_match_char without space gobbling.
  - use gfc_match(...), which is a bit heavy weight to match a single
char string, but otherwise would keep things concise.

My preference goes to the third option, but I’m fine with either of them
if you have a different one.



how about the attached?

This introduces the helper function gfc_match_next_char, which is
your second option.

Thanks,
Harald
From 0a95d103e4855fbcc20fd24d44b97b690d570333 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 28 Jul 2022 22:07:02 +0200
Subject: [PATCH] Fortran: detect blanks within literal constants in free-form
 mode [PR92805]

gcc/fortran/ChangeLog:

	PR fortran/92805
	* gfortran.h (gfc_match_next_char): Declare it.
	* primary.cc (get_kind): Do not skip over blanks in free-form mode.
	(match_string_constant): Likewise.
	* scanner.cc (gfc_match_next_char): New.  Match next character of
	input, treating whitespace depending on fixed or free form.

gcc/testsuite/ChangeLog:

	PR fortran/92805
	* gfortran.dg/literal_constants.f: New test.
	* gfortran.dg/literal_constants.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/gfortran.h|  1 +
 gcc/fortran/primary.cc| 17 +
 gcc/fortran/scanner.cc| 17 +
 gcc/testsuite/gfortran.dg/literal_constants.f | 20 
 .../gfortran.dg/literal_constants.f90 | 24 +++
 5 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 696aadd7db6..645a30e7d49 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3209,6 +3209,7 @@ gfc_char_t gfc_next_char (void);
 char gfc_next_ascii_char (void);
 gfc_char_t gfc_peek_char (void);
 char gfc_peek_ascii_char (void);
+match gfc_match_next_char (gfc_char_t);
 void gfc_error_recovery (void);
 void gfc_gobble_whitespace (void);
 void gfc_new_file (void);
diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 3f01f67cd49..9fa6779200f 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -92,14 +92,17 @@ get_kind (int *is_iso_c)
 {
   int kind;
   match m;
+  char c;
 
   *is_iso_c = 0;
 
-  if (gfc_match_char ('_') != MATCH_YES)
+  if (gfc_match_next_char ('_') != MATCH_YES)
 return -2;
 
-  m = match_kind_param (&kind, is_iso_c);
-  if (m == MATCH_NO)
+  m = MATCH_NO;
+  c = gfc_peek_ascii_char ();
+  if ((gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+      || (m = match_kind_param (&kind, is_iso_c)) == MATCH_NO)
 gfc_error ("Missing kind-parameter at %C");
 
   return (m == MATCH_YES) ? kind : -1;
@@ -1074,17 +1077,9 @@ match_string_constant (gfc_expr **result)
   c = gfc_next_char ();
 }
 
-  if (c == ' ')
-{
-  gfc_gobble_whitespace ();
-  c = gfc_next_char ();
-}
-
   if (c != '_')
 goto no_match;
 
-  gfc_gobble_whitespace ();
-
   c = gfc_next_char ();
   if (c != '\'' && c != '"')
 goto no_match;
diff --git a/gcc/fortran/scanner.cc b/gcc/fortran/scanner.cc
index 2dff2514700..2d1980c074c 100644
--- a/gcc/fortran/scanner.cc
+++ b/gcc/fortran/scanner.cc
@@ -1690,6 +1690,23 @@ gfc_peek_ascii_char (void)
 }
 
 
+/* Match next character of input.  In fixed form mode, we also ignore
+   spaces.  */
+
+match
+gfc_match_next_char (gfc_char_t c)
+{
+  locus where;
+
+  where = gfc_current_locus;
+  if (gfc_next_char () == c)
+return MATCH_YES;
+
+  gfc_current_locus = where;
+  return MATCH_NO;
+}
+
+
 /* Recover from an error.  We try to get past the current statement
and get lined up for the next.  The next statement follows a '\n'
or a ';'.  We also assume that we are not within a character
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f b/gcc/testsuite/gfortran.dg/literal_constants.f
new file mode 100644
index 000..4d1f1b7eb4c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form" }
+! PR fortran/92805 - blanks within literal constants in fixed-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"
+  print *, 1_ "abc"
+  print *, ck_"a"
+  print *, ck _"ab"
+  print *, ck_ "ab"
+  print *, 3.1415_4
+  print *, 3.1415 _4
+  print *, 3.1415_ 4
+  print *, 3.1415_rk
+  print *, 3.1415 _rk
+  print *, 3.1415_ rk
+  end

Re: [PATCH, v2] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Mikael Morin

On 29/07/2022 at 21:59, Harald Anlauf via Fortran wrote:


On 29.07.22 at 13:11, Mikael Morin wrote:

 > and use gfc_next_char instead of gfc_match_char

in get_kind.


There is one important functionality in gfc_match_char(): it manages
the locus.  We would need then to add this explicitly to get_kind,
which does not look to me like a big improvement over the present
solution.  Otherwise I get test regressions.

Indeed, I overlooked that, but my opinion remains that we shouldn’t play 
with fixed vs free form considerations here.

So the options I can see are:
 - handle the locus in get_kind; we do it a lot already in matching 
functions, so it wouldn’t be different here.

 - implement a variant of gfc_match_char without space gobbling.
 - use gfc_match(...), which is a bit heavy weight to match a single 
char string, but otherwise would keep things concise.


My preference goes to the third option, but I’m fine with either of them 
if you have a different one.


Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 06:03:18PM +0100, Andrew Stubbs wrote:
> On 29/07/2022 16:59, Jakub Jelinek wrote:
> > Doing the fold_convert seems to be a wasted effort to me.
> > Can't this be done conditional on whether some change is needed at all
> > and just using gimple_build_assign with NOP_EXPR, so something like:
> 
> I'm just not familiar enough with this stuff to run fold_convert in my head
> with confidence.

The thing with fold_convert is that it is strict: even when a conversion
is useless, it still creates a NOP_EXPR around the argument unless the
types are exactly the same.  gimple_build_assign then notices it should
create a NOP_EXPR assign rhs op, uses the argument of the NOP_EXPR
directly, and the NOP_EXPR itself is garbage-collected later.
Plus, even when no conversion is needed, it creates an extra assignment
that is only optimized away later, in some other pass.
> 
> >   tree shift_cnt_conv = shift_cnt;
> >   if (!useless_type_conversion_p (TREE_TYPE (mask),
> >   TREE_TYPE (shift_cnt)))
> > {
> >   shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> >   g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
> >   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> > }
> > 
> 
> Your version gives the same output mine does, at least on amdgcn anyway.
> 
> Am I OK to commit this version?

Yes, thanks.

> openmp-simd-clone: Match shift types
> 
> Ensure that both parameters to vector shifts use the same mode.  This is most
> important for amdgcn where the masks are DImode.
> 
> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
>   the mask type.
> 
> Co-authored-by: Jakub Jelinek  
> 
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 32649bc3f9a..58bd68b129b 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -1305,8 +1305,16 @@ simd_clone_adjust (struct cgraph_node *node)
>  build_int_cst (TREE_TYPE (iter1), c));
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
>   }
> +   tree shift_cnt_conv = shift_cnt;
> +   if (!useless_type_conversion_p (TREE_TYPE (mask),
> +   TREE_TYPE (shift_cnt)))
> + {
> +   shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> +   g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
> +   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> + }
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
> -RSHIFT_EXPR, mask, shift_cnt);
> +RSHIFT_EXPR, mask, shift_cnt_conv);
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> mask = gimple_assign_lhs (g);
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


Jakub



Re: [PATCH, v2] Fortran: fix invalid rank error in ASSOCIATED when rank is remapped [PR77652]

2022-07-29 Thread Harald Anlauf via Gcc-patches

On 28.07.22 at 22:19, Mikael Morin wrote:

Hello,

On 27/07/2022 at 21:45, Harald Anlauf via Fortran wrote:

ok, I have thought about your comments in the review process,
and played with the Cray compiler.  Attached is a refined version
of the patch that now rejects in addition these cases for which there
are no possible related pointer assignments with bounds remapping:

   ASSOCIATED (scalar, array) ! impossible, cannot remap bounds
   ASSOCIATED (array, scalar) ! a scalar is not simply contiguous


In principle, it could make sense to construct a one-sized array pointer
(of any rank) pointing to a scalar, but I agree that if we follow the
rules of the standard to the letter, it should be rejected (and we do
reject such a pointer assignment).
So, with this case eliminated, this patch looks good to me as is.


OK, so I will push that version soon.


Regarding Toon’s suggestion to ask the fortran committee, it sounds
sensible.  I propose to prepare something tomorrow.



Depending on the answer we can later refine the compile-time check
and relax if needed.

Harald


[PATCH, v2] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 29.07.22 at 13:11, Mikael Morin wrote:

Hello,

On 28/07/2022 at 22:11, Harald Anlauf via Fortran wrote:

Dear all,

in free-form mode, blanks are significant, so they cannot appear
in literal constants, especially not before or after the "_" that
separates the literal and the kind specifier.

The initial patch from Steve addressed numerical literals, which
I completed by adjusting the parsing of string literals.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


It looks correct, but I think we should continue to have the free vs
fixed form abstracted away from the parsing code.


yes, that makes sense.


So, I suggest instead to remove the calls to gfc_gobble_whitespace in
match_string_constant,


Indeed, removing these simplifies the function and indeed works!

> and use gfc_next_char instead of gfc_match_char

in get_kind.


There is one important functionality in gfc_match_char(): it manages
the locus.  We would need then to add this explicitly to get_kind,
which does not look to me like a big improvement over the present
solution.  Otherwise I get test regressions.


Mikael



I've attached a revised version with improved match_string_constant().
What do you think?

Thanks,
Harald
From f8e7c297b7c9e5a2b22185c7e0d638764c33aa71 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 28 Jul 2022 22:07:02 +0200
Subject: [PATCH] Fortran: detect blanks within literal constants in free-form
 mode [PR92805]

gcc/fortran/ChangeLog:

	PR fortran/92805
	* primary.cc (get_kind): Do not skip over blanks in free-form mode.
	(match_string_constant): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/92805
	* gfortran.dg/literal_constants.f: New test.
	* gfortran.dg/literal_constants.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/primary.cc| 19 +++
 gcc/testsuite/gfortran.dg/literal_constants.f | 20 
 .../gfortran.dg/literal_constants.f90 | 24 +++
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f
 create mode 100644 gcc/testsuite/gfortran.dg/literal_constants.f90

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 3f01f67cd49..604f98a8dd9 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -92,14 +92,21 @@ get_kind (int *is_iso_c)
 {
   int kind;
   match m;
+  char c;
 
   *is_iso_c = 0;
 
+  c = gfc_peek_ascii_char ();
+  if (gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+return -2;
+
   if (gfc_match_char ('_') != MATCH_YES)
 return -2;
 
-  m = match_kind_param (&kind, is_iso_c);
-  if (m == MATCH_NO)
+  m = MATCH_NO;
+  c = gfc_peek_ascii_char ();
+  if ((gfc_current_form == FORM_FREE && gfc_is_whitespace (c))
+      || (m = match_kind_param (&kind, is_iso_c)) == MATCH_NO)
 gfc_error ("Missing kind-parameter at %C");
 
   return (m == MATCH_YES) ? kind : -1;
@@ -1074,17 +1081,9 @@ match_string_constant (gfc_expr **result)
   c = gfc_next_char ();
 }
 
-  if (c == ' ')
-{
-  gfc_gobble_whitespace ();
-  c = gfc_next_char ();
-}
-
   if (c != '_')
 goto no_match;
 
-  gfc_gobble_whitespace ();
-
   c = gfc_next_char ();
   if (c != '\'' && c != '"')
 goto no_match;
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f b/gcc/testsuite/gfortran.dg/literal_constants.f
new file mode 100644
index 000..4d1f1b7eb4c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-options "-ffixed-form" }
+! PR fortran/92805 - blanks within literal constants in fixed-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"
+  print *, 1_ "abc"
+  print *, ck_"a"
+  print *, ck _"ab"
+  print *, ck_ "ab"
+  print *, 3.1415_4
+  print *, 3.1415 _4
+  print *, 3.1415_ 4
+  print *, 3.1415_rk
+  print *, 3.1415 _rk
+  print *, 3.1415_ rk
+  end
diff --git a/gcc/testsuite/gfortran.dg/literal_constants.f90 b/gcc/testsuite/gfortran.dg/literal_constants.f90
new file mode 100644
index 000..f8908f9ad76
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/literal_constants.f90
@@ -0,0 +1,24 @@
+! { dg-do compile }
+! { dg-options "-ffree-form" }
+! PR fortran/92805 - blanks within literal constants in free-form mode
+
+  implicit none
+  integer, parameter :: ck = kind ("a")  ! default character kind
+  integer, parameter :: rk = kind (1.0)  ! default real kind
+  print *, 1_"abc"
+  print *, 1 _"abc"   ! { dg-error "Syntax error" }
+  print *, 1_ "abc"   ! { dg-error "Missing kind-parameter" }
+  print *, 1 _ "abc"  ! { dg-error "Syntax error" }
+  print *, ck_"a"
+  print *, ck _"ab"   ! { dg-error "Syntax error" }
+  print *, ck_ "ab"   ! { dg-error "Syntax error" }
+ 

Re: [GCC13][Patch][V2][1/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-07-29 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your comments and suggestions. (And sorry for my late reply).

> On Jul 28, 2022, at 3:26 AM, Richard Biener  wrote:
> 
> On Tue, 19 Jul 2022, Qing Zhao wrote:
> 
>> From 3854004802b8e2f132ebf218fc35a632f5e80c6a Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 17:04:12 +
>> Subject: [PATCH 1/2] Add a new option -fstrict-flex-array[=n] and new
>> attribute strict_flex_array
>> 
>> Add the following new option -fstrict-flex-array[=n] and a corresponding
>> attribute strict_flex_array to GCC:
>> 
>> '-fstrict-flex-array'
>>Treat the trailing array of a structure as a flexible array member
>>in a stricter way.  The positive form is equivalent to
>>'-fstrict-flex-array=3', which is the strictest.  A trailing array
>>is treated as a flexible array member only when it is declared as a
>>flexible array member per C99 standard onwards.  The negative form
>>is equivalent to '-fstrict-flex-array=0', which is the least
>>strict.  All trailing arrays of structures are treated as flexible
>>array members.
>> 
>> '-fstrict-flex-array=LEVEL'
>>Treat the trailing array of a structure as a flexible array member
>>in a stricter way.  The value of LEVEL controls the level of
>>strictness.
>> 
>>The possible values of LEVEL are the same as for the
>>'strict_flex_array' attribute (*note Variable Attributes::).
>> 
>>You can control this behavior for a specific trailing array field
>>of a structure by using the variable attribute 'strict_flex_array'
>>attribute (*note Variable Attributes::).
>> 
>> 'strict_flex_array (LEVEL)'
>>The 'strict_flex_array' attribute should be attached to the
>>trailing array field of a structure.  It specifies the level of
>>strictness of treating the trailing array field of a structure as a
>>    flexible array member.  LEVEL must be an integer between 0 and 3.
>> 
>>LEVEL=0 is the least strict level, all trailing arrays of
>>structures are treated as flexible array members.  LEVEL=3 is the
>>strictest level, only when the trailing array is declared as a
>>flexible array member per C99 standard onwards ([]), it is treated
>>as a flexible array member.
>> 
>>There are two more levels in between 0 and 3, which are provided to
>>    support older code that uses the GCC zero-length array extension ([0])
>>    or a one-element array as flexible array member ([1]): When LEVEL is 1,
>>the trailing array is treated as a flexible array member when it is
>>declared as either [], [0], or [1]; When LEVEL is 2, the trailing
>>array is treated as a flexible array member when it is declared as
>>either [], or [0].
>> 
>>This attribute can be used with or without '-fstrict-flex-array'.
>>    When both the attribute and the option are present at the same time,
>>the level of the strictness for the specific trailing array field
>>is determined by the attribute.
>> 
>> gcc/c-family/ChangeLog:
>> 
>>  * c-attribs.cc (handle_strict_flex_array_attribute): New function.
>>  (c_common_attribute_table): New item for strict_flex_array.
>>  * c.opt (fstrict-flex-array): New option.
>>  (fstrict-flex-array=): New option.
>> 
>> gcc/c/ChangeLog:
>> 
>>  * c-decl.cc (add_flexible_array_elts_to_size): Call new utility
>>  routine flexible_array_member_p.
>>  (is_flexible_array_member_p): New function.
>>  (finish_struct): Set the new DECL_NOT_FLEXARRAY flag.
>> 
>> gcc/ChangeLog:
>> 
>>  * doc/extend.texi: Document strict_flex_array attribute.
>>  * doc/invoke.texi: Document -fstrict-flex-array[=n] option.
>>  * tree-core.h (struct tree_decl_common): New bit field
>>  decl_not_flexarray.
>>  * tree.cc (component_ref_size): Reorg by using new utility functions.
>>  (flexible_array_member_p): New function.
>>  (zero_length_array_p): Likewise.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_type_p): Likewise.
>>  * tree.h (DECL_NOT_FLEXARRAY): New flag.
>>  (zero_length_array_p): New function prototype.
>>  (one_element_array_p): Likewise.
>>  (flexible_array_member_p): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/strict-flex-array-1.c: New test.
>> ---
>> gcc/c-family/c-attribs.cc  |  47 
>> gcc/c-family/c.opt |   7 ++
>> gcc/c/c-decl.cc|  91 +--
>> gcc/doc/extend.texi|  25 
>> gcc/doc/invoke.texi|  27 -
>> gcc/testsuite/gcc.dg/strict-flex-array-1.c |  31 +
>> gcc/tree-core.h|   5 +-
>> gcc/tree.cc| 130 ++---
>> gcc/tree.h |  16 ++-
>> 9 files changed, 322 insertions(+), 57 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-1.c
>> 
>> diff --git a/gcc/c-family/c-attribs.cc 

[PATCH 2/2] xtensa: Fix conflicting hard regno between indirect sibcall fixups and EH_RETURN_STACKADJ_RTX

2022-07-29 Thread Takayuki 'January June' Suwa via Gcc-patches
The hard register A10 was already allocated for EH_RETURN_STACKADJ_RTX.
(Although exception handling and sibling calls may not occur at the same
 time, the change is made for safety.)

gcc/ChangeLog:

* config/xtensa/xtensa.md: Change hard register number used in
the split patterns for indirect sibling call fixups from 10 to 11,
the last free one for the CALL0 ABI.
---
 gcc/config/xtensa/xtensa.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 899ce2755aa..1294aab6c5d 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -25,7 +25,7 @@
   (A7_REG  7)
   (A8_REG  8)
   (A9_REG  9)
-  (A10_REG 10)
+  (A11_REG 11)
 
   (UNSPEC_NOP  2)
   (UNSPEC_PLT  3)
@@ -2295,9 +2295,9 @@
   "reload_completed
&& !TARGET_WINDOWED_ABI && SIBLING_CALL_P (insn)
&& ! call_used_or_fixed_reg_p (REGNO (operands[0]))"
-  [(set (reg:SI A10_REG)
+  [(set (reg:SI A11_REG)
(match_dup 0))
-   (call (mem:SI (reg:SI A10_REG))
+   (call (mem:SI (reg:SI A11_REG))
 (match_dup 1))])
 
 (define_expand "sibcall_value"
@@ -2328,10 +2328,10 @@
   "reload_completed
&& !TARGET_WINDOWED_ABI && SIBLING_CALL_P (insn)
&& ! call_used_or_fixed_reg_p (REGNO (operands[1]))"
-  [(set (reg:SI A10_REG)
+  [(set (reg:SI A11_REG)
(match_dup 1))
(set (match_dup 0)
-   (call (mem:SI (reg:SI A10_REG))
+   (call (mem:SI (reg:SI A11_REG))
  (match_dup 2)))])
 
 (define_insn "entry"
-- 
2.20.1


[PATCH 1/2] xtensa: Add RTX costs for if_then_else

2022-07-29 Thread Takayuki 'January June' Suwa via Gcc-patches
It takes one machine instruction for both the conditional branch and the move.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_rtx_costs):
Add new case for IF_THEN_ELSE.
---
 gcc/config/xtensa/xtensa.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index a851a7ae6b3..6ac879c38fb 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -4273,6 +4273,7 @@ xtensa_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 case ZERO_EXTRACT:
 case ZERO_EXTEND:
+case IF_THEN_ELSE:
   *total = COSTS_N_INSNS (1);
   return true;
 
-- 
2.20.1


[PATCH v3] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Change v2 to v3:
- Disable section anchor for addr_global symbols.
- Use -O2 in test to make sure section anchor is disabled.

--

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider them global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we can work around such problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
(loongarch_use_anchors_for_symbol_p): New static function.
(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 61 +++
 gcc/doc/extend.texi   | 17 ++
 .../gcc.target/loongarch/addr-global.c| 28 +
 3 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..db6f84d4e66 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,54 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
+bool
+loongarch_use_anchors_for_symbol_p (const_rtx symbol)
+{
+  tree decl = SYMBOL_REF_DECL (symbol);
+
+  /* addr_global indicates we don't know how the linker will locate the symbol,
+ so the use of anchor may cause relocation overflow.  */
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+return false;
+
+  return default_use_anchors_for_symbol_p (symbol);
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6311,12 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE 

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs

On 29/07/2022 16:59, Jakub Jelinek wrote:

Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:


I'm just not familiar enough with this stuff to run fold_convert in my 
head with confidence.



  tree shift_cnt_conv = shift_cnt;
  if (!useless_type_conversion_p (TREE_TYPE (mask),
  TREE_TYPE (shift_cnt)))
{
  shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
  g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}



Your version gives the same output as mine does, at least on amdgcn anyway.

Am I OK to commit this version?

Andrew
openmp-simd-clone: Match shift types

Ensure that both parameters to vector shifts use the same mode.  This is most
important for amdgcn where the masks are DImode.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
the mask type.

Co-authored-by: Jakub Jelinek  

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 32649bc3f9a..58bd68b129b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -1305,8 +1305,16 @@ simd_clone_adjust (struct cgraph_node *node)
   build_int_cst (TREE_TYPE (iter1), c));
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}
+ tree shift_cnt_conv = shift_cnt;
+ if (!useless_type_conversion_p (TREE_TYPE (mask),
+ TREE_TYPE (shift_cnt)))
+   {
+ shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
+ g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
+ gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+   }
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
-  RSHIFT_EXPR, mask, shift_cnt);
+  RSHIFT_EXPR, mask, shift_cnt_conv);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  mask = gimple_assign_lhs (g);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


[PATCH] analyzer: support for creat, dup, dup2 and dup3 in sm-fd.cc [PR106300]

2022-07-29 Thread Immad Mir via Gcc-patches
This patch extends the state machine in sm-fd.cc to support
creat, dup, dup2 and dup3 functions.

Lightly tested on x86_64 Linux.

gcc/analyzer/ChangeLog:
PR analyzer/106300
* sm-fd.cc (fd_state_machine::on_open): Add
creat, dup, dup2 and dup3 functions.
(enum dup): New.
(fd_state_machine::valid_to_unchecked_state): New.
(fd_state_machine::on_creat): New.
(fd_state_machine::on_dup): New.

gcc/testsuite/ChangeLog:
PR analyzer/106300
* gcc.dg/analyzer/fd-1.c: Add tests for 'creat'.
* gcc.dg/analyzer/fd-2.c: Likewise.
* gcc.dg/analyzer/fd-4.c: Likewise.
* gcc.dg/analyzer/fd-6.c: New tests.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc| 117 ++-
 gcc/testsuite/gcc.dg/analyzer/fd-1.c |  21 
 gcc/testsuite/gcc.dg/analyzer/fd-2.c |  15 +++
 gcc/testsuite/gcc.dg/analyzer/fd-4.c |  31 -
 gcc/testsuite/gcc.dg/analyzer/fd-6.c | 168 +++
 5 files changed, 350 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-6.c

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index ed923ade100..7906034599c 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -69,6 +69,13 @@ enum access_directions
   DIRS_WRITE
 };
 
+enum dup
+{
+  DUP_1,
+  DUP_2,
+  DUP_3
+};
+
 class fd_state_machine : public state_machine
 {
 public:
@@ -114,7 +121,9 @@ public:
   bool is_readonly_fd_p (state_t s) const;
   bool is_writeonly_fd_p (state_t s) const;
   enum access_mode get_access_mode_from_flag (int flag) const;
-
+  /* Function for one-to-one correspondence between valid
+ and unchecked states.  */
+  state_t valid_to_unchecked_state (state_t state) const;
   /* State for a constant file descriptor (>= 0) */
   state_t m_constant_fd;
 
@@ -147,6 +156,8 @@ public:
 private:
   void on_open (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
const gcall *call) const;
+  void on_creat (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
+   const gcall *call) const;
   void on_close (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
 const gcall *call) const;
   void on_read (sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
@@ -170,6 +181,9 @@ private:
   const gimple *stmt, const gcall *call,
   const tree callee_fndecl, const char *attr_name,
   access_directions fd_attr_access_dir) const;
+  void check_for_dup (sm_context *sm_ctxt, const supernode *node,
+   const gimple *stmt, const gcall *call, const tree callee_fndecl,
+   enum dup kind) const;
 };
 
 /* Base diagnostic class relative to fd_state_machine.  */
@@ -723,6 +737,20 @@ fd_state_machine::is_constant_fd_p (state_t state) const
   return (state == m_constant_fd);
 }
 
+fd_state_machine::state_t
+fd_state_machine::valid_to_unchecked_state (state_t state) const
+{
+  if (state == m_valid_read_write)
+return m_unchecked_read_write;
+  else if (state == m_valid_write_only)
+return m_unchecked_write_only;
+  else if (state == m_valid_read_only)
+return m_unchecked_read_only;
+  else
+gcc_unreachable ();
+  return NULL;
+}
+
 bool
 fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
   const gimple *stmt) const
@@ -736,6 +764,11 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
return true;
  } //  "open"
 
+   if (is_named_call_p (callee_fndecl, "creat", call, 2))
+ {
+   on_creat (sm_ctxt, node, stmt, call);
+ } // "creat"
+
if (is_named_call_p (callee_fndecl, "close", call, 1))
  {
on_close (sm_ctxt, node, stmt, call);
@@ -754,6 +787,23 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
return true;
  } // "read"
 
+   if (is_named_call_p (callee_fndecl, "dup", call, 1))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_1);
+   return true;
+ }
+
+   if (is_named_call_p (callee_fndecl, "dup2", call, 2))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_2);
+   return true;
+ }
+
+   if (is_named_call_p (callee_fndecl, "dup3", call, 3))
+ {
+   check_for_dup (sm_ctxt, node, stmt, call, callee_fndecl, DUP_3);
+   return true;
+ }
 
{
  // Handle __attribute__((fd_arg))
@@ -899,6 +949,71 @@ fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
 }
 }
 
+void
+fd_state_machine::on_creat (sm_context *sm_ctxt, const supernode *node,
+   const gimple *stmt, const gcall *call) const
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs)
+sm_ctxt->on_transition (node, stmt, lhs, m_start, 

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 04:53:51PM +0100, Andrew Stubbs wrote:
> This patch adjusts the generation of SIMD "inbranch" clones that use integer
> masks to ensure that it vectorizes on amdgcn.
> 
> The problem was only that an amdgcn mask is DImode and the shift amount was
> SImode, and the difference causes vectorization to fail.
> 
> OK for mainline?
> 
> Andrew

> openmp-simd-clone: Match shift types
> 
> Ensure that both parameters to vector shifts use the same mode.  This is most
> important for amdgcn where the masks are DImode.
> 
> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
>   the mask type.
> 
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 32649bc3f9a..5d3a90730e7 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -1305,8 +1305,12 @@ simd_clone_adjust (struct cgraph_node *node)
>  build_int_cst (TREE_TYPE (iter1), c));
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
>   }
> +   tree shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
> +   g = gimple_build_assign (shift_cnt_conv,
> +fold_convert (TREE_TYPE (mask), shift_cnt));
> +   gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);

Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:
  tree shift_cnt_conv = shift_cnt;
  if (!useless_type_conversion_p (TREE_TYPE (mask),
  TREE_TYPE (shift_cnt)))
{
  shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
  g = gimple_build_assign (shift_cnt_conv, NOP_EXPR, shift_cnt);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}

> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
> -RSHIFT_EXPR, mask, shift_cnt);
> +RSHIFT_EXPR, mask, shift_cnt_conv);
> gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> mask = gimple_assign_lhs (g);
> g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),

?

Jakub



[PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
This patch adjusts the generation of SIMD "inbranch" clones that use 
integer masks to ensure that it vectorizes on amdgcn.


The problem was only that an amdgcn mask is DImode and the shift amount 
was SImode, and the difference causes vectorization to fail.


OK for mainline?

Andrew

openmp-simd-clone: Match shift types

Ensure that both parameters to vector shifts use the same mode.  This is most
important for amdgcn where the masks are DImode.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust): Convert shift_cnt to match
the mask type.

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 32649bc3f9a..5d3a90730e7 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -1305,8 +1305,12 @@ simd_clone_adjust (struct cgraph_node *node)
   build_int_cst (TREE_TYPE (iter1), c));
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
}
+ tree shift_cnt_conv = make_ssa_name (TREE_TYPE (mask));
+ g = gimple_build_assign (shift_cnt_conv,
+  fold_convert (TREE_TYPE (mask), shift_cnt));
+ gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),
-  RSHIFT_EXPR, mask, shift_cnt);
+  RSHIFT_EXPR, mask, shift_cnt_conv);
  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
  mask = gimple_assign_lhs (g);
  g = gimple_build_assign (make_ssa_name (TREE_TYPE (mask)),


[PATCH v2] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Don't look at the V1 patch: I selected the wrong file and mistakenly sent
a draft with bugs.

Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we can work around such problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 45 +++
 gcc/doc/extend.texi   | 17 +++
 .../gcc.target/loongarch/addr-global.c| 28 
 3 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..cdf6bf44e36 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,41 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6298,9 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cd..4a1cd7ab5e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7314,6 +7314,7 @@ attributes.
 * Blackfin Variable Attributes::
 * H8/300 Variable Attributes::
 * IA-64 Variable Attributes::
+* LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
@@ -8098,6 +8099,22 @@ defined by shared libraries.
 
 @end table
 

Re: [GCC13][Patch][V2][2/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-07-29 Thread Qing Zhao via Gcc-patches


> On Jul 28, 2022, at 3:28 AM, Richard Biener  wrote:
> 
> On Tue, 19 Jul 2022, Qing Zhao wrote:
> 
>> From a09f39ded462611286a44d9e8273de8342673ba2 Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Mon, 18 Jul 2022 18:12:26 +
>> Subject: [PATCH 2/2] Use new flag DECL_NOT_FLEXARRAY in __builtin_object_size
>> [PR101836]
>> 
>> Use new flag DECL_NOT_FLEXARRAY to determine whether the trailing array
>> of a structure is flexible array member in __builtin_object_size.
>> 
>> gcc/ChangeLog:
>> 
>>  PR tree-optimization/101836
>>  * tree-object-size.cc (addr_object_size): Use array_at_struct_end_p
>>  and DECL_NOT_FLEXARRAY to determine a flexible array member reference.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR tree-optimization/101836
>>  * gcc.dg/pr101836.c: New test.
>>  * gcc.dg/pr101836_1.c: New test.
>>  * gcc.dg/pr101836_2.c: New test.
>>  * gcc.dg/pr101836_3.c: New test.
>>  * gcc.dg/pr101836_4.c: New test.
>>  * gcc.dg/pr101836_5.c: New test.
>>  * gcc.dg/strict-flex-array-2.c: New test.
>>  * gcc.dg/strict-flex-array-3.c: New test.
>> ---
>> gcc/testsuite/gcc.dg/pr101836.c| 60 ++
>> gcc/testsuite/gcc.dg/pr101836_1.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_2.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_3.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_4.c  | 60 ++
>> gcc/testsuite/gcc.dg/pr101836_5.c  | 60 ++
>> gcc/testsuite/gcc.dg/strict-flex-array-2.c | 60 ++
>> gcc/testsuite/gcc.dg/strict-flex-array-3.c | 60 ++
>> gcc/tree-object-size.cc| 18 +++
>> 9 files changed, 489 insertions(+), 9 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_1.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_2.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_3.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_4.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr101836_5.c
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-2.c
>> create mode 100644 gcc/testsuite/gcc.dg/strict-flex-array-3.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/pr101836.c b/gcc/testsuite/gcc.dg/pr101836.c
>> new file mode 100644
>> index 000..e5b4e5160a4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr101836.c
>> @@ -0,0 +1,60 @@
>> +/* -fstrict-flex-array is aliased with -fstrict-flex-array=3, which is the
>> +   strictest, only [] is treated as flexible array.  */ 
>> +/* PR tree-optimization/101836 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fstrict-flex-array" } */
>> +
>> +#include 
>> +
>> +#define expect(p, _v) do { \
>> +size_t v = _v; \
>> +if (p == v) \
>> +printf("ok:  %s == %zd\n", #p, p); \
>> +else \
>> +{  \
>> +  printf("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +  __builtin_abort (); \
>> +} \
>> +} while (0);
>> +
>> +struct trailing_array_1 {
>> +int a;
>> +int b;
>> +int c[4];
>> +};
>> +
>> +struct trailing_array_2 {
>> +int a;
>> +int b;
>> +int c[1];
>> +};
>> +
>> +struct trailing_array_3 {
>> +int a;
>> +int b;
>> +int c[0];
>> +};
>> +struct trailing_array_4 {
>> +int a;
>> +int b;
>> +int c[];
>> +};
>> +
>> +void __attribute__((__noinline__)) stuff(
>> +struct trailing_array_1 *normal,
>> +struct trailing_array_2 *trailing_1,
>> +struct trailing_array_3 *trailing_0,
>> +struct trailing_array_4 *trailing_flex)
>> +{
>> +expect(__builtin_object_size(normal->c, 1), 16);
>> +expect(__builtin_object_size(trailing_1->c, 1), 4);
>> +expect(__builtin_object_size(trailing_0->c, 1), 0);
>> +expect(__builtin_object_size(trailing_flex->c, 1), -1);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +stuff((void *)argv[0], (void *)argv[0], (void *)argv[0], (void 
>> *)argv[0]);
>> +
>> +return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/pr101836_1.c b/gcc/testsuite/gcc.dg/pr101836_1.c
>> new file mode 100644
>> index 000..30ea20427a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr101836_1.c
>> @@ -0,0 +1,60 @@
>> +/* -fstrict-flex-array=3 is the strictest, only [] is treated as
>> +   flexible array.  */ 
>> +/* PR tree-optimization/101836 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fstrict-flex-array=3" } */
>> +
>> +#include 
>> +
>> +#define expect(p, _v) do { \
>> +size_t v = _v; \
>> +if (p == v) \
>> +printf("ok:  %s == %zd\n", #p, p); \
>> +else \
>> +{  \
>> +  printf("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +  __builtin_abort (); \
>> +} \
>> +} while (0);
>> +
>> +struct trailing_array_1 {
>> +int a;
>> +int b;
>> +int c[4];
>> +};
>> +
>> +struct trailing_array_2 {
>> +int a;
>> +int b;
>> +int 

[PATCH] LoongArch: add addr_global attribute

2022-07-29 Thread Xi Ruoyao via Gcc-patches
Background:
https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379

Question:  Do you have a better name than "addr_global"?

Alternatives:

1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel
modules.  It's stupid IMO.
2. Implement a "-maddress-local-with-got" option for GCC and use it for
kernel modules.  It seems too overkill: we might create many unnecessary
GOT entries.
3. For all variables with a section attribute, consider it global.  It
may make sense, but I just checked x86_64 and riscv and they don't do
this.
4. Implement -mcmodel=extreme and use it for kernel modules.  To me
"extreme" seems really too extreme.
5. More hacks in kernel. (Convert relocations against .data..percpu with
objtool?  But objtool is not even implemented for LoongArch yet.)

Note: I'll be mostly AFK in the following week.  My attempt to finish
the kernel support for new relocs before going AFK now failed miserably
:(.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we can work around such problems.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/addr-global.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 43 +++
 gcc/doc/extend.texi   | 17 
 .../gcc.target/loongarch/addr-global.c| 28 
 3 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..5f4e6114fc9 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6075,39 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL }
+};
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6296,9 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cd..4a1cd7ab5e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7314,6 +7314,7 @@ attributes.
 * Blackfin Variable Attributes::
 * H8/300 Variable Attributes::
 * IA-64 Variable Attributes::
+* LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
@@ -8098,6 +8099,22 @@ defined by shared libraries.
 
 @end table
 
+@node LoongArch Variable Attributes
+@subsection LoongArch Variable Attributes
+
+One attribute is currently defined for the LoongArch.
+

[PATCH v2] cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Earnshaw via Gcc-patches
A SET operation that writes memory may have the same value as an earlier 
store but if the alias sets of the new and earlier store do not conflict 
then the set is not truly redundant.  This can happen, for example, if 
objects of different types share a stack slot.


To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler.  Firstly
in cfgcleanup and secondly in postreload.

gcc/ChangeLog:
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
	(reload_cse_noop_set_p): Delete.

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 8c08452e0ac..d54feb15268 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -389,6 +389,20 @@ refs_same_for_tbaa_p (tree earlier, tree later)
 	  || alias_set_subset_of (later_base_set, earlier_base_set));
 }
 
+/* Similar to refs_same_for_tbaa_p() but for use on MEM rtxs.  */
+bool
+mems_same_for_tbaa_p (rtx earlier, rtx later)
+{
+  gcc_assert (MEM_P (earlier));
+  gcc_assert (MEM_P (later));
+
+  return ((MEM_ALIAS_SET (earlier) == MEM_ALIAS_SET (later)
+	   || alias_set_subset_of (MEM_ALIAS_SET (later),
+   MEM_ALIAS_SET (earlier)))
+	  && (!MEM_EXPR (earlier)
+	      || refs_same_for_tbaa_p (MEM_EXPR (earlier), MEM_EXPR (later))));
+}
+
 /* Returns a pointer to the alias set entry for ALIAS_SET, if there is
such an entry, or NULL otherwise.  */
 
diff --git a/gcc/alias.h b/gcc/alias.h
index b2596518ac9..ee3db466763 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -40,6 +40,7 @@ tree reference_alias_ptr_type_1 (tree *);
 bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
 bool refs_same_for_tbaa_p (tree, tree);
+bool mems_same_for_tbaa_p (rtx, rtx);
 
 /* This alias set can be used to force a memory to conflict with all
other memories, creating a barrier across which no memory reference
diff --git a/gcc/cfgcleanup.cc b/gcc/cfgcleanup.cc
index 18047da7b24..a8b0139bb4d 100644
--- a/gcc/cfgcleanup.cc
+++ b/gcc/cfgcleanup.cc
@@ -208,7 +208,7 @@ mark_effect (rtx exp, regset nonequal)
   return false;
 
 case SET:
-  if (rtx_equal_for_cselib_p (SET_DEST (exp), SET_SRC (exp)))
+  if (cselib_redundant_set_p (exp))
 	return false;
   dest = SET_DEST (exp);
   if (dest == pc_rtx)
diff --git a/gcc/cselib.cc b/gcc/cselib.cc
index 6769beeeaf8..6a5609786fa 100644
--- a/gcc/cselib.cc
+++ b/gcc/cselib.cc
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "alias.h"
 
 /* A list of cselib_val structures.  */
 struct elt_list
@@ -1157,6 +1158,75 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, machine_mode memmode, int depth)
   return 1;
 }
 
+/* Wrapper for rtx_equal_for_cselib_p to determine whether a SET is
+   truly redundant, taking into account aliasing information.  */
+bool
+cselib_redundant_set_p (rtx set)
+{
+  gcc_assert (GET_CODE (set) == SET);
+  rtx dest = SET_DEST (set);
+  if (cselib_reg_set_mode (dest) != GET_MODE (dest))
+return false;
+
+  if (!rtx_equal_for_cselib_p (dest, SET_SRC (set)))
+return false;
+
+  while (GET_CODE (dest) == SUBREG
+	 || GET_CODE (dest) == ZERO_EXTRACT
+	 || GET_CODE (dest) == STRICT_LOW_PART)
+dest = XEXP (dest, 0);
+
+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;
+
+  /* For a store we need to check that suppressing it will not change
+ the effective alias set.  */
+  rtx dest_addr = XEXP (dest, 0);
+
+  /* Lookup the equivalents to the original dest (rather than just the
+ MEM).  */
+  cselib_val *src_val = cselib_lookup (SET_DEST (set),
+   GET_MODE (SET_DEST (set)),
+   0, VOIDmode);
+
+  if (src_val)
+{
+  /* Walk the list of source equivalents to find the MEM accessing
+	 the same location.  */
+  for (elt_loc_list *l = src_val->locs; l; l = l->next)
+	{
+	  rtx src_equiv = l->loc;
+	  while (GET_CODE (src_equiv) == SUBREG
+		 || GET_CODE (src_equiv) == ZERO_EXTRACT
+		 || GET_CODE (src_equiv) == STRICT_LOW_PART)
+	src_equiv = XEXP (src_equiv, 0);
+
+	  if (MEM_P (src_equiv))
+	{
+	  /* Match the MEMs by comparing the addresses.  We can
+		 only remove the later store if the earlier aliases at
+		 least all the accesses of the later one.  */
+	  if (rtx_equal_for_cselib_1 (dest_addr, XEXP (src_equiv, 0),
+	  GET_MODE (dest), 0))
+		return mems_same_for_tbaa_p (src_equiv, 

Progress of the new linking implementation

2022-07-29 Thread Gaius Mulley via Gcc-patches


Hello,

the non-shared-library linking is complete and the gm2 driver has been
rewritten using the fortran/c++ code base.  Once the shared library
scaffold is complete the focus will be on tidying up and
platform/architecture testing.

All 11.7k tests pass on amd64 and aarch64 Debian.

regards,
Gaius


Re: [PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Aldy Hernandez wrote:

> On Fri, Jul 29, 2022 at 11:02 AM Richard Biener  wrote:
> >
> > The following makes the backward threader reject threads whose entry
> > edge is probably never executed according to the profile.  That in
> > particular, for the testcase, avoids threading the irq == 1 check
> > on the path where irq > 31, thereby avoiding spurious -Warray-bounds
> > diagnostics
> >
> >   if (irq_1(D) > 31)
> > goto ; [0.00%]
> >   else
> > goto ; [100.00%]
> >
> > ;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
> >   _2 = (unsigned long) irq_1(D);
> >   __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);
> >
> >   _3 = 1 << irq_1(D);
> >   mask_4 = (u32) _3;
> >   entry = instance_5(D)->array[irq_1(D)];
> >   capture (mask_4);
> >   if (level_6(D) != 0)
> > goto ; [34.00%]
> >   else
> > goto ; [66.00%]
> >
> > ;;   basic block 5, loop depth 0, count 708669600 (estimated locally), maybe hot
> >   if (irq_1(D) == 1)
> > goto ; [20.97%]
> >   else
> > goto ; [79.03%]
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > The testcase in the PR requries both ubsan and sancov so I'm not sure
> > where to put it but IIRC there were quite some duplicate PRs wrt
> > threading unlikely paths exposing diagnostics, eventually some
> > testcase will come out of those (when we identify them).  Note
> > the patch is quite conservative in only disabling likely never
> > executed paths rather than requiring maybe_hot_edge_p (OTOH those
> > are somewhat similar in the end).
> 
> Sounds reasonable, if for no other reason than to quiet down the
> annoyingly large amount of false positives we have because of
> aggressive threading, or better ranges as a whole.
> 
> How does this fit in with Jeff's original idea of isolating known
> problematic paths (null dereferences?), which in theory are also
> unlikely?

If profile estimation gets to see those the paths should turn unlikely.
Path isolation should also make sure to adjust the profile accordingly,
though local profile adjustments are somewhat difficult.

Richard.

> Thanks for doing this.
> Aldy
> 
> >
> > I'm going to push it when testing finishes but maybe there are some
> > testcases to adjust.
> >
> > PR tree-optimization/105679
> > * tree-ssa-threadbackward.cc
> > (back_threader_profitability::profitable_path_p): Avoid threading
> > when the entry edge is probably never executed.
> > ---
> >  gcc/tree-ssa-threadbackward.cc | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 3519aca84cd..90f5331c265 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const
> > vec<basic_block> &m_path,
> >  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> >   return false;
> > }
> > +  edge entry = find_edge (m_path[m_path.length () - 1],
> > + m_path[m_path.length () - 2]);
> > +  if (probably_never_executed_edge_p (cfun, entry))
> > +   {
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> > +"path entry is probably never executed.\n");
> > + return false;
> > +   }
> >  }
> >else if (!m_speed_p && n_insns > 1)
> >  {
> > --
> > 2.35.3
> >
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[committed] amdgcn: 64-bit vector shifts

2022-07-29 Thread Andrew Stubbs
I've committed this patch to implement V64DImode vector-vector and 
vector-scalar shifts.


In particular, these are used by the SIMD "inbranch" clones that I'm 
working on right now, but it's an omission that ought to have been fixed 
anyway.


Andrew

amdgcn: 64-bit vector shifts

Enable 64-bit vector-vector and vector-scalar shifts.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (V_INT_noHI): New iterator.
(3): Use V_INT_noHI.
(v3): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index abe46201344..8c33ae0c717 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -60,6 +60,8 @@ (define_mode_iterator V_noHI
 
 (define_mode_iterator V_INT_noQI
  [V64HI V64SI V64DI])
+(define_mode_iterator V_INT_noHI
+ [V64SI V64DI])
 
 ; All of above
 (define_mode_iterator V_ALL
@@ -2086,10 +2088,10 @@ (define_expand "3"
   })
 
 (define_insn "3"
-  [(set (match_operand:V_SI 0 "register_operand"  "= v")
-   (shiftop:V_SI
- (match_operand:V_SI 1 "gcn_alu_operand" "  v")
- (vec_duplicate:V_SI
+  [(set (match_operand:V_INT_noHI 0 "register_operand"  "= v")
+   (shiftop:V_INT_noHI
+ (match_operand:V_INT_noHI 1 "gcn_alu_operand" "  v")
+ (vec_duplicate:
(match_operand:SI 2 "gcn_alu_operand"  "SvB"]
   ""
   "v_0\t%0, %2, %1"
@@ -2117,10 +2119,10 @@ (define_expand "v3"
   })
 
 (define_insn "v3"
-  [(set (match_operand:V_SI 0 "register_operand"  "=v")
-   (shiftop:V_SI
- (match_operand:V_SI 1 "gcn_alu_operand" " v")
- (match_operand:V_SI 2 "gcn_alu_operand" "vB")))]
+  [(set (match_operand:V_INT_noHI 0 "register_operand"  "=v")
+   (shiftop:V_INT_noHI
+ (match_operand:V_INT_noHI 1 "gcn_alu_operand" " v")
+ (match_operand: 2 "gcn_alu_operand" "vB")))]
   ""
   "v_0\t%0, %2, %1"
   [(set_attr "type" "vop2")


[committed] amdgcn: 64-bit not

2022-07-29 Thread Andrew Stubbs

I've committed this patch to enable DImode one's-complement on amdgcn.

The hardware doesn't have 64-bit not, and this isn't needed by expand 
which is happy to use two SImode operations, but the vectorizer isn't so 
clever. Vector condition masks are DImode on amdgcn, so this has been 
causing lots of conditional code to fail to vectorize.
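For intuition, here is an editor-added C sketch (my illustration, not part of the patch) of what the split performs: one's complement of a 64-bit value computed as two independent 32-bit NOTs, one per half.

```c
#include <stdint.h>

/* Sketch of the one_cmpldi2 split: with no 64-bit NOT instruction,
   ~x on a DImode value is computed as two SImode NOTs.  */
uint64_t
not_di_via_si (uint64_t x)
{
  uint32_t lo = ~(uint32_t) x;          /* NOT of the low part  */
  uint32_t hi = ~(uint32_t) (x >> 32);  /* NOT of the high part */
  return ((uint64_t) hi << 32) | lo;
}
```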


Andrew

amdgcn: 64-bit not

This makes the auto-vectorizer happier when handling masks.

gcc/ChangeLog:

* config/gcn/gcn.md (one_cmpldi2): New.

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 033c1708e88..70a769babc4 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1676,6 +1676,26 @@ (define_expand "si3_scc"
 ;; }}}
 ;; {{{ ALU: generic 64-bit
 
+(define_insn_and_split "one_cmpldi2"
+  [(set (match_operand:DI 0 "register_operand""=Sg,v")
+   (not:DI (match_operand:DI 1 "gcn_alu_operand" "SgA,vSvDB")))
+   (clobber (match_scratch:BI 2  "=cs,X"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 3) (not:SI (match_dup 4)))
+ (clobber (match_dup 2))])
+   (parallel [(set (match_dup 5) (not:SI (match_dup 6)))
+ (clobber (match_dup 2))])]
+  {
+operands[3] = gcn_operand_part (DImode, operands[0], 0);
+operands[4] = gcn_operand_part (DImode, operands[1], 0);
+operands[5] = gcn_operand_part (DImode, operands[0], 1);
+operands[6] = gcn_operand_part (DImode, operands[1], 1);
+  }
+  [(set_attr "type" "mult")]
+)
+
 (define_code_iterator vec_and_scalar64_com [and ior xor])
 
 (define_insn_and_split "di3"


Re: [PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Jul 29, 2022 at 11:02 AM Richard Biener  wrote:
>
> The following makes the backward threader reject threads whose entry
> edge is probably never executed according to the profile.  That in
> particular, for the testcase, avoids threading the irq == 1 check
> on the path where irq > 31, thereby avoiding spurious -Warray-bounds
> diagnostics
>
>   if (irq_1(D) > 31)
> goto ; [0.00%]
>   else
> goto ; [100.00%]
>
> ;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
>   _2 = (unsigned long) irq_1(D);
>   __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);
>
>   _3 = 1 << irq_1(D);
>   mask_4 = (u32) _3;
>   entry = instance_5(D)->array[irq_1(D)];
>   capture (mask_4);
>   if (level_6(D) != 0)
> goto ; [34.00%]
>   else
> goto ; [66.00%]
>
> ;;   basic block 5, loop depth 0, count 708669600 (estimated locally), maybe hot
>   if (irq_1(D) == 1)
> goto ; [20.97%]
>   else
> goto ; [79.03%]
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> The testcase in the PR requries both ubsan and sancov so I'm not sure
> where to put it but IIRC there were quite some duplicate PRs wrt
> threading unlikely paths exposing diagnostics, eventually some
> testcase will come out of those (when we identify them).  Note
> the patch is quite conservative in only disabling likely never
> executed paths rather than requiring maybe_hot_edge_p (OTOH those
> are somewhat similar in the end).

Sounds reasonable, if for no other reason than to quiet down the
annoyingly large amount of false positives we have because of
aggressive threading, or better ranges as a whole.

How does this fit in with Jeff's original idea of isolating known
problematic paths (null dereferences?), which in theory are also
unlikely?

Thanks for doing this.
Aldy

>
> I'm going to push it when testing finishes but maybe there are some
> testcases to adjust.
>
> PR tree-optimization/105679
> * tree-ssa-threadbackward.cc
> (back_threader_profitability::profitable_path_p): Avoid threading
> when the entry edge is probably never executed.
> ---
>  gcc/tree-ssa-threadbackward.cc | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 3519aca84cd..90f5331c265 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const
> vec<basic_block> &m_path,
>  "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>   return false;
> }
> +  edge entry = find_edge (m_path[m_path.length () - 1],
> + m_path[m_path.length () - 2]);
> +  if (probably_never_executed_edge_p (cfun, entry))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +"path entry is probably never executed.\n");
> + return false;
> +   }
>  }
>else if (!m_speed_p && n_insns > 1)
>  {
> --
> 2.35.3
>



Re: [patch] libgompd: Add thread handles

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 04, 2022 at 10:34:03PM +0200, Ahmed Sayed Mousse wrote:
> *This patch is the initial implementation of OpenMP-API specs book section
> **20.5.5 with title "Thread Handles".*

Sorry for the delay, have been on vacation.

> *I have fixed the first version after revising the notes on it.*
> 
> *libgomp/ChangeLog
> 
> 2022-07-01  Ahmed Sayed   >
> *
> 
> ** Makefile.am (libgompd_la_SOURCES): Add ompd-threads.c.*
> 
> ** Makefile.in: Regenerate.*
> 
> ** team.c ( gomp_free_thread ): Called ompd_bp_thread_end ().*
> 
> ** ompd-support.c ( gompd_thread_initial_tls_bias ): New Variable.*
> 
> *   (gompd_load): Initialize gompd_thread_initial_tls_bias.*
> 
> ** ompd-threads.c: New File.*

The ChangeLog formatting is wrong, so wouldn't go through the commit
hook checking.  Unclear what part of it is just a fault of your mailer
setting and what is really wrong.  But
There should be just > after gmail.com, not another email address,
all the non-empty lines should be indented by a single tab,
there shouldn't be an extra * at the start of end of lines.
There shouldn't be empty lines in between the different changes, just
between the date/name/email line and the ret.
There shouldn't be spaces after ( or before ).
Instead of Called ompd_bp_thread_end (). say just Call ompd_bp_thread_end.
New variable. rather than New Variable.
The line with (gompd_load) is weirdly extra indented.
New file. rather than New File.
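Putting those rules together, a correctly formatted entry for this change would look roughly like the following (the email address is elided here, since the one in the original mail is garbled):

```
2022-07-01  Ahmed Sayed  <email elided>

	* Makefile.am (libgompd_la_SOURCES): Add ompd-threads.c.
	* Makefile.in: Regenerate.
	* team.c (gomp_free_thread): Call ompd_bp_thread_end.
	* ompd-support.c (gompd_thread_initial_tls_bias): New variable.
	(gompd_load): Initialize gompd_thread_initial_tls_bias.
	* ompd-threads.c: New file.
```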

> diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
> index 6d913a93e7f..23f5bede1bf 100644
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
> @@ -94,7 +94,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
> env.c error.c \
>   priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
>   oacc-target.c ompd-support.c
>  
> -libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
> +libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-threads.c
>  
>  include $(top_srcdir)/plugin/Makefrag.am
>  

You've just changed libgompd_la_SOURCES but there are many changes
in the generated file, that means either you didn't use the right
libtool version (1.15.1) or something else wrong is happening.

> diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
> index 40f896b5f03..8bbc46cca25 100644
> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -133,21 +133,8 @@ target_triplet = @target@
>  @USE_FORTRAN_TRUE@am__append_7 = openacc.f90
>  subdir = .
>  ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
> -am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
> - $(top_srcdir)/../config/ax_count_cpus.m4 \
> - $(top_srcdir)/../config/depstand.m4 \
> - $(top_srcdir)/../config/enable.m4 \
> - $(top_srcdir)/../config/futex.m4 \
> - $(top_srcdir)/../config/lead-dot.m4 \
> - $(top_srcdir)/../config/lthostflags.m4 \
> - $(top_srcdir)/../config/multi.m4 \
> - $(top_srcdir)/../config/override.m4 \
> - $(top_srcdir)/../config/tls.m4 \
> - $(top_srcdir)/../config/toolexeclibdir.m4 \
> - $(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
> - $(top_srcdir)/../ltversion.m4 $(top_srcdir)/../lt~obsolete.m4 \
> - $(top_srcdir)/acinclude.m4 $(top_srcdir)/../libtool.m4 \
> - $(top_srcdir)/../config/cet.m4 \
> +am__aclocal_m4_deps = $(top_srcdir)/acinclude.m4 \
> + $(top_srcdir)/../libtool.m4 $(top_srcdir)/../config/cet.m4 \
>   $(top_srcdir)/plugin/configfrag.ac $(top_srcdir)/configure.ac

The above certainly shouldn't be changed.

>  am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
>   $(ACLOCAL_M4)
> @@ -233,7 +220,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
> critical.lo \
>   affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
>   oacc-target.lo ompd-support.lo $(am__objects_1)
>  libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
> -am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo
> +am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo \
> + ompd-threads.lo
>  libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
>  AM_V_P = $(am__v_P_@AM_V@)
>  am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)

The above yes.

> @@ -485,7 +473,6 @@ dvidir = @dvidir@
>  enable_shared = @enable_shared@
>  enable_static = @enable_static@
>  exec_prefix = @exec_prefix@
> -get_gcc_base_ver = @get_gcc_base_ver@
>  host = @host@
>  host_alias = @host_alias@
>  host_cpu = @host_cpu@
> @@ -501,10 +488,8 @@ libtool_VERSION = @libtool_VERSION@
>  link_gomp = @link_gomp@
>  localedir = @localedir@
>  localstatedir = @localstatedir@
> -lt_host_flags = @lt_host_flags@
>  mandir = @mandir@
>  mkdir_p = @mkdir_p@
> -multi_basedir = @multi_basedir@
>  offload_additional_lib_paths = @offload_additional_lib_paths@
>  offload_additional_options = @offload_additional_options@
>  offload_plugins = @offload_plugins@
> @@ -514,6 +499,7 @@ pdfdir = @pdfdir@
>  prefix = @prefix@
>  program_transform_name = @program_transform_name@
>  psdir = @psdir@
> +runstatedir = 

Re: [PATCH] Fortran: detect blanks within literal constants in free-form mode [PR92805]

2022-07-29 Thread Mikael Morin

Hello,

On 28/07/2022 at 22:11, Harald Anlauf via Fortran wrote:

Dear all,

in free-form mode, blanks are significant, so they cannot appear
in literal constants, especially not before or after the "_" that
separates the literal and the kind specifier.

The initial patch from Steve addressed numerical literals, which
I completed by adjusting the parsing of string literals.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

It looks correct, but I think we should continue to have the free vs 
fixed form abstracted away from the parsing code.
So, I suggest instead to remove the calls to gfc_gobble_whitespace in 
match_string_constant, and use gfc_next_char instead of gfc_match_char 
in get_kind.


Mikael


Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Jakub Jelinek wrote:

> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via 
> Gcc-patches wrote:
> > The 'only on the vectorized code path' remains the same though as vect_recog
> > also only happens on the vectorized code path right?
> 
> if conversion (in some cases) duplicates a loop and guards one copy with
> an ifn which resolves to true if that particular loop is vectorized and
> false otherwise.  So, then changes that shouldn't be done in case of
> vectorization failure can be done on the for vectorizer only copy of the
> loop.

And just to mention, one issue with lowering of bitfield accesses
is bitfield inserts which, on some architectures (hello m68k) have
instructions operating on memory directly.  For those it's difficult
to not regress in code quality if a bitfield store becomes a
read-modify-write cycle.  That's one of the things holding this
back.  One idea would be to lower to .INSV directly for those targets
(but presence of insv isn't necessarily indicating support for
memory destinations).
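As a rough editor-added C sketch of the read-modify-write cycle mentioned above (field layout and names invented for illustration): without a memory-destination insert instruction, one bitfield store lowers to four steps.

```c
/* Illustration: storing VAL into a 5-bit field at bit 3 of a word in
   memory, lowered to a read-modify-write cycle.  A single
   bitfield-insert-to-memory instruction (as on m68k) does this in
   one step, which naive lowering would lose.  */
#define F_SHIFT 3
#define F_BITS  5
#define F_MASK  (((1u << F_BITS) - 1) << F_SHIFT)

void
store_field (unsigned int *word, unsigned int val)
{
  unsigned int tmp = *word;          /* read             */
  tmp &= ~F_MASK;                    /* clear the field  */
  tmp |= (val << F_SHIFT) & F_MASK;  /* insert new bits  */
  *word = tmp;                       /* write back       */
}

unsigned int
load_field (const unsigned int *word)
{
  return (*word & F_MASK) >> F_SHIFT;
}
```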

Richard.


[PATCH] Mips: Enable TSAN for 64-bit ABIs

2022-07-29 Thread Dimitrije Milosevic
The following patch enables TSAN for mips64, on which it is supported.

Signed-off-by: Dimitrije Milosevic .

libsanitizer/ChangeLog:

* configure.tgt: Enable TSAN for 64-bit ABIs.
---
 libsanitizer/configure.tgt | 4 
 1 file changed, 4 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index fb89df4935c..6855a6ca9e7 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -55,6 +55,10 @@ case "${target}" in
   arm*-*-linux*)
;;
   mips*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_mips64.lo
+   fi
;;
   aarch64*-*-linux*)
if test x$ac_cv_sizeof_void_p = x8; then
-- 
2.25.1

Re: [Patch] Add libgomp.c-c++-common/pr106449-2.c (was: [committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449])

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 11:48:08AM +0200, Tobias Burnus wrote:
> On 29.07.22 10:03, Jakub Jelinek wrote:
> > There were 2 issues visible on this new testcase, one that we didn't have
> > special POINTER_TYPE_P handling in a few spots of expand_omp_simd ...
> > The other issue was that we put n2 expression directly into a
> > comparison in a condition and regimplified that, for the [512] case that
> > and with gimplification being destructed that unfortunately meant 
> > modification
> > of original fd->loops[?].n2.  Fixed by unsharing the expression.
> 
> I created a testcase for the non-simd case – and due to messing up, it failed;
> hence, I filed PR middle-end/106467.  After fixing the testcase, it passes.
> (→ closed PR as invalid).
> 
> However, given that the testcase now exists, I think it makes sense to add it 
> :-)
> 
> Changes compared to the simd testcase: replaced '(parallel for) simd' by 
> 'for',
> removed 'linear', used now 'b' and 'c' instead of storing both ptrs in 'b'.
> 
> Side remark: Before GCC 12, GCC complained about "q = p + n" with
> "error: initializer expression refers to iteration variable ‘p’".
> 
> OK for mainline?

Ok, thanks.

Jakub



Re: [Patch] OpenMP/Fortran: Permit assumed-size arrays in uniform clause

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 11:47:54AM +0200, Tobias Burnus wrote:
> Testcase wise, the run-time testcase libgomp.fortran/examples-4/simd-2.f90
> checks essentially the same, except that it uses an array-descriptor array
> (assumed shape) while this testcase uses an assumed-size array.
> 
> I decided for an extra compile-time only testcase, but it could be also be
> moved to libgomp as run-time test or the other test could be extended to
> also test assumed-size arrays.
> 
> The OpenMP examples document contains two testcases which now pass,
> but are reject without this patch:
> - SIMD/sources/SIMD.2.f90 (for OpenMP 4.0)
> - SIMD/sources/linear_modifier.3.f90 (for OpenMP 5.2)
> 
> OK for mainline?

Ok, thanks.

Jakub



Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches 
wrote:
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?

if conversion (in some cases) duplicates a loop and guards one copy with
an ifn which resolves to true if that particular loop is vectorized and
false otherwise.  So, then changes that shouldn't be done in case of
vectorization failure can be done on the for vectorizer only copy of the
loop.
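At the source level, the versioning Jakub describes can be sketched roughly like this (editor's illustration with an assumed shape; in GIMPLE the guard is the internal function .LOOP_VECTORIZED, folded away during vectorization):

```c
/* Sketch: the loop is duplicated and one copy is guarded; changes
   that are only safe when the loop is actually vectorized are
   applied to the guarded copy alone.  */
int
sum_if_positive (const int *a, int n)
{
  int s = 0;
  int loop_was_vectorized = 0;  /* placeholder for .LOOP_VECTORIZED () */
  if (loop_was_vectorized)
    {
      /* if-converted copy: branchless form for the vectorizer.  */
      for (int i = 0; i < n; i++)
        s += a[i] > 0 ? a[i] : 0;
    }
  else
    {
      /* original copy, kept for the non-vectorized fallback.  */
      for (int i = 0; i < n; i++)
        if (a[i] > 0)
          s += a[i];
    }
  return s;
}
```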

Jakub



Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, Jul 29, 2022 at 11:52 AM Richard Earnshaw
 wrote:
>
>
>
> On 29/07/2022 08:06, Richard Biener via Gcc-patches wrote:
> > On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
> >  wrote:
> >>
> >> [resend with correct subject line]
> >>
> >> A SET operation that writes memory may have the same value as an earlier
> >> store but if the alias sets of the new and earlier store do not conflict
> >> then the set is not truly redundant.  This can happen, for example, if
> >> objects of different types share a stack slot.
> >>
> >> To fix this we define a new function in cselib that first checks for
> >> equality and if that is successful then finds the earlier store in the
> >> value history and checks the alias sets.
> >>
> >> The routine is used in two places elsewhere in the compiler.  Firstly
> >> in cfgcleanup and secondly in postreload.
> >
> > I can't comment on the stripping on SUBREGs and friends but it seems
> > to be conservative apart from
> >
> > +  if (!flag_strict_aliasing || !MEM_P (dest))
> > +return true;
> >
> > where if dest is not a MEM but were to contain one we'd miss it.
> > Double-checking
> > from more RTL literate people appreciated.
>
> There are very few things that can wrap a MEM in a SET_DEST.  I'm pretty
> sure that's all of them.  It certainly matches the code in
> cselib_invalidate_rtx which has to deal with this sort of case.
>
> >
> > +  /* Lookup the equivalents to the dest.  This is more likely to succeed
> > + than looking up the equivalents to the source (for example, when the
> > + src is some form of constant).  */
> >
> > I think the comment is misleading - we _do_ have to lookup the MEM,
> > looking up equivalences of a reg or an expression on the RHS isn't
> > what we are interested in.
>
> OK, I'll try to reword it.
>
> >
> > +   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
> > + MEM_ALIAS_SET (src_equiv));
> >
> > that's not conservative enough - dse.cc has correct boilerplate, we have
> > to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
> > if the former load/store has a MEM_EXPR).  Note in particular
> > using alias_set_subset_of instead of alias_sets_conflict_p.
> >
> >/* We can only remove the later store if the earlier aliases
> >   at least all accesses the later one.  */
> >&& ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
> > || alias_set_subset_of (MEM_ALIAS_SET (mem),
> > MEM_ALIAS_SET (s_info->mem)))
> >&& (!MEM_EXPR (s_info->mem)
> >|| refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
> > MEM_EXPR (mem)
> >
>
> OK, that's an easy enough change.
>
> > +  /* We failed to find a recorded value in the cselib history, so try the
> > + source of this set.  */
> > +  rtx src = SET_SRC (set);
> > +  while (GET_CODE (src) == SUBREG)
> > +src = XEXP (src, 0);
> > +
> > +  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
> > +GET_MODE (dest), 0))
> > +return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
> > + MEM_ALIAS_SET (src));
> >
> > this looks like an odd case to me - wouldn't that only catch things
> > like self-assignments, aka *p = *p?  So I'd simply drop this fallback.
>
> It catches the case of *p = *q when p and q have the same value.  It did
> come up in testing on x86 (when previously I was aborting to make sure
> I'd caught everything).  We could leave it out as the fallback case in
> this instance is to record a conflict, but it's not a path that's likely
> to be performance critical and the probability of this being a redundant
> store is quite high.  I'll update the comment to make this clearer.

Ah OK - if it did actually catch cases then it's fine to keep.  Note the
alias check needs to be updated the same as above.

Richard.

>
>
> R.
>
> >
> > Otherwise it looks OK to me.
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>  * cselib.h (cselib_redundant_set_p): Declare.
> >>  * cselib.cc: Include alias.h
> >>  (cselib_redundant_set_p): New function.
> >>  * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
> >>  of rtx_equal_for_cselib_p.
> >>  * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
> >>  (reload_cse_noop_set_p): Delete.


Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Earnshaw via Gcc-patches




On 29/07/2022 08:06, Richard Biener via Gcc-patches wrote:

On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
 wrote:


[resend with correct subject line]

A SET operation that writes memory may have the same value as an earlier
store but if the alias sets of the new and earlier store do not conflict
then the set is not truly redundant.  This can happen, for example, if
objects of different types share a stack slot.

To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler.  Firstly
in cfgcleanup and secondly in postreload.
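The "same value, different effective type" tension described above can be sketched in C (an editor-added illustration, not the exact RTL scenario the patch fixes): both stores below write the identical bit pattern to the same bytes, so value equality alone would call the second one redundant, yet type-based alias rules treat the two accesses as distinct.

```c
#include <string.h>

/* Two stores of the same bits through different types.  Deleting the
   second store as "redundant" on value grounds would be unsafe if a
   type-based disambiguator later assumes the float access happened.  */
union slot { unsigned int as_uint; float as_float; };

unsigned int
store_twice (void)
{
  union slot s;
  unsigned int bits = 0x3f800000u;  /* bit pattern of 1.0f */
  s.as_uint = bits;                 /* store #1: type unsigned int */
  float one = 1.0f;
  memcpy (&s.as_float, &one, sizeof one);  /* store #2: same bits, type float */
  return s.as_uint;
}
```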


I can't comment on the stripping on SUBREGs and friends but it seems
to be conservative apart from

+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;

where if dest is not a MEM but were to contain one we'd miss it.
Double-checking
from more RTL literate people appreciated.


There are very few things that can wrap a MEM in a SET_DEST.  I'm pretty 
sure that's all of them.  It certainly matches the code in 
cselib_invalidate_rtx which has to deal with this sort of case.




+  /* Lookup the equivalents to the dest.  This is more likely to succeed
+ than looking up the equivalents to the source (for example, when the
+ src is some form of constant).  */

I think the comment is misleading - we _do_ have to lookup the MEM,
looking up equivalences of a reg or an expression on the RHS isn't
what we are interested in.


OK, I'll try to reword it.



+   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src_equiv));

that's not conservative enough - dse.cc has correct boilerplate, we have
to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
if the former load/store has a MEM_EXPR).  Note in particular
using alias_set_subset_of instead of alias_sets_conflict_p.

   /* We can only remove the later store if the earlier aliases
  at least all accesses the later one.  */
   && ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
|| alias_set_subset_of (MEM_ALIAS_SET (mem),
MEM_ALIAS_SET (s_info->mem)))
   && (!MEM_EXPR (s_info->mem)
   || refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
MEM_EXPR (mem)



OK, that's an easy enough change.


+  /* We failed to find a recorded value in the cselib history, so try the
+ source of this set.  */
+  rtx src = SET_SRC (set);
+  while (GET_CODE (src) == SUBREG)
+src = XEXP (src, 0);
+
+  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
+GET_MODE (dest), 0))
+return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src));

this looks like an odd case to me - wouldn't that only catch things
like self-assignments, aka *p = *p?  So I'd simply drop this fallback.


It catches the case of *p = *q when p and q have the same value.  It did 
come up in testing on x86 (when previously I was aborting to make sure 
I'd caught everything).  We could leave it out as the fallback case in 
this instance is to record a conflict, but it's not a path that's likely 
to be performance critical and the probability of this being a redundant 
store is quite high.  I'll update the comment to make this clearer.
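For concreteness, an editor-added sketch of that fallback case: a store *p = *q where both pointers hold the same address is a no-op in value terms, but may only be deleted once the MEMs are known to access the same object.

```c
/* When p == q, the store copies a location onto itself.  */
int
copy_through_aliases (int *p, int *q)
{
  *p = *q;   /* value no-op when p and q are equal */
  return *p;
}
```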



R.



Otherwise it looks OK to me.

Thanks,
Richard.


gcc/ChangeLog:
 * cselib.h (cselib_redundant_set_p): Declare.
 * cselib.cc: Include alias.h
 (cselib_redundant_set_p): New function.
 * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
 of rtx_equal_for_cselib_p.
 * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
 (reload_cse_noop_set_p): Delete.


[Patch] Add libgomp.c-c++-common/pr106449-2.c (was: [committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449])

2022-07-29 Thread Tobias Burnus

On 29.07.22 10:03, Jakub Jelinek wrote:

There were 2 issues visible on this new testcase, one that we didn't have
special POINTER_TYPE_P handling in a few spots of expand_omp_simd ...
The other issue was that we put n2 expression directly into a
comparison in a condition and regimplified that, for the [512] case that
and with gimplification being destructed that unfortunately meant modification
of original fd->loops[?].n2.  Fixed by unsharing the expression.


I created a testcase for the non-simd case – and due to messing up, it failed;
hence, I filed PR middle-end/106467.  After fixing the testcase, it passes.
(→ closed PR as invalid).

However, given that the testcase now exists, I think it makes sense to add it 
:-)

Changes compared to the simd testcase: replaced '(parallel for) simd' by 'for',
removed 'linear', used now 'b' and 'c' instead of storing both ptrs in 'b'.

Side remark: Before GCC 12, GCC complained about "q = p + n" with
"error: initializer expression refers to iteration variable ‘p’".

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Add libgomp.c-c++-common/pr106449-2.c

This run-time test checks pointer-based iteration with collapse,
similar to the '(parallel) simd' test for PR106449 but for 'for'.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/pr106449-2.c: New test.

 .../testsuite/libgomp.c-c++-common/pr106449-2.c| 64 ++
 1 file changed, 64 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c b/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c
new file mode 100644
index 000..7fef7461bcf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr106449-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+/* Based on pr106449.c - but using 'for' instead of 'simd'.
+   Cf. PR middle-end/106449 (for pr106449.c) and PR middle-end/106467.  */
+
+void
+foo (void)
+{
+  int a[1024], *b[65536], *c[65536];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p++)
+    for (q = p + 64; q < p + 128; q++)
+      {
+	*r++ = p;
+	*r2++ = q;
+      }
+  for (i = 0; i < 32768; i++)
+    if (b[i] != &a[i / 64] || c[i] != &a[(i / 64) + 64 + (i % 64)])
+      __builtin_abort ();
+}
+
+void
+bar (int n, int m)
+{
+  int a[1024], *b[32768], *c[32768];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p++)
+    for (q = p + n; q < p + m; q++)
+      {
+	*r++ = p;
+	*r2++ = q;
+      }
+  for (i = 0; i < 32768; i++)
+    if (b[i] != &a[i / 64] || c[i] != &a[(i / 64) + 64 + (i % 64)])
+      __builtin_abort ();
+}
+
+void
+baz (int n, int m)
+{
+  int a[1024], *b[8192], *c[8192];
+  int *p, *q, **r = &b[0], **r2 = &c[0], i;
+  #pragma omp for collapse(2)
+  for (p = &a[0]; p < &a[512]; p += 4)
+    for (q = p + n; q < p + m; q += 2)
+      {
+	*r++ = p;
+	*r2++ = q;
+      }
+  for (i = 0; i < 4096; i++)
+    if (b[i] != &a[(i / 32) * 4] || c[i] != &a[(i / 32) * 4 + 64 + (i % 32) * 2])
+      __builtin_abort ();
+}
+
+int
+main ()
+{
+  foo ();
+  bar (64, 128);
+  baz (64, 128);
+  return 0;
+}


[Patch] OpenMP/Fortran: Permit assumed-size arrays in uniform clause

2022-07-29 Thread Tobias Burnus

Testcase wise, the run-time testcase libgomp.fortran/examples-4/simd-2.f90
checks essentially the same, except that it uses an array-descriptor array
(assumed shape) while this testcase uses an assumed-size array.

I decided on an extra compile-time-only testcase, but it could also be
moved to libgomp as a run-time test, or the other test could be extended to
also cover assumed-size arrays.

The OpenMP examples document contains two testcases which now pass,
but are rejected without this patch:
- SIMD/sources/SIMD.2.f90 (for OpenMP 4.0)
- SIMD/sources/linear_modifier.3.f90 (for OpenMP 5.2)

OK for mainline?

Tobias
OpenMP/Fortran: Permit assumed-size arrays in uniform clause

gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Permit assumed-size arrays
	in uniform clause.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/declare-simd-3.f90: New test.

 gcc/fortran/openmp.cc |  3 ++-
 gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90 | 30 +++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index df9cdf43eb7..a7eb6c3e8f4 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7386,7 +7386,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 			|| code->op == EXEC_OACC_PARALLEL
 			|| code->op == EXEC_OACC_SERIAL))
 		  check_array_not_assumed (n->sym, n->where, name);
-		else if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
+		else if (list != OMP_LIST_UNIFORM
+			 && n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
 		  gfc_error ("Assumed size array %qs in %s clause at %L",
			     n->sym->name, name, &n->where);
 		if (n->sym->attr.in_namelist && !is_reduction)
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90
new file mode 100644
index 000..b94587ef35a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-simd-3.f90
@@ -0,0 +1,30 @@
+! { dg-do compile }
+
+module m
+  implicit none (type, external)
+contains
+  real function add(x, y, j) result(res)
+!$omp declare simd(add) uniform(x, y) linear(j : 1) simdlen(4)
+integer, value :: j
+real, intent(in) :: x(*), y(*)
+res = x(j) + y(j)
+  end function
+end module m
+
+program main
+  use m
+  implicit none (type, external)
+  real, allocatable :: A(:), B(:), C(:)
+  integer :: i, N
+  N = 128
+  A = [(3*i, i = 1, N)]
+  B = [(7*i, i = 1, N)]
+  allocate (C(N))
+
+  !$omp simd
+  do i = 1, N
+C(i) = add(A, B, i)
+  end do
+
+  if (any (C /= [(10*i, i = 1, N)])) error stop
+end program main


Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Jul 2022, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thanks for the review, I don't completely understand all of the below, so I
> added some extra questions to help me understand :)
> 
> On 27/07/2022 12:37, Richard Biener wrote:
> > On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:
> >
> > I don't think this is a good approach for what you gain and how
> > necessarily limited it will be.  Similar to the recent experiment with
> > handling _Complex loads/stores this is much better tackled by lowering
> > things earlier (it will be lowered at RTL expansion time).
> I assume the approach you are referring to here is the lowering of the
> BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the vectorizer. I am
> all for lowering earlier, the reason I did it there was as a 'temporary'
> approach until we have that earlier loading.

I understood, but "temporary" things in GCC tend to be still around
10 years later, so ...

> >
> > One place to do this experimentation would be to piggy-back on the
> > if-conversion pass so the lowering would happen only on the
> > vectorized code path.
> This was one of my initial thoughts, though the if-conversion changes are a
> bit more intrusive for a temporary approach and not that much earlier. It does
> however have the added benefit of not having to make any changes to the
> vectorizer itself later if we do do the earlier lowering, assuming the
> lowering results in the same.
> 
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?
> >   Note that the desired lowering would look like
> > the following for reads:
> >
> >_1 = a.b;
> >
> > to
> >
> >    _2 = a.<representative>;
> >    _1 = BIT_FIELD_REF <_2, ...>; // extract bits
> I don't yet have a well-formed idea of what '<representative>' is
> supposed to look like in terms of tree expressions. I understand what it's
> supposed to be representing, the 'larger than bit-field'-load. But is it going
> to be a COMPONENT_REF with a fake 'FIELD_DECL' with the larger size? Like I
> said on IRC, the description of BIT_FIELD_REF makes it sound like this isn't
> how we are supposed to use it, are we intending to make a change to that here?

<representative> is what DECL_BIT_FIELD_REPRESENTATIVE
(FIELD_DECL-for-b) gives you; it is a "fake" FIELD_DECL for the underlying
storage, conveniently made available to you by the middle-end.  For
your 31bit field it would be simply 'int' typed.

The BIT_FIELD_REF then extracts the actual bitfield from the underlying
storage, but it's now no longer operating on memory but on the register
holding the underlying data.  To the vectorizer we'd probably have to
pattern-match this to shifts & masks and hope for the conversion to
combine with a later one.

> > and for writes:
> >
> >a.b = _1;
> >
> > to
> >
> >    _2 = a.<representative>;
> >    _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
> >    a.<representative> = _3;
> I was going to avoid writes for now because they are somewhat more
> complicated, but maybe it's not that bad, I'll add them too.

Only handling loads at start is probably fine as experiment, but
handling stores should be straight forward - of course the
BIT_INSERT_EXPR lowering to shifts & masks will be more
complicated.

> > so you trade now handled loads/stores with not handled
> > BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
> > pattern match to shifts and logical ops in the vectorizer.
> Yeah that vect_recog pattern already exists in my RFC patch, though I can
> probably simplify it by moving the bit-field-ref stuff to ifcvt.
> >
> > There's a separate thing of actually promoting all uses, for
> > example
> >
> > struct { long long x : 33; } a;
> >
> >   a.x = a.x + 1;
> >
> > will get you 33bit precision adds (for bit fields less than 32bits
> > they get promoted to int but not for larger bit fields).  RTL
> > expansion again will rewrite this into larger ops plus masking.
> Not sure I understand why this is relevant here? The current way I am doing
> this would likely lower a  bit-field like that to a 64-bit load  followed by
> the masking away of the top 31 bits, same would happen with a ifcvt-lowering
> approach.

Yes, but if there's anything besides loading or storing you will have
a conversion from, say int:31 to int in the IL before any arithmetic.
I've not looked but your patch probably handles conversions to/from
bitfield types by masking / extending.  What I've mentioned with the
33bit example is that with that you can have arithmetic in 33 bits
_without_ intermediate conversions, so you'd have to properly truncate
after every such operation (or choose not to vectorize which I think
is what would happen now).

> >
> > So I think the time is better spent in working on the lowering of
> > bitfield accesses, if sufficiently separated it could be used
> > from if-conversion by working on loop SEME regions.
> I will start to look at modifying ifcvt to add the lowering there. Will likely
> require two passes though because we can no longer look at the number of BBs
> to determine whether ifcvt is even needed, so we will first need to look for
> bit-field-decls, then version the loops and then look for them again for
> transformation, but I guess that's fine?

[committed] libstdc++: Tweak common_iterator::operator-> return type [PR104443]

2022-07-29 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This adjusts the return type to match the resolution of LWG 3672. There
is no functional difference, because decltype(auto) always deduced a
value anyway, but this makes it simpler and consistent with the working
draft.

libstdc++-v3/ChangeLog:

PR libstdc++/104443
* include/bits/stl_iterator.h (common_iterator::operator->):
Change return type to just auto.
---
 libstdc++-v3/include/bits/stl_iterator.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 9cd262cd1d9..003cebbec83 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2058,7 +2058,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 [[nodiscard]]
-constexpr decltype(auto)
+constexpr auto
 operator->() const requires __detail::__common_iter_has_arrow<_It>
 {
   __glibcxx_assert(_M_index == 0);
-- 
2.37.1



Re: [RFC] Teach vectorizer to deal with bitfield reads

2022-07-29 Thread Andre Vieira (lists) via Gcc-patches

Hi Richard,

Thanks for the review, I don't completely understand all of the below, 
so I added some extra questions to help me understand :)


On 27/07/2022 12:37, Richard Biener wrote:

On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:

I don't think this is a good approach for what you gain and how
necessarily limited it will be.  Similar to the recent experiment with
handling _Complex loads/stores this is much better tackled by lowering
things earlier (it will be lowered at RTL expansion time).
I assume the approach you are referring to here is the lowering of the 
BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the 
vectorizer. I am all for lowering earlier, the reason I did it there was 
as a 'temporary' approach until we have that earlier loading.


One place to do this experimentation would be to piggy-back on the
if-conversion pass so the lowering would happen only on the
vectorized code path.
This was one of my initial thoughts, though the if-conversion changes 
are a bit more intrusive for a temporary approach and not that much 
earlier. It does however have the added benefit of not having to make 
any changes to the vectorizer itself later if we do do the earlier 
lowering, assuming the lowering results in the same.


The 'only on the vectorized code path' remains the same though as 
vect_recog also only happens on the vectorized code path right?

  Note that the desired lowering would look like
the following for reads:

   _1 = a.b;

to

   _2 = a.<representative>;
   _1 = BIT_FIELD_REF <_2, ...>; // extract bits
I don't yet have a well-formed idea of what '<representative>' is 
supposed to look like in terms of tree expressions. I understand what 
it's supposed to be representing, the 'larger than bit-field'-load. But 
is it going to be a COMPONENT_REF with a fake 'FIELD_DECL' with the 
larger size? Like I said on IRC, the description of BIT_FIELD_REF makes 
it sound like this isn't how we are supposed to use it, are we intending 
to make a change to that here?



and for writes:

   a.b = _1;

to

   _2 = a.<representative>;
   _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
   a.<representative> = _3;
I was going to avoid writes for now because they are somewhat more 
complicated, but maybe it's not that bad, I'll add them too.

so you trade now handled loads/stores with not handled
BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
pattern match to shifts and logical ops in the vectorizer.
Yeah that vect_recog pattern already exists in my RFC patch, though I 
can probably simplify it by moving the bit-field-ref stuff to ifcvt.


There's a separate thing of actually promoting all uses, for
example

struct { long long x : 33; } a;

  a.x = a.x + 1;

will get you 33bit precision adds (for bit fields less than 32bits
they get promoted to int but not for larger bit fields).  RTL
expansion again will rewrite this into larger ops plus masking.
Not sure I understand why this is relevant here? The current way I am 
doing this would likely lower a  bit-field like that to a 64-bit load  
followed by the masking away of the top 31 bits, same would happen with 
a ifcvt-lowering approach.


So I think the time is better spent in working on the lowering of
bitfield accesses, if sufficiently separated it could be used
from if-conversion by working on loop SEME regions.
I will start to look at modifying ifcvt to add the lowering there. Will 
likely require two passes though because we can no longer look at the 
number of BBs to determine whether ifcvt is even needed, so we will 
first need to look for bit-field-decls, then version the loops and then 
look for them again for transformation, but I guess that's fine?

The patches
doing previous implementations are probably not too useful anymore
(I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR)

Richard.


[PATCH] tree-optimization/105679 - disable backward threading of unlikely entry

2022-07-29 Thread Richard Biener via Gcc-patches
The following makes the backward threader reject threads whose entry
edge is probably never executed according to the profile.  That in
particular, for the testcase, avoids threading the irq == 1 check
on the path where irq > 31, thereby avoiding spurious -Warray-bounds
diagnostics:

  if (irq_1(D) > 31)
goto ; [0.00%]
  else
goto ; [100.00%]

;;   basic block 3, loop depth 0, count 0 (precise), probably never executed
  _2 = (unsigned long) irq_1(D);
  __builtin___ubsan_handle_shift_out_of_bounds (&*.Lubsan_data0, 1, _2);

  _3 = 1 << irq_1(D);
  mask_4 = (u32) _3;
  entry = instance_5(D)->array[irq_1(D)];
  capture (mask_4);
  if (level_6(D) != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

;;   basic block 5, loop depth 0, count 708669600 (estimated locally), maybe hot
  if (irq_1(D) == 1)
goto ; [20.97%]
  else
goto ; [79.03%]

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

The testcase in the PR requires both ubsan and sancov so I'm not sure
where to put it but IIRC there were quite some duplicate PRs wrt
threading unlikely paths exposing diagnostics, eventually some
testcase will come out of those (when we identify them).  Note
the patch is quite conservative in only disabling likely never
executed paths rather than requiring maybe_hot_edge_p (OTOH those
are somewhat similar in the end).

I'm going to push it when testing finishes but maybe there are some
testcases to adjust.

PR tree-optimization/105679
* tree-ssa-threadbackwards.cc
(back_threader_profitability::profitable_path_p): Avoid threading
when the entry edge is probably never executed.
---
 gcc/tree-ssa-threadbackward.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 3519aca84cd..90f5331c265 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -777,6 +777,15 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
  return false;
}
+  edge entry = find_edge (m_path[m_path.length () - 1],
+ m_path[m_path.length () - 2]);
+  if (probably_never_executed_edge_p (cfun, entry))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
+"path entry is probably never executed.\n");
+ return false;
+   }
 }
   else if (!m_speed_p && n_insns > 1)
 {
-- 
2.35.3


[PATCH] tree-optimization/106422 - verify block copying in forward threading

2022-07-29 Thread Richard Biener via Gcc-patches
The forward threader failed to check whether it can actually duplicate
blocks.  The following adds this in a similar place the backwards threader
performs this check.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106422
* tree-ssa-threadupdate.cc (fwd_jt_path_registry::update_cfg):
Check whether we can copy thread blocks and cancel the thread if not.

* gcc.dg/torture/pr106422.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr106422.c | 14 ++
 gcc/tree-ssa-threadupdate.cc|  4 +++-
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106422.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr106422.c b/gcc/testsuite/gcc.dg/torture/pr106422.c
new file mode 100644
index 000..a2cef1aecb6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106422.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+void vfork() __attribute__((__leaf__));
+void semanage_reload_policy(char *arg, void cb(void))
+{
+  if (!arg)
+{
+  cb();
+  return;
+}
+  vfork();
+  if (arg)
+__builtin_free(arg);
+}
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index f901c7759e3..0f2b319d44a 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -2678,7 +2678,9 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
for (j = 0; j < path->length (); j++)
  {
edge e = (*path)[j]->e;
-   if (m_removed_edges->find_slot (e, NO_INSERT))
+   if (m_removed_edges->find_slot (e, NO_INSERT)
+   || ((*path)[j]->type == EDGE_COPY_SRC_BLOCK
+   && !can_duplicate_block_p (e->src)))
  break;
  }
 
-- 
2.35.3


[PATCH 1/1] PR 106101: IBM zSystems: Fix strict_low_part problem

2022-07-29 Thread Andreas Krebbel via Gcc-patches
This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:

1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART.  gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.

2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs.  The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand.  However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.

Bootstrapped and regression tested on IBM zSystems 64 bit.

gcc/ChangeLog:

PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.

gcc/testsuite/ChangeLog:

PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.
---
 gcc/config/s390/predicates.md | 12 
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.cc   | 27 +++-
 gcc/config/s390/s390.md   | 36 +--
 .../gcc.c-torture/compile/pr106101.c  | 62 +++
 5 files changed, 116 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr106101.c

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 33194d3f3d6..430cf6edfd6 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -594,3 +594,15 @@
 (define_predicate "addv_const_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) >= -32768 && INTVAL (op) <= 32767")))
+
+; Match (subreg (reg ...)) operands.
+; Used for movstrict destination operands
+; When replacing pseudos with hard regs reload strips away the
+; subregs. Accept also plain registers then to prevent the insn from
+; becoming unrecognizable.
+(define_predicate "subreg_register_operand"
+  (ior (and (match_code "subreg")
+	    (match_test "register_operand (SUBREG_REG (op), GET_MODE (SUBREG_REG (op)))"))
+       (and (match_code "reg")
+	    (match_test "reload_completed || reload_in_progress")
+	    (match_test "register_operand (op, GET_MODE (op))"))))
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index fd4acaae44a..765d843a418 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -50,6 +50,7 @@ extern void s390_set_has_landing_pad_p (bool);
 extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
 extern int s390_class_max_nregs (enum reg_class, machine_mode);
 extern bool s390_return_addr_from_memory(void);
+extern rtx s390_gen_lowpart_subreg (machine_mode, rtx);
 extern bool s390_fma_allowed_p (machine_mode);
 #if S390_USE_TARGET_ATTRIBUTE
 extern tree s390_valid_target_attribute_tree (tree args,
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5aaf76a9490..5e06bf9350c 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -458,6 +458,31 @@ s390_return_addr_from_memory ()
   return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK;
 }
 
+/* Generate a SUBREG for the MODE lowpart of EXPR.
+
+   In contrast to gen_lowpart it will always return a SUBREG
+   expression.  This is useful to generate STRICT_LOW_PART
+   expressions.  */
+rtx
+s390_gen_lowpart_subreg (machine_mode mode, rtx expr)
+{
+  rtx lowpart = gen_lowpart (mode, expr);
+
+  /* There might be no SUBREG in case it could be applied to the hard
+ REG rtx or it could be folded with a paradoxical subreg.  Bring
+ it back.  */
+  if (!SUBREG_P (lowpart))
+{
+  machine_mode reg_mode = TARGET_ZARCH ? DImode : SImode;
+  gcc_assert (REG_P (lowpart));
+  lowpart = gen_lowpart_SUBREG (mode,
+   gen_rtx_REG (reg_mode,
+REGNO (lowpart)));
+}
+
+  return lowpart;
+}
+
 /* Return nonzero if it's OK to use fused multiply-add for MODE.  */
 bool
 s390_fma_allowed_p (machine_mode mode)
@@ -6520,7 +6545,7 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
   /* Emit a strict_low_part pattern if possible.  */
   if (smode_bsize == bitsize && bitpos == mode_bsize - smode_bsize)
{
- rtx 

[committed] openmp: Reject invalid forms of C++ #pragma omp atomic compare [PR106448]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

The allowed syntaxes of atomic compare don't allow ()s around the condition
of ?:, but we were accepting it in one case for C++.

Fixed thusly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR c++/106448
* parser.cc (cp_parser_omp_atomic): For simple cast followed by
CPP_QUERY token, don't try cp_parser_binary_operation if compare
is true.

* c-c++-common/gomp/atomic-32.c: New test.

--- gcc/cp/parser.cc.jj 2022-07-26 10:32:23.557273390 +0200
+++ gcc/cp/parser.cc	2022-07-28 15:12:44.288195066 +0200
@@ -41535,7 +41535,9 @@ restart:
  goto saw_error;
}
  token = cp_lexer_peek_token (parser->lexer);
- if (token->type != CPP_SEMICOLON && !cp_tree_equal (lhs, rhs1))
+ if (token->type != CPP_SEMICOLON
+ && (!compare || token->type != CPP_QUERY)
+ && !cp_tree_equal (lhs, rhs1))
{
  cp_parser_abort_tentative_parse (parser);
  cp_parser_parse_tentatively (parser);
--- gcc/testsuite/c-c++-common/gomp/atomic-32.c.jj	2022-07-28 15:23:54.567237524 +0200
+++ gcc/testsuite/c-c++-common/gomp/atomic-32.c	2022-07-28 15:25:25.127027325 +0200
@@ -0,0 +1,14 @@
+/* PR c++/106448 */
+
+int x, expr;
+  
+void
+foo (void)
+{
+  #pragma omp atomic compare
+  x = (expr > x) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x < expr) ? expr : x;   /* { dg-error "invalid (form|operator)" } */
+  #pragma omp atomic compare
+  x = (x == expr) ? expr : x;  /* { dg-error "invalid (form|operator)" } */
+}

Jakub



[committed] openmp: Fix up handling of non-rectangular simd loops with pointer type iterators [PR106449]

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

There were 2 issues visible on this new testcase, one that we didn't have
special POINTER_TYPE_P handling in a few spots of expand_omp_simd - for
pointers we need to use POINTER_PLUS_EXPR and need to have the non-pointer
part in sizetype, for non-rectangular loops on the other side we can rely on
multiplication factor 1, pointers can't be multiplied, without those changes
we'd ICE.  The other issue was that we put the n2 expression directly into a
comparison in a condition and regimplified that, for the &a[512] case that,
with gimplification being destructive, unfortunately meant modification
of original fd->loops[?].n2.  Fixed by unsharing the expression.  This was
causing a runtime failure on the testcase.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

PR middle-end/106449
* omp-expand.cc (expand_omp_simd): Fix up handling of pointer
iterators in non-rectangular simd loops.  Unshare fd->loops[i].n2
or n2 before regimplifying it inside of a condition.

* testsuite/libgomp.c-c++-common/pr106449.c: New test.

--- gcc/omp-expand.cc.jj	2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -6714,7 +6696,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->loops[i].m2)
t = n2v = create_tmp_var (itype);
   else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
   t = force_gimple_operand_gsi (, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loops[i].v;
@@ -6728,7 +6710,7 @@ expand_omp_simd (struct omp_region *regi
   if (fd->collapse > 1 && !broken_loop)
t = n2var;
   else
-   t = fold_convert (type, n2);
+   t = fold_convert (type, unshare_expr (n2));
   t = force_gimple_operand_gsi (, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
   tree v = fd->loop.v;
@@ -6840,7 +6819,7 @@ expand_omp_simd (struct omp_region *regi
  if (fd->loops[i].m2)
t = nextn2v = create_tmp_var (itype);
  else
-   t = fold_convert (itype, fd->loops[i].n2);
+   t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
  t = force_gimple_operand_gsi (, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
  tree v = fd->loops[i].v;
@@ -6870,17 +6849,25 @@ expand_omp_simd (struct omp_region *regi
  ne->probability = e->probability.invert ();
 
  gsi = gsi_after_labels (init_bb);
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n1);
  if (fd->loops[i + 1].m1)
{
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n1);
+ else
+   {
+ t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+   fd->loops[i + 1].n1);
+ tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m1);
+ t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
+ t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+   }
}
+ else
+   t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
+ fd->loops[i + 1].n1);
  expand_omp_build_assign (, fd->loops[i + 1].v, t);
  if (fd->loops[i + 1].m2)
{
@@ -6889,14 +6876,19 @@ expand_omp_simd (struct omp_region *regi
  gcc_assert (n2v == NULL_TREE);
  n2v = create_tmp_var (TREE_TYPE (fd->loops[i + 1].v));
}
- t = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
-   fd->loops[i + 1].n2);
- tree t2 = fold_convert (TREE_TYPE (t),
+ tree t2 = fold_convert (TREE_TYPE (fd->loops[i + 1].v),
  fd->loops[i + 1
- fd->loops[i + 1].outer].v);
- tree t3 = fold_convert (TREE_TYPE (t), fd->loops[i + 1].m2);
- t2 = fold_build2 (MULT_EXPR, TREE_TYPE (t), t2, t3);
- t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t, t2);
+ if (POINTER_TYPE_P (TREE_TYPE (t2)))
+   t = fold_build_pointer_plus (t2, fd->loops[i + 1].n2);
+ else
+ 

[committed] openmp: Simplify fold_build_pointer_plus callers in omp-expand

2022-07-29 Thread Jakub Jelinek via Gcc-patches
Hi!

Tobias mentioned in PR106449 that fold_build_pointer_plus already
fold_converts the second argument to sizetype if it doesn't already
have an integral type gimple compatible with sizetype.

So, this patch simplifies the callers of fold_build_pointer_plus in
omp-expand so that they don't do those conversions manually.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-07-29  Jakub Jelinek  

* omp-expand.cc (expand_omp_for_init_counts, expand_omp_for_init_vars,
extract_omp_for_update_vars, expand_omp_for_ordered_loops,
expand_omp_simd): Don't fold_convert second argument to
fold_build_pointer_plus to sizetype.

--- gcc/omp-expand.cc.jj	2022-06-28 13:03:30.930689700 +0200
+++ gcc/omp-expand.cc   2022-07-28 10:47:03.138939297 +0200
@@ -2267,8 +2267,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n1));
+ t = unshare_expr (fd->loops[i].n1);
  n1 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2291,8 +2290,7 @@ expand_omp_for_init_counts (struct omp_f
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[i].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[i].n2));
+ t = unshare_expr (fd->loops[i].n2);
  n2 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
}
  else
@@ -2353,8 +2351,7 @@ expand_omp_for_init_counts (struct omp_f
  tree step = fold_convert (itype,
unshare_expr (fd->loops[i].step));
  if (POINTER_TYPE_P (TREE_TYPE (vs[i])))
-   t = fold_build_pointer_plus (vs[i],
-fold_convert (sizetype, step));
+   t = fold_build_pointer_plus (vs[i], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[i], step);
  t = force_gimple_operand_gsi (, t, true, NULL_TREE,
@@ -2794,8 +2791,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m1));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n1));
+ t = unshare_expr (fd->loops[j].n1);
  n1 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2818,8 +2814,7 @@ expand_omp_for_init_vars (struct omp_for
  else if (POINTER_TYPE_P (itype))
{
  gcc_assert (integer_onep (fd->loops[j].m2));
- t = fold_convert (sizetype,
-   unshare_expr (fd->loops[j].n2));
+ t = unshare_expr (fd->loops[j].n2);
  n2 = fold_build_pointer_plus (vs[j - fd->loops[j].outer], t);
}
  else
@@ -2895,8 +2890,7 @@ expand_omp_for_init_vars (struct omp_for
  tree step
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (vs[j], fold_convert (sizetype,
- step));
+   t = fold_build_pointer_plus (vs[j], step);
  else
t = fold_build2 (PLUS_EXPR, itype, vs[j], step);
}
@@ -2959,8 +2953,7 @@ expand_omp_for_init_vars (struct omp_for
= fold_convert (itype, unshare_expr (fd->loops[j].step));
  t = fold_build2 (MULT_EXPR, itype, t, t2);
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, n1, t);
}
@@ -2970,8 +2963,7 @@ expand_omp_for_init_vars (struct omp_for
  t = fold_build2 (MULT_EXPR, itype, t,
   fold_convert (itype, fd->loops[j].step));
  if (POINTER_TYPE_P (vtype))
-   t = fold_build_pointer_plus (fd->loops[j].n1,
-fold_convert (sizetype, t));
+   t = fold_build_pointer_plus (fd->loops[j].n1, t);
  else
t = fold_build2 (PLUS_EXPR, itype, fd->loops[j].n1, t);
}
@@ -3035,9 +3027,8 @@ 

RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-07-29 Thread Roger Sayle


Hi Segher,
> > > To implement this, and some closely related transformations, we
> > > build upon the existing val_signbit_known_clear_p predicate.  In the
> > > first chunk, nonzero_bits knows that FFS and ABS can't leave the
> > > sign bit set,
> >
> > Is that guaranteed in all cases?  Also at -O0, also for args bigger
> > than 64 bits?
> 
> val_signbit_known_clear_p should work for any size/precision arg.

No, you're right!  Please forgive/excuse me.  Neither val_signbit_p nor
nonzero_bits have yet been updated to use "wide_int", so don't work
for TImode or wider modes.  Doh!

I'm shocked.

Roger
--




Re: cselib: add function to check if SET is redundant [PR106187]

2022-07-29 Thread Richard Biener via Gcc-patches
On Thu, Jul 28, 2022 at 6:46 PM Richard Earnshaw
 wrote:
>
> [resend with correct subject line]
>
> A SET operation that writes memory may have the same value as an earlier
> store but if the alias sets of the new and earlier store do not conflict
> then the set is not truly redundant.  This can happen, for example, if
> objects of different types share a stack slot.
>
> To fix this we define a new function in cselib that first checks for
> equality and if that is successful then finds the earlier store in the
> value history and checks the alias sets.
>
> The routine is used in two places elsewhere in the compiler.  Firstly
> in cfgcleanup and secondly in postreload.

I can't comment on the stripping on SUBREGs and friends but it seems
to be conservative apart from

+  if (!flag_strict_aliasing || !MEM_P (dest))
+return true;

where if dest is not a MEM but were to contain one we'd miss it.
Double-checking
from more RTL literate people appreciated.

+  /* Lookup the equivalents to the dest.  This is more likely to succeed
+ than looking up the equivalents to the source (for example, when the
+ src is some form of constant).  */

I think the comment is misleading - we _do_ have to lookup the MEM,
looking up equivalences of a reg or an expression on the RHS isn't
what we are interested in.

+   return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src_equiv));

that's not conservative enough - dse.cc has correct boilerplate, we have
to check both MEM_ALIAS_SET and MEM_EXPR here (the latter only
if the former load/store has a MEM_EXPR).  Note in particular
using alias_set_subset_of instead of alias_sets_conflict_p.

  /* We can only remove the later store if the earlier aliases
 at least all accesses the later one.  */
  && ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
   || alias_set_subset_of (MEM_ALIAS_SET (mem),
   MEM_ALIAS_SET (s_info->mem)))
  && (!MEM_EXPR (s_info->mem)
  || refs_same_for_tbaa_p (MEM_EXPR (s_info->mem),
   MEM_EXPR (mem)
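The dse.cc rule quoted above can be sketched as a small model (an illustration, not GCC's real API: alias sets are modeled here as plain sets of types, and alias_set_subset_of as set inclusion):

```python
def alias_set_subset_of(later, earlier):
    # Model: an alias set is the set of types its accesses may alias.
    return later <= earlier

def later_store_removable(same_value, later_aset, earlier_aset):
    """A later store is deletable only if it stores the same value AND the
    earlier store aliases at least everything the later one does."""
    return same_value and (later_aset == earlier_aset
                           or alias_set_subset_of(later_aset, earlier_aset))

INT = frozenset({"int"})
FLOAT = frozenset({"float"})
ANY = frozenset({"int", "float", "char"})  # e.g. a may-alias-all access

# Stack-slot sharing: same bits stored, but int vs. float alias sets are
# disjoint, so the second store must stay.
print(later_store_removable(True, INT, FLOAT))  # False
print(later_store_removable(True, INT, ANY))    # True
```

The first call models the stack-slot-sharing miscompile from the patch description: identical bits, non-conflicting alias sets, store not removable.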

+  /* We failed to find a recorded value in the cselib history, so try the
+ source of this set.  */
+  rtx src = SET_SRC (set);
+  while (GET_CODE (src) == SUBREG)
+src = XEXP (src, 0);
+
+  if (MEM_P (src) && rtx_equal_for_cselib_1 (dest_addr, XEXP (src, 0),
+GET_MODE (dest), 0))
+return alias_sets_conflict_p (MEM_ALIAS_SET (dest),
+ MEM_ALIAS_SET (src));

this looks like an odd case to me - wouldn't that only catch things
like self-assignments, aka *p = *p?  So I'd simply drop this fallback.

Otherwise it looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> * cselib.h (cselib_redundant_set_p): Declare.
> * cselib.cc: Include alias.h
> (cselib_redundant_set_p): New function.
> * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
> of rtx_equal_for_cselib_p.
> * postreload.c (reload_cse_simplify): Use cselib_redundant_set_p.
> (reload_cse_noop_set_p): Delete.


Re: [PATCH v2] LoongArch: Define the macro ASM_PREFERRED_EH_DATA_FORMAT by checking the assembler's support for eh_frame encoding.

2022-07-29 Thread Lulu Cheng



On 2022/7/29 at 12:02 PM, Lulu Cheng wrote:

The ASM_PREFERRED_EH_DATA_FORMAT macro before and after modification is as 
follows:

  #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
-  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr)
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)

The following test illustrates the effect of modifying this macro on the
generated assembly code:

#include <iostream>
#include 

using namespace std;
int main()
{

  try {
  throw 1;
  }
  catch (int i)
  {
cout << i << endl;
  }
}

The main comparisons related to the eh_frame section are as follows:

                     OLD                               |                     NEW
.LFB1997 = .                                           | .LFB1780 = .
    .cfi_startproc                                     |     .cfi_startproc
    .cfi_personality 0x80,DW.ref.__gxx_personality_v0  |     .cfi_personality 0x9b,DW.ref.__gxx_personality_v0
    .cfi_lsda 0,.LLSDA1997                             |     .cfi_lsda 0x1b,.LLSDA1780

If the assembly file generated by the new gcc is compiled with binutils
version 2.38, the following error will be reported:
test.s:74: Error: invalid or unsupported encoding in .cfi_personality
test.s:75: Error: invalid or unsupported encoding in .cfi_lsda

So I think we should check whether binutils supports this encoding when gcc is
configured, and define the macro ASM_PREFERRED_EH_DATA_FORMAT accordingly.
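For reference, the encoding bytes in the OLD/NEW comparison above decompose into the standard DWARF EH pointer-encoding bits. A small sketch (the DW_EH_PE_* values are the standard dwarf2.h constants; the function is an illustrative model of the macro, not GCC source):

```python
# DWARF EH pointer-encoding bits (values as in dwarf2.h)
DW_EH_PE_absptr   = 0x00
DW_EH_PE_sdata4   = 0x0B
DW_EH_PE_pcrel    = 0x10
DW_EH_PE_indirect = 0x80

def preferred_eh_data_format(global_, pcrel_supported):
    """Model of ASM_PREFERRED_EH_DATA_FORMAT before/after the patch."""
    base = DW_EH_PE_indirect if global_ else 0
    if pcrel_supported:
        return base | DW_EH_PE_pcrel | DW_EH_PE_sdata4
    return base | DW_EH_PE_absptr

print(hex(preferred_eh_data_format(True, False)))  # 0x80: old .cfi_personality
print(hex(preferred_eh_data_format(True, True)))   # 0x9b: new .cfi_personality
print(hex(preferred_eh_data_format(False, True)))  # 0x1b: new .cfi_lsda
```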





.eh_frame DW_EH_PE_pcrel encoding format is not supported by gas <= 2.39.
Check whether the assembler supports DW_EH_PE_pcrel encoding and define the
.eh_frame encoding type accordingly.

gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/loongarch.h (ASM_PREFERRED_EH_DATA_FORMAT):
Select the value of the macro definition according to whether
HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT is defined.
* configure: Regenerate.
* configure.ac: Reinstate HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT test.
---
  gcc/config.in|  8 +++-
  gcc/config/loongarch/loongarch.h |  5 +
  gcc/configure| 34 
  gcc/configure.ac |  8 
  4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/config.in b/gcc/config.in
index 16bb963b45b..413b2bd36cb 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -404,13 +404,19 @@
  #endif
  
  
+/* Define if your assembler supports eh_frame pcrel encoding. */

+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT
+#endif
+
+
  /* Define if your assembler supports the R_PPC64_ENTRY relocation. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_AS_ENTRY_MARKERS
  #endif
  
  
-/* Define if your assembler supports explicit relocations. */

+/* Define if your assembler supports explicit relocation. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_AS_EXPLICIT_RELOCS
  #endif
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 89a5bd728fe..8b1288961e4 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1127,8 +1127,13 @@ struct GTY (()) machine_function
  };
  #endif
  
+#ifdef HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT

+#define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)
+#else
  #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
(((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr)
+#endif
  
  /* Do emit .note.GNU-stack by default.  */

  #ifndef NEED_INDICATE_EXEC_STACK
diff --git a/gcc/configure b/gcc/configure
index 7eb9479ae8e..05efa5b01ef 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28836,6 +28836,40 @@ if test $gcc_cv_as_loongarch_explicit_relocs = yes; then
  
  $as_echo "#define HAVE_AS_EXPLICIT_RELOCS 1" >>confdefs.h
  
+fi

+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for eh_frame pcrel encoding support" >&5
+$as_echo_n "checking assembler for eh_frame pcrel encoding support... " >&6; }
+if ${gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.cfi_startproc
+   .cfi_personality 0x9b,a
+   .cfi_lsda 0x1b,b
+   .cfi_endproc' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+  

RE: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-07-29 Thread Roger Sayle


Hi Segher,
 
> On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> > (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> 
> And you can also do that if ctz is undefined for a zero argument!

Forgive my perhaps poor use of terminology.  The case of ctz 0 on
x64_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
would allow us to do anything, but implementation defined (which
Intel calls "undefined" in their documentation).  Hence, we don't
know which DI value is placed in the result register.  In this case,
truncating to SI mode, then sign extending the result is not a no-op,
as the top bits will/must now all be the same [though admittedly to an
unknown undefined signbit].  Hence the above optimization would 
be invalid, as it doesn't guarantee the result would be sign-extended.
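The point can be checked numerically: the truncate-to-SImode-then-sign-extend round trip is the identity exactly when bits 31..63 of the DImode value are already known to be zero, which holds for a CTZ result in [0, 64] but not once an "undefined" value may set higher bits (an illustrative model, not compiler code):

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sext_di_of_si_lowpart(x):
    """(sign_extend:DI (subreg:SI x 0)) on a 64-bit unsigned value x."""
    lo = x & MASK32
    if lo & (1 << 31):          # replicate the SImode sign bit
        lo |= MASK64 ^ MASK32
    return lo

# Identity for every possible CTZ:DI result (0..63, plus a defined-at-zero 64):
print(all(sext_di_of_si_lowpart(v) == v for v in range(65)))  # True
# Not an identity once bits 31..63 may be set:
print(sext_di_of_si_lowpart(1 << 32) == (1 << 32))            # False
```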

> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set,
> 
> Is that guaranteed in all cases?  Also at -O0, also for args bigger than
> 64 bits?

val_signbit_known_clear_p should work for any size/precision arg.
I'm not sure if the results are affected by -O0, but even if they are, this
will not affect correctness, only whether these optimizations are performed,
which is precisely what -O0 controls.
 
> > +  /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +  if (GET_CODE (op) == SUBREG
> > + && subreg_lowpart_p (op)
> > + && GET_MODE (SUBREG_REG (op)) == mode
> > + && is_a <scalar_int_mode> (mode, &int_mode)
> > + && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > + && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > + && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> > + && (nonzero_bits (SUBREG_REG (op), mode)
> > + & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> 
> (spaces around >> please)

Doh! Good catch, thanks.

> Please use val_signbit_known_{set,clear}_p?

Alas, it's not just the SI mode's signbit that we care about, but all of the
bits above it in the DImode operand/result.  These all need to be zero,
for the operand to already be zero-extended/sign_extended.

> > +   return SUBREG_REG (op);
> 
> Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> value it returns in its second arg does not survive sign extending
unmodified (if it
> is 0x for an extend from SI to DI for example).

Fortunately, C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return a negative
result, such as -1 is already handled (accounted for) in nonzero_bits.  The
relevant
code in rtlanal.cc's nonzero_bits1 is:

case CTZ:
  /* If CTZ has a known value at zero, then the nonzero bits are
 that value, plus the number of bits in the mode minus one.  */
  if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
nonzero
  |= (HOST_WIDE_INT_1U << (floor_log2 (mode_width))) - 1;
  else
nonzero = -1;
  break;

Hence, any bits set by the constant returned by the target's
DEFINED_VALUE_AT_ZERO will be set in the result of nonzero_bits.
So if this is negative, say -1, then val_signbit_known_clear_p (or the
more complex tests above) will return false.
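An illustrative model of that interaction, mirroring the quoted nonzero_bits1 logic (a sketch, not GCC source):

```python
MASK64 = (1 << 64) - 1

def ctz_nonzero_bits(width, defined_at_zero, value_at_zero=0):
    """Sketch of the CTZ case in rtlanal.cc's nonzero_bits1."""
    if defined_at_zero:
        # value at zero, plus the bits needed for results 0..width-1
        return (value_at_zero & MASK64) | ((1 << (width.bit_length() - 1)) - 1)
    return MASK64  # unknown: every bit may be set

def signbit_known_clear(width, nonzero):
    return nonzero & (1 << (width - 1)) == 0

print(signbit_known_clear(64, ctz_nonzero_bits(64, True, 64)))  # True
print(signbit_known_clear(64, ctz_nonzero_bits(64, True, -1)))  # False
print(signbit_known_clear(64, ctz_nonzero_bits(64, False)))     # False
```

A negative defined-at-zero value (like -1) sets the sign bit in the nonzero mask, so the optimization is correctly suppressed.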

I'm currently bootstrapping and regression testing the whitespace 
change/correction suggested above.

Thanks,
Roger
--




Re: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from upstream

2022-07-29 Thread Dimitrije Milosevic
Thanks Martin! I'm sending out the output from git format-patch as an 
attachment to this email.


From: Martin Liška 
Sent: Thursday, July 28, 2022 3:48 PM
To: Dimitrije Milosevic ; 
gcc-patches@gcc.gnu.org 
Cc: ma...@embecosm.com ; Djordje Todorovic 
; jos...@codesourcery.com 

Subject: Re: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from 
upstream 
 
On 7/28/22 15:43, Dimitrije Milosevic wrote:
> Gentle ping, requiring someone to push the change. :)

Sure, I can do that, but please attach the output of git format-patch as an
attachment to an email. The current email has a weird format that I can't
directly apply with git am.

Cheers,
Martin

From 9d7646428bf36449cde380230bcc20fa835857dc Mon Sep 17 00:00:00 2001
From: Dimitrije Milošević
Date: Fri, 29 Jul 2022 08:36:06 +0200
Subject: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from
 upstream

2bfb0fcb51510f22723c8cdfefe [Sanitizer][MIPS] Fix stat struct size for the O32 ABI.

Signed-off-by: Dimitrije Milosevic .

---
 .../sanitizer_common/sanitizer_platform_limits_posix.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
index 89772a7e5c0..75c6cc7f285 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
@@ -81,9 +81,10 @@ const unsigned struct_kernel_stat64_sz = 104;
 const unsigned struct_kernel_stat_sz = 144;
 const unsigned struct_kernel_stat64_sz = 104;
 #elif defined(__mips__)
-const unsigned struct_kernel_stat_sz = SANITIZER_ANDROID
-   ? FIRST_32_SECOND_64(104, 128)
-   : FIRST_32_SECOND_64(144, 216);
+const unsigned struct_kernel_stat_sz =
+SANITIZER_ANDROID
+? FIRST_32_SECOND_64(104, 128)
+: FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 160 : 144, 216);
 const unsigned struct_kernel_stat64_sz = 104;
 #elif defined(__s390__) && !defined(__s390x__)
 const unsigned struct_kernel_stat_sz = 64;
-- 
2.25.1
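The size selection in the hunk above can be modeled as follows (a sketch: FIRST_32_SECOND_64 picks its first argument on 32-bit-pointer ABIs, which includes N32):

```python
def struct_kernel_stat_sz(android, ptr64, abi_n32=False):
    """Simplified model of the __mips__ branch after the fix."""
    if android:
        return 128 if ptr64 else 104
    if ptr64:
        return 216
    return 160 if abi_n32 else 144  # the fix: N32 gets 160, O32 keeps 144

print(struct_kernel_stat_sz(False, False, abi_n32=True))  # 160 (N32)
print(struct_kernel_stat_sz(False, False))                # 144 (O32)
print(struct_kernel_stat_sz(False, True))                 # 216 (N64)
```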



Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-07-29 Thread Alexandre Oliva via Gcc-patches
On Jul 29, 2022, Alexandre Oliva  wrote:

> This patch adds the strub attribute for function and variable types,
> command-line options, passes and adjustments to implement it,
> documentation, and tests.

The entire patch, and the patchlets used for testing, are available from
the GCC git repo, branch refs/users/aoliva/heads/strub:

e9d1e252e0bcd silence warnings
4a93638ee91b1 enable strub for all viable functions by default, for testing
29e98f10c5faa Introduce strub: machine-independent stack scrubbing

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH v2 10/10] Introduce strub: strub pass

2022-07-29 Thread Alexandre Oliva via Gcc-patches


This final fragment of ipa-strub.cc adds the strub pass, that
implements the needed function interface changes and adds calls to the
strub builtins.

+/* Define a pass to introduce strub transformations.  */
+const pass_data pass_data_ipa_strub = {
+  SIMPLE_IPA_PASS,
+  "strub",
+  OPTGROUP_NONE,
+  TV_NONE,
+  PROP_cfg | PROP_ssa, // properties_required
+  0,   // properties_provided
+  0,   // properties_destroyed
+  0,   // properties_start
+  TODO_update_ssa
+  | TODO_cleanup_cfg
+  | TODO_rebuild_cgraph_edges
+  | TODO_verify_il, // properties_finish
+};
+
+class pass_ipa_strub : public simple_ipa_opt_pass
+{
+public:
+  pass_ipa_strub (gcc::context *ctxt)
+: simple_ipa_opt_pass (pass_data_ipa_strub, ctxt)
+  {}
+  opt_pass *clone () { return new pass_ipa_strub (m_ctxt); }
+  virtual bool gate (function *) { return flag_strub && !seen_error (); }
+  virtual unsigned int execute (function *);
+
+  /* Define on demand and cache some types we use often.  */
+#define DEF_TYPE(NAME, INIT)   \
+  static inline tree get_ ## NAME () { \
+static tree type = NULL_TREE;  \
+if (!type) \
+  type = (INIT);   \
+return type;   \
+  }
+
+  /* Use a distinct ptr_type_node to denote the watermark, so that we can
+ recognize it in arg lists and avoid modifying types twice.  */
+  DEF_TYPE (wmt, build_variant_type_copy (ptr_type_node))
+
+  DEF_TYPE (pwmt, build_reference_type (get_wmt ()))
+
+  DEF_TYPE (qpwmt,
+   build_qualified_type (get_pwmt (),
+ TYPE_QUAL_RESTRICT
+ /* | TYPE_QUAL_CONST */))
+
+  DEF_TYPE (qptr,
+   build_qualified_type (ptr_type_node,
+ TYPE_QUAL_RESTRICT
+ | TYPE_QUAL_CONST))
+
+  DEF_TYPE (qpvalst,
+   build_qualified_type (build_reference_type
+ (va_list_type_node),
+ TYPE_QUAL_RESTRICT
+ /* | TYPE_QUAL_CONST */))
+
+#undef DEF_TYPE
+
+  /* Define non-strub builtins on demand.  */
+#define DEF_NM_BUILTIN(NAME, CODE, FNTYPELIST) \
+  static tree get_ ## NAME () {\
+tree decl = builtin_decl_explicit (CODE);  \
+if (!decl) \
+  {\
+   tree type = build_function_type_list FNTYPELIST;\
+   decl = add_builtin_function \
+ ("__builtin_" #NAME,  \
+  type, CODE, BUILT_IN_NORMAL, \
+  NULL, NULL); \
+   TREE_NOTHROW (decl) = true; \
+   set_builtin_decl ((CODE), decl, true);  \
+  }\
+return decl;   \
+  }
+
+  DEF_NM_BUILTIN (stack_address,
+ BUILT_IN_STACK_ADDRESS,
+ (ptr_type_node, NULL))
+
+#undef DEF_NM_BUILTIN
+
+  /* Define strub builtins on demand.  */
+#define DEF_SS_BUILTIN(NAME, FNSPEC, CODE, FNTYPELIST) \
+  static tree get_ ## NAME () {\
+tree decl = builtin_decl_explicit (CODE);  \
+if (!decl) \
+  {\
+   tree type = build_function_type_list FNTYPELIST;\
+   tree attrs = NULL;  \
+   if (FNSPEC && HAVE_ATTR_FNSPEC) \
+ attrs = tree_cons (get_identifier ("fn spec"),\
+build_tree_list\
+(NULL_TREE,\
+ build_string (strlen (FNSPEC),\
+   (FNSPEC))), \
+attrs);\
+   decl = add_builtin_function_ext_scope   \
+ ("__builtin___strub_" #NAME,  \
+  type, CODE, BUILT_IN_NORMAL, \
+  "__strub_" #NAME, attrs);\
+   TREE_NOTHROW (decl) = true; \
+   set_builtin_decl ((CODE), decl, true);  \
+  }\
+return decl;   \
+  }
+
+  DEF_SS_BUILTIN (enter, ". Ot",
+ BUILT_IN___STRUB_ENTER,
+ (void_type_node, get_qpwmt (), NULL))
+  

[PATCH v2 09/10] Introduce strub: strubm (mode assignment) pass

2022-07-29 Thread Alexandre Oliva via Gcc-patches


This middle fragment of ipa-strub.cc covers strub mode selection and
assignment logic, and most of the pass that performs that assignment.

+/* Return TRUE iff NODE calls builtin va_start.  */
+
+static bool
+calls_builtin_va_start_p (cgraph_node *node)
+{
+  bool result = false;
+
+  for (cgraph_edge *e = node->callees; e; e = e->next_callee)
+{
+  tree cdecl = e->callee->decl;
+  if (fndecl_built_in_p (cdecl, BUILT_IN_VA_START))
+   return true;
+}
+
+  return result;
+}
+
+/* Return TRUE iff NODE calls builtin apply_args, and optionally REPORT it.  */
+
+static bool
+calls_builtin_apply_args_p (cgraph_node *node, bool report = false)
+{
+  bool result = false;
+
+  for (cgraph_edge *e = node->callees; e; e = e->next_callee)
+{
+  tree cdecl = e->callee->decl;
+  if (!fndecl_built_in_p (cdecl, BUILT_IN_APPLY_ARGS))
+   continue;
+
+  result = true;
+
+  if (!report)
+   break;
+
+  sorry_at (gimple_location (e->call_stmt),
+   "at-calls %<strub%> does not support call to %qD",
+   cdecl);
+}
+
+  return result;
+}
+
+/* Return TRUE iff NODE carries the always_inline attribute.  */
+
+static inline bool
+strub_always_inline_p (cgraph_node *node)
+{
+  return lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl));
+}
+
+/* Return TRUE iff NODE is potentially eligible for any strub-enabled mode, and
+   optionally REPORT the reasons for ineligibility.  */
+
+static inline bool
+can_strub_p (cgraph_node *node, bool report = false)
+{
+  bool result = true;
+
+  if (!report && strub_always_inline_p (node))
+return result;
+
+  if (lookup_attribute ("noipa", DECL_ATTRIBUTES (node->decl)))
+{
+  result = false;
+
+  if (!report)
+   return result;
+
+  sorry_at (DECL_SOURCE_LOCATION (node->decl),
+   "%qD is not eligible for %<strub%>"
+   " because of attribute %<noipa%>",
+   node->decl);
+}
+
+  /* We can't, and don't want to vectorize the watermark and other
+ strub-introduced parms.  */
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (node->decl)))
+{
+  result = false;
+
+  if (!report)
+   return result;
+
+  sorry_at (DECL_SOURCE_LOCATION (node->decl),
+   "%qD is not eligible for %<strub%>"
+   " because of attribute %<simd%>",
+   node->decl);
+}
+
+  return result;
+}
+
+/* Return TRUE iff NODE is eligible for at-calls strub, and optionally REPORT
+   the reasons for ineligibility.  Besides general non-eligibility for
+   strub-enabled modes, at-calls rules out calling builtin apply_args.  */
+
+static bool
+can_strub_at_calls_p (cgraph_node *node, bool report = false)
+{
+  bool result = !report || can_strub_p (node, report);
+
+  if (!result && !report)
+return result;
+
+  return !calls_builtin_apply_args_p (node, report);
+}
+
+/* Return TRUE iff the called function (pointer or, if available,
+   decl) undergoes a significant type conversion for the call.  Strub
+   mode changes between function types, and other non-useless type
+   conversions, are regarded as significant.  When the function type
+   is overridden, the effective strub mode for the call is that of the
+   call fntype, rather than that of the pointer or of the decl.
+   Functions called with type overrides cannot undergo type changes;
+   it's as if their address was taken, so they're considered
+   non-viable for implicit at-calls strub mode.  */
+
+static inline bool
+strub_call_fntype_override_p (const gcall *gs)
+{
+  if (gimple_call_internal_p (gs))
+return false;
+  tree fn_type = TREE_TYPE (TREE_TYPE (gimple_call_fn (gs)));
+  if (tree decl = gimple_call_fndecl (gs))
+fn_type = TREE_TYPE (decl);
+
+  /* We do NOT want to take the mode from the decl here.  This
+ function is used to tell whether we can change the strub mode of
+ a function, and whether the effective mode for the call is to be
+ taken from the decl or from an overrider type.  When the strub
+ mode is explicitly declared, or overridden with a type cast, the
+ difference will be noticed in function types.  However, if the
+ strub mode is implicit due to e.g. strub variables or -fstrub=*
+ command-line flags, we will adjust call types along with function
+ types.  In either case, the presence of type or strub mode
+ overriders in calls will prevent a function from having its strub
+ modes changed in ways that would imply type changes, but taking
+ strub modes from decls would defeat this, since we set strub
+ modes and then call this function to tell whether the original
+ type was overridden to decide whether to adjust the call.  We
+ need the answer to be about the type, not the decl.  */
+  enum strub_mode mode = get_strub_mode_from_type (fn_type);
+  return (get_strub_mode_from_type (gs->u.fntype) != mode
+ || !useless_type_conversion_p (gs->u.fntype, fn_type));
+}
+
+/* Return TRUE iff NODE is called 

[PATCH] Use CONVERT_EXPR_CODE_P

2022-07-29 Thread Richard Biener via Gcc-patches


Bootstrapped on x86_64-unknown-linux-gnu, pushed.

* gimple-ssa-warn-restrict.cc (builtin_memref::set_base_and_offset):
Use CONVERT_EXPR_CODE_P.
---
 gcc/gimple-ssa-warn-restrict.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
index 6b6097a3d4c..b7ed15c8902 100644
--- a/gcc/gimple-ssa-warn-restrict.cc
+++ b/gcc/gimple-ssa-warn-restrict.cc
@@ -430,7 +430,7 @@ builtin_memref::set_base_and_offset (tree expr)
   else if (is_gimple_assign (stmt))
{
  tree_code code = gimple_assign_rhs_code (stmt);
- if (code == NOP_EXPR)
+ if (CONVERT_EXPR_CODE_P (code))
{
  tree rhs = gimple_assign_rhs1 (stmt);
  if (POINTER_TYPE_P (TREE_TYPE (rhs)))
-- 
2.35.3
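For context, CONVERT_EXPR_CODE_P in GCC's tree.h accepts both conversion codes, so the new test also catches CONVERT_EXPR where the old one looked only for NOP_EXPR. A Python sketch of the predicate:

```python
def convert_expr_code_p(code):
    """Model of tree.h's CONVERT_EXPR_CODE_P macro:
    (CODE) == NOP_EXPR || (CODE) == CONVERT_EXPR."""
    return code in ("NOP_EXPR", "CONVERT_EXPR")

print(convert_expr_code_p("NOP_EXPR"))      # True
print(convert_expr_code_p("CONVERT_EXPR"))  # True (missed by the old check)
print(convert_expr_code_p("PLUS_EXPR"))     # False
```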


[PATCH v2 08/10] Introduce strub: strub modes

2022-07-29 Thread Alexandre Oliva via Gcc-patches


This initial fragment of ipa-strub.cc covers strub modes and their
internal representation.

for  gcc/ChangeLog

* ipa-strub.cc: New.

diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
new file mode 100644
index 0..d61b7e2e36e43
--- /dev/null
+++ b/gcc/ipa-strub.cc
@@ -0,0 +1,3489 @@
+/* strub (stack scrubbing) support.
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+   Contributed by Alexandre Oliva .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "gimplify.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
+#include "tree-into-ssa.h"
+#include "tree-ssa.h"
+#include "tree-cfg.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgcleanup.h"
+#include "tree-eh.h"
+#include "except.h"
+#include "builtins.h"
+#include "attribs.h"
+#include "tree-inline.h"
+#include "cgraph.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
+#include "gimple-fold.h"
+#include "fold-const.h"
+#include "gimple-walk.h"
+#include "tree-dfa.h"
+#include "langhooks.h"
+#include "calls.h"
+#include "vec.h"
+#include "stor-layout.h"
+#include "varasm.h"
+#include "alias.h"
+#include "diagnostic.h"
+#include "intl.h"
+#include "ipa-strub.h"
+
+#if BUILDING_GCC_MAJOR >= 11
+# include "symtab-thunks.h"
+# include "attr-fnspec.h"
+# define HAVE_ATTR_FNSPEC 1
+# define FOR_GCC_11P 1
+#else
+# define HAVE_ATTR_FNSPEC 0
+# define FOR_GCC_11P 0
+#endif
+
+/* Const and pure functions that gain a watermark parameter for strub purposes
+   are still regarded as such, which may cause the inline expansions of the
+   __strub builtins to malfunction.  Ideally, attribute "fn spec" would enable
+   us to inform the backend about requirements and side effects of the call, but
+   call_fusage building in calls.c:expand_call does not even look at
+   attr_fnspec, so we resort to asm loads and updates to attain an equivalent
+   effect.  Once expand_call gains the ability to issue extra memory uses and
+   clobbers based on pure/const function's fnspec, we can define this to 1.  */
+#define ATTR_FNSPEC_DECONST_WATERMARK 0
+
+enum strub_mode {
+  /* This mode denotes a regular function, that does not require stack
+ scrubbing (strubbing).  It may call any other functions, but if
+ it calls AT_CALLS (or WRAPPED) ones, strubbing logic is
+ automatically introduced around those calls (the latter, by
+ inlining INTERNAL wrappers).  */
+  STRUB_DISABLED = 0,
+
+  /* This denotes a function whose signature is (to be) modified to
+ take an extra parameter, for stack use annotation, and its
+ callers must initialize and pass that argument, and perform the
+ strubbing.  Functions that are explicitly marked with attribute
+ strub must have the mark visible wherever the function is,
+ including aliases, and overriders and overriding methods.
+ Functions that are implicitly marked for strubbing, for accessing
+ variables explicitly marked as such, will only select this
+ strubbing method if they are internal to a translation unit.  It
+ can only be inlined into other strubbing functions, i.e.,
+ STRUB_AT_CALLS or STRUB_WRAPPED.  */
+  STRUB_AT_CALLS = 1,
+
+  /* This denotes a function that is to perform strubbing internally,
+ without any changes to its interface (the function is turned into
+ a strubbing wrapper, and its original body is moved to a separate
+ STRUB_WRAPPED function, with a modified interface).  Functions
+ may be explicitly marked with attribute strub(2), and the
+ attribute must be visible at the point of definition.  Functions
+ that are explicitly marked for strubbing, for accessing variables
+ explicitly marked as such, may select this strubbing mode if
+ their interface cannot change, e.g. because its interface is
+ visible to other translation units, directly, by indirection
+ (having its address taken), inheritance, etc.  Functions that use
+ this method must not have the noclone attribute, nor the noipa
+ one.  Functions marked as always_inline may select this mode, but
+ they are NOT wrapped, they remain unchanged, and are only inlined
+ into strubbed contexts.  

[PATCH] Avoid vect_get_vector_types_for_stmt

2022-07-29 Thread Richard Biener via Gcc-patches
This replaces vect_get_vector_types_for_stmt with get_vectype_for_scalar_type
in vect_recog_bool_pattern.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-patterns.cc (vect_recog_bool_pattern): Use
get_vectype_for_scalar_type instead of
vect_get_vector_types_for_stmt.
---
 gcc/tree-vect-patterns.cc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c6..09574bb1a26 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4509,10 +4509,8 @@ vect_recog_bool_pattern (vec_info *vinfo,
   && STMT_VINFO_DATA_REF (stmt_vinfo))
 {
   stmt_vec_info pattern_stmt_info;
-  tree nunits_vectype;
-  if (!vect_get_vector_types_for_stmt (vinfo, stmt_vinfo, &vectype,
-  &nunits_vectype)
- || !VECTOR_MODE_P (TYPE_MODE (vectype)))
+  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
+  if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
return NULL;
 
   if (check_bool_pattern (var, vinfo, bool_stmts))
-- 
2.35.3


[PATCH v2 07/10] Introduce strub: infrastructure interfaces and adjustments

2022-07-29 Thread Alexandre Oliva via Gcc-patches


Introduce the new strub passes, adjust other passes and front-end
declaration for strub.

for  gcc/ChangeLog

* Makefile.in (OBJS): Add ipa-strub.o.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* multiple_target.cc (pass_target_clone::gate): Test seen_error.

for  gcc/ada/ChangeLog

* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 203f0a15187d2..4100531d73ae7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1480,6 +1480,7 @@ OBJS = \
ipa-reference.o \
ipa-ref.o \
ipa-utils.o \
+   ipa-strub.o \
ipa.o \
ira.o \
ira-build.o \
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index c1dd567b2e4e2..e3553a7a87bfc 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -69,6 +69,21 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
+/* The following #include is for strub_make_callable.
+
+   This function marks a function as safe to call from strub contexts.  We mark
+   Ada subprograms that may be called implicitly by the compiler, and that won't
+   leave on the stack caller data passed to them.  This stops implicit calls
+   introduced in subprograms that have their stack scrubbed from being flagged
+   as unsafe, even in -fstrub=strict mode.
+
+   These subprograms are also marked with the strub(callable) attribute in Ada
+   sources, but their declarations aren't necessarily imported by GNAT, or made
+   visible to gigi, in units that end up relying on them.  So when gigi
+   introduces their declarations on its own, it must also add the attribute, by
+   calling strub_make_callable.  */
+#include "ipa-strub.h"
+
 /* We should avoid allocating more than ALLOCA_THRESHOLD bytes via alloca,
for fear of running out of stack space.  If we need more, we use xmalloc
instead.  */
@@ -449,6 +464,7 @@ gigi (Node_Id gnat_root,
 int64_type, NULL_TREE),
   NULL_TREE, is_default, true, true, true, false,
   false, NULL, Empty);
+  strub_make_callable (mulv64_decl);
 
   if (Enable_128bit_Types)
 {
@@ -461,6 +477,7 @@ gigi (Node_Id gnat_root,
 NULL_TREE),
   NULL_TREE, is_default, true, true, true, false,
   false, NULL, Empty);
+  strub_make_callable (mulv128_decl);
 }
 
   /* Name of the _Parent field in tagged record types.  */
@@ -716,6 +733,7 @@ build_raise_check (int check, enum exception_info_kind kind)
 = create_subprog_decl (get_identifier (Name_Buffer), NULL_TREE, ftype,
   NULL_TREE, is_default, true, true, true, false,
   false, NULL, Empty);
+  strub_make_callable (result);
 
   return result;
 }
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 14969198cde1c..0674fe138574d 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -119,6 +119,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "asan.h"
+#include "ipa-strub.h"
 
 typedef fibonacci_heap <sreal, cgraph_edge> edge_heap_t;
 typedef fibonacci_node <sreal, cgraph_edge> edge_heap_node_t;
@@ -397,6 +398,11 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
   inlinable = false;
 }
 
+  if (inlinable && !strub_inlinable_to_p (callee, caller))
+{
+  e->inline_failed = CIF_UNSPECIFIED;
+  inlinable = false;
+}
   if (!inlinable && report)
 report_inline_failed_reason (e);
   return inlinable;
diff --git a/gcc/ipa-split.cc b/gcc/ipa-split.cc
index 16734617d0381..b3b9963f13669 100644
--- a/gcc/ipa-split.cc
+++ b/gcc/ipa-split.cc
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "cfgloop.h"
 #include "attribs.h"
+#include "ipa-strub.h"
 
 /* Per basic block info.  */
 
@@ -1810,6 +1811,12 @@ execute_split_functions (void)
 "section.\n");
   return 0;
 }
+  if (!strub_splittable_p (node))
+{
+  if (dump_file)
+   fprintf (dump_file, "Not splitting: function is a strub context.\n");
+  return 0;
+}
 
   /* We 

[PATCH v2 06/10] Introduce strub: attributes

2022-07-29 Thread Alexandre Oliva via Gcc-patches


Ada already has support for the strub attributes stubbed-out, and the
front-end code already has support for them and their effects in the
type system.

for  gcc/ChangeLog

* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.

for  gcc/c-family/ChangeLog

* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.

for  gcc/ada/ChangeLog

* gcc-interface/utils.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(gnat_internal_attribute_table): Add strub.

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index fb89616ff296b..d559cfc1b9f4e 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "attribs.h"
 #include "fold-const.h"
+#include "ipa-strub.h"
 #include "stor-layout.h"
 #include "langhooks.h"
 #include "plugin.h"
@@ -774,12 +775,11 @@ decl_attributes (tree *node, tree attributes, int flags,
  flags &= ~(int) ATTR_FLAG_TYPE_IN_PLACE;
}
 
-  if (spec->function_type_required && TREE_CODE (*anode) != FUNCTION_TYPE
- && TREE_CODE (*anode) != METHOD_TYPE)
+  if (spec->function_type_required
+ && !FUNC_OR_METHOD_TYPE_P (*anode))
{
  if (TREE_CODE (*anode) == POINTER_TYPE
- && (TREE_CODE (TREE_TYPE (*anode)) == FUNCTION_TYPE
- || TREE_CODE (TREE_TYPE (*anode)) == METHOD_TYPE))
+ && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode)))
{
  /* OK, this is a bit convoluted.  We can't just make a copy
 of the pointer type and modify its TREE_TYPE, because if
@@ -887,7 +887,24 @@ decl_attributes (tree *node, tree attributes, int flags,
  TYPE_NAME (tt) = *node;
}
 
- *anode = cur_and_last_decl[0];
+ if (*anode != cur_and_last_decl[0])
+   {
+ /* Even if !spec->function_type_required, allow the attribute
+handler to request the attribute to be applied to the function
+type, rather than to the function pointer type, by setting
+cur_and_last_decl[0] to the function type.  */
+ if (!fn_ptr_tmp
+ && POINTER_TYPE_P (*anode)
+ && TREE_TYPE (*anode) == cur_and_last_decl[0]
+ && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (*anode)))
+   {
+ fn_ptr_tmp = TREE_TYPE (*anode);
+ fn_ptr_quals = TYPE_QUALS (*anode);
+ anode = &fn_ptr_tmp;
+   }
+ *anode = cur_and_last_decl[0];
+   }
+
  if (ret == error_mark_node)
{
  warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1491,9 +1508,20 @@ comp_type_attributes (const_tree type1, const_tree type2)
   if ((lookup_attribute ("nocf_check", TYPE_ATTRIBUTES (type1)) != NULL)
   ^ (lookup_attribute ("nocf_check", TYPE_ATTRIBUTES (type2)) != NULL))
 return 0;
+  int strub_ret = strub_comptypes (CONST_CAST_TREE (type1),
+  CONST_CAST_TREE (type2));
+  if (strub_ret == 0)
+return strub_ret;
   /* As some type combinations - like default calling-convention - might
   be compatible, we have to call the target hook to get the final result.  */
-  return targetm.comp_type_attributes (type1, type2);
+  int target_ret = targetm.comp_type_attributes (type1, type2);
+  if (target_ret == 0)
+return target_ret;
+  if (strub_ret == 2 || target_ret == 2)
+return 2;
+  if (strub_ret == 1 && target_ret == 1)
+return 1;
+  gcc_unreachable ();
 }
 
 /* PREDICATE acts as a function of type:
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e4f1d3542f378..08c7d71f827a2 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "common/common-target.h"
 #include "langhooks.h"
 #include "tree-inline.h"
+#include "ipa-strub.h"
 #include "toplev.h"
 #include "tree-iterator.h"
 #include "opts.h"
@@ -69,6 +70,7 @@ static tree handle_asan_odr_indicator_attribute (tree *, tree, tree, int,
 static tree handle_stack_protect_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_stack_protector_function_attribute (tree *, tree,
tree, int, bool *);
+static tree handle_strub_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noclone_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nocf_check_attribute (tree *, tree, tree, int, bool *);
@@ -314,6 +316,8 @@ const 

[PATCH v2 05/10] Introduce strub: builtins and runtime

2022-07-29 Thread Alexandre Oliva via Gcc-patches


for  gcc/ChangeLog

* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_bulitin): Call them.

for  libgcc/ChangeLog

* Makefile.in (LIB2ADD): Add strub.c.
* libgcc2.h (__strub_enter, __strub_update, __strub_leave):
Declare.
* strub.c: New.

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b08b4365da36b..656186c308997 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "intl.h"
 #include "file-prefix-map.h" /* remap_macro_filename()  */
+#include "ipa-strub.h" /* strub_watermark_parm()  */
 #include "gomp-constants.h"
 #include "omp-general.h"
 #include "tree-dfa.h"
@@ -151,6 +152,7 @@ static rtx expand_builtin_strnlen (tree, rtx, machine_mode);
 static rtx expand_builtin_alloca (tree);
 static rtx expand_builtin_unop (machine_mode, tree, rtx, rtx, optab);
 static rtx expand_builtin_frame_address (tree, tree);
+static rtx expand_builtin_stack_address ();
 static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static rtx expand_builtin_expect_with_probability (tree, rtx);
@@ -4968,6 +4970,256 @@ expand_builtin_frame_address (tree fndecl, tree exp)
 }
 }
 
+#ifndef STACK_GROWS_DOWNWARD
+# define STACK_TOPS GT
+#else
+# define STACK_TOPS LT
+#endif
+
+#ifdef POINTERS_EXTEND_UNSIGNED
+# define STACK_UNSIGNED POINTERS_EXTEND_UNSIGNED
+#else
+# define STACK_UNSIGNED true
+#endif
+
+/* Expand a call to builtin function __builtin_stack_address.  */
+
+static rtx
+expand_builtin_stack_address ()
+{
+  return convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
+ STACK_UNSIGNED);
+}
+
+/* Expand a call to builtin function __builtin_strub_enter.  */
+
+static rtx
+expand_builtin_strub_enter (tree exp)
+{
+  if (!validate_arglist (exp, POINTER_TYPE, VOID_TYPE))
+return NULL_RTX;
+
+  if (optimize < 1 || flag_no_inline)
+return NULL_RTX;
+
+  rtx stktop = NULL_RTX;
+
+#if 1 || defined RED_ZONE_SIZE
+  if (tree wmptr = (optimize
+   ? strub_watermark_parm (current_function_decl)
+   : NULL_TREE))
+{
+  tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
+  tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
+build_int_cst (TREE_TYPE (wmptr), 0));
+  rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
+  stktop = force_reg (ptr_mode, wmark);
+}
+#endif
+
+  if (!stktop)
+stktop = expand_builtin_stack_address ();
+
+  tree wmptr = CALL_EXPR_ARG (exp, 0);
+  tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
+  tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
+build_int_cst (TREE_TYPE (wmptr), 0));
+  rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
+
+  emit_move_insn (wmark, stktop);
+
+  return const0_rtx;
+}
+
+/* Expand a call to builtin function __builtin_strub_update.  */
+
+static rtx
+expand_builtin_strub_update (tree exp)
+{
+  if (!validate_arglist (exp, POINTER_TYPE, VOID_TYPE))
+return NULL_RTX;
+
+  if (optimize < 2 || flag_no_inline)
+return NULL_RTX;
+
+  rtx stktop = expand_builtin_stack_address ();
+
+#ifdef RED_ZONE_SIZE
+  /* Here's how the strub enter, update and leave functions deal with red zones.
+
+ If it weren't for red zones, update, called from within a strub context,
+ would bump the watermark to the top of the stack.  Enter and leave, running
+ in the caller, would use the caller's top of stack address both to
+ initialize the watermark passed to the callee, and to start strubbing the
+ stack afterwards.
+
+ Ideally, we'd update the watermark so as to cover the used amount of red
+ zone, and strub starting at the caller's other end of the (presumably
+ unused) red zone.  Normally, only leaf functions use the red zone, but at
+ this point we can't tell whether a function is a leaf, nor can we tell how
+ much of the red zone it uses.  Furthermore, some strub contexts may have
+ been inlined so that update and leave are called from the same stack frame,
+ and the strub builtins may all have been inlined, turning a strub function
+ into a leaf.
+
+ So cleaning the range from the caller's stack pointer (one end of the red
+ zone) to the (potentially inlined) callee's (other end of the) red zone
+ could scribble over the caller's own red zone.
+
+ We avoid this possibility by arranging for callers that are strub contexts
+ to use their own watermark 

[PATCH v2 04/10] Introduce strub: tests for C++ and Ada

2022-07-29 Thread Alexandre Oliva via Gcc-patches


for  gcc/testsuite/ChangeLog

* g++.dg/strub-run1.C: New.
* g++.dg/torture/strub-init1.C: New.
* g++.dg/torture/strub-init2.C: New.
* g++.dg/torture/strub-init3.C: New.
* gnat.dg/strub_attr.adb, gnat.dg/strub_attr.ads: New.
* gnat.dg/strub_ind.adb, gnat.dg/strub_ind.ads: New.

diff --git a/gcc/testsuite/g++.dg/strub-run1.C b/gcc/testsuite/g++.dg/strub-run1.C
new file mode 100644
index 0..0d367fb83d09d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/strub-run1.C
@@ -0,0 +1,19 @@
+// { dg-do run }
+// { dg-options "-fstrub=internal" }
+
+// Check that we don't get extra copies.
+
+struct T {
+  T &self;
+  void check () const { if (&self != this) __builtin_abort (); }
+  T() : self (*this) { check (); }
+  T(const T& ck) : self (*this) { ck.check (); check (); }
+  ~T() { check (); }
+};
+
+T foo (T q) { q.check (); return T(); }
+T bar (T p) { p.check (); return foo (p); }
+
+int main () {
+  bar (T()).check ();
+}
diff --git a/gcc/testsuite/g++.dg/torture/strub-init1.C b/gcc/testsuite/g++.dg/torture/strub-init1.C
new file mode 100644
index 0..c226ab10ff651
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/strub-init1.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+extern int __attribute__((__strub__)) initializer ();
+
+int f() {
+  static int x = initializer ();
+  return x;
+}
+
+/* { dg-final { scan-ipa-dump "strub_enter" "strub" } } */
+/* { dg-final { scan-ipa-dump "strub_leave" "strub" } } */
+/* { dg-final { scan-ipa-dump-not "strub_update" "strub" } } */
diff --git a/gcc/testsuite/g++.dg/torture/strub-init2.C b/gcc/testsuite/g++.dg/torture/strub-init2.C
new file mode 100644
index 0..a7911f1fa7212
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/strub-init2.C
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+extern int __attribute__((__strub__)) initializer ();
+
+static int x = initializer ();
+
+int f() {
+  return x;
+}
+
+/* { dg-final { scan-ipa-dump "strub_enter" "strub" } } */
+/* { dg-final { scan-ipa-dump "strub_leave" "strub" } } */
+/* { dg-final { scan-ipa-dump-not "strub_update" "strub" } } */
diff --git a/gcc/testsuite/g++.dg/torture/strub-init3.C b/gcc/testsuite/g++.dg/torture/strub-init3.C
new file mode 100644
index 0..6ebebcd01e8ea
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/strub-init3.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict -fdump-ipa-strub" } */
+
+extern int __attribute__((__strub__)) initializer ();
+
+int f() {
+  int x = initializer ();
+  return x;
+}
+
+/* { dg-final { scan-ipa-dump "strub_enter" "strub" } } */
+/* { dg-final { scan-ipa-dump "strub_leave" "strub" } } */
+/* { dg-final { scan-ipa-dump-not "strub_update" "strub" } } */
diff --git a/gcc/testsuite/gnat.dg/strub_access.adb b/gcc/testsuite/gnat.dg/strub_access.adb
new file mode 100644
index 0..29e6996ecf61c
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/strub_access.adb
@@ -0,0 +1,21 @@
+--  { dg-do compile }
+--  { dg-options "-fstrub=relaxed -fdump-ipa-strubm" }
+
+--  The main subprogram doesn't read from the automatic variable, but
+--  being an automatic variable, its presence should be enough for the
+--  procedure to get strub enabled.
+
+procedure Strub_Access is
+   type Strub_Int is new Integer;
+   pragma Machine_Attribute (Strub_Int, "strub");
+   
+   X : aliased Strub_Int := 0;
+
+   function F (P : access Strub_Int) return Strub_Int is (P.all);
+
+begin
+   X := F (X'Access);
+end Strub_Access;
+
+--  { dg-final { scan-ipa-dump-times "\[(\]strub \[(\]internal\[)\]\[)\]" 1 "strubm" } }
+--  { dg-final { scan-ipa-dump-times "\[(\]strub \[(\]at-calls-opt\[)\]\[)\]" 1 "strubm" } }
diff --git a/gcc/testsuite/gnat.dg/strub_access1.adb b/gcc/testsuite/gnat.dg/strub_access1.adb
new file mode 100644
index 0..dae4706016436
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/strub_access1.adb
@@ -0,0 +1,16 @@
+--  { dg-do compile }
+--  { dg-options "-fstrub=relaxed" }
+
+--  Check that we reject 'Access of a strub variable whose type does
+--  not carry a strub modifier.
+
+procedure Strub_Access1 is
+   X : aliased Integer := 0;
+   pragma Machine_Attribute (X, "strub");
+
+   function F (P : access Integer) return Integer is (P.all);
+   
+begin
+   X := F (X'Unchecked_access); -- OK.
+   X := F (X'Access); -- { dg-error "target access type drops .strub. mode" }
+end Strub_Access1;
diff --git a/gcc/testsuite/gnat.dg/strub_attr.adb b/gcc/testsuite/gnat.dg/strub_attr.adb
new file mode 100644
index 0..10445d7cf8451
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/strub_attr.adb
@@ -0,0 +1,37 @@
+--  { dg-do compile }
+--  { dg-options "-fstrub=strict -fdump-ipa-strubm -fdump-ipa-strub" }
+
+package body Strub_Attr is
+   E : exception;
+
+   procedure P (X : Integer) is
+   begin
+  raise E;
+   end;
+   
+   function F (X : Integer) return Integer is
+   

[PATCH v2 03/10] Introduce strub: non-torture tests for C and C++

2022-07-29 Thread Alexandre Oliva via Gcc-patches


for  gcc/testsuite/ChangeLog

* c-c++-common/strub-O0.c: New.
* c-c++-common/strub-O1.c: New.
* c-c++-common/strub-O2.c: New.
* c-c++-common/strub-O2fni.c: New.
* c-c++-common/strub-O3.c: New.
* c-c++-common/strub-O3fni.c: New.
* c-c++-common/strub-Og.c: New.
* c-c++-common/strub-Os.c: New.
* c-c++-common/strub-all1.c: New.
* c-c++-common/strub-all2.c: New.
* c-c++-common/strub-apply1.c: New.
* c-c++-common/strub-apply2.c: New.
* c-c++-common/strub-apply3.c: New.
* c-c++-common/strub-apply4.c: New.
* c-c++-common/strub-at-calls1.c: New.
* c-c++-common/strub-at-calls2.c: New.
* c-c++-common/strub-defer-O1.c: New.
* c-c++-common/strub-defer-O2.c: New.
* c-c++-common/strub-defer-O3.c: New.
* c-c++-common/strub-defer-Os.c: New.
* c-c++-common/strub-internal1.c: New.
* c-c++-common/strub-internal2.c: New.
* c-c++-common/strub-parms1.c: New.
* c-c++-common/strub-parms2.c: New.
* c-c++-common/strub-parms3.c: New.
* c-c++-common/strub-relaxed1.c: New.
* c-c++-common/strub-relaxed2.c: New.
* c-c++-common/strub-short-O0-exc.c: New.
* c-c++-common/strub-short-O0.c: New.
* c-c++-common/strub-short-O1.c: New.
* c-c++-common/strub-short-O2.c: New.
* c-c++-common/strub-short-O3.c: New.
* c-c++-common/strub-short-Os.c: New.
* c-c++-common/strub-strict1.c: New.
* c-c++-common/strub-strict2.c: New.
* c-c++-common/strub-tail-O1.c: New.
* c-c++-common/strub-tail-O2.c: New.

diff --git a/gcc/testsuite/c-c++-common/strub-O0.c b/gcc/testsuite/c-c++-common/strub-O0.c
new file mode 100644
index 0..c7a79a6ea0d8a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-O0.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -fstrub=strict -fdump-rtl-expand" } */
+
+/* At -O0, none of the strub builtins are expanded inline.  */
+
+int __attribute__ ((__strub__)) var;
+
+int f() {
+  return var;
+}
+
+/* { dg-final { scan-rtl-dump "strub_enter" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_update" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_leave" "expand" } } */
diff --git a/gcc/testsuite/c-c++-common/strub-O1.c b/gcc/testsuite/c-c++-common/strub-O1.c
new file mode 100644
index 0..96285c975d98e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-O1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fstrub=strict -fdump-rtl-expand" } */
+
+/* At -O1, without -fno-inline, we fully expand enter, but neither update nor
+   leave.  */
+
+int __attribute__ ((__strub__)) var;
+
+int f() {
+  return var;
+}
+
+/* { dg-final { scan-rtl-dump-not "strub_enter" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_update" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_leave" "expand" } } */
diff --git a/gcc/testsuite/c-c++-common/strub-O2.c b/gcc/testsuite/c-c++-common/strub-O2.c
new file mode 100644
index 0..8edc0d8aa1321
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-O2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand" } */
+
+/* At -O2, without -fno-inline, we fully expand enter and update, and add a test
+   around the leave call.  */
+
+int __attribute__ ((__strub__)) var;
+
+int f() {
+  return var;
+}
+
+/* { dg-final { scan-rtl-dump-not "strub_enter" "expand" } } */
+/* { dg-final { scan-rtl-dump-not "strub_update" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_leave" "expand" } } */
+/* { dg-final { scan-rtl-dump "\[(\]call\[^\n\]*strub_leave.*\n\[(\]code_label" "expand" } } */
diff --git a/gcc/testsuite/c-c++-common/strub-O2fni.c b/gcc/testsuite/c-c++-common/strub-O2fni.c
new file mode 100644
index 0..c6d900cf3c45b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-O2fni.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand -fno-inline" } */
+
+/* With -fno-inline, none of the strub builtins are inlined.  */
+
+int __attribute__ ((__strub__)) var;
+
+int f() {
+  return var;
+}
+
+/* { dg-final { scan-rtl-dump "strub_enter" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_update" "expand" } } */
+/* { dg-final { scan-rtl-dump "strub_leave" "expand" } } */
+/* { dg-final { scan-rtl-dump-not "\[(\]call\[^\n\]*strub_leave.*\n\[(\]code_label" "expand" } } */
diff --git a/gcc/testsuite/c-c++-common/strub-O3.c b/gcc/testsuite/c-c++-common/strub-O3.c
new file mode 100644
index 0..33ee465e51cb6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/strub-O3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fstrub=strict -fdump-rtl-expand" } */
+
+int __attribute__ ((__strub__)) var;
+
+int f() {
+  return var;
+}
+
+/* { dg-final { scan-rtl-dump-not "strub_enter" "expand" } } */
+/* { dg-final { 

[PATCH v2 02/10] Introduce strub: torture tests for C and C++

2022-07-29 Thread Alexandre Oliva via Gcc-patches


for  gcc/testsuite/ChangeLog

* c-c++-common/torture/strub-callable1.c: New.
* c-c++-common/torture/strub-callable2.c: New.
* c-c++-common/torture/strub-const1.c: New.
* c-c++-common/torture/strub-const2.c: New.
* c-c++-common/torture/strub-const3.c: New.
* c-c++-common/torture/strub-const4.c: New.
* c-c++-common/torture/strub-data1.c: New.
* c-c++-common/torture/strub-data2.c: New.
* c-c++-common/torture/strub-data3.c: New.
* c-c++-common/torture/strub-data4.c: New.
* c-c++-common/torture/strub-data5.c: New.
* c-c++-common/torture/strub-indcall1.c: New.
* c-c++-common/torture/strub-indcall2.c: New.
* c-c++-common/torture/strub-indcall3.c: New.
* c-c++-common/torture/strub-inlinable1.c: New.
* c-c++-common/torture/strub-inlinable2.c: New.
* c-c++-common/torture/strub-ptrfn1.c: New.
* c-c++-common/torture/strub-ptrfn2.c: New.
* c-c++-common/torture/strub-ptrfn3.c: New.
* c-c++-common/torture/strub-ptrfn4.c: New.
* c-c++-common/torture/strub-pure1.c: New.
* c-c++-common/torture/strub-pure2.c: New.
* c-c++-common/torture/strub-pure3.c: New.
* c-c++-common/torture/strub-pure4.c: New.
* c-c++-common/torture/strub-run1.c: New.
* c-c++-common/torture/strub-run2.c: New.
* c-c++-common/torture/strub-run3.c: New.
* c-c++-common/torture/strub-run4.c: New.
* c-c++-common/torture/strub-run4c.c: New.
* c-c++-common/torture/strub-run4d.c: New.
* c-c++-common/torture/strub-run4i.c: New.

diff --git a/gcc/testsuite/c-c++-common/torture/strub-callable1.c b/gcc/testsuite/c-c++-common/torture/strub-callable1.c
new file mode 100644
index 0..b5e45ab0525ad
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-callable1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict" } */
+
+/* Check that strub and non-strub functions can be called from non-strub
+   contexts, and that strub and callable functions can be called from strub
+   contexts.  */
+
+#define OMIT_IMPERMISSIBLE_CALLS 1
+#include "strub-callable2.c"
diff --git a/gcc/testsuite/c-c++-common/torture/strub-callable2.c b/gcc/testsuite/c-c++-common/torture/strub-callable2.c
new file mode 100644
index 0..96aa7fe4b07f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/strub-callable2.c
@@ -0,0 +1,264 @@
+/* { dg-do compile } */
+/* { dg-options "-fstrub=strict" } */
+
+/* Check that impermissible (cross-strub-context) calls are reported.  */
+
+extern int __attribute__ ((__strub__ ("callable"))) xcallable (void);
+extern int __attribute__ ((__strub__ ("internal"))) xinternal (void);
+extern int __attribute__ ((__strub__ ("at-calls"))) xat_calls (void);
+extern int __attribute__ ((__strub__ ("disabled"))) xdisabled (void);
+
+int __attribute__ ((__strub__ ("callable"))) callable (void);
+int __attribute__ ((__strub__ ("internal"))) internal (void);
+int __attribute__ ((__strub__ ("at-calls"))) at_calls (void);
+int __attribute__ ((__strub__ ("disabled"))) disabled (void);
+
+int __attribute__ ((__strub__)) var;
+int var_user (void);
+
+static inline int __attribute__ ((__always_inline__, __strub__ ("callable")))
+icallable (void);
+static inline int __attribute__ ((__always_inline__, __strub__ ("internal")))
+iinternal (void);
+static inline int __attribute__ ((__always_inline__, __strub__ ("at-calls")))
+iat_calls (void);
+static inline int __attribute__ ((__always_inline__, __strub__ ("disabled")))
+idisabled (void);
+static inline int __attribute__ ((__always_inline__))
+ivar_user (void);
+
+static inline int __attribute__ ((__always_inline__, __strub__ ("callable")))
+i_callable (void) { return 0; }
+static inline int __attribute__ ((__always_inline__, __strub__ ("internal")))
+i_internal (void) { return var; }
+static inline int __attribute__ ((__always_inline__, __strub__ ("at-calls")))
+i_at_calls (void) { return var; }
+static inline int __attribute__ ((__always_inline__, __strub__ ("disabled")))
+i_disabled (void) { return 0; }
+static inline int __attribute__ ((__always_inline__))
+i_var_user (void) { return var; }
+
+#define CALLS_GOOD_FOR_STRUB_CONTEXT(ISEP) \
+  do { \
+ret += i ## ISEP ## at_calls ();   \
+ret += i ## ISEP ## internal ();   \
+ret += i ## ISEP ## var_user ();   \
+  } while (0)
+
+#define CALLS_GOOD_FOR_NONSTRUB_CONTEXT(ISEP)  \
+  do { \
+ret += internal ();\
+ret += disabled ();\
+ret += var_user ();\
+   \
+ret += i ## ISEP ## disabled ();   \
+   \
+ret += xinternal ();   \
+ret += 

[PATCH v2 01/10] Introduce strub: documentation, and new command-line options

2022-07-29 Thread Alexandre Oliva via Gcc-patches


Ada already has some strub documentation in
gcc/ada/doc/gnat_rm/security_hardening_features.rst.

for  gcc/ChangeLog

* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.

diff --git a/gcc/common.opt b/gcc/common.opt
index e7a51e882bade..2c635806bbf2c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2780,6 +2780,35 @@ fstrict-overflow
 Common
 Treat signed overflow as undefined.  Negated as -fwrapv -fwrapv-pointer.
 
+fstrub=disable
+Common RejectNegative Var(flag_strub, 0)
+Disable stack scrub entirely, disregarding strub attributes.
+
+fstrub=strict
+Common RejectNegative Var(flag_strub, -4)
+Enable stack scrub as per attributes, with strict call checking.
+
+; If any strub-enabling attribute is seen when the default or strict
+; initializer values are in effect, flag_strub is bumped up by 2.  The
+; scrub mode gate function will then bump these initializer values to
+; 0 if no strub-enabling attribute is seen.  This minimizes the strub
+; overhead.
+fstrub=relaxed
+Common RejectNegative Var(flag_strub, -3) Init(-3)
+Restore default strub mode: as per attributes, with relaxed checking.
+
+fstrub=all
+Common RejectNegative Var(flag_strub, 3)
+Enable stack scrubbing for all viable functions.
+
+fstrub=at-calls
+Common RejectNegative Var(flag_strub, 1)
+Enable at-calls stack scrubbing for all viable functions.
+
+fstrub=internal
+Common RejectNegative Var(flag_strub, 2)
+Enable internal stack scrubbing for all viable functions.
+
 fsync-libcalls
 Common Var(flag_sync_libcalls) Init(1)
 Implement __atomic operations via libcalls to legacy __sync functions.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cdd4..90d38c9f08362 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -77,6 +77,7 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Function Names::  Printable strings which are the name of the current
 function.
 * Return Address::  Getting the return or frame address of a function.
+* Stack Scrubbing:: Stack scrubbing internal interfaces.
 * Vector Extensions::   Using vector instructions through built-in functions.
 * Offsetof::Special syntax for implementing @code{offsetof}.
 * __sync Builtins:: Legacy built-in functions for atomic memory access.
@@ -8890,6 +8891,263 @@ pid_t wait (wait_status_ptr_t p)
 @}
 @end smallexample
 
+@item strub
+@cindex @code{strub} type attribute
+This attribute defines stack-scrubbing properties of functions and
+variables.  Being a type attribute, it attaches to types, even when
+specified in function and variable declarations.  When applied to
+function types, it takes an optional string argument.  When applied to a
+pointer-to-function type, if the optional argument is given, it gets
+propagated to the function type.
+
+@smallexample
+/* A strub variable.  */
+int __attribute__ ((strub)) var;
+/* A strub variable that happens to be a pointer.  */
+__attribute__ ((strub)) int *strub_ptr_to_int;
+/* A pointer type that may point to a strub variable.  */
+typedef int __attribute__ ((strub)) *ptr_to_strub_int_type;
+
+/* A declaration of a strub function.  */
+extern int __attribute__ ((strub)) foo (void);
+/* A pointer to that strub function.  */
+int __attribute__ ((strub ("at-calls"))) (*ptr_to_strub_fn)(void) = foo;
+@end smallexample
+
+A function associated with @code{at-calls} @code{strub} mode
+(@code{strub("at-calls")}, or just @code{strub}) undergoes interface
+changes.  Its callers are adjusted to match the changes, and to scrub
+(overwrite with zeros) the stack space used by the called function after
+it returns.  The interface change makes the function type incompatible
+with an unadorned but otherwise equivalent type, so @emph{every}
+declaration and every type that may be used to call the function must be
+associated with this strub mode.
+
+A function associated with @code{internal} @code{strub} mode
+(@code{strub("internal")}) retains an unmodified, type-compatible
+interface, but it may be turned into a wrapper that calls the wrapped
+body using a custom interface.  The wrapper then scrubs the stack space
+used by the wrapped body.  Though the wrapped body has its stack space
+scrubbed, the wrapper does not, so arguments and return values may
+remain unscrubbed even when such a function is called by another
+function that enables @code{strub}.  This is why, when compiling with
+@option{-fstrub=strict}, a @code{strub} context is not allowed to call
+@code{internal} @code{strub} functions.
+
+@smallexample
+/* A declaration of an internal-strub function.  */
+extern int __attribute__ ((strub ("internal"))) bar (void);
+
+int __attribute__ ((strub))
+baz (void)
+@{
+  /* Ok, foo was declared above as an at-calls strub function.  */
+  foo 

Re: [GCC13][Patch][V2][1/2]Add a new option -fstrict-flex-array[=n] and attribute strict_flex_array(n) and use it in PR101836

2022-07-29 Thread Richard Biener via Gcc-patches
On Thu, 28 Jul 2022, Kees Cook wrote:

> On Thu, Jul 28, 2022 at 07:26:57AM +, Richard Biener wrote:
> > On Tue, 19 Jul 2022, Qing Zhao wrote:
> > > [...]
> > > +@cindex @code{strict_flex_array} variable attribute
> > > +@item strict_flex_array (@var{level})
> > > +The @code{strict_flex_array} attribute should be attached to the trailing
> > > +array field of a structure.  It specifies the level of strictness of
> > > +treating the trailing array field of a structure as a flexible array
> > > +member. @var{level} must be an integer between 0 and 3.
> > > +
> > > +@var{level}=0 is the least strict level: all trailing arrays of structures
> > > +are treated as flexible array members.  @var{level}=3 is the strictest level:
> > > +only a trailing array declared as a flexible array member per the C99
> > > +standard onwards ([]) is treated as a flexible array member.
> > 
> > How is level 3 (thus -fstrict-flex-array) interpreted when you specify 
> > -std=c89?  How for -std=gnu89?
> 
> To me, it makes sense that either c99 is required (most sane to me)
> or it would disable flexible arrays entirely (seems an unlikely combo to
> be useful).
> 
> > 
> > > +
> > > +There are two more levels in between 0 and 3, provided to support older
> > > +code that uses the GCC zero-length array extension ([0]) or a one-element
> > > +array ([1]) as a flexible array member:
> > > +When @var{level} is 1, the trailing array is treated as a flexible array
> > > +member when it is declared as either "[]", "[0]", or "[1]";
> > > +When @var{level} is 2, the trailing array is treated as a flexible array
> > > +member when it is declared as either "[]" or "[0]".
> > 
> > Given the above does adding level 2 make sense given that [0] is a GNU
> > extension?
> 
> Level 1 removes the general "all trailing arrays are flex arrays" logic, but
> allows the 2 common "historical" fake flex array styles ("[1]" and "[0]").
> Level 2 additionally removes the "[1]" style.
> Level 3 additionally removes the "[0]" style.
> 
> I don't understand how "[0]" being a GNU extension matters here for
> level 2 -- it's dropping "[1]". And for level 3, the point is to defang
> the GNU extension of "[0]" to no longer mean "flexible array", and
> instead only mean "zero sized member" (as if it were something like
> "struct { } no_size;").
> 
> Note that for the Linux kernel, we only care about level 3, but could
> make do with level 2. We need to purge all the "fake" flexible array usage
> so we can start building a sane set of behaviors around array bounds
> that are reliably introspectable.

Note we've seen "historical" fake flex arrays like
struct X { int n; char str[4]; }; used by people being extra clever
(or careful?  char str[1] would not be a flex array since there's
a padding "member" behind it?!) in handling the padding.

I was just worried about confusing people too much.  Given that
-fstrict-flex-arrays enables level 3, should we warn with -std=c89
that it disables all flex arrays?  I think it should at least be
documented somehow.

> As a related bit of feature creep, it would be great to expose something
> like __builtin_has_flex_array_p() so FORTIFY could do a better job
> filtering __builtin_object_size() information.
> 
> Given:
> 
> struct inside {
> int foo;
> int bar;
> unsigned long items[];
> };
> 
> struct outside {
> int a;
> int b;
> struct inside inner;
> };
> 
> The follow properties are seen within, for example:
> 
> void stuff(struct outside *outer, struct inside *inner)
> {
>   ...
> }
> 
>   __builtin_object_size(&outer->inner, 1) == 8
>   __builtin_object_size(inner, 1) == -1
> 
> (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832)

I think that would be a bug in bos worth fixing.

> So things like FORTIFY misfire on &outer->inner, as it's _not_ actually
> 8 bytes -- it has a potential trailing flex array.
> 
> If it could be introspected better, FORTIFY could check for the flex
> array. For example, instead of using the inconsistent __bos(ptr, 1) for
> finding member sizes, it could do something like:
> 
> #define __member_size(ptr)\
>   (__builtin_has_flex_array_p(ptr) ? -1 : \
>__builtin_object_size(ptr, 1))
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-07-29 Thread Alexandre Oliva via Gcc-patches


This patch adds the strub attribute for function and variable types,
command-line options, passes and adjustments to implement it,
documentation, and tests.

Stack scrubbing is implemented in a machine-independent way: functions
with strub enabled are modified so that they take an extra stack
watermark argument, that they update with their stack use, and the
caller can then zero it out once it regains control, whether by return
or exception.  There are two ways to go about it: at-calls, that
modifies the visible interface (signature) of the function, and
internal, in which the body is moved to a clone, the clone undergoes
the interface change, and the function becomes a wrapper, preserving
its original interface, that calls the clone and then clears the stack
used by it.

Variables can also be annotated with the strub attribute, so that
functions that read from them get stack scrubbing enabled implicitly,
whether at-calls, for functions only usable within a translation unit,
or internal, for functions whose interfaces must not be modified.

There is a strict mode, in which functions that have their stack
scrubbed can only call other functions with stack-scrubbing
interfaces, or those explicitly marked as callable from strub
contexts, so that an entire call chain gets scrubbing, at once or
piecemeal depending on optimization levels.  In the default mode,
relaxed, this requirement is not enforced by the compiler.

The implementation adds two IPA passes, one that assigns strub modes
early on, another that modifies interfaces and adds calls to the
builtins that jointly implement stack scrubbing.  Another builtin,
that obtains the stack pointer, is added for use in the implementation
of the builtins, whether expanded inline or called in libgcc.

There are new command-line options to change operation modes and to
force the feature disabled; it is enabled by default, but it has no
effect and is implicitly disabled if the strub attribute is never
used.  There are also options meant to use for testing the feature,
enabling different strubbing modes for all (viable) functions.

I'm going to split up the very large patch into separate posts:
1:documentation, tests (2:C/C++ torture, 3:C/C++ non-torture, 4:C++
and Ada), 5:builtins and runtime, 6:attributes, 7:interfaces and
adjustments, and ipa-strub.cc fragments (8:strub modes, 9:strub mode
assignment pass, and 10:strub pass proper).

This is a refreshed and updated version of the patch posted in September
last year.  It was regstrapped on x86_64-linux-gnu, bootstrapped with a
patchlet that enables -fstrub=all by default (with a fix for a warning
exposed by it).  Earlier, essentially the same change has undergone
significant testing, with and without -fstrub=all on multiple embedded
platforms.  Ok to install?


for  gcc/ChangeLog

* Makefile.in (OBJS): Add ipa-strub.o.
* builtins.def (BUILT_IN_STACK_ADDRESS): New.
(BUILT_IN___STRUB_ENTER): New.
(BUILT_IN___STRUB_UPDATE): New.
(BUILT_IN___STRUB_LEAVE): New.
* builtins.cc: Include ipa-strub.h.
(STACK_STOPS, STACK_UNSIGNED): Define.
(expand_builtin_stack_address): New.
(expand_builtin_strub_enter): New.
(expand_builtin_strub_update): New.
(expand_builtin_strub_leave): New.
(expand_builtin): Call them.
* common.opt (fstrub=*): New options.
* doc/extend.texi (strub): New type attribute.
(__builtin_stack_address): New function.
(Stack Scrubbing): New section.
* doc/invoke.texi (-fstrub=*): New options.
(-fdump-ipa-*): New passes.
* ipa-inline.cc: Include ipa-strub.h.
(can_inline_edge_p): Test strub_inlinable_to_p.
* ipa-split.cc: Include ipa-strub.h.
(execute_split_functions): Test strub_splittable_p.
* ipa-strub.cc, ipa-strub.h: New.
* passes.def: Add strub_mode and strub passes.
* tree-cfg.cc (gimple_verify_flow_info): Note on debug stmts.
* tree-pass.h (make_pass_ipa_strub_mode): Declare.
(make_pass_ipa_strub): Declare.
(make_pass_ipa_function_and_variable_visibility): Fix
formatting.
* tree-ssa-ccp.cc (optimize_stack_restore): Keep restores
before strub leave.
* multiple_target.cc (pass_target_clone::gate): Test seen_error.
* attribs.cc: Include ipa-strub.h.
(decl_attributes): Support applying attributes to function
type, rather than pointer type, at handler's request.
(comp_type_attributes): Combine strub_comptypes and target
comp_type results.

for  gcc/c-family/ChangeLog

* c-attribs.cc: Include ipa-strub.h.
(handle_strub_attribute): New.
(c_common_attribute_table): Add strub.

for  gcc/ada/ChangeLog

* gcc-interface/trans.cc: Include ipa-strub.h.
(gigi): Make internal decls for targets of compiler-generated
calls strub-callable too.
(build_raise_check): Likewise.

[x86_64 PATCH] Add rotl64ti2_doubleword pattern to i386.md

2022-07-29 Thread Roger Sayle

This patch adds rot[lr]64ti2_doubleword patterns to the x86_64 backend,
to move splitting of 128-bit TImode rotates by 64 bits after reload,
matching what we now do for 64-bit DImode rotations by 32 bits with -m32.

In theory, moving the point at which this rotation is split should have
little influence on code generation, but in practice "reload" sometimes
decides to make use of the increased flexibility to reduce the number
of registers used, and the code size, by using xchg.

For example:
__int128 x;
__int128 y;
__int128 a;
__int128 b;

void foo()
{
    unsigned __int128 t = x;
    t ^= a;
    t = (t<<64) | (t>>64);
    t ^= b;
    y = t;
}

Before:
        movq    x(%rip), %rsi
        movq    x+8(%rip), %rdi
        xorq    a(%rip), %rsi
        xorq    a+8(%rip), %rdi
        movq    %rdi, %rax
        movq    %rsi, %rdx
        xorq    b(%rip), %rax
        xorq    b+8(%rip), %rdx
        movq    %rax, y(%rip)
        movq    %rdx, y+8(%rip)
        ret

After:
        movq    x(%rip), %rax
        movq    x+8(%rip), %rdx
        xorq    a(%rip), %rax
        xorq    a+8(%rip), %rdx
        xchgq   %rdx, %rax
        xorq    b(%rip), %rax
        xorq    b+8(%rip), %rdx
        movq    %rax, y(%rip)
        movq    %rdx, y+8(%rip)
        ret

On some modern architectures this is a small win, on some older
architectures a small loss.  The decision which code to generate is
made in "reload", and could probably be tweaked by register
preferencing.  The much bigger win is that (eventually) all TImode
shifts and rotates by constants will become potential candidates for
TImode STV.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-07-29  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (define_expand <insn>ti3): For
rotations by 64 bits use new rot[lr]64ti2_doubleword pattern.
(rot[lr]64ti2_doubleword): New post-reload splitter.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fab6aed..f1158e1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13820,6 +13820,8 @@
   if (const_1_to_63_operand (operands[2], VOIDmode))
    emit_insn (gen_ix86_<insn>ti3_doubleword
               (operands[0], operands[1], operands[2]));
+  else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 64)
+    emit_insn (gen_<insn>64ti2_doubleword (operands[0], operands[1]));
   else
 {
   rtx amount = force_reg (QImode, operands[2]);
@@ -14045,6 +14047,24 @@
 }
 })
 
+(define_insn_and_split "<insn>64ti2_doubleword"
+ [(set (match_operand:TI 0 "register_operand" "=r,r,r")
+   (any_rotate:TI (match_operand:TI 1 "nonimmediate_operand" "0,r,o")
+  (const_int 64)))]
+ "TARGET_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 3))
+  (set (match_dup 2) (match_dup 1))]
+{
+  split_double_mode (TImode, &operands[0], 2, &operands[0], &operands[2]);
+  if (rtx_equal_p (operands[0], operands[1]))
+    {
+      emit_insn (gen_swapdi (operands[0], operands[2]));
+      DONE;
+    }
+})
+
 (define_mode_attr rorx_immediate_operand
[(SI "const_0_to_31_operand")
 (DI "const_0_to_63_operand")])