Re: [PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-20 Thread Richard Sandiford via Gcc-patches
Andreas Krebbel via Gcc-patches  writes:
> On 1/14/22 20:41, Andreas Krebbel via Gcc-patches wrote:
>> On 1/14/22 08:37, Richard Biener wrote:
>> ...
>>> Can the gist of this bug be put into the GCC bugzilla so the rev can
>>> refer to it? 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034
>> 
>>> Can we have a testcase even?
>> The testcase from Jakub is in the BZ. However, since it doesn't fail with 
>> head I didn't try to
>> include it in my patch.
>> 
>>> I'm not quite understanding the problem but is it that, say,
>>>
>>>  (subreg:DI (reg:V2DI ..) 0)
>>>
>>> isn't the same as
>>>
>>>  (lowpart:DI (reg:V2DI ...) 0)
>> 
>> (reg:DI v0) does not match the lower order bits of (reg:TI v0)
>> 
>>> ?  The regcprop code looks more like asking whether the larger reg
>>> is a composition of multiple other hardregs and will return the specific
>>> hardreg corresponding to the lowpart - so like if on s390 the vector
>>> registers overlap with some other regset.  But then doing the actual
>>> accesses via the other regset regs doesn't actually work?  Isn't the
>>> backend then lying to us (aka the mode_change_ok returns the
>>> wrong answer)?
>> 
>> can_change_mode_class should do the right thing. We return false in case 
>> somebody wants to change TI
>> to DI for a vector register. However, the hook never gets called like this 
>> from regcprop. regcprop
>> only asks whether it is ok to change (reg:TI r8) to (reg:DI r8) and that's 
>> indeed ok.
>
> After writing this I'm wondering whether this would be a better fix:
>
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index 18132425ab2..b6a3f4e3804 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -402,7 +402,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
> copy_mode,
>
>if (orig_mode == new_mode)
>  return gen_raw_REG (new_mode, regno);
> -  else if (mode_change_ok (orig_mode, new_mode, regno))
> +  else if (mode_change_ok (orig_mode, new_mode, regno)
> +   && mode_change_ok (copy_mode, new_mode, copy_regno))
>  {
>int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
>int use_nregs = hard_regno_nregs (copy_regno, new_mode);
>

Yeah, this looks good to me FWIW.

Richard


Re: [PATCH v2] Disable -fsplit-stack support on non-glibc targets

2022-01-20 Thread Andreas Krebbel via Gcc-patches
On 1/20/22 23:52, Richard Sandiford wrote:
> cc:ing the x86 and s390 maintainers
> 
> soeren--- via Gcc-patches  writes:
>> From: Sören Tempel 
>>
>> The -fsplit-stack option requires the pthread_t TCB definition in the
>> libc to provide certain struct fields at specific hardcoded offsets. As
>> far as I know, only glibc provides these fields at the required offsets.
>> Most notably, musl libc does not have these fields. However, since gcc
>> accesses the fields using a fixed offset, this does not cause a
>> compile-time error, but instead results in a silent memory corruption at
>> run-time with musl libc. For example, on s390x libgcc's
>> __stack_split_initialize CTOR will overwrite the cancel field in the
>> pthread_t TCB on musl.
>>
>> The -fsplit-stack option is used within the gcc code base itself by
>> gcc-go (if available). On musl-based systems with split-stack support
>> (i.e. s390x or x86) this causes Go programs compiled with gcc-go to
>> misbehave at run-time.
>>
>> This patch fixes gcc-go on musl by disabling -fsplit-stack in gcc itself
>> since it is not supported on non-glibc targets anyhow. This is achieved
>> by checking if gcc targets a glibc-based system. This check has been
>> added for x86 and s390x, the rs6000 config already checks for
>> TARGET_GLIBC_MAJOR. Other architectures do not have split-stack
>> support. With this patch applied, the gcc-go configure script will
>> detect that -fsplit-stack support is not available and will not use it.
>>
>> See https://www.openwall.com/lists/musl/2012/10/16/12
>>
>> This patch was written under the assumption that glibc is the only libc
>> implementation which supports the required fields at the required
>> offsets in the pthread_t TCB. The patch has been tested on Alpine Linux
>> Edge on the s390x and x86 architectures by bootstrapping Google's Go
>> implementation with gcc-go.
>>
>> Signed-off-by: Sören Tempel 
>>
>> gcc/ChangeLog:
>>
>>  * common/config/s390/s390-common.c (s390_supports_split_stack):
>>  Only support split-stack on glibc targets.
>>  * config/i386/gnu-user-common.h (STACK_CHECK_STATIC_BUILTIN): Ditto.
>>  * config/i386/gnu.h (defined): Ditto.

s390 parts are ok.

Thanks!

Andreas

>> ---
>> This version of the patch addresses feedback by Andrew Pinski and uses
>> OPTION_GLIBC as well as opts->x_linux_libc == LIBC_GLIBC to detect glibc
>> targets (instead of relying on TARGET_GLIBC_MAJOR).
>>
>>  gcc/common/config/s390/s390-common.c | 11 +--
>>  gcc/config/i386/gnu-user-common.h|  5 +++--
>>  gcc/config/i386/gnu.h|  6 +-
>>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> Sorry for the slow review.  The patch LGTM bar some minor formatting
> nits below, but target maintainers should have the final say.
> 
>> diff --git a/gcc/common/config/s390/s390-common.c 
>> b/gcc/common/config/s390/s390-common.c
>> index b6bc8501742..fc86e0bc5e7 100644
>> --- a/gcc/common/config/s390/s390-common.c
>> +++ b/gcc/common/config/s390/s390-common.c
>> @@ -116,13 +116,20 @@ s390_handle_option (struct gcc_options *opts 
>> ATTRIBUTE_UNUSED,
>>  
>>  /* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
>> We don't verify it, since earlier versions just have padding at
>> -   its place, which works just as well.  */
>> +   its place, which works just as well. For other libc implementations
> 
> GCC style is to use 2 spaces after a full stop.  Same for the x86 part.
> 
>> +   we disable the feature entirely to avoid corrupting the TCB.  */
>>  
>>  static bool
>>  s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
>> struct gcc_options *opts ATTRIBUTE_UNUSED)
> 
> These parameters are no longer unused after the patch, so it'd be good
> to remove the attributes.
> 
>>  {
>> -  return true;
>> +  if (opts->x_linux_libc == LIBC_GLIBC) {
>> +return true;
>> +  } else {
>> +if (report)
>> +  error("%<-fsplit-stack%> currently only supported on GNU/Linux");
>> +return false;
>> +  }
> 
> Normal GCC formatting would be something like:
> 
>   if (opts->x_linux_libc == LIBC_GLIBC)
> return true;
> 
>   if (report)
> error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
>   return false;
> 
> Sorry for the fussy rules.
> 
> Thanks,
> Richard
> 
>>  }
>>  
>>  #undef TARGET_DEFAULT_TARGET_FLAGS
>> diff --git a/gcc/config/i386/gnu-user-common.h 
>> b/gcc/config/i386/gnu-user-common.h
>> index 00226f5a455..6e13315b5a3 100644
>> --- a/gcc/config/i386/gnu-user-common.h
>> +++ b/gcc/config/i386/gnu-user-common.h
>> @@ -66,7 +66,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #define STACK_CHECK_STATIC_BUILTIN 1
>>  
>>  /* We only build the -fsplit-stack support in libgcc if the
>> -   assembler has full support for the CFI directives.  */
>> -#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
>> +   assembler has full support for the CFI directives and
>> +   targets glibc.  */
>> +#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE 

Re: [PATCH] warn-access: Fix up warning_at arguments

2022-01-20 Thread Richard Biener via Gcc-patches
On Fri, 21 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> A warning regression fix I'm about to post warns (and breaks bootstrap due
> to that) on the following spot.  Seems it is a copy and paste from
> earlier code that mentions the %qD variable instead of talking about
> unnamed temporary.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richrd.

> 2022-01-21  Jakub Jelinek  
> 
>   * gimple-ssa-warn-access.cc (pass_waccess::warn_invalid_pointer):
>   Avoid passing var to warning_at when the format string doesn't
>   refer to it.
> 
> --- gcc/gimple-ssa-warn-access.cc.jj  2022-01-20 21:24:40.600955673 +0100
> +++ gcc/gimple-ssa-warn-access.cc 2022-01-20 22:58:19.488226389 +0100
> @@ -3953,15 +3953,14 @@ pass_waccess::warn_invalid_pointer (tree
>   "may be used")
>  : G_("using dangling pointer %qE to an unnamed "
>   "temporary")),
> -   ref, var))
> +   ref))
>|| (!ref
> && warning_at (use_loc, OPT_Wdangling_pointer_,
>(maybe
> ? G_("dangling pointer to an unnamed temporary "
>  "may be used")
> : G_("using a dangling pointer to an unnamed "
> -"temporary")),
> -  var)))
> +"temporary")
>  {
>inform (DECL_SOURCE_LOCATION (var),
> "unnamed temporary defined here");
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] optabs: Don't create pseudos in prepare_cmp_insn when not allowed [PR102478]

2022-01-20 Thread Richard Biener via Gcc-patches
On Fri, 21 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> cond traps can be created during ce3 after reload (and e.g. PR103028
> recently fixed some ce3 cond trap related bug, so I think often that
> works fine and we shouldn't disable cond traps after RA altogether),
> but it calls prepare_cmp_insn.  This function can fail, so I don't
> see why we couldn't make it work after RA (in most cases it already
> just works).  The first hunk is just an optimization which doesn't
> make sense after RA, so I've guarded it with can_create_pseudo_p.
> The second hunk is just a theoretical case, I don't have a testcase for it.
> prepare_cmp_insn has some other spots that can create pseudos, like when
> both operands have VOIDmode, or when it is BLKmode comparison, or
> not OPTAB_DIRECT, but I think none of that applies to ce3, we punt on
> BLKmode earlier, use OPTAB_DIRECT and shouldn't be comparing two
> VOIDmode CONST_INTs.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux and powerpc64le-linux,
> ok for trunk?

OK.

Richard.

> 2022-01-21  Jakub Jelinek  
> 
>   PR rtl-optimization/102478
>   * optabs.cc (prepare_cmp_insn): If !can_create_pseudo_p (), don't
>   force_reg constants and for -fnon-call-exceptions fail if copy_to_reg
>   would be needed.
> 
>   * gcc.dg/pr102478.c: New test.
> 
> --- gcc/optabs.cc.jj  2022-01-20 11:30:45.586578023 +0100
> +++ gcc/optabs.cc 2022-01-20 16:54:35.439409550 +0100
> @@ -4398,12 +4398,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>/* If we are optimizing, force expensive constants into a register.  */
>if (CONSTANT_P (x) && optimize
>&& (rtx_cost (x, mode, COMPARE, 0, optimize_insn_for_speed_p ())
> -  > COSTS_N_INSNS (1)))
> +  > COSTS_N_INSNS (1))
> +  && can_create_pseudo_p ())
>  x = force_reg (mode, x);
>  
>if (CONSTANT_P (y) && optimize
>&& (rtx_cost (y, mode, COMPARE, 1, optimize_insn_for_speed_p ())
> -  > COSTS_N_INSNS (1)))
> +  > COSTS_N_INSNS (1))
> +  && can_create_pseudo_p ())
>  y = force_reg (mode, y);
>  
>/* Don't let both operands fail to indicate the mode.  */
> @@ -4472,6 +4474,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
>   compare and branch in different basic blocks.  */
>if (cfun->can_throw_non_call_exceptions)
>  {
> +  if (!can_create_pseudo_p () && (may_trap_p (x) || may_trap_p (y)))
> + goto fail;
>if (may_trap_p (x))
>   x = copy_to_reg (x);
>if (may_trap_p (y))
> --- gcc/testsuite/gcc.dg/pr102478.c.jj2022-01-20 17:00:52.220192056 
> +0100
> +++ gcc/testsuite/gcc.dg/pr102478.c   2022-01-20 17:00:30.917487026 +0100
> @@ -0,0 +1,29 @@
> +/* PR rtl-optimization/102478 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-if-conversion -Wno-div-by-zero" } */
> +
> +unsigned a, b, c;
> +
> +void
> +foo (void)
> +{
> +  c |= __builtin_expect (65535 / a, 0) && 0 / 0;
> +  b = 0;
> +}
> +
> +void
> +bar (void)
> +{
> +  if (a <= 65535)
> +__builtin_trap ();
> +  b = 0;
> +}
> +
> +void
> +baz (void)
> +{
> +  if (a > 65535)
> +b = 0;
> +  else
> +__builtin_trap ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] c-family: Fix up a -Wformat regression [PR104148]

2022-01-20 Thread Jakub Jelinek via Gcc-patches
Hi!

As can be seen on the testcase, GCC 11 no longer warns if the format
string is wrapped inside of ()s.
This regressed with r11-2457-gdf5cf47a978, which added
if (TREE_NO_WARNING (param)) return;
to check_function_arguments_recurse.  That function is used with a callback
for two cases, for -Wformat and for -Wnonnull.  For the latter it is
desirable to not warn in parameters or their subexpressions where that
warning is suppressed, but for -Wformat the function is used solely
to discover the string literals if any so that the c-format.cc code can
diagnose them.  I believe no warning suppression should stand in the
way of that, -Wformat* warnings should be decided from warning suppression
on the CALL_EXPR only.
In the PR Martin argued that now that we have specialized
warning_suppressed_p we should use it, so instead of adding a bool
arg to check_function_arguments_recurse I've added opt_code to the
function, but will defer the warning_suppressed_p change to him.
For OPT_Wformat_ we don't want to call it anyway at all (as I said,
I think there should be no suppression for it during the string discovery,
there isn't just one -Wformat= option, there are many and
warning_suppression_p even with no_warnings actually tests the
TREE_NO_WARNING bit).
Initially, I thought I'd restrict also call to fn with format_arg attribute
handling in check_function_arguments_recurse to OPT_Wformat_ only, but
after looking around, it perhaps is intentional that way, most functions
with format_arg attribute don't have nonnull attribute for that arg too,
various gettext implementations handle NULL argument by passing it through,
but when result of gettext (NULL) etc. is passed to non-NULL argument, it
makes sense to warn.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-21  Jakub Jelinek  

PR c++/104148
* c-common.h (check_function_arguments_recurse): Add for_format
arg.
* c-common.cc (check_function_nonnull): Pass false to
check_function_arguments_recurse's last argument.
(check_function_arguments_recurse): Add for_format argument,
if true, don't stop on warning_suppressed_p.
* c-format.cc (check_format_info): Pass true to
check_function_arguments_recurse's last argument.

* c-c++-common/Wformat-pr104148.c: New test.

--- gcc/c-family/c-common.h.jj  2022-01-20 11:30:45.329581659 +0100
+++ gcc/c-family/c-common.h 2022-01-20 19:36:35.195601009 +0100
@@ -853,7 +853,8 @@ extern void check_function_arguments_rec
  (void *, tree,
   unsigned HOST_WIDE_INT),
  void *, tree,
- unsigned HOST_WIDE_INT);
+ unsigned HOST_WIDE_INT,
+ opt_code);
 extern bool check_builtin_function_arguments (location_t, vec,
  tree, tree, int, tree *);
 extern void check_function_format (const_tree, tree, int, tree *,
--- gcc/c-family/c-common.cc.jj 2022-01-20 11:30:45.329581659 +0100
+++ gcc/c-family/c-common.cc2022-01-20 19:41:02.787901300 +0100
@@ -5592,7 +5592,7 @@ check_function_nonnull (nonnull_arg_ctx
   firstarg = 1;
   if (!closure)
check_function_arguments_recurse (check_nonnull_arg, , argarray[0],
- firstarg);
+ firstarg, OPT_Wnonnull);
 }
 
   tree attrs = lookup_attribute ("nonnull", TYPE_ATTRIBUTES (ctx.fntype));
@@ -5611,7 +5611,7 @@ check_function_nonnull (nonnull_arg_ctx
   if (a != NULL_TREE)
 for (int i = firstarg; i < nargs; i++)
   check_function_arguments_recurse (check_nonnull_arg, , argarray[i],
-   i + 1);
+   i + 1, OPT_Wnonnull);
   else
 {
   /* Walk the argument list.  If we encounter an argument number we
@@ -5627,7 +5627,8 @@ check_function_nonnull (nonnull_arg_ctx
 
  if (a != NULL_TREE)
check_function_arguments_recurse (check_nonnull_arg, ,
- argarray[i], i + 1);
+ argarray[i], i + 1,
+ OPT_Wnonnull);
}
 }
   return ctx.warned_p;
@@ -6095,14 +6096,16 @@ check_function_arguments (location_t loc
 
 /* Generic argument checking recursion routine.  PARAM is the argument to
be checked.  PARAM_NUM is the number of the argument.  CALLBACK is invoked
-   once the argument is resolved.  CTX is context for the callback.  */
+   once the argument is resolved.  CTX is context for the callback.
+   OPT is the warning for which this is done.  */
 void
 check_function_arguments_recurse (void (*callback)
  (void *, tree, unsigned 

[PATCH] warn-access: Fix up warning_at arguments

2022-01-20 Thread Jakub Jelinek via Gcc-patches
Hi!

A warning regression fix I'm about to post warns (and breaks bootstrap due
to that) on the following spot.  Seems it is a copy and paste from
earlier code that mentions the %qD variable instead of talking about
unnamed temporary.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-21  Jakub Jelinek  

* gimple-ssa-warn-access.cc (pass_waccess::warn_invalid_pointer):
Avoid passing var to warning_at when the format string doesn't
refer to it.

--- gcc/gimple-ssa-warn-access.cc.jj2022-01-20 21:24:40.600955673 +0100
+++ gcc/gimple-ssa-warn-access.cc   2022-01-20 22:58:19.488226389 +0100
@@ -3953,15 +3953,14 @@ pass_waccess::warn_invalid_pointer (tree
"may be used")
   : G_("using dangling pointer %qE to an unnamed "
"temporary")),
- ref, var))
+ ref))
   || (!ref
  && warning_at (use_loc, OPT_Wdangling_pointer_,
 (maybe
  ? G_("dangling pointer to an unnamed temporary "
   "may be used")
  : G_("using a dangling pointer to an unnamed "
-  "temporary")),
-var)))
+  "temporary")
 {
   inform (DECL_SOURCE_LOCATION (var),
  "unnamed temporary defined here");

Jakub



[PATCH] optabs: Don't create pseudos in prepare_cmp_insn when not allowed [PR102478]

2022-01-20 Thread Jakub Jelinek via Gcc-patches
Hi!

cond traps can be created during ce3 after reload (and e.g. PR103028
recently fixed some ce3 cond trap related bug, so I think often that
works fine and we shouldn't disable cond traps after RA altogether),
but it calls prepare_cmp_insn.  This function can fail, so I don't
see why we couldn't make it work after RA (in most cases it already
just works).  The first hunk is just an optimization which doesn't
make sense after RA, so I've guarded it with can_create_pseudo_p.
The second hunk is just a theoretical case, I don't have a testcase for it.
prepare_cmp_insn has some other spots that can create pseudos, like when
both operands have VOIDmode, or when it is BLKmode comparison, or
not OPTAB_DIRECT, but I think none of that applies to ce3, we punt on
BLKmode earlier, use OPTAB_DIRECT and shouldn't be comparing two
VOIDmode CONST_INTs.

Bootstrapped/regtested on x86_64-linux, i686-linux and powerpc64le-linux,
ok for trunk?

2022-01-21  Jakub Jelinek  

PR rtl-optimization/102478
* optabs.cc (prepare_cmp_insn): If !can_create_pseudo_p (), don't
force_reg constants and for -fnon-call-exceptions fail if copy_to_reg
would be needed.

* gcc.dg/pr102478.c: New test.

--- gcc/optabs.cc.jj2022-01-20 11:30:45.586578023 +0100
+++ gcc/optabs.cc   2022-01-20 16:54:35.439409550 +0100
@@ -4398,12 +4398,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
   /* If we are optimizing, force expensive constants into a register.  */
   if (CONSTANT_P (x) && optimize
   && (rtx_cost (x, mode, COMPARE, 0, optimize_insn_for_speed_p ())
-  > COSTS_N_INSNS (1)))
+  > COSTS_N_INSNS (1))
+  && can_create_pseudo_p ())
 x = force_reg (mode, x);
 
   if (CONSTANT_P (y) && optimize
   && (rtx_cost (y, mode, COMPARE, 1, optimize_insn_for_speed_p ())
-  > COSTS_N_INSNS (1)))
+  > COSTS_N_INSNS (1))
+  && can_create_pseudo_p ())
 y = force_reg (mode, y);
 
   /* Don't let both operands fail to indicate the mode.  */
@@ -4472,6 +4474,8 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx
  compare and branch in different basic blocks.  */
   if (cfun->can_throw_non_call_exceptions)
 {
+  if (!can_create_pseudo_p () && (may_trap_p (x) || may_trap_p (y)))
+   goto fail;
   if (may_trap_p (x))
x = copy_to_reg (x);
   if (may_trap_p (y))
--- gcc/testsuite/gcc.dg/pr102478.c.jj  2022-01-20 17:00:52.220192056 +0100
+++ gcc/testsuite/gcc.dg/pr102478.c 2022-01-20 17:00:30.917487026 +0100
@@ -0,0 +1,29 @@
+/* PR rtl-optimization/102478 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-if-conversion -Wno-div-by-zero" } */
+
+unsigned a, b, c;
+
+void
+foo (void)
+{
+  c |= __builtin_expect (65535 / a, 0) && 0 / 0;
+  b = 0;
+}
+
+void
+bar (void)
+{
+  if (a <= 65535)
+__builtin_trap ();
+  b = 0;
+}
+
+void
+baz (void)
+{
+  if (a > 65535)
+b = 0;
+  else
+__builtin_trap ();
+}

Jakub



Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread HAO CHEN GUI via Gcc-patches
Thanks so much for your advice. Please see my comments.

On 21/1/2022 上午 5:42, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jan 20, 2022 at 01:46:48PM -0500, David Edelsohn wrote:
>> On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI  wrote:
>>>This patch adds a combine pattern for "CA minus one". As CA only has two
>>> values (0 or 1), we could convert following pattern
>>>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
>>> (const_int -1 [0x]
>>> to
>>>(plus:DI (reg:DI 98 ca)
>>> (const_int -1 [0x])))
>>>With this patch, one unnecessary sign extend is eliminated.
>>>
>>>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>>> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> There are ten gazillion similar things we could make extra backend
> patterns for, and we still would not cover a majority of cases.
> 
> If instead we got some generic way to handle this we could cover many
> more cases, for much less effort.
Could we add an additional pass to exam the finally generated instructions
and its used registers to decide which extension is unnecessary?
> 
> We need both widening modes from SI to DI, amd narrowing modes from DI
> to SI.  Both are useful in certain cases; it is not like using wider
> modes is always better, in some cases narrower modes is better (in cases
> where we can let the generated code then generate whatever bits in the
> high half of the word, for example; a typical example is addition in an
> unsigned int).
Just for this case, converting CA from DI to SI is supported in simplify_rtx.
The original comparison result is in DI mode. But it's truncated to SI mode as
C standard requires.

Trying 8 -> 11:
8: {r127:DI=ca:DI-0x1;clobber ca:DI;}
  REG_DEAD ca:DI
  REG_UNUSED ca:DI
   11: r128:SI=r127:DI#0
  REG_DEAD r127:DI
Successfully matched this instruction:
(set (reg:SI 128)
(plus:SI (reg:SI 98 ca)
(const_int -1 [0x])))
allowing combination of insns 8 and 11
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 8.
modifying insn i311: {r128:SI=ca:SI-0x1;clobber ca:SI;}
  REG_UNUSED ca:SI
deferring rescan insn with uid = 11.

The C standard type promotion requirement and 64-bit return value are the
root cause of such problem, I think.
> 
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
>>> @@ -0,0 +1,10 @@
>>> +/* PR target/95737 */
>>> +/* { dg-do compile { target lp64 } } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does the testcase force power8? This testcase is not specific to
>> Power8 or later.
> 
> Yes, and we should generate the same code on older machines.
> 
>>> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
>>> +
>>> +
>>> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
>>> long b)
>>> +{
>>> +   return -(a < b);
>>> +}
>>
>> If you're only testing for lp64, the testcase could use "long" instead
>> of "long long".
> 
> The testcase really needs "powerpc64", if that would mean "test if
> -mpowerpc64 is (implicitly) used".  But that is not what it currently
> means (it is something akin to "powerpc64_hw", instead).
> 
> So we test lp64, which is set if and only if -m64 was used.  It is
> reasonable coverage, no one cares much for -m32 -mpowerpc64 .
> 
> 
> Segher


[committed] Fix expected output for various MIPS multiplication tests

2022-01-20 Thread Jeff Law via Gcc-patches
The recent multiply-highpart work twiddled code generation on the MIPS 
targets and is causing mips.exp failures.


The resultant code is actually better and matches a comment in the test 
files which indicates that it would be better to generate a 
mult-highpart.  So I'm pretty confident in removing the undesired mflo & 
changing the name of the target pattern we expect to see.


This fixes the mips64 and mips64el failures in my tester.  I suspect 
it'll also fix the failures on mipsisa32, but that target is 
bootstrapped with qemu -- which takes forever so it only runs once a 
week ;-)


Committed.

Jeffcommit 6f45deb2aed804b185e7dabd2392bfbe14e9bb57
Author: Jeff Law 
Date:   Thu Jan 20 23:48:03 2022 -0500

[committed] Fix expected output for various MIPS multiplication tests

The recent multiply-highpart work twiddled code generation on the MIPS 
targets
and is causing mips.exp failures.

The resultant code is actually better and matches a comment in the test 
files
which indicates that it would be better to generate a mult-highpart.  So I'm
pretty confident in removing the undesired mflo & changing the name of the
target pattern we expect to see.

This fixes the mips64 and mips64el failures in my tester.  I suspect it'll
also fix the failures on mipsisa32, but that target is bootstrapped with 
qemu --
which takes forever so it only runs once a week ;-)

gcc/testsuite
* gcc.target/mips/fix-r4000-2.c: Update expected output.
* gcc.target/mips/fix-r4000-3.c: Update expected output.  Add
-fexpensive-optimizations for consistency in output.
* gcc.target/mips/fix-r4000-7.c: Update expected output.
* gcc.target/mips/fix-r4000-8.c: Update expected output.

diff --git a/gcc/testsuite/gcc.target/mips/fix-r4000-2.c 
b/gcc/testsuite/gcc.target/mips/fix-r4000-2.c
index 4290d5f7fab..e0e65d60f42 100644
--- a/gcc/testsuite/gcc.target/mips/fix-r4000-2.c
+++ b/gcc/testsuite/gcc.target/mips/fix-r4000-2.c
@@ -4,6 +4,4 @@
 typedef int int32_t;
 typedef long long int64_t;
 NOMIPS16 int32_t foo (int32_t x, int32_t y) { return ((int64_t) x * y) >> 32; }
-/* ??? A highpart pattern would be a better choice, but we currently
-   don't use them.  */
-/* { dg-final { scan-assembler "[concat 
{\tmult\t\$[45],\$[45][^\n]+mulsidi3_32bit_r4000\n\tmflo\t\$3\n\tmfhi\t\$2\n}]" 
} } */
+/* { dg-final { scan-assembler "[concat 
{\tmult\t\$[45],\$[45][^\n]+smulsi3_highpart_internal\n\tmfhi\t\$2\n}]" } } */
diff --git a/gcc/testsuite/gcc.target/mips/fix-r4000-3.c 
b/gcc/testsuite/gcc.target/mips/fix-r4000-3.c
index 5bc8fc8ddd4..ec9d655dcfc 100644
--- a/gcc/testsuite/gcc.target/mips/fix-r4000-3.c
+++ b/gcc/testsuite/gcc.target/mips/fix-r4000-3.c
@@ -1,8 +1,6 @@
-/* { dg-options "-mips1 -mfix-r4000 -dp -EB" } */
+/* { dg-options "-mips1 -mfix-r4000 -dp -EB -fexpensive-optimizations" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 typedef unsigned int uint32_t;
 typedef unsigned long long uint64_t;
 NOMIPS16 uint32_t foo (uint32_t x, uint32_t y) { return ((uint64_t) x * y) >> 
32; }
-/* ??? A highpart pattern would be a better choice, but we currently
-   don't use them.  */
-/* { dg-final { scan-assembler "[concat 
{\tmultu\t\$[45],\$[45][^\n]+umulsidi3_32bit_r4000\n\tmflo\t\$3\n\tmfhi\t\$2\n}]"
 } } */
+/* { dg-final { scan-assembler "[concat 
{\tmultu\t\$[45],\$[45][^\n]+umulsi3_highpart_internal\n\tmfhi\t\$2\n}]" } } */
diff --git a/gcc/testsuite/gcc.target/mips/fix-r4000-7.c 
b/gcc/testsuite/gcc.target/mips/fix-r4000-7.c
index 25178db9070..9b1057774e9 100644
--- a/gcc/testsuite/gcc.target/mips/fix-r4000-7.c
+++ b/gcc/testsuite/gcc.target/mips/fix-r4000-7.c
@@ -4,6 +4,4 @@
 typedef long long int64_t;
 typedef int int128_t __attribute__((mode(TI)));
 NOMIPS16 int64_t foo (int64_t x, int64_t y) { return ((int128_t) x * y) >> 64; 
}
-/* ??? A highpart pattern would be a better choice, but we currently
-   don't use them.  */
-/* { dg-final { scan-assembler "[concat 
{\tdmult\t\$[45],\$[45][^\n]+mulditi3_r4000\n\tmflo\t\$3\n\tmfhi\t\$2\n}]" } } 
*/
+/* { dg-final { scan-assembler "[concat 
{\tdmult\t\$[45],\$[45][^\n]+smuldi3_highpart_internal\n\tmfhi\t\$2\n}]" } } */
diff --git a/gcc/testsuite/gcc.target/mips/fix-r4000-8.c 
b/gcc/testsuite/gcc.target/mips/fix-r4000-8.c
index eae148817ce..1ce35df3014 100644
--- a/gcc/testsuite/gcc.target/mips/fix-r4000-8.c
+++ b/gcc/testsuite/gcc.target/mips/fix-r4000-8.c
@@ -1,8 +1,6 @@
-/* { dg-options "-march=r4000 -mfix-r4000 -mgp64 -dp -EB" } */
+/* { dg-options "-march=r4000 -mfix-r4000 -mgp64 -dp -EB 
-fexpensive-optimizations" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 typedef unsigned long long uint64_t;
 typedef unsigned int uint128_t __attribute__((mode(TI)));
 NOMIPS16 uint64_t foo (uint64_t x, uint64_t y) { return ((uint128_t) x * y) >> 
64; }
-/* ??? A highpart pattern would be a better choice, but we currently
-   don't 

libgo patch committed: Build panic32.go on amd64p32

2022-01-20 Thread Ian Lance Taylor via Gcc-patches
This libgo patch builds panic32.go on amd64p32 (that is, x86 x32
mode).  This fixes GCC PR 104149.  Bootstrapped and ran Go testsuite
on x86_64-pc-linux-gnu, and H.J. tested the patch on x32 (thanks!).
Committed to mainline.

Ian
4f614712c908ca00fdb83057420e0023c9171477
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 9cc6a1c63c6..a42d88d25c4 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-799e9807c36fc661b14dfff136369556f09a5ebf
+7d510bf5fcec9b0ccc0282f4193a80c0a164df63
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/runtime/panic32.go b/libgo/go/runtime/panic32.go
index a2bf7e8fa2a..11d2a8450d9 100644
--- a/libgo/go/runtime/panic32.go
+++ b/libgo/go/runtime/panic32.go
@@ -2,8 +2,8 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build 386 || arm || mips || mipsle || armbe || m68k || nios2 || ppc || 
riscv || s390 || sh || shbe || sparc
-// +build 386 arm mips mipsle armbe m68k nios2 ppc riscv s390 sh shbe sparc
+//go:build 386 || amd64p32 || arm || mips || mipsle || armbe || m68k || nios2 
|| ppc || riscv || s390 || sh || shbe || sparc
+// +build 386 amd64p32 arm mips mipsle armbe m68k nios2 ppc riscv s390 sh shbe 
sparc
 
 package runtime
 


[PATCH v2] c++: ICE with noexcept and canonical types [PR101715]

2022-01-20 Thread Marek Polacek via Gcc-patches
On Thu, Jan 20, 2022 at 03:23:24PM -0500, Jason Merrill wrote:
> On 1/18/22 11:05, Marek Polacek wrote:
> > On Mon, Jan 17, 2022 at 01:48:48PM -0500, Jason Merrill wrote:
> > > On 1/14/22 19:22, Marek Polacek wrote:
> > > > This is a "canonical types differ for identical types" ICE, which 
> > > > started
> > > > with r11-4682.  It's a bit tricky to explain.  Consider:
> > > > 
> > > > template  struct S {
> > > >   S bar() noexcept(T::value);  // #1
> > > >   S foo() noexcept(T::value);  // #2
> > > > };
> > > > 
> > > > template  S S::foo() noexcept(T::value) {}  // #3
> > > > 
> > > > We ICE because #3 and #2 have the same type, but their canonical types
> > > > differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1.
> > > > 
> > > > The member functions #1 and #2 have the same type.  However, since their
> > > > noexcept-specifier is deferred, when parsing them, we create a variant 
> > > > for
> > > > both of them, because DEFERRED_PARSE cannot be compared.  In other 
> > > > words,
> > > > build_cp_fntype_variant's
> > > > 
> > > > tree v = TYPE_MAIN_VARIANT (type);
> > > > for (; v; v = TYPE_NEXT_VARIANT (v))
> > > >   if (cp_check_qualified_type (v, type, type_quals, rqual, raises, 
> > > > late))
> > > > return v;
> > > > 
> > > > will *not* find an existing variant when creating a method_type for #2, 
> > > > so we
> > > > have to create a new one.
> > > > 
> > > > But then we perform delayed parsing and call 
> > > > fixup_deferred_exception_variants
> > > > for #1 and #2.  f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the 
> > > > newly
> > > > parsed noexcept-specifier.  It also sets TYPE_CANONICAL (#2) to #1.  
> > > > Both
> > > > noexcepts turned out to be the same, so now we have two equivalent 
> > > > variants in
> > > > the list!  I.e.,
> > > > 
> > > > +-+  +-+  +-+
> > > > |  main   |  |  #2 |  |  #1 |
> > > > | S S::(S*) |->| S S::(S*) |->| S S::(S*) 
> > > > |->NULL
> > > > |-|  |  noex(T::value) |  |  noex(T::value) |
> > > > +-+  +-+  +-+
> > > > 
> > > > Then we get to #3.  As for #1 and #2, grokdeclarator calls 
> > > > build_memfn_type,
> > > > which ends up calling build_cp_fntype_variant, which will use the loop
> > > > above to look for an existing variant.  The first one that matches
> > > > cp_check_qualified_type will be used, so we use #2 rather than #1, and 
> > > > the
> > > > TYPE_CANONICAL mismatch follows.  Hopefully that makes sense.
> > > 
> > > Why doesn't the TYPE_CANONICAL (v) == v check prevent this?
> > 
> > In other words, I think you're asking: why did 
> > fixup_deferred_exception_variants
> > set TYPE_CANONICAL (#2) to #1 (which then differs from TYPE_CANONICAL (#3),
> > which is #2)?
> 
> I meant to ask why TYPE_CANONICAL (#3) got set to #2 instead of #1?
> 
> And to answer my own question, it's because the check I mention is in
> fixup_deferred_exception_variants, and #3 doesn't go through there at all;
> the loop in build_cp_fntype_variant assumes no duplicate variants, which
> your patch fixes.

Right, fixup_deferred_exception_variants is only called for fn decls in
unparsed_noexcepts.

> > The method_type for #1 (I'll mark is as #1 here) is built with it being its 
> > own
> > canonical type.
> > 
> > The first call to fixup_deferred_exception_variants does not change it: in
> > there, VARIANT is #1, the loop with 'TYPE_CANONICAL (v) == v' cannot find
> > an existing variant that would match, so when we do
> > 
> >  v = build_cp_fntype_variant (TYPE_CANONICAL (variant),
> >   rqual, cr, false);
> > we get #1 so
> >  TYPE_CANONICAL (variant) = v;
> > is just
> >  TYPE_CANONICAL (#1) = #1;
> > so no change.
> > 
> > The second call to fixup_deferred_exception_variants: here we're working 
> > with
> > VARIANT #2.  Now we again scan the list of variants {main, #2, #1} where we
> > find a match for #2: #1.  #1's TYPE_CANONICAL is #1 as per above, so we set
> >  TYPE_CANONICAL (#2) = #1;
> > which I think is correct.
> > 
> > 
> > I think TYPE_CANONICAL (#3) should also be #1, not #2, which my patch 
> > attempts
> > to do.
> > 
> > 
> > Hope this explanation makes some sense, please ask away if it doesn't!
> > 
> > > > As for the fix, I didn't think I could rewrite the method_type #2 with 
> > > > #1
> > > > because the type may have escaped via decltype.  So my approach is to
> > > > elide #2 from the list, so when looking for a matching variant, we 
> > > > always
> > > > find #1 (#2 remains live though, which admittedly sounds sort of dodgy).
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?
> > > > 
> > > > PR c++/101715
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * tree.c (fixup_deferred_exception_variants): 

[committed] analyzer: reject ((i + 1 > 0) && (i < 0)) for integers [PR94362]

2022-01-20 Thread David Malcolm via Gcc-patches
PR analyzer/94362 reports a false positive from
-Wanalyzer-null-dereference seen when analyzing OpenSSL.

The root cause is that the analyzer's path feasibility checker
erroneously considers this to be feasible:
  (R + 1 > 0) && (R < 0)
for int R (the return value from sk_EVP_PKEY_ASN1_METHOD_num),
whereas it's not satisfiable for any int R.

This patch makes the constraint manager try harder to reject
such combinations of conditions, fixing the false positive;
perhaps in the longer term we ought to use an SMT solver.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-6782-gc4b8f3730a80025192fdb485ad2535c165340e41.

gcc/analyzer/ChangeLog:
PR analyzer/94362
* constraint-manager.cc (bound::ensure_closed): Convert param to
enum bound_kind.
(range::constrained_to_single_element): Likewise.
(range::add_bound): New.
(constraint_manager::add_constraint): Handle SVAL + OFFSET
compared to a constant.
(constraint_manager::get_ec_bounds): Rewrite in terms of
range::add_bound.
(constraint_manager::eval_condition): Reject if range::add_bound
fails.
(selftest::test_constant_comparisons): Add test coverage for
various impossible combinations of integer comparisons.
* constraint-manager.h (enum bound_kind): New.
(struct bound): Likewise.
(bound::ensure_closed): Convert to param to enum bound_kind.
(struct range): Convert to...
(class range): ...this, making fields private.
(range::add_bound): New decls.
* region-model.cc (region_model::add_constraint): Fail if
constraint_manager::add_constraint fails.

gcc/testsuite/ChangeLog:
PR analyzer/94362
* gcc.dg/analyzer/pr94362-1.c: New test.
* gcc.dg/analyzer/pr94362-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/constraint-manager.cc| 172 --
 gcc/analyzer/constraint-manager.h |  15 +-
 gcc/analyzer/region-model.cc  |   5 +-
 gcc/testsuite/gcc.dg/analyzer/pr94362-1.c |  60 
 gcc/testsuite/gcc.dg/analyzer/pr94362-2.c |  42 ++
 5 files changed, 281 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr94362-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr94362-2.c

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 568e7150ea7..7c4a85bbb24 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -117,7 +117,7 @@ minus_one (tree cst)
closed one.  */
 
 void
-bound::ensure_closed (bool is_upper)
+bound::ensure_closed (enum bound_kind bound_kind)
 {
   if (!m_closed)
 {
@@ -125,7 +125,7 @@ bound::ensure_closed (bool is_upper)
 For example, convert 3 < x into 4 <= x,
 and convert x < 5 into x <= 4.  */
   gcc_assert (CONSTANT_CLASS_P (m_constant));
-  m_constant = fold_build2 (is_upper ? MINUS_EXPR : PLUS_EXPR,
+  m_constant = fold_build2 (bound_kind == BK_UPPER ? MINUS_EXPR : 
PLUS_EXPR,
TREE_TYPE (m_constant),
m_constant, integer_one_node);
   gcc_assert (CONSTANT_CLASS_P (m_constant));
@@ -205,8 +205,8 @@ range::constrained_to_single_element ()
 return NULL_TREE;
 
   /* Convert any open bounds to closed bounds.  */
-  m_lower_bound.ensure_closed (false);
-  m_upper_bound.ensure_closed (true);
+  m_lower_bound.ensure_closed (BK_LOWER);
+  m_upper_bound.ensure_closed (BK_UPPER);
 
   // Are they equal?
   tree comparison = fold_binary (EQ_EXPR, boolean_type_node,
@@ -301,6 +301,80 @@ range::above_upper_bound (tree rhs_const) const
m_upper_bound.m_constant).is_true ();
 }
 
+/* Attempt to add B to the bound of the given kind of this range.
+   Return true if feasible; false if infeasible.  */
+
+bool
+range::add_bound (bound b, enum bound_kind bound_kind)
+{
+  b.ensure_closed (bound_kind);
+
+  switch (bound_kind)
+{
+default:
+  gcc_unreachable ();
+case BK_LOWER:
+  /* Discard redundant bounds.  */
+  if (m_lower_bound.m_constant)
+   {
+ m_lower_bound.ensure_closed (BK_LOWER);
+ if (!tree_int_cst_lt (b.m_constant,
+   m_lower_bound.m_constant))
+   return true;
+   }
+  m_lower_bound = b;
+  break;
+case BK_UPPER:
+  /* Discard redundant bounds.  */
+  if (m_upper_bound.m_constant)
+   {
+ m_upper_bound.ensure_closed (BK_UPPER);
+ if (tree_int_cst_le (b.m_constant,
+  m_upper_bound.m_constant))
+   return true;
+   }
+  m_upper_bound = b;
+  break;
+}
+  if (m_lower_bound.m_constant
+  && m_upper_bound.m_constant)
+{
+  m_lower_bound.ensure_closed (BK_LOWER);
+  m_upper_bound.ensure_closed (BK_UPPER);
+
+  /* Reject LOWER <= V <= UPPER when 

[committed] analyzer: add regression test [PR103685]

2022-01-20 Thread David Malcolm via Gcc-patches
PR analyzer/103685 reports a false positive from -Wanalyzer-null-dereference
seen at -O2 with GCC 11.  I can reproduce it with GCC 11, but not with
trunk; this patch adds a reduced test case that reproduces it with
GCC 11 as a regression test for GCC 12 onwards.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-6781-gf5a9d76be849b4cf20b1b687febc34a937381dc3.

gcc/testsuite/ChangeLog:
PR analyzer/103685
* gcc.dg/analyzer/torture/pr103685.c: New test.

Signed-off-by: David Malcolm 
---
 .../gcc.dg/analyzer/torture/pr103685.c| 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr103685.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr103685.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr103685.c
new file mode 100644
index 000..1b222487417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr103685.c
@@ -0,0 +1,33 @@
+typedef struct ec_key_st EC_KEY;
+typedef struct ec_group_st EC_GROUP;
+typedef struct R3410_ec {
+  int nid;
+  EC_GROUP *group;
+} R3410_ec_params;
+extern R3410_ec_params R3410_2012_512_paramset[];
+
+static R3410_ec_params *gost_nid2params(int nid) {
+  R3410_ec_params *params;
+
+  params = R3410_2012_512_paramset;
+  while (params->nid != 0) {
+if (params->nid == nid)
+  return params;
+params++;
+  }
+
+  return ((void *)0);
+}
+
+int fill_GOST_EC_params(EC_KEY *eckey, int nid) {
+  R3410_ec_params *params = gost_nid2params(nid);
+  if (!eckey || !params) {
+return 0;
+  }
+
+  if (params->group) {
+return 1;
+  }
+
+  return 0;
+}
-- 
2.26.3



Re: [PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Palmer Dabbelt

On Thu, 20 Jan 2022 13:33:35 PST (-0800), Palmer Dabbelt wrote:

On Thu, 20 Jan 2022 13:20:34 PST (-0800), gcc-patches@gcc.gnu.org wrote:

On Thu, Jan 20, 2022 at 01:13:45PM -0800, Palmer Dabbelt wrote:

On Thu, 20 Jan 2022 02:45:53 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
> due to some aggressive linker optimizations).

I don't really understand the rest of this, but we do have a subset of
LEB128 (constant expressions only).  I'm not sure exactly what the
requirements are here, but one could imagine extending our assembler support
to cover them -- we might never have full support for LEB128 expressions
(because of linker relaxation), but we might be able to make more stuff
work.

I'm not sure if that helps or hurts, though, as we'll still be a special
case.


HAVE_AS_LEB128 really needs to be able to handle both constants and
difference of two labels in the same section.
Most targets resolve something like that in the assembler as constant which
they encode into sleb128 or uleb128 and put into the section that uses
those directives (typically debugging sections).
If a target performs aggressive linker relaxation, then probably some
relocation would need to be added (but one that can encode the two symbols
for the difference, so perhaps two relocations that must be consecutive or
something similar) and resolve that by the linker.  Though, that would mean
that even in the debugging section offsets wouldn't be fixed during
assembly...


Differences are the hard case for RISC-V, as they can grow numerically.
That could  then cause the LEB to grow in byte size, possibly violating
one of our linker relaxation invariants.  The only way I've come up with
to support these would be to pad the LEBs, and I'm not sure if that's
legal.

Not sure if I'm missing something, though.


Andrew points out that label differences within the same section can't 
increase, so this might be a lot more manageable than I thought it was.


[PATCH v2] constrain conservative string lengths to array sizes [PR104119]

2022-01-20 Thread Martin Sebor via Gcc-patches

The updated patch ensures the tighter bound isn't used to compute
the sprintf result and adds a test to verify that.  (This is messy
in the strlen/sprintf pass and should be cleaned up to avoid this
mistake in the future.)

Rested on x86_64-linux.

On 1/19/22 18:20, Martin Sebor wrote:

The attached patch suppresses a class of unexpected -Wformat-overflow
(and -truncation) warnings introduced as a result of better range info
with the integration of the strlen pass with Ranger.

The sprintf warning code relies on the strlen pass data to determine
the lengths of string arguments to %s directives.  When the data for
a string are present, such as after a strlen call, the length can be
either a constant or, in the case of interest, a range (including
[N, PTRDIFF_MAX - 2] for a string of unbounded length).  When absent
because no string call has been seen yet, the string length is
considered to be bounded by the size of the array it's stored in.
This constrains the maximum number of bytes output by the %s directive
and reduces false positives.

The problem this patch addresses is that in the interesting case there
is no logic similar to the last ("no data") case, and so the maximum
number of bytes can be in excess of the size of the array.  The patch
does it by computing the size of the object (or member) in which
the string is stored and using its size minus 1 as the upper bound
on the length.  To do that, I had to adjust the APIs to pass in
the pointer_query instance of the range_query.  The meat of the change
is in the new get_maxbound() function.

There might be opportunities to do better still.  I'll try to look
into them if I still have time.

Tested on x86_64-linux.

Martin
Constrain conservative string lengths to array sizes [PR104119].

Resolves:
PR tree-optimization/104119 - unexpected -Wformat-overflow after strlen in ILP32 since Ranger integration

gcc/ChangeLog:

	PR tree-optimization/104119
	* gimple-ssa-sprintf.cc (struct directive): Change argument type.
	(format_none): Same.
	(format_percent): Same.
	(format_integer): Same.
	(format_floating): Same.
	(get_string_length): Same.
	(format_character): Same.
	(format_string): Same.
	(format_plain): Same.
	(format_directive): Same.
	(compute_format_length): Same.
	(handle_printf_call): Same.
	* tree-ssa-strlen.cc (get_range_strlen_dynamic): Same.   Call
	get_maxbound.
	(get_range_strlen_phi): Same.
	(get_maxbound): New function.
	(strlen_pass::get_len_or_size): Adjust to parameter change.
	* tree-ssa-strlen.h (get_range_strlen_dynamic): Change argument type.

gcc/testsuite/ChangeLog:

	PR tree-optimization/104119
	* gcc.dg/tree-ssa/builtin-snprintf-13.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-29.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index 98ab563a01b..c93f12f90b5 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -600,7 +600,7 @@ struct directive
 
   /* Format conversion function that given a directive and an argument
  returns the formatting result.  */
-  fmtresult (*fmtfunc) (const directive &, tree, range_query *);
+  fmtresult (*fmtfunc) (const directive &, tree, pointer_query &);
 
   /* Return True when the format flag CHR has been used.  */
   bool get_flag (char chr) const
@@ -968,7 +968,7 @@ directive::set_precision (tree arg, range_query *query)
 /* Return the result of formatting a no-op directive (such as '%n').  */
 
 static fmtresult
-format_none (const directive &, tree, range_query *)
+format_none (const directive &, tree, pointer_query &)
 {
   fmtresult res (0);
   return res;
@@ -977,7 +977,7 @@ format_none (const directive &, tree, range_query *)
 /* Return the result of formatting the '%%' directive.  */
 
 static fmtresult
-format_percent (const directive &, tree, range_query *)
+format_percent (const directive &, tree, pointer_query &)
 {
   fmtresult res (1);
   return res;
@@ -1199,7 +1199,7 @@ adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
used when the directive argument or its value isn't known.  */
 
 static fmtresult
-format_integer (const directive , tree arg, range_query *query)
+format_integer (const directive , tree arg, pointer_query _qry)
 {
   tree intmax_type_node;
   tree uintmax_type_node;
@@ -1383,7 +1383,7 @@ format_integer (const directive , tree arg, range_query *query)
   /* Try to determine the range of values of the integer argument
 	 (range information is not available for pointers).  */
   value_range vr;
-  query->range_of_expr (vr, arg, dir.info->callstmt);
+  ptr_qry.rvals->range_of_expr (vr, arg, dir.info->callstmt);
 
   if (!vr.varying_p () && !vr.undefined_p ())
 	{
@@ -1414,7 +1414,7 @@ format_integer (const directive , tree arg, range_query *query)
 	  if (code == INTEGER_CST)
 		{
 		  arg = gimple_assign_rhs1 (def);
-		  return format_integer (dir, arg, query);
+		  return format_integer (dir, arg, ptr_qry);
 		}
 
 	  if (code == NOP_EXPR)
@@ -1459,16 

Re: [PATCH v2] Disable -fsplit-stack support on non-glibc targets

2022-01-20 Thread Richard Sandiford via Gcc-patches
cc:ing the x86 and s390 maintainers

soeren--- via Gcc-patches  writes:
> From: Sören Tempel 
>
> The -fsplit-stack option requires the pthread_t TCB definition in the
> libc to provide certain struct fields at specific hardcoded offsets. As
> far as I know, only glibc provides these fields at the required offsets.
> Most notably, musl libc does not have these fields. However, since gcc
> accesses the fields using a fixed offset, this does not cause a
> compile-time error, but instead results in a silent memory corruption at
> run-time with musl libc. For example, on s390x libgcc's
> __stack_split_initialize CTOR will overwrite the cancel field in the
> pthread_t TCB on musl.
>
> The -fsplit-stack option is used within the gcc code base itself by
> gcc-go (if available). On musl-based systems with split-stack support
> (i.e. s390x or x86) this causes Go programs compiled with gcc-go to
> misbehave at run-time.
>
> This patch fixes gcc-go on musl by disabling -fsplit-stack in gcc itself
> since it is not supported on non-glibc targets anyhow. This is achieved
> by checking if gcc targets a glibc-based system. This check has been
> added for x86 and s390x, the rs6000 config already checks for
> TARGET_GLIBC_MAJOR. Other architectures do not have split-stack
> support. With this patch applied, the gcc-go configure script will
> detect that -fsplit-stack support is not available and will not use it.
>
> See https://www.openwall.com/lists/musl/2012/10/16/12
>
> This patch was written under the assumption that glibc is the only libc
> implementation which supports the required fields at the required
> offsets in the pthread_t TCB. The patch has been tested on Alpine Linux
> Edge on the s390x and x86 architectures by bootstrapping Google's Go
> implementation with gcc-go.
>
> Signed-off-by: Sören Tempel 
>
> gcc/ChangeLog:
>
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   Only support split-stack on glibc targets.
>   * config/i386/gnu-user-common.h (STACK_CHECK_STATIC_BUILTIN): Ditto.
>   * config/i386/gnu.h (defined): Ditto.
> ---
> This version of the patch addresses feedback by Andrew Pinski and uses
> OPTION_GLIBC as well as opts->x_linux_libc == LIBC_GLIBC to detect glibc
> targets (instead of relying on TARGET_GLIBC_MAJOR).
>
>  gcc/common/config/s390/s390-common.c | 11 +--
>  gcc/config/i386/gnu-user-common.h|  5 +++--
>  gcc/config/i386/gnu.h|  6 +-
>  3 files changed, 17 insertions(+), 5 deletions(-)

Sorry for the slow review.  The patch LGTM bar some minor formatting
nits below, but target maintainers should have the final say.

> diff --git a/gcc/common/config/s390/s390-common.c 
> b/gcc/common/config/s390/s390-common.c
> index b6bc8501742..fc86e0bc5e7 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -116,13 +116,20 @@ s390_handle_option (struct gcc_options *opts 
> ATTRIBUTE_UNUSED,
>  
>  /* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> We don't verify it, since earlier versions just have padding at
> -   its place, which works just as well.  */
> +   its place, which works just as well. For other libc implementations

GCC style is to use 2 spaces after a full stop.  Same for the x86 part.

> +   we disable the feature entirely to avoid corrupting the TCB.  */
>  
>  static bool
>  s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
>  struct gcc_options *opts ATTRIBUTE_UNUSED)

These parameters are no longer unused after the patch, so it'd be good
to remove the attributes.

>  {
> -  return true;
> +  if (opts->x_linux_libc == LIBC_GLIBC) {
> +return true;
> +  } else {
> +if (report)
> +  error("%<-fsplit-stack%> currently only supported on GNU/Linux");
> +return false;
> +  }

Normal GCC formatting would be something like:

  if (opts->x_linux_libc == LIBC_GLIBC)
return true;

  if (report)
error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
  return false;

Sorry for the fussy rules.

Thanks,
Richard

>  }
>  
>  #undef TARGET_DEFAULT_TARGET_FLAGS
> diff --git a/gcc/config/i386/gnu-user-common.h 
> b/gcc/config/i386/gnu-user-common.h
> index 00226f5a455..6e13315b5a3 100644
> --- a/gcc/config/i386/gnu-user-common.h
> +++ b/gcc/config/i386/gnu-user-common.h
> @@ -66,7 +66,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define STACK_CHECK_STATIC_BUILTIN 1
>  
>  /* We only build the -fsplit-stack support in libgcc if the
> -   assembler has full support for the CFI directives.  */
> -#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
> +   assembler has full support for the CFI directives and
> +   targets glibc.  */
> +#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && OPTION_GLIBC
>  #define TARGET_CAN_SPLIT_STACK
>  #endif
> diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
> index 25fbc07f58c..adfe817201e 100644
> --- a/gcc/config/i386/gnu.h
> +++ b/gcc/config/i386/gnu.h
> @@ 

Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Joseph Myers
On Thu, 20 Jan 2022, Andrew Waterman wrote:

> The old formulation of the instructions were never ratified as a
> RISC-V standard.  I don't think we need to hamstring ourselves here by
> assuming the possibility of their implementation.

If you ignore the old version, then the instructions can unconditionally 
be used for the fminimum_num / fmaximum_num functions (which GCC doesn't 
support as built-in functions at present) - but can't be used for the fmin 
/ fmax functions if flag_signaling_nans.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Andrew Waterman
On Thu, Jan 20, 2022 at 12:30 PM Palmer Dabbelt  wrote:
>
> On Thu, 20 Jan 2022 07:44:25 PST (-0800), ma...@embecosm.com wrote:
> > RISC-V FMIN and FMAX machine instructions are IEEE-754-conformant[1]:
> >
> > "For FMIN and FMAX, if at least one input is a signaling NaN, or if both
> > inputs are quiet NaNs, the result is the canonical NaN.  If one operand
> > is a quiet NaN and the other is not a NaN, the result is the non-NaN
> > operand."
> >
> > as required by our `fminM3' and `fmaxM3' standard RTL patterns.
> >
> > However we only define `sminM3' and `smaxM3' standard RTL patterns to
> > produce the FMIN and FMAX machine instructions, which in turn causes the
> > `__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the
> > corresponding libcalls rather than the relevant machine instructions.
> >
> > Rename the `smin3' and `smax3' patterns to `fmin3' and
> > `fmax3' respectively then, removing the need to use libcalls for
> > IEEE 754 semantics with the minimum and maximum operations.
> >
> > [1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
> > Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and
> > Propagation", p. 48
> >
> >   gcc/
> >   * config/riscv/riscv.md (smin3): Rename pattern to...
> >   (fmin3): ... this.
> >   (smax3): Likewise...
> >   (fmax3): ... this.
> > ---
> > Hi,
> >
> >  It's not clear to me how it's been missed or whether there is anything I
> > might be actually missing.  It looks to me like a clear oversight however.
>
> I'm not really a floating point person, but IIUC It's actually on
> purpose: earlier versions of the ISA spec didn't have this behavior, and
> at the time we originally merged the GCC port we decided to play it
> safe.  Pretty sure we discussed this before on the GCC mailing list
> ,maybe around the time the glibc port was going upstream?  I think Jim
> was the one who figured out how all the specs fit together.
>
> I can't find those older discussions, but this definately recently came
> up in glibc:
> https://sourceware.org/pipermail/libc-alpha/2021-October/131637.html .
> Looks like back then nobody knew of any hardware that ran glibc and
> implemented the old behavior, but there also haven't been patches posted
> yet so it's not set in stone.
>
> It's probably worth repeating the question here since there are a lot of
> RISC-V users that don't use glibc but do use GCC.  I don't know of
> anyone who implemented the old floating point standards off the top of
> my head, even in embedded land, but I'm pretty lost when it comes to ISA
> versioning these days so I might be missing something.
>
> One option could be to tie this to the ISA spec version and emit the
> required emulation routines, but I don't think that's worth bothering to
> do unless someone knows of an implementation that implements the old
> behavior.

The old formulation of the instructions were never ratified as a
RISC-V standard.  I don't think we need to hamstring ourselves here by
assuming the possibility of their implementation.

>
> > And in any case this change has passed full GCC regression testing (except
> > for the D frontend, which has stopped being built recently due to a defect
> > in Debian I haven't yet got to getting fixed) with the `riscv64-linux-gnu'
> > target using the HiFive Unmatched (U74 CPU) target board, so it seems to
> > be doing the right thing.
> >
> >  Timing might a bit unfortunate for this submission and given that it is
> > not a regression fix I guess this is GCC 13 material.  Please let me know
> > otherwise.
> >
> >  In any case OK to apply (when the time comes)?
>
> IMO waiting is the right way to go, as if this does uncover any issues
> they'll be a long-tail sort of thing.  That way we'll at least have a
> whole release cycle for folks to test on their hardware, which is about
> as good as we can do here.
>
> Acked-by: Palmer Dabbelt  # for 13
>
> Someone should probably do the glibc version, too ;)
>
> >
> >   Maciej
> > ---
> >  gcc/config/riscv/riscv.md |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > gcc-riscv-fmin-fmax.diff
> > Index: gcc/gcc/config/riscv/riscv.md
> > ===
> > --- gcc.orig/gcc/config/riscv/riscv.md
> > +++ gcc/gcc/config/riscv/riscv.md
> > @@ -1214,7 +1214,7 @@
> >  ;;
> >  ;;  
> >
> > -(define_insn "smin3"
> > +(define_insn "fmin3"
> >[(set (match_operand:ANYF0 "register_operand" "=f")
> >   (smin:ANYF (match_operand:ANYF 1 "register_operand" " f")
> >  (match_operand:ANYF 2 "register_operand" " f")))]
> > @@ -1223,7 +1223,7 @@
> >[(set_attr "type" "fmove")
> > (set_attr "mode" "")])
> >
> > -(define_insn "smax3"
> > +(define_insn "fmax3"
> >[(set (match_operand:ANYF0 "register_operand" "=f")
> >   (smax:ANYF (match_operand:ANYF 1 "register_operand" " f")
> >  

[PATCH] PR fortran/104127 - [9/10/11/12 Regression] ICE in get_array_charlen, at fortran/trans-array.c:7244

2022-01-20 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

when simplifying TRANSFER with a MOLD argument of type character
and with SIZE=0 we lose the character length.

This happens in all gfortran versions and results in wrong code.
The purported regression is that at some point in the 9-development
this lead to a (previously possibly latent) ICE.

The attached patch sets up the character length in the typespec and
fixes the ICE.  There is another generic hidden/latent problem with
array constructors of size 0 passed to procedures (see e.g. pr86277)
which will remain and is beyond the scope of this fix.

Regtested on x86_64-pc-linux-gnu.  I also fixed a minor logic bug
in testcase transfer_simplify_11.f90.

OK for mainline?  Backports to branches?

Thanks,
Harald

From c9882ace6199e2a327b69449f825e0366b442cba Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 20 Jan 2022 22:36:50 +0100
Subject: [PATCH] Fortran: fix simplification of TRANSFER for zero-sized
 character array result

gcc/fortran/ChangeLog:

	PR fortran/104127
	* simplify.cc (gfc_simplify_transfer): Ensure that the result
	typespec is set up for TRANSFER with MOLD of type CHARACTER
	including character length even if the result is a zero-sized
	array.

gcc/testsuite/ChangeLog:

	PR fortran/104127
	* gfortran.dg/transfer_simplify_11.f90: Fix logic.
	* gfortran.dg/transfer_simplify_13.f90: New test.
---
 gcc/fortran/simplify.cc   | 13 ++-
 .../gfortran.dg/transfer_simplify_11.f90  |  2 +-
 .../gfortran.dg/transfer_simplify_13.f90  | 34 +++
 3 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/transfer_simplify_13.f90

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 3881370d947..8604162cfd5 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -8162,7 +8162,18 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
  set even for array expressions, in order to pass this information into
  gfc_target_interpret_expr.  */
   if (result->ts.type == BT_CHARACTER && gfc_is_constant_expr (mold_element))
-result->value.character.length = mold_element->value.character.length;
+{
+  result->value.character.length = mold_element->value.character.length;
+
+  /* Let the typespec of the result inherit the string length.
+	 This is crucial if a resulting array has size zero.  */
+  if (mold_element->ts.u.cl->length)
+	result->ts.u.cl->length = gfc_copy_expr (mold_element->ts.u.cl->length);
+  else
+	result->ts.u.cl->length =
+	  gfc_get_int_expr (gfc_charlen_int_kind, NULL,
+			mold_element->value.character.length);
+}

   /* Set the number of elements in the result, and determine its size.  */

diff --git a/gcc/testsuite/gfortran.dg/transfer_simplify_11.f90 b/gcc/testsuite/gfortran.dg/transfer_simplify_11.f90
index 0911f9dba3a..409e4768a10 100644
--- a/gcc/testsuite/gfortran.dg/transfer_simplify_11.f90
+++ b/gcc/testsuite/gfortran.dg/transfer_simplify_11.f90
@@ -4,5 +4,5 @@
integer, parameter :: N = 2
character(len=1) :: chr(N)
chr = transfer(repeat("x",ncopies=N),[character(len=1) ::], N)
-   if (chr(1) /= 'x' .and. chr(2) /= 'x') STOP 1
+   if (chr(1) /= 'x' .or. chr(2) /= 'x') STOP 1
 end
diff --git a/gcc/testsuite/gfortran.dg/transfer_simplify_13.f90 b/gcc/testsuite/gfortran.dg/transfer_simplify_13.f90
new file mode 100644
index 000..59109c6029d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/transfer_simplify_13.f90
@@ -0,0 +1,34 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original" }
+! PR fortran/104127 - ICE in get_array_charlen
+! Contributed by G.Steinmetz
+
+program p
+  character(4) :: mold = "XYZ"
+  integer  :: i = 0
+  integer, parameter :: l1 = len  (transfer('ab', 'xyz', size=0))
+  integer, parameter :: s1 = size (transfer('ab', 'xyz', size=0))
+  integer, parameter :: l4 = len  (transfer(4_'abcd', 4_'xy', size=0))
+  integer, parameter :: s4 = size (transfer(4_'abcd', 4_'xy', size=0))
+  integer, parameter :: l2 = len  (transfer('ab', mold,  size=0))
+  integer, parameter :: l3 = len  (transfer('ab', mold,  size=1))
+  integer, parameter :: l5 = len  (transfer('ab',['xyz'], size=0))
+  integer, parameter :: s5 = size (transfer('ab',['xyz'], size=0))
+  call sub0 ( transfer('a', 'y', size=0) )
+  call sub1 ([transfer('a', 'y', size=0)])
+  call sub2 ([transfer('a',['y'],size=0)])
+  call sub3 ( transfer('a', 'y', size=1) )
+  call sub4 ([transfer('a', 'y', size=1)])
+  call sub5 ( transfer('a', 'y', size=i) )
+  call sub6 ( transfer(1_'abcd', 1_'xy' , size=0))
+  call sub7 ( transfer(1_'abcd',[1_'xy'], size=0))
+  call sub8 ( transfer(4_'abcd', 4_'xy' , size=0))
+  call sub9 ( transfer(4_'abcd',[4_'xy'], size=0))
+  print *, transfer('abcd', 'xy', size=0)
+  if (l1 /= 3 .or. s1 /= 0) stop 1
+  if (l4 /= 2 .or. s4 /= 0) stop 2
+  if (l2 /= 4 .or. l3 /= 4) stop 3
+  if (l5 /= 3 .or. s5 /= 0) stop 1
+end
+
+! { dg-final { scan-tree-dump-not 

Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread Segher Boessenkool
Hi!

On Thu, Jan 20, 2022 at 01:46:48PM -0500, David Edelsohn wrote:
> On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI  wrote:
> >This patch adds a combine pattern for "CA minus one". As CA only has two
> > values (0 or 1), we could convert following pattern
> >   (sign_extend:DI (plus:SI (reg:SI 98 ca)
> > (const_int -1 [0x]
> > to
> >(plus:DI (reg:DI 98 ca)
> > (const_int -1 [0x])))
> >With this patch, one unnecessary sign extend is eliminated.
> >
> >Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> > Is this okay for trunk? Any recommendations? Thanks a lot.

There are ten gazillion similar things we could make extra backend
patterns for, and we still would not cover a majority of cases.

If instead we got some generic way to handle this we could cover many
more cases, for much less effort.

We need both widening modes from SI to DI, amd narrowing modes from DI
to SI.  Both are useful in certain cases; it is not like using wider
modes is always better, in some cases narrower modes is better (in cases
where we can let the generated code then generate whatever bits in the
high half of the word, for example; a typical example is addition in an
unsigned int).

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> > @@ -0,0 +1,10 @@
> > +/* PR target/95737 */
> > +/* { dg-do compile { target lp64 } } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> Why does the testcase force power8? This testcase is not specific to
> Power8 or later.

Yes, and we should generate the same code on older machines.

> > +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> > +
> > +
> > +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> > long b)
> > +{
> > +   return -(a < b);
> > +}
> 
> If you're only testing for lp64, the testcase could use "long" instead
> of "long long".

The testcase really needs "powerpc64", if that would mean "test if
-mpowerpc64 is (implicitly) used".  But that is not what it currently
means (it is something akin to "powerpc64_hw", instead).

So we test lp64, which is set if and only if -m64 was used.  It is
reasonable coverage, no one cares much for -m32 -mpowerpc64 .


Segher


Re: [PATCH] Fortran: Fix scope for OMP AFFINITY clause iterator variables [PR103695]

2022-01-20 Thread Sandra Loosemore

On 1/19/22 3:01 PM, Thomas Koenig wrote:


Hi Sandra,


This patch is for PR103695, marked as a P1 regression.  OK to check in?


I'm not an OpenMP expert, but this looks straightforward enough.

I assume you ran a regression-test?  OK if that is the case.


Yes, test results on x86_64-linux-gnu look good.

Tobias pointed out to me that this bug was likely also the cause of the 
ICE reported in PR102621, so I verified that was also fixed now, added 
the test case from that issue to the patch, and committed this version.


-Sandra
commit d2ad748eeef0dd260f3993b8dcbffbded3240a0a
Author: Sandra Loosemore 
Date:   Thu Jan 20 13:29:48 2022 -0800

Fortran: Fix scope for OMP AFFINITY clause iterator variables [PR103695]

gfc_finish_var_decl was confused by the undocumented overloading of
the proc_name field in struct gfc_namespace to contain iterator
variables for the OpenMP AFFINITY clause, causing it to insert the
decls in the wrong scope.  This patch adds a new distinct field to
hold these variables.

2022-01-20  Sandra Loosemore  

	PR fortran/103695
	PR fortran/102621

	gcc/fortran
	* gfortran.h (struct gfc_namespace) Add omp_affinity_iterator
	field.
	* dump-parse-tree.cc (show_iterator): Use it.
	* openmp.cc (gfc_match_iterator): Likewise.
	(resolve_omp_clauses): Likewise.
	* trans-decl.cc (gfc_finish_var_decl): Likewise.
	* trans-openmp.cc (handle_iterator): Likewise.

	gcc/testsuite/
	* gfortran.dg/gomp/affinity-clause-3.f90: Adjust pattern.
	* gfortran.dg/gomp/pr102621.f90: New.
	* gfortran.dg/gomp/pr103695.f90: New.

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index a618ae2..3112cae 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1302,10 +1302,10 @@ show_code (int level, gfc_code *c)
 static void
 show_iterator (gfc_namespace *ns)
 {
-  for (gfc_symbol *sym = ns->proc_name; sym; sym = sym->tlink)
+  for (gfc_symbol *sym = ns->omp_affinity_iterators; sym; sym = sym->tlink)
 {
   gfc_constructor *c;
-  if (sym != ns->proc_name)
+  if (sym != ns->omp_affinity_iterators)
 	fputc (',', dumpfile);
   fputs (sym->name, dumpfile);
   fputc ('=', dumpfile);
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 00a558a..993879f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2107,6 +2107,9 @@ typedef struct gfc_namespace
   /* !$ACC ROUTINE clauses.  */
   gfc_omp_clauses *oacc_routine_clauses;
 
+  /* !$ACC TASK AFFINITY iterator symbols.  */
+  gfc_symbol *omp_affinity_iterators;
+
   /* !$ACC ROUTINE names.  */
   gfc_oacc_routine_name *oacc_routine_names;
 
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 9b73b9f..073e5a1 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -1123,7 +1123,7 @@ gfc_match_iterator (gfc_namespace **ns, bool permit_var)
   if (last)
 	last->tlink = sym;
   else
-	(*ns)->proc_name = sym;
+	(*ns)->omp_affinity_iterators = sym;
   last = sym;
   sym->declared_at = prev_loc;
   sym->ts = ts;
@@ -6832,8 +6832,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		&& n->u2.ns && !n->u2.ns->resolved)
 		  {
 		n->u2.ns->resolved = 1;
-		for (gfc_symbol *sym = n->u2.ns->proc_name; sym;
-			 sym = sym->tlink)
+		for (gfc_symbol *sym = n->u2.ns->omp_affinity_iterators;
+			 sym; sym = sym->tlink)
 		  {
 			gfc_constructor *c;
 			c = gfc_constructor_first (sym->value->value.constructor);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 1112ca9..6493cc2 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -647,6 +647,9 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
 	   && sym->ns->proc_name->attr.flavor == FL_LABEL)
 	/* This is a BLOCK construct.  */
 	add_decl_as_local (decl);
+  else if (sym->ns->omp_affinity_iterators)
+	/* This is a block-local iterator.  */
+	add_decl_as_local (decl);
   else
 	gfc_add_decl_to_parent_function (decl);
 }
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 9eabf68..d5a6b2d 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2483,7 +2483,7 @@ static tree
 handle_iterator (gfc_namespace *ns, stmtblock_t *iter_block, tree block)
 {
   tree list = NULL_TREE;
-  for (gfc_symbol *sym = ns->proc_name; sym; sym = sym->tlink)
+  for (gfc_symbol *sym = ns->omp_affinity_iterators; sym; sym = sym->tlink)
 {
   gfc_constructor *c;
   gfc_se se;
diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-3.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-3.f90
index 3fd39fe..eebe4dd 100644
--- a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-3.f90
@@ -11,4 +11,4 @@ subroutine foo
   !$omp end task
 end
 ! { dg-final { scan-tree-dump-times "= ibar \\(\\." 3 

Re: [PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Palmer Dabbelt

On Thu, 20 Jan 2022 13:20:34 PST (-0800), gcc-patches@gcc.gnu.org wrote:

On Thu, Jan 20, 2022 at 01:13:45PM -0800, Palmer Dabbelt wrote:

On Thu, 20 Jan 2022 02:45:53 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
> due to some aggressive linker optimizations).

I don't really understand the rest of this, but we do have a subset of
LEB128 (constant expressions only).  I'm not sure exactly what the
requirements are here, but one could imagine extending our assembler support
to cover them -- we might never have full support for LEB128 expressions
(because of linker relaxation), but we might be able to make more stuff
work.

I'm not sure if that helps or hurts, though, as we'll still be a special
case.


HAVE_AS_LEB128 really needs to be able to handle both constants and
difference of two labels in the same section.
Most targets resolve something like that in the assembler as constant which
they encode into sleb128 or uleb128 and put into the section that uses
those directives (typically debugging sections).
If a target performs aggressive linker relaxation, then probably some
relocation would need to be added (but one that can encode the two symbols
for the difference, so perhaps two relocations that must be consecutive or
something similar) and resolve that by the linker.  Though, that would mean
that even in the debugging section offsets wouldn't be fixed during
assembly...


Differences are the hard case for RISC-V, as they can grow numerically.  
That could  then cause the LEB to grow in byte size, possibly violating 
one of our linker relaxation invariants.  The only way I've come up with 
to support these would be to pad the LEBs, and I'm not sure if that's 
legal.


Not sure if I'm missing something, though.


Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Joseph Myers
On Thu, 20 Jan 2022, Joseph Myers wrote:

> The C functions fmin and fmax correspond to the IEEE 754-2008 operations 
> minNum and maxNum, which operate as described, and the RISC-V 'F' and 'D' 
> extensions *before* version 2.2 of those extensions also corresponded to 
> minNum and maxNum.

And, to make things clear for people confused by the different version 
numbers involved: RISC-V ISA version 2.2 contains 'F' and 'D' extensions 
version 2.0.  It's version 2.2 of the extensions, *not* of the ISA, that 
changed the instructions from minNum and maxNum to minimumNumber and 
maximumNumber.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 01:13:45PM -0800, Palmer Dabbelt wrote:
> On Thu, 20 Jan 2022 02:45:53 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> > riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
> > due to some aggressive linker optimizations).
> 
> I don't really understand the rest of this, but we do have a subset of
> LEB128 (constant expressions only).  I'm not sure exactly what the
> requirements are here, but one could imagine extending our assembler support
> to cover them -- we might never have full support for LEB128 expressions
> (because of linker relaxation), but we might be able to make more stuff
> work.
> 
> I'm not sure if that helps or hurts, though, as we'll still be a special
> case.

HAVE_AS_LEB128 really needs to be able to handle both constants and
difference of two labels in the same section.
Most targets resolve something like that in the assembler as constant which
they encode into sleb128 or uleb128 and put into the section that uses
those directives (typically debugging sections).
If a target performs aggressive linker relaxation, then probably some
relocation would need to be added (but one that can encode the two symbols
for the difference, so perhaps two relocations that must be consecutive or
something similar) and resolve that by the linker.  Though, that would mean
that even in the debugging section offsets wouldn't be fixed during
assembly...

Jakub



Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Joseph Myers
On Thu, 20 Jan 2022, Maciej W. Rozycki wrote:

> RISC-V FMIN and FMAX machine instructions are IEEE-754-conformant[1]:
> 
> "For FMIN and FMAX, if at least one input is a signaling NaN, or if both 
> inputs are quiet NaNs, the result is the canonical NaN.  If one operand 
> is a quiet NaN and the other is not a NaN, the result is the non-NaN 
> operand."
> 
> as required by our `fminM3' and `fmaxM3' standard RTL patterns.
> 
> However we only define `sminM3' and `smaxM3' standard RTL patterns to 
> produce the FMIN and FMAX machine instructions, which in turn causes the 
> `__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the 
> corresponding libcalls rather than the relevant machine instructions.
> 
> Rename the `smin3' and `smax3' patterns to `fmin3' and 
> `fmax3' respectively then, removing the need to use libcalls for 
> IEEE 754 semantics with the minimum and maximum operations.
> 
> [1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
> Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and 
> Propagation", p. 48

That's an old version of the instruction set, and an old version of IEEE 
754.

The C functions fmin and fmax correspond to the IEEE 754-2008 operations 
minNum and maxNum, which operate as described, and the RISC-V 'F' and 'D' 
extensions *before* version 2.2 of those extensions also corresponded to 
minNum and maxNum.

IEEE 754-2019 removes minNum and maxNum because they are non-associative 
in the presence of signaling NaNs, replacing them with new operations 
minimum, minimumNumber, maximum, maximumNumber.  C23 defines new functions 
fminimum, fminimum_num, fmaximum, fmaximum_num corresponding to those new 
operations, leaving fmin and fmax unchanged.  And the RISC-V 'F' and 'D' 
extensions version 2.2 change the FMIN and FMAX instructions to correspond 
to minimumNumber and maximumNumber instead of minNum and maxNum.

So, if generating code that might be run on processors implementing 
versions of 'F' and 'D' older than 2.2, it's not safe to generate FMAX and 
FMIN instructions for any of those standard C library functions when 
signaling NaN operands are a possibility (i.e. flag_signaling_nans is 
set), because code built for older versions might run on newer versions 
with changed instruction semantics.  If generating code that requires 
version 2.2 or later of 'F' and 'D' (and I don't know if GCC actually 
supports generating code for older versions), it's OK to generate those 
instructions as part of expanding calls to the C23 functions fminimum_num 
and fmaximum_num - but not as part of expanding calls to fmin and fmax 
(unless !flag_signaling_nans, in which case the differences between those 
functions aren't relevant).  And GCC currently doesn't have built-in 
functions for fminimum_num and fmaximum_num, and I don't think it has insn 
patterns corresponding to those functions either.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Palmer Dabbelt

On Thu, 20 Jan 2022 02:45:53 PST (-0800), gcc-patches@gcc.gnu.org wrote:

Hi!

riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
due to some aggressive linker optimizations).


I don't really understand the rest of this, but we do have a subset of 
LEB128 (constant expressions only).  I'm not sure exactly what the 
requirements are here, but one could imagine extending our assembler 
support to cover them -- we might never have full support for LEB128 
expressions (because of linker relaxation), but we might be able to make 
more stuff work.


I'm not sure if that helps or hurts, though, as we'll still be a special 
case.



As the following testcase shows, we mishandle in index_rnglists the
!HAVE_AS_LEB128 && !have_multiple_function_sections case.

output_rnglists does roughly:
  FOR_EACH_VEC_SAFE_ELT (ranges_table, i, r)
{
...
  if (block_num > 0)
{
...
  if (HAVE_AS_LEB128)
{
  if (!have_multiple_function_sections)
{
  // code not using r->*_entry
  continue;
}
  // code that sometimes doesn't use r->*_entry,
  // sometimes r->begin_entry
}
  else if (dwarf_split_debug_info)
{
  // code that uses both r->begin_entry and r->end_entry
}
  else
{
  // code not using r->*_entry
}
}
  else if (block_num < 0)
{
  if (!have_multiple_function_sections)
gcc_unreachable ();
...
}
}
and index_rnglists is what sets up those r->{begin,end}_entry members.
The code did an early if (!have_multiple_function_sections) continue;
which is fine for the HAVE_AS_LEB128 case, because r->*_entry is not
used in that case, but not for !HAVE_AS_LEB128 that uses it anyway.

Fixed thusly, tested on the testcase with x86_64 -> riscv64 cross,
bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-20  Jakub Jelinek  

PR debug/103874
* dwarf2out.cc (index_rnglists): For !HAVE_AS_LEB128 and
block_num > 0, index entry even if !have_multiple_function_sections.

* gcc.dg/debug/dwarf2/pr103874.c: New test.

--- gcc/dwarf2out.cc.jj 2022-01-18 11:58:59.0 +0100
+++ gcc/dwarf2out.cc2022-01-19 13:30:08.936008194 +0100
@@ -12094,9 +12094,10 @@ index_rnglists (void)
   if (r->label && r->idx != DW_RANGES_IDX_SKELETON)
r->idx = rnglist_idx++;

-  if (!have_multiple_function_sections)
-   continue;
   int block_num = r->num;
+  if ((HAVE_AS_LEB128 || block_num < 0)
+ && !have_multiple_function_sections)
+   continue;
   if (HAVE_AS_LEB128 && (r->label || r->maybe_new_sec))
base = false;
   if (block_num > 0)
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c.jj 2022-01-19 
13:35:25.485631843 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c2022-01-19 
13:36:53.608413534 +0100
@@ -0,0 +1,12 @@
+/* PR debug/103874 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -gsplit-dwarf -dA -Wno-implicit-function-declaration" 
} */
+
+void
+foo (void)
+{
+  {
+bar ();
+baz ();
+  }
+}

Jakub


Re: Re: [PATCH v2] Disable -fsplit-stack support on non-glibc targets

2022-01-20 Thread Sören Tempel via Gcc-patches
Ping.

Summary: Patch disable -fstack-split on non-glibc targets to prevent
corruptions of the TCB on libcs which do not support the required
fields in pthread_t. This is an important fix for having gccgo work on
musl by default.

See: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587142.html

If the patch needs to be revised further please let me know.

Greetings,
Sören

Sören Tempel  wrote:
> The -fsplit-stack option requires the pthread_t TCB definition in the
> libc to provide certain struct fields at specific hardcoded offsets. As
> far as I know, only glibc provides these fields at the required offsets.
> Most notably, musl libc does not have these fields. However, since gcc
> accesses the fields using a fixed offset, this does not cause a
> compile-time error, but instead results in a silent memory corruption at
> run-time with musl libc. For example, on s390x libgcc's
> __stack_split_initialize CTOR will overwrite the cancel field in the
> pthread_t TCB on musl.
> 
> The -fsplit-stack option is used within the gcc code base itself by
> gcc-go (if available). On musl-based systems with split-stack support
> (i.e. s390x or x86) this causes Go programs compiled with gcc-go to
> misbehave at run-time.
> 
> This patch fixes gcc-go on musl by disabling -fsplit-stack in gcc itself
> since it is not supported on non-glibc targets anyhow. This is achieved
> by checking if gcc targets a glibc-based system. This check has been
> added for x86 and s390x, the rs6000 config already checks for
> TARGET_GLIBC_MAJOR. Other architectures do not have split-stack
> support. With this patch applied, the gcc-go configure script will
> detect that -fsplit-stack support is not available and will not use it.
> 
> See https://www.openwall.com/lists/musl/2012/10/16/12
> 
> This patch was written under the assumption that glibc is the only libc
> implementation which supports the required fields at the required
> offsets in the pthread_t TCB. The patch has been tested on Alpine Linux
> Edge on the s390x and x86 architectures by bootstrapping Google's Go
> implementation with gcc-go.
> 
> Signed-off-by: Sören Tempel 
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   Only support split-stack on glibc targets.
>   * config/i386/gnu-user-common.h (STACK_CHECK_STATIC_BUILTIN): Ditto.
>   * config/i386/gnu.h (defined): Ditto.
> ---
> This version of the patch addresses feedback by Andrew Pinski and uses
> OPTION_GLIBC as well as opts->x_linux_libc == LIBC_GLIBC to detect glibc
> targets (instead of relying on TARGET_GLIBC_MAJOR).
> 
>  gcc/common/config/s390/s390-common.c | 11 +--
>  gcc/config/i386/gnu-user-common.h|  5 +++--
>  gcc/config/i386/gnu.h|  6 +-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/common/config/s390/s390-common.c 
> b/gcc/common/config/s390/s390-common.c
> index b6bc8501742..fc86e0bc5e7 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -116,13 +116,20 @@ s390_handle_option (struct gcc_options *opts 
> ATTRIBUTE_UNUSED,
>  
>  /* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> We don't verify it, since earlier versions just have padding at
> -   its place, which works just as well.  */
> +   its place, which works just as well. For other libc implementations
> +   we disable the feature entirely to avoid corrupting the TCB.  */
>  
>  static bool
>  s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
>  struct gcc_options *opts ATTRIBUTE_UNUSED)
>  {
> -  return true;
> +  if (opts->x_linux_libc == LIBC_GLIBC) {
> +return true;
> +  } else {
> +if (report)
> +  error("%<-fsplit-stack%> currently only supported on GNU/Linux");
> +return false;
> +  }
>  }
>  
>  #undef TARGET_DEFAULT_TARGET_FLAGS
> diff --git a/gcc/config/i386/gnu-user-common.h 
> b/gcc/config/i386/gnu-user-common.h
> index 00226f5a455..6e13315b5a3 100644
> --- a/gcc/config/i386/gnu-user-common.h
> +++ b/gcc/config/i386/gnu-user-common.h
> @@ -66,7 +66,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define STACK_CHECK_STATIC_BUILTIN 1
>  
>  /* We only build the -fsplit-stack support in libgcc if the
> -   assembler has full support for the CFI directives.  */
> -#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
> +   assembler has full support for the CFI directives and
> +   targets glibc.  */
> +#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE && OPTION_GLIBC
>  #define TARGET_CAN_SPLIT_STACK
>  #endif
> diff --git a/gcc/config/i386/gnu.h b/gcc/config/i386/gnu.h
> index 25fbc07f58c..adfe817201e 100644
> --- a/gcc/config/i386/gnu.h
> +++ b/gcc/config/i386/gnu.h
> @@ -35,7 +35,11 @@ along with GCC.  If not, see 
> .
> crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
>  #endif
>  
> -#ifdef TARGET_LIBC_PROVIDES_SSP
> +/* -fsplit-stack uses a field 

Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Palmer Dabbelt

On Thu, 20 Jan 2022 07:44:25 PST (-0800), ma...@embecosm.com wrote:

RISC-V FMIN and FMAX machine instructions are IEEE-754-conformant[1]:

"For FMIN and FMAX, if at least one input is a signaling NaN, or if both
inputs are quiet NaNs, the result is the canonical NaN.  If one operand
is a quiet NaN and the other is not a NaN, the result is the non-NaN
operand."

as required by our `fminM3' and `fmaxM3' standard RTL patterns.

However we only define `sminM3' and `smaxM3' standard RTL patterns to
produce the FMIN and FMAX machine instructions, which in turn causes the
`__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the
corresponding libcalls rather than the relevant machine instructions.

Rename the `smin3' and `smax3' patterns to `fmin3' and
`fmax3' respectively then, removing the need to use libcalls for
IEEE 754 semantics with the minimum and maximum operations.

[1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and
Propagation", p. 48

gcc/
* config/riscv/riscv.md (smin3): Rename pattern to...
(fmin3): ... this.
(smax3): Likewise...
(fmax3): ... this.
---
Hi,

 It's not clear to me how it's been missed or whether there is anything I
might be actually missing.  It looks to me like a clear oversight however.


I'm not really a floating point person, but IIUC It's actually on 
purpose: earlier versions of the ISA spec didn't have this behavior, and 
at the time we originally merged the GCC port we decided to play it 
safe.  Pretty sure we discussed this before on the GCC mailing list 
,maybe around the time the glibc port was going upstream?  I think Jim 
was the one who figured out how all the specs fit together.


I can't find those older discussions, but this definately recently came 
up in glibc:
https://sourceware.org/pipermail/libc-alpha/2021-October/131637.html .  
Looks like back then nobody knew of any hardware that ran glibc and 
implemented the old behavior, but there also haven't been patches posted 
yet so it's not set in stone.


It's probably worth repeating the question here since there are a lot of 
RISC-V users that don't use glibc but do use GCC.  I don't know of 
anyone who implemented the old floating point standards off the top of 
my head, even in embedded land, but I'm pretty lost when it comes to ISA 
versioning these days so I might be missing something.


One option could be to tie this to the ISA spec version and emit the 
required emulation routines, but I don't think that's worth bothering to 
do unless someone knows of an implementation that implements the old 
behavior.



And in any case this change has passed full GCC regression testing (except
for the D frontend, which has stopped being built recently due to a defect
in Debian I haven't yet got to getting fixed) with the `riscv64-linux-gnu'
target using the HiFive Unmatched (U74 CPU) target board, so it seems to
be doing the right thing.

 Timing might a bit unfortunate for this submission and given that it is
not a regression fix I guess this is GCC 13 material.  Please let me know
otherwise.

 In any case OK to apply (when the time comes)?


IMO waiting is the right way to go, as if this does uncover any issues 
they'll be a long-tail sort of thing.  That way we'll at least have a 
whole release cycle for folks to test on their hardware, which is about 
as good as we can do here.


Acked-by: Palmer Dabbelt  # for 13

Someone should probably do the glibc version, too ;)



  Maciej
---
 gcc/config/riscv/riscv.md |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-fmin-fmax.diff
Index: gcc/gcc/config/riscv/riscv.md
===
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -1214,7 +1214,7 @@
 ;;
 ;;  

-(define_insn "smin3"
+(define_insn "fmin3"
   [(set (match_operand:ANYF0 "register_operand" "=f")
(smin:ANYF (match_operand:ANYF 1 "register_operand" " f")
   (match_operand:ANYF 2 "register_operand" " f")))]
@@ -1223,7 +1223,7 @@
   [(set_attr "type" "fmove")
(set_attr "mode" "")])

-(define_insn "smax3"
+(define_insn "fmax3"
   [(set (match_operand:ANYF0 "register_operand" "=f")
(smax:ANYF (match_operand:ANYF 1 "register_operand" " f")
   (match_operand:ANYF 2 "register_operand" " f")))]


[pushed] c++: designator for base class member [PR101405]

2022-01-20 Thread Jason Merrill via Gcc-patches
A C++20 designator must name a direct non-static member of the class; in
this case it names a member of a base class, and we should give an error
instead of crashing.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101405

gcc/cp/ChangeLog:

* decl.cc (reshape_init_class): Reject designator for a member of
another class.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig20.C: New test.
---
 gcc/cp/decl.cc   | 14 ++
 gcc/testsuite/g++.dg/cpp2a/desig20.C | 20 
 2 files changed, 30 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig20.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 1cbe9a34be0..8e5421848d1 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6569,16 +6569,22 @@ reshape_init_class (tree type, reshape_iter *d, bool 
first_initializer_p,
  tree ictx = DECL_CONTEXT (field);
  if (!same_type_ignoring_top_level_qualifiers_p (ictx, type))
{
- gcc_assert (ANON_AGGR_TYPE_P (ictx));
  /* Find the anon aggr that is a direct member of TYPE.  */
- while (true)
+ while (ANON_AGGR_TYPE_P (ictx))
{
  tree cctx = TYPE_CONTEXT (ictx);
  if (same_type_ignoring_top_level_qualifiers_p (cctx, type))
-   break;
+   goto found;
  ictx = cctx;
}
- /* And then the TYPE member with that anon aggr type.  */
+
+ /* Not found, e.g. FIELD is a member of a base class.  */
+ if (complain & tf_error)
+   error ("%qD is not a direct member of %qT", field, type);
+ return error_mark_node;
+
+   found:
+ /* Now find the TYPE member with that anon aggr type.  */
  tree aafield = TYPE_FIELDS (type);
  for (; aafield; aafield = TREE_CHAIN (aafield))
if (TREE_TYPE (aafield) == ictx)
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig20.C 
b/gcc/testsuite/g++.dg/cpp2a/desig20.C
new file mode 100644
index 000..0ceda7ccabd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig20.C
@@ -0,0 +1,20 @@
+// PR c++/101405
+// { dg-do compile { target c++20 } }
+
+struct A {
+  int const a = 1;
+  int const b = 2;
+};
+
+struct B : A {
+  using A::a;
+  using A::b;
+  int const c = 3;
+  int const d = 4;
+};
+
+int main()
+{
+  [[maybe_unused]] B b =
+  { .a = 10, .d = 42 };// { dg-error "not a direct member" }
+}

base-commit: 30b38394b482ce894d9bc81731a0eea8711f4587
-- 
2.27.0



Re: [PATCH] c++: ICE with noexcept and canonical types [PR101715]

2022-01-20 Thread Jason Merrill via Gcc-patches

On 1/18/22 11:05, Marek Polacek wrote:

On Mon, Jan 17, 2022 at 01:48:48PM -0500, Jason Merrill wrote:

On 1/14/22 19:22, Marek Polacek wrote:

This is a "canonical types differ for identical types" ICE, which started
with r11-4682.  It's a bit tricky to explain.  Consider:

template  struct S {
  S bar() noexcept(T::value);  // #1
  S foo() noexcept(T::value);  // #2
};

template  S S::foo() noexcept(T::value) {}  // #3

We ICE because #3 and #2 have the same type, but their canonical types
differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1.

The member functions #1 and #2 have the same type.  However, since their
noexcept-specifier is deferred, when parsing them, we create a variant for
both of them, because DEFERRED_PARSE cannot be compared.  In other words,
build_cp_fntype_variant's

tree v = TYPE_MAIN_VARIANT (type);
for (; v; v = TYPE_NEXT_VARIANT (v))
  if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late))
return v;

will *not* find an existing variant when creating a method_type for #2, so we
have to create a new one.

But then we perform delayed parsing and call fixup_deferred_exception_variants
for #1 and #2.  f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the newly
parsed noexcept-specifier.  It also sets TYPE_CANONICAL (#2) to #1.  Both
noexcepts turned out to be the same, so now we have two equivalent variants in
the list!  I.e.,

+-+  +-+  +-+
|  main   |  |  #2 |  |  #1 |
| S S::(S*) |->| S S::(S*) |->| S S::(S*) |->NULL
|-|  |  noex(T::value) |  |  noex(T::value) |
+-+  +-+  +-+

Then we get to #3.  As for #1 and #2, grokdeclarator calls build_memfn_type,
which ends up calling build_cp_fntype_variant, which will use the loop
above to look for an existing variant.  The first one that matches
cp_check_qualified_type will be used, so we use #2 rather than #1, and the
TYPE_CANONICAL mismatch follows.  Hopefully that makes sense.


Why doesn't the TYPE_CANONICAL (v) == v check prevent this?


In other words, I think you're asking: why did fixup_deferred_exception_variants
set TYPE_CANONICAL (#2) to #1 (which then differs from TYPE_CANONICAL (#3),
which is #2)?


I meant to ask why TYPE_CANONICAL (#3) got set to #2 instead of #1?

And to answer my own question, it's because the check I mention is in 
fixup_deferred_exception_variants, and #3 doesn't go through there at 
all; the loop in build_cp_fntype_variant assumes no duplicate variants, 
which your patch fixes.



The method_type for #1 (I'll mark is as #1 here) is built with it being its own
canonical type.

The first call to fixup_deferred_exception_variants does not change it: in
there, VARIANT is #1, the loop with 'TYPE_CANONICAL (v) == v' cannot find
an existing variant that would match, so when we do

 v = build_cp_fntype_variant (TYPE_CANONICAL (variant),
  rqual, cr, false);
we get #1 so
 TYPE_CANONICAL (variant) = v;
is just
 TYPE_CANONICAL (#1) = #1;
so no change.

The second call to fixup_deferred_exception_variants: here we're working with
VARIANT #2.  Now we again scan the list of variants {main, #2, #1} where we
find a match for #2: #1.  #1's TYPE_CANONICAL is #1 as per above, so we set
 TYPE_CANONICAL (#2) = #1;
which I think is correct.


I think TYPE_CANONICAL (#3) should also be #1, not #2, which my patch attempts
to do.


Hope this explanation makes some sense, please ask away if it doesn't!


As for the fix, I didn't think I could rewrite the method_type #2 with #1
because the type may have escaped via decltype.  So my approach is to
elide #2 from the list, so when looking for a matching variant, we always
find #1 (#2 remains live though, which admittedly sounds sort of dodgy).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR c++/101715

gcc/cp/ChangeLog:

* tree.c (fixup_deferred_exception_variants): Remove duplicate
variants after parsing the exception specifications.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept72.C: New test.
* g++.dg/cpp0x/noexcept73.C: New test.
---
   gcc/cp/tree.c   | 16 +++-
   gcc/testsuite/g++.dg/cpp0x/noexcept72.C | 21 +
   gcc/testsuite/g++.dg/cpp0x/noexcept73.C | 13 +
   3 files changed, 49 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept72.C
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept73.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 7f7de86b4e8..2efad49e7c1 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -2804,8 +2804,9 @@ fixup_deferred_exception_variants (tree type, tree raises)
 /* Though sucky, this walk will process the canonical variants
first.  */
+  tree prev = NULL_TREE;
 

[pushed] c++: Add test for fixed PR [PR102338]

2022-01-20 Thread Marek Polacek via Gcc-patches
This was fixed by r12-6025 and is sufficiently different from
noexcept71.C that I think we should add it.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/102338

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/noexcept2.C: New test.
---
 gcc/testsuite/g++.dg/cpp1y/noexcept2.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/noexcept2.C

diff --git a/gcc/testsuite/g++.dg/cpp1y/noexcept2.C 
b/gcc/testsuite/g++.dg/cpp1y/noexcept2.C
new file mode 100644
index 000..38dd05cd066
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/noexcept2.C
@@ -0,0 +1,20 @@
+// PR c++/102338
+// { dg-do compile { target c++14 } }
+
+struct S {
+template
+static auto f(T&& t) noexcept {
+return true;
+}
+
+template
+static auto f(T&& t, Ts&& ... ts) noexcept(noexcept(f(ts...))) {
+return f(ts...);
+}
+
+};
+
+int main() {
+S::f(true, 0, 5u);
+return 0;
+}

base-commit: 2f34d7ef3d026cf7109b6e6bb2eca14c840e7c71
-- 
2.34.1



Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread David Edelsohn via Gcc-patches
On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
> (const_int -1 [0x]
> to
>(plus:DI (reg:DI 98 ca)
> (const_int -1 [0x])))
>With this patch, one unnecessary sign extend is eliminated.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-20 Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..1d8b212962f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,19 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +   (sign_extend:DI (plus:SI (reg:SI CA_REGNO)
> +(const_int -1]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (reg:DI CA_REGNO)
> +   (const_int -1)))
> + (clobber (reg:DI CA_REGNO))])]
> +  ""
> +)
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does the testcase force power8? This testcase is not specific to
Power8 or later.

> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}

If you're only testing for lp64, the testcase could use "long" instead
of "long long".

This is okay with those changes.

Thanks, David


Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 10:52:10AM -0700, Martin Sebor wrote:
> On 1/20/22 10:03, Jakub Jelinek wrote:
> > On Thu, Jan 20, 2022 at 09:56:59AM -0700, Martin Sebor wrote:
> > > > With normal -Wformat I see all expected warnings in:
> > > > char *foo (const char *) __attribute__((format_arg(1)));
> > > > void bar (const char *, ...) __attribute__((format(printf, 1, 2)));
> > > 
> > > -Wformat-diag is internal to GCC and needs one of the GCC-internal
> > > attributes to enable, like __gcc_cxxdiag__, for example like this:
> > > 
> > >__attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
> > >void bar (const char *, ...);
> > > 
> > > With that it triggers in all the same instances as -Wformat below
> > > (as near I can tell for a modified test case).
> > 
> > Glad to hear that, but then I don't understand why we didn't warn on
> > cp/error.cc before Martin L.'s change when --disable-nls wasn't used.
> 
> Good question!  There does seem to be some strange interplay between
> parentheses and -Wformat for __gcc_cdiag__ functions in the C++ front
> end:
> 
> __attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
> void bar (const char *, ...);
> 
> void
> baz (int x)
> {
>   bar (x ? "<%s" : "%i", x);   // -Wformat-diag
>   bar ((x ? "<%s" : "%i"), x); // silence
>   bar ((x ? ("<%s") : ("%i")), x); // silence
> }
> 
> The C front end warns on all three calls.
> 
> With attribute printf both the C and C++ front ends issue a -Wformat
> for all three calls as expected (passing an int to %s).

Filed PR104148 now.

Jakub



[committed] c++: add testcase for recently fixed PR [PR103631]

2022-01-20 Thread Patrick Palka via Gcc-patches
We accept this testcase after r12-6773 ("CTAD inside alias template").

PR c++/103631

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class51.C: New test.
---
 gcc/testsuite/g++.dg/cpp2a/nontype-class51.C | 26 
 1 file changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class51.C

diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class51.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class51.C
new file mode 100644
index 000..1501aa1c426
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class51.C
@@ -0,0 +1,26 @@
+// PR c++/103631
+// { dg-do compile { target c++20 } }
+
+template class T>
+constexpr bool is_specialize_value_v = false;
+
+template class T, auto Ts>
+constexpr bool is_specialize_value_v, T> = true;
+
+template class T>
+concept specialize_value = is_specialize_value_v;
+
+template struct Test { };
+
+template
+struct A {
+  template void f(T) requires specialize_value;
+};
+
+int main() {
+  A{}> a0;
+  A{}> a1;
+  a0.f(a0);
+  a0.f(a1);
+  a0.f(0); // { dg-error "no match" }
+}
-- 
2.35.0.rc1



Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Martin Sebor via Gcc-patches

On 1/20/22 10:03, Jakub Jelinek wrote:

On Thu, Jan 20, 2022 at 09:56:59AM -0700, Martin Sebor wrote:

With normal -Wformat I see all expected warnings in:
char *foo (const char *) __attribute__((format_arg(1)));
void bar (const char *, ...) __attribute__((format(printf, 1, 2)));


-Wformat-diag is internal to GCC and needs one of the GCC-internal
attributes to enable, like __gcc_cxxdiag__, for example like this:

   __attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
   void bar (const char *, ...);

With that it triggers in all the same instances as -Wformat below
(as near I can tell for a modified test case).


Glad to hear that, but then I don't understand why we didn't warn on
cp/error.cc before Martin L.'s change when --disable-nls wasn't used.


Good question!  There does seem to be some strange interplay between
parentheses and -Wformat for __gcc_cdiag__ functions in the C++ front
end:

__attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
void bar (const char *, ...);

void
baz (int x)
{
  bar (x ? "<%s" : "%i", x);   // -Wformat-diag
  bar ((x ? "<%s" : "%i"), x); // silence
  bar ((x ? ("<%s") : ("%i")), x); // silence
}

The C front end warns on all three calls.

With attribute printf both the C and C++ front ends issue a -Wformat
for all three calls as expected (passing an int to %s).

Martin


Re: [PATCH] Fix alignment of stack slots for overaligned types [PR103500]

2022-01-20 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response.

Alex Coplan  writes:
> On 20/12/2021 13:19, Richard Sandiford wrote:
>> Alex Coplan via Gcc-patches  writes:
>> > Hi,
>> >
>> > This fixes PR103500 i.e. ensuring that stack slots for
>> > passed-by-reference overaligned types are appropriately aligned. For the
>> > testcase:
>> >
>> > typedef struct __attribute__((aligned(32))) {
>> >   long x,y;
>> > } S;
>> > S x;
>> > void f(S);
>> > void g(void) { f(x); }
>> >
>> > on AArch64, we currently generate (at -O2):
>> >
>> > g:
>> > adrpx1, .LANCHOR0
>> > add x1, x1, :lo12:.LANCHOR0
>> > stp x29, x30, [sp, -48]!
>> > mov x29, sp
>> > ldp q0, q1, [x1]
>> > add x0, sp, 16
>> > stp q0, q1, [sp, 16]
>> > bl  f
>> > ldp x29, x30, [sp], 48
>> > ret
>> >
>> > so the stack slot for the passed-by-reference copy of the structure is
>> > at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the
>> > structure is only 16-byte aligned. The PCS requires the structure to be
>> > 32-byte aligned. After this patch, we generate:
>> >
>> > g:
>> > adrpx1, .LANCHOR0
>> > add x1, x1, :lo12:.LANCHOR0
>> > stp x29, x30, [sp, -64]!
>> > mov x29, sp
>> > add x0, sp, 47
>> > ldp q0, q1, [x1]
>> > and x0, x0, -32
>> > stp q0, q1, [x0]
>> > bl  f
>> > ldp x29, x30, [sp], 64
>> > ret
>> >
>> > i.e. we ensure 32-byte alignment for the struct.
>> >
>> > The approach taken here is similar to that in
>> > function.c:assign_parm_setup_block where it handles the case for
>> > DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is
>> > similar to the approach taken in cfgexpand.c:expand_stack_vars (where
>> > the function calls get_dynamic_stack_size) which is the code that
>> > handles the alignment for overaligned structures as addressable local
>> > variables (see the related case discussed in the PR).
>> 
>> A difference with the latter is that cfgexpand (AFAICT) always
>> honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly
>> only increasing the alignment for efficiency reasons (rather than
>> decreasing it).
>> 
>> So…
>> 
>> > This patch also updates the aapcs64 test mentioned in the PR to avoid
>> > the frontend folding away the alignment check. I've confirmed that the
>> > execution test actually fails on aarch64-linux-gnu prior to the patch
>> > being applied and passes afterwards.
>> >
>> > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
>> > arm-linux-gnueabihf: no regressions.
>> >
>> > I'd appreciate any feedback. Is it OK for trunk?
>> >
>> > Thanks,
>> > Alex
>> >
>> > gcc/ChangeLog:
>> >
>> >PR middle-end/103500
>> >* function.c (get_stack_local_alignment): Align BLKmode overaligned
>> >types to the alignment required by the type.
>> >(assign_stack_temp_for_type): Handle BLKmode overaligned stack
>> >slots by allocating a larger-than-necessary buffer and aligning
>> >the address within appropriately.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR middle-end/103500
>> >* gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref):
>> >Prevent the frontend from folding our alignment check away by
>> >using snprintf to store the pointer into a string and recovering
>> >it with sscanf.
>> >
>> > diff --git a/gcc/function.c b/gcc/function.c
>> > index 61b3bd036b8..5ed722ab959 100644
>> > --- a/gcc/function.c
>> > +++ b/gcc/function.c
>> > @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode 
>> > mode)
>> >unsigned int alignment;
>> >  
>> >if (mode == BLKmode)
>> > -alignment = BIGGEST_ALIGNMENT;
>> > +alignment = (type && TYPE_ALIGN (type) > 
>> > MAX_SUPPORTED_STACK_ALIGNMENT)
>> > +  ? TYPE_ALIGN (type)
>> > +  : BIGGEST_ALIGNMENT;
>> 
>> …I'm not sure about this calculation.  Why do we only honour TYPE_ALIGN
>> if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the
>> way back to BIGGEST_ALIGNMENT otherwise?  It looks like on nvptx
>> this would have the effect of honouring (say) 2048-byte alignment,
>> but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT
>> and MAX_SUPPORTED_STACK_ALIGNMENT).
>
> So, to be honest, this was a bit of a bodge to try and work around
> issues on x86. My original attempt at solving the PR used the more
> obvious calculation you suggest below, i.e. the max of BIGGEST_ALIGNMENT
> and TYPE_ALIGN (type). The problem with that is, on x86,
> MAX_SUPPORTED_STACK_ALIGNMENT has a huge value (2^31).
> explow.c:get_dynamic_stack_size has:
>
>   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
> {
>   size = round_push (size);
>   [...]
> }
>   *psize = size;
>
> so inevitably we end up calling round_push on x86 which in turn ends up
> going down the else branch:
>
>   else
> {
>   /* If 

Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 09:56:59AM -0700, Martin Sebor wrote:
> > With normal -Wformat I see all expected warnings in:
> > char *foo (const char *) __attribute__((format_arg(1)));
> > void bar (const char *, ...) __attribute__((format(printf, 1, 2)));
> 
> -Wformat-diag is internal to GCC and needs one of the GCC-internal
> attributes to enable, like __gcc_cxxdiag__, for example like this:
> 
>   __attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
>   void bar (const char *, ...);
> 
> With that it triggers in all the same instances as -Wformat below
> (as near I can tell for a modified test case).

Glad to hear that, but then I don't understand why we didn't warn on
cp/error.cc before Martin L.'s change when --disable-nls wasn't used.

Jakub



Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access

2022-01-20 Thread Ard Biesheuvel via Gcc-patches
On Wed, 19 Jan 2022 at 18:02, Ard Biesheuvel  wrote:
>
> On Wed, 19 Jan 2022 at 17:54, Kyrylo Tkachov  wrote:
> >
> > Hi Ard,
> >
> > > -Original Message-
> > > From: Gcc-patches  > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Ard
> > > Biesheuvel via Gcc-patches
> > > Sent: Monday, November 15, 2021 6:04 PM
> > > To: linux-harden...@vger.kernel.org
> > > Cc: Richard Sandiford ;
> > > thomas.preudho...@celest.fr; Keith Packard ;
> > > gcc-patches@gcc.gnu.org; Kyrylo Tkachov ; Ard
> > > Biesheuvel 
> > > Subject: [PATCH v5 1/1] [ARM] Add support for TLS register based stack
> > > protector canary access
> > >
> > > Add support for accessing the stack canary value via the TLS register,
> > > so that multiple threads running in the same address space can use
> > > distinct canary values. This is intended for the Linux kernel running in
> > > SMP mode, where processes entering the kernel are essentially threads
> > > running the same program concurrently: using a global variable for the
> > > canary in that context is problematic because it can never be rotated,
> > > and so the OS is forced to use the same value as long as it remains up.
> > >
> > > Using the TLS register to index the stack canary helps with this, as it
> > > allows each CPU to context switch the TLS register along with the rest
> > > of the process, permitting each process to use its own value for the
> > > stack canary.
> >
> > I've tested this patch on an arm-none-linux-gnueabihf target and the 
> > results look clean.
> > Have you tested this patch with a kernel build as well? (since the 
> > functionality is intended for that use).
>
> Of course.
>
> > If so, the patch is okay but please rebase it and repost so that we can 
> > commit it taking into account
> >
>
> Will do.

I have sent out my v6 - please let me know if there is anything else I
need to do to get this landed.

Thanks,
Ard.


Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Martin Sebor via Gcc-patches

On 1/20/22 09:43, Jakub Jelinek wrote:

On Thu, Jan 20, 2022 at 09:33:30AM -0700, Martin Sebor wrote:

Oh, and one more thing, but this time not about this source file but about
the warning.  Does it handle the gettext case?
I think -Wformat generally does, gettext has format_arg attribute.
If the warning handles
pp_printf ("", str);
and
pp_printf (cond ? "" : "", str);
and
pp_printf (cond ? "" : "something %s", str);
and
pp_printf (gettext (""), str);
then maybe it should also handle
pp_printf (cond ? gettext ("") : ", str);
and
pp_printf (cond ? gettext ("") : "something %s, str);
too?


-Wformat-diag is part of -Wformat so they both should handle the same
things.  Do you see a difference between what they handle?


With normal -Wformat I see all expected warnings in:
char *foo (const char *) __attribute__((format_arg(1)));
void bar (const char *, ...) __attribute__((format(printf, 1, 2)));


-Wformat-diag is internal to GCC and needs one of the GCC-internal
attributes to enable, like __gcc_cxxdiag__, for example like this:

  __attribute__ ((format (__gcc_cxxdiag__, 1, 2)))
  void bar (const char *, ...);

With that it triggers in all the same instances as -Wformat below
(as near I can tell for a modified test case).

Martin



void
baz (int x)
{
   bar ("%ld", x);
   bar (x ? "%ld" : "%ld", x);
   bar (x ? "%ld" : "%lld", x);
   bar (foo ("%ld"), x);
   bar (x ? foo ("%ld") : "%ld", x);
   bar (x ? foo ("%ld") : "%lld", x);
   bar (foo (x ? "%ld" : "%ld"), x);
   bar (foo (x ? "%ld" : "%lld"), x);
}
(on all bar calls, on those with different strings or one in foo and other
not 2).
 From the fact that -Wformat-diag didn't warn on the
pp_printf (cond ? gettext ("") : ", str);
case I assume -Wformat-diag doesn't handle this.

Jakub





[PATCH] preprocessor: -Wbidi-chars and UCNs [PR104030]

2022-01-20 Thread Marek Polacek via Gcc-patches
Stephan Bergmann reported that our -Wbidi-chars breaks the build
of LibreOffice because we warn about UCNs even when their usage
is correct: LibreOffice constructs strings piecewise, as in:

  aText = u"\u202D" + aText;

and warning about that is overzealous.  Since no editor (AFAIK)
interprets UCNs to show them as Unicode characters, there's less
risk in misinterpreting them, and so perhaps we shouldn't warn
about them by default.  However, identifiers containing UCNs or
programs generating other programs could still cause confusion,
so I'm keeping the UCN checking.  To turn it on, you just need
to use -Wbidi-chars=unpaired,ucn or -Wbidi-chars=any,ucn.

The implementation is done by hardcoding and therefore ugly, but
my attempts to do something better quickly failed: this option is
marked as CPP, therefore needs Var and Init, and in turn Enum, etc.
And removing CPP doesn't sound like a great option.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR preprocessor/104030

gcc/c-family/ChangeLog:

* c.opt (Wbidi-chars): Also accept =any,ucn, =ucn,any,
=unpaired,ucn, and =ucn,unpaired.

gcc/ChangeLog:

* doc/invoke.texi: Update documentation for -Wbidi-chars.

libcpp/ChangeLog:

* include/cpplib.h (enum cpp_bidirectional_level): Add
bidirectional_unpaired_ucn and bidirectional_any_ucn enumerators.
* lex.cc (maybe_warn_bidi_on_close): Don't warn about UCNs
unless UCN checking is on.
(maybe_warn_bidi_on_char): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/Wbidi-chars-10.c: Turn on UCN checking.
* c-c++-common/Wbidi-chars-11.c: Likewise.
* c-c++-common/Wbidi-chars-14.c: Likewise.
* c-c++-common/Wbidi-chars-16.c: Likewise.
* c-c++-common/Wbidi-chars-17.c: Likewise.
* c-c++-common/Wbidi-chars-4.c: Likewise.
* c-c++-common/Wbidi-chars-5.c: Likewise.
* c-c++-common/Wbidi-chars-6.c: Likewise.
* c-c++-common/Wbidi-chars-7.c: Likewise.
* c-c++-common/Wbidi-chars-8.c: Likewise.
* c-c++-common/Wbidi-chars-9.c: Likewise.
* c-c++-common/Wbidi-chars-ranges.c: Likewise.
* c-c++-common/Wbidi-chars-18.c: New test.
* c-c++-common/Wbidi-chars-19.c: New test.
* c-c++-common/Wbidi-chars-20.c: New test.
* c-c++-common/Wbidi-chars-21.c: New test.
---
 gcc/c-family/c.opt  | 14 +-
 gcc/doc/invoke.texi |  8 ++--
 gcc/testsuite/c-c++-common/Wbidi-chars-10.c |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-11.c |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-14.c |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-16.c |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-17.c |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-18.c | 11 +++
 gcc/testsuite/c-c++-common/Wbidi-chars-19.c | 11 +++
 gcc/testsuite/c-c++-common/Wbidi-chars-20.c | 11 +++
 gcc/testsuite/c-c++-common/Wbidi-chars-21.c | 11 +++
 gcc/testsuite/c-c++-common/Wbidi-chars-4.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-5.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-6.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-7.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-8.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-9.c  |  2 +-
 gcc/testsuite/c-c++-common/Wbidi-chars-ranges.c |  2 +-
 libcpp/include/cpplib.h |  5 -
 libcpp/lex.cc   | 14 +-
 20 files changed, 88 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wbidi-chars-18.c
 create mode 100644 gcc/testsuite/c-c++-common/Wbidi-chars-19.c
 create mode 100644 gcc/testsuite/c-c++-common/Wbidi-chars-20.c
 create mode 100644 gcc/testsuite/c-c++-common/Wbidi-chars-21.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index db65c14a7a5..f829656fc36 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -380,7 +380,7 @@ C ObjC C++ ObjC++ Warning Alias(Wbidi-chars=,any,none)
 
 Wbidi-chars=
 C ObjC C++ ObjC++ RejectNegative Joined Warning CPP(cpp_warn_bidirectional) 
CppReason(CPP_W_BIDIRECTIONAL) Var(warn_bidirectional) 
Init(bidirectional_unpaired) Enum(cpp_bidirectional_level)
--Wbidi-chars=[none|unpaired|any] Warn about UTF-8 bidirectional control 
characters.
+-Wbidi-chars=[none|unpaired|any|unpaired,ucn|any,ucn] Warn about UTF-8 
bidirectional control characters.
 
 ; Required for these enum values.
 SourceInclude
@@ -398,6 +398,18 @@ Enum(cpp_bidirectional_level) String(unpaired) 
Value(bidirectional_unpaired)
 EnumValue
 Enum(cpp_bidirectional_level) String(any) Value(bidirectional_any)
 
+EnumValue
+Enum(cpp_bidirectional_level) String(any,ucn) Value(bidirectional_any_ucn)
+
+EnumValue
+Enum(cpp_bidirectional_level) String(ucn,any) Value(bidirectional_any_ucn)
+
+EnumValue
+Enum(cpp_bidirectional_level) String(unpaired,ucn) 

Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 09:33:30AM -0700, Martin Sebor wrote:
> > Oh, and one more thing, but this time not about this source file but about
> > the warning.  Does it handle the gettext case?
> > I think -Wformat generally does, gettext has format_arg attribute.
> > If the warning handles
> >pp_printf ("", str);
> > and
> >pp_printf (cond ? "" : "", str);
> > and
> >pp_printf (cond ? "" : "something %s", str);
> > and
> >pp_printf (gettext (""), str);
> > then maybe it should also handle
> >pp_printf (cond ? gettext ("") : ", str);
> > and
> >pp_printf (cond ? gettext ("") : "something %s, str);
> > too?
> 
> -Wformat-diag is part of -Wformat so they both should handle the same
> things.  Do you see a difference between what they handle?

With normal -Wformat I see all expected warnings in:
char *foo (const char *) __attribute__((format_arg(1)));
void bar (const char *, ...) __attribute__((format(printf, 1, 2)));

void
baz (int x)
{
  bar ("%ld", x);
  bar (x ? "%ld" : "%ld", x);
  bar (x ? "%ld" : "%lld", x);
  bar (foo ("%ld"), x);
  bar (x ? foo ("%ld") : "%ld", x);
  bar (x ? foo ("%ld") : "%lld", x);
  bar (foo (x ? "%ld" : "%ld"), x);
  bar (foo (x ? "%ld" : "%lld"), x);
}
(on all bar calls, on those with different strings or one in foo and other
not 2).
>From the fact that -Wformat-diag didn't warn on the
pp_printf (cond ? gettext ("") : ", str);
case I assume -Wformat-diag doesn't handle this.

Jakub



Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Martin Sebor via Gcc-patches

On 1/20/22 03:28, Jakub Jelinek wrote:

On Thu, Jan 20, 2022 at 11:17:28AM +0100, Jakub Jelinek via Gcc-patches wrote:

--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -768,6 +768,11 @@ class_key_or_enum_as_string (tree t)
  return "struct";
  }
+#if __GNUC__ >= 10
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat-diag"
+#endif
+
  /* Print out a class declaration T under the control of FLAGS,
 in the form `class foo'.  */
@@ -851,6 +856,10 @@ dump_aggr_type (cxx_pretty_printer *pp, tree t, int flags)
 flags & ~TFF_TEMPLATE_HEADER);
  }
+#if __GNUC__ >= 10
+#pragma GCC diagnostic pop
+#endif


Oh, and one more thing, but this time not about this source file but about
the warning.  Does it handle the gettext case?
I think -Wformat generally does, gettext has format_arg attribute.
If the warning handles
   pp_printf ("", str);
and
   pp_printf (cond ? "" : "", str);
and
   pp_printf (cond ? "" : "something %s", str);
and
   pp_printf (gettext (""), str);
then maybe it should also handle
   pp_printf (cond ? gettext ("") : ", str);
and
   pp_printf (cond ? gettext ("") : "something %s, str);
too?


-Wformat-diag is part of -Wformat so they both should handle the same
things.  Do you see a difference between what they handle?

Martin



Jakub





[PATCH] s390: Split CCSmode into CCSINT and CCSFP

2022-01-20 Thread Robin Dapp via Gcc-patches
Hi,

this patch splits the CCSmode into an integer and a floating point
variant.  This allows ifcvt to consider floating point compares which
would be rejected before because they could not be reversed.

Bootstrapped and regtested on s390x.

Is it OK?

Regards
 Robin

--

gcc/ChangeLog:

* config/s390/predicates.md: Add CCSINTmode and CCSFPmode.
* config/s390/s390-modes.def (UNORDERED): Likewise.
(CC_MODE): Likewise.
* config/s390/s390.cc (s390_cc_modes_compatible): Likewise.
(s390_match_ccmode_set): Likewise.
(s390_select_ccmode): Likewise.
(s390_branch_condition_mask): Likewise.
(s390_reverse_condition): Likewise.
* config/s390/s390.h (REVERSIBLE_CC_MODE): Likewise.
* config/s390/s390.md: Likewise.
* config/s390/subst.md: Likewise.commit d3d4f8486340ff889de0dd709262f31953c54eda
Author: Robin Dapp 
Date:   Fri Jul 23 11:13:39 2021 +0200

s390: Split CCSmode in CCSINT and CCSFP.

In order to be able to reverse condition codes, CCSmode needs to be
split into CCSINT and CCSFP modes.

Subsequently, we can add CCSFPmode to REVERSIBLE_CC_MODE.

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 33194d3f3d6..ec47416cc1b 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -325,7 +325,8 @@
 case E_CCURmode:
   return GET_CODE (op) == LTU;
 
-case E_CCSmode:
+case E_CCSINTmode:
+case E_CCSFPmode:
   return GET_CODE (op) == UNGT;
 
 case E_CCSRmode:
@@ -370,7 +371,8 @@
 case E_CCURmode:
   return GET_CODE (op) == GEU;
 
-case E_CCSmode:
+case E_CCSINTmode:
+case E_CCSFPmode:
   return GET_CODE (op) == LE;
 
 case E_CCSRmode:
diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index b419907960e..eafe1e12938 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -48,12 +48,12 @@ CCUR: EQ  GTU  LTU NE (CLGF/R)
 
 Signed compares
 
-CCS:  EQ  LT   GT  UNORDERED  (LTGFR, LTGR, LTR, ICM/Y,
-   LTDBR, LTDR, LTEBR, LTER,
+CCSINT: EQLT   GT  UNORDERED  (LTGFR, LTGR, LTR, ICM/Y,
CG/R, C/R/Y, CGHI, CHI,
-   CDB/R, CD/R, CEB/R, CE/R,
-   ADB/R, AEB/R, SDB/R, SEB/R,
SRAG, SRA, SRDA)
+CCSFP:  EQLT   GT  UNORDERED  (CDB/R, CD/R, CEB/R, CE/R,
+   LTDBR, LTDR, LTEBR, LTER,
+   ADB/R, AEB/R, SDB/R, SEB/R)
 CCSR: EQ  GT   LT  UNORDERED  (CGF/R, CH/Y)
 CCSFPS: EQLT   GT  UNORDERED  (KEB/R, KDB/R, KXBR, KDTR,
 		   KXTR, WFK)
@@ -234,7 +234,8 @@ CC_MODE (CCL2);
 CC_MODE (CCL3);
 CC_MODE (CCU);
 CC_MODE (CCUR);
-CC_MODE (CCS);
+CC_MODE (CCSINT);
+CC_MODE (CCSFP);
 CC_MODE (CCSR);
 CC_MODE (CCSFPS);
 CC_MODE (CCT);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index f2e4474df99..0a93b4b39f4 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -1378,11 +1378,13 @@ s390_cc_modes_compatible (machine_mode m1, machine_mode m2)
 {
 case E_CCZmode:
   if (m2 == CCUmode || m2 == CCTmode || m2 == CCZ1mode
-	  || m2 == CCSmode || m2 == CCSRmode || m2 == CCURmode)
+	  || m2 == CCSINTmode || m2 == CCSFPmode
+	  || m2 == CCSRmode || m2 == CCURmode)
 	return m2;
   return VOIDmode;
 
-case E_CCSmode:
+case E_CCSINTmode:
+case E_CCSFPmode:
 case E_CCUmode:
 case E_CCTmode:
 case E_CCSRmode:
@@ -1422,7 +1424,8 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
   switch (set_mode)
 {
 case E_CCZ1mode:
-case E_CCSmode:
+case E_CCSINTmode:
+case E_CCSFPmode:
 case E_CCSRmode:
 case E_CCSFPSmode:
 case E_CCUmode:
@@ -1445,7 +1448,8 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
   break;
 
 case E_CCZmode:
-  if (req_mode != CCSmode && req_mode != CCUmode && req_mode != CCTmode
+  if (req_mode != CCSINTmode && req_mode != CCSFPmode
+	  && req_mode != CCUmode && req_mode != CCTmode
 	  && req_mode != CCSRmode && req_mode != CCURmode
 	  && req_mode != CCZ1mode)
 	return 0;
@@ -1625,7 +1629,12 @@ s390_select_ccmode (enum rtx_code code, rtx op0, rtx op1)
 	if ((GET_CODE (op0) == SIGN_EXTEND || GET_CODE (op0) == ZERO_EXTEND)
 	&& GET_CODE (op1) != CONST_INT)
 	  return CCSRmode;
-	return CCSmode;
+	if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_FLOAT
+	|| GET_MODE_CLASS (GET_MODE (op0)) == MODE_DECIMAL_FLOAT)
+	  return CCSFPmode;
+	else
+	  return CCSINTmode;
+	break;
 
   case LTU:
   case GEU:
@@ -2139,7 

[PATCH] aarch64: allow ld1/stq in test output [PR102517]

2022-01-20 Thread Richard Earnshaw via Gcc-patches

Following the changes to the inline memcpy operations get expanded, we
now generate ld1/st1 using a 128-bit vector register rather than ldp
with Q registers.  The behaviour is equivalent, so relax the tests to
permit either variant.

gcc/testsuite/ChangeLog:

PR target/102517
* gcc.target/aarch64/cpymem-q-reg_1.c: Allow ld1 and st1 for the
memcpy expansion.
---
 gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c b/gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c
index df5f67e425b..45f3f0ad144 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpymem-q-reg_1.c
@@ -10,7 +10,7 @@ foo (void)
   __builtin_memcpy (dst, src, N * sizeof (int));
 }
 
-/* { dg-final { scan-assembler {ldp\tq[0-9]*} } } */
+/* { dg-final { scan-assembler {ldp\tq[0-9]*|ld1\t{v[0-9]*\.16b - v[0-9]*\.16b}} } } */
 /* { dg-final { scan-assembler-not {ldp\tx[0-9]*} } } */
-/* { dg-final { scan-assembler {stp\tq[0-9]*} } } */
+/* { dg-final { scan-assembler {stp\tq[0-9]*|st1\t{v[0-9]*\.16b - v[0-9]*\.16b}} } } */
 /* { dg-final { scan-assembler-not {stp\tx[0-9]*} } } */


[PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Maciej W. Rozycki
RISC-V FMIN and FMAX machine instructions are IEEE-754-conformant[1]:

"For FMIN and FMAX, if at least one input is a signaling NaN, or if both 
inputs are quiet NaNs, the result is the canonical NaN.  If one operand 
is a quiet NaN and the other is not a NaN, the result is the non-NaN 
operand."

as required by our `fminM3' and `fmaxM3' standard RTL patterns.

However we only define `sminM3' and `smaxM3' standard RTL patterns to 
produce the FMIN and FMAX machine instructions, which in turn causes the 
`__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the 
corresponding libcalls rather than the relevant machine instructions.

Rename the `smin3' and `smax3' patterns to `fmin3' and 
`fmax3' respectively then, removing the need to use libcalls for 
IEEE 754 semantics with the minimum and maximum operations.

[1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and 
Propagation", p. 48

gcc/
* config/riscv/riscv.md (smin3): Rename pattern to...
(fmin3): ... this.
(smax3): Likewise...
(fmax3): ... this.
---
Hi,

 It's not clear to me how it's been missed or whether there is anything I 
might be actually missing.  It looks to me like a clear oversight however. 
And in any case this change has passed full GCC regression testing (except 
for the D frontend, which has stopped being built recently due to a defect 
in Debian I haven't yet got to getting fixed) with the `riscv64-linux-gnu' 
target using the HiFive Unmatched (U74 CPU) target board, so it seems to 
be doing the right thing.

 Timing might a bit unfortunate for this submission and given that it is 
not a regression fix I guess this is GCC 13 material.  Please let me know 
otherwise.

 In any case OK to apply (when the time comes)?

  Maciej
---
 gcc/config/riscv/riscv.md |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-fmin-fmax.diff
Index: gcc/gcc/config/riscv/riscv.md
===
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -1214,7 +1214,7 @@
 ;;
 ;;  
 
-(define_insn "smin3"
+(define_insn "fmin3"
   [(set (match_operand:ANYF0 "register_operand" "=f")
(smin:ANYF (match_operand:ANYF 1 "register_operand" " f")
   (match_operand:ANYF 2 "register_operand" " f")))]
@@ -1223,7 +1223,7 @@
   [(set_attr "type" "fmove")
(set_attr "mode" "")])
 
-(define_insn "smax3"
+(define_insn "fmax3"
   [(set (match_operand:ANYF0 "register_operand" "=f")
(smax:ANYF (match_operand:ANYF 1 "register_operand" " f")
   (match_operand:ANYF 2 "register_operand" " f")))]


Re: [PATCH] middle-end/100786 - constant folding from incompatible alias

2022-01-20 Thread Richard Biener via Gcc-patches
On Thu, 20 Jan 2022, Richard Biener wrote:

> On Thu, 20 Jan 2022, Jakub Jelinek wrote:
> 
> > On Thu, Jan 20, 2022 at 02:58:21PM +0100, Richard Biener via Gcc-patches 
> > wrote:
> > > The following avoids us ICEing doing constant folding from variables
> > > with aliases of different types.  The formerly used fold_convert
> > > wasn't entirely correct even for the cases it handled and using
> > > a VIEW_CONVERT_EXPR avoids the ICE.  Reading from a larger alias
> > > will cause unfolded constants to appear but appearantly we handle
> > > that just "fine".
> > > 
> > >   b.0_1 = VIEW_CONVERT_EXPR(1);
> > 
> > If they have the same sizes, why not, but doesn't int have
> > different size from double and isn't VCE defined only for same sizes?
> 
> Well yes, it's undefined.  IL wise we only constrain us for SSA
> operands, not constants.  But the whole testcase is undefined,
> and previously we'd happily accept a int -1 as a -1 long by
> sign-extending it.
> 
> I'm going to test an alternative patch tackling get_symbol_constant_value
> which is only used from CCP (which looks suffering from the same issue)
> and folding.

Like the following.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

>From cc3f3c7428253c2326a00699f08bd89467b941f5 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 20 Jan 2022 14:25:51 +0100
Subject: [PATCH] middle-end/100786 - constant folding from incompatible alias
To: gcc-patches@gcc.gnu.org

The following avoids us ICEing doing constant folding from variables
with aliases of different types.  The issue appears both in
folding and CCP and FRE can do more fancy stuff to still constant
fold cases where the load is smaller than the initializer so
defer it to there.

2022-01-20  Richard Biener  

PR middle-end/100786
* gimple-fold.cc (get_symbol_constant_value): Only return
values of compatible type to the symbol.

* gcc.dg/torture/pr100786.c: New testcase.
---
 gcc/gimple-fold.cc  | 4 +++-
 gcc/testsuite/gcc.dg/torture/pr100786.c | 9 +
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr100786.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 08d3cc214ff..3639aa20626 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -291,7 +291,9 @@ get_symbol_constant_value (tree sym)
   if (val)
{
  val = canonicalize_constructor_val (unshare_expr (val), sym);
- if (val && is_gimple_min_invariant (val))
+ if (val
+ && is_gimple_min_invariant (val)
+ && useless_type_conversion_p (TREE_TYPE (sym), TREE_TYPE (val)))
return val;
  else
return NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/torture/pr100786.c 
b/gcc/testsuite/gcc.dg/torture/pr100786.c
new file mode 100644
index 000..42f4e485593
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr100786.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+const double a = 0;
+extern int b __attribute__((alias("a")));
+void inc() { b++; }
+
+const int a2 = 0;
+extern double b2 __attribute__((alias("a2")));
+void inc2() { b2+=1; }
-- 
2.31.1



Re: [PATCH] middle-end/100786 - constant folding from incompatible alias

2022-01-20 Thread Richard Biener via Gcc-patches
On Thu, 20 Jan 2022, Jakub Jelinek wrote:

> On Thu, Jan 20, 2022 at 02:58:21PM +0100, Richard Biener via Gcc-patches 
> wrote:
> > The following avoids us ICEing doing constant folding from variables
> > with aliases of different types.  The formerly used fold_convert
> > wasn't entirely correct even for the cases it handled and using
> > a VIEW_CONVERT_EXPR avoids the ICE.  Reading from a larger alias
> > will cause unfolded constants to appear but appearantly we handle
> > that just "fine".
> > 
> >   b.0_1 = VIEW_CONVERT_EXPR(1);
> 
> If they have the same sizes, why not, but doesn't int have
> different size from double and isn't VCE defined only for same sizes?

Well yes, it's undefined.  IL wise we only constrain us for SSA
operands, not constants.  But the whole testcase is undefined,
and previously we'd happily accept a int -1 as a -1 long by
sign-extending it.

I'm going to test an alternative patch tackling get_symbol_constant_value
which is only used from CCP (which looks suffering from the same issue)
and folding.

Richard.


Re: [PATCH] middle-end/100786 - constant folding from incompatible alias

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 02:58:21PM +0100, Richard Biener via Gcc-patches wrote:
> The following avoids us ICEing doing constant folding from variables
> with aliases of different types.  The formerly used fold_convert
> wasn't entirely correct even for the cases it handled and using
> a VIEW_CONVERT_EXPR avoids the ICE.  Reading from a larger alias
> will cause unfolded constants to appear but appearantly we handle
> that just "fine".
> 
>   b.0_1 = VIEW_CONVERT_EXPR(1);

If they have the same sizes, why not, but doesn't int have
different size from double and isn't VCE defined only for same sizes?

Jakub



[PATCH] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-01-20 Thread Andrew Stubbs
This patch adds a new predefined allocator named ompx_pinned_mem_alloc 
as an extension to the OpenMP standard. It is intended as a convenient 
way to allocate pinned memory using the Linux support patch I posted 
recently. I anticipate it being used by compiler internals in future as 
part of a project to improve performance of existing code (such as 
benchmarks!) It is equivalent to a custom allocator with the pinned 
trait and a null fallback trait.


The name uses the "ompx" extension namespace proposed for OpenMP 5.2, 
rather than a "gomp" prefix, say, so that it can also be used by other 
projects implementing OpenMP in other toolchains (this requirement comes 
from our client who is funding this work).


OK for stage 1?

Andrewlibgomp, openmp: Add ompx_pinned_mem_alloc

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.

libgomp/ChangeLog:

* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 5ab161b6314..b1f41ccc0d4 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -32,7 +32,7 @@
 #include 
 #include 
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -64,6 +64,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] 
= {
   omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
 };
 
 struct omp_allocator_data
@@ -334,8 +335,11 @@ retry:
= (allocator_data
   ? allocator_data->memspace
   : predefined_alloc_mapping[allocator]);
-  ptr = MEMSPACE_ALLOC (memspace, new_size,
-   allocator_data && allocator_data->pinned);
+  int pinned __attribute__((unused))
+   = (allocator_data
+  ? allocator_data->pinned
+  : allocator == ompx_pinned_mem_alloc);
+  ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
   if (ptr == NULL)
goto fail;
 }
@@ -355,7 +359,8 @@ retry:
 fail:
   int fallback = (allocator_data
  ? allocator_data->fallback
- : allocator == omp_default_mem_alloc
+ : (allocator == omp_default_mem_alloc
+|| allocator == ompx_pinned_mem_alloc)
  ? omp_atv_null_fb
  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -442,7 +447,10 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
   pinned = allocator_data->pinned;
 }
   else
-memspace = predefined_alloc_mapping[data->allocator];
+{
+  memspace = predefined_alloc_mapping[data->allocator];
+  pinned = (data->allocator == ompx_pinned_mem_alloc);
+}
 
   MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
 }
@@ -553,8 +561,11 @@ retry:
= (allocator_data
   ? allocator_data->memspace
   : predefined_alloc_mapping[allocator]);
-  ptr = MEMSPACE_CALLOC (memspace, new_size,
-allocator_data && allocator_data->pinned);
+  int pinned __attribute__((unused))
+   = (allocator_data
+  ? allocator_data->pinned
+  : allocator == ompx_pinned_mem_alloc);
+  ptr = MEMSPACE_CALLOC (memspace, new_size, pinned);
   if (ptr == NULL)
goto fail;
 }
@@ -574,7 +585,8 @@ retry:
 fail:
   int fallback = (allocator_data
  ? allocator_data->fallback
- : allocator == omp_default_mem_alloc
+ : (allocator == omp_default_mem_alloc
+|| allocator == ompx_pinned_mem_alloc)
  ? omp_atv_null_fb
  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -719,11 +731,15 @@ retry:
   gomp_mutex_unlock (_data->lock);
 #endif
   if (prev_size)
-   new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-   data->size, new_size,
-   (free_allocator_data
-&& free_allocator_data->pinned),
- 

[PATCH] middle-end/100786 - constant folding from incompatible alias

2022-01-20 Thread Richard Biener via Gcc-patches
The following avoids us ICEing doing constant folding from variables
with aliases of different types.  The formerly used fold_convert
wasn't entirely correct even for the cases it handled and using
a VIEW_CONVERT_EXPR avoids the ICE.  Reading from a larger alias
will cause unfolded constants to appear but appearantly we handle
that just "fine".

  b.0_1 = VIEW_CONVERT_EXPR(1);

There is no obvious spot and way to just disable the constant folding
so I've not attempted to do that.

Boostrapped on x86_64-unknown-linux-gnu, testing in progress.

2022-01-20  Richard Biener  

PR middle-end/100786
* gimple-fold.cc (fold_stmt_1): Use a VIEW_CONVERT_EXPR on type
mismatches.

* gcc.dg/torture/pr100786.c: New testcase.
---
 gcc/gimple-fold.cc  | 2 +-
 gcc/testsuite/gcc.dg/torture/pr100786.c | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr100786.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 08d3cc214ff..a4f1fdabc18 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6264,7 +6264,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
if (new_rhs
&& !useless_type_conversion_p (TREE_TYPE (lhs),
   TREE_TYPE (new_rhs)))
- new_rhs = fold_convert (TREE_TYPE (lhs), new_rhs);
+ new_rhs = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), new_rhs);
if (new_rhs
&& (!inplace
|| get_gimple_rhs_num_ops (TREE_CODE (new_rhs)) < old_num_ops))
diff --git a/gcc/testsuite/gcc.dg/torture/pr100786.c 
b/gcc/testsuite/gcc.dg/torture/pr100786.c
new file mode 100644
index 000..42f4e485593
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr100786.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+const double a = 0;
+extern int b __attribute__((alias("a")));
+void inc() { b++; }
+
+const int a2 = 0;
+extern double b2 __attribute__((alias("a2")));
+void inc2() { b2+=1; }
-- 
2.31.1


[PATCH] target/100784 - avoid ICE with folding __builtin_ia32_shufpd

2022-01-20 Thread Richard Biener via Gcc-patches
This avoids ICEing when there is no LHS on the call by following
what foldings of other builtins do in , namely not folding.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Will push as obvious.

Richard.

2022-01-20  Richard Biener  

PR target/100784
* config/i386/i386.cc (ix86_gimple_fold_builtin): Check for
LHS before folding __builtin_ia32_shufpd and friends.
---
 gcc/config/i386/i386.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index af828868205..ad5a5caa413 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18710,7 +18710,7 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
gimple_call_arg (stmt, n_args - 1)))
break;
   arg2 = gimple_call_arg (stmt, 2);
-  if (TREE_CODE (arg2) == INTEGER_CST)
+  if (TREE_CODE (arg2) == INTEGER_CST && gimple_call_lhs (stmt))
{
  unsigned HOST_WIDE_INT shuffle_mask = TREE_INT_CST_LOW (arg2);
  /* Check valid imm, refer to gcc.target/i386/testimm-10.c.  */
-- 
2.31.1


[committed] libstdc++: Use Clang attribute instead of __constinit

2022-01-20 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


Clang doesn't support the __constinit extension that we use pre-C++20,
but it does have its own equivalent attribute that can be used instead.

This makes it a little easier to use Clang to build libstdc++ (which
isn't supported. but is sometimes attempted for esoteric targets).

libstdc++-v3/ChangeLog:

* src/c++11/cxx11-ios_failure.cc (__constinit): Define as
equivalent attribute for Clang.
* src/c++11/future.cc (__constinit): Likewise.
* src/c++11/system_error.cc (__constinit): Likewise.
* src/c++17/memory_resource.cc (__constinit): Likewise.
---
 libstdc++-v3/src/c++11/cxx11-ios_failure.cc | 4 
 libstdc++-v3/src/c++11/future.cc| 4 
 libstdc++-v3/src/c++11/system_error.cc  | 4 
 libstdc++-v3/src/c++17/memory_resource.cc   | 4 
 4 files changed, 16 insertions(+)

diff --git a/libstdc++-v3/src/c++11/cxx11-ios_failure.cc 
b/libstdc++-v3/src/c++11/cxx11-ios_failure.cc
index ba4b1413bf9..14a7f9cfcb1 100644
--- a/libstdc++-v3/src/c++11/cxx11-ios_failure.cc
+++ b/libstdc++-v3/src/c++11/cxx11-ios_failure.cc
@@ -42,6 +42,10 @@
 # error This file should not be compiled for this configuration.
 #endif
 
+#if __has_cpp_attribute(clang::require_constant_initialization)
+#  define __constinit [[clang::require_constant_initialization]]
+#endif
+
 namespace
 {
   struct io_error_category final : std::error_category
diff --git a/libstdc++-v3/src/c++11/future.cc b/libstdc++-v3/src/c++11/future.cc
index 488ff17a1e6..c52c057ba1d 100644
--- a/libstdc++-v3/src/c++11/future.cc
+++ b/libstdc++-v3/src/c++11/future.cc
@@ -25,6 +25,10 @@
 #include 
 #include 
 
+#if __has_cpp_attribute(clang::require_constant_initialization)
+#  define __constinit [[clang::require_constant_initialization]]
+#endif
+
 namespace
 {
   struct future_error_category final : public std::error_category
diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index 789f2b45382..7b1a5a20637 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -37,6 +37,10 @@
 #include 
 #endif
 
+#if __has_cpp_attribute(clang::require_constant_initialization)
+#  define __constinit [[clang::require_constant_initialization]]
+#endif
+
 namespace
 {
   using std::string;
diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 5cdb35df2cd..bb6334c9694 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -32,6 +32,10 @@
 # include // std::__exchange
 #endif
 
+#if __has_cpp_attribute(clang::require_constant_initialization)
+#  define __constinit [[clang::require_constant_initialization]]
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
-- 
2.31.1



Re: [committed] libstdc++: Fix libbacktrace build files

2022-01-20 Thread Jonathan Wakely via Gcc-patches
On Wed, 19 Jan 2022 at 15:00, Jonathan Wakely wrote:
>
> Tested x86_64-linux, pushed to trunk.
>
>
> This makes it possible to combine --enable-libstdcxx-debug with
> --enable-libstdcxx-backtrace, by adding a rule to src/Makefile to copy
> the backtrace-supported.h header into the src/debug/libbacktrace
> directory.
>
> Add libbacktrace path to testsuite flags so the tests can link without
> having the library installed.
>
> Also fix some warnings when running automake for the libbacktrace
> makefile.
>
> Use a per-library CPPFLAGS variable to fix:
>
> src/libbacktrace/Makefile.am:38: warning: AM_CPPFLAGS multiply defined in 
> condition TRUE ...
> fragment.am:43: ... 'AM_CPPFLAGS' previously defined here
> src/libbacktrace/Makefile.am:32:   'fragment.am' included from here
>
> Create symlinks to the libbacktrace sources to fix:
>
> src/libbacktrace/Makefile.am:55: warning: source file 
> '../../../libbacktrace/atomic.c' is in a subdirectory,
> src/libbacktrace/Makefile.am:55: but option 'subdir-objects' is disabled
>
> libstdc++-v3/ChangeLog:
>
> * scripts/testsuite_flags.in: Add src/libbacktrace/.libs to
> linker search paths.
> * src/Makefile.am: Fix src/debug/libbacktrace build.
> * src/Makefile.in: Regenerate.
> * src/libbacktrace/Makefile.am: Use per-library CPPFLAGS
> variable. Use symlinks for the source files.
> * src/libbacktrace/Makefile.in: Regenerate.
> ---
>  libstdc++-v3/scripts/testsuite_flags.in   |   3 +-
>  libstdc++-v3/src/Makefile.am  |  12 +-
>  libstdc++-v3/src/Makefile.in  |   9 +-
>  libstdc++-v3/src/libbacktrace/Makefile.am |  56 ++
>  libstdc++-v3/src/libbacktrace/Makefile.in | 128 +-
>  5 files changed, 132 insertions(+), 76 deletions(-)
>
> diff --git a/libstdc++-v3/scripts/testsuite_flags.in 
> b/libstdc++-v3/scripts/testsuite_flags.in
> index cf7f0f7411e..40dd3d3465e 100755
> --- a/libstdc++-v3/scripts/testsuite_flags.in
> +++ b/libstdc++-v3/scripts/testsuite_flags.in
> @@ -78,7 +78,8 @@ case ${query} in
>;;
>  --cxxldflags)
>SECTIONLDFLAGS="@SECTION_LDFLAGS@ @LIBICONV@
> -  -L${BUILD_DIR}/src/filesystem/.libs"
> +  -L${BUILD_DIR}/src/filesystem/.libs
> +  -L${BUILD_DIR}/src/libbacktrace/.libs"
>echo ${SECTIONLDFLAGS}
>;;
>  *)

This part broke nearly every test on macOS, because the linker warns
about unknown paths.

The src/filesystem/.libs dir only exists for
--enable-libstdcxx-filesystem-ts (which is enabled by default on
macOS) and src/libbacktrace/.libs only exists for
--enable-libstdcxx-backtrace (which is disabled by default on all
targets). The src/filesystem/.libs part has been there for years, so
must have been a latent problem on macOS.

Fixed by this patch, tested powerpc64le-linux and pushed to trunk.
commit 5929f253fcdbf24fd47706dd11aafdeac5e9ecb6
Author: Jonathan Wakely 
Date:   Thu Jan 20 11:15:27 2022

libstdc++: Only add valid -L paths to testsuite linker options

The MacOS linker warns about -L arguments that don't exist, which causes
all tests to fail for the defauly configuration (because libbacktrace
isn't built).

libstdc++-v3/ChangeLog:

* scripts/testsuite_flags.in: Only add src/filesystem/.libs and
src/libbacktrace/.libs to LDFLAGS if those directories exist.

diff --git a/libstdc++-v3/scripts/testsuite_flags.in 
b/libstdc++-v3/scripts/testsuite_flags.in
index 40dd3d3465e..18748f0f9ce 100755
--- a/libstdc++-v3/scripts/testsuite_flags.in
+++ b/libstdc++-v3/scripts/testsuite_flags.in
@@ -77,9 +77,15 @@ case ${query} in
   echo ${PCHFLAGS}
   ;;
 --cxxldflags)
-  SECTIONLDFLAGS="@SECTION_LDFLAGS@ @LIBICONV@
-  -L${BUILD_DIR}/src/filesystem/.libs
-  -L${BUILD_DIR}/src/libbacktrace/.libs"
+  FS_LDFLAGS=
+  BT_LDFLAGS=
+  if [ -d ${BUILD_DIR}/src/filesystem/.libs ]; then
+FS_LDFLAGS=-L${BUILD_DIR}/src/filesystem/.libs
+  fi
+  if [ -d ${BUILD_DIR}/src/libbacktrace/.libs ]; then
+BT_LDFLAGS=-L${BUILD_DIR}/src/libbacktrace/.libs
+  fi
+  SECTIONLDFLAGS="@SECTION_LDFLAGS@ @LIBICONV@ $FS_LDFLAGS $BT_LDFLAGS"
   echo ${SECTIONLDFLAGS}
   ;;
 *)


Re: [PING^3][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-20 Thread Richard Sandiford via Gcc-patches
Thanks for the patch and sorry for the (very) slow review.

Dan Li  writes:
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 007b928c54b..9b3a35c06bf 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -56,6 +56,8 @@ static tree handle_cold_attribute (tree *, tree, tree, int, 
> bool *);
>   static tree handle_no_sanitize_attribute (tree *, tree, tree, int, bool *);
>   static tree handle_no_sanitize_address_attribute (tree *, tree, tree,
> int, bool *);
> +static tree handle_no_sanitize_shadow_call_stack_attribute (tree *, tree,
> +   tree, int, bool *);
>   static tree handle_no_sanitize_thread_attribute (tree *, tree, tree,
>int, bool *);
>   static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
> @@ -454,6 +456,10 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_no_sanitize_attribute, NULL },
> { "no_sanitize_address",0, 0, true, false, false, false,
> handle_no_sanitize_address_attribute, NULL },
> +  { "no_sanitize_shadow_call_stack",
> +   0, 0, true, false, false, false,
> +   handle_no_sanitize_shadow_call_stack_attribute,
> +   NULL },
> { "no_sanitize_thread", 0, 0, true, false, false, false,
> handle_no_sanitize_thread_attribute, NULL },
> { "no_sanitize_undefined",  0, 0, true, false, false, false,
> @@ -1175,6 +1181,21 @@ handle_no_sanitize_address_attribute (tree *node, tree 
> name, tree, int,
> return NULL_TREE;
>   }
>   
> +/* Handle a "no_sanitize_shadow_call_stack" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +static tree
> +handle_no_sanitize_shadow_call_stack_attribute (tree *node, tree name,
> +   tree, int, bool *no_add_attrs)
> +{
> +  *no_add_attrs = true;
> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> +warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  else
> +add_no_sanitize_value (*node, SANITIZE_SHADOW_CALL_STACK);
> +
> +  return NULL_TREE;
> +}
> +

Do we need this?  I think these days the preference is to use the
general "no_sanitize" attribute with an argument, which is also the
syntax documented on the clang page.

We have to support no_sanitize_foo attributes for some of the early
sanitiser features, to avoid breaking backwards compatibility, but it
doesn't look like clang ever supported no_sanitize_shadow_call_stack.

>   /* Handle a "no_sanitize_thread" attribute; arguments as in
>  struct attribute_spec.handler.  */
>   
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 768e8fae136..150c015df21 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -893,6 +893,7 @@ void aarch64_register_pragmas (void);
>   void aarch64_relayout_simd_types (void);
>   void aarch64_reset_previous_fndecl (void);
>   bool aarch64_return_address_signing_enabled (void);
> +bool aarch64_shadow_call_stack_enabled (void);
>   bool aarch64_bti_enabled (void);
>   void aarch64_save_restore_target_globals (tree);
>   void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 699c105a42a..5a36a459f4e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -79,6 +79,7 @@
>   #include "tree-ssa-loop-niter.h"
>   #include "fractional-cost.h"
>   #include "rtlanal.h"
> +#include "asan.h"
>   
>   /* This file should be included last.  */
>   #include "target-def.h"
> @@ -7799,6 +7800,24 @@ aarch64_return_address_signing_enabled (void)
> && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
>   }
>   
> +/* Return TRUE if shadow call stack should be enabled for the current
> +   function, otherwise return FALSE.  */
> +
> +bool
> +aarch64_shadow_call_stack_enabled (void)
> +{
> +  /* This function should only be called after frame laid out.  */
> +  gcc_assert (cfun->machine->frame.laid_out);
> +
> +  if (crtl->calls_eh_return)
> +return false;
> +
> +  /* We only deal with a function if its LR is pushed onto stack
> + and attribute no_sanitize_shadow_call_stack is not specified.  */

(This would need to be updated if we do drop the specific attribute.)

> +  return (sanitize_flags_p (SANITIZE_SHADOW_CALL_STACK)
> +   && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0));
> +}
> +
>   /* Return TRUE if Branch Target Identification Mechanism is enabled.  */
>   bool
>   aarch64_bti_enabled (void)
> @@ -8810,6 +8829,10 @@ aarch64_expand_prologue (void)
> RTX_FRAME_RELATED_P (insn) = 1;
>   }
>   
> +  /* Push return address to shadow call stack.  */
> +  if 

[PATCH 7/7] arm: Add test for AES erratum mitigation

2022-01-20 Thread Richard Earnshaw via Gcc-patches

Add a testcase for the erratum mitigation.  To improve coverage
use -dp on the assembler output and match the pattern names (and where
needed the alternative number).

gcc/testsuite/ChangeLog:

* gcc.target/arm/crypto-vaese-erratum1.c: New test.
---
 .../gcc.target/arm/crypto-vaese-erratum1.c| 28 +++
 1 file changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/crypto-vaese-erratum1.c

diff --git a/gcc/testsuite/gcc.target/arm/crypto-vaese-erratum1.c b/gcc/testsuite/gcc.target/arm/crypto-vaese-erratum1.c
new file mode 100644
index 000..3f16688a8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/crypto-vaese-erratum1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-options "-O2 -mfix-cortex-a57-aes-1742098 -dp" } */
+/* { dg-add-options arm_crypto } */
+
+#include "arm_neon.h"
+
+uint8x16_t
+foo (uint8x16_t v)
+{
+  const uint8x16_t key1 = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			   0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+  const uint8x16_t key2 = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+			   0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f};
+  int i;
+
+  for (i = 0; i < 16; ++i)
+{
+  v = vaesmcq_u8 (vaeseq_u8 (v, key1));
+  v = vaesmcq_u8 (vaeseq_u8 (v, key2));
+}
+  return v;
+}
+
+/* { dg-final { scan-assembler "aese.8\tq\[0-9\]+, q\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "aes_op_protect/2" 2} } */
+/* { dg-final { scan-assembler-times "aes_op_protect/0" 1} } */
+/* { dg-final { scan-assembler-times "(?:aesmc|aese_fused)_protected" 1} } */


[PATCH 6/7] arm: elide some cases where the AES erratum workaround is not required.

2022-01-20 Thread Richard Earnshaw via Gcc-patches

Some common cases where the AES erratum workaround are not required
are when there are 64- or 128-bit loads from memory, moving a 128-bit
value from core registers, and where a 128-bit constant is being
loaded from a literal pool.  The loads may also be misaligned or
generated via a neon intrinsic function.

gcc/ChangeLog:

* config/arm/crypto.md (aes_op_protect): Allow moves from core
registers and from memory.
(aes_op_protect_misalign_load): New pattern.
(aes_op_protect_neon_vld1v16qi): New pattern.
---
 gcc/config/arm/crypto.md | 55 ++--
 1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index df857352382..4c785073028 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -62,17 +62,56 @@ (define_insn "*crypto__insn"
   [(set_attr "type" "")]
 )
 
-; Mitigate against AES erratum on Cortex-A57 and Cortex-A72 by performing
-; a 128-bit operation on an operand producer.  This can be eliminated only
-; if we know that the operand was produced by a full-width operation.
-; V16QImode matches  for the AES instructions.
+;; Mitigate against AES erratum on Cortex-A57 and Cortex-A72 by
+;; performing a 128-bit operation on an operand producer.  This can be
+;; eliminated only if we know that the operand was produced by a
+;; full-width operation.  V16QImode matches  for the AES
+;; instructions.  Handle some very common cases where the source is
+;; known to be safe (transfers from core registers and memory).
 (define_insn "aes_op_protect"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")]
+  [(set (match_operand:V16QI 0 "register_operand" "=w,w,w")
+	(unspec:V16QI [(match_operand:V16QI 1 "general_operand" "w,r,Uni")]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_CRYPTO && fix_aes_erratum_1742098"
+  {
+switch (which_alternative)
+  {
+  case 0: return "vmov\t%q0, %q1";
+  case 1: return "vmov\t%e0, %Q1, %R1  @ V16QI\;vmov\t%f0, %J1, %K1";
+  case 2: return output_move_neon (operands);
+  default: gcc_unreachable ();
+  }
+  }
+  [(set_attr "type" "neon_move_q,neon_from_gp_q,neon_load1_4reg")
+   (set_attr "length" "4,8,8")
+   (set_attr "arm_pool_range" "*,*,1020")
+   (set_attr "thumb2_pool_range" "*,*,1018")
+   (set_attr "neg_pool_range" "*,*,996")]
+)
+
+;; Another safe case is when a movmisalign load is used as the source.
+(define_insn "*aes_op_protect_misalign_load"
+  [(set (match_operand:V16QI 0 "s_register_operand" "=w")
+	(unspec:V16QI
+	 [(unspec:V16QI
+	   [(match_operand:V16QI 1 "neon_permissive_struct_operand" "Um")]
+	   UNSPEC_MISALIGNED_ACCESS)]
 	 UNSPEC_AES_PROTECT))]
   "TARGET_CRYPTO && fix_aes_erratum_1742098"
-  "vmov\\t%q0, %q1"
-  [(set_attr "type" "neon_move_q")]
+  "vld1.8\t%{q0}, %A1"
+  [(set_attr "type" "neon_load1_1reg_q")]
+)
+
+;; Similarly for the vld1 intrinsic
+(define_insn "aes_op_protect_neon_vld1v16qi"
+  [(set (match_operand:V16QI 0 "s_register_operand" "=w")
+(unspec:V16QI
+	 [(unspec:V16QI [(match_operand:V16QI 1 "neon_struct_operand" "Um")]
+   UNSPEC_VLD1)]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_NEON"
+  "vld1.8\t%h0, %A1"
+  [(set_attr "type" "neon_load1_1reg_q")]
 )
 
 ;; An AESMC operation can feed directly into a subsequent AES


[PATCH 5/7] arm: suppress aes erratum when forwarding from aes

2022-01-20 Thread Richard Earnshaw via Gcc-patches

AES operations are commonly chained and since the result of one AES
operation is never a 32-bit value, they do not need an additional
mitigation instruction for the forwarded result.  We handle this
common case by adding additional patterns that allow for this.

gcc/ChangeLog:

* config/arm/crypto.md (crypto__protected):
New pattern.
(aarch32_crypto_aese_fused_protected): Likewise.
(aarch32_crypto_aesd_fused_protected): Likewise.
---
 gcc/config/arm/crypto.md | 50 
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index fbee1829ce8..df857352382 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -75,6 +75,20 @@ (define_insn "aes_op_protect"
   [(set_attr "type" "neon_move_q")]
 )
 
+;; An AESMC operation can feed directly into a subsequent AES
+;; operation without needing mitigation.
+(define_insn "*crypto__protected"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec:
+	 [(unspec:
+	   [(match_operand: 1 "register_operand" "w")]
+	   CRYPTO_AESMC)]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_CRYPTO && fix_aes_erratum_1742098"
+  ".\\t%q0, %q1"
+  [(set_attr "type" "")]
+)
+
 ;; When AESE/AESMC fusion is enabled we really want to keep the two together
 ;; and enforce the register dependency without scheduling or register
 ;; allocation messing up the order or introducing moves inbetween.
@@ -95,6 +109,25 @@ (define_insn "*aarch32_crypto_aese_fused"
(set_attr "length" "8")]
 )
 
+;; And similarly when mitigation is enabled, but not needed in this
+;; case.
+(define_insn "*aarch32_crypto_aese_fused_protected"
+  [(set (match_operand:V16QI 0 "register_operand" "=w")
+	(unspec:V16QI
+	 [(unspec:V16QI
+	   [(unspec:V16QI [(xor:V16QI
+			(match_operand:V16QI 1 "register_operand" "%0")
+			(match_operand:V16QI 2 "register_operand" "w"))]
+	 UNSPEC_AESE)]
+	   UNSPEC_AESMC)]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_CRYPTO && fix_aes_erratum_1742098
+   && arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)"
+  "aese.8\\t%q0, %q2\;aesmc.8\\t%q0, %q0"
+  [(set_attr "type" "crypto_aese")
+   (set_attr "length" "8")]
+)
+
 ;; When AESD/AESIMC fusion is enabled we really want to keep the two together
 ;; and enforce the register dependency without scheduling or register
 ;; allocation messing up the order or introducing moves inbetween.
@@ -115,6 +148,23 @@ (define_insn "*aarch32_crypto_aesd_fused"
(set_attr "length" "8")]
 )
 
+(define_insn "*aarch32_crypto_aesd_fused_protected"
+  [(set (match_operand:V16QI 0 "register_operand" "=w")
+	(unspec:V16QI
+	 [(unspec:V16QI
+	   [(unspec:V16QI [(xor:V16QI
+			(match_operand:V16QI 1 "register_operand" "%0")
+			(match_operand:V16QI 2 "register_operand" "w"))]
+	 UNSPEC_AESD)]
+	   UNSPEC_AESIMC)]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_CRYPTO && fix_aes_erratum_1742098
+   && arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)"
+  "aesd.8\\t%q0, %q2\;aesimc.8\\t%q0, %q0"
+  [(set_attr "type" "crypto_aese")
+   (set_attr "length" "8")]
+)
+
 (define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:


[PATCH 4/7] arm: add basic mitigation for Cortex-A AES errata

2022-01-20 Thread Richard Earnshaw via Gcc-patches

This patch adds the basic patterns for mitigation of the erratum, but no
attempt is made at this point to optimize the results for the cases where
the erratum mitigation is not needed.

The mitigation is done by guaranteeing that the input operands are fed
from a full-width operation by using an identity operation on the input
values.

gcc/ChangeLog:

* config/arm/crypto.md (crypto_): Convert
to define_expand.  Add mitigation for the Cortex-A AES erratum
when enabled.
(*crypto__insn): New pattern, based
on original crypto_ insn.
(aes_op_protect): New pattern.
* config/arm/unspecs.md (unspec): Add UNSPEC_AES_PROTECT.
---
 gcc/config/arm/crypto.md  | 36 +++-
 gcc/config/arm/unspecs.md |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index 020dfba7dcf..fbee1829ce8 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -29,7 +29,28 @@ (define_insn "crypto_"
   [(set_attr "type" "")]
 )
 
-(define_insn "crypto_"
+(define_expand "crypto_"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec:
+		[(xor:
+		 (match_operand: 1 "register_operand" "%0")
+		 (match_operand: 2 "register_operand" "w"))]
+	CRYPTO_AES))]
+  "TARGET_CRYPTO"
+{
+  if (fix_aes_erratum_1742098)
+{
+  rtx op1_protect = gen_reg_rtx (V16QImode);
+  emit_insn (gen_aes_op_protect (op1_protect, operands[1]));
+  operands[1] = op1_protect;
+  rtx op2_protect = gen_reg_rtx (V16QImode);
+  emit_insn (gen_aes_op_protect (op2_protect, operands[2]));
+  operands[2] = op2_protect;
+}
+  /* Fall through to default expansion.  */
+})
+
+(define_insn "*crypto__insn"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:
 	 [(xor:
@@ -41,6 +62,19 @@ (define_insn "crypto_"
   [(set_attr "type" "")]
 )
 
+; Mitigate against AES erratum on Cortex-A57 and Cortex-A72 by performing
+; a 128-bit operation on an operand producer.  This can be eliminated only
+; if we know that the operand was produced by a full-width operation.
+; V16QImode matches  for the AES instructions.
+(define_insn "aes_op_protect"
+  [(set (match_operand:V16QI 0 "register_operand" "=w")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")]
+	 UNSPEC_AES_PROTECT))]
+  "TARGET_CRYPTO && fix_aes_erratum_1742098"
+  "vmov\\t%q0, %q1"
+  [(set_attr "type" "neon_move_q")]
+)
+
 ;; When AESE/AESMC fusion is enabled we really want to keep the two together
 ;; and enforce the register dependency without scheduling or register
 ;; allocation messing up the order or introducing moves inbetween.
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 2782af08834..7748e784379 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -270,6 +270,7 @@ (define_c_enum "unspec" [
   UNSPEC_AESE
   UNSPEC_AESIMC
   UNSPEC_AESMC
+  UNSPEC_AES_PROTECT
   UNSPEC_SHA1C
   UNSPEC_SHA1M
   UNSPEC_SHA1P


[PATCH 3/7] arm: Add option for mitigating against Cortex-A CPU erratum for AES

2022-01-20 Thread Richard Earnshaw via Gcc-patches

Add a new option -mfix-cortex-a-aes for enabling the Cortex-A AES
erratum work-around and enable it automatically for the affected
products (Cortex-A57 and Cortex-A72).

gcc/ChangeLog:

* config/arm/arm-cpus.in (quirk_aes_1742098): New quirk feature
(ALL_QUIRKS): Add it.
(cortex-a57, cortex-a72): Enable it.
(cortex-a57.cortex-a53, cortex-a72.cortex-a53): Likewise.
* config/arm/arm.opt (mfix-cortex-a57-aes-1742098): New command-line
option.
(mfix-cortex-a72-aes-1655431): New option alias.
* config/arm/arm.cc (arm_option_override): Handle default settings
for AES erratum switch.
* doc/invoke.texi (Arm Options): Document new options.
---
 gcc/config/arm/arm-cpus.in |  9 -
 gcc/config/arm/arm.cc  |  9 +
 gcc/config/arm/arm.opt | 10 ++
 gcc/doc/invoke.texi| 11 +++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 499e82d790d..0d3082b569f 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -192,6 +192,9 @@ define feature quirk_cm3_ldrd
 # v8-m/v8.1-m VLLDM errata.
 define feature quirk_vlldm
 
+# AES errata on some Cortex-A parts
+define feature quirk_aes_1742098
+
 # Don't use .cpu assembly directive
 define feature quirk_no_asmcpu
 
@@ -329,7 +332,7 @@ define implied vfp_base MVE MVE_FP ALL_FP
 # architectures.
 # xscale isn't really a 'quirk', but it isn't an architecture either and we
 # need to ignore it for matching purposes.
-define fgroup ALL_QUIRKS   quirk_no_volatile_ce quirk_armv6kz quirk_cm3_ldrd quirk_vlldm xscale quirk_no_asmcpu
+define fgroup ALL_QUIRKS   quirk_no_volatile_ce quirk_armv6kz quirk_cm3_ldrd quirk_vlldm xscale quirk_no_asmcpu quirk_aes_1742098
 
 define fgroup IGNORE_FOR_MULTILIB cdecp0 cdecp1 cdecp2 cdecp3 cdecp4 cdecp5 cdecp6 cdecp7
 
@@ -1342,6 +1345,7 @@ begin cpu cortex-a57
  cname cortexa57
  tune flags LDSCHED
  architecture armv8-a+crc+simd
+ isa quirk_aes_1742098
  option crypto add FP_ARMv8 CRYPTO
  costs cortex_a57
  vendor 41
@@ -1353,6 +1357,7 @@ begin cpu cortex-a72
  tune for cortex-a57
  tune flags LDSCHED
  architecture armv8-a+crc+simd
+ isa quirk_aes_1742098
  option crypto add FP_ARMv8 CRYPTO
  costs cortex_a57
  vendor 41
@@ -1391,6 +1396,7 @@ begin cpu cortex-a57.cortex-a53
  tune for cortex-a53
  tune flags LDSCHED
  architecture armv8-a+crc+simd
+ isa quirk_aes_1742098
  option crypto add FP_ARMv8 CRYPTO
  costs cortex_a57
 end cpu cortex-a57.cortex-a53
@@ -1400,6 +1406,7 @@ begin cpu cortex-a72.cortex-a53
  tune for cortex-a53
  tune flags LDSCHED
  architecture armv8-a+crc+simd
+ isa quirk_aes_1742098
  option crypto add FP_ARMv8 CRYPTO
  costs cortex_a57
 end cpu cortex-a72.cortex-a53
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7825e364c01..04354b36606 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3638,6 +3638,15 @@ arm_option_override (void)
 	fix_vlldm = 0;
 }
 
+  /* Enable fix_aes by default if required.  */
+  if (fix_aes_erratum_1742098 == 2)
+{
+  if (bitmap_bit_p (arm_active_target.isa, isa_bit_quirk_aes_1742098))
+	fix_aes_erratum_1742098 = 1;
+  else
+	fix_aes_erratum_1742098 = 0;
+}
+
   /* Hot/Cold partitioning is not currently supported, since we can't
  handle literal pool placement in that case.  */
   if (flag_reorder_blocks_and_partition)
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 587fc932f96..2a4f165033a 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -272,6 +272,16 @@ mfix-cmse-cve-2021-35465
 Target Var(fix_vlldm) Init(2)
 Mitigate issues with VLLDM on some M-profile devices (CVE-2021-35465).
 
+mfix-cortex-a57-aes-1742098
+Target Var(fix_aes_erratum_1742098) Init(2) Save
+Mitigate issues with AES instructions on Cortex-A57 and Cortex-A72.
+Arm erratum #1742098
+
+mfix-cortex-a72-aes-1655431
+Target Alias(mfix-cortex-a57-aes-1742098)
+Mitigate issues with AES instructions on Cortex-A57 and Cortex-A72.
+Arm erratum #1655431
+
 munaligned-access
 Target Var(unaligned_access) Init(2) Save
 Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 58751c48b8e..67693d6c5cf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -812,6 +812,8 @@ Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name}  -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
+-mfix-cortex-a57-aes-1742098 @gol
+-mfix-cortex-a72-aes-1655431 @gol
 -munaligned-access @gol
 -mneon-for-64bits @gol
 -mslow-flash-data @gol
@@ -21281,6 +21283,15 @@ with overlapping destination and base registers are used.  This option avoids
 generating these instructions.  This option is enabled by default when
 @option{-mcpu=cortex-m3} is specified.
 
+@item -mfix-cortex-a57-aes-1742098
+@itemx -mno-fix-cortex-a57-aes-1742098
+@itemx 

[PATCH 2/7] arm: Consistently use crypto_mode attribute in crypto patterns

2022-01-20 Thread Richard Earnshaw via Gcc-patches

A couple of patterns in the crypto support code were hard-coding the
mode rather than using the iterators.  While not incorrect, it was
slightly confusing, so adapt those patterns to the style of the rest
of the file.

Also fix some white space issues.

gcc/ChangeLog:

* config/arm/crypto.md (crypto_): Use
 rather than hard-coding the mode.
(crypto_): Fix white space.
(crypto_): Likewise.
(*aarch32_crypto_aese_fused): Likewise.
(*aarch32_crypto_aesd_fused): Likewise.
(crypto_): Likewise.
(crypto_): Likewise.
(crypto_sha1h_lb): Likewise.
(crypto_vmullp64): Likewise.
(crypto_): Likewise.
(crypto__lb): Likewise.
---
 gcc/config/arm/crypto.md | 94 
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index 6071ea17eac..020dfba7dcf 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -22,7 +22,7 @@
 (define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:
-		[(match_operand: 1 "register_operand" "w")]
+	 [(match_operand: 1 "register_operand" "w")]
 	 CRYPTO_AESMC))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q1"
@@ -30,12 +30,12 @@ (define_insn "crypto_"
 )
 
 (define_insn "crypto_"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI
-		[(xor:V16QI
-		 (match_operand:V16QI 1 "register_operand" "%0")
-		 (match_operand:V16QI 2 "register_operand" "w"))]
-	CRYPTO_AES))]
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec:
+	 [(xor:
+	   (match_operand: 1 "register_operand" "%0")
+	   (match_operand: 2 "register_operand" "w"))]
+	 CRYPTO_AES))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q2"
   [(set_attr "type" "")]
@@ -44,17 +44,16 @@ (define_insn "crypto_"
 ;; When AESE/AESMC fusion is enabled we really want to keep the two together
 ;; and enforce the register dependency without scheduling or register
 ;; allocation messing up the order or introducing moves inbetween.
-;;  Mash the two together during combine.
+;; Mash the two together during combine.
 
 (define_insn "*aarch32_crypto_aese_fused"
   [(set (match_operand:V16QI 0 "register_operand" "=w")
 	(unspec:V16QI
-		[(unspec:V16QI
-		[(xor:V16QI
-			(match_operand:V16QI 1 "register_operand" "%0")
-			(match_operand:V16QI 2 "register_operand" "w"))]
-		UNSPEC_AESE)]
-	UNSPEC_AESMC))]
+	 [(unspec:V16QI [(xor:V16QI
+			  (match_operand:V16QI 1 "register_operand" "%0")
+			  (match_operand:V16QI 2 "register_operand" "w"))]
+	   UNSPEC_AESE)]
+	 UNSPEC_AESMC))]
   "TARGET_CRYPTO
&& arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)"
   "aese.8\\t%q0, %q2\;aesmc.8\\t%q0, %q0"
@@ -65,17 +64,16 @@ (define_insn "*aarch32_crypto_aese_fused"
 ;; When AESD/AESIMC fusion is enabled we really want to keep the two together
 ;; and enforce the register dependency without scheduling or register
 ;; allocation messing up the order or introducing moves inbetween.
-;;  Mash the two together during combine.
+;; Mash the two together during combine.
 
 (define_insn "*aarch32_crypto_aesd_fused"
   [(set (match_operand:V16QI 0 "register_operand" "=w")
 	(unspec:V16QI
-		[(unspec:V16QI
-		[(xor:V16QI
-			(match_operand:V16QI 1 "register_operand" "%0")
-			(match_operand:V16QI 2 "register_operand" "w"))]
-		UNSPEC_AESD)]
-	UNSPEC_AESIMC))]
+	 [(unspec:V16QI [(xor:V16QI
+			  (match_operand:V16QI 1 "register_operand" "%0")
+			  (match_operand:V16QI 2 "register_operand" "w"))]
+	   UNSPEC_AESD)]
+	 UNSPEC_AESIMC))]
   "TARGET_CRYPTO
&& arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)"
   "aesd.8\\t%q0, %q2\;aesimc.8\\t%q0, %q0"
@@ -86,9 +84,9 @@ (define_insn "*aarch32_crypto_aesd_fused"
 (define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:
-		[(match_operand: 1 "register_operand" "0")
-		(match_operand: 2 "register_operand" "w")]
-	CRYPTO_BINARY))]
+	 [(match_operand: 1 "register_operand" "0")
+	  (match_operand: 2 "register_operand" "w")]
+	 CRYPTO_BINARY))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q2"
   [(set_attr "type" "")]
@@ -96,18 +94,20 @@ (define_insn "crypto_"
 
 (define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
-(unspec: [(match_operand: 1 "register_operand" "0")
-  (match_operand: 2 "register_operand" "w")
-  (match_operand: 3 "register_operand" "w")]
- CRYPTO_TERNARY))]
+	(unspec:
+	 [(match_operand: 1 "register_operand" "0")
+	  (match_operand: 2 "register_operand" "w")
+	  (match_operand: 3 "register_operand" "w")]
+	 CRYPTO_TERNARY))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q2, %q3"
   [(set_attr "type" "")]
 )
 
-/* The vec_select operation always selects index 0 from the lower V2SI subreg
-   of the V4SI, adjusted for endianness. Required due to neon_vget_lane and
-   neon_set_lane that change the element ordering in memory for big-endian.  */
+;; The vec_select operation always selects 

[PATCH 1/7] arm: Disambiguate multiple crypto patterns with the same name.

2022-01-20 Thread Richard Earnshaw via Gcc-patches

No functional change, but arm/crypto.md has multiple pattenrs all
called crypto_, which makes references to them
ambiguous, so add the iterator base to the pattern name so that it is
distinct in the commit logs.

gcc/ChangeLog:

* config/arm/crypto.md (crypto_): Add
iterator to pattern name to disambiguate.
(crypto_): Likewise.
(crypto_): Likewise.
(crypto_): Likewise.
(crypto_): Likewise.
(crypto__lb): Likewise.
---
 gcc/config/arm/crypto.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index 2425641e33f..6071ea17eac 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -19,7 +19,7 @@
 ;; .
 
 
-(define_insn "crypto_"
+(define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:
 		[(match_operand: 1 "register_operand" "w")]
@@ -29,7 +29,7 @@ (define_insn "crypto_"
   [(set_attr "type" "")]
 )
 
-(define_insn "crypto_"
+(define_insn "crypto_"
   [(set (match_operand:V16QI 0 "register_operand" "=w")
 	(unspec:V16QI
 		[(xor:V16QI
@@ -83,7 +83,7 @@ (define_insn "*aarch32_crypto_aesd_fused"
(set_attr "length" "8")]
 )
 
-(define_insn "crypto_"
+(define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 	(unspec:
 		[(match_operand: 1 "register_operand" "0")
@@ -94,7 +94,7 @@ (define_insn "crypto_"
   [(set_attr "type" "")]
 )
 
-(define_insn "crypto_"
+(define_insn "crypto_"
   [(set (match_operand: 0 "register_operand" "=w")
 (unspec: [(match_operand: 1 "register_operand" "0")
   (match_operand: 2 "register_operand" "w")
@@ -145,7 +145,7 @@ (define_insn "crypto_vmullp64"
of the V4SI, adjusted for endianness. Required due to neon_vget_lane and
neon_set_lane that change the element ordering in memory for big-endian.  */
 
-(define_expand "crypto_"
+(define_expand "crypto_"
   [(set (match_operand:V4SI 0 "register_operand")
 	(unspec:
 		[(match_operand: 1 "register_operand")
@@ -160,7 +160,7 @@ (define_expand "crypto_"
   DONE;
 })
 
-(define_insn "crypto__lb"
+(define_insn "crypto__lb"
   [(set (match_operand:V4SI 0 "register_operand" "=w")
 (unspec:
  [(match_operand: 1 "register_operand" "0")


[committed 0/7] Arm: mitigation for AES erratum on Cortex-a57 and Cortex-A72

2022-01-20 Thread Richard Earnshaw via Gcc-patches
The Cortex-A57 and Cortex-A72 processors have an erratum (#1742098
and #1655431 respectively) when running in Arm (32-bit) mode where an
instruction producing a 32-bit result that feeds into an AES encode or
decode can lead to an incorrect result.  The erratum does not occur when
operating in 64-bit (aarch64) mode.

The mitigation approach taken by this patch series is in two parts.
Firstly, to ensure that this cannot happen by inserting a special
128-bit copy operation before each operand to a potentially vulnerable
sequence.  This is overkill, but safe.  The copy operations are
independent instructions, so can be migrated out of loops by the GCSE
pass or other optimizations.

Secondly, we then allow the copy operations to be merged with common
cases where the producer is known to be unaffected by the erratum.
Currently that includes other AES instructions, loads and certain move
operations.

In combination this eliminates the majority of redundant instructions
for normal use cases.  I did consider adding a custom pass to do late
insertion of the mitigation, but decided against it.  A trivial
implemenation would be unable to hoist operations out of the loop, while
a more complex implementation would require a lot of data-flow
analysis to find the optimum location for each mitigation and might
need to insert mitigation instructions on multiple paths.  The pass
would be complex and likely to have difficult to test corner cases.

The series consists of 7 patches.  The first two patches are cleanups
to the existing code.  Patch 3 adds the command line options to enable
the mitigation and the corresponding documentation.  Patch 4 adds the
basic mitigation operation and patches 5 and 6 add various additional
patterns to elide the mitigation for common cases where it is not
needed.  The final patch adds a testcase.

Richard Earnshaw (7):
  arm: Disambiguate multiple crypto patterns with the same name.
  arm: Consistently use crypto_mode attribute in crypto patterns
  arm: Add option for mitigating against Cortex-A CPU erratum for AES
  arm: add basic mitigation for Cortex-A AES errata
  arm: suppress aes erratum when forwarding from aes
  arm: elide some cases where the AES erratum workaround is not
required.
  arm: Add test for AES erratum mitigation

 gcc/config/arm/arm-cpus.in|   9 +-
 gcc/config/arm/arm.cc |   9 +
 gcc/config/arm/arm.opt|  10 +
 gcc/config/arm/crypto.md  | 227 ++
 gcc/config/arm/unspecs.md |   1 +
 gcc/doc/invoke.texi   |  11 +
 .../gcc.target/arm/crypto-vaese-erratum1.c|  28 +++
 7 files changed, 242 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/crypto-vaese-erratum1.c

-- 
2.25.1



Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches



On 20/01/2022 10:45, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:

The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
 iterator instead of HI in mve_vmvnq_n_.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 171dd384133..5c3b34dce3a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
   (define_insn "mve_vmvnq_n_"
 [
  (set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
 VMVNQ_N))
 ]
 "TARGET_HAVE_MVE"

While fixing this it might be good to fix the constraint and predicate
inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
the compiler generating wrong assembly, and instead it would probably
lead to the compiler using a load literal.

FWIW: for cases like this, I think it's better to define a predicate
only (not a constraint).  By design, the only time that constraints
are used independently of predicates is during RA, and there's nothing
that RA can/should do for immediate operands.

Thanks,
Richard
Yeah, if I use a predicate it doesn't like the fact that we are passing 
an argument 'imm' rather than actual immediate. To use a constraint like 
DL I'd also need to change the builtin to take a vector of immediates, 
since we can't use immediates as they don't have a mode and the 
constraint needs to be able to know what mode we are using.


This will have to wait...


Re: [PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Richard Biener via Gcc-patches
On Thu, 20 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
> due to some aggressive linker optimizations).
> As the following testcase shows, we mishandle in index_rnglists the
> !HAVE_AS_LEB128 && !have_multiple_function_sections case.
> 
> output_rnglists does roughly:
>   FOR_EACH_VEC_SAFE_ELT (ranges_table, i, r)
> {
> ...
>   if (block_num > 0)
> {
> ...
>   if (HAVE_AS_LEB128)
> {
>   if (!have_multiple_function_sections)
>   {
> // code not using r->*_entry
> continue;
>   }
> // code that sometimes doesn't use r->*_entry,
> // sometimes r->begin_entry
>   }
> else if (dwarf_split_debug_info)
>   {
> // code that uses both r->begin_entry and r->end_entry
>   }
> else
>   {
> // code not using r->*_entry
>   }
> }
>   else if (block_num < 0)
>   {
>   if (!have_multiple_function_sections)
> gcc_unreachable ();
> ...
>   }
> }
> and index_rnglists is what sets up those r->{begin,end}_entry members.
> The code did an early if (!have_multiple_function_sections) continue;
> which is fine for the HAVE_AS_LEB128 case, because r->*_entry is not
> used in that case, but not for !HAVE_AS_LEB128 that uses it anyway.
> 
> Fixed thusly, tested on the testcase with x86_64 -> riscv64 cross,
> bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-01-20  Jakub Jelinek  
> 
>   PR debug/103874
>   * dwarf2out.cc (index_rnglists): For !HAVE_AS_LEB128 and
>   block_num > 0, index entry even if !have_multiple_function_sections.
> 
>   * gcc.dg/debug/dwarf2/pr103874.c: New test.
> 
> --- gcc/dwarf2out.cc.jj   2022-01-18 11:58:59.0 +0100
> +++ gcc/dwarf2out.cc  2022-01-19 13:30:08.936008194 +0100
> @@ -12094,9 +12094,10 @@ index_rnglists (void)
>if (r->label && r->idx != DW_RANGES_IDX_SKELETON)
>   r->idx = rnglist_idx++;
>  
> -  if (!have_multiple_function_sections)
> - continue;
>int block_num = r->num;
> +  if ((HAVE_AS_LEB128 || block_num < 0)
> +   && !have_multiple_function_sections)
> + continue;
>if (HAVE_AS_LEB128 && (r->label || r->maybe_new_sec))
>   base = false;
>if (block_num > 0)
> --- gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c.jj   2022-01-19 
> 13:35:25.485631843 +0100
> +++ gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c  2022-01-19 
> 13:36:53.608413534 +0100
> @@ -0,0 +1,12 @@
> +/* PR debug/103874 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -g -gsplit-dwarf -dA 
> -Wno-implicit-function-declaration" } */
> +
> +void
> +foo (void)
> +{
> +  {
> +bar ();
> +baz ();
> +  }
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] dwarf2out: Fix -gsplit-dwarf on riscv [PR103874]

2022-01-20 Thread Jakub Jelinek via Gcc-patches
Hi!

riscv*-*-* are the only modern targets that !HAVE_AS_LEB128 (apparently
due to some aggressive linker optimizations).
As the following testcase shows, we mishandle in index_rnglists the
!HAVE_AS_LEB128 && !have_multiple_function_sections case.

output_rnglists does roughly:
  FOR_EACH_VEC_SAFE_ELT (ranges_table, i, r)
{
...
  if (block_num > 0)
{
...
  if (HAVE_AS_LEB128)
{
  if (!have_multiple_function_sections)
{
  // code not using r->*_entry
  continue;
}
  // code that sometimes doesn't use r->*_entry,
  // sometimes r->begin_entry
}
  else if (dwarf_split_debug_info)
{
  // code that uses both r->begin_entry and r->end_entry
}
  else
{
  // code not using r->*_entry
}
}
  else if (block_num < 0)
{
  if (!have_multiple_function_sections)
gcc_unreachable ();
...
}
}
and index_rnglists is what sets up those r->{begin,end}_entry members.
The code did an early if (!have_multiple_function_sections) continue;
which is fine for the HAVE_AS_LEB128 case, because r->*_entry is not
used in that case, but not for !HAVE_AS_LEB128 that uses it anyway.

Fixed thusly, tested on the testcase with x86_64 -> riscv64 cross,
bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-20  Jakub Jelinek  

PR debug/103874
* dwarf2out.cc (index_rnglists): For !HAVE_AS_LEB128 and
block_num > 0, index entry even if !have_multiple_function_sections.

* gcc.dg/debug/dwarf2/pr103874.c: New test.

--- gcc/dwarf2out.cc.jj 2022-01-18 11:58:59.0 +0100
+++ gcc/dwarf2out.cc2022-01-19 13:30:08.936008194 +0100
@@ -12094,9 +12094,10 @@ index_rnglists (void)
   if (r->label && r->idx != DW_RANGES_IDX_SKELETON)
r->idx = rnglist_idx++;
 
-  if (!have_multiple_function_sections)
-   continue;
   int block_num = r->num;
+  if ((HAVE_AS_LEB128 || block_num < 0)
+ && !have_multiple_function_sections)
+   continue;
   if (HAVE_AS_LEB128 && (r->label || r->maybe_new_sec))
base = false;
   if (block_num > 0)
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c.jj 2022-01-19 
13:35:25.485631843 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr103874.c2022-01-19 
13:36:53.608413534 +0100
@@ -0,0 +1,12 @@
+/* PR debug/103874 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -gsplit-dwarf -dA -Wno-implicit-function-declaration" 
} */
+
+void
+foo (void)
+{
+  {
+bar ();
+baz ();
+  }
+}

Jakub



Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>> The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
>>  iterator instead of HI in mve_vmvnq_n_.
>>
>> 2022-01-13  Christophe Lyon  
>>
>>  gcc/
>>  * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
>>  for operand 1.
>>
>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index 171dd384133..5c3b34dce3a 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
>>   (define_insn "mve_vmvnq_n_"
>> [
>>  (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>> -(unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>> +(unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
>>   VMVNQ_N))
>> ]
>> "TARGET_HAVE_MVE"
>
> While fixing this it might be good to fix the constraint and predicate 
> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid 
> the compiler generating wrong assembly, and instead it would probably 
> lead to the compiler using a load literal.

FWIW: for cases like this, I think it's better to define a predicate
only (not a constraint).  By design, the only time that constraints
are used independently of predicates is during RA, and there's nothing
that RA can/should do for immediate operands.

Thanks,
Richard


Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches



On 20/01/2022 10:40, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

On 20/01/2022 09:14, Christophe Lyon wrote:


On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches
 wrote:

 Hi Christophe,

 On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
 > At some point during the development of this patch series, it
 appeared
 > that in some cases the register allocator wants “VPR or general”
 > rather than “VPR or general or FP” (which is the same thing as
 > ALL_REGS).  The series does not seem to require this anymore, but it
 > seems to be a good thing to do anyway, to give the register
 allocator
 > more freedom.
 Not sure I fully understand this, but I guess it creates an extra
 class
 the register allocator can use to group things that can go into
 VPR or
 general reg?
 >
 > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
 > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
 > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
 I have not looked into this failure, but ...
 >
 > 2022-01-13  Christophe Lyon  
 >
 >       gcc/
 >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
 >       (REG_CLASS_NAMES): Likewise.
 >       (REG_CLASS_CONTENTS): Likewise.
 >       (CLASS_MAX_NREGS): Handle VPR.
 >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
 >
 > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 > index bb75921f32d..c3559ca8703 100644
 > --- a/gcc/config/arm/arm.c
 > +++ b/gcc/config/arm/arm.c
 > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
 >   static unsigned int
 >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
 >   {
 > +  if (IS_VPR_REGNUM (regno))
 > +    return CEIL (GET_MODE_SIZE (mode), 2);
 When do we ever want to use more than 1 register for VPR?


That was tricky.
Richard Sandiford helped me analyze the problem, I guess I can quote him:

RS> I think the problem is a combination of a few things:
RS>
RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
RS>     to or from the VPR_REG class get the maximum cost.
RS>
RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
RS>    VPR is big enough to hold SImode.
RS>
RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
RS>     to hold a mode M, IRA ensures that move costs for M involving C1
RS>     are >= move costs for M involving C2.
RS>
RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
RS> be whether C2 is actually allowed to hold M, not whether C2 is big
enough
RS> to hold M.  However, changing that is likely to cause problems
elsewhere,
RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
RS> FP_REGS are disabled (which might be confusing).
RS>

I understand everything up until here.


RS> “Fixing” (2) using:
RS>
RS>  CEIL (GET_MODE_SIZE (mode), 2)

I was wondering why not just return '1' for VPR_REGNUM, rather than use
the fact that the mode-size we use for VPR is 2 bytes, so diving it by 2
makes 1. Unless we ever decide to use a larger mode for VPR, maybe
that's what this is trying to address? I can't imagine we would ever
need to though since for MVE there is only one VPR register and it is
always 16-bits. Just feels overly complicated to me.

For context, that's what the first version did, and is what led to
the reload failure.  The above is trying to explain why returning
1 doesn't work in practice.

To put (2) a slightly different way: if the port says VPR_REGNUM takes
1 register regardless of the mode passed in, the port is effectively
saying that VPR (and thus VPR_REGNUM) has enough bits to hold *any* mode
passed in (SImode, DImode, etc.).  It actually makes VPR seem bigger
than a general register.

In the particular case of the reload failure, returning 1 effectively
tells the RA that VPR is big enough to hold SImode, but that the port is
nevertheless choosing not to allow VPR to be used to hold SImode.  This
then “infects” the SImode cost of GENERAL_AND_VPR_REGS.

Thanks,
Richard

Ah OK thanks for the explanation.



Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 20/01/2022 09:14, Christophe Lyon wrote:
>>
>>
>> On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches 
>>  wrote:
>>
>> Hi Christophe,
>>
>> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>> > At some point during the development of this patch series, it
>> appeared
>> > that in some cases the register allocator wants “VPR or general”
>> > rather than “VPR or general or FP” (which is the same thing as
>> > ALL_REGS).  The series does not seem to require this anymore, but it
>> > seems to be a good thing to do anyway, to give the register
>> allocator
>> > more freedom.
>> Not sure I fully understand this, but I guess it creates an extra
>> class
>> the register allocator can use to group things that can go into
>> VPR or
>> general reg?
>> >
>> > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
>> > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
>> > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
>> I have not looked into this failure, but ...
>> >
>> > 2022-01-13  Christophe Lyon  
>> >
>> >       gcc/
>> >       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>> >       (REG_CLASS_NAMES): Likewise.
>> >       (REG_CLASS_CONTENTS): Likewise.
>> >       (CLASS_MAX_NREGS): Handle VPR.
>> >       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>> >
>> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> > index bb75921f32d..c3559ca8703 100644
>> > --- a/gcc/config/arm/arm.c
>> > +++ b/gcc/config/arm/arm.c
>> > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>> >   static unsigned int
>> >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>> >   {
>> > +  if (IS_VPR_REGNUM (regno))
>> > +    return CEIL (GET_MODE_SIZE (mode), 2);
>> When do we ever want to use more than 1 register for VPR?
>>
>>
>> That was tricky.
>> Richard Sandiford helped me analyze the problem, I guess I can quote him:
>>
>> RS> I think the problem is a combination of a few things:
>> RS>
>> RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
>> RS>     to or from the VPR_REG class get the maximum cost.
>> RS>
>> RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
>> RS>    VPR is big enough to hold SImode.
>> RS>
>> RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
>> RS>     to hold a mode M, IRA ensures that move costs for M involving C1
>> RS>     are >= move costs for M involving C2.
>> RS>
>> RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
>> RS> be whether C2 is actually allowed to hold M, not whether C2 is big 
>> enough
>> RS> to hold M.  However, changing that is likely to cause problems 
>> elsewhere,
>> RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
>> RS> FP_REGS are disabled (which might be confusing).
>> RS>
>
> I understand everything up until here.
>
>> RS> “Fixing” (2) using:
>> RS>
>> RS>  CEIL (GET_MODE_SIZE (mode), 2)
> I was wondering why not just return '1' for VPR_REGNUM, rather than use 
> the fact that the mode-size we use for VPR is 2 bytes, so diving it by 2 
> makes 1. Unless we ever decide to use a larger mode for VPR, maybe 
> that's what this is trying to address? I can't imagine we would ever 
> need to though since for MVE there is only one VPR register and it is 
> always 16-bits. Just feels overly complicated to me.

For context, that's what the first version did, and is what led to
the reload failure.  The above is trying to explain why returning
1 doesn't work in practice.

To put (2) a slightly different way: if the port says VPR_REGNUM takes
1 register regardless of the mode passed in, the port is effectively
saying that VPR (and thus VPR_REGNUM) has enough bits to hold *any* mode
passed in (SImode, DImode, etc.).  It actually makes VPR seem bigger
than a general register.

In the particular case of the reload failure, returning 1 effectively
tells the RA that VPR is big enough to hold SImode, but that the port is
nevertheless choosing not to allow VPR to be used to hold SImode.  This
then “infects” the SImode cost of GENERAL_AND_VPR_REGS.

Thanks,
Richard


[committed] testsuite: Add -Wno-psabi to pr47639.C testcase

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 19, 2022 at 12:42:22PM +0100, Martin Liška wrote:
> The tests are C++ code, so use a proper file extension.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/boolcomplex-1.c: Moved to...
>   * g++.dg/ext/boolcomplex-1.C: ...here.
>   * g++.dg/opt/pr47639.c: Moved to...
>   * g++.dg/opt/pr47639.C: ...here.

This patch fixes
gcc/testsuite/g++.dg/opt/pr47639.C:6:24: warning: MMX vector return without MMX 
enabled changes the ABI [-Wpsabi]
gcc/testsuite/g++.dg/opt/pr47639.C:6:5: warning: MMX vector argument without 
MMX enabled changes the ABI [-Wpsabi]
FAILs on i686-linux.

Committed as obvious to trunk:

2022-01-20  Jakub Jelinek  

* g++.dg/opt/pr47639.C: Add -Wno-psabi to dg-options.

--- gcc/testsuite/g++.dg/opt/pr47639.C.jj   2022-01-19 15:02:10.298013819 
+0100
+++ gcc/testsuite/g++.dg/opt/pr47639.C  2022-01-20 02:11:49.995354895 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-fnon-call-exceptions" }
+// { dg-options "-fnon-call-exceptions -Wno-psabi" }
 
 typedef int __attribute__ ((vector_size (8))) vec;
 


Jakub



Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 11:17:28AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > --- a/gcc/cp/error.cc
> > +++ b/gcc/cp/error.cc
> > @@ -768,6 +768,11 @@ class_key_or_enum_as_string (tree t)
> >  return "struct";
> >  }
> > +#if __GNUC__ >= 10
> > +#pragma GCC diagnostic push
> > +#pragma GCC diagnostic ignored "-Wformat-diag"
> > +#endif
> > +
> >  /* Print out a class declaration T under the control of FLAGS,
> > in the form `class foo'.  */
> > @@ -851,6 +856,10 @@ dump_aggr_type (cxx_pretty_printer *pp, tree t, int 
> > flags)
> >  flags & ~TFF_TEMPLATE_HEADER);
> >  }
> > +#if __GNUC__ >= 10
> > +#pragma GCC diagnostic pop
> > +#endif

Oh, and one more thing, but this time not about this source file but about
the warning.  Does it handle the gettext case?
I think -Wformat generally does, gettext has format_arg attribute.
If the warning handles
  pp_printf ("", str);
and
  pp_printf (cond ? "" : "", str);
and
  pp_printf (cond ? "" : "something %s", str);
and
  pp_printf (gettext (""), str);
then maybe it should also handle
  pp_printf (cond ? gettext ("") : ", str);
and
  pp_printf (cond ? gettext ("") : "something %s, str);
too?

Jakub



Re: [PATCH] Fix -Werror=format-diag with RTL checking

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 10:44:18AM +0100, Martin Liška wrote:
> The patch disables -Wformat-diag for RTL checking diagnostics.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>   PR bootstrap/104135
> 
> gcc/ChangeLog:
> 
>   * emit-rtl.cc (make_insn_raw): Fix -Wformat-diag warnings.
>   * rtl.cc: Partially disable -Wformat-diag for RTL checking
>   error messages.

> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -3997,7 +3997,7 @@ make_insn_raw (rtx pattern)
> || (GET_CODE (insn) == SET
> && SET_DEST (insn) == pc_rtx)))
>  {
> -  warning (0, "ICE: emit_insn used where emit_jump_insn needed:\n");
> +  warning (0, "ICE: % used where % 
> needed:");
>debug_rtx (insn);
>  }
>  #endif
> diff --git a/gcc/rtl.cc b/gcc/rtl.cc
> index 02dd2554728..f3d85814014 100644
> --- a/gcc/rtl.cc
> +++ b/gcc/rtl.cc
> @@ -870,6 +870,12 @@ dump_rtx_statistics (void)
>  }
>  
>  #if defined ENABLE_RTL_CHECKING && (GCC_VERSION >= 2007)
> +
> +#if __GNUC__ >= 10
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wformat-diag"
> +#endif

Please add a comment why we do this above #if __GNUC__ >= 10
line.

> +
>  void
>  rtl_check_failed_bounds (const_rtx r, int n, const char *file, int line,
>const char *func)
> @@ -945,6 +951,10 @@ rtl_check_failed_code_mode (const_rtx r, enum rtx_code 
> code, machine_mode mode,
> func, trim_filename (file), line);
>  }
> +#if __GNUC__ >= 10
> +#pragma GCC diagnostic pop
> +#endif
> +

Please add empty line above this.
LGTM with those changes.

Jakub



Re: [PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 10:43:55AM +0100, Martin Liška wrote:
> The patch disables "-Wformat-diag" for dump_aggr_type.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
>   PR c++/104134
> 
> gcc/cp/ChangeLog:
> 
>   * error.cc (dump_aggr_type): Partially disable the warning.
> ---
>  gcc/cp/error.cc | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
> index 1ab0c25a477..c031c75cc5e 100644
> --- a/gcc/cp/error.cc
> +++ b/gcc/cp/error.cc
> @@ -768,6 +768,11 @@ class_key_or_enum_as_string (tree t)
>  return "struct";
>  }
> +#if __GNUC__ >= 10
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wformat-diag"
> +#endif
> +
>  /* Print out a class declaration T under the control of FLAGS,
> in the form `class foo'.  */
> @@ -851,6 +856,10 @@ dump_aggr_type (cxx_pretty_printer *pp, tree t, int 
> flags)
>flags & ~TFF_TEMPLATE_HEADER);
>  }
> +#if __GNUC__ >= 10
> +#pragma GCC diagnostic pop
> +#endif
> +

Please add an empty line above #if lines.
Also, it would be nice to use the same style of these at least in the
same file.  The others are:

/* Disable warnings about missing quoting in GCC diagnostics for
   the pp_verbatim calls.  Their format strings deliberately don't
   follow GCC diagnostic conventions.  */
#if __GNUC__ >= 10
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wformat-diag"
#endif

and

#if __GNUC__ >= 10
#  pragma GCC diagnostic pop
#endif

The 2 spaces between # and pragma look just weird, so either
use in all the 4 spaces 1 space between # and pragma, or 0 spaces.
And also the copy the comment from above the other diagnostic push,
perhaps with small tweak (pp_verbatim -> pp_printf)?

Otherwise LGTM.

Jakub



[PATCH] s390: Change costs for load on condition.

2022-01-20 Thread Robin Dapp via Gcc-patches
Hi,

this patch is a follow-up patch to the recent ifcvt changes. It
increased costs for a load on condition to 6.  This ensures that we
if-convert sequences of three regular instructions (of cost 4) e.g. a
compare and two SETs into two loads on condition (of cost 6).  With a
cost of 5, four-insn sequences (three SETs) would also be if-converted.

The adjustment to the mov[qi/si]cc expander makes sure we if-convert a
QImode/bool.  Before, combine would create a paradoxical subreg itself
but need an additional insn.

Bootstrapped and regtested on s390x.

Is it OK?

Regards
 Robin

--

gcc/ChangeLog:

* config/s390/s390.cc (s390_rtx_costs): Increase costs for load
on condition.
* config/s390/s390.md: Change mov[qi/si]cc expander.commit b246f96c2a2813d0e509e7916744cda07cc5131c
Author: Robin Dapp 
Date:   Fri Jun 18 10:51:22 2021 +0200

s390: Increase costs for load on condition and change movqicc expander.

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 43c5c72554a..f2e4474df99 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3636,7 +3636,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 	/* It is going to be a load/store on condition.  Make it
 	   slightly more expensive than a normal load.  */
-	*total = COSTS_N_INSNS (1) + 1;
+	*total = COSTS_N_INSNS (1) + 2;
 
 	rtx dst = SET_DEST (x);
 	rtx then = XEXP (SET_SRC (x), 1);
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e3ccbac58c0..5eee8e86b42 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -7003,9 +7003,9 @@
   if (!CONSTANT_P (els))
 els = simplify_gen_subreg (E_SImode, els, mode, 0);
 
-  rtx tmp_target = gen_reg_rtx (E_SImode);
+  rtx tmp_target = simplify_gen_subreg (E_SImode, operands[0], mode, 0);
+
   emit_insn (gen_movsicc (tmp_target, operands[1], then, els));
-  emit_move_insn (operands[0], gen_lowpart (mode, tmp_target));
   DONE;
 })
 


Re: Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061]

2022-01-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 20, 2022 at 09:26:50AM +0100, Thomas Schwinge wrote:
> That's what we need to look into, in particular: if we decompose (GIMPLE
> sequence) an OpenACC 'kernels' region into parts, how to move or
> otherwise handle any 'GIMPLE_DEBUG's.

I admit I haven't looked at the pass except now for the toplevel comment.
It says that OpenACC constructs in the region are perhaps adjusted but
their body is unchanged, so that suggests that debug stmts inside of those
bodies should be kept as is.
Next it says that sequential code in between those loops/whatever are
put into some sequential construct, so I guess if you decide so because
of some non-debug stmts, you can just move the debug stmts into that
construct as well, including those debug stmts before the first such
non-debug stmt and debug stmts after the last such non-debug stmts.
It is not a perfect solution, because normally debug stmts before
loops would affect also what is in the loop unless overridden, but
what the pass does seems terribly destructive for debug experience anyway.
There is then another case, only debug stmts e.g. in between or before
the loops or after them and nothing else.  Perhaps throwing them away at
this point is the best thing to do (but, all of this only after the pass
decides that it will change something).

Another thing is, this is apparently a very early pass, so most real
debug stmts don't exist, they are typically created later.
I'd expect you mostly see gimple_debug_begin_stmt_p stmts.
Those can be removed more easily, it doesn't mean var has this value
for the following code until stated otherwise, but it just said here was
the start of some source code statement.  So, if you drop them, all that
will work worse is break some_line.
So citing from e.g. PR100400:
void foo ()
{
  # DEBUG BEGIN_STMT // Outside of region, don't touch this
  #pragma omp target oacc_kernels map(force_tofrom:p [len: 8])
{
  int c.0;

  # DEBUG BEGIN_STMT   // Drop this
  try
{
  # DEBUG BEGIN_STMT  // If p =  is moved somewhere, move the 
surrounding DEBUG BEGIN_STMTs with it
  # DEBUG BEGIN_STMT
  p = 
  # DEBUG BEGIN_STMT  // Up to here
  #pragma acc loop independent private(c.0) private(c)
  for (c.0 = 0; c.0 < 1; c.0 = c.0 + 1)
{
  c = c.0;
  # DEBUG BEGIN_STMT // Keep this in the body
}
}
  finally
{
  c = {CLOBBER};
}
}
}
If you don't have time for it right now, after deciding you are
going to transform it just gsi_remove gimple_debug_begin_stmt_p stmts
you don't know how to handle.

> With these things now hopfully clarified, is the attached
> "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
> [PR100400, PR103836, PR104061]" OK to push?  It's of course not the final
> fix, but it at least makes obvious any current silent miscompilation, and
> incremental improvement over the current status.

No, users really don't want to see sorry messages just because they turned
-g on their code.  They might be ok with their kernels not being easily
debuggable, but they surely will not be ok with not being able to debug
the host code in the same TU.

Jakub



Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Christophe Lyon via Gcc-patches
On Thu, Jan 20, 2022 at 10:38 AM Andre Simoes Dias Vieira <
andre.simoesdiasvie...@arm.com> wrote:

>
> On 20/01/2022 09:23, Christophe Lyon wrote:
>
>
>
> On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
>>
>> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
>> > The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
>> >  iterator instead of HI in mve_vmvnq_n_.
>> >
>> > 2022-01-13  Christophe Lyon  
>> >
>> >   gcc/
>> >   * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
>> >   for operand 1.
>> >
>> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> > index 171dd384133..5c3b34dce3a 100644
>> > --- a/gcc/config/arm/mve.md
>> > +++ b/gcc/config/arm/mve.md
>> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
>> >   (define_insn "mve_vmvnq_n_"
>> > [
>> >  (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>> > - (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
>> > + (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
>> >VMVNQ_N))
>> > ]
>> > "TARGET_HAVE_MVE"
>>
>> While fixing this it might be good to fix the constraint and predicate
>> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
>> the compiler generating wrong assembly, and instead it would probably
>> lead to the compiler using a load literal.
>>
>> I kind of think it would be better to have the intrinsic refuse the
>> immediate altogether, but it seems for NEON we also use the load literal
>> approach.
>>
>>
> Ha, I thought that patch had been approved at v2 too:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html
>
> Yeah sorry I had not looked at the previous version of these series!
>
> I can put together a follow-up for this then.
>

No problem, thanks!


[PATCH] Fix -Werror=format-diag with RTL checking

2022-01-20 Thread Martin Liška

The patch disables -Wformat-diag for RTL checking diagnostics.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR bootstrap/104135

gcc/ChangeLog:

* emit-rtl.cc (make_insn_raw): Fix -Wformat-diag warnings.
* rtl.cc: Partially disable -Wformat-diag for RTL checking
error messages.
---
 gcc/emit-rtl.cc |  2 +-
 gcc/rtl.cc  | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 3260ca8c3fa..a26bcb0fa2d 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -3997,7 +3997,7 @@ make_insn_raw (rtx pattern)
  || (GET_CODE (insn) == SET
  && SET_DEST (insn) == pc_rtx)))
 {
-  warning (0, "ICE: emit_insn used where emit_jump_insn needed:\n");
+  warning (0, "ICE: % used where % needed:");
   debug_rtx (insn);
 }
 #endif
diff --git a/gcc/rtl.cc b/gcc/rtl.cc
index 02dd2554728..f3d85814014 100644
--- a/gcc/rtl.cc
+++ b/gcc/rtl.cc
@@ -870,6 +870,12 @@ dump_rtx_statistics (void)
 }
 
 #if defined ENABLE_RTL_CHECKING && (GCC_VERSION >= 2007)
+
+#if __GNUC__ >= 10
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat-diag"
+#endif
+
 void
 rtl_check_failed_bounds (const_rtx r, int n, const char *file, int line,
 const char *func)
@@ -945,6 +951,10 @@ rtl_check_failed_code_mode (const_rtx r, enum rtx_code 
code, machine_mode mode,
  func, trim_filename (file), line);
 }
 
+#if __GNUC__ >= 10

+#pragma GCC diagnostic pop
+#endif
+
 /* Report that line LINE of FILE tried to access the block symbol fields
of a non-block symbol.  FUNC is the function that contains the line.  */
 
--

2.34.1



[PATCH] Fix Werror=format-diag with --disable-nls.

2022-01-20 Thread Martin Liška

The patch disables "-Wformat-diag" for dump_aggr_type.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR c++/104134

gcc/cp/ChangeLog:

* error.cc (dump_aggr_type): Partially disable the warning.
---
 gcc/cp/error.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 1ab0c25a477..c031c75cc5e 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -768,6 +768,11 @@ class_key_or_enum_as_string (tree t)
 return "struct";
 }
 
+#if __GNUC__ >= 10

+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat-diag"
+#endif
+
 /* Print out a class declaration T under the control of FLAGS,
in the form `class foo'.  */
 
@@ -851,6 +856,10 @@ dump_aggr_type (cxx_pretty_printer *pp, tree t, int flags)

 flags & ~TFF_TEMPLATE_HEADER);
 }
 
+#if __GNUC__ >= 10

+#pragma GCC diagnostic pop
+#endif
+
 /* Dump into the obstack the initial part of the output for a given type.
This is necessary when dealing with things like functions returning
functions.  Examples:
--
2.34.1



Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches



On 20/01/2022 09:14, Christophe Lyon wrote:



On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches 
 wrote:


Hi Christophe,

On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> At some point during the development of this patch series, it
appeared
> that in some cases the register allocator wants “VPR or general”
> rather than “VPR or general or FP” (which is the same thing as
> ALL_REGS).  The series does not seem to require this anymore, but it
> seems to be a good thing to do anyway, to give the register
allocator
> more freedom.
Not sure I fully understand this, but I guess it creates an extra
class
the register allocator can use to group things that can go into
VPR or
general reg?
>
> CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
> regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
> -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
I have not looked into this failure, but ...
>
> 2022-01-13  Christophe Lyon  
>
>       gcc/
>       * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
>       (REG_CLASS_NAMES): Likewise.
>       (REG_CLASS_CONTENTS): Likewise.
>       (CLASS_MAX_NREGS): Handle VPR.
>       * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index bb75921f32d..c3559ca8703 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
>   static unsigned int
>   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
>   {
> +  if (IS_VPR_REGNUM (regno))
> +    return CEIL (GET_MODE_SIZE (mode), 2);
When do we ever want to use more than 1 register for VPR?


That was tricky.
Richard Sandiford helped me analyze the problem, I guess I can quote him:

RS> I think the problem is a combination of a few things:
RS>
RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
RS>     to or from the VPR_REG class get the maximum cost.
RS>
RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
RS>    VPR is big enough to hold SImode.
RS>
RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
RS>     to hold a mode M, IRA ensures that move costs for M involving C1
RS>     are >= move costs for M involving C2.
RS>
RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
RS> be whether C2 is actually allowed to hold M, not whether C2 is big 
enough
RS> to hold M.  However, changing that is likely to cause problems 
elsewhere,

RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
RS> FP_REGS are disabled (which might be confusing).
RS>


I understand everything up until here.


RS> “Fixing” (2) using:
RS>
RS>  CEIL (GET_MODE_SIZE (mode), 2)
I was wondering why not just return '1' for VPR_REGNUM, rather than use 
the fact that the mode-size we use for VPR is 2 bytes, so diving it by 2 
makes 1. Unless we ever decide to use a larger mode for VPR, maybe 
that's what this is trying to address? I can't imagine we would ever 
need to though since for MVE there is only one VPR register and it is 
always 16-bits. Just feels overly complicated to me.

RS>
RS> for VPR_REG & VPR_REGNUM seems to make the costs correct.  I don't 
know

RS> if it would cause other problems though.
RS>
RS> I don't think CLASS_MAX_NREGS should do anything special for 
superclasses

RS> of VPR_REG, even though that makes the definition non-obvious.  If an
RS> SImode is stored in GENERAL_AND_VPR_REGS, it will in reality be stored
RS> in the GENERAL_REGS subset, so the maximum count should come from 
there

RS> rather than VPR_REG.

Does that answer your question?
I guess it end's up being correct, just don't understand the complexity 
that's all.


Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Andre Simoes Dias Vieira via Gcc-patches



On 20/01/2022 09:23, Christophe Lyon wrote:



On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches 
 wrote:



On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
>  iterator instead of HI in mve_vmvnq_n_.
>
> 2022-01-13  Christophe Lyon  
>
>       gcc/
>       * config/arm/mve.md (mve_vmvnq_n_): Use V_elem
mode
>       for operand 1.
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 171dd384133..5c3b34dce3a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
>   (define_insn "mve_vmvnq_n_"
>     [
>      (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -     (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> +     (unspec:MVE_5 [(match_operand: 1
"immediate_operand" "i")]
>        VMVNQ_N))
>     ]
>     "TARGET_HAVE_MVE"

While fixing this it might be good to fix the constraint and
predicate
inspired by "DL" and "neon_inv_logic_op2" respectively. This would
avoid
the compiler generating wrong assembly, and instead it would probably
lead to the compiler using a load literal.

I kind of think it would be better to have the intrinsic refuse the
immediate altogether, but it seems for NEON we also use the load
literal
approach.


Ha, I thought that patch had been approved at v2 too: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html



Yeah sorry I had not looked at the previous version of these series!

I can put together a follow-up for this then.


Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Christophe Lyon via Gcc-patches
On Wed, Jan 19, 2022 at 8:03 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
> >  iterator instead of HI in mve_vmvnq_n_.
> >
> > 2022-01-13  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
> >   for operand 1.
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 171dd384133..5c3b34dce3a 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
> >   (define_insn "mve_vmvnq_n_"
> > [
> >  (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> > - (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> > + (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
> >VMVNQ_N))
> > ]
> > "TARGET_HAVE_MVE"
>
> While fixing this it might be good to fix the constraint and predicate
> inspired by "DL" and "neon_inv_logic_op2" respectively. This would avoid
> the compiler generating wrong assembly, and instead it would probably
> lead to the compiler using a load literal.
>
> I kind of think it would be better to have the intrinsic refuse the
> immediate altogether, but it seems for NEON we also use the load literal
> approach.
>
>
Ha, I thought that patch had been approved at v2 too:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581344.html


Re: [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2022-01-20 Thread Christophe Lyon via Gcc-patches
On Wed, Jan 19, 2022 at 7:25 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > VPR_REG is the only register in its class, so it should be handled by
> > TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
> > default_class_likely_spilled_p.  No test fails without this patch, but
> > it seems it should be implemented.
> >
> > 2022-01-13  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index c3559ca8703..64a8f2dc7de 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
> > || rclass  == CC_REG)
> >   return true;
> >
> > -  return false;
> > +  return default_class_likely_spilled_p (rclass);
> >   }
> >
> >   /* Implements target hook small_register_classes_for_mode_p.  */
> LGTM, but await reviewer approval. I suspect this would help avoiding
> spilling of other special registers, though I'm not sure we codegen any
> enough to make a difference, which is why it is likely to have no effect
> on anything else.
>
>
Yeah.

I thought this had been approved at v2:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581778.html
(like most other patches in the series, except the few ones I had to change
v2 -> v3)

Thanks,

Christophe


Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Christophe Lyon via Gcc-patches
On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi Christophe,
>
> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> > At some point during the development of this patch series, it appeared
> > that in some cases the register allocator wants “VPR or general”
> > rather than “VPR or general or FP” (which is the same thing as
> > ALL_REGS).  The series does not seem to require this anymore, but it
> > seems to be a good thing to do anyway, to give the register allocator
> > more freedom.
> Not sure I fully understand this, but I guess it creates an extra class
> the register allocator can use to group things that can go into VPR or
> general reg?
> >
> > CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
> > regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
> > -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
> I have not looked into this failure, but ...
> >
> > 2022-01-13  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
> >   (REG_CLASS_NAMES): Likewise.
> >   (REG_CLASS_CONTENTS): Likewise.
> >   (CLASS_MAX_NREGS): Handle VPR.
> >   * config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index bb75921f32d..c3559ca8703 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
> >   static unsigned int
> >   arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
> >   {
> > +  if (IS_VPR_REGNUM (regno))
> > +return CEIL (GET_MODE_SIZE (mode), 2);
> When do we ever want to use more than 1 register for VPR?
>

That was tricky.
Richard Sandiford helped me analyze the problem, I guess I can quote him:

RS> I think the problem is a combination of a few things:
RS>
RS> (1) arm_hard_regno_mode_ok rejects SImode in VPR, so SImode moves
RS> to or from the VPR_REG class get the maximum cost.
RS>
RS> (2) IRA thinks from CLASS_MAX_NREGS and arm_hard_regno_nregs that
RS>VPR is big enough to hold SImode.
RS>
RS> (3) If a class C1 is a superset of a class C2, and if C2 is big enough
RS> to hold a mode M, IRA ensures that move costs for M involving C1
RS> are >= move costs for M involving C2.
RS>
RS> (1) is correct but (2) isn't.  IMO (3) is dubious: the trigger should
RS> be whether C2 is actually allowed to hold M, not whether C2 is big
enough
RS> to hold M.  However, changing that is likely to cause problems
elsewhere,
RS> and could lead to classes like GENERAL_AND_FP_REGS being used when
RS> FP_REGS are disabled (which might be confusing).
RS>
RS> “Fixing” (2) using:
RS>
RS>  CEIL (GET_MODE_SIZE (mode), 2)
RS>
RS> for VPR_REG & VPR_REGNUM seems to make the costs correct.  I don't know
RS> if it would cause other problems though.
RS>
RS> I don't think CLASS_MAX_NREGS should do anything special for
superclasses
RS> of VPR_REG, even though that makes the definition non-obvious.  If an
RS> SImode is stored in GENERAL_AND_VPR_REGS, it will in reality be stored
RS> in the GENERAL_REGS subset, so the maximum count should come from there
RS> rather than VPR_REG.

Does that answer your question?


> >
> > @@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
> >  ARM regs are UNITS_PER_WORD bits.
> >  FIXME: Is this true for iWMMX?  */
> >   #define CLASS_MAX_NREGS(CLASS, MODE)  \
> > -  (ARM_NUM_REGS (MODE))
> > +  (CLASS == VPR_REG)   \
> > +  ? CEIL (GET_MODE_SIZE (MODE), 2)\
> > +  : (ARM_NUM_REGS (MODE))
> >
> Same.
>
>


Re: [PATCH] tree-optimization/103721 - Only add equivalencies that are still valid.

2022-01-20 Thread Richard Biener via Gcc-patches
On Wed, Jan 19, 2022 at 7:41 PM Andrew MacLeod  wrote:
>
> On 1/19/22 04:33, Richard Biener wrote:
> > On Wed, Jan 19, 2022 at 2:37 AM Andrew MacLeod via Gcc-patches
> >  wrote:
> >>
> >> OK for trunk?
> > OK.  I don't quite understand how what you describe above works, it sounds
> > a bit odd with respect to the idea that equivalences should be transitive 
> > but
> The transitive check is what prevents us from having to find and update
> all the equivalence sets when a name needs to be removed.  we can simply
> create a new equivalence with that name, and all the older equivalences
> in the dom tree will no longer equate with it and are eliminated by the
> query.  With the nature of on-demand, its possible for equivalences to
> get created in unexpected orders, and logging all the equivalences as
> they are seen and leaving the final determination to query time seems to
> be the easiest and most accurate way to get results.  I suspect we miss
> a few relations if we process things in a  random order, but we
> shouldn't get anything wrong.

Ah, that's an interesting approach to solving this issue!

> > I should note that forming equivalences from PHI nodes with backedges
> > is not possible without being very careful since you will easily end up
> > equating _1 and _1 from different iterations (and thus with different 
> > value).
>
> The dominator search version used by ranger won't create equivalences
> from back edges normally because the back edge doesn't dominate the
> block.  The only time we could get an equivalence from a back edge would
> be if all the other arguments to a PHI at the top of the loop were
> undefined, or the same value as came in on the back edge
>
> ie
>
> top_5 = PHI  would create an equivalence between
> top_5 and val_6...   but that's OK because it is just a copy then anyway.
>
> or
>
> top_5 = PHI 
>
> This will create an equivalence between top_5 and val_6 in the loop,
> until we reach the point where val_6 is defined, and then the
> equivalence will get killed.  its possible that might cause an issue in
> a single BB loop, If I could reproduce that...  let me experiment.  In
> which case I'll simply disable equivalences applied to PHIs if its
> driven by just a back edge.
>
> I dont see any other way we can get an equivalence/relation from a back
> edge with the oracle (other than what the threader does, it has its own
> oracle extensions for paths)

Thanks for the explanation.

> Its on my task list to document the entire oracle mechanism for both
> equivalences and relations in the next month or two.

That would be welcome.

Thanks,
Richard.

>
> Andrew
>


Re: Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061]

2022-01-20 Thread Thomas Schwinge
Hi Jakub!

Thanks for looking into this.

On 2022-01-20T00:00:23+0100, Jakub Jelinek  wrote:
> On Wed, Jan 19, 2022 at 11:29:18PM +0100, Thomas Schwinge wrote:
>> (The pass is still disabled by default, by the way.)
>>
>> We've found that 'gcc/omp-oacc-kernels-decompose.cc' is currently not at
>> all considerate of 'GIMPLE_DEBUG' statements -- and it's not always
>> straight forward how to handle these (not rocket science either; but
>> needs proper understanding and testing).
>
> The general rule is that debug stmts shouldn't affect code generation
> decisions, so when deciding what to optimize/how, they should be ignored

ACK.  (... and I'm confused why we didn't run into this when originally
doing the OpenACC 'kernels' decomposition work, three years ago...)

> and during actual transformation adjusted or worst case reset as needed.

That's what we need to look into, in particular: if we decompose (GIMPLE
sequence) an OpenACC 'kernels' region into parts, how to move or
otherwise handle any 'GIMPLE_DEBUG's.

>> Actually fixing it is a separate task, but it seems prudent to at least
>> catch it, and document via a few test cases.  OK to push
>> "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
>> [PR100400, PR103836, PR104061]", see attached?
>
>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc
>> @@ -1255,6 +1255,16 @@ decompose_kernels_region_body (gimple 
>> *kernels_region, tree kernels_clauses)
>>gsi_next (_n);
>>
>>gimple *stmt = gsi_stmt (gsi);
>> +  if (gimple_code (stmt) == GIMPLE_DEBUG)
>> +{
>> +  if (flag_compare_debug_opt || flag_compare_debug)
>> +/* Let the usual '-fcompare-debug' analysis bail out, as
>> +   necessary.  */
>> +;
>> +  else
>> +sorry_at (loc, "%qs not yet supported",
>> +  gimple_code_name[gimple_code (stmt)]);
>> +}
>
> This is wrong.

I have a different understanding what "wrong" means.  ;-)

> It shouldn't be dependent on flag_compare_debug* options,
> those are just debugging aids to verify that -g/-g0 don't affect code
> generation.  With the above you'd pretend they don't, but they actually
> would (with -g you'd get sorry, without it it would compile fine).

The idea there is: not all 'GIMPLE_DEBUG's are mishandled in the pass,
just some.  If '-fcompare-debug' is in effect, we know that it will
detect any cases of mishandling (code generation difference), so it's
thus fine in that case to skip the coarse-grained 'sorry' here.

> If this code is analysing whether the kernels region body should be
> decomposed or not

This place here is just a convenient one, where we iterate through the
whole GIMPLE sequence.

With these things now hopfully clarified, is the attached
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]" OK to push?  It's of course not the final
fix, but it at least makes obvious any current silent miscompilation, and
incremental improvement over the current status.

> it should be if (is_gimple_debug (stmt)) continue;
> or whatever else to just ignore them (in some opts already during analysis
> phase we remember they are present and something about them, but not in
> a way that would actually affect the code generation decisions).
> And then when actually transforming it, it depends on what transformations
> are done to the variables/values referenced in the debug stmts.
> gimple_debug_bind_reset_value (stmt); update_stmt (stmt); is
> what resets them and can be used as last resort, it will keep saying
> that it describes some var, but will say that the var is optimized out.

Thanks, that'll be helpful later.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 568808ef7ccc97ebeae90bc7cb1aba6bd7659b24 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 19 Jan 2022 14:04:42 +0100
Subject: [PATCH] Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels'
 decomposition [PR100400, PR103836, PR104061]

Actually fixing it is a separate task, but it seems prudent to at least catch
it, and document via a few test cases.

	gcc/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* omp-oacc-kernels-decompose.cc (decompose_kernels_region_body):
	Catch 'GIMPLE_DEBUG'.
	gcc/testsuite/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-2.c: New.
	*