Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> On Wed, Feb 23, 2022 at 07:32:55PM +0800, guojiufu wrote:
>> >We already have TARGET_INSN_COST which you could ask for a cost.
>> >Like if we'd have a single_set then just temporarily substitute
>> >the RHS with the candidate and cost the insns and compare against
>> >the original insn cost.  So why exactly do you need a new hook
>> >for this particular situation?
>> 
>> Thanks for pointing out this! Segher also mentioned this before.
>> Currently, CSE is using rtx_cost. Using insn_cost to replace
>> rtx_cost would be a good idea for all necessary places including CSE.
>
> I have updated many places that used rtx_cost to use insn_cost instead,
> over the years (as a fallback the generic insn_cost will use rtx_cost).
> CSE is the biggest remaining thing.  There is a long tail left as well
> of course.
>
>> For this particular case: check the cost for constants.
>> I did not use insn_cost, because to use insn_cost we may need
>> to create a recognizable insn temporarily, and for some kinds of
>> constants we may need to create a sequence of instructions on some
>> platforms, e.g. "li xx; ori ...; sldi ..." on ppc64, and check the
>> sum cost of those instructions.  If we only create one fake
>> instruction, insn_cost may not return an accurate cost either.
>
> That is the problem yes.  You need insns to call insn_cost on.  You can
> look in combine.c:combine_validate_cost to see how this can be done; but
> you need to have some code to generate in the first place, and for CSE
> it isn't always clear what code to generate, it really is based on RTL
> expressions having a cost.

Hi Segher,

Thanks! combine_validate_cost is useful to help me evaluate
the costs of several instructions or replacements.

As you pointed out, at CSE it may not be clear what exact insn
sequences will be generated.  Actually, the same issue also exists
for RTL expressions: at CSE we may not know the exact cost, since
the real instructions may be emitted in very late passes.

To get an accurate cost, we may analyze the constant in the
hook (insn_cost or rtx_cost), estimate the possible final
instructions, and then calculate the costs.

We discussed one idea: let the hook insn_cost accept
any interim instruction, estimate the real instructions
based on the interim insn, and then return the estimated
costs.

For example: pass the insn "r119:DI=0x100803004101001" to
insn_cost; in rs6000_insn_cost (for ppc), analyze the
constant "0x100803004101001", which would need 5 insns;
then rs6000_insn_cost summarizes the cost of those 5 insns.

A minor concern: because we know that reading this
constant from the pool is faster than building it by insns,
we will finally generate instructions to load the constant
from the pool rather than emitting 5 real instructions to
build the value.  So we are more interested in whether it
is faster to load from the pool or not.
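As a concrete illustration of the 5-insn estimate for 0x100803004101001, here is a small standalone sketch of such an estimator; it is simplified and is not rs6000's actual num_insns_constant logic, just a model of the li/lis, ori, sldi 32, oris, ori sequence:

```c
#include <assert.h>
#include <stdint.h>

/* Rough model of counting the instructions a ppc64-like target needs to
   materialize a 64-bit constant.  Illustration only, not GCC code. */
static int insns_for_const(uint64_t c) {
  int64_t sc = (int64_t) c;
  if (sc >= -0x8000 && sc < 0x8000)
    return 1;                        /* li: 16-bit signed immediate */
  if (sc >= INT32_MIN && sc <= INT32_MAX)
    return 2;                        /* lis + ori */
  int n = 3;                         /* lis + ori for the high half, sldi 32 */
  uint32_t lo = (uint32_t) c;
  if (lo >> 16)
    n++;                             /* oris for the upper low-half bits */
  if (lo & 0xffff)
    n++;                             /* ori for the lower low-half bits */
  return n;
}
```

With this model, 0x100803004101001 indeed comes out at 5 instructions, matching the estimate in the message above.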


BR,
Jiufu 

>
>
> Segher


Re: [PATCH] Don't do int cmoves for IEEE comparisons, PR target/104256.

2022-02-23 Thread Robin Dapp via Gcc-patches
Hi,

> Robin's patch has the effect of making rs6000_emit_int_cmove return false for
> floating point comparisons, so I marked the bug as being a duplicate of PR
> target/104335.

Didn't I just return false for MODE_CC?  This should not affect proper
floating-point comparisons.  It looks like the problem was indeed caused
by my original patch but then I wonder why I did not catch it in the
first place despite running a Power9 bootstrap and regtest (with Fortran
of course) that looked unchanged.

Shouldn't this have come up?  I vaguely recall seeing maxloc FAILs at
some point but not in the final runs.  Going to re-check, because this
would have helped avoid introducing the problem that late.

Regards
 Robin


Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches  writes:

> Segher Boessenkool  writes:
>
>> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote:
>>> I'm assuming we're always dealing with
>>> 
>>>   (set (reg:MODE ..) )
>>> 
>>> here and CSE is not substituting into random places of an
>>> instruction(?).  I don't know what 'rtx_cost' should evaluate
>> >to for a constant, if it should implicitly evaluate the cost
>>> of putting the result into a register for example.
>>
>> rtx_cost is no good here (and in most places).  rtx_cost should be 0
>> for anything that is used as input in a machine instruction -- but you
>> need much more context to determine that.  insn_cost is much simpler and
>> much easier to use.
>>
>>> Using RTX_COST with SET and 1 at least looks no worse than using
>>> your proposed new target hook and comparing it with the original
>>> unfolded src (again with SET and 1).
>>
>> It is required to generate valid instructions no matter what, before
>> the pass has finished that is.  On all more modern architectures it is
>> futile to think you can usefully consider the cost of an RTL expression
>> and derive a real-world cost of the generated code from that.
>
> Thanks Segher for pointing these out!  Here is another reason that I
> did not use rtx_cost: in a few passes, there is code to check
> constants and store them in the constant pool.  I'm thinking to
> integrate that code in a consistent way.

Hi Segher, Richard!

I'm thinking of it this way: for a constant,
1. if the constant can be used as an immediate for the
instruction, treat it as an operand;
2. otherwise, if the constant cannot be stored into a
constant pool, handle it through instructions;
3. if it is faster to access the constant from the pool,
emit the constant as data (.rodata);
4. otherwise, handle the constant by instructions.
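The four steps above can be sketched as a decision function; the predicate names here are placeholders standing in for target queries (they are not actual GCC hooks), just to make the ordering explicit:

```c
#include <assert.h>

/* Hypothetical summary of the four-way decision for handling a constant.
   The three flags stand in for target-specific predicates. */
enum const_strategy { USE_IMMEDIATE, BUILD_BY_INSNS, LOAD_FROM_POOL };

struct const_info {
  int fits_immediate;      /* step 1: legal operand as-is */
  int can_force_const_mem; /* step 2: may be placed in the constant pool */
  int pool_is_faster;      /* step 3: pool load beats building by insns */
};

static enum const_strategy choose_strategy(const struct const_info *c) {
  if (c->fits_immediate)
    return USE_IMMEDIATE;       /* step 1 */
  if (!c->can_force_const_mem)
    return BUILD_BY_INSNS;      /* step 2 */
  if (c->pool_is_faster)
    return LOAD_FROM_POOL;      /* step 3 */
  return BUILD_BY_INSNS;        /* step 4 */
}
```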

And to store the constant into a pool, besides force_const_mem,
creating a reference (TOC) may be needed on some platforms.

For this particular issue in CSE, there is already code that
tries to put the constant into a pool (invoking force_const_mem),
but that code runs too late.  So we may check the constant
earlier and store it into the constant pool if profitable.

And, as Segher pointed out, CSE is doing too much work.
It may be OK to separate the constant-handling logic
from CSE.

I have updated the patch as follows (it does not yet separate CSE):

Thanks for the comments and suggestions again!


BR,
Jiufu

---
 gcc/config/rs6000/rs6000.cc   | 39 ++-
 gcc/cse.cc| 36 -
 gcc/doc/tm.texi   |  5 +++
 gcc/doc/tm.texi.in|  2 +
 gcc/target.def|  8 
 gcc/targhooks.cc  |  6 +++
 gcc/targhooks.h   |  1 +
 .../gcc.target/powerpc/medium_offset.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pr63281.c| 14 +++
 gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
 10 files changed, 84 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d7a7cfe860f..0a8f487d516 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1361,6 +1361,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_CANNOT_FORCE_CONST_MEM
 #define TARGET_CANNOT_FORCE_CONST_MEM rs6000_cannot_force_const_mem
 
+#undef TARGET_FASTER_LOADING_CONSTANT
+#define TARGET_FASTER_LOADING_CONSTANT rs6000_faster_loading_const
+
 #undef TARGET_DELEGITIMIZE_ADDRESS
 #define TARGET_DELEGITIMIZE_ADDRESS rs6000_delegitimize_address
 
@@ -9684,8 +9687,8 @@ rs6000_init_stack_protect_guard (void)
 static bool
 rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
-  if (GET_CODE (x) == HIGH
-  && GET_CODE (XEXP (x, 0)) == UNSPEC)
+  /* Exclude CONSTANT HIGH part.  */
+  if (GET_CODE (x) == HIGH)
 return true;
 
   /* A TLS symbol in the TOC cannot contain a sum.  */
@@ -10483,6 +10486,30 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, 
machine_mode mode)
   return false;
 }
 
+
+/* Implement TARGET_FASTER_LOADING_CONSTANT.  */
+
+static bool
+rs6000_faster_loading_const (machine_mode mode, rtx dest, rtx src)
+{
+  gcc_assert (CONSTANT_P (src));
+
+  if (GET_MODE_CLASS (mode) == MODE_CC || mode == VOIDmode)
+return false;
+  if (GET_CODE (src) == HIGH)
+return false;
+  if (toc_relative_expr_p (src, false, NULL, NULL))
+return false;
+  if (rs6000_cannot_force_const_mem (mode, src))
+return false;
+
+  if (REG_P (dest) && FP_REGNO_P (REGNO (dest)))
+return true;
+  if (!CONST_INT_P (src))
+return true;
+  return num_insns_constant (src, mode) > (TARGET_PCREL ? 1 : 2);
+}
+
 /* Emit a move from SOURCE to DEST in mode MODE.  */
 void
 rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
@@ 

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote:
>> I'm assuming we're always dealing with
>> 
>>   (set (reg:MODE ..) )
>> 
>> here and CSE is not substituting into random places of an
>> instruction(?).  I don't know what 'rtx_cost' should evaluate
>> to for a constant, if it should implicitly evaluate the cost
>> of putting the result into a register for example.
>
> rtx_cost is no good here (and in most places).  rtx_cost should be 0
> for anything that is used as input in a machine instruction -- but you
> need much more context to determine that.  insn_cost is much simpler and
> much easier to use.
>
>> Using RTX_COST with SET and 1 at least looks no worse than using
>> your proposed new target hook and comparing it with the original
>> unfolded src (again with SET and 1).
>
> It is required to generate valid instructions no matter what, before
> the pass has finished that is.  On all more modern architectures it is
> futile to think you can usefully consider the cost of an RTL expression
> and derive a real-world cost of the generated code from that.

Thanks Segher for pointing these out!  Here is another reason that I
did not use rtx_cost: in a few passes, there is code to check
constants and store them in the constant pool.  I'm thinking to
integrate that code in a consistent way.


BR,
Jiufu

>
> But there is so much more wrong with cse.c :-(
>
>
> Segher


Fwd: [PATCH 2/2][middle-end/102276] Adding -Wtrivial-auto-var-init and update documentation.

2022-02-23 Thread Qing Zhao
Ping...

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH 2/2][middle-end/102276] Adding -Wtrivial-auto-var-init and update documentation.
Date: February 19, 2022 at 10:24:09 AM CST
To: Richard Biener <rguent...@suse.de>, Jakub Jelinek <ja...@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Kees Cook <keesc...@chromium.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Hi,

This is the 2nd patch for fixing pr102276.

Adding -Wtrivial-auto-var-init and update documentation.

Adding a new warning option -Wtrivial-auto-var-init to report cases when
-ftrivial-auto-var-init cannot initialize the auto variable. At the same
time, update documentation for -ftrivial-auto-var-init to connect it with
the new warning option -Wtrivial-auto-var-init,  and add documentation
for -Wtrivial-auto-var-init.
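A minimal standalone illustration (plain C semantics, independent of the patch) of why such variables cannot be auto-initialized: the jump to the first case label skips everything at the declaration point, so an initializer inserted there would never run:

```c
#include <assert.h>

static int demo(void) {
  int skipped = 1;
  switch (1) {
    int x;            /* declared before the first case label */
    skipped = 0;      /* never executed: control jumps straight to "case 1" */
  case 1:
    (void) sizeof x;  /* x is in scope here, but uninitialized */
    break;
  }
  return skipped;     /* still 1: the pre-label region was skipped */
}
```

Any compiler-generated initialization placed at the declaration of x would sit in the same unreachable region, which is exactly what the new warning reports.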

Bootstrapped and regression tested on both x86 and aarch64.

Okay for committing?

thanks.

Qing.

==
From 4346890b8f4258489c4841f1992ba3ce816d7689 Mon Sep 17 00:00:00 2001
From: Qing Zhao <qing.z...@oracle.com>
Date: Fri, 18 Feb 2022 15:53:15 +
Subject: [PATCH 2/2] Adding -Wtrivial-auto-var-init and update documentation.

Adding a new warning option -Wtrivial-auto-var-init to report cases when
-ftrivial-auto-var-init cannot initialize the auto variable. At the same
time, update documentation for -ftrivial-auto-var-init to connect it with
the new warning option -Wtrivial-auto-var-init,  and add documentation
for -Wtrivial-auto-var-init.

2022-02-18  Qing Zhao  <qing.z...@oracle.com>
gcc/ChangeLog:

* common.opt (-Wtrivial-auto-var-init): New option.
* doc/invoke.texi (-Wtrivial-auto-var-init): Document new option.
(-ftrivial-auto-var-init): Update option;
* gimplify.cc (maybe_warn_switch_unreachable): Rename...
(maybe_warn_switch_unreachable_and_auto_init): ...to this.
(gimplify_switch_expr): Call new function.

gcc/testsuite/ChangeLog:

* gcc.dg/auto-init-pr102276-3.c: New test.
* gcc.dg/auto-init-pr102276-4.c: New test.
---
gcc/common.opt  |   4 +
gcc/doc/invoke.texi |  14 ++-
gcc/gimplify.cc | 100 
+++-
gcc/testsuite/gcc.dg/auto-init-pr102276-3.c |  40 
gcc/testsuite/gcc.dg/auto-init-pr102276-4.c |  40 
5 files changed, 175 insertions(+), 23 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr102276-3.c
create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr102276-4.c

diff --git a/gcc/common.opt b/gcc/common.opt
index c21e5273ae3..22c95dbfa49 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -801,6 +801,10 @@ Wtrampolines
Common Var(warn_trampolines) Warning
Warn whenever a trampoline is generated.

+Wtrivial-auto-var-init
+Common Var(warn_trivial_auto_var_init) Warning Init(0)
+Warn about where -ftrivial-auto-var-init cannot initialize the auto variable.
+
Wtype-limits
Common Var(warn_type_limits) Warning EnabledBy(Wextra)
Warn if a comparison is always true or always false due to the limited range of 
the data type.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e1a00c80307..c61a5b4b4a5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
-Wswitch  -Wno-switch-bool  -Wswitch-default  -Wswitch-enum @gol
-Wno-switch-outside-range  -Wno-switch-unreachable  -Wsync-nand @gol
-Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
--Wtsan -Wtype-limits  -Wundef @gol
+-Wtrivial-auto-var-init -Wtsan -Wtype-limits  -Wundef @gol
-Wuninitialized  -Wunknown-pragmas @gol
-Wunsuffixed-float-constants  -Wunused @gol
-Wunused-but-set-parameter  -Wunused-but-set-variable @gol
@@ -6953,6 +6953,14 @@ This warning is enabled by default for C and C++ 
programs.
Warn when @code{__sync_fetch_and_nand} and @code{__sync_nand_and_fetch}
built-in functions are used.  These functions changed semantics in GCC 4.4.

+@item -Wtrivial-auto-var-init
+@opindex Wtrivial-auto-var-init
+@opindex Wno-trivial-auto-var-init
+Warn when @code{-ftrivial-auto-var-init} cannot initialize the automatic
+variable.  A common situation is an automatic variable that is declared
+between the controlling expression and the first case label of a @code{switch}
+statement.
+
@item -Wunused-but-set-parameter
@opindex Wunused-but-set-parameter
@opindex Wno-unused-but-set-parameter
@@ -12314,6 +12322,10 @@ initializer as uninitialized, @option{-Wuninitialized} 
and
warning messages on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
+However, the current implementation cannot initialize automatic variables that
+are declared between the controlling expression and the first case of a
+@code{switch} statement.  Using @option{-Wtrivial-auto-var-init} 

Fwd: [PATCH 1/2][middle-end/102276] Don't emit switch-unreachable warnings for -ftrivial-auto-var-init (PR102276)

2022-02-23 Thread Qing Zhao
Ping.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH 1/2][middle-end/102276] Don't emit switch-unreachable warnings for -ftrivial-auto-var-init (PR102276)
Date: February 19, 2022 at 10:22:43 AM CST
To: Richard Biener <rguent...@suse.de>, Jakub Jelinek <ja...@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Kees Cook <keesc...@chromium.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Hi,

Per our discussion in the bug report 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102276

We decided to go with the following solution:

1. avoid emitting switch-unreachable warnings for -ftrivial-auto-var-init;
2. add a new option, -Wtrivial-auto-var-init, to emit warnings for the
switch-unreachable cases, suggesting the user modify the source code;
3. update the documentation of -ftrivial-auto-var-init for the limitation on
switch-unreachable cases and introduce the new option -Wtrivial-auto-var-init.

With 1, we resolve the current immediate issue of spurious warnings, so that
building the kernel with -ftrivial-auto-var-init succeeds;
with 2, we provide the user a way to know that -ftrivial-auto-var-init has a
limitation on the switch-unreachable cases, and that the source code should be
modified to avoid the problem;
with 3, we provide the user clear documentation of -ftrivial-auto-var-init
along with suggestions on how to resolve this issue.

There are two patches included for this bug.  This is the first one.

The patches have been bootstrapped and regression tested on both x86 and aarch64.

Okay for commit?

Thanks.

Qing.

===

From 65bc9607ff35ad49e5501ec5c392293c5b6358d0 Mon Sep 17 00:00:00 2001
From: Qing Zhao <qing.z...@oracle.com>
Date: Fri, 18 Feb 2022 15:35:53 +
Subject: [PATCH 1/2] Don't emit switch-unreachable warnings for
-ftrivial-auto-var-init (PR102276)

For the following test case:
 1 int g(int *);
 2 int f1()
 3 {
 4 switch (0) {
 5 int x;
 6 default:
 7 return g(&x);
 8 }
 9 }
compiling with -O -ftrivial-auto-var-init causes a spurious warning:
warning: statement will never be executed [-Wswitch-unreachable]
   5 | int x;
 | ^
This is due to the compiler-generated initialization at the point of
the declaration.

We could avoid the warning by adjusting the routine
"maybe_warn_switch_unreachable" to exclude the case where

flag_auto_var_init > AUTO_INIT_UNINITIALIZED

and the statement is a call to .DEFERRED_INIT.

2022-02-18  Qing Zhao  <qing.z...@oracle.com>
gcc/ChangeLog:

* gimplify.cc (maybe_warn_switch_unreachable): Don't warn for
compiler-generated initializations for -ftrivial-auto-var-init.

gcc/testsuite/ChangeLog:

* gcc.dg/auto-init-pr102276-1.c: New test.
* gcc.dg/auto-init-pr102276-2.c: New test.
---
gcc/gimplify.cc |  8 -
gcc/testsuite/gcc.dg/auto-init-pr102276-1.c | 38 +
gcc/testsuite/gcc.dg/auto-init-pr102276-2.c | 38 +
3 files changed, 83 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr102276-1.c
create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr102276-2.c

diff --git a/gcc/gimplify.cc 
b/gcc/gimplify.cc
index f570daa015a..4e3bbf5314d 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -2103,7 +2103,13 @@ maybe_warn_switch_unreachable (gimple_seq seq)
 && TREE_CODE (gimple_goto_dest (stmt)) == LABEL_DECL
 && DECL_ARTIFICIAL (gimple_goto_dest (stmt)))
/* Don't warn for compiler-generated gotos.  These occur
-   in Duff's devices, for example.  */;
+   in Duff's devices, for example.  */
+ ;
+  else if ((flag_auto_var_init > AUTO_INIT_UNINITIALIZED)
+ && (gimple_call_internal_p (stmt, IFN_DEFERRED_INIT)))
+ /* Don't warn for compiler-generated initializations for
+  -ftrivial-auto-var-init.  */
+ ;
  else
warning_at (gimple_location (stmt), OPT_Wswitch_unreachable,
   "statement will never be executed");
diff --git a/gcc/testsuite/gcc.dg/auto-init-pr102276-1.c 
b/gcc/testsuite/gcc.dg/auto-init-pr102276-1.c
new file mode 100644
index 000..d574926e0c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/auto-init-pr102276-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wall -ftrivial-auto-var-init=zero" } */
+
+int g(int *);
+int f()
+{
+switch (0) {
+int x;  /* { dg-bogus "statement will never be executed" } */
+default:
+return g();
+}
+}
+
+int g1(int);
+int f1()
+{
+switch (0) {
+int x; /* { dg-bogus "statement will never be executed" } */
+default:
+return g1(x);  /* { dg-warning "is used uninitialized" } */
+}
+}
+
+struct S
+{
+  char a;
+  int b;
+};
+int g2(int);
+int f2(int input)
+{
+switch (0) {
+

Re: [PATCH] PR fortran/84519 - [F2018] STOP and ERROR STOP statements with QUIET specifier

2022-02-23 Thread Jerry D via Gcc-patches

On 2/23/22 2:21 PM, Harald Anlauf via Fortran wrote:

Dear Fortranners,

Fortran 2018 added a QUIET= specifier to STOP and ERROR STOP statements.
Janne already implemented the library side code four (4!) years ago,
but so far the frontend implementation was missing.

Furthermore, F2018 allows for non-default-integer stopcode expressions
(finally!).

The attached patch provides this implementation.

That was not too much fun for the following reasons:

- fixed format vs. free format
- F95 and F2003 apparently did not require a blank between STOP and
   stopcode, while F2008+ do require it.

This should explain the three test cases.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

One step closer to F2018!

Thanks,
Harald


A minor comment.  Is there a way to also have a run-time test case?

OK to commit now and additional test case can be added if necessary later.

Regards,

Jerry


[committed] analyzer: handle __attribute__((const)) [PR104434]

2022-02-23 Thread David Malcolm via Gcc-patches
When testing -fanalyzer on openblas-0.3, I noticed slightly over 2000
false positives from -Wanalyzer-malloc-leak on code like this:

if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) {
pt_t = (lapack_complex_float*)
LAPACKE_malloc( sizeof(lapack_complex_float) *
ldpt_t * MAX(1,n) );
[...snip...]
}

[...snip lots of code...]

if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'q' ) ) {
LAPACKE_free( pt_t );
}

where LAPACKE_lsame is a char-comparison function implemented in a
different TU.
The analyzer naively considers the execution path where:
  LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' )
is true at the malloc guard, but then false at the free guard, which
is thus a memory leak.

This patch makes -fanalyzer respect __attribute__((const)), so that the
analyzer treats such functions as returning the same value when given
the same inputs.

I've filed https://github.com/xianyi/OpenBLAS/issues/3543 suggesting that
LAPACKE_lsame be annotated with __attribute__((const)); with that, and
with this patch, the false positives seem to be fixed.
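A reduced standalone sketch of the pattern (my_lsame is a hypothetical stand-in for LAPACKE_lsame, not its real implementation): with the attribute, the analyzer may treat both guarded calls as returning the same value for the same arguments, so the allocation and the free pair up on every feasible path for vect == 'b':

```c
#include <stdlib.h>

/* Hypothetical stand-in for LAPACKE_lsame: a pure case-insensitive
   character comparison (ASCII letters assumed). */
__attribute__((const)) static int my_lsame(char ca, char cb) {
  return (ca | 0x20) == (cb | 0x20);
}

/* Mirrors the reported code shape: allocation and deallocation sit behind
   separate guards that agree when vect == 'b'. */
static int run(char vect) {
  char *buf = NULL;
  if (my_lsame(vect, 'b') || my_lsame(vect, 'p'))
    buf = malloc(16);
  /* ... lots of unrelated code ... */
  if (my_lsame(vect, 'b') || my_lsame(vect, 'q'))
    free(buf);  /* with the const attribute, both 'b' guards evaluate alike */
  return my_lsame(vect, 'b');
}
```

Without the attribute, each call site gets an independent conjured value, which is what produced the spurious leak path described above.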

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7364-gaee1adf2cdc1cf4e116e5c05b6e7c92b0fbb264b.

gcc/analyzer/ChangeLog:
PR analyzer/104434
* analyzer.h (class const_fn_result_svalue): New decl.
* region-model-impl-calls.cc (call_details::get_manager): New.
* region-model-manager.cc
(region_model_manager::get_or_create_const_fn_result_svalue): New.
(region_model_manager::log_stats): Log
m_const_fn_result_values_map.
* region-model.cc (const_fn_p): New.
(maybe_get_const_fn_result): New.
(region_model::on_call_pre): Handle fndecls with
__attribute__((const)) by calling the above rather than making
a conjured_svalue.
* region-model.h (visitor::visit_const_fn_result_svalue): New.
(region_model_manager::get_or_create_const_fn_result_svalue): New
decl.
(region_model_manager::const_fn_result_values_map_t): New typedef.
(region_model_manager::m_const_fn_result_values_map): New field.
(call_details::get_manager): New decl.
* svalue.cc (svalue::cmp_ptr): Handle SK_CONST_FN_RESULT.
(const_fn_result_svalue::dump_to_pp): New.
(const_fn_result_svalue::dump_input): New.
(const_fn_result_svalue::accept): New.
* svalue.h (enum svalue_kind): Add SK_CONST_FN_RESULT.
(svalue::dyn_cast_const_fn_result_svalue): New.
(class const_fn_result_svalue): New.
(is_a_helper ::test): New.
(template <> struct default_hash_traits):
New.

gcc/testsuite/ChangeLog:
PR analyzer/104434
* gcc.dg/analyzer/attr-const-1.c: New test.
* gcc.dg/analyzer/attr-const-2.c: New test.
* gcc.dg/analyzer/attr-const-3.c: New test.
* gcc.dg/analyzer/pr104434-const.c: New test.
* gcc.dg/analyzer/pr104434-nonconst.c: New test.
* gcc.dg/analyzer/pr104434.h: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |   1 +
 gcc/analyzer/region-model-impl-calls.cc   |   8 +
 gcc/analyzer/region-model-manager.cc  |  28 +++
 gcc/analyzer/region-model.cc  |  59 +-
 gcc/analyzer/region-model.h   |  10 +
 gcc/analyzer/svalue.cc|  73 
 gcc/analyzer/svalue.h | 133 +-
 gcc/testsuite/gcc.dg/analyzer/attr-const-1.c  | 152 +++
 gcc/testsuite/gcc.dg/analyzer/attr-const-2.c  |  16 ++
 gcc/testsuite/gcc.dg/analyzer/attr-const-3.c  |  26 +++
 .../gcc.dg/analyzer/pr104434-const.c  | 173 ++
 .../gcc.dg/analyzer/pr104434-nonconst.c   | 173 ++
 gcc/testsuite/gcc.dg/analyzer/pr104434.h  | 108 +++
 13 files changed, 954 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-const-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-const-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-const-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104434-const.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104434-nonconst.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104434.h

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 7e58bcd5d70..36db4f2b538 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -54,6 +54,7 @@ class svalue;
   class compound_svalue;
   class conjured_svalue;
   class asm_output_svalue;
+  class const_fn_result_svalue;
 typedef hash_set svalue_set;
 class region;
   class frame_region;
diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 95d9921c61d..65daa342824 100644
--- a/gcc/analyzer/region-model-impl-calls.cc

Re: [PATCH] Don't do int cmoves for IEEE comparisons, PR target/104256.

2022-02-23 Thread Michael Meissner via Gcc-patches
On Thu, Feb 17, 2022 at 05:38:07PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> First, you need to adjust after Robin's patch, and retest.

Robin's patch has the effect of making rs6000_emit_int_cmove return false for
floating point comparisons, so I marked the bug as being a duplicate of PR
target/104335.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Add testcase from PR103845

2022-02-23 Thread Alexandre Oliva via Gcc-patches


This problem was already fixed as part of PR104263: the abnormal edge
that remained from before inlining didn't make sense after inlining.
So this patch adds only the testcase.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/testsuite/ChangeLog

PR tree-optimization/103845
PR tree-optimization/104263
* gcc.dg/pr103845.c: New.
---
 gcc/testsuite/gcc.dg/pr103845.c |   29 +
 1 file changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr103845.c

diff --git a/gcc/testsuite/gcc.dg/pr103845.c b/gcc/testsuite/gcc.dg/pr103845.c
new file mode 100644
index 0..45ab518d07c9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103845.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fharden-compares -fno-ipa-pure-const" } */
+
+int
+baz (void);
+
+__attribute__ ((returns_twice)) void
+bar (void)
+{
+}
+
+int
+quux (int y, int z)
+{
+  return (y || z >= 0) ? y : z;
+}
+
+int
+foo (int x)
+{
+  int a = 0, b = x == a;
+
+  bar ();
+
+  if (!!baz () < quux (b, a))
+++x;
+
+  return x;
+}

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Cope with NULL dw_cfi_cfa_loc

2022-02-23 Thread Alexandre Oliva via Gcc-patches


In def_cfa_0, we may set the 2nd operand's dw_cfi_cfa_loc to NULL, but
then cfi_oprnd_equal_p calls cfa_equal_p with a NULL dw_cfa_location*.
This patch arranges for us to tolerate a NULL dw_cfi_cfa_loc.
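The same NULL-tolerant comparison pattern, reduced to a standalone sketch on a stand-in struct (not the real dw_cfa_location): if either pointer is NULL, compare the pointers themselves instead of dereferencing:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a CFA location record. */
struct cfa_loc { unsigned reg; long offset; };

static bool cfa_loc_equal(const struct cfa_loc *a, const struct cfa_loc *b) {
  /* If either is NULL, don't dereference; equal only if both are NULL. */
  if (!a || !b)
    return a == b;
  return a->reg == b->reg && a->offset == b->offset;
}
```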

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR middle-end/104540
* dwarf2cfi.cc (cfi_oprnd_equal_p): Cope with NULL
dw_cfi_cfa_loc.

for  gcc/testsuite/ChangeLog

PR middle-end/104540
* g++.dg/pr104540.C: New.
---
 gcc/dwarf2cfi.cc|3 +++
 gcc/testsuite/g++.dg/pr104540.C |   21 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr104540.C

diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index 9ca97d7a3bf56..ab7c5cc5b27b5 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -788,6 +788,9 @@ cfi_oprnd_equal_p (enum dw_cfi_oprnd_type t, dw_cfi_oprnd 
*a, dw_cfi_oprnd *b)
 case dw_cfi_oprnd_loc:
   return loc_descr_equal_p (a->dw_cfi_loc, b->dw_cfi_loc);
 case dw_cfi_oprnd_cfa_loc:
+  /* If any of them is NULL, don't dereference either.  */
+  if (!a->dw_cfi_cfa_loc || !b->dw_cfi_cfa_loc)
+   return a->dw_cfi_cfa_loc == b->dw_cfi_cfa_loc;
   return cfa_equal_p (a->dw_cfi_cfa_loc, b->dw_cfi_cfa_loc);
 }
   gcc_unreachable ();
diff --git a/gcc/testsuite/g++.dg/pr104540.C b/gcc/testsuite/g++.dg/pr104540.C
new file mode 100644
index 0..a86ecbfd088c3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr104540.C
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fharden-conditional-branches -mforce-drap -mstackrealign 
--param=max-grow-copy-bb-insns=125" } */
+
+char c;
+int i;
+
+void bar(int);
+
+struct S {
+  int mi;
+  long ml;
+  S(int);
+};
+
+
+void foo() {
+  int s = c == 0 ? 1 : 2;
+  bar(s);
+  if (i)
+S s(0);
+}


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Copy EH phi args for throwing hardened compares

2022-02-23 Thread Alexandre Oliva via Gcc-patches


When we duplicate a throwing compare for hardening, the EH edge from
the original compare gets duplicated for the inverted compare, but we
failed to adjust any PHI nodes in the EH block.  This patch adds the
needed adjustment, copying the PHI args from those of the preexisting
edge.
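A toy model of the invariant being maintained (plain C, no GCC APIs): every incoming edge of a join block must supply one argument per PHI node, so a newly added edge gets its arguments copied from a preexisting edge to the same block:

```c
#include <assert.h>
#include <string.h>

#define MAX_PHIS  4
#define MAX_PREDS 8

/* Each predecessor edge of a join block carries one incoming value per PHI. */
struct edge { int phi_args[MAX_PHIS]; };
struct join_block {
  int n_phis;
  int n_preds;
  struct edge *preds[MAX_PREDS];
};

/* Attach NEW_E to BB, copying its per-PHI arguments from EXISTING_E,
   the way the patch copies PHI args from the preexisting EH edge. */
static void add_edge_copying_phi_args(struct join_block *bb,
                                      struct edge *new_e,
                                      const struct edge *existing_e) {
  memcpy(new_e->phi_args, existing_e->phi_args,
         sizeof(new_e->phi_args[0]) * bb->n_phis);
  bb->preds[bb->n_preds++] = new_e;
}
```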

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/103856
* gimple-harden-conditionals.cc (non_eh_succ_edge): Enable the
eh edge to be requested through an extra parameter.
(pass_harden_compares::execute): Copy PHI args in the EH dest
block for the new EH edge added for the inverted compare.

for  gcc/testsuite/ChangeLog

PR tree-optimization/103856
* g++.dg/pr103856.C: New.
---
 gcc/gimple-harden-conditionals.cc |   31 ---
 gcc/testsuite/g++.dg/pr103856.C   |   17 +
 2 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr103856.C

diff --git a/gcc/gimple-harden-conditionals.cc 
b/gcc/gimple-harden-conditionals.cc
index 9418194cb20c2..6a5fc3fb9e1a2 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -361,9 +361,9 @@ make_pass_harden_conditional_branches (gcc::context *ctxt)
 }
 
 /* Return the fallthru edge of a block whose other edge is an EH
-   edge.  */
+   edge.  If EHP is not NULL, store the EH edge in it.  */
 static inline edge
-non_eh_succ_edge (basic_block bb)
+non_eh_succ_edge (basic_block bb, edge *ehp = NULL)
 {
   gcc_checking_assert (EDGE_COUNT (bb->succs) == 2);
 
@@ -375,6 +375,9 @@ non_eh_succ_edge (basic_block bb)
   gcc_checking_assert (!(ret->flags & EDGE_EH)
   && (eh->flags & EDGE_EH));
 
+  if (ehp)
+*ehp = eh;
+
   return ret;
 }
 
@@ -538,8 +541,9 @@ pass_harden_compares::execute (function *fun)
add_stmt_to_eh_lp (asgnck, lookup_stmt_eh_lp (asgn));
make_eh_edges (asgnck);
 
+   edge ckeh;
basic_block nbb = split_edge (non_eh_succ_edge
- (gimple_bb (asgnck)));
+ (gimple_bb (asgnck), &ckeh));
gsi_split = gsi_start_bb (nbb);
 
if (dump_file)
@@ -547,6 +551,27 @@ pass_harden_compares::execute (function *fun)
   "Splitting non-EH edge from block %i into %i after"
   " the newly-inserted reversed throwing compare\n",
   gimple_bb (asgnck)->index, nbb->index);
+
+   if (!gimple_seq_empty_p (phi_nodes (ckeh->dest)))
+ {
+   edge aseh;
+   non_eh_succ_edge (gimple_bb (asgn), &aseh);
+
+   gcc_checking_assert (aseh->dest == ckeh->dest);
+
+   for (gphi_iterator psi = gsi_start_phis (ckeh->dest);
+!gsi_end_p (psi); gsi_next (&psi))
+ {
+   gphi *phi = psi.phi ();
+   add_phi_arg (phi, PHI_ARG_DEF_FROM_EDGE (phi, aseh), ckeh,
+gimple_phi_arg_location_from_edge (phi, aseh));
+ }
+
+   if (dump_file)
+ fprintf (dump_file,
+  "Copying PHI args in EH block %i from %i to %i\n",
+  aseh->dest->index, aseh->src->index, ckeh->src->index);
+ }
  }
 
gcc_checking_assert (single_succ_p (gsi_bb (gsi_split)));
diff --git a/gcc/testsuite/g++.dg/pr103856.C b/gcc/testsuite/g++.dg/pr103856.C
new file mode 100644
index 0..26c7d8750255a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr103856.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
/* { dg-options "-Og -fnon-call-exceptions -fsignaling-nans -fharden-compares" } */
+
+struct S {
+  S(float);
+  S();
+  operator float();
+  ~S() {}
+};
+
+int
+main() {
+  S s_arr[] = {2};
+  S var1;
+  if (var1)
+;
+}


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PR103302] skip multi-word pre-move clobber during lra

2022-02-23 Thread Alexandre Oliva via Gcc-patches
On Feb 21, 2022, Richard Biener  wrote:

>> Ok to revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbb

> OK.  Please re-open the bug as appropriate.

Thanks.  I've reopened it.  Here's what I'm installing.  I'm not
reverting the testcase, since it stopped failing even before the patch
was put in.


Revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbb

The patch for PR103302 caused PR104121, and extended the live ranges
of LRA reloads.


for gcc/ChangeLog

PR target/104121
PR target/103302
* expr.cc (emit_move_multi_word): Restore clobbers during LRA.
---
 gcc/expr.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 35e40299753bb..5f7142b975ada 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3929,7 +3929,7 @@ emit_move_multi_word (machine_mode mode, rtx x, rtx y)
  hard regs shouldn't appear here except as return values.
  We never want to emit such a clobber after reload.  */
   if (x != y
-  && ! (lra_in_progress || reload_in_progress || reload_completed)
+  && ! (reload_in_progress || reload_completed)
   && need_clobber != 0)
 emit_clobber (x);
 


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


*Ping* [PATCH] PR fortran/104573 - ICE in resolve_structure_cons, at fortran/resolve.cc:1299

2022-02-23 Thread Harald Anlauf via Gcc-patches

Am 16.02.22 um 22:20 schrieb Harald Anlauf via Gcc-patches:

Dear Fortranners,

while we detect invalid uses of type(*), we may run into other issues
later when the declared variable is used, leading to an ICE due to a
NULL pointer dereference.  This is demonstrated by Gerhard's testcase.

Steve and I came to rather similar fixes, see PR.  Mine is attached.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald





[PATCH] x86: Skip ENDBR when emitting direct call/jmp to local function

2022-02-23 Thread H.J. Lu via Gcc-patches
Skip the 4-byte ENDBR when emitting a direct call/jmp to a local function
with ENDBR at function entry.

This has been tested on the Linux kernel.

gcc/

PR target/102953
* config/i386/i386-features.cc
(rest_of_insert_endbr_and_patchable_area): Set
SYMBOL_FLAG_FUNCTION_ENDBR when inserting ENDBR.
* config/i386/i386.cc (ix86_print_operand): Skip the 4-byte ENDBR
when calling the local function with ENDBR at function entry.
* config/i386/i386.h (SYMBOL_FLAG_FUNCTION_ENDBR): New.
(SYMBOL_FLAG_FUNCTION_ENDBR_P): Likewise.

gcc/testsuite/

PR target/102953
* gcc.target/i386/pr102953-1.c: New test.
* gcc.target/i386/pr102953-2.c: Likewise.
---
 gcc/config/i386/i386-features.cc           |  2 ++
 gcc/config/i386/i386.cc                    | 11 ++++++++++-
 gcc/config/i386/i386.h                     |  5 +++++
 gcc/testsuite/gcc.target/i386/pr102953-1.c | 25 +++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr102953-2.c | 30 ++++++++++++++++++++++++++++++
 5 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-2.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 6fe41c3c24f..3ca1131ed59 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1979,6 +1979,8 @@ rest_of_insert_endbr_and_patchable_area (bool need_endbr,
  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES
  && DECL_DLLIMPORT_P (cfun->decl)))
{
+ rtx symbol = XEXP (DECL_RTL (cfun->decl), 0);
+ SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_FUNCTION_ENDBR;
  if (crtl->profile && flag_fentry)
{
  /* Queue ENDBR insertion to x86_function_profiler.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b2bf90576d5..33777f942f2 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -13810,7 +13810,16 @@ ix86_print_operand (FILE *file, rtx x, int code)
   else if (flag_pic || MACHOPIC_INDIRECT)
output_pic_addr_const (file, x, code);
   else
-   output_addr_const (file, x);
+   {
+ /* Skip ENDBR when emitting a direct call/jmp to a local
+function with ENDBR at function entry.  */
+ if (code == 'P'
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_LOCAL_P (x)
+ && SYMBOL_FLAG_FUNCTION_ENDBR_P (x))
+   x = gen_rtx_PLUS (Pmode, x, GEN_INT (4));
+ output_addr_const (file, x);
+   }
 }
 }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index f41e0908250..e3e50e1ebbb 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2786,6 +2786,11 @@ extern GTY(()) tree ms_va_list_type_node;
 #define SYMBOL_REF_STUBVAR_P(X) \
((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_STUBVAR) != 0)
 
+/* Flag to mark a function with ENDBR at entry.  */
+#define SYMBOL_FLAG_FUNCTION_ENDBR (SYMBOL_FLAG_MACH_DEP << 5)
+#define SYMBOL_FLAG_FUNCTION_ENDBR_P(X) \
+   ((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_FUNCTION_ENDBR) != 0)
+
 extern void debug_ready_dispatch (void);
 extern void debug_dispatch_window (int);
 
diff --git a/gcc/testsuite/gcc.target/i386/pr102953-1.c b/gcc/testsuite/gcc.target/i386/pr102953-1.c
new file mode 100644
index 000..2afad391baf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102953-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { ! *-*-darwin* } } } */
+/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
+
+extern int func (int);
+
+extern int i;
+
+__attribute__ ((noclone, noinline, noipa))
+static int
+bar (int x)
+{
+  if (x == 0)
+return x;
+  return bar (x - 1) + func (x);
+}
+
+void *
+foo (void)
+{
+  i = bar (2);
+  return bar;
+}
+
+/* { dg-final { scan-assembler-times {call\t_?bar\+4\M} 2 } } */
+/* { dg-final { scan-assembler-times {call\t_?func\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr102953-2.c b/gcc/testsuite/gcc.target/i386/pr102953-2.c
new file mode 100644
index 000..5b8d517f4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102953-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target { ! *-*-darwin* } } } */
+/* { dg-options "-O2 -fno-pic -fplt -fcf-protection" } */
+
+static int bar (int x);
+extern int func (int);
+
+int
+foo (int i)
+{
+  return bar (i);
+}
+
+void *
+bar_p (void)
+{
+  return bar;
+}
+
+__attribute__ ((noclone, noinline, noipa))
+static int
+bar (int x)
+{
+  if (x == 0)
+return x;
+  return bar (x - 1) + func (x);
+}
+
+/* { dg-final { scan-assembler-times {call\t_?bar\+4\M} 1 } } */
+/* { dg-final { scan-assembler-times {jmp\t_?bar\+4\M} 1 } } */
+/* { dg-final { scan-assembler-times {call\t_?func\M} 1 } } */
-- 
2.35.1



[PATCH] Add -fcf-check-attribute=[yes|no|none] for Linux kernel

2022-02-23 Thread H.J. Lu via Gcc-patches
When compiling the Linux kernel with -fcf-protection=branch to enable x86
Indirect Branch Tracking (IBT), ENDBR is added to all global functions.
This creates more "legal" forward edges than necessary.  -mmanual-endbr
provides a way to insert the ENDBR instruction at function entry only via
the 'cf_check' function attribute, and programmers can add the 'cf_check'
function attribute to functions which can be reached by an indirect branch.

Add -fcf-check-attribute=[yes|no|none] to imply "cf_check" or "nocf_check"
function attributes so that GCC can produce a diagnostic when there is a
mismatch in cf_check or nocf_check function attributes.

This has been tested on the Linux kernel.

gcc/

PR target/102953
* attribs.cc (decl_attributes): Add implied cf_check or nocf_check
function attributes.
* common.opt: Add -fcf-check-attribute=.
* flag-types.h (cf_check_attribute): New.
* doc/invoke.texi: Document -fcf-check-attribute=.

gcc/c/

PR target/102953
* c-decl.cc (diagnose_mismatched_decls): Check implied cf_check
and nocf_check function attributes.

gcc/testsuite/

PR target/102953
* gcc.target/i386/pr102953-3.c: New test.
* gcc.target/i386/pr102953-4.c: Likewise.
* gcc.target/i386/pr102953-5.c: Likewise.
* gcc.target/i386/pr102953-6.c: Likewise.
---
 gcc/attribs.cc                             | 19 +++++++++++++++++++
 gcc/c/c-decl.cc                            | 22 +++++++++++++++++++++-
 gcc/common.opt                             | 16 ++++++++++++++++
 gcc/doc/invoke.texi                        | 12 ++++++++++++
 gcc/flag-types.h                           |  8 ++++++++
 gcc/testsuite/gcc.target/i386/pr102953-3.c |  8 ++++++++
 gcc/testsuite/gcc.target/i386/pr102953-4.c |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr102953-5.c |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr102953-6.c |  8 ++++++++
 9 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102953-6.c

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 497dcff84df..4af9df5d976 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -689,6 +689,25 @@ decl_attributes (tree *node, tree attributes, int flags,
attributes = tree_cons (get_identifier ("no_icf"),  NULL, attributes);
 }
 
+  /* -fcf-check-attribute=[yes|no] implies "cf_check" or "nocf_check"
+ function attribute.  */
+  if (TREE_CODE (*node) == FUNCTION_DECL
+  && flag_cf_check_attribute != CF_CHECK_ATTRIBUTE_NONE
+  && !fndecl_built_in_p (*node)
+  && lookup_attribute ("nocf_check",
+  DECL_ATTRIBUTES (*node)) == nullptr
+  && lookup_attribute ("cf_check",
+  DECL_ATTRIBUTES (*node)) == nullptr
+  && (!attributes
+ || (lookup_attribute ("nocf_check", attributes) == nullptr
+ && lookup_attribute ("cf_check", attributes) == nullptr)))
+{
+  const char *attr = (flag_cf_check_attribute == CF_CHECK_ATTRIBUTE_YES
+ ? "cf_check" : "nocf_check");
+  attributes = tree_cons (get_identifier (attr), nullptr,
+ attributes);
+}
+
   targetm.insert_attributes (*node, &attributes);
 
   /* Note that attributes on the same declaration are not necessarily
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index c701f07befe..787c39dc0fe 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2133,7 +2133,27 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
error ("conflicting type qualifiers for %q+D", newdecl);
}
  else
-   error ("conflicting types for %q+D; have %qT", newdecl, newtype);
+   {
+ if (flag_cf_check_attribute == CF_CHECK_ATTRIBUTE_NO
+ && (!lookup_attribute ("nocf_check",
+ TYPE_ATTRIBUTES (oldtype))
+  != !lookup_attribute ("nocf_check",
+TYPE_ATTRIBUTES (newtype
+   error ("conflicting types for %q+D; have %qT with "
+  "implied % attribute",
+  newdecl, newtype);
+ else if (flag_cf_check_attribute == CF_CHECK_ATTRIBUTE_YES
+  && (!lookup_attribute ("cf_check",
+TYPE_ATTRIBUTES (oldtype))
+ != !lookup_attribute ("cf_check",
+   TYPE_ATTRIBUTES (newtype
+   error ("conflicting types for %q+D; have %qT with "
+  "implied % attribute",
+  newdecl, newtype);
+ else
+   error ("conflicting types for %q+D; have %qT",
+  newdecl, newtype);

[PATCH] PR fortran/84519 - [F2018] STOP and ERROR STOP statements with QUIET specifier

2022-02-23 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

Fortran 2018 added a QUIET= specifier to STOP and ERROR STOP statements.
Janne already implemented the library side code four (4!) years ago,
but so far the frontend implementation was missing.

Furthermore, F2018 allows for non-default-integer stopcode expressions
(finally!).

The attached patch provides this implementation.

That was not too much fun for the following reasons:

- fixed format vs. free format
- F95 and F2003 apparently did not require a blank between STOP and
  stopcode, while F2008+ do require it.

This should explain the three testcases.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

One step closer to F2018!

Thanks,
Harald

From 66e80a9847b3e16d4c619ba8da9f3dba891cff34 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 23 Feb 2022 23:08:29 +0100
Subject: [PATCH] Fortran: frontend code for F2018 QUIET specifier to STOP and
 ERROR STOP

Fortran 2018 allows for a QUIET specifier to the STOP and ERROR STOP
statements.  Whilst the gfortran library code provides support for this
specifier for quite some time, the frontend implementation was missing.

gcc/fortran/ChangeLog:

	PR fortran/84519
	* dump-parse-tree.cc (show_code_node): Dump QUIET specifier when
	present.
	* match.cc (gfc_match_stopcode): Implement parsing of F2018 QUIET
	specifier.  F2018 stopcodes may have non-default integer kind.
	* trans-stmt.cc (gfc_trans_stop): Pass QUIET specifier to call of
	library function.

gcc/testsuite/ChangeLog:

	PR fortran/84519
	* gfortran.dg/stop_1.f90: New test.
	* gfortran.dg/stop_2.f: New test.
	* gfortran.dg/stop_3.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc   |  5 +++
 gcc/fortran/match.cc | 62 +++-
 gcc/fortran/trans-stmt.cc| 21 --
 gcc/testsuite/gfortran.dg/stop_1.f90 | 44 
 gcc/testsuite/gfortran.dg/stop_2.f   | 31 ++
 gcc/testsuite/gfortran.dg/stop_3.f90 | 22 ++
 6 files changed, 172 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/stop_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/stop_2.f
 create mode 100644 gcc/testsuite/gfortran.dg/stop_3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 2a2f9901b08..322416e6556 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2370,6 +2370,11 @@ show_code_node (int level, gfc_code *c)
 	show_expr (c->expr1);
   else
 	fprintf (dumpfile, "%d", c->ext.stop_code);
+  if (c->expr2 != NULL)
+	{
+	  fputs (" QUIET=", dumpfile);
+	  show_expr (c->expr2);
+	}

   break;

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 8edfe4a3a2d..715a74eba51 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -2978,6 +2978,13 @@ Fortran 2008 has
R856 allstop-stmt  is ALL STOP [ stop-code ]
R857 stop-code is scalar-default-char-constant-expr
   or scalar-int-constant-expr
+Fortran 2018 has
+
+   R1160 stop-stmt   is STOP [ stop-code ] [ , QUIET = scalar-logical-expr]
+   R1161 error-stop-stmt is
+  ERROR STOP [ stop-code ] [ , QUIET = scalar-logical-expr]
+   R1162 stop-code   is scalar-default-char-expr
+ or scalar-int-expr

 For free-form source code, all standards contain a statement of the form:

@@ -2994,8 +3001,10 @@ static match
 gfc_match_stopcode (gfc_statement st)
 {
   gfc_expr *e = NULL;
+  gfc_expr *quiet = NULL;
   match m;
   bool f95, f03, f08;
+  char c;

   /* Set f95 for -std=f95.  */
   f95 = (gfc_option.allow_std == GFC_STD_OPT_F95);
@@ -3006,11 +3015,16 @@ gfc_match_stopcode (gfc_statement st)
   /* Set f08 for -std=f2008.  */
   f08 = (gfc_option.allow_std == GFC_STD_OPT_F08);

-  /* Look for a blank between STOP and the stop-code for F2008 or later.  */
-  if (gfc_current_form != FORM_FIXED && !(f95 || f03))
-{
-  char c = gfc_peek_ascii_char ();
+  /* Plain STOP statement?  */
+  if (gfc_match_eos () == MATCH_YES)
+goto checks;
+
+  /* Look for a blank between STOP and the stop-code for F2008 or later.
+ But allow for F2018's ,QUIET= specifier.  */
+  c = gfc_peek_ascii_char ();

+  if (gfc_current_form != FORM_FIXED && !(f95 || f03) && c != ',')
+{
   /* Look for end-of-statement.  There is no stop-code.  */
   if (c == '\n' || c == '!' || c == ';')
 goto done;
@@ -3023,7 +3037,12 @@ gfc_match_stopcode (gfc_statement st)
 	}
 }

-  if (gfc_match_eos () != MATCH_YES)
+  if (c == ' ')
+{
+  gfc_gobble_whitespace ();
+  c = gfc_peek_ascii_char ();
+}
+  if (c != ',')
 {
   int stopcode;
   locus old_locus;
@@ -3053,11 +3072,20 @@ gfc_match_stopcode (gfc_statement st)
 	goto cleanup;
   if (m == MATCH_NO)
 	goto syntax;
+}

-  if (gfc_match_eos () != MATCH_YES)
-	goto syntax;
+  if (gfc_match (" , quiet = %e", &quiet) == MATCH_YES)
+{
+  if (!gfc_notify_std (GFC_STD_F2018, "QUIET= specifier for %s 

Re: [PATCH v1] RISC-V: Add support for inlining subword atomic operations

2022-02-23 Thread Palmer Dabbelt

On Mon, 07 Feb 2022 16:48:41 PST (-0800), patr...@rivosinc.com wrote:

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2022-02-07 Patrick O'Neill 

PR target/104338
* riscv.opt: Add command-line flag.
* sync.md (atomic_fetch_<atomic_optab><mode>): Logic for
expanding subword atomic operations.
* sync.md (subword_atomic_fetch_strong_<atomic_optab>): LR/SC
block for performing the atomic operation.
* atomic.c: Add reference to duplicate logic.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* inline-atomics-9.c: Likewise.

Signed-off-by: Patrick O'Neill 
---
There may be further concerns about the memory consistency of these
operations, but this patch focuses on simply moving the logic inline.
Those concerns can be addressed in a future patch.
---
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/sync.md  |  96 +++
 .../gcc.target/riscv/inline-atomics-1.c   |  11 +
 .../gcc.target/riscv/inline-atomics-2.c   |  12 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  13 +
 .../gcc.target/riscv/inline-atomics-6.c   |  12 +
 .../gcc.target/riscv/inline-atomics-7.c   |  12 +
 .../gcc.target/riscv/inline-atomics-8.c   |  17 +
 .../gcc.target/riscv/inline-atomics-9.c   |  17 +
 libgcc/config/riscv/atomic.c  |   2 +
 12 files changed, 1331 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index e294e223151..fb702317233 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -211,3 +211,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Bool Var(ALWAYS_INLINE_SUBWORD_ATOMIC) Init(-1)
+Always inline subword atomic operations.


We usually have lower-case names for variables, but I think you can get 
away with a target flag here (which makes things slightly easier).  The 
-1 initializer is also a bit odd, but that'd go away with a target flag.


At a bare minimum this needs a doc/invoke.texi blurb, but IMO this 
should really be called out as a news entry as well -- we're already 
finding some ABI-related fallout in libstdc++, so we should make this as 
visible as possible to users.  I think it's OK to default to enabling 
the inline atomics, as we're not directly breaking the ABI from GCC.



diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 747a799e237..e19b4157d3c 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -92,6 +92,102 @@
   "%F3amo.%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])

+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")    ;; old value at mem
+   (match_operand:SHORT 1 "memory_operand" "+A"))    ;; mem location
+   (set (match_dup 1)
+   (unspec_volatile:SHORT
+ [(any_atomic:SHORT (match_dup 1)
+    (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+  (match_operand:SI 3 "const_int_operand")]  ;; model
+UNSPEC_SYNC_OLD_OP))]
+  "TARGET_ATOMIC && ALWAYS_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+ subword_atomic_fetch_strong_<atomic_optab> to implement a LR/SC version of the
+ operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for 
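The mask-forming step the expander's comment describes can be sketched as a small self-contained model (hypothetical helper name, little-endian layout assumed; the real logic lives in sync.md and libgcc/config/riscv/atomic.c):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the subword-atomic setup: given the address and width
   (1 or 2 bytes) of the operand, compute the containing aligned 32-bit
   word, the operand's bit shift within that word, and the mask that
   selects its bits.  Illustration only, not the GCC implementation.  */
static void
toy_subword_amo_args (uintptr_t addr, unsigned size,
                      uintptr_t *word_addr, unsigned *shift,
                      uint32_t *mask)
{
  *word_addr = addr & ~(uintptr_t) 3;      /* align down to 4 bytes */
  *shift = (unsigned) (addr & 3) * 8;      /* bit offset, little endian */
  *mask = (size == 1 ? 0xffu : 0xffffu) << *shift;
}
```

The LR/SC loop then loads the whole word, applies the operation under the mask, and stores the word back conditionally.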

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Segher Boessenkool
On Wed, Feb 23, 2022 at 07:32:55PM +0800, guojiufu wrote:
> >We already have TARGET_INSN_COST which you could ask for a cost.
> >Like if we'd have a single_set then just temporarily substitute
> >the RHS with the candidate and cost the insns and compare against
> >the original insn cost.  So why exactly do you need a new hook
> >for this particular situation?
> 
> Thanks for pointing out this! Segher also mentioned this before.
> Currently, CSE is using rtx_cost. Using insn_cost to replace
> rtx_cost would be a good idea for all necessary places including CSE.

I have updated many places that used rtx_cost to use insn_cost instead,
over the years (as a fallback the generic insn_cost will use rtx_cost).
CSE is the biggest remaining thing.  There is a long tail left as well
of course.

> For this particular case: check the cost for constants.
> I did not use insn_cost. Because to use insn_cost, we may need
> to create a recognizable insn temporarily, and for some kind of
> constants we may need to create a sequence instructions on some
> platform, e.g. "li xx; ori ; sldi .." on ppc64, and check the
> sum cost of those instructions. If only create one fake
> instruction, the insn_cost may not return the accurate cost either.

That is the problem yes.  You need insns to call insn_cost on.  You can
look in combine.c:combine_validate_cost to see how this can be done; but
you need to have some code to generate in the first place, and for CSE
it isn't always clear what code to generate, it really is based on RTL
expressions having a cost.
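To make that concrete: a constant built by a sequence like the "li xx; ori; sldi" example earlier in the thread only has a meaningful cost as a whole sequence.  The toy below is not GCC code (the helper name and split rules are simplified stand-ins for a ppc64-style backend); it just shows why costing a constant means summing over a materialization sequence rather than pricing a single RTX:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, NOT the rs6000 splitter: count the li/lis/ori/oris/sldi
   style instructions a ppc64-like target needs to materialize a 64-bit
   constant.  The "cost" of the constant is the summed cost of this
   whole insn sequence, which a single rtx_cost query cannot see.  */
static int
toy_const_insn_count (uint64_t c)
{
  int64_t sc = (int64_t) c;
  /* Fits in a sign-extended 16-bit immediate: a single "li".  */
  if (sc >= -32768 && sc < 32768)
    return 1;
  /* Fits in a sign-extended 32-bit value: "lis" plus maybe "ori".  */
  if (sc >= INT32_MIN && sc <= INT32_MAX)
    return 1 + ((c & 0xffff) != 0);
  /* General case: materialize the high 32 bits, shift with "sldi",
     then merge the low halves with "oris"/"ori" as needed.  */
  int n = toy_const_insn_count (c >> 32) + 1;   /* +1 for the sldi */
  n += (c & 0xffff0000u) != 0;                  /* oris */
  n += (c & 0xffffu) != 0;                      /* ori  */
  return n;
}
```

A 5-instruction constant such as 0x123456789abcdef0 clearly deserves a different cost than a 1-instruction "li", which is what an insn_cost-based CSE would have to capture.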


Segher


Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-23 Thread Martin Uecker via Gcc-patches
> 
> On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
> >>> That is EH, then there are calls that might not return because they leave
> >>> in some other way (e.g. longjmp), or might loop forever, might exit, might
> >>> abort, trap etc.
> >> Generally speaking, calls which do not return should not now be a 
> >> problem...
> >> as long as they do not transfer control to somewhere else in the current
> >> function.
> > I thought all of those cases are very relevant to PR104530.
> > If we have:
> >_1 = ptr_2(D) == 0;
> >// unrelated code in the same bb
> >_3 = *ptr_2(D);
> > then in light of PR104288, we can optimize ptr_2(D) == 0 into true only if
> > there are no calls inside of "// unrelated code in the same bb"
> > or if all calls in "// unrelated code in the same bb" are guaranteed to
> > return exactly once.  Because, if there is a call in there which could
> > exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
> > or throw externally, or longjmp or in any other non-UB way
> > cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
> > _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
> > comparison because ptr_2(D) could be NULL in a valid program.
> > While if there are no calls (and no problematic inline asms) and no trapping
> > insns in between, we can and PR104530 is asking that we continue to optimize
> > that.
> Right.  This is similar to some of the restrictions we deal with in the 
> path isolation pass.  Essentially we have a path, when traversed, would 
> result in a *0.  We would like to be able to find the edge upon-which 
> the *0 is control dependent and optimize the test so that it always went 
> to the valid path rather than the *0 path.
> 
> The problem is there may be observable side effects on the *0 path 
> between the test and the actual *0 -- including calls to nonreturning 
> functions, setjmp/longjmp, things that could trap, etc.  This case is 
> similar.  We can't back-propagate the non-null status through any 
> statements with observable side effects.
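Jakub's shape can be written out as a tiny self-contained example (hypothetical function names; the stand-in call does return here, but the optimizer must treat an arbitrary call as possibly not returning):

```c
#include <assert.h>

static int call_count;

/* Stand-in for "unrelated code in the same bb": a call the optimizer
   cannot in general assume will return (it could exit, longjmp, or
   loop forever).  */
static void may_not_return (void) { call_count++; }

static int
f (int *p)
{
  int was_null = (p == 0);   /* _1 = ptr_2(D) == 0;        */
  may_not_return ();         /* unrelated code, same block  */
  return was_null + *p;      /* _3 = *ptr_2(D);             */
}
```

Folding `p == 0` to false is only valid if execution is guaranteed to reach the dereference; with the intervening call, a valid program could pass a null `p` and never return from `may_not_return`.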

There are cases with volatile accesses where
this is currently not the case.

Also there is a proposal for C++ to require
an explicit std::observable fence to make sure
observable side effects are not undone by
later UB. (see the discussion about "reordering
of trapping operations and volatile" in
January on the gcc list)

In my opinion it is much better if a compiler
makes sure to preserve observable side effects
even in code path with later UB (because the
behavior may otherwise be surprising and existing
code may be broken in a subtle way).  If I
understand you correctly, GCC intends to do
this.

Can we then also agree that the remaining volatile
cases should be fixed?


Martin


Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Segher Boessenkool
On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote:
> I'm assuming we're always dealing with
> 
>   (set (reg:MODE ..) )
> 
> here and CSE is not substituting into random places of an
> instruction(?).  I don't know what 'rtx_cost' should evaluate
> to for a constant, if it should implicitely evaluate the cost
> of putting the result into a register for example.

rtx_cost is no good here (and in most places).  rtx_cost should be 0
for anything that is used as input in a machine instruction -- but you
need much more context to determine that.  insn_cost is much simpler and
much easier to use.

> Using RTX_COST with SET and 1 at least looks no worse than using
> your proposed new target hook and comparing it with the original
> unfolded src (again with SET and 1).

It is required to generate valid instructions no matter what, before
the pass has finished that is.  On all more modern architectures it is
futile to think you can usefully consider the cost of an RTL expression
and derive a real-world cost of the generated code from that.

But there is so much more wrong with cse.c :-(


Segher


Re: [PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding

2022-02-23 Thread Qing Zhao


> On Feb 23, 2022, at 11:49 AM, Jakub Jelinek  wrote:
> 
> On Wed, Feb 23, 2022 at 05:33:57PM +, Qing Zhao wrote:
>> From my understanding, __builtin_clear_padding (), does not _use_ any 
>> variable,
>> therefore, no uninitialized usage warning should be emitted for it. 
> 
> __builtin_clear_padding ()
> sometimes expands to roughly:
> *(int *)((char *) + 32) = 0;
> etc., in that case it shouldn't be suppressed in any way, it doesn't read
> anything, only stores.
> Or at other times it is:
> *(int *)((char *) + 32) &= 0xfec7dab1;
> etc., in that case it reads bytes from the object which can be
> uninitialized, we mask some bits off and store.

Okay, I see. 
So, only the MEM_REF that is read first should have its warning
suppressed. Then only one (out of the 4) MEM_REFs needs its warning
suppressed, namely the following one (line 4371 and then line 4382):

4371   tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base,
4372  build_int_cst (buf->alias_type, off));
4373   tree src;
4374   gimple *g;
4375   if (all_ones
4376   && nonzero_first == start
4377   && nonzero_last == start + eltsz)
4378 src = build_zero_cst (type);
4379   else
4380 {
4381   src = make_ssa_name (type);
4382   g = gimple_build_assign (src, unshare_expr (dst));
4383   gimple_set_location (g, buf->loc);
4384   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
4385   tree mask = native_interpret_expr (type,
4386  buf->buf + i + start,
4387  eltsz);
4388   gcc_assert (mask && TREE_CODE (mask) == INTEGER_CST);
4389   mask = fold_build1 (BIT_NOT_EXPR, type, mask);
4390   tree src_masked = make_ssa_name (type);
4391   g = gimple_build_assign (src_masked, BIT_AND_EXPR,
4392src, mask);
4393   gimple_set_location (g, buf->loc);
4394   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
4395   src = src_masked;
4396 }
4397   g = gimple_build_assign (dst, src);


All the other 3 MEM_REFs are not read.  So, we can just exclude them from
warning suppression, right?
Another question: for the above MEM_REF, should I suppress the warning for
line 4371 (“dst”), or for line 4382 (the “unshare_expr (dst)”)?

I think that we should suppress the warning for the latter, i.e.
“unshare_expr (dst)” at line 4382, right?
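The masked case above boils down to a read-mask-store on one word; a minimal self-contained sketch (hypothetical values, not the compiler's actual mask buffer):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the read-mask-store: padding_mask has 1-bits exactly
   where the object has padding.  Negating it (cf. the BIT_NOT_EXPR at
   line 4389 in the listing) keeps the data bits and zeroes only the
   padding.  The read of VAL models the load the uninit pass warns on.  */
static uint32_t
toy_clear_padding_word (uint32_t val, uint32_t padding_mask)
{
  return val & ~padding_mask;
}
```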

> 
> It is similar to what object.bitfld = 3; expands to,
> but usually only after the uninit pass.  Though, we have the
> optimize_bit_field_compare optimization, that is done very early
> and I wonder what uninit does about that.  Perhaps it ignores
> BIT_FIELD_REFs, I'd need to check that.

Yes, I see that uninitialized warning specially handles BIT_INSERT_EXPR as: 
(tree-ssa-uninit.cc)

 573   /* Do not warn if the result of the access is then used for
 574  a BIT_INSERT_EXPR. */
 575   if (lhs && TREE_CODE (lhs) == SSA_NAME)
 576 FOR_EACH_IMM_USE_FAST (luse_p, liter, lhs)
 577   {
 578 gimple *use_stmt = USE_STMT (luse_p);
 579 /* BIT_INSERT_EXPR first operand should not be considered
 580a use for the purpose of uninit warnings.  */
 
> 
> Anyway, if we want to disable uninit warnings for __builtin_clear_padding,
> we should do that with suppress_warning on the read stmts that load
> a byte (or more adjacent ones) before they are masked off and stored again,
> so that we don't warn about that.

In addition to these read stmts, shall we suppress warnings for the following:

/* Emit a runtime loop:
   for (; buf.base != end; buf.base += sz)
 __builtin_clear_padding (buf.base);  */

static void
clear_padding_emit_loop (clear_padding_struct *buf, tree type,
 tree end, bool for_auto_init)
{

i.e., should we suppress warnings for the above “buf.base != end”, “buf.base += sz”?

No need to suppress warning for them since they just read the address of the 
object, not the object itself?

thanks.

Qing

> 
>   Jakub
> 



[PATCH V2] bpf: do not --enable-gcov for bpf-*-* targets

2022-02-23 Thread Jose E. Marchesi
This patch changes the build machinery in order to disable the build
of GCOV (both compiler and libgcc) in bpf-*-* targets.  The reason for
this change is that BPF is (currently) too restricted to support the
coverage instrumentation.

Tested in bpf-unknown-none and x86_64-linux-gnu targets.

2022-02-23  Jose E. Marchesi  

gcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.

libgcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.
---
 gcc/configure   | 14 +++---
 gcc/configure.ac| 10 +-
 libgcc/configure| 31 +++
 libgcc/configure.ac | 17 -
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 258b17a226e..22eb3451e3d 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -8085,12 +8085,20 @@ fi
 if test "${enable_gcov+set}" = set; then :
   enableval=$enable_gcov;
 else
-  enable_gcov=yes
+  case $target in
+   bpf-*-*)
+ enable_gcov=no
+   ;;
+   *)
+ enable_gcov=yes
+   ;;
+ esac
 fi
 
 
 
 
+
 # Check whether --with-specs was given.
 if test "${with_specs+set}" = set; then :
   withval=$with_specs; CONFIGURE_SPECS=$withval
@@ -19659,7 +19667,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19662 "configure"
+#line 19670 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19765,7 +19773,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19768 "configure"
+#line 19776 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 06750cee977..20da90901f8 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1041,7 +1041,15 @@ AC_SUBST(enable_shared)
 
 AC_ARG_ENABLE(gcov,
 [  --disable-gcov  don't provide libgcov and related host tools],
-[], [enable_gcov=yes])
+[], [case $target in
+   bpf-*-*)
+ enable_gcov=no
+   ;;
+   *)
+ enable_gcov=yes
+   ;;
+ esac])
+
 AC_SUBST(enable_gcov)
 
 AC_ARG_WITH(specs,
diff --git a/libgcc/configure b/libgcc/configure
index 4919a56f518..52bf25d4e94 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -630,6 +630,7 @@ LIPO
 AR
 toolexeclibdir
 toolexecdir
+enable_gcov
 target_subdir
 host_subdir
 build_subdir
@@ -653,7 +654,6 @@ build_cpu
 build
 with_aix_soname
 enable_vtable_verify
-enable_gcov
 enable_shared
 libgcc_topdir
 target_alias
@@ -701,7 +701,6 @@ with_target_subdir
 with_cross_host
 with_ld
 enable_shared
-enable_gcov
 enable_vtable_verify
 with_aix_soname
 enable_version_specific_runtime_libs
@@ -709,6 +708,7 @@ with_toolexeclibdir
 with_slibdir
 enable_maintainer_mode
 with_build_libsubdir
+enable_gcov
 enable_largefile
 enable_decimal_float
 with_system_libunwind
@@ -1342,12 +1342,12 @@ Optional Features:
   --disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
   --disable-shareddon't provide a shared libgcc
-  --disable-gcov  don't provide libgcov and related host tools
   --enable-vtable-verifyEnable vtable verification feature
   --enable-version-specific-runtime-libsSpecify that runtime libraries 
should be installed in a compiler-specific directory
   --enable-maintainer-mode
   enable make rules and dependencies not useful (and
   sometimes confusing) to the casual installer
+  --disable-gcov  don't provide libgcov and related host tools
   --disable-largefile omit support for large files
   --enable-decimal-float={no,yes,bid,dpd}
enable decimal float extension to C.  Selecting 'bid'
@@ -2252,15 +2252,6 @@ fi
 
 
 
-# Check whether --enable-gcov was given.
-if test "${enable_gcov+set}" = set; then :
-  enableval=$enable_gcov;
-else
-  enable_gcov=yes
-fi
-
-
-
 # Check whether --enable-vtable-verify was given.
 if test "${enable_vtable_verify+set}" = set; then :
   enableval=$enable_vtable_verify; case "$enableval" in
@@ -2713,6 +2704,22 @@ fi
 target_subdir=${target_noncanonical}
 
 
+# Check whether --enable-gcov was given.
+if test "${enable_gcov+set}" = set; then :
+  enableval=$enable_gcov;
+else
+  case $target in
+   bpf-*-*)
+ enable_gcov=no
+   ;;
+   *)
+ enable_gcov=yes
+   ;;
+ esac
+fi
+
+
+
 # Calculate toolexeclibdir
 # Also toolexecdir, though it's only used in toolexeclibdir
 case ${version_specific_libs} in
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 13a80b2551b..6f0b67c2adc 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -68,11 +68,6 @@ AC_ARG_ENABLE(shared,
 ], [enable_shared=yes])
 

Re: [PATCH] bpf: do not --enable-gcov for bpf-*-* targets

2022-02-23 Thread Jose E. Marchesi


> On Feb 23 2022, Jose E. Marchesi wrote:
>
>> diff --git a/gcc/configure.ac b/gcc/configure.ac
>> index 06750cee977..a892e170997 100644
>> --- a/gcc/configure.ac
>> +++ b/gcc/configure.ac
>> @@ -1042,6 +1042,12 @@ AC_SUBST(enable_shared)
>>  AC_ARG_ENABLE(gcov,
>>  [  --disable-gcov  don't provide libgcov and related host tools],
>>  [], [enable_gcov=yes])
>> +
>> +case $target in
>> +  bpf-*-*)
>> +enable_gcov=no
>> +  ;;
>> +esac
>
> I think that should be moved inside the fourth argument of AC_ARG_ENABLE
> so that it does not override an explicit --enable-gcov.

Good idea.  V2 on the way...


Re: [PATCH] bpf: do not --enable-gcov for bpf-*-* targets

2022-02-23 Thread Andreas Schwab
On Feb 23 2022, Jose E. Marchesi wrote:

> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 06750cee977..a892e170997 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -1042,6 +1042,12 @@ AC_SUBST(enable_shared)
>  AC_ARG_ENABLE(gcov,
>  [  --disable-gcov  don't provide libgcov and related host tools],
>  [], [enable_gcov=yes])
> +
> +case $target in
> +  bpf-*-*)
> +enable_gcov=no
> +  ;;
> +esac

I think that should be moved inside the fourth argument of AC_ARG_ENABLE
so that it does not override an explicit --enable-gcov.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] bpf: do not --enable-gcov for bpf-*-* targets

2022-02-23 Thread Jose E. Marchesi
This patch changes the build machinery in order to disable the build
of GCOV (both compiler and libgcc) in bpf-*-* targets.  The reason for
this change is that BPF is (currently) too restricted to support
coverage instrumentation.

Tested in bpf-unknown-none and x86_64-linux-gnu targets.

2022-02-23  Jose E. Marchesi  

gcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.

libgcc/ChangeLog

PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.
---
 gcc/configure   | 10 --
 gcc/configure.ac|  6 ++
 libgcc/configure| 31 +++
 libgcc/configure.ac | 17 -
 4 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 258b17a226e..c129d118d27 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -8089,6 +8089,12 @@ else
 fi
 
 
+case $target in
+  bpf-*-*)
+enable_gcov=no
+  ;;
+esac
+
 
 
 # Check whether --with-specs was given.
@@ -19659,7 +19665,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19662 "configure"
+#line 19668 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19765,7 +19771,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19768 "configure"
+#line 19774 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 06750cee977..a892e170997 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1042,6 +1042,12 @@ AC_SUBST(enable_shared)
 AC_ARG_ENABLE(gcov,
 [  --disable-gcov  don't provide libgcov and related host tools],
 [], [enable_gcov=yes])
+
+case $target in
+  bpf-*-*)
+enable_gcov=no
+  ;;
+esac
 AC_SUBST(enable_gcov)
 
 AC_ARG_WITH(specs,
diff --git a/libgcc/configure b/libgcc/configure
index 4919a56f518..4e3c25cbad5 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -630,6 +630,7 @@ LIPO
 AR
 toolexeclibdir
 toolexecdir
+enable_gcov
 target_subdir
 host_subdir
 build_subdir
@@ -653,7 +654,6 @@ build_cpu
 build
 with_aix_soname
 enable_vtable_verify
-enable_gcov
 enable_shared
 libgcc_topdir
 target_alias
@@ -701,7 +701,6 @@ with_target_subdir
 with_cross_host
 with_ld
 enable_shared
-enable_gcov
 enable_vtable_verify
 with_aix_soname
 enable_version_specific_runtime_libs
@@ -709,6 +708,7 @@ with_toolexeclibdir
 with_slibdir
 enable_maintainer_mode
 with_build_libsubdir
+enable_gcov
 enable_largefile
 enable_decimal_float
 with_system_libunwind
@@ -1342,12 +1342,12 @@ Optional Features:
   --disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
   --disable-shareddon't provide a shared libgcc
-  --disable-gcov  don't provide libgcov and related host tools
   --enable-vtable-verifyEnable vtable verification feature
   --enable-version-specific-runtime-libsSpecify that runtime libraries 
should be installed in a compiler-specific directory
   --enable-maintainer-mode
   enable make rules and dependencies not useful (and
   sometimes confusing) to the casual installer
+  --disable-gcov  don't provide libgcov and related host tools
   --disable-largefile omit support for large files
   --enable-decimal-float={no,yes,bid,dpd}
enable decimal float extension to C.  Selecting 'bid'
@@ -2252,15 +2252,6 @@ fi
 
 
 
-# Check whether --enable-gcov was given.
-if test "${enable_gcov+set}" = set; then :
-  enableval=$enable_gcov;
-else
-  enable_gcov=yes
-fi
-
-
-
 # Check whether --enable-vtable-verify was given.
 if test "${enable_vtable_verify+set}" = set; then :
   enableval=$enable_vtable_verify; case "$enableval" in
@@ -2713,6 +2704,22 @@ fi
 target_subdir=${target_noncanonical}
 
 
+# Check whether --enable-gcov was given.
+if test "${enable_gcov+set}" = set; then :
+  enableval=$enable_gcov;
+else
+  enable_gcov=yes
+fi
+
+
+case $target in
+  bpf-*-*)
+enable_gcov=no
+  ;;
+esac
+
+
+
 # Calculate toolexeclibdir
 # Also toolexecdir, though it's only used in toolexeclibdir
 case ${version_specific_libs} in
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 13a80b2551b..c7a2d89c426 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -68,11 +68,6 @@ AC_ARG_ENABLE(shared,
 ], [enable_shared=yes])
 AC_SUBST(enable_shared)
 
-AC_ARG_ENABLE(gcov,
-[  --disable-gcov  don't provide libgcov and related host tools],
-[], [enable_gcov=yes])
-AC_SUBST(enable_gcov)
-
 AC_ARG_ENABLE(vtable-verify,
 [  --enable-vtable-verifyEnable vtable verification feature ],
 [case "$enableval" in
@@ -163,6 +158,18 @@ ACX_NONCANONICAL_HOST
 ACX_NONCANONICAL_TARGET
 GCC_TOPLEV_SUBDIRS
 

Re: [PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 05:33:57PM +, Qing Zhao wrote:
> From my understanding, __builtin_clear_padding (), does not _use_ any 
> variable,
>  therefore, no uninitialized usage warning should be emitted for it. 

__builtin_clear_padding ()
sometimes expands to roughly:
*(int *)((char *) + 32) = 0;
etc., in that case it shouldn't be suppressed in any way, it doesn't read
anything, only stores.
Or at other times it is:
*(int *)((char *) + 32) &= 0xfec7dab1;
etc., in that case it reads bytes from the object which can be
uninitialized, we mask some bits off and store.

It is similar to what object.bitfld = 3; expands to,
but usually only after the uninit pass.  Though, we have the
optimize_bit_field_compare optimization, that is done very early
and I wonder what uninit does about that.  Perhaps it ignores
BIT_FIELD_REFs, I'd need to check that.

Anyway, if we want to disable uninit warnings for __builtin_clear_padding,
we should do that with suppress_warning on the read stmts that load
a byte (or more adjacent ones) before they are masked off and stored again,
so that we don't warn about that.

Jakub



[pushed] c++: Add new test [PR79493]

2022-02-23 Thread Marek Polacek via Gcc-patches
A nice side effect of r12-1822 was improving the diagnostic
we emit for the following test.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/79493

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/undeclared1.C: New test.
---
 gcc/testsuite/g++.dg/diagnostic/undeclared1.C | 7 +++
 1 file changed, 7 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/undeclared1.C

diff --git a/gcc/testsuite/g++.dg/diagnostic/undeclared1.C 
b/gcc/testsuite/g++.dg/diagnostic/undeclared1.C
new file mode 100644
index 000..98c1ecb6581
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/undeclared1.C
@@ -0,0 +1,7 @@
+// PR c++/79493
+
+namespace A { }
+struct B {
+  void f(A::nonexistent param); // { dg-error ".A::nonexistent. has not been 
declared" }
+  void* g(A::nonexistent param); // { dg-error ".A::nonexistent. has not been 
declared" }
+};

base-commit: 9675ecf7f9b340f93c68cf22280f5975a902
-- 
2.35.1



[PATCH] c: Added testcase for already fixed PR [PR93432]

2022-02-23 Thread Krishna Narayanan via Gcc-patches
Hello,

The following patch adds a testcase for PR93432, which deals with the
warning for uninitialized variables. The testcase covers a bug that is
already fixed.

Regtested on x86_64. OK for commit? Please review it.

2022-02-23 Krishna Narayanan 

PR c/93432
gcc/testsuite/ChangeLog:
* gcc.dg/pr93432.c: New test.

---
 gcc/testsuite/pr93432.c | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/pr93432.c

diff --git a/gcc/testsuite/pr93432.c b/gcc/testsuite/pr93432.c
new file mode 100644
index 0..cd7199a56
--- /dev/null
+++ b/gcc/testsuite/pr93432.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wuninitialized -Wmaybe-uninitialized" } */
+ int test(int y) {
+int z; /* { dg-message "note: 'z' was declared here"  } */
+int x;
+int a; /* { dg-warning "'a' may be used uninitialized" } */
+for (x = 0; x < 10; x = x + 1, y = y + 1,a = a + 1)
+{
+if (y < 10) {
+z = z + 1 + a; /* { dg-warning "'z' may be used uninitialized" } */
+}
+}
+return z;
+}
-- 
2.25.1


[pushed] c++: Add fixed test [PR70077]

2022-02-23 Thread Marek Polacek via Gcc-patches
Fixed with r10-1280.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/70077

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept76.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/noexcept76.C | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept76.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept76.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept76.C
new file mode 100644
index 000..fc816477e57
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept76.C
@@ -0,0 +1,17 @@
+// PR c++/70077
+// { dg-do compile { target c++11 } }
+
+struct B {
+B(int) noexcept { }
+virtual void f() = 0;
+};
+
+struct D: public B {
+using B::B;
+D() noexcept(noexcept(D{42})): B{42} { }
+void f() override { }
+};
+
+int main() {
+B *b = new D{};
+}

base-commit: fdc46830f1b793dc791099acfadc3f0f8cc24c0e
-- 
2.35.1



Re: [PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding

2022-02-23 Thread Qing Zhao
Hi, Richard,

> On Feb 23, 2022, at 1:38 AM, Richard Biener  wrote:
> 
> On Tue, 22 Feb 2022, Qing Zhao wrote:
> 
>> __builtin_clear_padding() will clear all the padding bits of the 
>> object.
>> Actually, it doesn't involve any use of a user variable. Therefore, users do
>> not expect any uninitialized warning from it. It's reasonable to suppress
>> uninitialized warnings for all new created uses from __builtin_clear_padding
>> folding.
>> 
>> The patch has been bootstrapped and regress tested on both x86 and aarch64.
>> 
>> Okay for trunk?
>> 
>> Thanks.
>> 
>> Qing
>> 
>> ==
>> From cf6620005f55d4a1f782332809445c270d22cf86 Mon Sep 17 00:00:00 2001
>> From: qing zhao 
>> Date: Mon, 21 Feb 2022 16:38:31 +
>> Subject: [PATCH] Suppress uninitialized warnings for new created uses from
>> __builtin_clear_padding folding [PR104550]
>> 
>> __builtin_clear_padding() will clear all the padding bits of the 
>> object.
>> Actually, it doesn't involve any use of a user variable. Therefore, users do
>> not expect any uninitialized warning from it. It's reasonable to suppress
>> uninitialized warnings for all new created uses from __builtin_clear_padding
>> folding.
>> 
>>  PR middle-end/104550
>> 
>> gcc/ChangeLog:
>> 
>>  * gimple-fold.cc (clear_padding_flush): Suppress warnings for new
>>  created uses.
>>  (clear_padding_emit_loop): Likewise.
>>  (clear_padding_type): Likewise.
>>  (gimple_fold_builtin_clear_padding): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.dg/auto-init-pr104550-1.c: New test.
>>  * gcc.dg/auto-init-pr104550-2.c: New test.
>>  * gcc.dg/auto-init-pr104550-3.c: New test.
>> ---
>> gcc/gimple-fold.cc  | 31 +++--
>> gcc/testsuite/gcc.dg/auto-init-pr104550-1.c | 10 +++
>> gcc/testsuite/gcc.dg/auto-init-pr104550-2.c | 11 
>> gcc/testsuite/gcc.dg/auto-init-pr104550-3.c | 11 
>> 4 files changed, 55 insertions(+), 8 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c
>> create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c
>> create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c
>> 
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index 16f02c2d098..1e18ba3465a 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -4296,6 +4296,7 @@ clear_padding_flush (clear_padding_struct *buf, bool 
>> full)
>>   build_int_cst (buf->alias_type,
>>  buf->off + padding_end
>>  - padding_bytes));
>> +  suppress_warning (dst, OPT_Wuninitialized);
>>gimple *g = gimple_build_assign (dst, src);
>>gimple_set_location (g, buf->loc);
>>gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
>> @@ -4341,6 +4342,7 @@ clear_padding_flush (clear_padding_struct *buf, bool 
>> full)
>>tree dst = build2_loc (buf->loc, MEM_REF, atype,
>>   buf->base,
>>   build_int_cst (buf->alias_type, off));
>> +  suppress_warning (dst, OPT_Wuninitialized);
>>gimple *g = gimple_build_assign (dst, src);
>>gimple_set_location (g, buf->loc);
>>gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
>> @@ -4370,6 +4372,7 @@ clear_padding_flush (clear_padding_struct *buf, bool 
>> full)
>>  atype = build_aligned_type (type, buf->align);
>>tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base,
>>   build_int_cst (buf->alias_type, off));
>> +  suppress_warning (dst, OPT_Wuninitialized);
>>tree src;
>>gimple *g;
>>if (all_ones
>> @@ -4420,6 +4423,7 @@ clear_padding_flush (clear_padding_struct *buf, bool 
>> full)
>>   build_int_cst (buf->alias_type,
>>  buf->off + end
>>  - padding_bytes));
>> +  suppress_warning (dst, OPT_Wuninitialized);
>>gimple *g = gimple_build_assign (dst, src);
>>gimple_set_location (g, buf->loc);
>>gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
>> @@ -4620,14 +4624,18 @@ clear_padding_emit_loop (clear_padding_struct *buf, 
>> tree type,
>>   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
>>   clear_padding_type (buf, type, buf->sz, for_auto_init);
>>   clear_padding_flush (buf, true);
>> -  g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base,
>> -   size_int (buf->sz));
>> +  tree rhs = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (buf->base),
>> +  buf->base, size_int (buf->sz));
>> +  suppress_warning (rhs, OPT_Wuninitialized);
>> +  g = gimple_build_assign (buf->base, rhs);
> 
> why do we need to suppress warnings on a POINTER_PLUS_EXPR?

This 

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-23 Thread Andrew MacLeod via Gcc-patches

On 2/23/22 02:48, Richard Biener wrote:

On Tue, Feb 22, 2022 at 8:19 PM Andrew MacLeod via Gcc-patches
 wrote:

On 2/22/22 13:07, Jeff Law wrote:


On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:

That is EH, then there are calls that might not return because they
leave
in some other way (e.g. longjmp), or might loop forever, might
exit, might
abort, trap etc.

Generally speaking, calls which do not return should not now be a
problem...
as long as they do not transfer control to somewhere else in the
current
function.

I thought all of those cases are very relevant to PR104530.
If we have:
_1 = ptr_2(D) == 0;
// unrelated code in the same bb
_3 = *ptr_2(D);
then in light of PR104288, we can optimize ptr_2(D) == 0 into true
only if
there are no calls inside of "// unrelated code in the same bb"
or if all calls in "// unrelated code in the same bb" are guaranteed to
return exactly once.  Because, if there is a call in there which could
exit (that is the PR104288 testcase), or abort, or trap, or loop
forever,
or throw externally, or longjmp or in any other non-UB way
cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
_3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
comparison because ptr_2(D) could be NULL in a valid program.
While if there are no calls (and no problematic inline asms) and no
trapping
insns in between, we can and PR104530 is asking that we continue to
optimize
that.

Right.  This is similar to some of the restrictions we deal with in
the path isolation pass.  Essentially we have a path, when traversed,
would result in a *0.  We would like to be able to find the edge
upon-which the *0 is control dependent and optimize the test so that
it always went to the valid path rather than the *0 path.

The problem is there may be observable side effects on the *0 path
between the test and the actual *0 -- including calls to nonreturning
functions, setjmp/longjmp, things that could trap, etc.  This case is
similar.  We can't back-propagate the non-null status through any
statements with observable side effects.

Jeff


We can't back propagate, but we can alter our forward view.  Any
ssa-name defined before the observable side effect can be recalculated
using the updated values, and all uses of those names after the
side-effect would then appear to be "up-to-date"

This does not actually change anything before the side-effect statement,
but the lazy re-evaluation ranger employs makes it appear as if we do a
new computation when _1 is used afterwards, i.e.:

 _1 = ptr_2(D) == 0;
 // unrelated code in the same bb
 _3 = *ptr_2(D);
 _4 = ptr_2(D) == 0;  // ptr_2 is known to be [+1, +INF] now.
And we use _4 everywhere _1 was used.   This is the effect.

so we do not actually change anything in the unrelated code, just
observable effects afterwards.  We already do these recalculations on
outgoing edges in other blocks, just not within the definition block
because non-null wasn't visible within the def block.

Additionally, In the testcase, there is a store to C before the side
effects.
these patches get rid of the branch and thus the call in the testcase as
requested, but we still have to compute _3 in order to store it into
global C since it occurs  pre side-effect.

  b.0_1 = b;
  _2 = b.0_1 == 0B;
  _3 = (int) _2;
  c = _3;
  _5 = *b.0_1;

No matter how you look at it, you are going to need to process a block
twice in order to handle any code pre-side-effect.  Whether it be
assigning stmt uids, or what have you.

Yes.  I thought that is what ranger already does when it discovers new
ranges from edges.  Say we have

   _1 = 10 / _2;
   if (_2 == 1)
 {
_3 = _1 + 1;

then when evaluating _1 + 1 we re-evaluate 10 / _2 using _2 == 1 and
can compute _3 to [11, 11]?


Correct, we get most of these first order effects via edges.




That obviously extends to any stmt-level ranges we discover for uses
(not defs because defs are never used upthread).  And doing that is
_not_ affected by any function/BB terminating calls or EH or whatnot
as long as the updated ranges are only affecting stmts dominating the
current one.

What complicates all this reasoning is that it is straight-forward when
you work with a traditional IL walking pass but it gets hard (and possibly
easy to get wrong) with on-demand processing and caching because
everything you cache will now be context dependent (valid only
starting after stmt X and for stmts dominated by it).


Yeah, which is why this particular side effect code only applies to 
definitions during a dom walk.  We know we will not return to a def.


The non-null list (and next release the generalized side-effects) are 
only applied to on-exit ranges via non-EH edges.. so they cant really 
get us into trouble as we are sure of those values only affecting 
dominated blocks.  Pure on-demand clients will not get any of 

[PATCH] OpenMP, C++: Add template support for the has_device_addr clause.

2022-02-23 Thread Marcel Vollweiler

Hi,

The patch for adding the has_device_addr clause on the target construct was
recently committed (bbb7f8604e1dfc08f44354cfd93d2287f2fdd489).

Additionally, this patch adds support for list items in the has_device_addr
clause whose type is given by C++ template parameters.

Marcel
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company (GmbH); Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: Munich; Commercial Register: 
Munich, HRB 106955
OpenMP, C++: Add template support for the has_device_addr clause.

gcc/cp/ChangeLog:

* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_HAS_DEVICE_ADDR.
* semantics.cc (finish_omp_clauses): Handle PARM_DECL and
NON_LVALUE_EXPR.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Handle NON_LVALUE_EXPR.
(gimplify_adjust_omp_clauses): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_omp_target): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c++/target-has-device-addr-7.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-8.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-9.C: New test.

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 6dda660..86446d7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17652,6 +17652,7 @@ tsubst_omp_clauses (tree clauses, enum 
c_omp_region_type ort,
case OMP_CLAUSE_USE_DEVICE_PTR:
case OMP_CLAUSE_USE_DEVICE_ADDR:
case OMP_CLAUSE_IS_DEVICE_PTR:
+   case OMP_CLAUSE_HAS_DEVICE_ADDR:
case OMP_CLAUSE_INCLUSIVE:
case OMP_CLAUSE_EXCLUSIVE:
  OMP_CLAUSE_DECL (nc)
@@ -17797,6 +17798,7 @@ tsubst_omp_clauses (tree clauses, enum 
c_omp_region_type ort,
  case OMP_CLAUSE_USE_DEVICE_PTR:
  case OMP_CLAUSE_USE_DEVICE_ADDR:
  case OMP_CLAUSE_IS_DEVICE_PTR:
+ case OMP_CLAUSE_HAS_DEVICE_ADDR:
  case OMP_CLAUSE_INCLUSIVE:
  case OMP_CLAUSE_EXCLUSIVE:
  case OMP_CLAUSE_ALLOCATE:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6..452ecfd 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8534,11 +8534,14 @@ finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
{
  if (handle_omp_array_sections (c, ort))
remove = true;
+ else if (TREE_CODE (TREE_CHAIN (t)) == PARM_DECL)
+   t = TREE_CHAIN (t);
  else
{
  t = OMP_CLAUSE_DECL (c);
  while (TREE_CODE (t) == INDIRECT_REF
-|| TREE_CODE (t) == ARRAY_REF)
+|| TREE_CODE (t) == ARRAY_REF
+|| TREE_CODE (t) == NON_LVALUE_EXPR)
t = TREE_OPERAND (t, 0);
}
}
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f570daa..b1bb5be 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10285,7 +10285,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
case OMP_CLAUSE_HAS_DEVICE_ADDR:
  decl = OMP_CLAUSE_DECL (c);
  while (TREE_CODE (decl) == INDIRECT_REF
-|| TREE_CODE (decl) == ARRAY_REF)
+|| TREE_CODE (decl) == ARRAY_REF
+|| TREE_CODE (decl) == NON_LVALUE_EXPR)
decl = TREE_OPERAND (decl, 0);
  flags = GOVD_EXPLICIT;
  goto do_add_decl;
@@ -11443,7 +11444,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
case OMP_CLAUSE_HAS_DEVICE_ADDR:
  decl = OMP_CLAUSE_DECL (c);
  while (TREE_CODE (decl) == INDIRECT_REF
-|| TREE_CODE (decl) == ARRAY_REF)
+|| TREE_CODE (decl) == ARRAY_REF
+|| TREE_CODE (decl) == NON_LVALUE_EXPR)
decl = TREE_OPERAND (decl, 0);
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  remove = n == NULL || !(n->value & GOVD_SEEN);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 77176ef..30cc9b6 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1384,7 +1384,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
}
  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
{
- if (TREE_CODE (decl) == INDIRECT_REF)
+ if (TREE_CODE (decl) == INDIRECT_REF
+ || TREE_CODE (decl) == NON_LVALUE_EXPR)
decl = TREE_OPERAND (decl, 0);
  install_var_field (decl, true, 3, ctx);
}
@@ -1747,7 +1748,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
{
  while (TREE_CODE (decl) == INDIRECT_REF
-|| TREE_CODE (decl) == ARRAY_REF)
+|| TREE_CODE (decl) == ARRAY_REF
+|| 

Re: [PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.

2022-02-23 Thread Andrew MacLeod via Gcc-patches

On 2/23/22 02:33, Richard Biener wrote:

On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
 wrote:

This patch simply leverages the existing computation machinery to
re-evaluate values dependent on a newly found non-null value.

Ranger associates a monotonically increasing temporal value with every
def as it is defined.  When that value is used, we check if any of the
values used in the definition have been updated, making the current
cached global value stale.  This makes the evaluation lazy, if there are
no more uses, we will never re-evaluate.

When an ssa-name is marked non-null it does not change the global value,
and thus will not invalidate any global values.  This patch marks any
definitions in the block which are dependent on the non-null value as
stale.  This will cause them to be re-evaluated when they are next used.

Imports: b.0_1  d.3_7
Exports: b.0_1  _2  _3  d.3_7  _8
   _2 : b.0_1(I)
   _3 : b.0_1(I)  _2
   _8 : b.0_1(I)  _2  _3  d.3_7(I)

 b.0_1 = b;
  _2 = b.0_1 == 0B;
  _3 = (int) _2;
  c = _3;
  _5 = *b.0_1;<<-- from this point b.0_1 is [+1, +INF]
  a = _5;
  d.3_7 = d;
  _8 = _3 % d.3_7;
  if (_8 != 0)

when _5 is defined and b.0_1 becomes non-null, we mark the dependent
names that are exports and defined in this block as stale, i.e. _2, _3
and _8.

When _8 is being calculated, _3 is stale, and causes it to be
recomputed.  It is dependent on _2, also stale, so it is also
recomputed, and we end up with

_2 == [0, 0]
_3 == [0 ,0]
and _8 = [0, 0]
And then we can fold away the condition.

The side effect is that _2 and _3 are globally changed to be [0, 0], but
this is OK because it is the definition block, so it dominates all other
uses of these names, and they should be [0,0] upon exit anyway.  The
previous patch ensure that the global values written to
SSA_NAME_RANGE_INFO is the correct [0,1] for both _2 and _3.

The patch would have been even smaller if I already had a mark_stale
method.   I thought there was one, but I guess it never made it in from
lack of need at the time.   The only other tweak was to make the value
stale if the dependent value was the same as the definitions.

This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running
to ensure.

@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block
bb, tree name)
 {
   r.set_nonzero (type);
   m_on_entry.set_bb_range (name, bb, r);
+ // Mark consumers of name stale so they can be recomputed.
+ if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+   {
+ tree x;
+ FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+   if (m_gori.in_chain_p (name, x)
+   && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+ m_temporal->set_stale (x);
+   }
 }

so if we have a BB that exports N names and each of those is updated to nonnull
this is going to be quadratic?  It also looks like the gimple_bb check
is cheaper
than the bitmap test done in in_chain_p.  What comes to my mind is why we need
to mark "consumers"?  Can't consumers check their uses defs when they look
at their timestamp?  This whole set_stale thing doesn't seem to be


They do.  The timestamps only look at direct uses. Any use of _2 should 
look at the def and notice it is stale relative to b.0_1 automatically. 
We miss the opportunity in the example which uses _3 to compute _8.  _3 
is directly dependent on _2 whose def is not stale relative to _3, so we 
miss the transitive staleness via b.0_1.   This marks all the consumers 
whose calculation is derived from the now non-null value as stale.   
Within the block, it is fully transitive and anything potentially 
derived from NAME will be recalculated if it is used.  In old EVRP 
terms, it would be like updating the current value vector for any 
ssa-names derived from NAME when it becomes non-null, except it is done 
lazily.
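As a standalone illustration (a toy model, not ranger's actual code), timestamps over *direct* dependencies only will miss exactly this transitive case: _2 depends on _1, _3 depends on _2, and after _1 is updated, _2 compares stale against _1 but _3 still compares current against _2 until _2 is recomputed or explicitly marked stale:

```c
/* Toy model (not ranger's implementation) of timestamp-based staleness
   checked over direct dependencies only.  */
#include <stdbool.h>

enum name { N1, N2, N3, NUM };           /* _1, _2, _3 */
static int stamp[NUM] = { 0, 1, 2 };     /* time each value was computed */
static int dep[NUM]   = { -1, N1, N2 };  /* direct dependency only */

/* A name looks stale only if its *direct* dependency is newer.  */
static bool
stale_direct (enum name n)
{
  return dep[n] >= 0 && stamp[dep[n]] > stamp[n];
}

/* Update a name, e.g. _1 becoming known non-null at time NOW.  */
static void
update (enum name n, int now)
{
  stamp[n] = now;
}
```

After `update (N1, 10)`, `stale_direct (N2)` is true but `stale_direct (N3)` is still false: the transitive staleness through _2 is invisible until _2 is recomputed, which is what the explicit set_stale marking addresses.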




transitive anyway,
consider:

_1 = ...


_2 = _1 + ..;


   _3 = _2 + ...;

so when _1 is updated to non-null we mark _2 as stale but _3 should
also be stale, no?
When we visit _3 before eventually getting to _2 (to see whether it
updates, and thus more precisely whether it makes _3 stale) we won't
re-evaluate it?



That said, the change looks somewhat ad-hoc to get to 1-level deep second-level
opportunities?


The patch applies only to dom-walks, and primarily targets definitions 
in the current block that we have already seen and now know are 
stale. It is one approach to applying non-null later in the same block 
without resorting to much of an algorithmic change.  It's not really 
intended to affect anything cross block as that is handled differently 
via the GORI engine.  It would provide better on-exit ranges in the 
definition block for some of the names involved.


That said, I'm not crazy about putting anything else into this release 

Re: [PATCH 1/2] tree-optimization/104530 - Export global ranges during the VRP block walk.

2022-02-23 Thread Andrew MacLeod via Gcc-patches

On 2/23/22 02:25, Richard Biener wrote:

On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
 wrote:

Ranger currently waits until the end of the VRP pass, then calls
export_global_ranges ().

This method walks the list of ssa-names looking for names which it
thinks should have SSA_NAME_RANGE_INFO updated, and is an artifact of
the on-demand mechanism where there isn't an obvious time to finalize a
name.

The changes for 104288 introduced the register_side_effects method and
do provide a final place where stmt's are processed during the DOMWALK.

This patch exports the global range calculated by the statement (before
processing side effects), and avoids the need for calling the export
method.  This is generally better all round I think.

Bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to
ensure...

OK for trunk? or defer to stage 1?

I'm getting a bit nervous so lets defer to stage 1 unless a P1 fix
requires this.


Me too, so good :-)




[PATCH] tree-optimization/104658 - avoid mixing mask & non-mask vector defs

2022-02-23 Thread Richard Biener via Gcc-patches
When pattern recognition fails to sanitize all defs of a mask
producing operation and the respective def is external or constant
we end up trying to produce a VECTOR_BOOLEAN_TYPE_P constructor
which in turn ends up exposing stmts like

   _135 = _49 ? -1 : 0;

which isn't handled well in followup SLP and generates awful code.

We do rely heavily on pattern recognition to sanitize mask vs.
data uses of bools but that fails here which means we also should
fail vectorization.  That avoids ICEing because of such stmts
and it also avoids generating weird code which makes the
vectorization not profitable.

The following patch simply disallows external VECTOR_BOOLEAN_TYPE_P
defs and arranges the promote-to-external code to instead promote
mask uses to extern (that's just a short-cut here).

I've also looked at aarch64 and with SVE and a fixed vector length
for the gcc.target/i386/pr101636.c testcase.  I see similar vectorization
(using ) there but it's hard to decide whether the
old, the new or no vectorization is better for this.  The code
generated with traditional integer masks isn't as awkward but we
still get the != 0 promotion done for each scalar element, which
doesn't look intended - this operation should be visible upfront.

That also means some cases will now become a missed optimization
that needs to be fixed by bool pattern recognition which I plan to
look at in more detail during stage1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

2022-02-22  Richard Biener  

PR tree-optimization/104658
* tree-vect-slp.cc (vect_slp_convert_to_external): Do not
create VECTOR_BOOLEAN_TYPE_P extern defs.  Reset the vector
type on nodes we promote.
(vectorizable_bb_reduc_epilogue): Deal with externalized
root.
* tree-vect-stmts.cc (vect_maybe_update_slp_op_vectype): Do
not allow VECTOR_BOOLEAN_TYPE_P extern defs.

* gcc.target/i386/pr104658.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr104658.c | 113 +++
 gcc/tree-vect-slp.cc |   9 +-
 gcc/tree-vect-stmts.cc   |   5 +
 3 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104658.c

diff --git a/gcc/testsuite/gcc.target/i386/pr104658.c 
b/gcc/testsuite/gcc.target/i386/pr104658.c
new file mode 100644
index 000..2b8d02aacab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104658.c
@@ -0,0 +1,113 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple -ftree-slp-vectorize -mavx512f -fdump-tree-slp2" } */
+
+void __GIMPLE (ssa,guessed_local(118111600))
+bar (int * restrict a, int * restrict e,
+ _Bool d0, _Bool d1, _Bool d2, _Bool d3, _Bool d4, _Bool d5, _Bool d6, 
_Bool d7)
+{
+  int _1;
+  int _4;
+  int _6;
+  int _8;
+  int _10;
+  int _12;
+  int _14;
+  int _16;
+  int _27;
+  _Bool _37;
+  _Bool _39;
+  _Bool _41;
+  int _43;
+  _Bool _45;
+  _Bool _47;
+  _Bool _49;
+  _Bool _53;
+  _Bool _54;
+  _Bool _55;
+  int _56;
+  _Bool _57;
+  _Bool _58;
+  _Bool _59;
+  int _60;
+  _Bool _61;
+  _Bool _62;
+  _Bool _63;
+  int _64;
+  _Bool _65;
+  _Bool _66;
+  _Bool _67;
+  int _68;
+  _Bool _69;
+  _Bool _70;
+  _Bool _71;
+  int _72;
+  _Bool _73;
+  _Bool _74;
+  _Bool _75;
+  int _76;
+
+  __BB(2,guessed_local(118111600)):
+  _73 = d0_2(D);
+  _69 = d1_5(D);
+  _65 = d2_7(D);
+  _61 = d3_9(D);
+  _57 = d4_11(D);
+  _53 = d5_13(D);
+  _41 = d6_15(D);
+  _49 = d7_17(D);
+  a_81 = a_22(D);
+  e_82 = e_23(D);
+  _1 = __MEM  (a_81 + _Literal (int * restrict) 32);
+  _4 = __MEM  (a_81 + _Literal (int * restrict) 36);
+  _6 = __MEM  (a_81);
+  _8 = __MEM  (a_81 + _Literal (int * restrict) 4);
+  _10 = __MEM  (a_81 + _Literal (int * restrict) 48);
+  _12 = __MEM  (a_81 + _Literal (int * restrict) 52);
+  _14 = __MEM  (a_81 + _Literal (int * restrict) 16);
+  _16 = __MEM  (a_81 + _Literal (int * restrict) 60);
+  _74 = _1 != 0;
+  _75 = _73 & _74;
+  _76 = _75 ? _1 : 0;
+  __MEM  (e_82) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 4) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 8) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 12) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 16) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 20) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 24) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 28) = _76;
+  __MEM  (e_82 + _Literal (int * restrict) 32) = _76;
+  _70 = _4 != 0;
+  _71 = _69 & _70;
+  _72 = _71 ? _4 : 0;
+  __MEM  (e_82 + _Literal (int * restrict) 36) = _72;
+  _66 = _6 != 0;
+  _67 = _65 & _66;
+  _68 = _67 ? _6 : 0;
+  __MEM  (e_82 + _Literal (int * restrict) 40) = _68;
+  _62 = _8 != 0;
+  _63 = _61 & _62;
+  _64 = _63 ? _8 : 0;
+  __MEM  (e_82 + _Literal (int * restrict) 44) = _64;
+  _58 = _10 != 0;
+  _59 = _57 & _58;
+  _60 = _59 ? _10 : 0;
+  __MEM  (e_82 + _Literal (int * restrict) 48) = _60;
+  _54 = _12 != 0;
+  _55 = _53 & _54;
+  _56 = _55 ? _12 : 0;
+ 

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Feb 2022, guojiufu wrote:

> 
> 
> On 2/22/22 PM3:26, Richard Biener wrote:
> > On Tue, 22 Feb 2022, Jiufu Guo wrote:
> > 
> >> Hi,
> >>
> >> For constants, there are some codes to check: if it is able to put
> >> to instruction as an immediate operand or it is profitable to load from
> >> mem.  There are still some places that could be improved for platforms.
> >>
> >> This patch could handle PR63281/57836.  This patch does not change
> >> too much on the code like force_const_mem and legitimate_constant_p.
> >> We may integrate these APIs for passes like expand/cse/combine
> >> as a whole solution in the future (maybe better for stage1?).
> >>
> >> Bootstrap and regtest pass on ppc64le and x86_64. Is this ok for trunk?
> >> Thanks for comments!
> > 
> > I'm not sure whether we need a new hook here, but iff, then I think
> > whether loading a constant (from memory?) is faster or not depends
> > on the context.  So what's the exact situation and the two variants
> > you are costing against each other?  I assume (since you are
> > touching CSE) you are costing
> 
> Hi Richard,
> 
> Thanks for your review!
> 
> In some contexts, it may be faster to load from memory for some
> constant value, and for some constant value, it would be faster
> to build as an immediate in a few (1 or 2) instructions.
> 
> For example 0x1234567812345678, on ppc64, we may need 3 instructions
> to build it, and then it would be better to put it in .rodata, and
> then load it from memory.
> 
> Currently, we already have hooks TARGET_CANNOT_FORCE_CONST_MEM and
> TARGET_LEGITIMATE_CONSTANT_P.
> 
> TARGET_CANNOT_FORCE_CONST_MEM is used to check if one 'rtx' can be
> stored into the constant pool.
> On some targets (e.g. alpha), TARGET_LEGITIMATE_CONSTANT_P behaves
> like what we expect:
> 
> I once thought to use TARGET_LEGITIMATE_CONSTANT_P too.
> But in general, it seems this hook is designed to check if one
> 'rtx' could be used as an immediate instruction. This hook is used
> in RTL passes: ira/reload. It is also used in recog.cc and expr.cc.
> 
> In other words, when deciding whether to put a constant in the constant
> pool, we could check:
> - If TARGET_CANNOT_FORCE_CONST_MEM returns true, we should not put
> the 'constant' in the constant pool.
> - If TARGET_LEGITIMATE_CONSTANT_P returns true, then the 'constant'
> would be an immediate of **one** instruction, and not put in the constant
> pool.
> - If the new hook TARGET_FASTER_LOADING_CONSTANT returns true, then
> the 'constant' would be stored in the constant pool.
> Otherwise, it would be better to use an instruction sequence to build the
> 'constant'.
> This is why I introduce a new hook.

I agree TARGET_CANNOT_FORCE_CONST_MEM and TARGET_LEGITIMATE_CONSTANT_P
are not the correct vehicle for a cost based consideration, they
are used for correctness checks.

> We may also use the new hook at other places, e.g. expand/combining...
> where force_const_mem is called.
> 
> Any suggestions?
> 
> > 
> >(set (...) (mem))  (before CSE)
> > 
> > against
> > 
> >(set (...) (immediate))  (what CSE does now)
> > 
> > vs.
> > 
> >(set (...) (mem))  (original, no CSE)
> > 
> > ?  With the new hook you are skipping _all_ of the following loops
> > logic which does look like a quite bad design and hack (not that
> > I am very familiar with the candidate / costing logic in there).
> 
> At cse_insn, in the following loop of the code, it is also testing
> the constant and trying to put it into memory:
> 
>   else if (crtl->uses_const_pool
>&& CONSTANT_P (trial)
>&& !CONST_INT_P (trial)
>&& (src_folded == 0 || !MEM_P (src_folded))
>&& GET_MODE_CLASS (mode) != MODE_CC
>&& mode != VOIDmode)
> {
>   src_folded = force_const_mem (mode, trial);
>   if (src_folded)
> {
>   src_folded_cost = COST (src_folded, mode);
>   src_folded_regcost = approx_reg_cost (src_folded);
> }
> }
> 
> This code is at the end of the loop, so it would only be tested for
> the next iteration. It may be better to test "if need to put the
> constant into memory" for all iterations.
>
> The current patch is adding an additional test before the loop.
> I will update the patch to integrate these two places!

I'm assuming we're always dealing with

  (set (reg:MODE ..) )

here and CSE is not substituting into random places of an
instruction(?).  I don't know what 'rtx_cost' should evaluate
to for a constant, or if it should implicitly evaluate the cost
of putting the result into a register for example.

The RTX_COST target hook at least has some context
(outer_code and opno) and COST passes SET and 1 here.

So why is adjusting the RTX_COST hook not enough then for this case?
Using RTX_COST with SET and 1 at least looks no worse than using
your proposed new target hook and comparing it with the original
unfolded 

Re: [PATCH] middle-end/104644 - recursion with bswap match.pd pattern

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 01:53:38PM +0100, Richard Biener wrote:
> The following patch avoids infinite recursion during generic folding.
> The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
> (bswap @1) actually being simplified; if it is not, we just
> move the bswap from one operand to the other and if @0 is also INTEGER_CST,
> we apply the same rule next.
> 
> The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
> has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
> such cases:
> static inline bool
> integer_cst_p (tree t)
> {
>   return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
> }
> The patch uses the ! modifier to ensure the bswap is simplified and
> extends support to GENERIC by means of requiring !EXPR_P which
> is not perfect but a conservative approximation.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I didn't bother to un-#ifdef GIMPLE the few use-cases of ! we have
> in match.pd since those are not regressions.  Let's remember
> that for stage1.  I also agree this is the safest approach at this
> stage.

Note, it is not clear if all the #if GIMPLE around ! modifier uses
are purely because of that error or not; I think we'll need to do
some archeology for each of the cases to see.  I vaguely remember one case
where I added such #if GIMPLE wrapping only for that reason, but I
don't remember which of the 6 !s I added it was, and then there are
some from Marc and Feng.

> OK?

LGTM, thanks.

> 2022-02-22  Richard Biener  
> 
>   PR tree-optimization/104644
>   * doc/match-and-simplify.texi: Amend ! documentation.
>   * genmatch.cc (expr::gen_transform): Code-generate ! support
>   for GENERIC.
>   (parser::parse_expr): Allow ! for GENERIC.
>   * match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on
>   bswap.
> 
>   * gcc.dg/pr104644.c: New test.

Jakub



[PATCH] middle-end/104644 - recursion with bswap match.pd pattern

2022-02-23 Thread Richard Biener via Gcc-patches
The following patch avoids infinite recursion during generic folding.
The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
(bswap @1) actually being simplified; if it is not, we just
move the bswap from one operand to the other and if @0 is also INTEGER_CST,
we apply the same rule next.

The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
such cases:
static inline bool
integer_cst_p (tree t)
{
  return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
}
The patch uses the ! modifier to ensure the bswap is simplified and
extends support to GENERIC by means of requiring !EXPR_P which
is not perfect but a conservative approximation.
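The identity the pattern exploits, bswap(x) == C iff x == bswap(C), can be sanity-checked with a standalone program (this uses the GCC/Clang `__builtin_bswap32` builtin and is only an illustration, not the PR's testcase):

```c
/* Check the equivalence  bswap(x) == c  <=>  x == bswap(c)  that the
   (cmp (bswap @0) INTEGER_CST@1) pattern relies on.  */
#include <stdint.h>

/* The original form: (cmp (bswap @0) INTEGER_CST@1).  */
static int
cmp_via_bswap (uint32_t x, uint32_t c)
{
  return __builtin_bswap32 (x) == c;
}

/* The rewritten form: (cmp (convert @0) (bswap @1)) - only profitable
   when bswap (c) actually folds to a constant at compile time.  */
static int
cmp_via_folded (uint32_t x, uint32_t c)
{
  return x == __builtin_bswap32 (c);
}
```

When `bswap (c)` does not fold (e.g. because the constant has TREE_OVERFLOW set), the rewrite merely moves the bswap to the other operand, which is what leads to the recursion the ! modifier prevents.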

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I didn't bother to un-#ifdef GIMPLE the few use-cases of ! we have
in match.pd since those are not regressions.  Let's remember
that for stage1.  I also agree this is the safest approach at this
stage.

OK?

Thanks,
Richard.

2022-02-22  Richard Biener  

PR tree-optimization/104644
* doc/match-and-simplify.texi: Amend ! documentation.
* genmatch.cc (expr::gen_transform): Code-generate ! support
for GENERIC.
(parser::parse_expr): Allow ! for GENERIC.
* match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on
bswap.

* gcc.dg/pr104644.c: New test.

Co-Authored-by: Jakub Jelinek 
---
 gcc/doc/match-and-simplify.texi |  6 --
 gcc/genmatch.cc | 20 +---
 gcc/match.pd|  2 +-
 gcc/testsuite/gcc.dg/pr104644.c |  9 +
 4 files changed, 23 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104644.c

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 63a73ae047c..055a5308e7d 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -374,8 +374,10 @@ for example
 
 which moves the outer @code{plus} operation to the inner arms
 of the @code{vec_cond} expression but only if the actual plus
-operations both simplify.  Note this is currently only supported
-for code generation targeting @code{GIMPLE}.
+operations both simplify.  Note that on @code{GENERIC} a simple
+operand means that the result satisfies @code{!EXPR_P} which
+can be limiting if the operation itself simplifies but the
+remaining operand is an (unrelated) expression.
 
 As intermediate conversions are often optional there is a way to
 avoid the need to repeat patterns both with and without such
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 97f6f00fa68..2eda7300821 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2553,19 +2553,20 @@ expr::gen_transform (FILE *f, int indent, const char *dest, bool gimple,
fprintf_indent (f, indent, "_r%d = fold_build%d_loc (loc, %s, %s",
depth, ops.length(), opr_name, type);
   else
-   {
- fprintf_indent (f, indent, "{\n");
- fprintf_indent (f, indent, "  _r%d = maybe_build_call_expr_loc (loc, "
- "%s, %s, %d", depth, opr_name, type, ops.length());
-   }
+   fprintf_indent (f, indent, "_r%d = maybe_build_call_expr_loc (loc, "
+   "%s, %s, %d", depth, opr_name, type, ops.length());
   for (unsigned i = 0; i < ops.length (); ++i)
fprintf (f, ", _o%d[%u]", depth, i);
   fprintf (f, ");\n");
   if (opr->kind != id_base::CODE)
{
- fprintf_indent (f, indent, "  if (!_r%d)\n", depth);
- fprintf_indent (f, indent, "goto %s;\n", fail_label);
- fprintf_indent (f, indent, "}\n");
+ fprintf_indent (f, indent, "if (!_r%d)\n", depth);
+ fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+   }
+  if (force_leaf)
+   {
+ fprintf_indent (f, indent, "if (EXPR_P (_r%d))\n", depth);
+ fprintf_indent (f, indent, "  goto %s;\n", fail_label);
}
   if (*opr == CONVERT_EXPR)
{
@@ -4297,9 +4298,6 @@ parser::parse_expr ()
   && token->type == CPP_NOT
   && !(token->flags & PREV_WHITE))
 {
-  if (!gimple)
-   fatal_at (token, "forcing simplification to a leaf is not supported "
- "for GENERIC");
   eat_token (CPP_NOT);
   e->force_leaf = true;
 }
diff --git a/gcc/match.pd b/gcc/match.pd
index cad61848daa..cf78a11ddd4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3962,7 +3962,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (simplify
(cmp (bswap @0) INTEGER_CST@1)
(with { tree ctype = TREE_TYPE (@1); }
-(cmp (convert:ctype @0) (bswap @1)))))
+(cmp (convert:ctype @0) (bswap! @1)))))
  /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
  (simplify
   (bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
diff --git a/gcc/testsuite/gcc.dg/pr104644.c b/gcc/testsuite/gcc.dg/pr104644.c
new file mode 100644

Re: [PATCH][nvptx] Fix dummy location in gen_comment

2022-02-23 Thread Thomas Schwinge
Hi!

On 2022-02-23T12:14:57+0100, Tom de Vries via Gcc-patches 
 wrote:
> [ Re: [committed][nvptx] Add -mptx-comment ]
>
> On 2/22/22 14:53, Tom de Vries wrote:
>> Add functionality that indicates which insns are added by -minit-regs, such
>> that for instance we have for pr53465.s:
>> ...
>>  // #APP
>> // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
>>  // Start: Added by -minit-regs=3:
>>  // #NO_APP
>>  mov.u32 %r26, 0;
>>  // #APP
>> // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
>>  // End: Added by -minit-regs=3:
>>  // #NO_APP
>> ...
>>
>> Can be switched off using -mno-ptx-comment.
>>
>> Tested on nvptx.
>
> But tested in combination with another patch, which is still waiting for
> review.
>
> This patch by itself caused some regressions

I'd just begun analyzing and determined that it was
commit c2b23aaaf4457278403c01cd145cd3936683384e
"[nvptx] Add -mptx-comment" that causes a load of FAILs in nvptx
offloading testing:

Program received signal SIGSEGV, Segmentation fault.
0x0084abad in final_scan_insn_1 (insn=insn@entry=0x77380940, 
file=file@entry=0x1f50c40, optimize_p=optimize_p@entry=0, 
nopeepholes=nopeepholes@entry=0, seen=seen@entry=0x7fffd07c) at 
[...]/source-gcc/gcc/final.cc:2650
2650if (*loc.file && loc.line)
(gdb) print loc
$1 = {file = 0x0, line = 0, column = 0, data = 0x0, sysp = false}
(gdb) bt
#0  0x0084abad in final_scan_insn_1 
(insn=insn@entry=0x77380940, file=file@entry=0x1f50c40, 
optimize_p=optimize_p@entry=0, nopeepholes=nopeepholes@entry=0, 
seen=seen@entry=0x7fffd07c) at [...]/source-gcc/gcc/final.cc:2650
#1  0x0084b86a in final_scan_insn (insn=insn@entry=0x77380940, 
file=file@entry=0x1f50c40, optimize_p=optimize_p@entry=0, 
nopeepholes=nopeepholes@entry=0, seen=seen@entry=0x7fffd07c) at 
[...]/source-gcc/gcc/final.cc:2942
#2  0x0084823a in final_1 (first=0x774631c0, file=0x1f50c40, 
seen=1, optimize_p=0) at [...]/source-gcc/gcc/final.cc:1999
#3  0x0085091a in rest_of_handle_final () at 
[...]/source-gcc/gcc/final.cc:4287
#4  0x00850de4 in (anonymous namespace)::pass_final::execute 
(this=0x1f4bd00) at [...]/source-gcc/gcc/final.cc:4365
#5  0x00b781b1 in execute_one_pass (pass=pass@entry=0x1f4bd00) at 
[...]/source-gcc/gcc/passes.cc:2639
#6  0x00b7855a in execute_pass_list_1 (pass=0x1f4bd00) at 
[...]/source-gcc/gcc/passes.cc:2739
#7  0x00b7858d in execute_pass_list_1 (pass=0x1f4b820) at 
[...]/source-gcc/gcc/passes.cc:2740
#8  0x00b7858d in execute_pass_list_1 (pass=0x1f49d20, 
pass@entry=0x1f45780) at [...]/source-gcc/gcc/passes.cc:2740
#9  0x00b785e9 in execute_pass_list (fn=0x772e1e40, 
pass=0x1f45780) at [...]/source-gcc/gcc/passes.cc:2750
#10 0x00732a66 in cgraph_node::expand (this=0x772efbb0) at 
[...]/source-gcc/gcc/cgraphunit.cc:1836
#11 0x0073336a in cgraph_order_sort::process (this=0x20730f8) at 
[...]/source-gcc/gcc/cgraphunit.cc:2075
#12 0x007336f4 in output_in_order () at 
[...]/source-gcc/gcc/cgraphunit.cc:2143
#13 0x00733dbe in symbol_table::compile (this=0x77542000) at 
[...]/source-gcc/gcc/cgraphunit.cc:2347
#14 0x0065d79b in lto_main () at [...]/source-gcc/gcc/lto/lto.cc:655
#15 0x00c709e6 in compile_file () at 
[...]/source-gcc/gcc/toplev.cc:454
#16 0x00c73abb in do_compile (no_backend=no_backend@entry=false) at 
[...]/source-gcc/gcc/toplev.cc:2160
#17 0x00c73ea6 in toplev::main (this=this@entry=0x7fffd4b0, 
argc=argc@entry=16, argv=0x1f1db40, argv@entry=0x7fffd5b8) at 
[...]/source-gcc/gcc/toplev.cc:2312
#18 0x0174fe5f in main (argc=16, argv=0x7fffd5b8) at 
[...]/source-gcc/gcc/main.cc:41

> currently testing attached
> fix.

Per the test results that I've got so far (but is still running), your
proposed fix does resolve the SIGSEGVs, thanks.


Grüße
 Thomas


> [nvptx] Fix dummy location in gen_comment
>
> I committed "[nvptx] Add -mptx-comment", but tested it in combination with the
> proposed "[final] Handle compiler-generated asm insn" (
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so
> by itself the commit introduced some regressions:
> ...
> FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault)
> FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault)
> FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault)
> FAIL: gcc.dg/torture/pr80764.c   -O2  (internal compiler error: Segmentation 
> fault)
> ...
>
> These are due to cfun->function_start_locus == 0.
>
> Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead.
>
> Tested on nvptx.
>
> gcc/ChangeLog:
>
> 2022-02-23  Tom de Vries  
>
>   * config/nvptx/nvptx.cc (gen_comment): Use
>   

[PATCH][nvptx] Add shf.{l,r}.wrap insn

2022-02-23 Thread Tom de Vries via Gcc-patches
Hi,

Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be
used to implement 32-bit left or right rotate.

Add define_insns rotlsi3 and rotrsi3.

Currently testing.

Thanks,
- Tom

[nvptx] Add shf.{l,r}.wrap insn

gcc/ChangeLog:

2022-02-23  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn
"rotrsi3"): New define_insn.

gcc/testsuite/ChangeLog:

2022-02-23  Tom de Vries  

* gcc.target/nvptx/rotate-run.c: New test.
* gcc.target/nvptx/rotate.c: New test.

---
 gcc/config/nvptx/nvptx.md   | 16 
 gcc/testsuite/gcc.target/nvptx/rotate-run.c | 23 +++
 gcc/testsuite/gcc.target/nvptx/rotate.c | 20 
 3 files changed, 59 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 216e89f230ac..4989b5642e29 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -808,6 +808,22 @@
   ""
   "%.\\tshr.u%T0\\t%0, %1, %2;")
 
+(define_insn "rotlsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotate:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+  (and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+  (const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.l.wrap.b32\\t%0, %1, %1, %2;")
+
+(define_insn "rotrsi3"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (rotatert:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+(and:SI (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+(const_int 31))))]
+  "TARGET_SM35"
+  "%.\\tshf.r.wrap.b32\\t%0, %1, %1, %2;")
+
 ;; Logical operations
 
 (define_code_iterator any_logic [and ior xor])
diff --git a/gcc/testsuite/gcc.target/nvptx/rotate-run.c 
b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
new file mode 100644
index ..14cb6f8b0b3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate-run.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "rotate.c"
+
+#define ASSERT(EXPR)   \
+  do   \
+{  \
+  if (!(EXPR)) \
+   __builtin_abort (); \
+} while (0)
+
+int
+main (void)
+{
+  ASSERT (rotl (0x12345678, 8) == 0x34567812);
+  ASSERT (rotl (0x12345678, 8 + 32) == 0x34567812);
+
+  ASSERT (rotr (0x12345678, 8) == 0x78123456);
+  ASSERT (rotr (0x12345678, 8 + 32) == 0x78123456);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/nvptx/rotate.c 
b/gcc/testsuite/gcc.target/nvptx/rotate.c
new file mode 100644
index ..1c9b83b4809d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/rotate.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -save-temps" } */
+
+#define MASK 0x1f
+
+unsigned int
+rotl (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val << cnt) | (val >> (-cnt & MASK));
+}
+
+unsigned int
+rotr (unsigned int val, unsigned int cnt) {
+  cnt &= MASK;
+  return (val >> cnt) | (val << (-cnt & MASK));
+}
+
+/* { dg-final { scan-assembler-times "shf.l.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-times "shf.r.wrap.b32" 1 } } */
+/* { dg-final { scan-assembler-not "and.b32" } } */


Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread guojiufu via Gcc-patches




On 2/22/22 PM3:26, Richard Biener wrote:

On Tue, 22 Feb 2022, Jiufu Guo wrote:


Hi,

For constants, there are some codes to check: if it is able to put
to instruction as an immediate operand or it is profitable to load from
mem.  There are still some places that could be improved for platforms.

This patch could handle PR63281/57836.  This patch does not change
too much on the code like force_const_mem and legitimate_constant_p.
We may integrate these APIs for passes like expand/cse/combine
as a whole solution in the future (maybe better for stage1?).

Bootstrap and regtest pass on ppc64le and x86_64. Is this ok for trunk?
Thanks for comments!


I'm not sure whether we need a new hook here, but iff, then I think
whether loading a constant (from memory?) is faster or not depends
on the context.  So what's the exact situation and the two variants
you are costing against each other?  I assume (since you are
touching CSE) you are costing


Hi Richard,

Thanks for your review!

In some contexts, it may be faster to load from memory for some
constant value, and for some constant value, it would be faster
to build as an immediate in a few (1 or 2) instructions.

For example 0x1234567812345678, on ppc64, we may need 3 instructions
to build it, and then it would be better to put it in .rodata, and
then load it from memory.
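As a rough standalone illustration of that trade-off (a hypothetical heuristic, not GCC's rs6000 constant splitter): counting nonzero 16-bit halfwords gives a naive upper bound on the number of build instructions, while GCC's actual splitter can do better, e.g. 3 insns for 0x1234567812345678 by reusing the repeated 32-bit half:

```c
/* Hypothetical heuristic (illustration only): naive upper bound on the
   number of instructions to materialize a 64-bit constant from 16-bit
   immediates - one li/lis/ori/oris per nonzero halfword, plus one join
   (e.g. sldi+or) when both 32-bit halves are nonzero.  */
#include <stdint.h>

static int
est_build_insns (uint64_t c)
{
  int n = 0;
  for (int i = 0; i < 4; i++)
    if ((c >> (16 * i)) & 0xffff)
      n++;                        /* one insn per nonzero halfword */
  if ((c >> 32) != 0 && (uint32_t) c != 0)
    n++;                          /* join the high and low halves */
  return n ? n : 1;               /* "li r, 0" still costs one insn */
}
```

Once such an estimate exceeds the cost of a load, placing the constant in .rodata and loading it wins, which is the decision the proposed hook is meant to expose.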

Currently, we already have hooks TARGET_CANNOT_FORCE_CONST_MEM and
TARGET_LEGITIMATE_CONSTANT_P.

TARGET_CANNOT_FORCE_CONST_MEM is used to check if one 'rtx' can be
stored into the constant pool.
On some targets (e.g. alpha), TARGET_LEGITIMATE_CONSTANT_P behaves
like what we expect:

I once thought to use TARGET_LEGITIMATE_CONSTANT_P too.
But in general, it seems this hook is designed to check if one
'rtx' could be used as an immediate instruction. This hook is used
in RTL passes: ira/reload. It is also used in recog.cc and expr.cc.

In other words, when deciding whether to put a constant in the constant
pool, we could check:
- If TARGET_CANNOT_FORCE_CONST_MEM returns true, we should not put
the 'constant' in the constant pool.
- If TARGET_LEGITIMATE_CONSTANT_P returns true, then the 'constant'
would be an immediate of **one** instruction, and not put in the constant
pool.
- If the new hook TARGET_FASTER_LOADING_CONSTANT returns true, then
the 'constant' would be stored in the constant pool.
Otherwise, it would be better to use an instruction sequence to build the
'constant'.
This is why I introduce a new hook.

We may also use the new hook at other places, e.g. expand/combining...
where force_const_mem is called.

Any suggestions?



   (set (...) (mem))  (before CSE)

against

   (set (...) (immediate))  (what CSE does now)

vs.

   (set (...) (mem))  (original, no CSE)

?  With the new hook you are skipping _all_ of the following loops
logic which does look like a quite bad design and hack (not that
I am very familiar with the candidate / costing logic in there).


At cse_insn, in the following loop of the code, it is also testing
the constant and trying to put it into memory:

  else if (crtl->uses_const_pool
   && CONSTANT_P (trial)
   && !CONST_INT_P (trial)
   && (src_folded == 0 || !MEM_P (src_folded))
   && GET_MODE_CLASS (mode) != MODE_CC
   && mode != VOIDmode)
{
  src_folded = force_const_mem (mode, trial);
  if (src_folded)
{
  src_folded_cost = COST (src_folded, mode);
  src_folded_regcost = approx_reg_cost (src_folded);
}
}

This code is at the end of the loop, so it would only be tested for
the next iteration. It may be better to test "if need to put the
constant into memory" for all iterations.

The current patch is adding an additional test before the loop.
I will update the patch to integrate these two places!



We already have TARGET_INSN_COST which you could ask for a cost.
Like if we'd have a single_set then just temporarily substitute
the RHS with the candidate and cost the insns and compare against
the original insn cost.  So why exactly do you need a new hook
for this particular situation?


Thanks for pointing out this! Segher also mentioned this before.
Currently, CSE is using rtx_cost. Using insn_cost to replace
rtx_cost would be a good idea for all necessary places including CSE.

For this particular case: check the cost for constants.
I did not use insn_cost, because to use insn_cost, we may need
to create a recognizable insn temporarily, and for some kinds of
constants we may need to create a sequence of instructions on some
platforms, e.g. "li xx; ori ; sldi .." on ppc64, and check the
sum cost of those instructions. If we only create one fake
instruction, insn_cost may not return an accurate cost either.

BR,
Jiufu



Thanks,
Richard.




BR,
Jiufu

gcc/ChangeLog:

PR target/94393
PR rtl-optimization/63281
* config/rs6000/rs6000.cc 

[PATCH] Support SSA name declarations with pointer type

2022-02-23 Thread Richard Biener via Gcc-patches
Currently we fail to parse

  int * _3;

as an SSA name and instead get a VAR_DECL because of the way the C
frontend's declarator specs work.  That causes havoc if those
supposed SSA names are used in PHIs or in other places where
VAR_DECLs are not allowed.  The following fixes the pointer case
in an ad-hoc way - for more complex type declarators we probably
have to find a way to re-use the C frontend grokdeclarator without
actually creating a VAR_DECL there (or maybe make it create an
SSA name).

Pointers appear too often to be neglected though, thus the following
ad-hoc fix for this.  This also adds verification that we do not
end up with SSA names without definitions as can happen when
reducing a GIMPLE testcase.  Instead of working through segfaults
one-by-one we emit errors for all of those at once now.

Bootstrap and regtest running on x86_64-unknown-linux-gnu - not exactly
stage4 material but IMHO important enough for creating unit tests.

Richard.

2022-02-23  Richard Biener  

gcc/c
* gimple-parser.cc (c_parser_parse_gimple_body): Diagnose
SSA names without definition.
(c_parser_gimple_declaration): Handle pointer typed SSA names.

gcc/testsuite/
* gcc.dg/gimplefe-49.c: New testcase.
* gcc.dg/gimplefe-error-13.c: Likewise.
---
 gcc/c/gimple-parser.cc   | 34 +++-
 gcc/testsuite/gcc.dg/gimplefe-49.c   | 27 +++
 gcc/testsuite/gcc.dg/gimplefe-error-13.c | 11 
 3 files changed, 66 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-49.c
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-error-13.c

diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 31075237c98..d1afd42556c 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -330,13 +330,17 @@ c_parser_parse_gimple_body (c_parser *cparser, char 
*gimple_pass,
  }
gsi_remove (, true);
  }
- /* Fill SSA name gaps, putting them on the freelist.  */
+ /* Fill SSA name gaps, putting them on the freelist and diagnose
+SSA names without definition.  */
  for (unsigned i = 1; i < num_ssa_names; ++i)
if (!ssa_name (i))
  {
tree name = make_ssa_name_fn (cfun, integer_type_node, NULL, i);
release_ssa_name_fn (cfun, name);
  }
+   else if (!SSA_NAME_DEF_STMT (ssa_name (i)))
+ error ("SSA name %qE with version %d has no definition",
+ssa_name (i), i);
  /* No explicit virtual operands (yet).  */
  bitmap_obstack_initialize (NULL);
  update_ssa (TODO_update_ssa_only_virtuals);
@@ -2061,16 +2065,34 @@ c_parser_gimple_declaration (gimple_parser )
   /* Handle SSA name decls specially, they do not go into the identifier
  table but we simply build the SSA name for later lookup.  */
   unsigned version, ver_offset;
-  if (declarator->kind == cdk_id
- && is_gimple_reg_type (specs->type)
- && c_parser_parse_ssa_name_id (declarator->u.id.id,
+  /* Handle SSA pointer declarations in a very simplistic way, we
+probably would like to call grokdeclarator in a special mode to
+just build the type of the decl - start_decl already pushes
+the identifier to the bindings for lookup, something we do not
+want.  */
+  struct c_declarator *id_declarator = declarator;
+  while (id_declarator->kind == cdk_pointer)
+   id_declarator = id_declarator->declarator;
+  if (id_declarator->kind == cdk_id
+ && (declarator->kind == cdk_pointer
+ || is_gimple_reg_type (specs->type))
+ && c_parser_parse_ssa_name_id (id_declarator->u.id.id,
 , _offset)
  /* The following restricts it to unnamed anonymous SSA names
 which fails parsing of named ones in dumps (we could
 decide to not dump their name for -gimple).  */
  && ver_offset == 0)
-   c_parser_parse_ssa_name (parser, declarator->u.id.id, specs->type,
-version, ver_offset);
+   {
+ struct c_declarator *p = declarator;
+ tree type = specs->type;
+ while (p->kind == cdk_pointer)
+   {
+ type = build_pointer_type (type);
+ p = p->declarator;
+   }
+ c_parser_parse_ssa_name (parser, id_declarator->u.id.id, type,
+  version, ver_offset);
+   }
   else
{
  tree postfix_attrs = NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/gimplefe-49.c 
b/gcc/testsuite/gcc.dg/gimplefe-49.c
new file mode 100644
index 000..d28dc70841e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-49.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+__GIMPLE (ssa) int *
+bar (int i, int a, int b)
+{
+  int * _3;
+  int *p;
+

Re: [PATCH] match.pd, v2: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Feb 2022, Jakub Jelinek wrote:

> On Wed, Feb 23, 2022 at 11:38:30AM +0100, Richard Biener wrote:
> > If we go for a match.pd solution I'd go with the other one but as said
> > in the PR I think we should constant fold bswap (1(OVF)) but simply
> > make the (OVF) sticky as done in other constant foldings.
> 
> Changing what fold-const-call.cc does at this point seems extremely risky to
> me.  There are many different builtins, shall we propagate e.g.
> TREE_OVERFLOW from INTEGER_CST operands to REAL_CST result, or vice versa,
> etc.?  I think not folding those is the conservatively right answer, we
> don't have time to analyze all those dozens of different builtins and all
> their corner cases, and with OVF in there they aren't valid C or C++
> constant expressions, so we don't need to fold those during GENERIC,
> and in GIMPLE we drop the overflow flags and can fold those there then.

Note we're not 100% there on a (OVF)-free GIMPLE IL.

At this point I'd simply adjust

static tree
fold_const_call_1 (combined_fn fn, tree type, tree arg)
{
  machine_mode mode = TYPE_MODE (type);
  machine_mode arg_mode = TYPE_MODE (TREE_TYPE (arg));

  if (integer_cst_p (arg))
{
  if (SCALAR_INT_MODE_P (mode))
{
  wide_int result;
  if (fold_const_call_ss (, fn, wi::to_wide (arg),
  TYPE_PRECISION (type), TREE_TYPE (arg)))
return wide_int_to_tree (type, result);

to check if (TREE_CODE (arg) == INTEGER_CST) and

return force_fit_type (type, result, 0, TREE_OVERFLOW (arg));

not constant folding something with constant arguments is much
more risky than accidentally dropping a TREE_OVERFLOW (as we can
see with this endless recursion).  All cases currently handled
in the fold_const_call_ss overload do not in itself perform
arithmetic that can introduce overflow from an argument that
does not have TREE_OVERFLOW set.

Just fixing this single case leaves others unfixed, but for
example complex_cst_p fails to check for TREE_OVERFLOW already.

Richard.


[PATCH][nvptx] Fix dummy location in gen_comment

2022-02-23 Thread Tom de Vries via Gcc-patches

[ Re: [committed][nvptx] Add -mptx-comment ]

On 2/22/22 14:53, Tom de Vries wrote:

Hi,

Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.


But tested in combination with another patch, which is still waiting for 
review.


This patch by itself caused some regressions, currently testing attached 
fix.


Thanks,
- Tom
[nvptx] Fix dummy location in gen_comment

I committed "[nvptx] Add -mptx-comment", but tested it in combination with the
proposed "[final] Handle compiler-generated asm insn" (
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so
by itself the commit introduced some regressions:
...
FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr80764.c   -O2  (internal compiler error: Segmentation fault)
...

These are due to cfun->function_start_locus == 0.

Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead.

Tested on nvptx.

gcc/ChangeLog:

2022-02-23  Tom de Vries  

	* config/nvptx/nvptx.cc (gen_comment): Use
	DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.

---
 gcc/config/nvptx/nvptx.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df76..6f6d592e4621 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5382,7 +5382,7 @@ gen_comment (const char *s)
   char *comment = (char *) alloca (len);
   snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
   return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-cfun->function_start_locus);
+DECL_SOURCE_LOCATION (cfun->decl));
 }
 
 /* Initialize all declared regs at function entry.


[PATCH] tree-optimization/101636 - CTOR vectorization ICE

2022-02-23 Thread Richard Biener via Gcc-patches
The following fixes an ICE when vectorizing the defs of a CTOR
results in a different vector type than expected.  That can happen
with AARCH64 SVE and a fixed vector length as noted in r10-5979
and on x86 with AVX512 mask CTORs and trying to re-vectorize
using SSE as shown in this bug.

The fix is simply to reject the vectorization when it didn't
produce the desired type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-02-23  Richard Biener  

PR tree-optimization/101636
* tree-vect-slp.cc (vect_print_slp_tree): Dump the
vector type of the node.
(vect_slp_analyze_operations): Make sure the CTOR
is vectorized with an expected type.
(vectorize_slp_instance_root_stmt): Revert r10-5979 fix.

* gcc.target/i386/pr101636.c: New testcase.
* c-c++-common/torture/pr101636.c: Likewise.
---
 gcc/testsuite/c-c++-common/torture/pr101636.c | 30 ++
 gcc/testsuite/gcc.target/i386/pr101636.c  | 94 +++
 gcc/tree-vect-slp.cc  | 17 ++--
 3 files changed, 135 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/torture/pr101636.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101636.c

diff --git a/gcc/testsuite/c-c++-common/torture/pr101636.c 
b/gcc/testsuite/c-c++-common/torture/pr101636.c
new file mode 100644
index 000..aedaa1fdcae
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/pr101636.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize -fno-vect-cost-model" } */
+/* { dg-additional-options "-mavx512f" { target x86_64-*-* i?86-*-* } } */
+
+static inline int
+foo (int y, int a)
+{
+  return (y && a) ? a : 0;
+}
+
+void
+bar (int *__restrict a, int *__restrict d, int *__restrict e, int i)
+{
+  while (i < 1)
+{
+  e[8] = e[7] = e[6] = e[5] = e[4] = e[3] = e[2] = e[1] = e[0]
+= foo (d[8], a[8]);
+  e[9] = foo (d[9], a[9]);
+  e[10] = foo (d[0], a[0]);
+  e[11] = foo (d[1], a[1]);
+  e[12] = foo (d[12], a[12]);
+  e[13] = foo (d[13], a[13]);
+  e[14] = foo (d[4], a[4]);
+  e[15] = foo (d[15], a[15]);
+
+  a += 16;
+  e += 1;
+  i += 1;
+}
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr101636.c 
b/gcc/testsuite/gcc.target/i386/pr101636.c
new file mode 100644
index 000..76399cc2927
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101636.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O -mavx512f -ftree-vectorize -fno-vect-cost-model" 
} */
+
+typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef v16si v16sim __attribute__((vector_mask));
+typedef long v16di __attribute__((vector_size(128)));
+
+void __GIMPLE (ssa,guessed_local(118111600),startwith("slp"))
+bar (int * restrict a, int * restrict d, int * restrict e)
+{
+  int * vectp_14;
+  v16si * vectp_e_13;
+  v16si vect_iftmp_12;
+  v16sim mask__75_11;
+  v16sim mask__74_10;
+  v16si vect__6_9;
+  v16si vect__1_8;
+  int * vectp_7;
+  v16si * vectp_a_6;
+  int _2;
+  int _5;
+  int _7;
+  int _9;
+  int _11;
+  int _13;
+  int _15;
+  int _17;
+  _Bool _41;
+  _Bool _49;
+  _Bool _53;
+  _Bool _57;
+  _Bool _61;
+  _Bool _65;
+  _Bool _69;
+  _Bool _73;
+  sbool1 _135;
+  sbool1 _136;
+  sbool1 _137;
+  sbool1 _138;
+  sbool1 _139;
+  sbool1 _140;
+  sbool1 _141;
+  sbool1 _142;
+  sbool1 _143;
+  sbool1 _144;
+  sbool1 _145;
+  sbool1 _146;
+  sbool1 _147;
+  sbool1 _148;
+  sbool1 _149;
+  sbool1 _150;
+  v16sim _151;
+
+  __BB(2,guessed_local(105119324)):
+  _2 = __MEM  (d_26(D) + _Literal (int * restrict) 32);
+  _73 = _2 != 0;
+  _5 = __MEM  (d_26(D) + _Literal (int * restrict) 36);
+  _69 = _5 != 0;
+  _7 = __MEM  (d_26(D));
+  _65 = _7 != 0;
+  _9 = __MEM  (d_26(D) + _Literal (int * restrict) 4);
+  _61 = _9 != 0;
+  _11 = __MEM  (d_26(D) + _Literal (int * restrict) 48);
+  _57 = _11 != 0;
+  _13 = __MEM  (d_26(D) + _Literal (int * restrict) 52);
+  _53 = _13 != 0;
+  _15 = __MEM  (d_26(D) + _Literal (int * restrict) 16);
+  _41 = _15 != 0;
+  _17 = __MEM  (d_26(D) + _Literal (int * restrict) 60);
+  _49 = _17 != 0;
+  _135 = _49 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _136 = _41 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _137 = _53 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _138 = _57 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _139 = _61 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _140 = _65 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _141 = _69 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _142 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _143 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _144 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _145 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _146 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _147 = _73 ? _Literal (sbool1) -1 : _Literal (sbool1) 0;
+  _148 = 

Re: [PATCH] match.pd, v2: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 11:38:30AM +0100, Richard Biener wrote:
> If we go for a match.pd solution I'd go with the other one but as said
> in the PR I think we should constant fold bswap (1(OVF)) but simply
> make the (OVF) sticky as done in other constant foldings.

Changing what fold-const-call.cc does at this point seems extremely risky to
me.  There are many different builtins, shall we propagate e.g.
TREE_OVERFLOW from INTEGER_CST operands to REAL_CST result, or vice versa,
etc.?  I think not folding those is the conservatively right answer, we
don't have time to analyze all those dozens of different builtins and all
their corner cases, and with OVF in there they aren't valid C or C++
constant expressions, so we don't need to fold those during GENERIC,
and in GIMPLE we drop the overflow flags and can fold those there then.

Jakub



RE: [PATCH][GCC] aarch64: fix: ls64 tests fail on aarch64-linux-gnu_ilp32 [PR103729]

2022-02-23 Thread Przemyslaw Wirkus via Gcc-patches
Ping :)

> This patch sorts out an issue with LS64 intrinsics tests failing with
> aarch64-linux-gnu_ilp32 target.
>
> Regtested on aarch64-linux-gnu_ilp32, aarch64-elf and aarch64_be-elf
> and no issues.
>
> OK to install?
>
> gcc/ChangeLog:
>
>    PR target/103729
>    * config/aarch64/aarch64-builtins.c 
>(aarch64_expand_builtin_ls64):
>    Handle SImode for ILP32.

--- 

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
0d09fe9dd6dd65c655f5bd0b9a622e7550b61a4b..58bcd99d25b79191589cf9bf8a99db4f4b6a6ba1
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2216,7 +2216,8 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx 
target)
   {
 rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 create_output_operand ([0], target, V8DImode);
-create_input_operand ([1], op0, DImode);
+create_input_operand ([1],
+GET_MODE (op0) == SImode ? gen_reg_rtx (DImode) : op0, DImode);
 expand_insn (CODE_FOR_ld64b, 2, ops);
 return ops[0].value;
   }
@@ -2234,7 +2235,8 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx 
target)
 rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
 create_output_operand ([0], target, DImode);
-create_input_operand ([1], op0, DImode);
+create_input_operand ([1],
+GET_MODE (op0) == SImode ? gen_reg_rtx (DImode) : op0, DImode);
 create_input_operand ([2], op1, V8DImode);
 expand_insn (CODE_FOR_st64bv, 3, ops);
 return ops[0].value;
@@ -2244,7 +2246,8 @@ aarch64_expand_builtin_ls64 (int fcode, tree exp, rtx 
target)
 rtx op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 1));
 create_output_operand ([0], target, DImode);
-create_input_operand ([1], op0, DImode);
+create_input_operand ([1],
+GET_MODE (op0) == SImode ? gen_reg_rtx (DImode) : op0, DImode);
 create_input_operand ([2], op1, V8DImode);
 expand_insn (CODE_FOR_st64bv0, 3, ops);
 return ops[0].value;



Re: [committed][nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

2022-02-23 Thread Tom de Vries via Gcc-patches

On 2/23/22 10:06, Thomas Schwinge wrote:

Hi Tom!

This is me again, following along GCC/nvptx devlopment, and asking
questions.  ;-)



Yes, thanks for that, that's useful :)


On 2022-02-19T20:07:18+0100, Tom de Vries via Gcc-patches 
 wrote:

With the default ptx isa 6.0, we have for uniform-simt-1.c:
...
 @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
 shfl.sync.idx.b32   %r26, %r26, %r32, 31, 0x;
...

The atomic insn is predicated by -muniform-simt, and the subsequent insn does
a warp sync, at which point the warp is uniform again.


I understand the concern here is Independent Thread Scheduling, where the
execution of predicated-off threads of a warp ('@ ! %r33') may proceed
with the next instruction, 'shfl', without implicitly waiting for the
other threads of a warp still working on the 'atom'?  Hence, the 'sync'
aspect of 'shfl.sync', as a means that PTX provides at the ISA level such
that we're getting the desired semantics: as its first step, "wait for
all threads in membermask to arrive".



Indeed.


But with -mptx=3.1, we have instead:
...
 @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
 shfl.idx.b32%r26, %r26, %r32, 31;
...

The shfl does not sync the warp, and we want the warp to go back to executing
uniformly asap.  We cannot enforce this


Is it really the case that such code may cause "permanent" warp-divergent
execution (until re-converging "somewhere")?  My understanding has been
that predicated-off threads of a warp ('@ ! %r33') would simply idle,
implicitly waiting for the other threads of a warp still working on the
'atom' -- due to the nature of a shared program counter per warp, and the
desire to re-converge as soon as possible.

For example, PTX ISA 7.2, 3.1. "A Set of SIMT Multiprocessors":

| [...]
| At every instruction issue time, the SIMT unit selects a warp that is ready 
to execute and
| issues the next instruction to the active threads of the warp. A warp 
executes one common
| instruction at a time, so full efficiency is realized when all threads of a 
warp agree on their
| execution path. If threads of a warp diverge via a data-dependent conditional 
branch, the
| warp serially executes each branch path taken, disabling threads that are not 
on that path,
| and when all paths complete, the threads converge back to the same execution 
path. [...]

So I'd have assumed that after the potentially-diverging
'@%r33'-predicated 'atom' instruction, we're implicitly re-converging for
the unpredicated 'shfl' (as long as Independent Thread Scheduling isn't
involved, which it it's for '-mptx=3.1')?

As I'm understanding you, my understanding is not correct, and we may
thus be getting "permanent" warp-divergent execution as soon as there's
any predication/conditional involved that may evaluate differently for
individual threads of a warp, and we thus need such *explicit*
synchronization after all such instances?



Reading the ptx manual, I think your interpretation of what _should_ 
happen is right.


Regardless, the JIT is still free to translate say a block of equally 
predicated insns using a branch as long as it inserts a warp sync right 
after.  And then there might be a JIT bug that optimizes that sync away, 
or shifts it further out, past the shfl.


So perhaps the rationale should have been formulated more in terms of 
the shfl.  Note btw that it's possible that there's a compiler bug that 
does a diverging branch earlier, which would give problems for the shfl, 
and which the check would catch.


Note that the uniform-warp-check insn doesn't enforce convergence.  It 
only checks that the warp is convergent.


So, if the warp is not convergent, the check will abort.

If the warp is convergent, the JIT optimizer is free to optimize the 
check away.


And sometimes we have seen that adding the check makes the warp 
convergent (as in: preventing some JIT bug to trigger).


Anyway, unfortunately at this point I don't remember whether I found a 
smoking gun specifically for openmp.


Thanks,
- Tom


but at least check this using
nvptx_uniform_warp_check, similar to how that is done for openacc.

Likewise, detect the case that no shfl insn is emitted, and add a
nvptx_uniform_warp_check or nvptx_warpsync.


For example, 'nvptx-none/mgomp/libatomic/cas_1_.o':

 [...]
  @ %r71 atom.cas.b64 %r62,[%r35],%r29,%r61;
 +{
 +.reg .b32 act;
 +vote.ballot.b32 act,1;
 +.reg .pred uni;
 +setp.eq.b32 uni,act,0x;
 +@ ! uni trap;
 +@ ! uni exit;
 +}
  mov.b64 {%r69,%r70},%r62;
  shfl.idx.b32 %r69,%r69,%r68,31;
  shfl.idx.b32 %r70,%r70,%r68,31;
 [...]

So that's basically an 'assert' that all threads of a warp are converged.
(Is the JIT maybe even able to optimize that out?)  I guess I just wonder
if that's not satisfied implicitly.


Grüße
  Thomas



[nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

gcc/ChangeLog:

2022-02-19  Tom 

Re: [PATCH] match.pd: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 11:36:26AM +0100, Richard Biener wrote:
> > The following patch avoids infinite recursion during generic folding.
> > The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
> > (bswap @1) actually being simplified, if it is not simplified, we just
> > move the bswap from one operand to the other and if @0 is also INTEGER_CST,
> > we apply the same rule next.
> > 
> > The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
> > has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
> > such cases:
> > static inline bool
> > integer_cst_p (tree t)
> > {
> >   return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
> > }
> > The patch uses ! modifier to ensure the bswap is simplified, but because
> > ! is only supported in gimple-match, guards it also with #if GIMPLE.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Does it recurse on GIMPLE as well?  If not, the ! is not necessary
> since you guard with GIMPLE already (but it deserves a comment I guess).

It could, if TREE_OVERFLOW somehow appeared in there, but given that the
gimplifier and other spots call drop_tree_overflow, it is only theoretical.

Note, I've posted a different version of the patch that allows optimizing
this even if GENERIC in most cases.  I think I prefer that version.

Jakub



Re: [PATCH] match.pd, v2: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Feb 2022, Jakub Jelinek wrote:

> On Wed, Feb 23, 2022 at 09:32:41AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > The following patch avoids infinite recursion during generic folding.
> > The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
> > (bswap @1) actually being simplified, if it is not simplified, we just
> > move the bswap from one operand to the other and if @0 is also INTEGER_CST,
> > we apply the same rule next.
> > 
> > The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
> > has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
> > such cases:
> > static inline bool
> > integer_cst_p (tree t)
> > {
> >   return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
> > }
> > The patch uses ! modifier to ensure the bswap is simplified, but because
> > ! is only supported in gimple-match, guards it also with #if GIMPLE.
> 
> Here is another variant, which just breaks the possible ping-pong.
> If @0 is not INTEGER_CST, we still want to canonicalize to bswap on the
> INTEGER_CST (e.g. in the hope that we throw away TREE_OVERFLOW during/after
> gimplification), but if it is INTEGER_CST, we don't want to move bswap
> to the operand with TREE_OVERFLOW on it.
> 
> Ok for trunk if this passes bootstrap/regtest (it fixes the testcase too)?

If we go for a match.pd solution I'd go with the other one but as said
in the PR I think we should constant fold bswap (1(OVF)) but simply
make the (OVF) sticky as done in other constant foldings.

Richard.

> 2022-02-23  Jakub Jelinek  
> 
>   PR tree-optimization/104644
>   * match.pd (cmp (bswap @0) INTEGER_CST@1): Don't simplify
>   if TREE_OVERFLOW (@1) and @0 is INTEGER_CST.
> 
>   * gcc.dg/pr104644.c: New test.
> 
> --- gcc/match.pd.jj   2022-02-23 09:17:04.867124392 +0100
> +++ gcc/match.pd  2022-02-23 10:31:05.417304115 +0100
> @@ -3961,8 +3961,9 @@ (define_operator_list SYNC_FETCH_AND_AND
>  (cmp (convert:ctype @0) (convert:ctype @1
>(simplify
> (cmp (bswap @0) INTEGER_CST@1)
> -   (with { tree ctype = TREE_TYPE (@1); }
> -(cmp (convert:ctype @0) (bswap @1)
> +   (if (TREE_CODE (@0) != INTEGER_CST || !TREE_OVERFLOW (@1))
> +(with { tree ctype = TREE_TYPE (@1); }
> + (cmp (convert:ctype @0) (bswap @1))
>   /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
>   (simplify
>(bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
> --- gcc/testsuite/gcc.dg/pr104644.c.jj2022-02-23 10:29:50.704341688 
> +0100
> +++ gcc/testsuite/gcc.dg/pr104644.c   2022-02-23 10:29:50.704341688 +0100
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/104644 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-overflow" } */
> +
> +int
> +foo (void)
> +{
> +  return __builtin_bswap16 (1.31072e+5f) != (signed char) 1.31072e+5f;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] match.pd: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Feb 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following patch avoids infinite recursion during generic folding.
> The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
> (bswap @1) actually being simplified, if it is not simplified, we just
> move the bswap from one operand to the other and if @0 is also INTEGER_CST,
> we apply the same rule next.
> 
> The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
> has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
> such cases:
> static inline bool
> integer_cst_p (tree t)
> {
>   return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
> }
> The patch uses ! modifier to ensure the bswap is simplified, but because
> ! is only supported in gimple-match, guards it also with #if GIMPLE.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Does it recurse on GIMPLE as well?  If not, the ! is not necessary
since you guard with GIMPLE already (but it deserves a comment I guess).

> 2022-02-22  Jakub Jelinek  
> 
>   PR tree-optimization/104644
>   * match.pd (cmp (bswap @0) INTEGER_CST@1): Restrict optimization to
>   GIMPLE only and use ! modifier on bswap.
> 
>   * gcc.dg/pr104644.c: New test.
> 
> --- gcc/match.pd.jj   2022-02-18 12:38:06.075393091 +0100
> +++ gcc/match.pd  2022-02-22 20:22:02.222022022 +0100
> @@ -3959,10 +3959,13 @@ (define_operator_list SYNC_FETCH_AND_AND
> (cmp (bswap@2 @0) (bswap @1))
> (with { tree ctype = TREE_TYPE (@2); }
>  (cmp (convert:ctype @0) (convert:ctype @1
> +#if GIMPLE
>(simplify
> (cmp (bswap @0) INTEGER_CST@1)
> (with { tree ctype = TREE_TYPE (@1); }
> -(cmp (convert:ctype @0) (bswap @1)
> +(cmp (convert:ctype @0) (bswap! @1
> +#endif
> + )
>   /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
>   (simplify
>(bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
> --- gcc/testsuite/gcc.dg/pr104644.c.jj2022-02-22 20:02:32.020408468 
> +0100
> +++ gcc/testsuite/gcc.dg/pr104644.c   2022-02-22 20:02:04.609785996 +0100
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/104644 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-overflow" } */
> +
> +int
> +foo (void)
> +{
> +  return __builtin_bswap16 (1.31072e+5f) != (signed char) 1.31072e+5f;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] warn-recursion: Don't warn for __builtin_calls in gnu_inline extern inline functions [PR104633]

2022-02-23 Thread Richard Biener via Gcc-patches
On Wed, 23 Feb 2022, Jakub Jelinek wrote:

> Hi!
> 
> The first two testcases show different ways how e.g. the glibc
> _FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c
> the new -Winfinite-recursion warning emits a false positive warning.
> 
> It is a false positive because when a builtin with 2 names is called
> through the __builtin_ name (but not all builtins have a name prefixed
> exactly like that) from an extern inline function with gnu_inline semantics,
> it doesn't mean the compiler will ever attempt to use the user inline
> wrapper for the call, the __builtin_ just does what the builtin function
> is expected to do and either expands into some compiler generated code,
> or if the compiler decides to emit a call it will use an actual definition
> of the function, but that is not the extern inline gnu_inline function
> which is never emitted out of line.
> Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline
> wrapper calls the builtin by the same name as the function's name and in
> that case it is infinite recursion, we actually try to inline the recursive
> call and also error because the recursion is infinite during inlining;
> without always_inline we wouldn't error but it is still infinite recursion,
> the user has no control on how many recursive calls we actually inline.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-02-22  Jakub Jelinek  
> 
>   PR c/104633
>   * gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit):
>   Don't warn about calls to corresponding builtin from extern inline
>   gnu_inline wrappers.
> 
>   * gcc.dg/Winfinite-recursion-3.c: New test.
>   * gcc.dg/Winfinite-recursion-4.c: New test.
>   * gcc.dg/Winfinite-recursion-5.c: New test.
> 
> --- gcc/gimple-warn-recursion.cc.jj   2022-01-18 11:58:59.619981528 +0100
> +++ gcc/gimple-warn-recursion.cc  2022-02-22 13:19:43.592644576 +0100
> @@ -112,13 +112,25 @@ pass_warn_recursion::find_function_exit
> if (!strcmp (name, "siglongjmp"))
>   return true;
>  
> -   if (m_built_in && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
> +   if (m_built_in
> +   && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
> && m_built_in == DECL_FUNCTION_CODE (fndecl))
>   {
> -   /* The call is being made from the definition of a built-in
> -  (e.g., in a replacement of one) to itself.  */
> -   m_calls->safe_push (stmt);
> -   return false;
> +   const char *cname
> + = IDENTIFIER_POINTER (DECL_NAME (current_function_decl));
> +   /* Don't warn about gnu_inline extern inline function
> +  like strcpy calling __builtin_strcpy, that is fine,
> +  if some call is made (the builtin isn't expanded inline),
> +  a call is made to the external definition.  */
> +   if (!(DECL_DECLARED_INLINE_P (current_function_decl)
> + && DECL_EXTERNAL (current_function_decl))
> +   || strcmp (name, cname) == 0)
> + {
> +   /* The call is being made from the definition of a built-in
> +  (e.g., in a replacement of one) to itself.  */
> +   m_calls->safe_push (stmt);
> +   return false;
> + }
>   }
>   }
>  
> --- gcc/testsuite/gcc.dg/Winfinite-recursion-3.c.jj   2022-02-22 
> 13:28:10.345579876 +0100
> +++ gcc/testsuite/gcc.dg/Winfinite-recursion-3.c  2022-02-22 
> 13:25:16.760999396 +0100
> @@ -0,0 +1,18 @@
> +/* PR c/104633 */
> +/* { dg-do compile } */
> +/* { dg-options "-Winfinite-recursion" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +int memcmp (const void *, const void *, size_t);
> +
> +extern inline __attribute__((always_inline, gnu_inline)) int
> +memcmp (const void *p, const void *q, size_t size)   /* { dg-bogus "infinite 
> recursion detected" } */
> +{
> +  return __builtin_memcmp (p, q, size);  /* { dg-bogus 
> "recursive call" } */
> +}
> +
> +int
> +foo (const void *p, const void *q, size_t size)
> +{
> +  return memcmp (p, q, size);
> +}
> --- gcc/testsuite/gcc.dg/Winfinite-recursion-4.c.jj   2022-02-22 
> 13:28:13.604534458 +0100
> +++ gcc/testsuite/gcc.dg/Winfinite-recursion-4.c  2022-02-22 
> 13:25:22.552918640 +0100
> @@ -0,0 +1,19 @@
> +/* PR c/104633 */
> +/* { dg-do compile } */
> +/* { dg-options "-Winfinite-recursion" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +int memcmp (const void *, const void *, size_t);
> +__typeof (memcmp) __memcmp_alias __asm ("memcmp");
> +
> +extern inline __attribute__((always_inline, gnu_inline)) int
> +memcmp (const void *p, const void *q, size_t size)   /* { dg-bogus "infinite 
> recursion detected" } */
> +{
> +  return __memcmp_alias (p, q, size);/* { dg-bogus 
> "recursive call" } */
> +}
> +
> +int
> +foo (const void *p, const void *q, 

[PATCH] gcc-12: Re-enable split-stack support for GNU/Hurd.

2022-02-23 Thread Svante Signell via Gcc-patches
Hello,

In the course of porting the latest build of libgo/go with gcc-12 to GNU/Hurd,
split-stack support was found to have been removed.
 
After patching the files in libgo the build of gotools fails:
go1: error: '-fsplit-stack' currently only supported on GNU/Linux
go1: error: '-fsplit-stack' is not supported by this compiler configuration

The attached patch defines OPTION_GLIBC_P(opts) and OPTION_GLIBC, which were
lost from config/gnu.h and are needed to enable split-stack support for GNU/Hurd.

This problem happened with the latest commit, as discussed in the mail thread
starting with https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588973.html

The first file doing this check (first error above) is:
src/gcc/common/config/i386/i386-common.cc
in function:
static bool ix86_supports_split_stack (bool report,
struct gcc_options *opts ATTRIBUTE_UNUSED)

and the second is in src/gcc/opts.cc (second error above),
in function:
void
finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
location_t loc)

The checking logic is in function ix86_supports_split_stack():
#if defined(TARGET_THREAD_SPLIT_STACK_OFFSET) && defined(OPTION_GLIBC_P)
  if (!OPTION_GLIBC_P (opts))
#endif
{
  if (report)
error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
  return false;
}

  bool ret = true;

In the case of GNU/Hurd, TARGET_THREAD_SPLIT_STACK_OFFSET is defined, as is
OPTION_GLIBC_P elsewhere, but a definition of OPTION_GLIBC_P(opts) in gnu.h is
needed too. The attached patch to src/gcc/config/gnu.h creates that definition.
For GNU/Hurd, gnu.h is included at the configure stage:
Configuring stage 1 in ./gcc
...
Using the following target machine macro files:
...
../../src/gcc/config/gnu.h

For a longer history about this bug see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104290

Additionally, I would propose changing the text in
gcc/common/config/i386/i386-common.cc from:
error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
to:
error ("%<-fsplit-stack%> currently only supported on GLIBC-based systems");

Thanks!
--- a/src/gcc/config/gnu.h	2022-02-06 11:59:41.0 +0100
+++ b/src/gcc/config/gnu.h	2022-02-06 12:00:19.0 +0100
@@ -19,6 +19,9 @@
 along with GCC.  If not, see .
 */
 
+#define OPTION_GLIBC_P(opts)	(DEFAULT_LIBC == LIBC_GLIBC)
+#define OPTION_GLIBC		OPTION_GLIBC_P (_options)
+
 #undef GNU_USER_TARGET_OS_CPP_BUILTINS
 #define GNU_USER_TARGET_OS_CPP_BUILTINS()		\
 do {	\


Re: [PATCH] [i386] Fix typo in v1ti3.

2022-02-23 Thread Hongtao Liu via Gcc-patches
On Wed, Feb 23, 2022 at 5:48 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Wed, Feb 23, 2022 at 05:21:26PM +0800, liuhongt via Gcc-patches wrote:
> > For evex encoding vp{xor,or,and}, suffix is needed.
> >
> > Or there would be an error for
> > vpxor %ymm0, %ymm31, %ymm1
>
> The insn is about V1TImode, so the error would be on
> vpxor %xmm0, %xmm31, %xmm1
>
> >
> > Error: unsupported instruction `vpxor'
> >
> > Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
> > Pushed to trunk.
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/sse.md (v1ti3): Add suffix and replace
> >   isa attr of alternative 2 from avx to avx512vl.
>
> The patch looks good, but I think it would be nice to have a dg-do assemble
> testcase for it.
Yes will add.
> Something like untested:
> /* { dg-do assemble { target { int128 && avx512vl } } } */
> /* { dg-options "-O2 -mavx512vl" } */
>
> typedef __int128 V __attribute__((vector_size (16)));
>
> void
> foo (V *x, V *y, V *z)
> {
>   register V a __asm ("xmm31") = *z;
>   __asm ("" : "+v" (a));
>   x[0] = y[0] & a;
>   x[1] = y[1] | a;
>   x[2] = y[2] ^ a;
> }
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH] [i386] Fix typo in v1ti3.

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 05:21:26PM +0800, liuhongt via Gcc-patches wrote:
> For evex encoding vp{xor,or,and}, suffix is needed.
> 
> Or there would be an error for
> vpxor %ymm0, %ymm31, %ymm1

The insn is about V1TImode, so the error would be on
vpxor %xmm0, %xmm31, %xmm1

> 
> Error: unsupported instruction `vpxor'
> 
> Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
> Pushed to trunk.
> 
> gcc/ChangeLog:
> 
>   * config/i386/sse.md (v1ti3): Add suffix and replace
>   isa attr of alternative 2 from avx to avx512vl.

The patch looks good, but I think it would be nice to have a dg-do assemble
testcase for it.
Something like untested:
/* { dg-do assemble { target { int128 && avx512vl } } } */
/* { dg-options "-O2 -mavx512vl" } */

typedef __int128 V __attribute__((vector_size (16)));

void
foo (V *x, V *y, V *z)
{
  register V a __asm ("xmm31") = *z;
  __asm ("" : "+v" (a));
  x[0] = y[0] & a;
  x[1] = y[1] | a;
  x[2] = y[2] ^ a;
}

Jakub



[PATCH 3/5 V1] RISC-V:Implement intrinsics for Crypto extension

2022-02-23 Thread shihua
From: LiaoShihua 

These headers are from https://github.com/rvkrypto/rvkrypto-fips .

gcc/ChangeLog:

* config.gcc: Add extra_headers.
* config/riscv/riscv_crypto.h: New file.
* config/riscv/riscv_crypto_scalar.h: New file.
* config/riscv/rvk_asm_intrin.h: New file.
* config/riscv/rvk_emu_intrin.h: New file.

Co-Authored-By: mjosaarinen 
---
 gcc/config.gcc |   1 +
 gcc/config/riscv/riscv_crypto.h|  12 +
 gcc/config/riscv/riscv_crypto_scalar.h | 247 ++
 gcc/config/riscv/rvk_asm_intrin.h  | 187 
 gcc/config/riscv/rvk_emu_intrin.h  | 594 +
 5 files changed, 1041 insertions(+)
 create mode 100644 gcc/config/riscv/riscv_crypto.h
 create mode 100644 gcc/config/riscv/riscv_crypto_scalar.h
 create mode 100644 gcc/config/riscv/rvk_asm_intrin.h
 create mode 100644 gcc/config/riscv/rvk_emu_intrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2cc5aeec9e4..caf673f1cb0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -510,6 +510,7 @@ pru-*-*)
 riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o"
+   extra_headers="riscv_crypto.h riscv_crypto_scalar.h rvk_asm_intrin.h 
rvk_emu_intrin.h"
d_target_objs="riscv-d.o"
;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv_crypto.h b/gcc/config/riscv/riscv_crypto.h
new file mode 100644
index 000..d06c777b7af
--- /dev/null
+++ b/gcc/config/riscv/riscv_crypto.h
@@ -0,0 +1,12 @@
+// riscv_crypto.h
+// 2022-02-12  Markku-Juhani O. Saarinen 
+// Copyright (c) 2022, PQShield Ltd. All rights reserved.
+
+// === Master crypto intrinsics header. Currently Just includes scalar 
crypto.
+
+#ifndef _RISCV_CRYPTO_H
+#define _RISCV_CRYPTO_H
+
+#include "riscv_crypto_scalar.h"
+
+#endif //  _RISCV_CRYPTO_H
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv_crypto_scalar.h 
b/gcc/config/riscv/riscv_crypto_scalar.h
new file mode 100644
index 000..0ed627856fd
--- /dev/null
+++ b/gcc/config/riscv/riscv_crypto_scalar.h
@@ -0,0 +1,247 @@
+// riscv_crypto_scalar.h
+// 2021-11-08  Markku-Juhani O. Saarinen 
+// Copyright (c) 2021, PQShield Ltd. All rights reserved.
+
+// === Scalar crypto: General mapping from intrinsics to compiler builtins,
+// inline assembler, or to an (insecure) porting / emulation layer.
+
+/*
+ * _rv_*(...)
+ *   RV32/64 intrinsics that return the "long" data type
+ *
+ * _rv32_*(...)
+ *   RV32/64 intrinsics that return the "int32_t" data type
+ *
+ * _rv64_*(...)
+ *   RV64-only intrinsics that return the "int64_t" data type
+ *
+ */
+
+#ifndef _RISCV_CRYPTO_SCALAR_H
+#define _RISCV_CRYPTO_SCALAR_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if !defined(__riscv_xlen) && !defined(RVKINTRIN_EMULATE)
+#warning "Target is not RISC-V. Enabling insecure emulation."
+#define RVKINTRIN_EMULATE 1
+#endif
+
+#if defined(RVKINTRIN_EMULATE)
+
+// intrinsics via emulation (insecure -- porting / debug option)
+#include "rvk_emu_intrin.h"
+#define _RVK_INTRIN_IMPL(s) _rvk_emu_##s
+
+#elif defined(RVKINTRIN_ASSEMBLER)
+
+// intrinsics via inline assembler (builtins not available)
+#include "rvk_asm_intrin.h"
+#define _RVK_INTRIN_IMPL(s) _rvk_asm_##s
+#else
+
+// intrinsics via compiler builtins
+#include 
+#define _RVK_INTRIN_IMPL(s) __builtin_riscv_##s
+
+#endif
+
+// set type if not already set
+#if !defined(RVKINTRIN_RV32) && !defined(RVKINTRIN_RV64)
+#if __riscv_xlen == 32
+#define RVKINTRIN_RV32
+#elif __riscv_xlen == 64
+#define RVKINTRIN_RV64
+#else
+#error "__riscv_xlen not valid."
+#endif
+#endif
+
+// Mappings to implementation
+
+// === (mapping)   Zbkb:   Bitmanipulation instructions for Cryptography
+
+static inline int32_t _rv32_ror(int32_t rs1, int32_t rs2)
+   { return _RVK_INTRIN_IMPL(ror_32)(rs1, rs2); }  //  
ROR[W] ROR[W]I
+
+static inline int32_t _rv32_rol(int32_t rs1, int32_t rs2)
+   { return _RVK_INTRIN_IMPL(rol_32)(rs1, rs2); }  //  
ROL[W] ROR[W]I
+
+#ifdef RVKINTRIN_RV64
+static inline int64_t _rv64_ror(int64_t rs1, int64_t rs2)
+   { return _RVK_INTRIN_IMPL(ror_64)(rs1, rs2); }  //  
ROR or RORI
+
+static inline int64_t _rv64_rol(int64_t rs1, int64_t rs2)
+   { return _RVK_INTRIN_IMPL(rol_64)(rs1, rs2); }  //  
ROL or RORI
+#endif
+
+#ifdef RVKINTRIN_RV32
+static inline int32_t _rv32_brev8(int32_t rs1)
+   { return _RVK_INTRIN_IMPL(brev8_32)(rs1); } 
//  BREV8 (GREVI)
+#endif
+
+#ifdef RVKINTRIN_RV64
+static inline int64_t _rv64_brev8(int64_t rs1)
+   { return _RVK_INTRIN_IMPL(brev8_64)(rs1); } 
//  BREV8 (GREVI)
+#endif
+
+#ifdef RVKINTRIN_RV32
+static inline int32_t _rv32_zip(int32_t rs1)
+   { return _RVK_INTRIN_IMPL(zip_32)(rs1); }  

[PATCH 4/5 V1] RISC-V:Implement testcases for Crypto extension

2022-02-23 Thread shihua
From: LiaoShihua 

These testcases use the intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbkb32.c: New test.
* gcc.target/riscv/zbkb64.c: New test.
* gcc.target/riscv/zbkc32.c: New test.
* gcc.target/riscv/zbkc64.c: New test.
* gcc.target/riscv/zbkx32.c: New test.
* gcc.target/riscv/zbkx64.c: New test.
* gcc.target/riscv/zknd32.c: New test.
* gcc.target/riscv/zknd64.c: New test.
* gcc.target/riscv/zkne64.c: New test.
* gcc.target/riscv/zknh.c: New test.
* gcc.target/riscv/zknh32.c: New test.
* gcc.target/riscv/zknh64.c: New test.
* gcc.target/riscv/zksed.c: New test.
* gcc.target/riscv/zksh.c: New test.

---
 gcc/testsuite/gcc.target/riscv/zbkb32.c | 34 +
 gcc/testsuite/gcc.target/riscv/zbkb64.c | 21 +
 gcc/testsuite/gcc.target/riscv/zbkc32.c | 16 ++
 gcc/testsuite/gcc.target/riscv/zbkc64.c | 16 ++
 gcc/testsuite/gcc.target/riscv/zbkx32.c | 16 ++
 gcc/testsuite/gcc.target/riscv/zbkx64.c | 16 ++
 gcc/testsuite/gcc.target/riscv/zknd32.c | 18 +++
 gcc/testsuite/gcc.target/riscv/zknd64.c | 35 ++
 gcc/testsuite/gcc.target/riscv/zkne64.c | 29 ++
 gcc/testsuite/gcc.target/riscv/zknh.c   | 28 +
 gcc/testsuite/gcc.target/riscv/zknh32.c | 40 +
 gcc/testsuite/gcc.target/riscv/zknh64.c | 29 ++
 gcc/testsuite/gcc.target/riscv/zksed.c  | 20 +
 gcc/testsuite/gcc.target/riscv/zksh.c   | 17 +++
 14 files changed, 335 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkb32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkb64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkx64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknd32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknd64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zkne64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zksed.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zksh.c

diff --git a/gcc/testsuite/gcc.target/riscv/zbkb32.c 
b/gcc/testsuite/gcc.target/riscv/zbkb32.c
new file mode 100644
index 000..5bf588d58b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbkb32.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc_zbkb -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+#include"riscv_crypto.h"
+int32_t foo1(int32_t rs1, int32_t rs2)
+{
+return _rv32_ror(rs1,rs2);
+}
+
+int32_t foo2(int32_t rs1, int32_t rs2)
+{
+return _rv32_rol(rs1,rs2);
+}
+
+int32_t foo3(int32_t rs1)
+{
+return _rv32_brev8(rs1);
+}
+
+int32_t foo4(int32_t rs1)
+{
+return _rv32_zip(rs1);
+}
+
+int32_t foo5(int32_t rs1)
+{
+return _rv32_unzip(rs1);
+}
+
+/* { dg-final { scan-assembler-times "ror" 1 } } */
+/* { dg-final { scan-assembler-times "rol" 1 } } */
+/* { dg-final { scan-assembler-times "brev8" 1 } } */
+/* { dg-final { scan-assembler-times "zip" 2 } } */
+/* { dg-final { scan-assembler-times "unzip" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbkb64.c 
b/gcc/testsuite/gcc.target/riscv/zbkb64.c
new file mode 100644
index 000..2cd76a29750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbkb64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zbkb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+#include"riscv_crypto.h"
+int64_t foo1(int64_t rs1, int64_t rs2)
+{
+return _rv64_ror(rs1,rs2);
+}
+
+int64_t foo2(int64_t rs1, int64_t rs2)
+{
+return _rv64_rol(rs1,rs2);
+}
+
+int64_t foo3(int64_t rs1, int64_t rs2)
+{
+return _rv64_brev8(rs1);
+}
+/* { dg-final { scan-assembler-times "ror" 1 } } */
+/* { dg-final { scan-assembler-times "rol" 1 } } */
+/* { dg-final { scan-assembler-times "brev8" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbkc32.c 
b/gcc/testsuite/gcc.target/riscv/zbkc32.c
new file mode 100644
index 000..237085bfc7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbkc32.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc_zbkc -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+#include"riscv_crypto.h"
+int32_t foo1(int32_t rs1, int32_t rs2)
+{
+return _rv32_clmul(rs1,rs2);
+}
+
+int32_t foo2(int32_t rs1, int32_t rs2)
+{
+return _rv32_clmulh(rs1,rs2);
+}
+
+/* { dg-final { scan-assembler-times "clmul" 2 } } */
+/* { dg-final { scan-assembler-times "clmulh" 1 } } */
\ No 

[PATCH 1/5 V1] RISC-V:Implement instruction patterns for Crypto extension

2022-02-23 Thread shihua
From: LiaoShihua 


gcc/ChangeLog:

* config/riscv/predicates.md (bs_operand): New predicate for the bs operand.
(rnum_operand): New predicate for the rnum operand.
* config/riscv/riscv.md: Include crypto.md.
* config/riscv/crypto.md: New file.

Co-Authored-By: Wu 
---
 gcc/config/riscv/crypto.md | 383 +
 gcc/config/riscv/predicates.md |   8 +
 gcc/config/riscv/riscv.md  |   1 +
 3 files changed, 392 insertions(+)
 create mode 100644 gcc/config/riscv/crypto.md

diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
new file mode 100644
index 000..591066fac3b
--- /dev/null
+++ b/gcc/config/riscv/crypto.md
@@ -0,0 +1,383 @@
+;; Machine description for K extension.
+;; Copyright (C) 2022 Free Software Foundation, Inc.
+;; Contributed by SiYu Wu (s...@isrc.iscas.ac.cn) and ShiHua Liao 
(shi...@iscas.ac.cn).
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_c_enum "unspec" [
+;;ZBKB unspecs
+UNSPEC_ROR
+UNSPEC_ROL
+UNSPEC_BREV8
+UNSPEC_BSWAP
+UNSPEC_ZIP
+UNSPEC_UNZIP
+
+;; Zbkc unspecs
+UNSPEC_CLMUL
+UNSPEC_CLMULH
+
+;; Zbkx unspecs
+UNSPEC_XPERM8
+UNSPEC_XPERM4
+
+;; Zknd unspecs
+UNSPEC_AES_DSI
+UNSPEC_AES_DSMI
+UNSPEC_AES_DS
+UNSPEC_AES_DSM
+UNSPEC_AES_IM
+UNSPEC_AES_KS1I
+UNSPEC_AES_KS2
+
+;; Zkne unspecs
+UNSPEC_AES_ES
+UNSPEC_AES_ESM
+UNSPEC_AES_ESI
+UNSPEC_AES_ESMI
+
+;; Zknh unspecs
+UNSPEC_SHA_256_SIG0
+UNSPEC_SHA_256_SIG1
+UNSPEC_SHA_256_SUM0
+UNSPEC_SHA_256_SUM1
+UNSPEC_SHA_512_SIG0
+UNSPEC_SHA_512_SIG0H
+UNSPEC_SHA_512_SIG0L
+UNSPEC_SHA_512_SIG1
+UNSPEC_SHA_512_SIG1H
+UNSPEC_SHA_512_SIG1L
+UNSPEC_SHA_512_SUM0
+UNSPEC_SHA_512_SUM0R
+UNSPEC_SHA_512_SUM1
+UNSPEC_SHA_512_SUM1R
+
+;; Zksh
+UNSPEC_SM3_P0
+UNSPEC_SM3_P1
+
+;;Zksed
+UNSPEC_SM4_ED
+UNSPEC_SM4_KS
+])
+
+(define_insn "riscv_ror_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_ROR))]
+  "TARGET_ZBKB"
+  "ror\t%0,%1,%2")
+
+(define_insn "riscv_rol_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_ROL))]
+  "TARGET_ZBKB"
+  "rol\t%0,%1,%2")
+
+(define_insn "riscv_brev8_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")]
+  UNSPEC_BREV8))]
+  "TARGET_ZBKB"
+  "brev8\t%0,%1")
+
+(define_insn "riscv_bswap"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")]
+  UNSPEC_BSWAP))]
+  "TARGET_ZBKB"
+  "bswap\t%0,%1")
+
+(define_insn "riscv_zip"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI [(match_operand:SI 1 "register_operand" "r")]
+  UNSPEC_ZIP))]
+  "TARGET_ZBKB && !TARGET_64BIT"
+  "zip\t%0,%1")
+
+(define_insn "riscv_unzip"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI [(match_operand:SI 1 "register_operand" "r")]
+  UNSPEC_UNZIP))]
+  "TARGET_ZBKB && !TARGET_64BIT"
+  "unzip\t%0,%1")
+
+(define_insn "riscv_clmul_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMUL))]
+  "TARGET_ZBKC"
+  "clmul\t%0,%1,%2")
+
+(define_insn "riscv_clmulh_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMULH))]
+  "TARGET_ZBKC"
+  "clmulh\t%0,%1,%2")
+
+(define_insn "riscv_xperm8_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_XPERM8))]
+  "TARGET_ZBKX"
+  "xperm8\t%0,%1,%2")
+
+(define_insn "riscv_xperm4_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X 

[PATCH 2/5 V1] RISC-V:Implement built-in instructions for Crypto extension

2022-02-23 Thread shihua
From: LiaoShihua 

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2): Defined new 
function prototypes.
(RISCV_FTYPE_NAME3): Ditto.
(AVAIL): Defined new riscv_builtin_avail for crypto extension.
(RISCV_ATYPE_SI): Defined new argument type.
(RISCV_ATYPE_DI): Ditto.
(RISCV_FTYPE_ATYPES2): Defined new RISCV_FTYPE_ATYPESN
(RISCV_FTYPE_ATYPES3): Ditto.
* config/riscv/riscv-ftypes.def (1): Defined new prototypes for RISC-V 
built-in functions.
(2): Ditto.
(3): Ditto.
* config/riscv/riscv-builtins-crypto.def: Defined new RISC-V built-in 
functions for crypto extension.

Co-Authored-By: Wu 
---
 gcc/config/riscv/riscv-builtins-crypto.def | 93 ++
 gcc/config/riscv/riscv-builtins.cc | 35 
 gcc/config/riscv/riscv-ftypes.def  |  7 ++
 3 files changed, 135 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-builtins-crypto.def

diff --git a/gcc/config/riscv/riscv-builtins-crypto.def 
b/gcc/config/riscv/riscv-builtins-crypto.def
new file mode 100644
index 000..91dcf457dd5
--- /dev/null
+++ b/gcc/config/riscv/riscv-builtins-crypto.def
@@ -0,0 +1,93 @@
+/* Builtin definitions for K extension
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+// Zbkb
+RISCV_BUILTIN (ror_si, "ror_32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (ror_di, "ror_64", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI_DI, 
crypto_zbkb64),
+RISCV_BUILTIN (rol_si, "rol_32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (rol_di, "rol_64", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI_DI, 
crypto_zbkb64),
+RISCV_BUILTIN (bswapsi, "bswap32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (bswapdi, "bswap64", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI_DI, 
crypto_zbkb64),
+RISCV_BUILTIN (zip, "zip_32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (unzip, "unzip_32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (brev8_si, "brev8_32", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, 
crypto_zbkb32),
+RISCV_BUILTIN (brev8_di, "brev8_64", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI, 
crypto_zbkb64),
+
+//Zbkc
+RISCV_BUILTIN (clmul_si, "clmul_32", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI_SI, crypto_zbkc32),
+RISCV_BUILTIN (clmul_di, "clmul_64", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI_DI, crypto_zbkc64),
+RISCV_BUILTIN (clmulh_si, "clmulh_32", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI_SI, crypto_zbkc32),
+RISCV_BUILTIN (clmulh_di, "clmulh_64", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI_DI, crypto_zbkc64),
+
+// Zbkx
+RISCV_BUILTIN (xperm4_si, "xperm4_32", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI_SI, crypto_zbkx32),
+RISCV_BUILTIN (xperm4_di, "xperm4_64", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI_DI, crypto_zbkx64),
+RISCV_BUILTIN (xperm8_si, "xperm8_32", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI_SI, crypto_zbkx32),
+RISCV_BUILTIN (xperm8_di, "xperm8_64", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI_DI, crypto_zbkx64),
+
+// Zknd
+DIRECT_BUILTIN (aes32dsi, RISCV_SI_FTYPE_SI_SI_SI, crypto_zknd32),
+DIRECT_BUILTIN (aes32dsmi, RISCV_SI_FTYPE_SI_SI_SI, crypto_zknd32),
+DIRECT_BUILTIN (aes64ds, RISCV_DI_FTYPE_DI_DI, crypto_zknd64),
+DIRECT_BUILTIN (aes64dsm, RISCV_DI_FTYPE_DI_DI, crypto_zknd64),
+DIRECT_BUILTIN (aes64im, RISCV_DI_FTYPE_DI, crypto_zknd64),
+DIRECT_BUILTIN (aes64ks1i, RISCV_DI_FTYPE_DI_SI, crypto_zkne_or_zknd),
+DIRECT_BUILTIN (aes64ks2, RISCV_DI_FTYPE_DI_DI, crypto_zkne_or_zknd),
+
+// Zkne
+DIRECT_BUILTIN (aes32esi, RISCV_SI_FTYPE_SI_SI_SI, crypto_zkne32),
+DIRECT_BUILTIN (aes32esmi, RISCV_SI_FTYPE_SI_SI_SI, crypto_zkne32),
+DIRECT_BUILTIN (aes64es, RISCV_DI_FTYPE_DI_DI, crypto_zkne64),
+DIRECT_BUILTIN (aes64esm, RISCV_DI_FTYPE_DI_DI, crypto_zkne64),
+
+// Zknh - SHA256
+RISCV_BUILTIN (sha256sig0_si, "sha256sig0", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI, crypto_zknh32),
+RISCV_BUILTIN (sha256sig0_di, "sha256sig0", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI, crypto_zknh64),
+RISCV_BUILTIN (sha256sig1_si, "sha256sig1", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI, crypto_zknh32),
+RISCV_BUILTIN (sha256sig1_di, "sha256sig1", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI, crypto_zknh64),
+RISCV_BUILTIN (sha256sum0_si, "sha256sum0", RISCV_BUILTIN_DIRECT, 

[PATCH 0/5 V1] RISC-V:Implement Crypto extension's instruction patterns and it's intrinsics

2022-02-23 Thread shihua
From: LiaoShihua 

This patch set implements the Crypto extension, which includes the zbkb, zbkc,
zbkx, zknd, zknh, zkne, zksed and zksh extensions.
It includes instruction/md patterns, intrinsic functions, testcases for the
intrinsic functions, and test macros.
The definitions of the intrinsic functions come from
https://github.com/rvkrypto/rvkrypto-fips .
This work was done by Liao Shihua and Wu Siyu.

LiaoShihua (5):
  RISC-V:Implement instruction patterns for Crypto extensions
  RISC-V:Implement built-in instructions for Crypto extensions
  RISC-V:Implement intrinsics for Crypto extensions
  RISC-V:Implement testcases for Crypto extensions
  RISC-V:Implement architecture extension test macros for Crypto extensions

 gcc/config.gcc |   1 +
 gcc/config/riscv/crypto.md | 383 +
 gcc/config/riscv/predicates.md |   8 +
 gcc/config/riscv/riscv-builtins-crypto.def |  93 
 gcc/config/riscv/riscv-builtins.cc |  35 ++
 gcc/config/riscv/riscv-c.cc|   9 +
 gcc/config/riscv/riscv-ftypes.def  |   7 +
 gcc/config/riscv/riscv.md  |   1 +
 gcc/config/riscv/riscv_crypto.h|  12 +
 gcc/config/riscv/riscv_crypto_scalar.h | 247 +
 gcc/config/riscv/rvk_asm_intrin.h  | 187 +++
 gcc/config/riscv/rvk_emu_intrin.h  | 594 +
 gcc/testsuite/gcc.target/riscv/predef-17.c |  59 ++
 gcc/testsuite/gcc.target/riscv/zbkb32.c|  34 ++
 gcc/testsuite/gcc.target/riscv/zbkb64.c|  21 +
 gcc/testsuite/gcc.target/riscv/zbkc32.c|  16 +
 gcc/testsuite/gcc.target/riscv/zbkc64.c|  16 +
 gcc/testsuite/gcc.target/riscv/zbkx32.c|  16 +
 gcc/testsuite/gcc.target/riscv/zbkx64.c|  16 +
 gcc/testsuite/gcc.target/riscv/zknd32.c|  18 +
 gcc/testsuite/gcc.target/riscv/zknd64.c|  35 ++
 gcc/testsuite/gcc.target/riscv/zkne64.c|  29 +
 gcc/testsuite/gcc.target/riscv/zknh.c  |  28 +
 gcc/testsuite/gcc.target/riscv/zknh32.c|  40 ++
 gcc/testsuite/gcc.target/riscv/zknh64.c|  29 +
 gcc/testsuite/gcc.target/riscv/zksed.c |  20 +
 gcc/testsuite/gcc.target/riscv/zksh.c  |  17 +
 27 files changed, 1971 insertions(+)
 create mode 100644 gcc/config/riscv/crypto.md
 create mode 100644 gcc/config/riscv/riscv-builtins-crypto.def
 create mode 100644 gcc/config/riscv/riscv_crypto.h
 create mode 100644 gcc/config/riscv/riscv_crypto_scalar.h
 create mode 100644 gcc/config/riscv/rvk_asm_intrin.h
 create mode 100644 gcc/config/riscv/rvk_emu_intrin.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkb32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkb64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkc32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkc64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkx32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbkx64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknd32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknd64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zkne64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zknh64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zksed.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zksh.c

-- 
2.31.1.windows.1



[PATCH 5/5 V1] RISC-V:Implement architecture extension test macros for Crypto extension

2022-02-23 Thread shihua
From: LiaoShihua 

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add __riscv_zks,
__riscv_zk and __riscv_zkn.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-17.c: New test.

---
 gcc/config/riscv/riscv-c.cc|  9 
 gcc/testsuite/gcc.target/riscv/predef-17.c | 59 ++
 2 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 73c62f41274..d6c153e8d7c 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -63,6 +63,15 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__riscv_fdiv");
   builtin_define ("__riscv_fsqrt");
 }
+  
+  if (TARGET_ZBKB && TARGET_ZBKC && TARGET_ZBKX && TARGET_ZKNE && TARGET_ZKND 
&& TARGET_ZKNH)
+{
+  builtin_define ("__riscv_zk");
+  builtin_define ("__riscv_zkn");
+}
+
+  if (TARGET_ZBKB && TARGET_ZBKC && TARGET_ZBKX && TARGET_ZKSED && TARGET_ZKSH)
+  builtin_define ("__riscv_zks");
 
   switch (riscv_abi)
 {
diff --git a/gcc/testsuite/gcc.target/riscv/predef-17.c 
b/gcc/testsuite/gcc.target/riscv/predef-17.c
new file mode 100644
index 000..4366dee1016
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-17.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zbkb_zbkc_zbkx_zknd_zkne_zknh_zksed_zksh 
-mabi=lp64 -mcmodel=medlow -misa-spec=2.2" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_zk)
+#error "__riscv_zk"
+#endif
+
+#if !defined(__riscv_zkn)
+#error "__riscv_zkn"
+#endif
+
+#if !defined(__riscv_zks)
+#error "__riscv_zks"
+#endif
+
+#if !defined(__riscv_zbkb)
+#error "__riscv_zbkb"
+#endif
+
+#if !defined(__riscv_zbkc)
+#error "__riscv_zbkc"
+#endif
+
+#if !defined(__riscv_zbkx)
+#error "__riscv_zbkx"
+#endif
+
+#if !defined(__riscv_zknd)
+#error "__riscv_zknd"
+#endif
+
+#if !defined(__riscv_zkne)
+#error "__riscv_zkne"
+#endif
+
+#if !defined(__riscv_zknh)
+#error "__riscv_zknh"
+#endif
+
+#if !defined(__riscv_zksh)
+#error "__riscv_zksh"
+#endif
+
+  return 0;
+}
\ No newline at end of file
-- 
2.31.1.windows.1



[PATCH] match.pd, v2: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 23, 2022 at 09:32:41AM +0100, Jakub Jelinek via Gcc-patches wrote:
> The following patch avoids infinite recursion during generic folding.
> The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
> (bswap @1) actually being simplified, if it is not simplified, we just
> move the bswap from one operand to the other and if @0 is also INTEGER_CST,
> we apply the same rule next.
> 
> The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
> has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
> such cases:
> static inline bool
> integer_cst_p (tree t)
> {
>   return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
> }
> The patch uses ! modifier to ensure the bswap is simplified, but because
> ! is only supported in gimple-match, guards it also with #if GIMPLE.

Here is another variant, which just breaks the possible ping-pong.
If @0 is not INTEGER_CST, we still want to canonicalize to bswap on the
INTEGER_CST (e.g. in the hope that we throw away TREE_OVERFLOW during/after
gimplification), but if it is INTEGER_CST, we don't want to move bswap
to the operand with TREE_OVERFLOW on it.
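The ping-pong the guard breaks can be modeled with a toy simulation (plain Python, not GCC code; the dict fields are hypothetical stand-ins for tree flags): bswap of a constant with TREE_OVERFLOW set never folds, so without the guard the rule just moves the unfolded bswap from one constant operand to the other indefinitely.

```python
def try_fold(lhs, rhs, guarded, max_steps=10):
    """Model (cmp (bswap @0) INTEGER_CST@1) -> (cmp @0 (bswap @1)).

    Each operand is a dict: 'cst' = is INTEGER_CST, 'ovf' = TREE_OVERFLOW,
    'bswap' = wrapped in a bswap that did not fold (fold-const-call.cc
    refuses constants with TREE_OVERFLOW).  Returns the rewrite count,
    capped at max_steps to stand in for infinite recursion.
    """
    steps = 0
    while lhs['bswap'] and rhs['cst'] and steps < max_steps:
        if guarded and lhs['cst'] and rhs['ovf']:
            break  # the patched rule refuses to move the bswap
        # Move the bswap across; the new bswap of an overflowed constant
        # does not fold either, and comparison canonicalization puts that
        # operand back on the LHS, so the pattern matches again.
        lhs, rhs = dict(rhs, bswap=True), dict(lhs, bswap=False)
        steps += 1
    return steps

ovf_cst = {'cst': True, 'ovf': True, 'bswap': False}
lhs0 = dict(ovf_cst, bswap=True)

print(try_fold(dict(lhs0), dict(ovf_cst), guarded=False))  # hits the cap: 10
print(try_fold(dict(lhs0), dict(ovf_cst), guarded=True))   # stops at once: 0
```

With the guard, the bswap simply stays put when both operands are overflowed constants, which is exactly what the one-line `if` in the patch below achieves.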

Ok for trunk if this passes bootstrap/regtest (it fixes the testcase too)?

2022-02-23  Jakub Jelinek  

PR tree-optimization/104644
* match.pd (cmp (bswap @0) INTEGER_CST@1): Don't simplify
if TREE_OVERFLOW (@1) and @0 is INTEGER_CST.

* gcc.dg/pr104644.c: New test.

--- gcc/match.pd.jj 2022-02-23 09:17:04.867124392 +0100
+++ gcc/match.pd	2022-02-23 10:31:05.417304115 +0100
@@ -3961,8 +3961,9 @@ (define_operator_list SYNC_FETCH_AND_AND
 (cmp (convert:ctype @0) (convert:ctype @1
   (simplify
(cmp (bswap @0) INTEGER_CST@1)
-   (with { tree ctype = TREE_TYPE (@1); }
-(cmp (convert:ctype @0) (bswap @1)
+   (if (TREE_CODE (@0) != INTEGER_CST || !TREE_OVERFLOW (@1))
+(with { tree ctype = TREE_TYPE (@1); }
+ (cmp (convert:ctype @0) (bswap @1))
  /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
  (simplify
   (bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
--- gcc/testsuite/gcc.dg/pr104644.c.jj  2022-02-23 10:29:50.704341688 +0100
+++ gcc/testsuite/gcc.dg/pr104644.c 2022-02-23 10:29:50.704341688 +0100
@@ -0,0 +1,9 @@
+/* PR tree-optimization/104644 */
+/* { dg-do compile } */
+/* { dg-options "-Wno-overflow" } */
+
+int
+foo (void)
+{
+  return __builtin_bswap16 (1.31072e+5f) != (signed char) 1.31072e+5f;
+}


Jakub



[PATCH] [i386] Fix typo in v1ti3.

2022-02-23 Thread liuhongt via Gcc-patches
For the EVEX encoding of vp{xor,or,and}, a suffix is needed.

Otherwise there would be an error for
vpxor %ymm0, %ymm31, %ymm1

Error: unsupported instruction `vpxor'

Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
Pushed to trunk.

gcc/ChangeLog:

* config/i386/sse.md (v1ti3): Add suffix and replace
isa attr of alternative 2 from avx to avx512vl.
---
 gcc/config/i386/sse.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b2f56345c65..3066ea3734a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17025,8 +17025,8 @@ (define_insn "v1ti3"
   "@
p\t{%2, %0|%0, %2}
vp\t{%2, %1, %0|%0, %1, %2}
-   vp\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx,avx")
+   vpd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx,avx512vl")
(set_attr "prefix" "orig,vex,evex")
(set_attr "prefix_data16" "1,*,*")
(set_attr "type" "sselog")
-- 
2.18.1



Re: [committed][nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

2022-02-23 Thread Thomas Schwinge
Hi Tom!

This is me again, following along GCC/nvptx development, and asking
questions.  ;-)

On 2022-02-19T20:07:18+0100, Tom de Vries via Gcc-patches wrote:
> With the default ptx isa 6.0, we have for uniform-simt-1.c:
> ...
> @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
> shfl.sync.idx.b32   %r26, %r26, %r32, 31, 0x;
> ...
>
> The atomic insn is predicated by -muniform-simt, and the subsequent insn does
> a warp sync, at which point the warp is uniform again.

I understand the concern here is Independent Thread Scheduling, where the
execution of predicated-off threads of a warp ('@ ! %r33') may proceed
with the next instruction, 'shfl', without implicitly waiting for the
other threads of a warp still working on the 'atom'?  Hence, the 'sync'
aspect of 'shfl.sync', as a means that PTX provides at the ISA level such
that we're getting the desired semantics: as its first step, "wait for
all threads in membermask to arrive".

> But with -mptx=3.1, we have instead:
> ...
> @%r33   atom.global.cas.b32 %r26, [a], %r28, %r29;
> shfl.idx.b32%r26, %r26, %r32, 31;
> ...
>
> The shfl does not sync the warp, and we want the warp to go back to executing
> uniformly asap.  We cannot enforce this

Is it really the case that such code may cause "permanent" warp-divergent
execution (until re-converging "somewhere")?  My understanding has been
that predicated-off threads of a warp ('@ ! %r33') would simply idle,
implicitly waiting for the other threads of a warp still working on the
'atom' -- due to the nature of a shared program counter per warp, and the
desire to re-converge as soon as possible.

For example, PTX ISA 7.2, 3.1. "A Set of SIMT Multiprocessors":

| [...]
| At every instruction issue time, the SIMT unit selects a warp that is
| ready to execute and issues the next instruction to the active threads
| of the warp. A warp executes one common instruction at a time, so full
| efficiency is realized when all threads of a warp agree on their
| execution path. If threads of a warp diverge via a data-dependent
| conditional branch, the warp serially executes each branch path taken,
| disabling threads that are not on that path, and when all paths
| complete, the threads converge back to the same execution path. [...]

So I'd have assumed that after the potentially-diverging
'@%r33'-predicated 'atom' instruction, we're implicitly re-converging for
the unpredicated 'shfl' (as long as Independent Thread Scheduling isn't
involved, which it isn't for '-mptx=3.1')?

As I'm understanding you, my understanding is not correct, and we may
thus be getting "permanent" warp-divergent execution as soon as there's
any predication/conditional involved that may evaluate differently for
individual threads of a warp, and we thus need such *explicit*
synchronization after all such instances?

> but at least check this using
> nvptx_uniform_warp_check, similar to how that is done for openacc.
>
> Likewise, detect the case that no shfl insn is emitted, and add a
> nvptx_uniform_warp_check or nvptx_warpsync.

For example, 'nvptx-none/mgomp/libatomic/cas_1_.o':

[...]
 @ %r71 atom.cas.b64 %r62,[%r35],%r29,%r61;
+{
+.reg .b32 act;
+vote.ballot.b32 act,1;
+.reg .pred uni;
+setp.eq.b32 uni,act,0x;
+@ ! uni trap;
+@ ! uni exit;
+}
 mov.b64 {%r69,%r70},%r62;
 shfl.idx.b32 %r69,%r69,%r68,31;
 shfl.idx.b32 %r70,%r70,%r68,31;
[...]

So that's basically an 'assert' that all threads of a warp are converged.
(Is the JIT maybe even able to optimize that out?)  I guess I just wonder
if that's not satisfied implicitly.


Grüße
 Thomas


> [nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt
>
> gcc/ChangeLog:
>
> 2022-02-19  Tom de Vries  
>
>   * config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return
>   type to bool.
>   (nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or
>   nvptx_warpsync, if necessary.
>
> gcc/testsuite/ChangeLog:
>
> 2022-02-19  Tom de Vries  
>
>   * gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test.
>   * gcc.target/nvptx/uniform-simt-2.c: New test.
>
> ---
>  gcc/config/nvptx/nvptx.cc   | 34 ++---
>  gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c |  1 +
>  gcc/testsuite/gcc.target/nvptx/uniform-simt-2.c | 20 +++
>  3 files changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index afbad5bdde6..4942f1100da 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -3248,12 +3248,18 @@ nvptx_call_insn_is_syscall_p (rtx_insn *insn)
>  /* If SET subexpression of INSN sets a register, emit a shuffle instruction 
> to
> propagate its value from lane MASTER to current lane.  */
>
> -static void
> +static bool
>  nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
>  {
> 

PING**2 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-23 Thread Tobias Burnus

PING**2 for the ME review or at least comments to that patch,
which fixes a build issue/ICE with nvptx (at least when bumping the
default -misa).

Patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
(for gcc/cfgexpand.cc + gcc/expr.cc)

(There is some discussion by Tom and Roger about the BE in the patch
thread, but it does not relate to the ME patch. There is no ME-patch
comment so far.)

Thanks,

Tobias

On 17.02.22 15:35, Tobias Burnus wrote:

PING for this cfgexpand.cc + expr.cc change by Roger.

This is a pre-requisite for Roger's nvptx patch to avoid an ICE during
bootstrap:

* https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590250.html
  "[PATCH] nvptx: Back-end portion of a fix for PR target/104489."
  (see patch for additional reasoning for this patch)
* See also https://gcc.gnu.org/PR104489
   nvptx, sm_53: internal compiler error: in gen_rtx_SUBREG, at
emit-rtl.cc:1022

Thanks,

Tobias

On 09.02.22 21:12, Roger Sayle wrote:

This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example is the nvptx backend, where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?) sanity
checks that (very sensibly) prevent creating paradoxical SUBREGs of
floating point registers.  The approach below is to explicitly
perform the
conversion/promotion in two steps, via an integer mode of same precision
as the floating point value.  So on nvptx, 16-bit HFmode is initially
converted to 16-bit HImode (using SUBREG), then zero-extended to SImode,
and likewise when going the other way, parameters truncated to HImode
then converted to HFmode (using SUBREG).  These changes are localized
to expand_value_return and expanding DECL_RTL to support strange ABIs,
rather than inside convert_modes or gen_lowpart, as mismatched
precision integer/FP conversions should be explicit in the RTL,
and these semantics not generally visible/implicit in user code.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and on nvptx-none, where it is
the middle-end portion of a pair of patches to allow the default ISA to
be advanced.  Ok for mainline?
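The two-step dance above can be sketched in plain C, with a uint16_t bit pattern standing in for an HFmode value (real _Float16 support is target-dependent, and the function names here are invented for illustration):

```c
#include <stdint.h>

/* Step 1 mirrors the HFmode->HImode SUBREG (a pure bit
   reinterpretation); step 2 is the HImode->SImode zero-extension
   used to pass/return the value in a 32-bit integer register.  */
uint32_t
promote_hf_to_si (uint16_t hf_bits)
{
  uint16_t hi = hf_bits;   /* SUBREG: same bits, integer mode */
  return (uint32_t) hi;    /* zero-extend HImode -> SImode    */
}

/* The reverse direction: truncate SImode -> HImode; the HImode bits
   are then reinterpreted as HFmode via SUBREG.  */
uint16_t
truncate_si_to_hf (uint32_t si)
{
  return (uint16_t) si;
}
```

For example, 0x3C00 (the IEEE binary16 pattern of 1.0) survives the round trip unchanged, while any high SImode bits are discarded on the way back.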


2022-02-09  Roger Sayle  

gcc/ChangeLog
* cfgexpand.cc (expand_value_return): Allow backends to promote
a scalar floating point return value to a wider integer mode.
* expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
backends to promote scalar FP PARM_DECLs to wider integer modes.


Thanks in advance,
Roger
--




[PATCH] match.pd: Avoid infinite recursion during bswap (int_cst_ovf1) cmp int_cst_ovf2 [PR104644]

2022-02-23 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch avoids infinite recursion during generic folding.
The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
(bswap @1) actually being simplified, if it is not simplified, we just
move the bswap from one operand to the other and if @0 is also INTEGER_CST,
we apply the same rule next.

The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
such cases:
static inline bool
integer_cst_p (tree t)
{
  return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
}
The patch uses ! modifier to ensure the bswap is simplified, but because
! is only supported in gimple-match, guards it also with #if GIMPLE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-22  Jakub Jelinek  

PR tree-optimization/104644
* match.pd (cmp (bswap @0) INTEGER_CST@1): Restrict optimization to
GIMPLE only and use ! modifier on bswap.

* gcc.dg/pr104644.c: New test.

--- gcc/match.pd.jj 2022-02-18 12:38:06.075393091 +0100
+++ gcc/match.pd	2022-02-22 20:22:02.222022022 +0100
@@ -3959,10 +3959,13 @@ (define_operator_list SYNC_FETCH_AND_AND
(cmp (bswap@2 @0) (bswap @1))
(with { tree ctype = TREE_TYPE (@2); }
 (cmp (convert:ctype @0) (convert:ctype @1
+#if GIMPLE
   (simplify
(cmp (bswap @0) INTEGER_CST@1)
(with { tree ctype = TREE_TYPE (@1); }
-(cmp (convert:ctype @0) (bswap @1)
+(cmp (convert:ctype @0) (bswap! @1
+#endif
+ )
  /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2.  */
  (simplify
   (bit_and (convert1? (rshift@0 (convert2? (bswap@4 @1)) INTEGER_CST@2))
--- gcc/testsuite/gcc.dg/pr104644.c.jj  2022-02-22 20:02:32.020408468 +0100
+++ gcc/testsuite/gcc.dg/pr104644.c 2022-02-22 20:02:04.609785996 +0100
@@ -0,0 +1,9 @@
+/* PR tree-optimization/104644 */
+/* { dg-do compile } */
+/* { dg-options "-Wno-overflow" } */
+
+int
+foo (void)
+{
+  return __builtin_bswap16 (1.31072e+5f) != (signed char) 1.31072e+5f;
+}

Jakub



[PATCH] warn-recursion: Don't warn for __builtin_calls in gnu_inline extern inline functions [PR104633]

2022-02-23 Thread Jakub Jelinek via Gcc-patches
Hi!

The first two testcases show different ways in which e.g. the glibc
_FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c
the new -Winfinite-recursion warning emits a false positive.

It is a false positive because, when a builtin with two names is called
through the __builtin_ name (not all builtins have a name prefixed
exactly like that) from an extern inline function with gnu_inline
semantics, the compiler will never attempt to use the user inline
wrapper for the call; the __builtin_ just does what the builtin function
is expected to do and either expands into some compiler generated code
or, if the compiler decides to emit a call, uses an actual definition
of the function; but that is not the extern inline gnu_inline function,
which is never emitted out of line.
Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline
wrapper calls the builtin by the same name as the function's name, and in
that case it is infinite recursion: we actually try to inline the recursive
call and also error because the recursion is infinite during inlining.
Without always_inline we wouldn't error, but it is still infinite recursion;
the user has no control over how many recursive calls we actually inline.
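To make the distinction concrete, here is a compilable variant of the benign pattern (mirroring the first testcase; a sketch, not taken verbatim from glibc):

```c
#include <stddef.h>

int memcmp (const void *, const void *, size_t);

/* Benign pattern: forwarding to __builtin_memcmp is not recursion.
   The builtin either expands inline or calls the out-of-line libc
   memcmp; it never calls back into this gnu_inline wrapper, which is
   never emitted out of line.  */
extern inline __attribute__((always_inline, gnu_inline)) int
memcmp (const void *p, const void *q, size_t size)
{
  return __builtin_memcmp (p, q, size);
}

/* By contrast, a wrapper whose body called memcmp (its own name)
   would genuinely recurse forever once inlined.  */
```

Calls to memcmp through this wrapper behave exactly like the libc function, which is why warning on it would be a false positive.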

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-22  Jakub Jelinek  

PR c/104633
* gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit):
Don't warn about calls to corresponding builtin from extern inline
gnu_inline wrappers.

* gcc.dg/Winfinite-recursion-3.c: New test.
* gcc.dg/Winfinite-recursion-4.c: New test.
* gcc.dg/Winfinite-recursion-5.c: New test.

--- gcc/gimple-warn-recursion.cc.jj 2022-01-18 11:58:59.619981528 +0100
+++ gcc/gimple-warn-recursion.cc	2022-02-22 13:19:43.592644576 +0100
@@ -112,13 +112,25 @@ pass_warn_recursion::find_function_exit
  if (!strcmp (name, "siglongjmp"))
return true;
 
- if (m_built_in && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
+ if (m_built_in
+ && gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
  && m_built_in == DECL_FUNCTION_CODE (fndecl))
{
- /* The call is being made from the definition of a built-in
-(e.g., in a replacement of one) to itself.  */
- m_calls->safe_push (stmt);
- return false;
+ const char *cname
+   = IDENTIFIER_POINTER (DECL_NAME (current_function_decl));
+ /* Don't warn about gnu_inline extern inline function
+like strcpy calling __builtin_strcpy, that is fine,
+if some call is made (the builtin isn't expanded inline),
+a call is made to the external definition.  */
+ if (!(DECL_DECLARED_INLINE_P (current_function_decl)
+   && DECL_EXTERNAL (current_function_decl))
+ || strcmp (name, cname) == 0)
+   {
+ /* The call is being made from the definition of a built-in
+(e.g., in a replacement of one) to itself.  */
+ m_calls->safe_push (stmt);
+ return false;
+   }
}
}
 
--- gcc/testsuite/gcc.dg/Winfinite-recursion-3.c.jj	2022-02-22 13:28:10.345579876 +0100
+++ gcc/testsuite/gcc.dg/Winfinite-recursion-3.c	2022-02-22 13:25:16.760999396 +0100
@@ -0,0 +1,18 @@
+/* PR c/104633 */
+/* { dg-do compile } */
+/* { dg-options "-Winfinite-recursion" } */
+
+typedef __SIZE_TYPE__ size_t;
+int memcmp (const void *, const void *, size_t);
+
+extern inline __attribute__((always_inline, gnu_inline)) int
+memcmp (const void *p, const void *q, size_t size) /* { dg-bogus "infinite recursion detected" } */
+{
+  return __builtin_memcmp (p, q, size);	/* { dg-bogus "recursive call" } */
+}
+
+int
+foo (const void *p, const void *q, size_t size)
+{
+  return memcmp (p, q, size);
+}
--- gcc/testsuite/gcc.dg/Winfinite-recursion-4.c.jj	2022-02-22 13:28:13.604534458 +0100
+++ gcc/testsuite/gcc.dg/Winfinite-recursion-4.c	2022-02-22 13:25:22.552918640 +0100
@@ -0,0 +1,19 @@
+/* PR c/104633 */
+/* { dg-do compile } */
+/* { dg-options "-Winfinite-recursion" } */
+
+typedef __SIZE_TYPE__ size_t;
+int memcmp (const void *, const void *, size_t);
+__typeof (memcmp) __memcmp_alias __asm ("memcmp");
+
+extern inline __attribute__((always_inline, gnu_inline)) int
+memcmp (const void *p, const void *q, size_t size) /* { dg-bogus "infinite recursion detected" } */
+{
+  return __memcmp_alias (p, q, size);	/* { dg-bogus "recursive call" } */
+}
+
+int
+foo (const void *p, const void *q, size_t size)
+{
+  return memcmp (p, q, size);
+}
--- gcc/testsuite/gcc.dg/Winfinite-recursion-5.c.jj	2022-02-22 13:28:16.690491449 +0100
+++ gcc/testsuite/gcc.dg/Winfinite-recursion-5.c	2022-02-22