[r14-2050 Regression] FAIL: gfortran.dg/value_9.f90 -Os execution test on Linux/x86_64

2023-06-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

d130ae8499e0c615e1636258d6901372316dfd93 is the first bad commit
commit d130ae8499e0c615e1636258d6901372316dfd93
Author: Harald Anlauf 
Date:   Thu Jun 22 22:07:41 2023 +0200

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

caused

FAIL: gfortran.dg/value_9.f90   -O1  execution test
FAIL: gfortran.dg/value_9.f90   -O2  execution test
FAIL: gfortran.dg/value_9.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/value_9.f90   -O3 -g  execution test
FAIL: gfortran.dg/value_9.f90   -Os  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2050/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/value_9.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/value_9.f90 --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email.  For questions about this report, contact
me at haochen dot jiang at intel.com)


Re: [PATCH] Introduce hardbool attribute for C

2023-06-23 Thread Alexandre Oliva via Gcc-patches
On Jun 16, 2023, Alexandre Oliva  wrote:

> On Aug  9, 2022, Alexandre Oliva  wrote:
>> Ping?

> Ping?  Refreshed, added setting of ENUM_UNDERLYING_TYPE, retested on
> x86_64-linux-gnu (also on gcc-13).

Here's a consolidated patch incorporating the doc and test patchlets
sent out in response to Qing's and Bernhard's suggestions.

Regstrapped on x86_64-linux-gnu.  Ok to install?


Introduce hardbool attribute for C

This patch introduces hardened booleans in C.  The hardbool attribute,
when attached to an integral type, turns it into an enumerated type
with boolean semantics, using the named or implied constants as
representations for false and true.

Expressions of such types decay to _Bool, trapping if the value is
neither true nor false, and _Bool can convert implicitly back to them.
Other conversions go through _Bool first.
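
To make the decay rule above concrete, here is a minimal sketch in portable C
that emulates it by hand, without using the attribute itself.  The 0x5a/0xa5
representations and the names HB_FALSE, HB_TRUE, hb_is_valid, hb_decay and
hb_from_bool are assumptions chosen for illustration, and abort () merely
stands in for the trap; none of this is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical representations, chosen for illustration only.  */
enum { HB_FALSE = 0x5a, HB_TRUE = 0xa5 };

/* Return true if RAW is one of the two valid representations.  */
static bool
hb_is_valid (unsigned char raw)
{
  return raw == HB_FALSE || raw == HB_TRUE;
}

/* Emulate the decay to _Bool: abort (standing in for a trap) on any
   representation that is neither false nor true.  */
static bool
hb_decay (unsigned char raw)
{
  if (!hb_is_valid (raw))
    abort ();
  return raw == HB_TRUE;
}

/* Emulate the implicit conversion from _Bool back to the hardened type.  */
static unsigned char
hb_from_bool (bool b)
{
  return b ? HB_TRUE : HB_FALSE;
}

/* Small self-check exercising the round trip.  */
static void
hb_self_check (void)
{
  assert (hb_decay (hb_from_bool (true)));
  assert (!hb_decay (hb_from_bool (false)));
  assert (!hb_is_valid (0x00));
}
```

Calling hb_self_check () from a main function exercises the round trip; with
these assumed representations, a zeroed byte is not a valid value.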


for  gcc/c-family/ChangeLog

* c-attribs.cc (c_common_attribute_table): Add hardbool.
(handle_hardbool_attribute): New.
(type_valid_for_vector_size): Reject hardbool.
* c-common.cc (convert_and_check): Skip warnings for convert
and check for hardbool.
(c_hardbool_type_attr_1): New.
* c-common.h (c_hardbool_type_attr): New.

for  gcc/c/ChangeLog

* c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools.
* c-convert.cc (convert): Convert to hardbool through
truthvalue.
* c-decl.cc (check_bitfield_type_and_width): Skip enumeral
truncation warnings for hardbool.
(finish_struct): Propagate hardbool attribute to bitfield
types.
(digest_init): Convert to hardbool.

for  gcc/ChangeLog

* doc/extend.texi (hardbool): New type attribute.
* doc/invoke.texi (-ftrivial-auto-var-init): Document
representation vs values.

for  gcc/testsuite/ChangeLog

* gcc.dg/hardbool-err.c: New.
* gcc.dg/hardbool-trap.c: New.
* gcc.dg/hardbool.c: New.
* gcc.dg/hardbool-s.c: New.
* gcc.dg/hardbool-us.c: New.
* gcc.dg/hardbool-i.c: New.
* gcc.dg/hardbool-ul.c: New.
* gcc.dg/hardbool-ll.c: New.
* gcc.dg/hardbool-5a.c: New.
* gcc.dg/hardbool-s-5a.c: New.
* gcc.dg/hardbool-us-5a.c: New.
* gcc.dg/hardbool-i-5a.c: New.
* gcc.dg/hardbool-ul-5a.c: New.
* gcc.dg/hardbool-ll-5a.c: New.
---
 gcc/c-family/c-attribs.cc                     |   98 -
 gcc/c-family/c-common.cc                      |   21
 gcc/c-family/c-common.h                       |   18
 gcc/c/c-convert.cc                            |   14 +++
 gcc/c/c-decl.cc                               |   10 ++
 gcc/c/c-typeck.cc                             |   31 ++-
 gcc/doc/extend.texi                           |   64 ++
 gcc/doc/invoke.texi                           |   21
 gcc/testsuite/gcc.dg/hardbool-err.c           |   31 +++
 gcc/testsuite/gcc.dg/hardbool-trap.c          |   13 +++
 gcc/testsuite/gcc.dg/torture/hardbool-5a.c    |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c  |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i.c     |    5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll.c    |    5 +
 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c  |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-s.c     |    5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul.c    |    5 +
 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c |    6 +
 gcc/testsuite/gcc.dg/torture/hardbool-us.c    |    5 +
 gcc/testsuite/gcc.dg/torture/hardbool.c       |  118 +
 22 files changed, 496 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-err.c
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-trap.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index c12211cb4d499..365319e642b1a 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -176,6 +176,7 @@ static tree handle_objc_root_class_attribute (tree *, tree, 
tree, int, bool *);
 static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool 
*);
 static tree 

[PATCH v6] tree-ssa-sink: Improve code sinking pass

2023-06-23 Thread Ajit Agarwal via Gcc-patches
Hello All:

This patch improves the code sinking pass so that it sinks statements before
calls, which reduces register pressure.
Review comments are incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d + e + f;
      bar();
      j = l;
    }
}

Bootstrapped regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-06-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 ++
 gcc/tree-ssa-sink.cc                        | 68 ++---
 3 files changed, 92 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..791d44249f9 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -171,9 +171,28 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return TRUE if immediate uses of the defs in
+   STMT occur in the same block as STMT, FALSE otherwise.  */
+
+static bool
+def_use_same_block (gimple *stmt)
+{
+  def_operand_p def;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (DEF_FROM_PTR (def));
+  if ((gimple_bb (def_stmt) == gimple_bb (stmt)))
+   return true;
+ }
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements. The best basic block should be an immediate dominator of
+   best basic block if the use stmt is after the call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -190,7 +209,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
@@ -237,7 +257,37 @@ select_best_block (basic_block early_bb,
   /* If result of comparsion is unknown, prefer EARLY_BB.
 Thus use !(...>=..) rather than (...<...)  */
   && !(best_bb->count * 100 >= early_bb->count * threshold))
-return best_bb;
+{
+  basic_block new_best_bb = get_immediate_dominator (CDI_DOMINATORS, 
best_bb);
+  /* Return best_bb if def and use are in same block otherwise new_best_bb.
+
+Things to consider:
+
+  new_best_bb is not equal to best_bb 

Re: [PATCH] Introduce hardbool attribute for C

2023-06-23 Thread Alexandre Oliva via Gcc-patches
On Jun 22, 2023, Bernhard Reutner-Fischer  wrote:

> On Wed, 21 Jun 2023 22:08:55 -0300
> Alexandre Oliva  wrote:

>> Thanks for the test.
>> 
>> Did you mean for me to incorporate it into the patch, or do you mean to
>> contribute it separately, if the feature happens to be accepted?

> These were your tests that i quoted

Aaah.  I didn't look too closely, I just assumed you'd tweaked something
in there.

>> > I don't see explicit tests with _Complex nor __complex__. Would we
>> > want to check these here, or are they handled through the "underlying"
>> > tests above?  
>> 
>> Good question.  The notion of using complex types to hold booleans
>> hadn't even crossed my mind.

> Maybe it is not real, it just sparkled through somehow.

Is it imaginary, then? :-D

>> On the other, there doesn't seem to be any useful case for them.  Can
>> anyone think of one?

> We could either not reject other such uses and wait or we could reject
> them and equally wait for complaints. I would not dare to bet who pops
> up first, fuzzers or users

I bet on fuzzers :-)

> it was just a thought (i mentioned tinfoil hat, did i ;).

indeed ;-)

Having verified that it gets rejected (phew :-) I'm inclined to add it
to the test you quoted and be done with it.  If a reason ever comes up
to support it, the test can be adjusted accordingly.

>> 
>> > I'd welcome a fortran interop note in the docs  
>> 
>> Is there any good place for such interop notes?  I'm not sure I'm
>> best-suited to write them up, since Fortran is not a language I'm
>> very familiar with, but I suppose I could give it a try.

> I'd append to your extend.texi hunk below the para about uninitialized a
> note to the effect of:
> Note: Types annotated with this attribute may not be Fortran
> interoperable.

I'm not comfortable single-casing Fortran like that.  I expect other
languages could face similar interop issues when relying on
single-language extensions.  How about:

  Since this is a language extension only available in C, interoperation
  with other languages may pose difficulties.  It should interoperate
  with Ada Booleans defined with the same size and equivalent
  representation clauses, and with enumerations or other languages'
  integral types that correspond to C's chosen integral type.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Introduce hardbool attribute for C

2023-06-23 Thread Alexandre Oliva via Gcc-patches
On Jun 23, 2023, Qing Zhao via Gcc-patches  wrote:

> -ftrivial-auto-var-init has been in GCC since GCC12.  

*nod*.  IIRC I started designing hardbool in GCC10, but the first
complete implementation was for GCC11.

> The decision to use 0x00 for zero-initiation, 0xfe for pattern-initiation
> has been discussed thoroughly during the design phase. -:)

*nod*, and it's a good one

> Since this hardbool attribute will also be a security feature, users that
> seek security features of GCC will likely ask the same question for these two
> features.

> So, their interaction is better to be at least documented. That’s my main
> point.

*nod*.  At first, I thought the ideal place to clarify the issue was in
the documentation for the option, because there's nothing exceptional
about the option's behavior when it comes to hardbool specifically.  But
it doesn't hurt to mention it in both places, so I did.  How about the
incremental patchlet below (at the end)?

> For normal Boolean variables, 0x00 is false, this is a reasonable init
> value with zero-initialization.

*nod*.  I was surprised by zero initialization of (non-hardened)
booleans even when pattern is requested, but not consistently
(e.g. boolean fields of a larger struct would still get
pattern-initialized IIUC).  I'd have expected pattern would translate to
nonzero and thus to true, rather than false.

> For hardbool variables, what 0x00 represents if it’s not false or true
> value?

It depends on how hardbool is parameterized.  One may pick 0x00 or 0xFE
as the representations for true or false, or neither, in which case the
trivial initializer will end up as a trapping value.
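
As a rough illustration of that answer, the following sketch classifies a
trivial-initializer byte against a given hardbool parameterization.  The
0x00 and 0xfe bytes are the ones mentioned in this thread; the enum, the
function name hb_classify and the sample parameterizations are hypothetical
and only model the behavior described here, not the compiler's code.

```c
#include <assert.h>

/* Possible outcomes when a hardbool variable holds INIT_BYTE.  */
enum hb_class { HB_IS_FALSE, HB_IS_TRUE, HB_TRAPS };

/* Classify INIT_BYTE for a hardbool type whose false and true values are
   represented by FALSE_REP and TRUE_REP (illustrative model only).  */
static enum hb_class
hb_classify (unsigned char init_byte, unsigned char false_rep,
	     unsigned char true_rep)
{
  if (init_byte == false_rep)
    return HB_IS_FALSE;
  if (init_byte == true_rep)
    return HB_IS_TRUE;
  return HB_TRAPS;
}

/* Self-check covering the two cases discussed in the thread.  */
static void
hb_classify_self_check (void)
{
  /* Neither pattern is a chosen representation: both would trap.  */
  assert (hb_classify (0x00, 0x5a, 0xa5) == HB_TRAPS);
  assert (hb_classify (0xfe, 0x5a, 0xa5) == HB_TRAPS);
  /* Both patterns picked as representations: both are valid values.  */
  assert (hb_classify (0x00, 0x00, 0xfe) == HB_IS_FALSE);
  assert (hb_classify (0xfe, 0x00, 0xfe) == HB_IS_TRUE);
}
```

The point of the sketch is only that whether 0x00 or 0xfe is a usable value
is a property of the chosen representations, not of the option.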

>> I'd probably have arranged for the front-end to create the initializer
>> value, because expansion time is too late to figure it out: we may not
>> even have the front-end at hand any more, in case of lto compilation.

> Is the hardbool attribute information available during the rtl expansion 
> phase?

It is in the sense that the attribute lives on, but c_hardbool_type_attr
is a frontend function, it cannot be called from e.g. lto1.

The hardbool attribute is also implemented in Ada, but there it only
affects validity checking in the front end: Boolean types in Ada are
Enumeration types, and there is standard syntax to specify the
representations for true and false.  AFAICT, once we translate GNAT IR
to GNU IR, hardened booleans would not be recognizable as boolean types.
Even non-hardened booleans with representation clauses would.  So
handling these differently from other enumeration types, to make them
closer to booleans, would be a bit of a challenge, and a
backwards-compatibility issue (because such booleans have already been
handled in the present way since the introduction of -ftrivial-* back in
GCC12).

>> Now, I acknowledge that the decision to make implicit
>> zero-initialization of boolean types set them to the value for false,
>> rather than to all-bits-zero representation, is a departure from common
>> practice of zero-initialization yielding logical zero.

> Don’t quite understand the above, for normal boolean variables,

Sorry, I meant hardened boolean types.  This was WRT the design
decision that led to this bit in the documentation:

typedef char __attribute__ ((__hardbool__ (0x5a))) hbool;
[...]
static hbool zeroinit; /* False, stored as (char)0x5a.  */
auto hbool uninit; /* Undefined, may trap.  */
  
> And this is a very reasonable initial value for Boolean variables,

Agreed.  The all-zeros bit pattern is not so great for booleans that use
alternate representations, though, such as the following standard Ada:

   type MyBool is new Boolean;
   for MyBool use (16#5a#, 16#a5#);
   for MyBool'Size use 8;

or for biased variables such as:

  X : Integer range 254 .. 507;
  for X'Size use 8; -- bits, so a biased representation is required.

Just to make things more interesting, I chose a range for X that causes
the compiler to represent 0xfe as 0x00 in the byte that holds X, but
that places the 0xfe pattern just out of the range :-) So with
-ftrivial-auto-var-init=zero, X = 254, whereas with
-ftrivial-auto-var-init=pattern, it fails validity checking, and might
come out as 508 if that's disabled.
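
The arithmetic behind the biased example can be checked with a small C
sketch.  It assumes the stored byte is simply the value minus the lower bound
of the range, which is how the example above reads; the names X_FIRST,
X_LAST, biased_decode and biased_in_range are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Bounds of the range used for X in the Ada example.  */
enum { X_FIRST = 254, X_LAST = 507 };

/* Decode a stored byte under the assumed bias: value = first + stored.  */
static int
biased_decode (unsigned char stored)
{
  return X_FIRST + stored;
}

/* Return true if the decoded value lies within the declared range.  */
static bool
biased_in_range (unsigned char stored)
{
  int value = biased_decode (stored);
  return value >= X_FIRST && value <= X_LAST;
}

/* Self-check for the zero and pattern initializers.  */
static void
biased_self_check (void)
{
  assert (biased_decode (0x00) == 254);	/* zero init: X = 254 */
  assert (biased_in_range (0x00));
  assert (biased_decode (0xfe) == 508);	/* pattern init: one past 507 */
  assert (!biased_in_range (0xfe));
  assert (biased_in_range (0xfd));	/* 507, the last valid value */
}
```

So the zero pattern decodes to the lowest valid value, while the 0xfe pattern
lands exactly one past the upper bound.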

> From my understanding, only with the introduction of “hardbool”
> attribute, all-bits-zero will not be equal to the
> logical false anymore. 

Ada booleans have long allowed nonzero representations for false.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 772209c1793e8..ae7867bb35696 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8774,6 +8774,12 @@ on the bits held in the storage (re)used for the 
variable, if any, and
 on optimizations the compiler may perform on the grounds that using
 uninitialized values invokes undefined behavior.
 
+Users of @option{-ftrivial-auto-var-init} should be aware that the bit
+patterns used as its trivial initializers are @emph{not} converted to
+@code{hardbool} types, so using variables 

Re: [PATCH v3] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-06-23 Thread Fangrui Song via Gcc-patches
On Tue, Jun 13, 2023 at 2:49 PM Fangrui Song  wrote:

> On Mon, Jun 12, 2023 at 11:16 PM Jan Beulich  wrote:
>
>> On 13.06.2023 05:28, Fangrui Song wrote:
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/i386/large-data.c
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-require-effective-target lp64 } */
>> > +/* { dg-options "-O2 -mcmodel=large -mlarge-data-threshold=4" } */
>> > +/* { dg-final { scan-assembler ".lbss" } } */
>> > +/* { dg-final { scan-assembler ".bss" } } */
>> > +/* { dg-final { scan-assembler ".ldata" } } */
>> > +/* { dg-final { scan-assembler ".data" } } */
>> > +/* { dg-final { scan-assembler ".lrodata" } } */
>> > +/* { dg-final { scan-assembler ".rodata" } } */
>>
>> Aren't these regex-es, and hence the dots all need escaping or enclosing
>> in square brackets?
>>
>> Jan
>>
>
> Good catch! I am not familiar with dg-* directives... I can send a v4, but
> I'd like to know whether there are other comments.
> (I don't have git write permission for gcc.)
>
>
> --
> 宋方睿
>

Ping. Do people have other suggestions?
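
For reference, the regex concern quoted above can be reproduced outside the
testsuite.  The sketch below uses POSIX regcomp/regexec rather than the Tcl
regular expressions that the dg-* directives use, on the assumption that an
unescaped dot means "any character" in both; regex_matches and the sample
assembler line are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (a POSIX extended regex) matches anywhere in
   TEXT; return false on no match or if PATTERN fails to compile.  */
static bool
regex_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* Self-check: an unescaped dot lets ".bss" match inside ".lbss".  */
static void
regex_self_check (void)
{
  const char *lbss_only = "\t.section\t.lbss";
  assert (regex_matches (".bss", lbss_only));	/* false positive */
  assert (!regex_matches ("\\.bss", lbss_only));	/* escaped dot */
  assert (!regex_matches ("[.]bss", lbss_only));	/* bracketed dot */
  assert (regex_matches ("\\.bss", "\t.section\t.bss"));
}
```

Under that assumption, a scan for ".bss" would pass even if only .lbss were
emitted, which is why escaping or bracketing the dot matters.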


-- 
宋方睿


Re: [PATCH] RISC-V: Refactor the integer ternary autovec pattern

2023-06-23 Thread 钟居哲
Is this patch OK for trunk?
All tests passed.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-06-22 06:38
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH] RISC-V: Refactor the integer ternary autovec pattern
A long time ago, I encountered an ICE when trying to set the clobber register
as Pmode, and I have forgotten the reason.

So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload,
which makes the patterns look unreasonable.

Following Jeff's comments, I tried again: setting the clobber register as
Pmode now works, and the patterns look more reasonable.

All tests pass.  OK for trunk?
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (*fma): Set clobber to Pmode in expand stage.
(*fma): Ditto.
(*fnma): Ditto.
(*fnma): Ditto.
 
---
gcc/config/riscv/autovec.md | 54 +++--
1 file changed, 28 insertions(+), 26 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cf154b3737a..731ffe8ff89 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -596,40 +596,41 @@
;;result after reload_completed.
(define_expand "fma4"
   [(parallel
-[(set (match_operand:VI 0 "register_operand" "=vr")
+[(set (match_operand:VI 0 "register_operand")
  (plus:VI
(mult:VI
-   (match_operand:VI 1 "register_operand" " vr")
-   (match_operand:VI 2 "register_operand" " vr"))
- (match_operand:VI 3 "register_operand"   " vr")))
- (clobber (match_scratch:SI 4))])]
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "register_operand"))
+ (match_operand:VI 3 "register_operand")))
+ (clobber (match_dup 4))])]
   "TARGET_VECTOR"
-  {})
+  {
+operands[4] = gen_reg_rtx (Pmode);
+  })
-(define_insn_and_split "*fma"
+(define_insn_and_split "*fma"
   [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?")
(plus:VI
  (mult:VI
(match_operand:VI 1 "register_operand" " %0, vr,   vr")
(match_operand:VI 2 "register_operand" " vr, vr,   vr"))
  (match_operand:VI 3 "register_operand"   " vr,  0,   vr")))
-   (clobber (match_scratch:SI 4 "=r,r,r"))]
+   (clobber (match_operand:P 4 "register_operand" "=r,r,r"))]
   "TARGET_VECTOR"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-PUT_MODE (operands[4], Pmode);
-riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
 if (which_alternative == 2)
   emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
-riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (mode),
-   riscv_vector::RVV_TERNOP, ops, operands[4]);
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
+riscv_vector::RVV_TERNOP, ops, operands[4]);
 DONE;
   }
   [(set_attr "type" "vimuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")])
;; -
;;  [INT] VNMSAC and VNMSUB
@@ -641,40 +642,41 @@
(define_expand "fnma4"
   [(parallel
-[(set (match_operand:VI 0 "register_operand" "=vr")
+[(set (match_operand:VI 0 "register_operand")
(minus:VI
- (match_operand:VI 3 "register_operand"   " vr")
+ (match_operand:VI 3 "register_operand")
  (mult:VI
-   (match_operand:VI 1 "register_operand" " vr")
-   (match_operand:VI 2 "register_operand" " vr"
- (clobber (match_scratch:SI 4))])]
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "register_operand"
+ (clobber (match_dup 4))])]
   "TARGET_VECTOR"
-  {})
+  {
+operands[4] = gen_reg_rtx (Pmode);
+  })
-(define_insn_and_split "*fnma"
+(define_insn_and_split "*fnma"
   [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?")
  (minus:VI
(match_operand:VI 3 "register_operand"   " vr,  0,   vr")
(mult:VI
  (match_operand:VI 1 "register_operand" " %0, vr,   vr")
  (match_operand:VI 2 "register_operand" " vr, vr,   vr"
-   (clobber (match_scratch:SI 4 "=r,r,r"))]
+   (clobber (match_operand:P 4 "register_operand" "=r,r,r"))]
   "TARGET_VECTOR"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
-PUT_MODE (operands[4], Pmode);
-riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
 if (which_alternative == 2)
   emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
-riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
-riscv_vector::RVV_TERNOP, ops, operands[4]);
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
+   riscv_vector::RVV_TERNOP, ops, operands[4]);
 DONE;
   }
   [(set_attr "type" "vimuladd")
-   (set_attr "mode" "")])
+   

Re: [PATCH V2] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 17:41, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

Hi, Jeff. I fixed the formatting as you suggested.
OK for trunk?

gcc/ChangeLog:

 * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply 
LEN_MASK_{LOAD,STORE}.

Yes.  Sorry I wasn't explicit that it was OK with the formatting change.

Jeff


[pushed: v2] text-art: remove explicit #include of C++ standard library headers

2023-06-23 Thread David Malcolm via Gcc-patches
On Fri, 2023-06-23 at 16:35 +0100, Alex Coplan wrote:

> Thanks for the fix! I can confirm this fixes bootstrap on
> x86_64-apple-darwin for me.

Thanks Alex

Unfortunately the patch I posted broke the build of the plugin in
the testsuite, so I had to tweak things slightly.

Here's v2 of the patch, which fixes the plugin, and also adds a check to
text-art/types.h to ensure that INCLUDE_VECTOR was defined.
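
As a rough sketch of the kind of check meant here, a header can refuse to
compile unless the including file has already defined the opt-in macro.  The
error text and the helper include_vector_requested are assumptions made for
this sketch, not the actual contents of types.h.

```c
#include <assert.h>

/* A client source file opts in before pulling in the common headers.  */
#define INCLUDE_VECTOR

/* Sketch of a guard a header could use: fail the build early if the
   including file forgot to request the vector header.  */
#ifndef INCLUDE_VECTOR
# error "INCLUDE_VECTOR must be defined before including this header"
#endif

/* Reached only when the guard above passed at compile time.  */
static int
include_vector_requested (void)
{
  return 1;
}
```

Removing the #define in this sketch turns the omission into an immediate
build error at the guard instead of a later failure elsewhere.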

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
I've pushed it to trunk as r14-2059-gb2e075a594e93a.

Hopefully this version also fixes the build for you; let me know if
this is still causing problems.

Sorry again for the breakage
Dave


gcc/analyzer/ChangeLog:
* access-diagram.cc: Add #define INCLUDE_VECTOR.
* bounds-checking.cc: Likewise.

gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add #define INCLUDE_VECTOR.
* diagnostic.cc: Likewise.
* text-art/box-drawing.cc: Likewise.
* text-art/canvas.cc: Likewise.
* text-art/ruler.cc: Likewise.
* text-art/selftests.cc: Likewise.
* text-art/selftests.h (text_art::canvas): New forward decl.
* text-art/style.cc: Add #define INCLUDE_VECTOR.
* text-art/styled-string.cc: Likewise.
* text-art/table.cc: Likewise.
* text-art/table.h: Remove #include .
* text-art/theme.cc: Add #define INCLUDE_VECTOR.
* text-art/types.h: Check that INCLUDE_VECTOR is defined.
Remove #include of  and .
* text-art/widget.cc: Add #define INCLUDE_VECTOR.
* text-art/widget.h: Remove #include .

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: Add
#define INCLUDE_VECTOR.
---
 gcc/analyzer/access-diagram.cc                      |  1 +
 gcc/analyzer/bounds-checking.cc                     |  1 +
 gcc/diagnostic-format-sarif.cc                      |  1 +
 gcc/diagnostic.cc                                   |  1 +
 .../gcc.dg/plugin/diagnostic_plugin_test_text_art.c |  1 +
 gcc/text-art/box-drawing.cc                         |  1 +
 gcc/text-art/canvas.cc                              |  1 +
 gcc/text-art/ruler.cc                               |  1 +
 gcc/text-art/selftests.cc                           |  1 +
 gcc/text-art/selftests.h                            |  4 +++-
 gcc/text-art/style.cc                               |  1 +
 gcc/text-art/styled-string.cc                       |  1 +
 gcc/text-art/table.cc                               |  1 +
 gcc/text-art/table.h                                |  1 -
 gcc/text-art/theme.cc                               |  1 +
 gcc/text-art/types.h                                | 10 --
 gcc/text-art/widget.cc                              |  1 +
 gcc/text-art/widget.h                               |  1 -
 18 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index 968ff50a0b7..467c9bdd734 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #define INCLUDE_MEMORY
 #define INCLUDE_MAP
 #define INCLUDE_SET
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "make-unique.h"
diff --git a/gcc/analyzer/bounds-checking.cc b/gcc/analyzer/bounds-checking.cc
index 10632d12562..5e8de9a7aa5 100644
--- a/gcc/analyzer/bounds-checking.cc
+++ b/gcc/analyzer/bounds-checking.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "make-unique.h"
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index ac2f5b844e3..5e483988027 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "diagnostic.h"
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 7c2289f0634..c523f215bae 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
message module.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "version.h"
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
index 27c341b9f2f..58b219bb0dc 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
@@ -2,6 +2,7 @@
 
 /* This plugin exercises the text_art code.  */
 
+#define INCLUDE_VECTOR
 #include "gcc-plugin.h"
 #include "config.h"
 #include "system.h"
diff --git a/gcc/text-art/box-drawing.cc 

RE: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Li, Pan2 via Gcc-patches
Committed, as it passed both bootstrap and regression tests.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Bernhard Reutner-Fischer via Gcc-patches
Sent: Friday, June 23, 2023 5:39 PM
To: Richard Sandiford 
Cc: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org; rguent...@suse.de
Subject: Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

On 23 June 2023 10:03:45 CEST, Richard Sandiford wrote:

>> Fuse the block below into the one above as the condition seems to be 
>> identical?
>
>Yeah, true, but I think the idea is that the code above “Arguments are
>ready” is calculating argument values, and the code after it is creating
>code.  These are two separate steps, and the fact that the two final_len
>blocks end up being consecutive is something of a coincidence.
>
>So personally I think we should keep the structure in the patch.

Sure, works for me.
thanks,


[PATCH V2] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Jeff. I fixed the formatting as you suggested.
OK for trunk?

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply 
LEN_MASK_{LOAD,STORE}.
---
 gcc/tree-ssa-loop-ivopts.cc | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6671ff6db5a..dc7a29fead2 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2442,6 +2442,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
 case IFN_MASK_LOAD:
 case IFN_MASK_LOAD_LANES:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   if (op_p == gimple_call_arg_ptr (call, 0))
return TREE_TYPE (gimple_call_lhs (call));
   return NULL_TREE;
@@ -2449,9 +2450,16 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
 case IFN_MASK_STORE:
 case IFN_MASK_STORE_LANES:
 case IFN_LEN_STORE:
-  if (op_p == gimple_call_arg_ptr (call, 0))
-   return TREE_TYPE (gimple_call_arg (call, 3));
-  return NULL_TREE;
+case IFN_LEN_MASK_STORE:
+  {
+   if (op_p == gimple_call_arg_ptr (call, 0))
+ {
+   internal_fn ifn = gimple_call_internal_fn (call);
+   int index = internal_fn_stored_value_index (ifn);
+   return TREE_TYPE (gimple_call_arg (call, index));
+ }
+   return NULL_TREE;
+  }
 
 default:
   return NULL_TREE;
-- 
2.36.3




[PATCH v3] Add leafy mode for zero-call-used-regs

2023-06-23 Thread Alexandre Oliva via Gcc-patches
On Jun 23, 2023, Qing Zhao via Gcc-patches  wrote:

> It’s better to add this definition earlier in the list of the “three
> basic values”, to make it “four basic values”, like the following:

Oh, my, sorry for being so dense, I had managed to miss that bit all
this time somehow :-(

> The sentence “This value is mainly to provide users a more efficient mode to
> zero call-used registers in leaf functions.” just for your reference,
> the wording can certainly be improved.  -:)

:-)  got it, thanks.  How about this?


Add leafy mode for zero-call-used-regs

Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
nonleaf functions, respectively.
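
The selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in function.cc: the helper name only_used_p and the bit position of ONLY_USED are hypothetical, while the other flag values follow the flag-types.h hunk in this patch.

```cpp
#include <cassert>

// Flag layout loosely mirroring the zero_regs_flags namespace in the patch.
// ONLY_USED's bit position is an assumption; the others appear in the diff.
namespace zero_regs_sketch {
const unsigned int ONLY_USED = 1U << 1;   // assumed position
const unsigned int ONLY_GPR = 1U << 2;
const unsigned int ONLY_ARG = 1U << 3;
const unsigned int ENABLED = 1U << 4;
const unsigned int LEAFY_MODE = 1U << 5;

// Hypothetical helper: decide whether only the registers the function
// actually used should be zeroed.
inline bool
only_used_p (unsigned int flags, bool is_leaf)
{
  assert (flags & ENABLED);
  if (flags & ONLY_USED)
    return true;
  // In leafy mode, a leaf function behaves like 'used' and a nonleaf
  // function behaves like 'all'.
  return (flags & LEAFY_MODE) && is_leaf;
}
} // namespace zero_regs_sketch
```

With these assumptions, ENABLED | LEAFY_MODE yields "only used" for a leaf function and "all" for a nonleaf one, which matches the intent stated in the commit message.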

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* doc/extend.texi (zero-call-used-regs): Document leafy and
variants thereof.
* flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
LEAFY and variants.
* function.cc (gen_call_used_regs_seq): Set only_used for leaf
functions in leafy mode.
* opts.cc (zero_call_used_regs_opts): Add leafy and variants.

for  gcc/testsuite/ChangeLog

* c-c++-common/zero-scratch-regs-leafy-1.c: New.
* c-c++-common/zero-scratch-regs-leafy-2.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
* gcc.target/i386/zero-scratch-regs-leafy-2.c: New.
---
 gcc/doc/extend.texi|   30 ++--
 gcc/flag-types.h   |5 +++
 gcc/function.cc|3 ++
 gcc/opts.cc|4 +++
 .../c-c++-common/zero-scratch-regs-leafy-1.c   |   15 ++
 .../c-c++-common/zero-scratch-regs-leafy-2.c   |   21 ++
 .../gcc.target/i386/zero-scratch-regs-leafy-1.c|   12 
 .../gcc.target/i386/zero-scratch-regs-leafy-2.c|   16 +++
 8 files changed, 103 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-1.c
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-2.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 852f6b629bea8..739c40368f556 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4349,7 +4349,7 @@ through registers.
 In order to satisfy users with different security needs and control the
 run-time overhead at the same time, the @var{choice} parameter provides a
 flexible way to choose the subset of the call-used registers to be zeroed.
-The three basic values of @var{choice} are:
+The four basic values of @var{choice} are:
 
 @itemize @bullet
 @item
@@ -4362,10 +4362,16 @@ the function.
 
 @item
 @samp{all} zeros all call-used registers.
+
+@item
+@samp{leafy} behaves like @samp{used} in a leaf function, and like
+@samp{all} in a nonleaf function.  This makes for leaner zeroing in leaf
+functions, where the set of used registers is known, and that may be
+enough for some purposes of register zeroing.
 @end itemize
 
 In addition to these three basic choices, it is possible to modify
-@samp{used} or @samp{all} as follows:
+@samp{used}, @samp{all}, and @samp{leafy} as follows:
 
 @itemize @bullet
 @item
@@ -4412,10 +4418,28 @@ zeros all call-used registers that pass arguments.
 @item all-gpr-arg
 zeros all call-used general purpose registers that pass
 arguments.
+
+@item leafy
+Same as @samp{used} in a leaf function, and same as @samp{all} in a
+nonleaf function.
+
+@item leafy-gpr
+Same as @samp{used-gpr} in a leaf function, and same as @samp{all-gpr}
+in a nonleaf function.
+
+@item leafy-arg
+Same as @samp{used-arg} in a leaf function, and same as @samp{all-arg}
+in a nonleaf function.
+
+@item leafy-gpr-arg
+Same as @samp{used-gpr-arg} in a leaf function, and same as
+@samp{all-gpr-arg} in a nonleaf function.
+
 @end table
 
 Of this list, @samp{used-arg}, @samp{used-gpr-arg}, @samp{all-arg},
-and @samp{all-gpr-arg} are mainly used for ROP mitigation.
+@samp{all-gpr-arg}, @samp{leafy-arg}, and @samp{leafy-gpr-arg} are
+mainly used for ROP mitigation.
 
 The default for the attribute is controlled by @option{-fzero-call-used-regs}.
 @end table
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 2e650bf1c487c..0d2dab1b99dd4 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -348,6 +348,7 @@ namespace zero_regs_flags {
   const unsigned int ONLY_GPR = 1UL << 2;
   const unsigned int ONLY_ARG = 1UL << 3;
   const unsigned int ENABLED = 1UL << 4;
+  const unsigned int LEAFY_MODE = 1UL << 5;
   const unsigned int USED_GPR_ARG = ENABLED | ONLY_USED | ONLY_GPR | ONLY_ARG;
   const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
   const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
@@ -356,6 +357,10 @@ namespace zero_regs_flags {
   const unsigned int ALL_GPR = 

[r14-2047 Regression] FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]xorb on Linux/x86_64

2023-06-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

d0e891406b16dc28905717de2333f5637cf71d3e is the first bad commit
commit d0e891406b16dc28905717de2333f5637cf71d3e
Author: Roger Sayle 
Date:   Fri Jun 23 15:23:20 2023 +0100

Improved SUBREG simplifications in simplify-rtx.cc's simplify_subreg.

caused

FAIL: gcc.target/i386/pr78904-1b.c scan-assembler-not movb
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]andb
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]orb
FAIL: gcc.target/i386/pr78904-1b.c scan-assembler [ \t]xorb

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2047/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78904-1b.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78904-1b.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com)


Re: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

2023-06-23 Thread 钟居哲
I'm not sure, since I saw that MASK_STORE/LEN_STORE don't compute the size.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-24 03:20
To: juzhe.zhong; gcc-patches
CC: rguenther; richard.sandiford
Subject: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias 
analysis
 
 
On 6/23/23 07:56, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Apply 
> LEN_MASK_{LOAD,STORE}
> 
> ---
>   gcc/tree-ssa-alias.cc | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
> index e1bc04b82ba..92dc1bb9987 100644
> --- a/gcc/tree-ssa-alias.cc
> +++ b/gcc/tree-ssa-alias.cc
> @@ -2815,11 +2815,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> bool tbaa_p)
> case IFN_SCATTER_STORE:
> case IFN_MASK_SCATTER_STORE:
> case IFN_LEN_STORE:
> +  case IFN_LEN_MASK_STORE:
>   return false;
> case IFN_MASK_STORE_LANES:
>   goto process_args;
> case IFN_MASK_LOAD:
> case IFN_LEN_LOAD:
> +  case IFN_LEN_MASK_LOAD:
> case IFN_MASK_LOAD_LANES:
>   {
> ao_ref rhs_ref;
Don't you need to adjust how you compute the size for the LEN_MASK_LOAD 
case?
 
jeff
 


Re: Re: [PATCH] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

2023-06-23 Thread 钟居哲
OK, I will send V2 soon.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-24 03:14
To: juzhe.zhong; gcc-patches
CC: rguenther; richard.sandiford
Subject: Re: [PATCH] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}
 
 
On 6/23/23 08:05, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply 
> LEN_MASK_{LOAD,STORE}.
> 
> ---
>   gcc/tree-ssa-loop-ivopts.cc | 6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6671ff6db5a..2b66fe66bc7 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -2442,6 +2442,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>   case IFN_MASK_LOAD:
>   case IFN_MASK_LOAD_LANES:
>   case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
> if (op_p == gimple_call_arg_ptr (call, 0))
>   return TREE_TYPE (gimple_call_lhs (call));
> return NULL_TREE;
> @@ -2449,8 +2450,11 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
>   case IFN_MASK_STORE:
>   case IFN_MASK_STORE_LANES:
>   case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
> if (op_p == gimple_call_arg_ptr (call, 0))
> - return TREE_TYPE (gimple_call_arg (call, 3));
> + return TREE_TYPE (
> +   gimple_call_arg (call, internal_fn_stored_value_index (
> +gimple_call_internal_fn (call))));
Formatting nit.  Compute the result of internal_fn_stored_value_index 
into a temporary and pass that temporary into gimple_call_arg which 
should clean up the formatting here.
 
In general, if you find yourself indenting after an open paren like 
you've done here, compute the value into a temporary.
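
To illustrate the formatting advice, here is a minimal self-contained sketch. The helpers stored_value_index and call_arg are hypothetical stand-ins for the GCC functions named in the patch, so only the shape of the two call sites matters.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for internal_fn_stored_value_index.
inline int
stored_value_index (int ifn)
{
  return ifn == 0 ? 3 : 4;
}

// Hypothetical stand-in for gimple_call_arg.
inline int
call_arg (const std::vector<int> &args, int index)
{
  assert (index >= 0 && index < static_cast<int> (args.size ()));
  return args[index];
}

// Nested form: the inner call forces a break after an open paren.
inline int
stored_value_nested (const std::vector<int> &args, int ifn)
{
  return call_arg (args,
		   stored_value_index (
		     ifn));
}

// Preferred form: compute the index into a temporary first.
inline int
stored_value_with_temp (const std::vector<int> &args, int ifn)
{
  int index = stored_value_index (ifn);
  return call_arg (args, index);
}
```

Both functions return the same value; the second simply reads better and avoids the awkward indentation.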
 
OK with the formatting fix.
 
jeff
 
 


Re: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'

2023-06-23 Thread 钟居哲
I'm not sure, since I saw that MASK_STORE/LEN_STORE don't compute the size.
 



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-24 00:27
To: juzhe.zhong; gcc-patches
CC: rguenther; richard.sandiford
Subject: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 
'ref_maybe_used_by_call_p_1'
 
 
On 6/23/23 08:15, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * tree-ssa-alias.cc (call_may_clobber_ref_p_1): Add LEN_MASK_STORE.
Doesn't this need to extract/compute the size argument in a manner 
similar to what DSE does?
 
Jeff
 


Re: Re: [PATCH] DSE: Add LEN_MASK_STORE analysis into DSE

2023-06-23 Thread 钟居哲
Address comment.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-23 22:57
To: juzhe.zhong; gcc-patches
CC: rguenther; richard.sandiford
Subject: Re: [PATCH] DSE: Add LEN_MASK_STORE analysis into DSE
 
 
On 6/23/23 08:48, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> gcc/ChangeLog:
> 
>  * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE.
>  (dse_optimize_stmt): Ditto.
> 
> ---
>   gcc/tree-ssa-dse.cc | 18 ++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 3c7a2e9992d..01b0951f1a9 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -174,6 +174,23 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, 
> bool may_def_ok = false)
> return true;
>   }
> break;
> +   case IFN_LEN_MASK_STORE: {
> + /* We cannot initialize a must-def ao_ref (in all cases) but we
> +can provide a may-def variant.  */
> + if (may_def_ok)
> +   {
> + tree len_size
> +   = int_const_binop (MINUS_EXPR, gimple_call_arg (stmt, 2),
> +  gimple_call_arg (stmt, 5));
> + tree mask_size
> +   = TYPE_SIZE_UNIT (TREE_TYPE (gimple_call_arg (stmt, 4)));
> + tree size = int_const_binop (MAX_EXPR, len_size, mask_size);
> + ao_ref_init_from_ptr_and_size (write, gimple_call_arg (stmt, 0),
> +size);
So isn't len_size here the size in elements?  If so, don't you need to 
multiply len_size by the element size?
 
Jeff
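
The unit question can be made concrete with a small sketch. It assumes, as the question above supposes, that the length operand counts elements rather than bytes; the helper name is hypothetical and this is not the DSE code itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: convert a length expressed in elements into the
// number of bytes that a length-controlled store may touch.
inline std::size_t
store_extent_in_bytes (std::size_t len_in_elements, std::size_t element_size)
{
  assert (element_size != 0);
  return len_in_elements * element_size;
}
```

For example, a store of 4 elements of 4 bytes each may write 16 bytes, so using the raw element count as a byte size would understate the written range.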
 


Go patch committed: Support bootstrapping Go 1.21

2023-06-23 Thread Ian Lance Taylor via Gcc-patches
compiler, libgo: support bootstrapping gc compiler

In the Go 1.21 release the package internal/profile imports
internal/lazyregexp.  That works when bootstrapping with Go 1.17,
because that compiler has internal/lazyregexp and permits importing it.
We also have internal/lazyregexp in libgo, but since it is not
installed it is not available for importing.  This patch adds
internal/lazyregexp to the list of internal packages that are
installed for bootstrapping.

The Go 1.21, and earlier, releases have a couple of functions in the
internal/abi package that are always fully intrinsified.  The Go
frontend recognizes and intrinsifies those functions as well.
However, the Go frontend was also building function descriptors for
references to the functions without calling them, which failed because
there was nothing to refer to.  That is OK for the gc compiler, which
guarantees that the functions are only called, not referenced.  This
patch arranges to not generate function descriptors for these
functions.
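
As a rough standalone illustration of the check this adds, the predicate can be modelled on plain strings. The function name below is hypothetical and it takes strings rather than a Gogo* and a Named_object*; the package paths and function names are the ones used in the patch.

```cpp
#include <cassert>
#include <string>

// Sketch of the decision only: true when no function descriptor should be
// generated because the gc compiler always intrinsifies the function.
inline bool
skip_descriptor_sketch (const std::string &pkgpath, const std::string &name)
{
  assert (!name.empty ());
  // internal/abi is the standard library package; bootstrap/internal/abi
  // is the name used when bootstrapping the gc compiler.
  return ((pkgpath == "internal/abi"
	   || pkgpath == "bootstrap/internal/abi")
	  && (name == "FuncPCABI0"
	      || name == "FuncPCABIInternal"));
}
```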

This helps address https://go.dev/issue/60913.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline and GCC 12 and 13 branches.

Ian
2ad5553091d8afbc21bbd3a29a419df359e7aacc
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index a028350ba8e..ff07b1a1fa6 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-195060166e6045408a2cb95e6aa88c6f0b98f20b
+68a756b6aadc901534cfad2b1e73fae9e34f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 2112de6abfc..d276bd811cc 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -12272,7 +12272,8 @@ Call_expression::intrinsify(Gogo* gogo,
   return Runtime::make_call(code, loc, 3, a1, a2, a3);
 }
 }
-  else if (package == "internal/abi")
+  else if (package == "internal/abi"
+  || package == "bootstrap/internal/abi") // for bootstrapping gc
 {
   if (is_method)
return NULL;
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 9197eef3e38..980db1ea07e 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -3296,6 +3296,9 @@ class Create_function_descriptors : public Traverse
   int
   expression(Expression**);
 
+  static bool
+  skip_descriptor(Gogo* gogo, const Named_object*);
+
  private:
   Gogo* gogo_;
 };
@@ -3306,6 +3309,9 @@ class Create_function_descriptors : public Traverse
 int
 Create_function_descriptors::function(Named_object* no)
 {
+  if (Create_function_descriptors::skip_descriptor(this->gogo_, no))
+return TRAVERSE_CONTINUE;
+
   if (no->is_function()
   && no->func_value()->enclosing() == NULL
   && !no->func_value()->is_method()
@@ -3393,6 +3399,28 @@ Create_function_descriptors::expression(Expression** 
pexpr)
   return TRAVERSE_CONTINUE;
 }
 
+// The gc compiler has some special cases that it always compiles as
+// intrinsics.  For those we don't want to generate a function
+// descriptor, as there will be no code for it to refer to.
+
+bool
+Create_function_descriptors::skip_descriptor(Gogo* gogo,
+const Named_object* no)
+{
+  const std::string& pkgpath(no->package() == NULL
+? gogo->pkgpath()
+: no->package()->pkgpath());
+
+  // internal/abi is the standard library package,
+  // bootstrap/internal/abi is the name used when bootstrapping the gc
+  // compiler.
+
+  return ((pkgpath == "internal/abi"
+  || pkgpath == "bootstrap/internal/abi")
+ && (no->name() == "FuncPCABI0"
+ || no->name() == "FuncPCABIInternal"));
+}
+
 // Create function descriptors as needed.  We need a function
 // descriptor for all exported functions and for all functions that
 // are referenced without being called.
@@ -3414,7 +3442,8 @@ Gogo::create_function_descriptors()
   if (no->is_function_declaration()
  && !no->func_declaration_value()->type()->is_method()
  && !Linemap::is_predeclared_location(no->location())
- && !Gogo::is_hidden_name(no->name()))
+ && !Gogo::is_hidden_name(no->name())
+ && !Create_function_descriptors::skip_descriptor(this, no))
fndecls.push_back(no);
 }
   for (std::vector::const_iterator p = fndecls.begin();
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 920f8cc7071..c95dc2106cd 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -417,6 +417,7 @@ toolexeclibgounicode_DATA = \
 # Some internal packages are needed to bootstrap the gc toolchain.
 toolexeclibgointernaldir = $(toolexeclibgodir)/internal
 toolexeclibgointernal_DATA = \
+   internal/lazyregexp.gox \
internal/reflectlite.gox \
internal/unsafeheader.gox
 
diff --git a/libgo/go/internal/abi/abi.go 

[PATCH] c++: fix error reporting routines re-entered ICE [PR110175]

2023-06-23 Thread Marek Polacek via Gcc-patches
Here we get the "error reporting routines re-entered" ICE because
of an unguarded use of warning_at.  While at it, I added a check
for a warning_at just above it.
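
The guard pattern can be sketched in isolation like this. Everything here is a hypothetical miniature: the *_sketch flag values and the warning_log type are illustrative only and do not reproduce the front end's real tf_* flags or diagnostic machinery.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the front end's complain flags.
enum complain_sketch : unsigned
{
  tf_none_sketch = 0,
  tf_error_sketch = 1u << 0,
  tf_warning_sketch = 1u << 1
};

// Stand-in for the diagnostic machinery: collects emitted warnings.
struct warning_log
{
  std::vector<std::string> messages;
};

// Record the deprecation warning only when the caller asked for warnings.
// Returns true if a warning was recorded.
inline bool
warn_volatile_increment (warning_log &log, unsigned complain, bool is_volatile)
{
  if (is_volatile && (complain & tf_warning_sketch))
    {
      log.messages.push_back ("increment of volatile-qualified type "
			      "is deprecated");
      return true;
    }
  return false;
}
```

When the flags passed in do not include the warning bit, nothing is emitted, which is the behavior the patch restores for both warning_at calls.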

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/110175

gcc/cp/ChangeLog:

* typeck.cc (cp_build_unary_op): Check tf_warning before warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-110175.C: New test.
---
 gcc/cp/typeck.cc | 5 +++--
 gcc/testsuite/g++.dg/cpp0x/decltype-110175.C | 6 ++
 2 files changed, 9 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-110175.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index da591dafc8f..859b133a18d 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -7561,7 +7561,8 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
/* [depr.volatile.type] "Postfix ++ and -- expressions and
   prefix ++ and -- expressions of volatile-qualified arithmetic
   and pointer types are deprecated."  */
-   if (TREE_THIS_VOLATILE (arg) || CP_TYPE_VOLATILE_P (TREE_TYPE (arg)))
+   if ((TREE_THIS_VOLATILE (arg) || CP_TYPE_VOLATILE_P (TREE_TYPE (arg)))
+   && (complain & tf_warning))
  warning_at (location, OPT_Wvolatile,
  "%qs expression of %<volatile%>-qualified type is "
  "deprecated",
@@ -7592,7 +7593,7 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
return error_mark_node;
  }
/* Otherwise, [depr.incr.bool] says this is deprecated.  */
-   else
+   else if (complain & tf_warning)
  warning_at (location, OPT_Wdeprecated,
  "use of an operand of type %qT "
  "in %<operator++%> is deprecated",
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C
new file mode 100644
index 000..39643cafcf8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-110175.C
@@ -0,0 +1,6 @@
+// PR c++/110175
+// { dg-do compile { target c++11 } }
+
+template auto f(T t) -> decltype(++t) { return t; } // { dg-warning 
"reference" "" { target c++14_down } }
+void f(...) {}
+void g() { f(true); }

base-commit: 5388a43f6a3f348929292998bd6d0c1da6f006de
-- 
2.41.0



[pushed] c++: provide #include hint for missing includes [PR110164]

2023-06-23 Thread David Malcolm via Gcc-patches
> On Fri, 2023-06-23 at 12:08 -0400, Jason Merrill wrote:
> > On 6/22/23 11:50, Marek Polacek wrote:
> > > On Wed, Jun 21, 2023 at 04:44:00PM -0400, David Malcolm via Gcc-patches 
> > > wrote:
> > > I'd like to ping this C++ FE patch for review:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621779.html

> > Not an approval, but LGTM, though some nits below:

> OK with the tweaks Marek suggests.

Thanks Marek and Jason.

For reference, here's what I pushed (as r14-2055-g13709b518aa976):



PR c++/110164 notes that in cases where we have a forward decl
of a std library type such as:

std::array x;

we emit this diagnostic:

error: aggregate ‘std::array x’ has incomplete type and cannot be 
defined

This patch adds this hint to the diagnostic:

note: ‘std::array’ is defined in header ‘<array>’; this is probably fixable by 
adding ‘#include <array>’

gcc/cp/ChangeLog:
PR c++/110164
* cp-name-hint.h (maybe_suggest_missing_header): New decl.
* decl.cc: Define INCLUDE_MEMORY.  Add include of
"cp/cp-name-hint.h".
(start_decl_1): Call maybe_suggest_missing_header.
* name-lookup.cc (maybe_suggest_missing_header): Remove "static".

gcc/testsuite/ChangeLog:
PR c++/110164
* g++.dg/diagnostic/missing-header-pr110164.C: New test.
---
 gcc/cp/cp-name-hint.h  |  1 +
 gcc/cp/decl.cc | 10 ++
 gcc/cp/name-lookup.cc  |  2 +-
 .../g++.dg/diagnostic/missing-header-pr110164.C| 10 ++
 4 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/missing-header-pr110164.C

diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
index bfa7c53c8f6..7693980138a 100644
--- a/gcc/cp/cp-name-hint.h
+++ b/gcc/cp/cp-name-hint.h
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern name_hint suggest_alternatives_for (location_t, tree, bool);
 extern name_hint suggest_alternatives_in_other_namespaces (location_t, tree);
+extern name_hint maybe_suggest_missing_header (location_t, tree, tree);
 extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
tree);
 extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
 
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index c07a4a8d58d..60f107d50c4 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
line numbers.  For example, the CONST_DECLs for enum values.  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-objc.h"
 #include "c-family/c-pragma.h"
 #include "c-family/c-ubsan.h"
+#include "cp/cp-name-hint.h"
 #include "debug.h"
 #include "plugin.h"
 #include "builtins.h"
@@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
;   /* An auto type is ok.  */
   else if (TREE_CODE (type) != ARRAY_TYPE)
{
+ auto_diagnostic_group d;
  error ("variable %q#D has initializer but incomplete type", decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   CP_TYPE_CONTEXT (type));
  type = TREE_TYPE (decl) = error_mark_node;
}
   else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type))))
@@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
   else
{
+ auto_diagnostic_group d;
  error ("aggregate %q#D has incomplete type and cannot be defined",
 decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   CP_TYPE_CONTEXT (type));
  /* Change the type so that assemble_variable will give
 DECL an rtl we can live with: (mem (const_int 0)).  */
  type = TREE_TYPE (decl) = error_mark_node;
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 53b6870f067..74565184403 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
tree name)
for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
applicable.  */
 
-static name_hint
+name_hint
 maybe_suggest_missing_header (location_t location, tree name, tree scope)
 {
   if (scope == NULL_TREE)
diff --git a/gcc/testsuite/g++.dg/diagnostic/missing-header-pr110164.C 
b/gcc/testsuite/g++.dg/diagnostic/missing-header-pr110164.C
new file mode 100644
index 000..15980071c38
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/missing-header-pr110164.C
@@ -0,0 +1,10 @@
+// { dg-require-effective-target c++11 }
+
+#include 

Re: [PATCH] Introduce hardbool attribute for C

2023-06-23 Thread Qing Zhao via Gcc-patches


> On Jun 21, 2023, at 10:35 PM, Alexandre Oliva  wrote:
> 
> On Jun 21, 2023, Qing Zhao  wrote:
> 
>> I see that you have testing case to check the above built_in_trap call
>> is generated by FE.
>> Do you have a testing case to check the trap is happening at runtime? 
> 
> I have written such tests, using type-punning, but I don't think our
> testing infrastructure could take trapping as success and other results
> as failure.
Okay, I see. 
> 
>> So, when -ftrivial-auto-var-init presents, what’s the behavior for the
>> hardened Boolean auto variables?
> 
> Good question.  This option was not even available when hardbool was
> designed and implemented.  (tests) The deferred_init internal function
> initializes with bit-patterns 0x00 or 0xfe, regardless of type, when the
> data lives in memory, and otherwise forces the 0x00 bit pattern for
> booleans, variable-sized types, types that cannot be accessed with a
> single mode or for modes that don't have a set pattern.

-ftrivial-auto-var-init has been in GCC since GCC 12.
The decision to use 0x00 for zero-initialization and 0xfe for
pattern-initialization was discussed thoroughly during the design phase. -:)

These initial values (0x00 and 0xfe) are not perfect choices, but they are the
best choices given the tradeoff between functionality and performance.

Since this hardbool attribute will also be a security feature, users who seek
GCC's security features will likely ask the same question about how these two
features interact.

So their interaction should at least be documented. That’s my main point.

> 
> It's hard for me to tell what "correct" or "expected" would be here.
> Enumerators don't seem to be given special treatment.  Checked
> enumerators, constrained integral subtypes, none of these would get
> well-formed values or even checking at the assignments.

> 
> If I were to design this option myself, I'd probably arrange for it to
> handle booleans (including hardened booleans) by zero-initializing as
> false and pattern-initializing as true, though I realize that this could
> be very confusing if one chose to use 0xfe as the value for false and/or
> 0x00 as the value for true.

For normal Boolean variables, 0x00 is false, which is a reasonable initial
value under zero-initialization.
For hardbool variables, what does 0x00 represent if it is neither the false
nor the true value?
A garbage value for the hardbool variable?
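
To make the question concrete, here is a small sketch of representation versus logical value. The two representation bytes are hypothetical (a real hardbool type chooses them through the attribute's arguments), and the decoder only models the check; it is not how the front end implements it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representations chosen for a hardened boolean.
constexpr std::uint8_t HARD_FALSE = 0x5a;
constexpr std::uint8_t HARD_TRUE = 0xa5;

enum class decoded_bool { is_false, is_true, invalid };

// Classify a stored byte.  Any byte other than the two chosen
// representations is neither false nor true; a checked read of such a
// value would be expected to trap.
inline decoded_bool
decode_hardbool (std::uint8_t byte)
{
  if (byte == HARD_FALSE)
    return decoded_bool::is_false;
  if (byte == HARD_TRUE)
    return decoded_bool::is_true;
  return decoded_bool::invalid;
}
```

Under these assumed representations, both the 0x00 and 0xfe fill patterns decode as invalid, which is why the interaction with -ftrivial-auto-var-init seems worth documenting.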
> 
> I'd probably have arranged for the front-end to create the initializer
> value, because expansion time is too late to figure it out: we may not
> even have the front-end at hand any more, in case of lto compilation.

Is the hardbool attribute information available during the rtl expansion phase?
> 
> 
> But with the current description and implementation, I guess the
> behavior is correct, if not ideal: the bit patterns refer to the
> representation, rather than to the language interpretation of the value.
> When it comes to integral types, they may match, but floating-point,
> fixed fractional types, offsets and multipliers, even boolean member of
> larger structs...  not so much: the effect is that of a memset, rather
> than that of an assignment of zero or of the pattern to a variable.
> 
> Now, I acknowledge that the decision to make implicit
> zero-initialization of boolean types set them to the value for false,
> rather than to all-bits-zero representation, is a departure from common
> practice of zero-initialization yielding logical zero.

I don’t quite understand the above. For normal boolean variables,
zero-initialization sets them to false and also to all-bits-zero (i.e.,
all-bits-zero is equal to logical false for a normal boolean variable, right?).
And this is a very reasonable initial value for Boolean variables.


>  That was unusual
> enough that I thought it worth mentioning in the docs.
> 
I don’t see why this is unusual for normal Boolean variables. Could you
please explain a little bit?

From my understanding, only with the introduction of the “hardbool” attribute
will all-bits-zero no longer be equal to logical false.

> 
>> This might need to be documented and also handled correctly. 
> 
> I suppose the place to document this distinction between logical values
> and representation would be under -ftrivial-auto-var-init.

Yes, the documentation of -ftrivial-auto-var-init could be updated with this
clarification, mainly for the new “hardbool” attribute.

>  That's
> likely where someone using that option would look for guidance on how it
> interacts with unusual types, and where exceptions to general
> expectations WRT initialization would go.  Do you concur?
At the same time, it might be better for the documentation of the new
“hardbool” attribute to carry such a warning too.

> 
> That said, it probably makes sense to refer to / mention that
> -ftrivial-auto-var-init does not special-case hardened booleans in the
> hardened booleans documentation.

Agreed.

thanks.

Qing

>  I wonder if there are other
> conflicting options I'm not even 

GCC nvptx: Silence warning?

2023-06-23 Thread Jan-Benedict Glaw
Hi Tom!

Building with newer GCC versions (I'm doing CI builds with -Werror),
we might see warnings like this:

/usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c   -g -O2   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o nvptx.o -MT nvptx.o -MMD 
-MP -MF ./.deps/nvptx.TPo ../../gcc/gcc/config/nvptx/nvptx.cc
../../gcc/gcc/config/nvptx/nvptx.cc: In function 'void 
handle_ptx_version_option()':
../../gcc/gcc/config/nvptx/nvptx.cc:325:12: error: unquoted identifier or 
keyword 'sm_' in format [-Werror=format-diag]
  325 | error ("PTX version (%<-mptx%>) needs to be at least %s to support 
selected"
  |
^
  326 |" %<-misa%> (sm_%s)", ptx_version_to_string (first),
  |
cc1plus: all warnings being treated as errors
make[1]: *** [Makefile:2456: nvptx.o] Error 1


This is because the '_' triggers a generic heuristic that this is
probably something special. As I read the message, the `sm_*` ISA
descriptor belongs to `-misa`, so maybe just reposition the
quotation marks?


gcc/ChangeLog:
* config/nvptx/nvptx.cc (handle_ptx_version_option):
Reposition quotation marks around option with argument.



diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc6814178..ffe6a438265 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -323,7 +323,7 @@ handle_ptx_version_option (void)
 
   if (ptx_version_option < first)
 error ("PTX version (%<-mptx%>) needs to be at least %s to support 
selected"
-  " %<-misa%> (sm_%s)", ptx_version_to_string (first),
+  " %<-misa sm_%s%>", ptx_version_to_string (first),
   sm_version_to_string ((enum ptx_isa)ptx_isa_option));
 }
 


Ok for trunk?

Thanks,
  Jan-Benedict

-- 




Re: [PATCH 1/1] libcpp: allow UCS_LIMIT codepoints in UTF-8 strings

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/21/23 14:58, Ben Boeckel wrote:

libcpp/

* charset.cc: Allow `UCS_LIMIT` in UTF-8 strings.

Reported-by: Damien Guibouret 
Fixes: c1dbaa6656a (libcpp: reject codepoints above 0x10FFFF, 2023-06-06)
Signed-off-by: Ben Boeckel 


Applied, moving the Fixes line up and changing the commit ID to the git 
gcc-descr version.  Thanks.
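
For readers following along, the off-by-one can be sketched as below. The constant's value is an assumption (the highest Unicode code point, 0x10FFFF, which is what UCS_LIMIT is taken to be here), and the two function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match libcpp's UCS_LIMIT: the highest Unicode code point.
constexpr std::uint32_t UCS_LIMIT_SKETCH = 0x10FFFF;

// Comparison before the fix: rejects the limit itself.
inline bool
codepoint_ok_before (std::uint32_t cp)
{
  return !(cp >= UCS_LIMIT_SKETCH);
}

// Comparison after the fix: only values above the limit are rejected.
inline bool
codepoint_ok_after (std::uint32_t cp)
{
  return !(cp > UCS_LIMIT_SKETCH);
}
```

The two versions differ only for the code point equal to the limit, which the old comparison rejected and the new one accepts.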



---
  libcpp/charset.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d4f573e365f..54ebab2b8a4 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1891,7 +1891,7 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
 invalid because they cannot be represented in UTF-16.
  
  	 Reject such values.*/

-  if (cp >= UCS_LIMIT)
+  if (cp > UCS_LIMIT)
return false;
  }
/* No problems encountered.  */




Re: [PATCH v2] c++: Add support for -std={c,gnu}++2{c,6}

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/23/23 13:22, Marek Polacek wrote:

On Fri, Jun 23, 2023 at 10:58:54AM -0400, Jason Merrill wrote:

On 6/22/23 20:25, Marek Polacek wrote:

It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

I was puzzled to see that -std=c++23 was marked Undocumented but
-std=c++2b wasn't.  I think it should be the other way round, like
the earlier modes.


I was leaving -std=c++23 undocumented until C++23 was finalized, which it
now effectively is, so this change is good.  Speaking of which, it looks
like the final value for __cplusplus for C++23 is 202302L.
  
Ok.  I've updated the code to reflect that.



But similarly, I'd like to leave -std=c++26 undocumented for now.


Done.


As for __cplusplus, I've arbitrarily chosen 202600L:


Our previous convention has been the year after the previous standard, so
let's use 202400L.
  
Done.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.


-- >8 --
It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

Since C++23 is essentially finished and its __cplusplus value has
settled to 202302L, I've updated cpp_init_builtins and marked
-std=c++2b Undocumented and made -std=c++23 no longer Undocumented.

As for __cplusplus, I've chosen 202400L:

   $ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
   #define __cplusplus 202400L

I've verified the patch with a simple test, exercising the new
directives.  Don't forget to update your GXX_TESTSUITE_STDS!

This patch does not add -Wc++26-extensions.

gcc/c-family/ChangeLog:

* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.

gcc/ChangeLog:

* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26.  Document that for C++23, its value is 202302L.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.

libcpp/ChangeLog:

* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202400L for C++26.
Set __cplusplus to 202302L for C++23.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp23/cplusplus.C: Adjust expected value.
* g++.dg/cpp26/cplusplus.C: New test.
---
  gcc/c-family/c-common.h|  4 +++-
  gcc/c-family/c-opts.cc | 28 +++-
  gcc/c-family/c.opt | 24 +
  gcc/doc/cpp.texi   |  7 +++---
  gcc/doc/invoke.texi| 12 +++
  gcc/dwarf2out.cc   |  5 -
  gcc/testsuite/g++.dg/cpp23/cplusplus.C |  2 +-
  gcc/testsuite/g++.dg/cpp26/cplusplus.C |  3 +++
  gcc/testsuite/lib/target-supports.exp  | 30 +-
  libcpp/include/cpplib.h|  2 +-
  libcpp/init.cc | 13 +++
  11 files changed, 113 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp26/cplusplus.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 336a09f4a40..b5ef5ff6b2c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -740,7 +740,9 @@ enum cxx_dialect {
/* C++20 */
cxx20,
/* C++23 */
-  cxx23
+  cxx23,
+  /* C++26 */
+  cxx26
  };
  
  /* The C++ dialect being used. C++98 is the default.  */

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index c68a2a27469..af19140e382 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -111,6 +111,7 @@ static void set_std_cxx14 (int);
  static void set_std_cxx17 (int);
  static void set_std_cxx20 (int);
  static void set_std_cxx23 (int);
+static void set_std_cxx26 (int);
  static void set_std_c89 (int, int);
  static void set_std_c99 (int);
  static void set_std_c11 (int);
@@ -663,6 +664,12 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
set_std_cxx23 (code == OPT_std_c__23 /* ISO */);
break;
  
+case OPT_std_c__26:

+case OPT_std_gnu__26:
+  if (!preprocessing_asm_p)
+   set_std_cxx26 (code == OPT_std_c__26 /* ISO */);
+  break;
+
  case 

Re: [PATCH][RFC] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 02:26, Richard Biener via Gcc-patches wrote:

> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
>
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
>
> This patch requires (at least) the series of patches I will
> followup this with.  I have to re-bootstrap / test to look
> for further fallout (I've picked this up again after some weeks).
>
> Opinions?
>
> Thanks,
> Richard.
>
> * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
> (TYPE_PRECISION_RAW): Provide raw access to the precision
> field.
> * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
> (gimple_canonical_types_compatible_p): Likewise.
> * tree-streamer-out.cc (pack_ts_type_common_value_fields):
> Stream TYPE_PRECISION_RAW.
> * tree-streamer-in.cc (unpack_ts_type_common_value_fields):
> Likewise.
> * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.

Given how easy it is to incorrectly use TYPE_PRECISION on VECTOR_TYPE,
I'm all for it.


jeff


Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 06:54, Li, Pan2 wrote:

Thanks Robin for the explanation, it is very clear to me. I totally agree with
the parts below, and I think we can leave it to the maintainers of the
RTL/Machine Descriptions.


Now we could argue that combine's behavior should change here and an
insn without any alternatives is not actually available but that's not
a battle I'm willing to fight 


Pan

-Original Message-
From: Robin Dapp 
Sent: Thursday, June 22, 2023 10:31 PM
To: Li, Pan2 ; 钟居哲 ; gcc-patches 
; palmer ; kito.cheng ; Jeff 
Law 
Cc: rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).


Just curious about the combine pass you mentioned. I am not very sure my
understanding is correct, but it looks like the combine pass totally
ignores the iterator requirement?

That is something of a surprise to me, as the combine pass may also need
the information from the iterators.


combine tries to match instructions (with fitting modes of course).
It does not look at the insn constraints that reload/lra later can
use to switch between alternatives depending on the register situation
and other factors.

We e.g. have an instruction
  (define_insn "bla"
(set (match_operand:VF 1   "=vd")
 (match_operand:VF 2   "vr"))
...
and implicitly
   [(set_attr "enabled" "true")]

This instruction gets multiplexed via the VF iterator into (among others)
   (define_insn "bla"
 (set (match_operand:VNx4HF 1   "=vd")
  (match_operand:VNx4HF 2   "vr"))
 ...
   [(set_attr "enabled" "true")]

When we set "enabled" to "false" via "fp_vector_disabled", we have:
   (define_insn "bla"
 (set (match_operand:VNx4HF 1   "=vd")
  (match_operand:VNx4HF 2   "vr"))
 ...
   [(set_attr "enabled" "false")]

This means the only available alternative is disabled but the insn
itself is still there, particularly for combine which does not look
into the constraints.

So in our case the iterator "allowed" the instruction (leading combine
to think it is available) and we later masked it out with "enabled = false".
Now we could argue that combine's behavior should change here and an
insn without any alternatives is not actually available but that's not
a battle I'm willing to fight :D

More importantly, at combine time we don't know which alternative will
match.  In fact, you can run into cases where no alternative matches
until register allocation -- this was fairly common in the past as it
allowed for simpler machine descriptions.  It fell out of favor in the
90s as more targets started using scheduling and we wanted to expose as
much of the final code as we could for the first scheduling pass.


Jeff



Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 07:56, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Apply 
LEN_MASK_{LOAD,STORE}

---
  gcc/tree-ssa-alias.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index e1bc04b82ba..92dc1bb9987 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2815,11 +2815,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
case IFN_SCATTER_STORE:
case IFN_MASK_SCATTER_STORE:
case IFN_LEN_STORE:
+  case IFN_LEN_MASK_STORE:
return false;
case IFN_MASK_STORE_LANES:
goto process_args;
case IFN_MASK_LOAD:
case IFN_LEN_LOAD:
+  case IFN_LEN_MASK_LOAD:
case IFN_MASK_LOAD_LANES:
{
  ao_ref rhs_ref;

Don't you need to adjust how you compute the size for the LEN_MASK_LOAD
case?


jeff


Re: [PATCH] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 08:05, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply 
LEN_MASK_{LOAD,STORE}.

---
  gcc/tree-ssa-loop-ivopts.cc | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6671ff6db5a..2b66fe66bc7 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2442,6 +2442,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
  case IFN_MASK_LOAD:
  case IFN_MASK_LOAD_LANES:
  case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
if (op_p == gimple_call_arg_ptr (call, 0))
return TREE_TYPE (gimple_call_lhs (call));
return NULL_TREE;
@@ -2449,8 +2450,11 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
  case IFN_MASK_STORE:
  case IFN_MASK_STORE_LANES:
  case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
if (op_p == gimple_call_arg_ptr (call, 0))
-   return TREE_TYPE (gimple_call_arg (call, 3));
+   return TREE_TYPE (
+ gimple_call_arg (call, internal_fn_stored_value_index (
+  gimple_call_internal_fn (call))));

Formatting nit.  Compute the result of internal_fn_stored_value_index
into a temporary and pass that temporary into gimple_call_arg, which
should clean up the formatting here.


In general, if you find yourself indenting after an open paren like 
you've done here, compute the value into a temporary.
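
As a rough illustration of that advice, here is a self-contained sketch. The types and functions are stand-ins, not GCC internals: `call_stub`, `store_kind`, `stored_value_index` and the index values 3 and 4 are all assumptions made up for the example.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for GCC internals; names, types and index values here are
// illustrative assumptions only.
enum class store_kind { mask_store, len_mask_store };

struct call_stub {
  store_kind kind;
  std::vector<int> args;
};

// Hypothetical analogue of a "which argument holds the stored value" query.
int stored_value_index(store_kind kind) {
  return kind == store_kind::len_mask_store ? 4 : 3;
}

// Compute the index into a temporary first, so the final expression stays
// on one short line instead of nesting several calls after an open paren.
int stored_value(const call_stub &call) {
  int index = stored_value_index(call.kind);
  return call.args[index];
}
```

The behavior is the same as the nested form; only the layout changes, which is the point of the nit.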


OK with the formatting fix.

jeff



Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/20/23 15:46, Ben Boeckel wrote:
> On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> > On 1/25/23 13:06, Ben Boeckel wrote:
> > > Header units (including the standard library headers) are 100%
> > > unsupported right now because the `-E` mechanism wants to import their
> > > BMIs. A new mode (i.e., something more workable than existing `-E`
> > > behavior) that mocks up header units as if they were imported purely
> > > from their path and content would be required.
> >
> > I notice that the cpp dependency generation tries (in open_file_failed)
> > to continue after encountering a missing file, is that not sufficient
> > for header units?  Or adjustable to be sufficient?
>
> No. Header units can introduce macros which can be used to modify the
> set of modules that are imported. Included headers are "discovered"
> dependencies and don't modify the build graph (just add more files that
> trigger a rebuild) and can be collected during compilation. Module
> dependencies are needed to get the build correct in the first place in
> order to:
>
> - order module compilations in the build graph so that imported modules
>   are ready before anything using them; and
> - computing the set of flags needed for telling the compiler where
>   imported modules' CMI files should be located.
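
The ordering requirement in the list above can be sketched as a topological sort over "imports" edges. This is only an illustration of the idea, not how any particular build system or the P1689 consumer is implemented; `dep_graph` and `compile_order` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical input: for each translation unit, the modules it imports.
using dep_graph = std::map<std::string, std::vector<std::string>>;

// Kahn's algorithm: emit a node once everything it imports has been emitted.
std::vector<std::string> compile_order(const dep_graph &imports) {
  std::map<std::string, int> pending;  // imports not yet built, per node
  std::map<std::string, std::vector<std::string>> users;
  for (const auto &entry : imports) {
    pending[entry.first];  // make sure the node is present
    for (const auto &dep : entry.second) {
      pending[dep];  // nodes that only appear as imports are present too
      users[dep].push_back(entry.first);
      ++pending[entry.first];
    }
  }

  std::queue<std::string> ready;
  for (const auto &entry : pending)
    if (entry.second == 0)
      ready.push(entry.first);

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string node = ready.front();
    ready.pop();
    order.push_back(node);
    for (const auto &user : users[node])
      if (--pending[user] == 0)
        ready.push(user);
  }
  // If the result is shorter than pending.size(), there is an import cycle.
  return order;
}
```

A scanner has to produce the `imports` edges before any compilation starts, which is why they cannot simply be collected as a side effect of compiling, the way `-M` output is.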


So if the header unit CMI isn't available during dependency generation,
would it be better to just #include the header?

> > > +  if (cpp_opts->deps.format != DEPS_FMT_NONE)
> > > +{
> > > +  if (!fdeps_file)
> > > +   fdeps_stream = out_stream;
> > > +  else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
> > > +   fdeps_stream = stdout;
> >
> > You probably want to check that deps_stream and fdeps_stream don't end
> > up as the same stream.
>
> Hmm. But `stdout` is probably fine to use for both though. Basically:
>
>  if (fdeps_stream == out_stream && fdeps_stream != stdout)
>    make_diagnostic_noise ();

(fdeps_stream == deps_stream, but sure, that's reasonable.)

> > So, I take it this is the common use case you have in mind, generating
> > Make dependencies for the p1689 file?  When are you thinking the Make
> > dependencies for the .o are generated?  At build time?
>
> Yes. If an included file changes, the scanning should be performed
> again. The compilation will have its own `-MF` as well (which should
> point to the same files plus the CMI files it ends up reading).
>
> > I'm a bit surprised you're using .json rather than an extension that
> > indicates what the information is.
>
> I can change that; the filename doesn't *really* matter (e.g., CMake
> uses `.ddi` for "dynamic dependency information").

That works.

> > > `-M` is about discovered dependencies: those that you find out while
> > > doing work. `-fdep-*` is about ordering dependencies: extracting
> > > information from file content in order to even order future work around.
> >
> > I'm not sure I see the distinction; Makefiles also express ordering
> > dependencies.  In both cases, you want to find out from the files what
> > order you will want to process them in when building the project.
>
> Makefiles can express ordering dependencies, but not the `-M` snippets;
> these are for files that, if changed, should trigger a rebuild. This is
> fundamentally different than module dependencies which instead indicate
> which *compiles* (or CMI generation if using a 2-phase setup) need to
> complete before compilation (or CMI generation) of the scanned TU can be
> performed. Generally generated headers will be ordered manually in the
> build system description. However, maintaining that same level for
> in-source dependency information on a per-source level is a *far* higher
> burden.

The main difference I see is that the CMI might not exist yet.  As you
say, we don't want to require people to write all the dependencies by
hand, but that just means we need to be able to generate the
dependencies automatically.  In the Make-only model I'm thinking of, one
would collect dependencies on an initial failing build, and then start
over from the beginning again with the dependencies we discovered.  It's
the same two-phase scan and build, but one that uses the same compile
commands for both phases.

Anyway, this isn't an objection to this patch, just another model I also
want to support.

> > Is there a reason not to use the gcc/json.h interface for JSON output?
>
> This is `libcpp`; is that not a dependency cycle?

Ah, indeed.  We could move it to libiberty, but it would need
significant adjustments to remove its dependencies on other stuff in
gcc/.  So maybe just add a TODO comment about that, along with adding
comments before the functions.


Jason



Re: [PATCH] libstdc++: Use RAII in std::vector::_M_realloc_insert

2023-06-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Jun 2023 at 17:44, Jan Hubicka wrote:

> > I intend to push this to trunk once testing finishes.
> >
> > I generated the diff with -b so the whitespace changes aren't shown,
> > because there was some re-indenting that makes the diff look larger than
> > it really is.
> >
> > Honza, I don't think this is likely to make much difference for the PR
> > 110287 testcases, but I think it simplifies the code and so is an
> > improvement in terms of maintenance and readability.
>
> Thanks for cleaning it up :)
> The new version seems slightly smaller than the original in inliner
> metrics.
>

Oh good. It's pushed to trunk now.

[snip]


> To work out that the code path is really very unlikely and should be
> offloaded to a cold section, we however need:
>
> diff --git a/libstdc++-v3/include/bits/functexcept.h
> b/libstdc++-v3/include/bits/functexcept.h
> index 89972baf2c9..2765f7865df 100644
> --- a/libstdc++-v3/include/bits/functexcept.h
> +++ b/libstdc++-v3/include/bits/functexcept.h
> @@ -46,14 +46,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #if _GLIBCXX_HOSTED
>// Helper for exception objects in 
>void
> -  __throw_bad_exception(void) __attribute__((__noreturn__));
> +  __throw_bad_exception(void) __attribute__((__noreturn__,__cold__));
>
>// Helper for exception objects in 
>void
> -  __throw_bad_alloc(void) __attribute__((__noreturn__));
> +  __throw_bad_alloc(void) __attribute__((__noreturn__,__cold__));
>
>void
> -  __throw_bad_array_new_length(void) __attribute__((__noreturn__));
> +  __throw_bad_array_new_length(void)
> __attribute__((__noreturn__,__cold__));
>
>// Helper for exception objects in 
>void
>
> This makes us drop the count to profile_count::zero, which indicates that
> the code is very likely not executed at all during a run of the program.
>
> The reason why we can't take such a strong hint from unreachable
> attribute is twofold.  First, most programs do call "exit (0)", so taking
> this as a strong hint may make us optimize the whole program for size.
> Second, we consider the possibility that insane developers may make
> EH delivery relatively common.
>
> Would it be possible to annotate throw functions in libstdc++ which are
> very unlikely taken by a working program as __cold__ and possibly drop
> the redundant __builtin_expect?
>

Yes, I incorrectly thought they were already considered cold, but your
explanation of why noreturn is not as strong a hint as noreturn+cold makes
sense.

I think the __throw_bad_alloc() and __throw_bad_array_new_length()
functions should always be rare, so marking them cold seems fine (users who
define their own allocators that want to throw bad_alloc "often" will
probably throw it directly, they shouldn't be using our __throw_bad_alloc()
function anyway). I don't think __throw_bad_exception is ever used, so that
doesn't matter (we could remove it from the header and just keep its
definition in the library, but there's no big advantage to doing that).
Others like __throw_length_error() should also be very very rare, and could
be marked cold.

Maybe we should just mark everything in <bits/functexcept.h> as cold. If
users want to avoid the cost of calls to those functions they can do so by
checking function preconditions/arguments to avoid the exceptions. There
are very few places where a throwing libstdc++ API doesn't have a way to
avoid the exception. The only one that isn't easily avoidable is
__throw_bad_alloc but OOM should be rare.
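
A minimal sketch of the pattern under discussion, assuming a GCC-compatible compiler: a throw helper declared with both `__noreturn__` and `__cold__`, and a caller that checks its precondition inline. The names `throw_length_error_sketch` and `checked_size` are hypothetical and are not libstdc++ functions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper using the same attribute pair as the diff quoted
// above.  GNU attributes placed after the declarator are only accepted on
// a declaration, so the function is declared first and defined separately.
void throw_length_error_sketch(const char *what)
  __attribute__((__noreturn__, __cold__));

void throw_length_error_sketch(const char *what) {
  throw std::length_error(what);
}

// The caller checks the precondition inline; the failure path is a call to
// the cold helper, which hints that this branch is very unlikely.
std::size_t checked_size(std::size_t n, std::size_t max) {
  if (n > max)
    throw_length_error_sketch("checked_size: n exceeds max");
  return n;
}
```

Callers that validate arguments up front never reach the helper, which matches the point above that most of these exceptions are avoidable.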



> I will reorder predictors so __builtin_cold_noreturn and
> __builtin_expect_with_probability take precedence over
> __builtin_expect.
>
> It is fun to see how many things can go wrong in such a simple use of
> libstdc++ :)
>

Yes, it's very interesting!


[PATCH v2] c++: Add support for -std={c,gnu}++2{c,6}

2023-06-23 Thread Marek Polacek via Gcc-patches
On Fri, Jun 23, 2023 at 10:58:54AM -0400, Jason Merrill wrote:
> On 6/22/23 20:25, Marek Polacek wrote:
> > It seems prudent to add C++26 now that the first C++26 papers have been
> > approved.  I followed commit r11-6920 as well as r8-3237.
> > 
> > I was puzzled to see that -std=c++23 was marked Undocumented but
> > -std=c++2b wasn't.  I think it should be the other way round, like
> > the earlier modes.
> 
> I was leaving -std=c++23 undocumented until C++23 was finalized, which it
> now effectively is, so this change is good.  Speaking of which, it looks
> like the final value for __cplusplus for C++23 is 202302L.
 
Ok.  I've updated the code to reflect that.

> But similarly, I'd like to leave -std=c++26 undocumented for now.

Done.

> > As for __cplusplus, I've arbitrarily chosen 202600L:
> 
> Our previous convention has been the year after the previous standard, so
> let's use 202400L.
 
Done.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

Since C++23 is essentially finished and its __cplusplus value has
settled to 202302L, I've updated cpp_init_builtins and marked
-std=c++2b Undocumented and made -std=c++23 no longer Undocumented.

As for __cplusplus, I've chosen 202400L:

  $ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
  #define __cplusplus 202400L
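
For code that wants to branch on these values, a small sketch follows. It uses only the two values mentioned in this thread (202302L for C++23 and the provisional 202400L for C++26); the function `dialect_label` and its labels are hypothetical.

```cpp
#include <cassert>
#include <string>

// Map a __cplusplus value to a label, using only the thresholds discussed
// in this thread.  202400L is a provisional value and may change once
// C++26 is finalized.
std::string dialect_label(long value) {
  if (value >= 202400L)
    return "C++26 (provisional)";
  if (value >= 202302L)
    return "C++23";
  return "pre-C++23";
}
```

In a real translation unit this would typically be called as `dialect_label(__cplusplus)`, or the same comparisons would be written as `#if __cplusplus > 202302L` in the preprocessor.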

I've verified the patch with a simple test, exercising the new
directives.  Don't forget to update your GXX_TESTSUITE_STDS!

This patch does not add -Wc++26-extensions.

gcc/c-family/ChangeLog:

* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.

gcc/ChangeLog:

* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26.  Document that for C++23, its value is 202302L.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.

libcpp/ChangeLog:

* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202400L for C++26.
Set __cplusplus to 202302L for C++23.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp23/cplusplus.C: Adjust expected value.
* g++.dg/cpp26/cplusplus.C: New test.
---
 gcc/c-family/c-common.h|  4 +++-
 gcc/c-family/c-opts.cc | 28 +++-
 gcc/c-family/c.opt | 24 +
 gcc/doc/cpp.texi   |  7 +++---
 gcc/doc/invoke.texi| 12 +++
 gcc/dwarf2out.cc   |  5 -
 gcc/testsuite/g++.dg/cpp23/cplusplus.C |  2 +-
 gcc/testsuite/g++.dg/cpp26/cplusplus.C |  3 +++
 gcc/testsuite/lib/target-supports.exp  | 30 +-
 libcpp/include/cpplib.h|  2 +-
 libcpp/init.cc | 13 +++
 11 files changed, 113 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp26/cplusplus.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 336a09f4a40..b5ef5ff6b2c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -740,7 +740,9 @@ enum cxx_dialect {
   /* C++20 */
   cxx20,
   /* C++23 */
-  cxx23
+  cxx23,
+  /* C++26 */
+  cxx26
 };
 
 /* The C++ dialect being used. C++98 is the default.  */
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index c68a2a27469..af19140e382 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -111,6 +111,7 @@ static void set_std_cxx14 (int);
 static void set_std_cxx17 (int);
 static void set_std_cxx20 (int);
 static void set_std_cxx23 (int);
+static void set_std_cxx26 (int);
 static void set_std_c89 (int, int);
 static void set_std_c99 (int);
 static void set_std_c11 (int);
@@ -663,6 +664,12 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
set_std_cxx23 (code == OPT_std_c__23 /* ISO */);
   break;
 
+case OPT_std_c__26:
+case OPT_std_gnu__26:
+  if (!preprocessing_asm_p)
+   set_std_cxx26 (code == OPT_std_c__26 /* ISO */);
+  break;
+
 case OPT_std_c90:
 case OPT_std_iso9899_199409:
   if 

Re: [PATCH v2 3/3] c++: Improve location information in constexpr evaluation

2023-06-23 Thread Patrick Palka via Gcc-patches
On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:

> This patch caches the current expression's location information in the
> constexpr_global_ctx struct, which allows subexpressions that have lost
> location information to still provide accurate diagnostics. Also
> rewrites a number of 'error' calls as 'error_at' to provide more
> specific location information.
> 
> The primary effect of this change is that many errors within evaluation
> of a constexpr function will now point at the offending expression (with
> expansion tracing information) rather than just the outermost call.

This seems like a great improvement!

In other parts of the frontend, e.g. during substitution from
tsubst_expr or tsubst_copy_and_build, we do something similar by
setting/restoring input_location directly.  (We've since added the RAII
class iloc_sentinel for this.)  I wonder if that'd be preferable here?
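
The set-and-restore pattern mentioned here can be sketched generically. This is not the definition of GCC's `iloc_sentinel`; `current_location`, `location_sentinel` and `location_during_nested_eval` are stand-ins invented for the example, with a plain `int` in place of a real location type.

```cpp
#include <cassert>

// Hypothetical stand-in for a global "current location" such as
// input_location; a plain int keeps the sketch self-contained.
static int current_location = 0;

// Sets the location on construction and restores the previous value on
// destruction, including on early return or when an exception unwinds.
class location_sentinel {
public:
  explicit location_sentinel(int loc) : saved_(current_location) {
    current_location = loc;
  }
  ~location_sentinel() { current_location = saved_; }
  location_sentinel(const location_sentinel &) = delete;
  location_sentinel &operator=(const location_sentinel &) = delete;

private:
  int saved_;
};

// Returns the location that is visible while a nested expression is
// being evaluated; the caller's location is restored on return.
int location_during_nested_eval(int nested_loc) {
  location_sentinel guard(nested_loc);
  return current_location;
}
```

The trade-off against caching a location in a context struct is that a global sentinel affects every diagnostic issued in its scope, not only those that consult the cached field.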

> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (constexpr_global_ctx): New field for cached
>   tree location, defaulting to input_location.
>   (cxx_eval_internal_function): Fall back to ctx->global->loc
>   rather than input_location.
>   (modifying_const_object_error): Likewise.
>   (cxx_eval_dynamic_cast_fn): Likewise.
>   (eval_and_check_array_index): Likewise.
>   (cxx_eval_array_reference): Likewise.
>   (cxx_eval_bit_field_ref): Likewise.
>   (cxx_eval_component_reference): Likewise.
>   (cxx_eval_indirect_ref): Likewise.
>   (cxx_eval_store_expression): Likewise.
>   (cxx_eval_increment_expression): Likewise.
>   (cxx_eval_loop_expr): Likewise.
>   (cxx_eval_binary_expression): Likewise.
>   (cxx_eval_constant_expression): Cache location of trees for use
> in errors, and prefer it instead of input_location.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/constexpr-48089.C: Updated diagnostic locations.
>   * g++.dg/cpp0x/constexpr-diag3.C: Likewise.
>   * g++.dg/cpp0x/constexpr-ice20.C: Likewise.
>   * g++.dg/cpp1y/constexpr-89481.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
>   * g++.dg/cpp1y/constexpr-union5.C: Likewise.
>   * g++.dg/cpp1y/pr68180.C: Likewise.
>   * g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
>   * g++.dg/cpp2a/bit-cast11.C: Likewise.
>   * g++.dg/cpp2a/bit-cast12.C: Likewise.
>   * g++.dg/cpp2a/bit-cast14.C: Likewise.
>   * g++.dg/cpp2a/constexpr-98122.C: Likewise.
>   * g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
>   * g++.dg/cpp2a/constexpr-init1.C: Likewise.
>   * g++.dg/cpp2a/constexpr-new12.C: Likewise.
>   * g++.dg/cpp2a/constexpr-new3.C: Likewise.
>   * g++.dg/ext/constexpr-vla2.C: Likewise.
>   * g++.dg/ext/constexpr-vla3.C: Likewise.
>   * g++.dg/ubsan/pr63956.C: Likewise.
> 
> libstdc++/ChangeLog:
> 
>   * testsuite/25_algorithms/equal/constexpr_neg.cc: Updated
>   diagnostics locations.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/constexpr.cc   | 83 +++
>  gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 +--
>  gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  4 +-
>  gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
>  .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
>  .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
>  .../g++.dg/cpp1y/constexpr-lifetime3.C|  4 +-
>  .../g++.dg/cpp1y/constexpr-lifetime4.C|  2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime5.C|  4 +-
>  gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |  4 +-
>  gcc/testsuite/g++.dg/cpp1y/pr68180.C  |  4 +-
>  .../g++.dg/cpp1z/constexpr-lambda6.C  |  4 +-
>  gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   | 10 +--
>  gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   | 10 +--
>  gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   | 14 ++--
>  gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |  4 +-
>  .../g++.dg/cpp2a/constexpr-dynamic17.C|  5 +-
>  gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |  5 +-
>  gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |  6 +-
>  gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   | 10 +--
>  gcc/testsuite/g++.dg/ext/constexpr-vla2.C |  4 +-
>  gcc/testsuite/g++.dg/ext/constexpr-vla3.C |  4 +-
>  gcc/testsuite/g++.dg/ubsan/pr63956.C  |  4 +-
>  .../25_algorithms/equal/constexpr_neg.cc  |  7 +-
>  25 files changed, 111 insertions(+), 101 deletions(-)
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index bdbc12144a7..74045477a92 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -1165,10 +1165,12 @@ public:
>hash_set *modifiable;
>/* Number of heap VAR_DECL deallocations.  */
>unsigned 

Re: [PATCH v2 2/3] c++: Improve constexpr error for dangling local variables

2023-06-23 Thread Patrick Palka via Gcc-patches
On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:

> Currently, when typeck discovers that a return statement will refer to a
> local variable it rewrites to return a null pointer. This causes the
> error messages for using the return value in a constant expression to be
> unhelpful, especially for reference return values.
> 
> This patch removes this "optimisation". Relying on this raises a warning
> by default and causes UB anyway, so there should be no issue in doing
> so. We also suppress additional warnings from later passes that detect
> this as a dangling pointer, since we've already indicated this anyway.

LGTM.  It seems the original motivation for returning a null pointer
here was to avoid issuing duplicate warnings
(https://gcc.gnu.org/legacy-ml/gcc-patches/2014-04/msg00269.html)
which your patch addresses.

> 
> gcc/cp/ChangeLog:
> 
>   * semantics.cc (finish_return_stmt): Suppress dangling pointer
> reporting on return statement if already reported.
>   * typeck.cc (check_return_expr): Don't set return expression to
> zero for dangling addresses.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/cpp1y/constexpr-lifetime5.C: Test reported message is
> correct.
>   * g++.dg/warn/Wreturn-local-addr-6.C: Remove check for return
> value optimisation.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/semantics.cc  | 5 -
>  gcc/cp/typeck.cc | 5 +++--
>  gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C | 4 ++--
>  gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C | 3 ---
>  4 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 87c2e8a7111..14b4b7f4ce1 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -1246,7 +1246,10 @@ finish_return_stmt (tree expr)
>  
>r = build_stmt (input_location, RETURN_EXPR, expr);
>if (no_warning)
> -suppress_warning (r, OPT_Wreturn_type);
> +{
> +  suppress_warning (r, OPT_Wreturn_type);
> +  suppress_warning (r, OPT_Wdangling_pointer_);
> +}
>r = maybe_cleanup_point_expr_void (r);
>r = add_stmt (r);
>  
> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> index afb956087ce..a7d642e2029 100644
> --- a/gcc/cp/typeck.cc
> +++ b/gcc/cp/typeck.cc
> @@ -11235,8 +11235,9 @@ check_return_expr (tree retval, bool *no_warning)
>else if (!processing_template_decl
>  && maybe_warn_about_returning_address_of_local (retval, loc)
>  && INDIRECT_TYPE_P (valtype))
> - retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> -  build_zero_cst (TREE_TYPE (retval)));
> + /* Suppress the Wdangling-pointer warning in the return statement
> +that would otherwise occur.  */
> + *no_warning = true;
>  }
>  
>if (processing_template_decl)
> diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C 
> b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> index a4bc71d890a..ad3ef579f63 100644
> --- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> @@ -1,11 +1,11 @@
>  // { dg-do compile { target c++14 } }
>  // { dg-options "-Wno-return-local-addr" }
>  
> -constexpr const int& id(int x) { return x; }
> +constexpr const int& id(int x) { return x; }  // { dg-message "note: 
> declared here" }
>  
>  constexpr bool test() {
>const int& y = id(3);
>return y == 3;
>  }
>  
> -constexpr bool x = test();  // { dg-error "" }
> +constexpr bool x = test();  // { dg-error "accessing object outside its 
> lifetime" }
> diff --git a/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C 
> b/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
> index fae8b7e766f..ec8e241d83e 100644
> --- a/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
> +++ b/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
> @@ -24,6 +24,3 @@ return_addr_local_as_intref (void)
>  
>return (const intptr_t&)a;   // { dg-warning "\\\[-Wreturn-local-addr]" } 
> */
>  }
> -
> -/* Verify that the return value has been replaced with zero:
> -  { dg-final { scan-tree-dump-times "return 0;" 2 "optimized" } } */
> -- 
> 2.34.1
> 
> 



[PATCH] rs6000: Change GPR2 to volatile & non-fixed register for function that does not use TOC [PR110320]

2023-06-23 Thread P Jeevitha via Gcc-patches
Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

Normally, GPR2 is the TOC pointer and is defined as a fixed, non-volatile
register. However, it can be used as a volatile register with PCREL addressing.
Therefore, if the code is PCREL and the user has not explicitly requested TOC
addressing, r2 can be changed to a volatile, non-fixed register. Such changes
in register preservation roles can be made through the existing target hook
TARGET_CONDITIONAL_REGISTER_USAGE.
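
The role change described above can be pictured with a small standalone model.
This is only a sketch under stated assumptions: `RegisterRoles`,
`default_toc_roles` and `apply_pcrel_r2_policy` are hypothetical names, not
GCC's data structures; only the two array names mirror the `fixed_regs` and
`call_used_regs` arrays that the patch touches.

```cpp
#include <array>
#include <cassert>

// Toy model of the two per-register flags the patch manipulates.
struct RegisterRoles {
  std::array<bool, 32> fixed_regs{};      // true: never allocated
  std::array<bool, 32> call_used_regs{};  // true: volatile (clobbered by calls)
};

// Starting state described in the mail: r2 is fixed and non-volatile.
inline RegisterRoles default_toc_roles() {
  RegisterRoles roles;
  roles.fixed_regs[2] = true;
  roles.call_used_regs[2] = false;
  return roles;
}

// Hypothetical helper: release r2 for allocation when the code is PC-relative
// and the user did not ask for r2 to stay fixed (the -ffixed-r2 case).
inline void apply_pcrel_r2_policy(RegisterRoles &roles, bool pcrel,
                                  bool user_fixed_r2) {
  if (!pcrel || user_fixed_r2)
    return;                        // r2 keeps its TOC-pointer role
  roles.fixed_regs[2] = false;     // available to the register allocator
  roles.call_used_regs[2] = true;  // caller-saved, i.e. volatile
}
```

In this model a caller would start from `default_toc_roles()` and apply the
policy once, which matches the idea of a hook that runs after option
processing.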

2023-06-23  Jeevitha Palanisamy  

gcc/
PR target/110320
* config/rs6000/rs6000.cc (rs6000_conditional_register_usage): Change
GPR2 to volatile and non-fixed register for pc-relative code.

gcc/testsuite/
PR target/110320
* gcc.target/powerpc/pr110320_1.c: New testcase.
* gcc.target/powerpc/pr110320_2.c: New testcase.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..9e978f85f9d 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10169,6 +10169,35 @@ rs6000_conditional_register_usage (void)
   if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
 call_used_regs[2] = 0;
 
+  /* The TOC register is not needed for functions using the PC-relative ABI
+ extension, so make it available for register allocation as a volatile
+ register.  */
+  if (FIXED_R2 && rs6000_pcrel_p ())
+{
+  bool cli_fixedr2 = false;
+
+  /* Verify the user has not explicitly asked for GPR2 to be fixed.  */
+  if (common_deferred_options)
+   {
+ unsigned int idx;
+ cl_deferred_option *opt;
+ vec v;
+ v = *((vec *) common_deferred_options);
+ FOR_EACH_VEC_ELT (v, idx, opt)
+   if (opt->opt_index == OPT_ffixed_ && strcmp (opt->arg,"r2") == 0)
+ {
+   cli_fixedr2 = true;
+   break;
+ }
+   }
+
+  /* If GPR2 is not FIXED (eg, not a TOC register), then it is volatile.  
*/
+  if (!cli_fixedr2)
+   {
+ fixed_regs[2] = 0;
+ call_used_regs[2] = 1;
+   }
+}
   if (DEFAULT_ABI == ABI_V4 && flag_pic == 2)
 fixed_regs[RS6000_PIC_OFFSET_TABLE_REGNUM] = 1;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110320_1.c 
b/gcc/testsuite/gcc.target/powerpc/pr110320_1.c
new file mode 100644
index 000..42143fbf889
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110320_1.c
@@ -0,0 +1,23 @@
+/* PR target/110320 */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -ffixed-r0 -ffixed-r11 -ffixed-r12" 
} */
+
+/* Ensure we use r2 as a normal volatile register for the code below.
+   The test case ensures all of the parameter registers r3 - r10 are used
+   and needed after we compute the expression "x + y" which requires a
+   temporary.  The -ffixed-r* options disallow using the other volatile
+   registers r0, r11 and r12.  That leaves RA to choose from r2 and the more
+   expensive non-volatile registers for the temporary to be assigned to, and
+   RA will always choose the cheaper volatile r2 register.  */
+
+extern long bar (long, long, long, long, long, long, long, long *);
+
+long
+foo (long r3, long r4, long r5, long r6, long r7, long r8, long r9, long *r10)
+{
+  *r10 = r3 + r4;
+  return bar (r3, r4, r5, r6, r7, r8, r9, r10);
+}
+
+/* { dg-final { scan-assembler {\madd 2,3,4\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110320_2.c 
b/gcc/testsuite/gcc.target/powerpc/pr110320_2.c
new file mode 100644
index 000..9d0da5b9695
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110320_2.c
@@ -0,0 +1,22 @@
+/* PR target/110320 */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mno-pcrel -ffixed-r0 -ffixed-r11 
-ffixed-r12" } */
+
+/* Ensure we don't use r2 as a normal volatile register for the code below.
+   The test case ensures all of the parameter registers r3 - r10 are used
+   and needed after we compute the expression "x + y" which requires a
+   temporary.  The -ffixed-r* options disallow using the other volatile
+   registers r0, r11 and r12.  That only leaves RA to choose from the more
+   expensive non-volatile registers for the temporary to be assigned to.  */
+
+extern long bar (long, long, long, long, long, long, long, long *);
+
+long
+foo (long r3, long r4, long r5, long r6, long r7, long r8, long r9, long *r10)
+{
+  *r10 = r3 + r4;
+  return bar (r3, r4, r5, r6, r7, r8, r9, r10);
+}
+
+/* { dg-final { scan-assembler-not {\madd 2,3,4\M} } } */




Re: [PATCH] libstdc++: Use RAII in std::vector::_M_realloc_insert

2023-06-23 Thread Jan Hubicka via Gcc-patches
> I intend to push this to trunk once testing finishes.
> 
> I generated the diff with -b so the whitespace changes aren't shown,
> because there was some re-indenting that makes the diff look larger than
> it really is.
> 
> Honza, I don't think this is likely to make much difference for the PR
> 110287 testcases, but I think it simplifies the code and so is an
> improvement in terms of maintenance and readability.

Thanks for cleaning it up :)
The new version seems slightly smaller than the original in inliner
metrics.

I started to look at whether we can break useful parts out of
_M_realloc_insert to make it smaller and fit the -O3 inline limits.
ipa-fnsplit does some "useful" work, such as breaking out:

   [local count: 107374184]:
  if (__n_9(D) > 2305843009213693951)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 53687092]:
  std::__throw_bad_array_new_length ();

   [local count: 53687092]:
  std::__throw_bad_alloc ();

from std::__new_allocator ::allocate
into a separate function, which saves another 4 instructions in the estimate.

It is fun to notice that both checks are dead with the default predictor,
but we do not know that because _M_check_len is not inlined and we do
not have value range propagation for return values, which I will add.  With
your proposed change to _M_check_len we will also need to solve PR110377
and actually notice the value range implied by __builtin_unreachable
early enough.

However, what also goes wrong is that after splitting we decide to inline
it back before we consider inlining _M_realloc_insert, so the savings
do not help.  The reason is that the profile is estimated as:

  _4 = __builtin_expect (_3, 0);
  if (_4 != 0)
goto ; [10.00%]
  else
goto ; [90.00%]

so we expect that with 10% probability the allocation will exceed the 64-bit
address space.  The reason is that __builtin_expect is defined to have a
10% miss rate, which we can't change, since it is used in algorithms where
the probability of the unlikely value really is non-zero.

There is __builtin_expect_with_probability, which makes it possible to
set the probability to 0% or 100% and may be better in such situations;
here, however, it is useless.  If a code path leads to a noreturn function,
we predict it as noreturn.  This heuristic has lower precedence than
__builtin_expect, so it is not applied, but it would do the same work.
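
To make the difference between the two builtins concrete, here is a minimal
sketch. The function names and the size check are hypothetical and are not
taken from libstdc++; the only assumption about the compiler is that it
provides both GCC builtins.

```cpp
#include <cassert>
#include <cstddef>

// Plain hint: the condition is expected to be false, using the default
// probability that the mail describes as a 10% miss rate.
inline bool exceeds_limit_expect(std::size_t n, std::size_t limit) {
  return __builtin_expect(n > limit, 0);
}

// Explicit hint: the third argument says the expression equals 0 with
// probability 1.0, i.e. the branch is predicted as never taken.
inline bool exceeds_limit_certain(std::size_t n, std::size_t limit) {
  return __builtin_expect_with_probability(n > limit, 0, 1.0);
}
```

Both functions return the same values; only the branch probabilities seen by
the optimizer differ.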

To work out that the code path is really very unlikely and should be
offloaded to a cold section, we however need:

diff --git a/libstdc++-v3/include/bits/functexcept.h 
b/libstdc++-v3/include/bits/functexcept.h
index 89972baf2c9..2765f7865df 100644
--- a/libstdc++-v3/include/bits/functexcept.h
+++ b/libstdc++-v3/include/bits/functexcept.h
@@ -46,14 +46,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if _GLIBCXX_HOSTED
   // Helper for exception objects in 
   void
-  __throw_bad_exception(void) __attribute__((__noreturn__));
+  __throw_bad_exception(void) __attribute__((__noreturn__,__cold__));
 
   // Helper for exception objects in 
   void
-  __throw_bad_alloc(void) __attribute__((__noreturn__));
+  __throw_bad_alloc(void) __attribute__((__noreturn__,__cold__));
 
   void
-  __throw_bad_array_new_length(void) __attribute__((__noreturn__));
+  __throw_bad_array_new_length(void) __attribute__((__noreturn__,__cold__));
 
   // Helper for exception objects in 
   void

This makes us drop the count to profile_count::zero, which indicates that
the code is very likely not executed at all during a run of the program.
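
Outside libstdc++, the same annotation pattern might look like the sketch
below. `throw_too_large` and `checked_bytes` are hypothetical names used only
for illustration; the attribute spelling follows the patch above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical out-of-line failure path, marked both noreturn and cold so
// that the compiler can treat every path leading to it as very unlikely.
void throw_too_large() __attribute__((__noreturn__, __cold__));

void throw_too_large() { throw std::length_error("request too large"); }

// Computes count * elem_size, diverting to the cold helper on overflow.
std::size_t checked_bytes(std::size_t count, std::size_t elem_size) {
  if (elem_size != 0 && count > static_cast<std::size_t>(-1) / elem_size)
    throw_too_large();
  return count * elem_size;
}
```

With the cold attribute on the helper, no `__builtin_expect` is needed around
the check in this sketch.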

The reason why we can't take such a strong hint from the noreturn
attribute alone is twofold.  First, most programs do call "exit (0)", so taking
this as a strong hint may make us optimize the whole program for size.
Second, we consider it possible that insane developers may make
EH delivery relatively common.

Would it be possible to annotate the throw functions in libstdc++ that are
very unlikely to be called by a working program as __cold__, and possibly drop
the redundant __builtin_expect?

I will reorder the predictors so that __builtin_cold_noreturn and
__builtin_expect_with_probability take precedence over
__builtin_expect.

It is fun to see how many things can go wrong in such a simple use of
libstdc++ :)

Honza


Re: [PATCH v2 1/3] c++: Track lifetimes in constant evaluation [PR70331, PR96630, PR98675]

2023-06-23 Thread Patrick Palka via Gcc-patches
On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:

> This adds rudimentary lifetime tracking in C++ constexpr contexts,
> allowing the compiler to report errors with using values after their
> backing has gone out of scope. We don't yet handle other ways of ending
> lifetimes (e.g. explicit destructor calls).

Awesome!

> 
>   PR c++/96630
>   PR c++/98675
>   PR c++/70331
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (constexpr_global_ctx::put_value): Mark value as
>   in lifetime.
>   (constexpr_global_ctx::remove_value): Mark value as expired.
>   (cxx_eval_call_expression): Remove comment that is no longer
>   applicable.
>   (non_const_var_error): Add check for expired values.
>   (cxx_eval_constant_expression): Add checks for expired values. Forget
>   local variables at end of bind expressions. Forget temporaries at end
>   of cleanup points.
>   * cp-tree.h (struct lang_decl_base): New flag to track expired values
>   in constant evaluation.
>   (DECL_EXPIRED_P): Access the new flag.
>   (SET_DECL_EXPIRED_P): Modify the new flag.
>   * module.cc (trees_out::lang_decl_bools): Write out the new flag.
>   (trees_in::lang_decl_bools): Read in the new flag.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/constexpr-ice20.C: Update error raised by test.
>   * g++.dg/cpp1y/constexpr-lifetime1.C: New test.
>   * g++.dg/cpp1y/constexpr-lifetime2.C: New test.
>   * g++.dg/cpp1y/constexpr-lifetime3.C: New test.
>   * g++.dg/cpp1y/constexpr-lifetime4.C: New test.
>   * g++.dg/cpp1y/constexpr-lifetime5.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/constexpr.cc   | 69 +++
>  gcc/cp/cp-tree.h  | 10 ++-
>  gcc/cp/module.cc  |  2 +
>  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime1.C| 13 
>  .../g++.dg/cpp1y/constexpr-lifetime2.C| 20 ++
>  .../g++.dg/cpp1y/constexpr-lifetime3.C| 13 
>  .../g++.dg/cpp1y/constexpr-lifetime4.C| 11 +++
>  .../g++.dg/cpp1y/constexpr-lifetime5.C| 11 +++
>  9 files changed, 137 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index 3de60cfd0f8..bdbc12144a7 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -1185,10 +1185,22 @@ public:
>void put_value (tree t, tree v)
>{
>  bool already_in_map = values.put (t, v);
> +if (!already_in_map && DECL_P (t))
> +  {
> + if (!DECL_LANG_SPECIFIC (t))
> +   retrofit_lang_decl (t);
> + if (DECL_LANG_SPECIFIC (t))
> +   SET_DECL_EXPIRED_P (t, false);
> +  }

Since this new flag would be used only during constexpr evaluation,
could we instead use an on-the-side hash_set in constexpr_global_ctx for
tracking expired-ness?  That way we won't have to allocate a
DECL_LANG_SPECIFIC structure for decls that lack it, and won't have to
worry about the flag in other parts of the compiler.
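
As a rough standalone illustration of the side-table idea (not GCC code): the
sketch below uses `std::unordered_set` in place of GCC's `hash_set`, a plain
pointer in place of `tree`, and a hypothetical `EvalContext` class in place of
`constexpr_global_ctx`.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

using Decl = const void *;  // stand-in for GCC's `tree`

// Hypothetical evaluation context that tracks expired declarations in a
// side set instead of storing a flag on the declaration itself.
class EvalContext {
  std::unordered_map<Decl, int> values_;
  std::unordered_set<Decl> expired_;

public:
  void put_value(Decl d, int v) {
    values_[d] = v;
    expired_.erase(d);  // the object is (back) within its lifetime
  }

  void remove_value(Decl d) {
    expired_.insert(d);  // remember that the lifetime has ended
    values_.erase(d);
  }

  bool has_value(Decl d) const { return values_.count(d) != 0; }
  bool is_expired(Decl d) const { return expired_.count(d) != 0; }
};
```

Because the set lives in the context object, it disappears together with the
evaluation, and declarations that are never evaluated carry no extra state.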

>  if (!already_in_map && modifiable)
>modifiable->add (t);
>}
> -  void remove_value (tree t) { values.remove (t); }
> +  void remove_value (tree t)
> +  {
> +if (DECL_P (t) && DECL_LANG_SPECIFIC (t))
> +  SET_DECL_EXPIRED_P (t, true);
> +values.remove (t);
> +  }
>  };
>  
>  /* Helper class for constexpr_global_ctx.  In some cases we want to avoid
> @@ -3157,10 +3169,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
> for (tree save_expr : save_exprs)
>   ctx->global->remove_value (save_expr);
>  
> -   /* Remove the parms/result from the values map.  Is it worth
> -  bothering to do this when the map itself is only live for
> -  one constexpr evaluation?  If so, maybe also clear out
> -  other vars from call, maybe in BIND_EXPR handling?  */
> +   /* Remove the parms/result from the values map.  */
> ctx->global->remove_value (res);
> for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
>   ctx->global->remove_value (parm);
> @@ -5708,6 +5717,13 @@ non_const_var_error (location_t loc, tree r, bool 
> fundef_p)
>   inform (DECL_SOURCE_LOCATION (r), "allocated here");
>return;
>  }
> +  if (DECL_EXPIRED_P (r))
> +{
> +  if (constexpr_error (loc, fundef_p, "accessing object outside its "
> +"lifetime"))
> + inform (DECL_SOURCE_LOCATION (r), "declared here");
> +  return;
> +}
>if (!constexpr_error (loc, fundef_p, "the value of 

Re: Tiny phiprop compile time optimization

2023-06-23 Thread Richard Biener via Gcc-patches



> On 23.06.2023 at 18:10, Jan Hubicka via Gcc-patches wrote:
> 
> Hi,
> here is updated version with TODO_update_ssa_only_virtuals.
> bootstrapped/regtested x86_64-linux. OK?

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-ssa-phiprop.cc (propagate_with_phi): Compute post dominators on
>demand.
>(pass_phiprop::execute): Do not compute it here; return
>update_ssa_only_virtuals if something changed.
>(pass_data_phiprop): Remove TODO_update_ssa from todos.
>
> 
> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> index 8c9ce903472..b01ef4495c2 100644
> --- a/gcc/tree-ssa-phiprop.cc
> +++ b/gcc/tree-ssa-phiprop.cc
> @@ -340,6 +340,9 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> phiprop_d *phivn,
>   gimple *def_stmt;
>   tree vuse;
> 
> +  if (!dom_info_available_p (cfun, CDI_POST_DOMINATORS))
> +calculate_dominance_info (CDI_POST_DOMINATORS);
> +
>   /* Only replace loads in blocks that post-dominate the PHI node.  That
>  makes sure we don't end up speculating loads.  */
>   if (!dominated_by_p (CDI_POST_DOMINATORS,
> @@ -485,7 +488,7 @@ const pass_data pass_data_phiprop =
>   0, /* properties_provided */
>   0, /* properties_destroyed */
>   0, /* todo_flags_start */
> -  TODO_update_ssa, /* todo_flags_finish */
> +  0, /* todo_flags_finish */
> };
> 
> class pass_phiprop : public gimple_opt_pass
> @@ -513,7 +516,6 @@ pass_phiprop::execute (function *fun)
>   size_t n;
> 
>   calculate_dominance_info (CDI_DOMINATORS);
> -  calculate_dominance_info (CDI_POST_DOMINATORS);
> 
>   n = num_ssa_names;
>   phivn = XCNEWVEC (struct phiprop_d, n);
> @@ -539,7 +541,7 @@ pass_phiprop::execute (function *fun)
> 
>   free_dominance_info (CDI_POST_DOMINATORS);
> 
> -  return 0;
> +  return did_something ? TODO_update_ssa_only_virtuals : 0;
> }
> 
> } // anon namespace


Re: [PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 08:15, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * tree-ssa-alias.cc (call_may_clobber_ref_p_1): Add LEN_MASK_STORE.

Doesn't this need to extract/compute the size argument in a manner
similar to what DSE does?


Jeff


Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-06-23 Thread Andre Vieira (lists) via Gcc-patches

+  /* In order to find out if the loop is of type A or B above look for the
+ loop counter: it will either be incrementing by one per iteration or
+ it will be decrementing by num_of_lanes.  We can find the loop counter
+ in the condition at the end of the loop.  */
+  rtx_insn *loop_cond = prev_nonnote_nondebug_insn_bb (BB_END (body));
+  gcc_assert (cc_register (XEXP (PATTERN (loop_cond), 0), VOIDmode)
+ && GET_CODE (XEXP (PATTERN (loop_cond), 1)) == COMPARE);

Not sure this should be an assert. If we do encounter a differently 
formed loop, we should bail out of DLSTPing for now but we shouldn't ICE.



+  /* The loop latch has to be empty.  When compiling all the known MVE LoLs in
+ user applications, none of those with incrementing counters had any real
+ insns in the loop latch.  As such, this function has only been tested with
+ an empty latch and may misbehave or ICE if we somehow get here with an
+ increment in the latch, so, for sanity, error out early.  */
+  rtx_insn *dec_insn = BB_END (body->loop_father->latch);
+  if (NONDEBUG_INSN_P (dec_insn))
+gcc_unreachable ();

Similarly here I'd return false rather than gcc_unreachable ();


+  /* Find where both of those are modified in the loop body bb.  */
+  rtx condcount_reg_set = PATTERN (DF_REF_INSN (df_bb_regno_only_def_find
+(body, REGNO (condcount;
Put the = on a new line; that breaks it up more nicely.

+ counter_orig_set = XEXP (PATTERN
+   (DF_REF_INSN
+ (DF_REF_NEXT_REG
+   (DF_REG_DEF_CHAIN
+(REGNO
+  (XEXP (condcount_reg_set, 0)), 
1);

This makes me a bit nervous: can we be certain that the PATTERN of the
next insn that sets it is indeed a set?  Heck, can we even be sure
DF_REG_DEF_CHAIN returns non-null?  I can't imagine why not, but maybe
there are some constructs it can't follow up on.  It might be worth
checking these steps and bailing out.




+  /* When we find the vctp instruction: This may be followed by
+  a zero-extend insn to SImode.  If it is, then save the
+  zero-extended REG into vctp_vpr_generated.  If there is no
+  zero-extend, then store the raw output of the vctp.
+  For any VPT-predicated instructions we need to ensure that
+  the VPR they use is the same as the one given here and
+  they often consume the output of a subreg of the SImode
+  zero-extended VPR-reg.  As a result, comparing against the
+  output of the zero-extend is more likely to succeed.
+  This code also guarantees to us that the vctp comes before
+  any instructions that use the VPR within the loop, for the
+  dlstp/letp transform to succeed.  */

Wrong comment indent after first line.

+  rtx_insn *vctp_insn = arm_mve_get_loop_vctp (body);
+  if (!vctp_insn || !arm_mve_loop_valid_for_dlstp (body))
+return GEN_INT (1);

arm_mve_loop_valid_for_dlstp already calls arm_mve_get_loop_vctp.  Maybe
have 'arm_mve_loop_valid_for_dlstp' return vctp_insn or null to
indicate success or failure; that avoids looping through the BB again.


For the same reason I'd also pass vctp_insn down to 
'arm_mve_check_df_chain_back_for_implic_predic'.


+ if (GET_CODE (SET_SRC (single_set (next_use1))) == ZERO_EXTEND)
+   {
+ rtx_insn *next_use2 = NULL;

Are we sure single_set can never return 0 here? Maybe worth an extra 
check and bail out if it does?


+   /* If the insn pattern requires the use of the VPR value from the
+ vctp as an input parameter.  */
s/as an input parameter./as an input parameter for predication./

+ /* None of registers USE-d by the instruction need can be the VPR
+vctp_vpr_generated.  This blocks the optimisation if there any
+instructions that use the optimised-out VPR value in any way
+other than as a VPT block predicate.  */

Reword this slightly to be less complex:
This instruction USE-s the vctp_vpr_generated other than for 
predication, this blocks the transformation as we are not allowed to 
optimise the VPR value away.


Will continue reviewing next week :)

On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:

     Hi all,

     This is the 2/2 patch that contains the functional changes needed
     for MVE Tail Predicated Low Overhead Loops.  See my previous email
     for a general introduction of MVE LOLs.

     This support is added through the already existing loop-doloop
     mechanisms that are used for non-MVE dls/le looping.

     Mid-end changes are:

     1) Relax the loop-doloop mechanism in the mid-end to allow for
    decrement numbers other than -1 and for `count` to be an
    rtx containing a simple REG (which in this case will contain
    the number of elements to be processed), 

Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-06-23 Thread Patrick Palka via Gcc-patches
On Fri, 23 Jun 2023, Jason Merrill wrote:

> On 6/21/23 13:19, Patrick Palka wrote:
> > When stepping through the variable/alias template specialization code
> > paths, I noticed we perform template argument coercion twice: first from
> > instantiate_alias_template / finish_template_variable and again from
> > tsubst_decl (during instantiate_template).  It should suffice to perform
> > coercion once.
> > 
> > To that end this patch elides this second coercion from tsubst_decl when
> > possible.  We can't get rid of it completely because we don't always
> > specialize a variable template from finish_template_variable: we could
> > also be doing so directly from instantiate_template during variable
> > template partial specialization selection, in which case the coercion
> > from tsubst_decl would be the first and only coercion.
> 
> Perhaps we should be coercing in lookup_template_variable rather than
> finish_template_variable?

Ah yes, there's a patch for that at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)

> It looks like we currently get to
> most_specialized_partial_spec with args that haven't yet been coerced to match
> the primary template.

The call to most_specialized_partial_spec from instantiate_template?
I believe the arguments should already have been coerced by the
caller, which is presumably always finish_template_variable.

So in that patch I also made instantiate_template use
build2 (TEMPLATE_ID_EXPR, ...) directly instead of calling
lookup_template_variable, to avoid an unnecessary double coercion.

> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?  This reduces memory usage of range-v3's zip.cpp by ~0.5%.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (tsubst_decl) : Call
> > coerce_template_parms only if DECL_TEMPLATE_SPECIALIZATION
> > is set.
> > ---
> >   gcc/cp/pt.cc | 15 +++
> >   1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index be86051abad..dd10409ce18 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -15232,10 +15232,17 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > complain)
> > argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> > if (argvec != error_mark_node
> > && PRIMARY_TEMPLATE_P (gen_tmpl)
> > -   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec))
> > - /* We're fully specializing a template declaration, so
> > -we need to coerce the innermost arguments corresponding
> > to
> > -the template.  */
> > +   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec)
> > +   && DECL_TEMPLATE_SPECIALIZATION (t))
> > + /* We're fully specializing an alias or variable template,
> > so
> > +coerce the innermost arguments if necessary.  We expect
> > +instantiate_alias_template and finish_template_variable
> > to
> > +already have done this relative to the primary template,
> > in
> > +which case this coercion is unnecessary, but we can also
> > +get here when substituting a partial variable template
> > +specialization (directly from instantiate_template), in
> > +which case DECL_TEMPLATE_SPECIALIZATION is set and
> > coercion
> > +is necessary.  */
> >   argvec = (coerce_template_parms
> > (DECL_TEMPLATE_PARMS (gen_tmpl),
> >  argvec, tmpl, complain));
> 
> 



Re: [PATCH] c++: Report invalid id-expression in decltype [PR100482]

2023-06-23 Thread Patrick Palka via Gcc-patches
On Sun, 30 Apr 2023, Nathaniel Shead via Gcc-patches wrote:

> This patch ensures that any errors raised by finish_id_expression when
> parsing a decltype expression are properly reported, rather than
> potentially going ignored and causing invalid code to be accepted.
> 
> We can also now remove the separate check for templates without args as
> this is also checked for in finish_id_expression.
> 
>   PR 100482
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_decltype_expr): Report errors raised by
>   finish_id_expression.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/pr100482.C: New test.

LGTM.  Some minor comments about the new testcase below:

> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/parser.cc| 22 +++---
>  gcc/testsuite/g++.dg/pr100482.C | 11 +++
>  2 files changed, 22 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr100482.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index e5f032f2330..20ebcdc3cfd 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -16508,10 +16508,6 @@ cp_parser_decltype_expr (cp_parser *parser,
>   expr = cp_parser_lookup_name_simple (parser, expr,
>id_expr_start_token->location);
>  
> -  if (expr && TREE_CODE (expr) == TEMPLATE_DECL)
> - /* A template without args is not a complete id-expression.  */
> - expr = error_mark_node;
> -
>if (expr
>&& expr != error_mark_node
>&& TREE_CODE (expr) != TYPE_DECL
> @@ -16532,13 +16528,17 @@ cp_parser_decltype_expr (cp_parser *parser,
> _msg,
>  id_expr_start_token->location));
>  
> -  if (expr == error_mark_node)
> -/* We found an id-expression, but it was something that we
> -   should not have found. This is an error, not something
> -   we can recover from, so note that we found an
> -   id-expression and we'll recover as gracefully as
> -   possible.  */
> -id_expression_or_member_access_p = true;
> +   if (error_msg)
> + {
> +   /* We found an id-expression, but it was something that we
> +  should not have found. This is an error, not something
> +  we can recover from, so report the error we found and
> +  we'll recover as gracefully as possible.  */
> +   cp_parser_parse_definitely (parser);
> +   cp_parser_error (parser, error_msg);
> +   id_expression_or_member_access_p = true;
> +   return error_mark_node;
> + }
>  }
>  
>if (expr
> diff --git a/gcc/testsuite/g++.dg/pr100482.C b/gcc/testsuite/g++.dg/pr100482.C
> new file mode 100644
> index 000..dcf6722fda5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr100482.C

We generally prefer to organize tests according to the language dialect
they apply to and the language construct that they're primarily testing.
In this case we could name the test e.g.

  gcc/testsuite/g++.dg/cpp0x/decltype-100482.C

> @@ -0,0 +1,11 @@
> +// { dg-do compile { target c++10 } }

We also usually mention the PR number in the test as a comment:

// PR c++/100482

One benefit of doing so is that the git alias 'git gcc-commit-mklog'
(https://gcc.gnu.org/gitwrite.html#vendor) will then automatically
include the PR number in the commit message template.

> +
> +namespace N {}
> +decltype(std) x;   // { dg-error "expected primary-expression" }
> +
> +struct S {};
> +decltype(S) y;  // { dg-error "argument to .decltype. must be an expression" 
> }
> +
> +template 
> +struct U {};
> +decltype(U) z;  // { dg-error "missing template arguments" }
> -- 
> 2.40.0
> 
> 



Re: Tiny phiprop compile time optimization

2023-06-23 Thread Jan Hubicka via Gcc-patches
Hi,
here is updated version with TODO_update_ssa_only_virtuals.
bootstrapped/regtested x86_64-linux. OK?

gcc/ChangeLog:

* tree-ssa-phiprop.cc (propagate_with_phi): Compute post dominators on
demand.
(pass_phiprop::execute): Do not compute it here; return
update_ssa_only_virtuals if something changed.
(pass_data_phiprop): Remove TODO_update_ssa from todos.


diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 8c9ce903472..b01ef4495c2 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -340,6 +340,9 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
   gimple *def_stmt;
   tree vuse;
 
+  if (!dom_info_available_p (cfun, CDI_POST_DOMINATORS))
+   calculate_dominance_info (CDI_POST_DOMINATORS);
+
   /* Only replace loads in blocks that post-dominate the PHI node.  That
  makes sure we don't end up speculating loads.  */
   if (!dominated_by_p (CDI_POST_DOMINATORS,
@@ -485,7 +488,7 @@ const pass_data pass_data_phiprop =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  0, /* todo_flags_finish */
 };
 
 class pass_phiprop : public gimple_opt_pass
@@ -513,7 +516,6 @@ pass_phiprop::execute (function *fun)
   size_t n;
 
   calculate_dominance_info (CDI_DOMINATORS);
-  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   n = num_ssa_names;
   phivn = XCNEWVEC (struct phiprop_d, n);
@@ -539,7 +541,7 @@ pass_phiprop::execute (function *fun)
 
   free_dominance_info (CDI_POST_DOMINATORS);
 
-  return 0;
+  return did_something ? TODO_update_ssa_only_virtuals : 0;
 }
 
 } // anon namespace
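
The on-demand computation in this patch can be pictured with the toy pass
below. It is only a sketch: `PostDomInfo`, `Pass` and `kUpdateVirtuals` are
hypothetical stand-ins and do not use GCC's dominance or pass-manager APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an expensive whole-function analysis.
struct PostDomInfo {
  std::vector<int> ipdom;  // toy data: one entry per basic block
};

constexpr unsigned kUpdateVirtuals = 1u;  // toy TODO flag

class Pass {
  std::optional<PostDomInfo> postdom_;  // empty until first needed
  int computations_ = 0;

  // Computes the analysis the first time it is requested, then reuses it.
  const PostDomInfo &postdom(std::size_t num_blocks) {
    if (!postdom_) {
      ++computations_;
      postdom_ = PostDomInfo{std::vector<int>(num_blocks, -1)};
    }
    return *postdom_;
  }

public:
  // Returns a TODO flag only if some candidate block was actually handled.
  unsigned execute(const std::vector<bool> &candidate_blocks) {
    bool did_something = false;
    for (std::size_t bb = 0; bb < candidate_blocks.size(); ++bb) {
      if (!candidate_blocks[bb])
        continue;  // nothing to do here, so the analysis is not needed
      postdom(candidate_blocks.size());
      did_something = true;
    }
    postdom_.reset();  // free the analysis at the end of the pass
    return did_something ? kUpdateVirtuals : 0u;
  }

  int computations() const { return computations_; }
};
```

A function with no candidates never pays for the analysis and requests no
follow-up work, which is the compile-time saving the patch is after.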


Re: PING: Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/22/23 11:50, Marek Polacek wrote:

On Wed, Jun 21, 2023 at 04:44:00PM -0400, David Malcolm via Gcc-patches wrote:

I'd like to ping this C++ FE patch for review:
 https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621779.html


Not an approval, but LGTM, though some nits below:
  

On Wed, 2023-06-14 at 20:28 -0400, David Malcolm wrote:

PR c++/110164 notes that in cases where we have a forward decl
of a std library type such as:

std::array x;

we emit this diagnostic:

error: aggregate ‘std::array x’ has incomplete type and cannot be 
defined

This patch adds this hint to the diagnostic:

note: ‘std::array’ is defined in header ‘’; this is probably fixable by adding 
‘#include ’
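
A minimal sketch of the fixed program the hint points to. The element type and
size are assumptions made for illustration, since the template arguments are
not shown in the declaration above.

```cpp
#include <array>    // supplies the complete definition of std::array
#include <cassert>

// With the header included, the type is complete and the variable can be
// defined; the arguments <int, 10> are illustrative only.
std::array<int, 10> x{};
```

Without the `#include <array>` line, a translation unit that only sees a
forward declaration of `std::array` would hit the incomplete-type error quoted
above.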

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/cp/ChangeLog:
 PR c++/110164
 * cp-name-hint.h (maybe_suggest_missing_header): New decl.
 * decl.cc: Define INCLUDE_MEMORY.  Add include of
 "cp/cp-name-hint.h".
 (start_decl_1): Call maybe_suggest_missing_header.
 * name-lookup.cc (maybe_suggest_missing_header): Remove "static".

gcc/testsuite/ChangeLog:
 PR c++/110164
 * g++.dg/missing-header-pr110164.C: New test.
---
  gcc/cp/cp-name-hint.h  |  3 +++
  gcc/cp/decl.cc | 10 ++
  gcc/cp/name-lookup.cc  |  2 +-
  gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
  4 files changed, 24 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C

diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
index bfa7c53c8f6..e2387e23d1f 100644
--- a/gcc/cp/cp-name-hint.h
+++ b/gcc/cp/cp-name-hint.h
@@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
  
  extern name_hint suggest_alternatives_for (location_t, tree, bool);

  extern name_hint suggest_alternatives_in_other_namespaces (location_t, tree);
+extern name_hint maybe_suggest_missing_header (location_t location,
+  tree name,
+  tree scope);


The enclosing decls omit the parameter names; if you do that, it may
fit on one line.


  extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
tree);
  extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
  
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc

index a672e4844f1..504b08ec250 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
     line numbers.  For example, the CONST_DECLs for enum values.  */
  
  #include "config.h"

+#define INCLUDE_MEMORY
  #include "system.h"
  #include "coretypes.h"
  #include "target.h"
@@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "c-family/c-objc.h"
  #include "c-family/c-pragma.h"
  #include "c-family/c-ubsan.h"
+#include "cp/cp-name-hint.h"
  #include "debug.h"
  #include "plugin.h"
  #include "builtins.h"
@@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
 ;   /* An auto type is ok.  */
    else if (TREE_CODE (type) != ARRAY_TYPE)
 {
+ auto_diagnostic_group d;
   error ("variable %q#D has initializer but incomplete type", decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   TYPE_CONTEXT (type));


Maybe CP_TYPE_CONTEXT?


   type = TREE_TYPE (decl) = error_mark_node;
 }
    else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type))))
@@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
 gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
    else
 {
+ auto_diagnostic_group d;
   error ("aggregate %q#D has incomplete type and cannot be defined",
  decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   TYPE_CONTEXT (type));


Here as well.


   /* Change the type so that assemble_variable will give
  DECL an rtl we can live with: (mem (const_int 0)).  */
   type = TREE_TYPE (decl) = error_mark_node;
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 6ac58a35b56..917b481c163 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
tree name)
     for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
     applicable.  */
  
-static name_hint

+name_hint
  maybe_suggest_missing_header (location_t location, tree name, tree scope)
  {
    if (scope == NULL_TREE)
diff --git a/gcc/testsuite/g++.dg/missing-header-pr110164.C 
b/gcc/testsuite/g++.dg/missing-header-pr110164.C
new file mode 100644
index 000..15980071c38

Re: [PATCH] c++: Fix ICE with parameter pack of decltype(auto) [PR103497]

2023-06-23 Thread Patrick Palka via Gcc-patches
Hi,

On Sat, 22 Apr 2023, Nathaniel Shead via Gcc-patches wrote:

> Bootstrapped and tested on x86_64-pc-linux-gnu.
> 
> -- 8< --
> 
> This patch raises an error early when the decltype(auto) specifier is
> used as a parameter of a function. This prevents any issues with an
> unexpected tree type later on when performing the call.

Thanks very much for the patch!  Some minor comments below.

> 
>   PR 103497

We should include the bug component name when referring to the PR in the
commit message (i.e. PR c++/103497) so that upon pushing the patch the
post-commit hook automatically adds a comment to the PR referring to the
commit.  I could be wrong but AFAIK the hook only performs this when the
component name is included.

> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_simple_type_specifier): Add check for
>   decltype(auto) as function parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/pr103497.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/parser.cc| 10 ++
>  gcc/testsuite/g++.dg/pr103497.C |  7 +++
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr103497.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index e5f032f2330..1415e07e152 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -19884,6 +19884,16 @@ cp_parser_simple_type_specifier (cp_parser* parser,
>&& cp_lexer_peek_nth_token (parser->lexer, 2)->type != CPP_SCOPE)
>  {
>type = saved_checks_value (token->u.tree_check_value);
> +  /* Within a function parameter declaration, decltype(auto) is always an
> +  error.  */
> +  if (parser->auto_is_implicit_function_template_parm_p
> +   && TREE_CODE (type) == TEMPLATE_TYPE_PARM

We could check is_auto (type) here instead, to avoid any confusion with
checking AUTO_IS_DECLTYPE for a non-auto TEMPLATE_TYPE_PARM.

> +   && AUTO_IS_DECLTYPE (type))
> + {
> +   error_at (token->location,
> + "cannot declare a parameter with %<decltype(auto)%>");
> +   type = error_mark_node;
> + }
>if (decl_specs)
>   {
> cp_parser_set_decl_spec_type (decl_specs, type,
> diff --git a/gcc/testsuite/g++.dg/pr103497.C b/gcc/testsuite/g++.dg/pr103497.C
> new file mode 100644
> index 000..bcd421c2907
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr103497.C
> @@ -0,0 +1,7 @@
> +// { dg-do compile { target c++14 } }
> +
> +void foo(decltype(auto)... args);  // { dg-error "parameter with 
> .decltype.auto..|no parameter packs" }

I noticed for

  void foo(decltype(auto) arg);

we already issue an identical error from grokdeclarator.  Perhaps we could
instead extend the error handling there to detect decltype(auto)... as well,
rather than adding new error handling in cp_parser_simple_type_specifier?
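
For readers less familiar with the specifier, here is a minimal sketch (not part of the patch) of where decltype(auto) is accepted, namely as a deduced return type. The function names are hypothetical; the rejected parameter forms from the discussion appear only in comments so that the block compiles.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// As a return type, decltype(auto) keeps the value category of the
// returned expression: (x) is an lvalue expression, so this returns int&.
decltype(auto) as_ref (int &x) { return (x); }

// Without the parentheses the declared type of x is used; x is a
// by-value parameter of type int, so this returns int.
decltype(auto) as_copy (int x) { return x; }

static_assert (std::is_same<decltype (as_ref (std::declval<int &> ())),
			    int &>::value, "as_ref returns a reference");
static_assert (std::is_same<decltype (as_copy (0)), int>::value,
	       "as_copy returns by value");

// Ill-formed; these are the declarations the diagnostics under
// discussion are meant to reject:
//   void foo (decltype(auto) arg);
//   void foo (decltype(auto)... args);
```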

> +
> +int main() {
> +  foo();
> +}
> -- 
> 2.34.1
> 
> 



Re: [PATCH] text-art: remove explicit #include of C++ standard library headers

2023-06-23 Thread Alex Coplan via Gcc-patches
Hi Dave,

On 23/06/2023 10:36, David Malcolm wrote:
> On Fri, 2023-06-23 at 12:52 +0100, Alex Coplan wrote:
> > Hi David,
> > 
> > It looks like this patch breaks bootstrap on Darwin. I tried a
> > bootstrap on
> > x86_64-apple-darwin and got errors building selftest-run-tests.cc:
> > 
> > In file included from
> > /Users/alecop01/toolchain/src/gcc/gcc/selftest-run-tests.cc:31:
> > In file included from
> > /Users/alecop01/toolchain/src/gcc/gcc/text-art/selftests.h:25:
> > In file included from
> > /Users/alecop01/toolchain/src/gcc/gcc/text-art/types.h:26:
> > In file included from
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:276:
> > In file included from
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__bit_reference:15:
> > In file included from
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/algorithm:653:
> > In file included from
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/memory:670:
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/typeinfo:377:5: error:
> > no member named 'fancy_abort' in namespace 'std::__1'; did you mean
> > simply 'fancy_abort'?
> > _VSTD::abort();
> > ^~~
> > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> > Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:856:15: note:
> > expanded from macro '_VSTD'
> > #define _VSTD std::_LIBCPP_ABI_NAMESPACE
> >   ^
> > /Users/alecop01/toolchain/src/gcc/gcc/system.h:811:13: note:
> > 'fancy_abort' declared here
> > extern void fancy_abort (const char *, int, const char *)
> > ^
> > 
> > Please could you take a look?
> > 
> > Thanks,
> > Alex
> 
> Sorry about the breakage.
> 
> Does the following patch fix it for you?
> (only tested lightly so far, on x86_64-pc-linux-gnu)

Thanks for the fix! I can confirm this fixes bootstrap on
x86_64-apple-darwin for me.

Cheers,
Alex

> 
> 
> Dave
> 
> 
> 
> gcc/analyzer/ChangeLog:
>   * access-diagram.cc: Add #define INCLUDE_VECTOR.
>   * bounds-checking.cc: Likewise.
> 
> gcc/ChangeLog:
>   * diagnostic-format-sarif.cc: Add #define INCLUDE_VECTOR.
>   * diagnostic.cc: Likewise.
>   * text-art/box-drawing.cc: Likewise.
>   * text-art/canvas.cc: Likewise.
>   * text-art/ruler.cc: Likewise.
>   * text-art/selftests.cc: Likewise.
>   * text-art/selftests.h (text_art::canvas): New forward decl.
>   * text-art/style.cc: Add #define INCLUDE_VECTOR.
>   * text-art/styled-string.cc: Likewise.
>   * text-art/table.cc: Likewise.
>   * text-art/table.h: Remove #include <vector>.
>   * text-art/theme.cc: Add #define INCLUDE_VECTOR.
>   * text-art/types.h: Remove #include of <vector> and <string>.
>   * text-art/widget.cc: Add #define INCLUDE_VECTOR.
>   * text-art/widget.h: Remove #include <vector>.
> ---
>  gcc/analyzer/access-diagram.cc  | 1 +
>  gcc/analyzer/bounds-checking.cc | 1 +
>  gcc/diagnostic-format-sarif.cc  | 1 +
>  gcc/diagnostic.cc   | 1 +
>  gcc/text-art/box-drawing.cc | 1 +
>  gcc/text-art/canvas.cc  | 1 +
>  gcc/text-art/ruler.cc   | 1 +
>  gcc/text-art/selftests.cc   | 1 +
>  gcc/text-art/selftests.h| 4 +++-
>  gcc/text-art/style.cc   | 1 +
>  gcc/text-art/styled-string.cc   | 1 +
>  gcc/text-art/table.cc   | 1 +
>  gcc/text-art/table.h| 1 -
>  gcc/text-art/theme.cc   | 1 +
>  gcc/text-art/types.h| 2 --
>  gcc/text-art/widget.cc  | 1 +
>  gcc/text-art/widget.h   | 1 -
>  17 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
> index 968ff50a0b7..467c9bdd734 100644
> --- a/gcc/analyzer/access-diagram.cc
> +++ b/gcc/analyzer/access-diagram.cc
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define INCLUDE_MEMORY
>  #define INCLUDE_MAP
>  #define INCLUDE_SET
> +#define INCLUDE_VECTOR
>  #include "system.h"
>  #include "coretypes.h"
>  #include "coretypes.h"
> diff --git a/gcc/analyzer/bounds-checking.cc b/gcc/analyzer/bounds-checking.cc
> index 10632d12562..5e8de9a7aa5 100644
> --- a/gcc/analyzer/bounds-checking.cc
> +++ b/gcc/analyzer/bounds-checking.cc
> @@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #include "config.h"
>  #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR
>  #include "system.h"
>  #include "coretypes.h"
>  #include "make-unique.h"
> diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
> index ac2f5b844e3..5e483988027 100644
> --- a/gcc/diagnostic-format-sarif.cc
> +++ b/gcc/diagnostic-format-sarif.cc
> @@ -20,6 +20,7 @@ along with GCC; see 

Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/21/23 13:19, Patrick Palka wrote:

When stepping through the variable/alias template specialization code
paths, I noticed we perform template argument coercion twice: first from
instantiate_alias_template / finish_template_variable and again from
tsubst_decl (during instantiate_template).  It should suffice to perform
coercion once.

To that end this patch elides this second coercion from tsubst_decl when
possible.  We can't get rid of it completely because we don't always
specialize a variable template from finish_template_variable: we could
also be doing so directly from instantiate_template during variable
template partial specialization selection, in which case the coercion
from tsubst_decl would be the first and only coercion.
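
As a rough user-level illustration of what coercion means here (a sketch only, with hypothetical names, and no claim about how pt.cc represents it): differently written argument lists for a variable template are matched against the primary template's parameters, with defaults filled in and non-type arguments converted, and end up naming one specialization.

```cpp
#include <cassert>

// A variable template with a defaulted non-type parameter of type long.
template <typename T, long N = 3>
constexpr long scaled = N * static_cast<long> (sizeof (T));

// Three spellings of the same specialization:
//   scaled<char>      default argument for N filled in
//   scaled<char, 3>   int literal converted to long
//   scaled<char, 3L>  already in the parameter's type
inline const long *addr_default () { return &scaled<char>; }
inline const long *addr_int () { return &scaled<char, 3>; }
inline const long *addr_long () { return &scaled<char, 3L>; }
```

All three functions return the address of the same object, which is why coercing the arguments once should be enough.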


Perhaps we should be coercing in lookup_template_variable rather than 
finish_template_variable?  It looks like we currently get to 
most_specialized_partial_spec with args that haven't yet been coerced to 
match the primary template.



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  This reduces memory usage of range-v3's zip.cpp by ~0.5%.

gcc/cp/ChangeLog:

* pt.cc (tsubst_decl) : Call
coerce_template_parms only if DECL_TEMPLATE_SPECIALIZATION
is set.
---
  gcc/cp/pt.cc | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index be86051abad..dd10409ce18 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -15232,10 +15232,17 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
complain)
argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
if (argvec != error_mark_node
&& PRIMARY_TEMPLATE_P (gen_tmpl)
-   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec))
- /* We're fully specializing a template declaration, so
-we need to coerce the innermost arguments corresponding to
-the template.  */
+   && TMPL_ARGS_DEPTH (args) >= TMPL_ARGS_DEPTH (argvec)
+   && DECL_TEMPLATE_SPECIALIZATION (t))
+ /* We're fully specializing an alias or variable template, so
+coerce the innermost arguments if necessary.  We expect
+instantiate_alias_template and finish_template_variable to
+already have done this relative to the primary template, in
+which case this coercion is unnecessary, but we can also
+get here when substituting a partial variable template
+specialization (directly from instantiate_template), in
+which case DECL_TEMPLATE_SPECIALIZATION is set and coercion
+is necessary.  */
  argvec = (coerce_template_parms
(DECL_TEMPLATE_PARMS (gen_tmpl),
 argvec, tmpl, complain));




Re: [PATCH] c++: Add support for -std={c,gnu}++2{c,6}

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/22/23 20:25, Marek Polacek wrote:

It seems prudent to add C++26 now that the first C++26 papers have been
approved.  I followed commit r11-6920 as well as r8-3237.

I was puzzled to see that -std=c++23 was marked Undocumented but
-std=c++2b wasn't.  I think it should be the other way round, like
the earlier modes.


I was leaving -std=c++23 undocumented until C++23 was finalized, which 
it now effectively is, so this change is good.  Speaking of which, it 
looks like the final value for __cplusplus for C++23 is 202302L.


But similarly, I'd like to leave -std=c++26 undocumented for now.


As for __cplusplus, I've arbitrarily chosen 202600L:


Our previous convention has been the year after the previous standard, 
so let's use 202400L.
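
A small sketch of that convention, assuming a hypothetical helper name (placeholder_after): the placeholder is the year after the previous standard with the month set to 00, so it compares greater than the finalized C++23 value mentioned above.

```cpp
#include <cassert>

// Finalized value for C++23, as stated earlier in this mail.
constexpr long cxx23_value = 202302L;

// Placeholder for a not-yet-finalized standard: the year after the
// previous standard, with month 00.
constexpr long
placeholder_after (int previous_standard_year)
{
  return (previous_standard_year + 1) * 100L;
}

static_assert (placeholder_after (2023) == 202400L,
	       "placeholder for the standard after C++23");
static_assert (placeholder_after (2023) > cxx23_value,
	       "the placeholder sorts after the C++23 value");

// User code can then test for a post-C++23 mode without knowing the
// final value.
constexpr bool
is_post_cxx23 (long cplusplus_value)
{
  return cplusplus_value > cxx23_value;
}
```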



   $ xg++ -std=c++26 -dM -E -x c++ - < /dev/null | grep cplusplus
   #define __cplusplus 202600L

I've verified the patch with a simple test, exercising the new
directives.  Don't forget to update your GXX_TESTSUITE_STDS!

This patch does not add -Wc++26-extensions.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

* c-common.h (cxx_dialect): Add cxx26 as a dialect.
* c-opts.cc (set_std_cxx26): New.
(c_common_handle_option): Set options when -std={c,gnu}++2{c,6} is
enabled.
(c_common_post_options): Adjust comments.
* c.opt: Add options for -std=c++26, -std=c++2c, -std=gnu++26,
and -std=gnu++2c.
(std=c++2b): Mark as Undocumented.
(std=c++23): No longer Undocumented.

gcc/ChangeLog:

* doc/cpp.texi (__cplusplus): Document value for -std=c++26 and
-std=gnu++26.
* doc/invoke.texi: Document -std=c++26 and -std=gnu++26.
* dwarf2out.cc (highest_c_language): Handle GNU C++26.
(gen_compile_unit_die): Likewise.

libcpp/ChangeLog:

* include/cpplib.h (c_lang): Add CXX26 and GNUCXX26.
* init.cc (lang_defaults): Add rows for CXX26 and GNUCXX26.
(cpp_init_builtins): Set __cplusplus to 202600L for C++26.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_c++23): Return
1 also if check_effective_target_c++26.
(check_effective_target_c++23_down): New.
(check_effective_target_c++26_only): New.
(check_effective_target_c++26): New.
* g++.dg/cpp26/cplusplus.C: New test.
---
  gcc/c-family/c-common.h|  4 +++-
  gcc/c-family/c-opts.cc | 28 +++-
  gcc/c-family/c.opt | 24 +
  gcc/doc/cpp.texi   |  5 +
  gcc/doc/invoke.texi| 12 +++
  gcc/dwarf2out.cc   |  5 -
  gcc/testsuite/g++.dg/cpp26/cplusplus.C |  5 +
  gcc/testsuite/lib/target-supports.exp  | 30 +-
  libcpp/include/cpplib.h|  2 +-
  libcpp/init.cc | 13 ---
  10 files changed, 116 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp26/cplusplus.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 336a09f4a40..b5ef5ff6b2c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -740,7 +740,9 @@ enum cxx_dialect {
/* C++20 */
cxx20,
/* C++23 */
-  cxx23
+  cxx23,
+  /* C++26 */
+  cxx26
  };
  
  /* The C++ dialect being used. C++98 is the default.  */

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index c68a2a27469..af19140e382 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -111,6 +111,7 @@ static void set_std_cxx14 (int);
  static void set_std_cxx17 (int);
  static void set_std_cxx20 (int);
  static void set_std_cxx23 (int);
+static void set_std_cxx26 (int);
  static void set_std_c89 (int, int);
  static void set_std_c99 (int);
  static void set_std_c11 (int);
@@ -663,6 +664,12 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
set_std_cxx23 (code == OPT_std_c__23 /* ISO */);
break;
  
+case OPT_std_c__26:

+case OPT_std_gnu__26:
+  if (!preprocessing_asm_p)
+   set_std_cxx26 (code == OPT_std_c__26 /* ISO */);
+  break;
+
  case OPT_std_c90:
  case OPT_std_iso9899_199409:
if (!preprocessing_asm_p)
@@ -1032,7 +1039,8 @@ c_common_post_options (const char **pfilename)
warn_narrowing = 1;
  
/* Unless -f{,no-}ext-numeric-literals has been used explicitly,

-for -std=c++{11,14,17,20,23} default to -fno-ext-numeric-literals.  */
+for -std=c++{11,14,17,20,23,26} default to
+-fno-ext-numeric-literals.  */
if (flag_iso && !OPTION_SET_P (flag_ext_numeric_literals))
cpp_opts->ext_numeric_literals = 0;
  }
@@ -1820,6 +1828,24 @@ set_std_cxx23 (int iso)
lang_hooks.name = "GNU C++23";
  }
  
+/* Set the C++ 2026 standard (without GNU extensions if ISO).  */

+static void
+set_std_cxx26 (int iso)
+{
+  cpp_set_lang 

Re: [PATCH] IVOPTS: Add LEN_MASK_{LOAD, STORE} into 'get_alias_ptr_type_for_ptr_address'

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 08:21, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Add 
LEN_MASK_{LOAD,STORE}.

OK
jeff


Re: [PATCH] DSE: Add LEN_MASK_STORE analysis into DSE

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 08:48, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE.
 (dse_optimize_stmt): Ditto.

---
  gcc/tree-ssa-dse.cc | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 3c7a2e9992d..01b0951f1a9 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -174,6 +174,23 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, 
bool may_def_ok = false)
  return true;
}
  break;
+ case IFN_LEN_MASK_STORE: {
+   /* We cannot initialize a must-def ao_ref (in all cases) but we
+  can provide a may-def variant.  */
+   if (may_def_ok)
+ {
+   tree len_size
+ = int_const_binop (MINUS_EXPR, gimple_call_arg (stmt, 2),
+gimple_call_arg (stmt, 5));
+   tree mask_size
+ = TYPE_SIZE_UNIT (TREE_TYPE (gimple_call_arg (stmt, 4)));
+   tree size = int_const_binop (MAX_EXPR, len_size, mask_size);
+   ao_ref_init_from_ptr_and_size (write, gimple_call_arg (stmt, 0),
+  size);
So isn't len_size here the size in elements?  If so, don't you need to 
multiply len_size by the element size?
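
To make the question concrete, a plain-arithmetic sketch (not GCC code; the helper name and parameters are hypothetical) of the scaling being asked about: if the length operand counts elements, the byte extent is that count times the element size. The subtraction mirrors the MINUS_EXPR in the quoted hunk.

```cpp
#include <cassert>
#include <cstdint>

// Bytes covered by a length-controlled store, assuming LEN and BIAS are
// element counts and ELEM_SIZE is the size of one element in bytes.
constexpr std::uint64_t
len_store_bytes (std::uint64_t len, std::uint64_t bias,
		 std::uint64_t elem_size)
{
  return (len - bias) * elem_size;
}

// Eight 4-byte elements cover 32 bytes, not 8.
static_assert (len_store_bytes (8, 0, 4) == 32, "scaled by element size");
```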


Jeff


Re: [PATCH 1/6] Avoid shorten_binary_op on VECTOR_TYPE

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 02:26, Richard Biener via Gcc-patches wrote:

When we disallow TYPE_PRECISION on VECTOR_TYPEs it shows that
shorten_binary_op performs some checks on that that are likely
harmless in the end.  The following bails out early for
VECTOR_TYPE operations to avoid those questionable checks.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

OK?

gcc/c-family/
* c-common.cc (shorten_binary_op): Exit early for VECTOR_TYPE
operations.

OK.  Oh how I want shorten_binary_op to go away :(

jeff


Re: [PATCH 2/6] Fix TYPE_PRECISION use in hashable_expr_equal_p

2023-06-23 Thread Jeff Law via Gcc-patches




On 6/23/23 02:27, Richard Biener via Gcc-patches wrote:

While the checks look unnecessary they probably are quick and
thus done early.  The following avoids using TYPE_PRECISION
on VECTOR_TYPEs by making the code match the comment which
talks about precision and signedness.  An alternative would
be to only retain the ERROR_MARK and TYPE_MODE checks or
use TYPE_PRECISION_RAW (but I like that least).

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, OK?

* tree-ssa-scopedtables.cc (hashable_expr_equal_p):
Use element_precision.

OK.
jeff


[PATCH] DSE: Add LEN_MASK_STORE analysis into DSE

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Add LEN_MASK_STORE.
(dse_optimize_stmt): Ditto.

---
 gcc/tree-ssa-dse.cc | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 3c7a2e9992d..01b0951f1a9 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -174,6 +174,23 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, 
bool may_def_ok = false)
  return true;
}
  break;
+ case IFN_LEN_MASK_STORE: {
+   /* We cannot initialize a must-def ao_ref (in all cases) but we
+  can provide a may-def variant.  */
+   if (may_def_ok)
+ {
+   tree len_size
+ = int_const_binop (MINUS_EXPR, gimple_call_arg (stmt, 2),
+gimple_call_arg (stmt, 5));
+   tree mask_size
+ = TYPE_SIZE_UNIT (TREE_TYPE (gimple_call_arg (stmt, 4)));
+   tree size = int_const_binop (MAX_EXPR, len_size, mask_size);
+   ao_ref_init_from_ptr_and_size (write, gimple_call_arg (stmt, 0),
+  size);
+   return true;
+ }
+   break;
+ }
default:;
}
 }
@@ -1502,6 +1519,7 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator 
*gsi, sbitmap live_bytes)
{
case IFN_LEN_STORE:
case IFN_MASK_STORE:
+   case IFN_LEN_MASK_STORE:
  {
enum dse_store_status store_status;
store_status = dse_classify_store (&ref, stmt, false, live_bytes);
-- 
2.36.3



Re: [PATCH zero-call-used-regs] Add leafy mode for zero-call-used-regs

2023-06-23 Thread Qing Zhao via Gcc-patches
Hi, Alexandre,


> On Jun 21, 2023, at 9:16 PM, Alexandre Oliva  wrote:
> 
> Hello, Qing,
> 
> On Jun 16, 2023, Qing Zhao  wrote:
> 
>> As I mentioned in the previous round of review, I think that the 
>> documentation
>> might need to add more details on what’s the LEAFY mode,
>> The purpose of it, and how to use it, provide more details to the end-users.
> 
> I'm afraid I'm having difficulty picturing what it is that you're
> looking for.  The proposal incorporates, by reference, all the
> documentation for 'used' and for 'all', and the way to use it is no
> different.
> 
>>> +Same as @samp{used} in a leaf function, and same as @samp{all} in a
>>> +nonleaf function.

Oh, yeah.  The definition of “leafy” is here. -:)

It would be better to add this definition earlier, in the list of the “three
basic values”, making it “four basic values”, like the following:

===
In order to satisfy users with different security needs and control the
run-time overhead at the same time, the @var{choice} parameter provides a
flexible way to choose the subset of the call-used registers to be zeroed.
The four basic values of @var{choice} are:

@itemize @bullet
@item
@samp{skip} doesn't zero any call-used registers.

@item
@samp{used} only zeros call-used registers that are used in the function.
A ``used'' register is one whose content has been set or referenced in
the function.

@item
@samp{all} zeros all call-used registers.

@item
@samp{leafy} is the same as @samp{used} in a leaf function, and the same as
@samp{all} in a nonleaf function.  This value is mainly to provide users a
more efficient mode to zero call-used registers in leaf functions.
@end itemize
==

Then, in the full list of choices, add the new values leafy, leafy-gpr,
leafy-arg and leafy-gpr-arg.

The sentence “This value is mainly to provide users a more efficient mode to
zero call-used registers in leaf functions.” is just for your reference; the
wording can certainly be improved.  -:)
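
The proposed definition can also be read as a small decision rule. The following is only a model of the documented behaviour, with hypothetical names; it is not how the compiler implements the option.

```cpp
#include <cassert>

// The four basic values of the choice parameter described above.
enum class zero_choice { skip, used, all, leafy };

// Resolve leafy to one of the other basic values: used in a leaf
// function, all in a nonleaf function.  Other values pass through.
constexpr zero_choice
resolve (zero_choice choice, bool is_leaf)
{
  return choice != zero_choice::leafy
	 ? choice
	 : (is_leaf ? zero_choice::used : zero_choice::all);
}

static_assert (resolve (zero_choice::leafy, true) == zero_choice::used,
	       "leaf functions behave like used");
static_assert (resolve (zero_choice::leafy, false) == zero_choice::all,
	       "nonleaf functions behave like all");
```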
> 
> If there was documentation on how to choose between e.g. all and used, I
> suppose I could build on that to add this intermediate choice, but...  I
> can't find any such docs, and I'm uncertain on whether adding that would
> be useful to begin with.
> 
> Did you have something else in mind?


Hope this time I am clear (and sorry for the confusion in the previous emails).

thanks.

Qing

> 
> -- 
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 



Re: [PATCH v5 5/5] c++modules: report module mapper files as a dependency

2023-06-23 Thread Jason Merrill via Gcc-patches

On 1/25/23 16:06, Ben Boeckel wrote:

It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.


Hmm, this seems a bit like making all .o depend on the Makefile; it 
shouldn't be necessary to rebuild all TUs that use modules when we add 
another module to the mapper file.  What is your expected use case for 
this dependency?



gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

Signed-off-by: Ben Boeckel 
---
  gcc/cp/mapper-client.cc |  4 
  gcc/cp/mapper-client.h  |  1 +
  gcc/cp/module.cc| 18 +-
  3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..0ce5679d659 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "diagnostic-core.h"
  #include "mapper-client.h"
  #include "intl.h"
+#include "mkdeps.h"
  
  #include "../../c++tools/resolver.h"
  
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name,
  
  module_client *

  module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
  {
@@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
  
  public:

static module_client *open_module_client (location_t loc, const char 
*option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index dbd1b721616..37066bf072b 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec<tree, va_gc> 
*partial_specializations;
  /* Our module mapper (created lazily).  */
  module_client *mapper;
  
-static module_client *make_mapper (location_t loc);

-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
  {
auto *res = mapper;
if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
return res;
  }
  
@@ -14031,7 +14031,7 @@ get_module (const char *ptr)

  /* Create a new mapper connecting to OPTION.  */
  
  module_client *

-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
  {
timevar_start (TV_MODULE_MAPPER);
const char *option = module_mapper_name;
@@ -14039,7 +14039,7 @@ make_mapper (location_t loc)
  option = getenv ("CXX_MODULE_MAPPER");
  
mapper = module_client::open_module_client

-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
   (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
   && save_decoded_options[0].arg != progname
   ? save_decoded_options[0].arg : nullptr);
@@ -19503,7 +19503,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
dump.push (NULL);
  
dump () && dump ("Checking include translation '%s'", path);

-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
  
size_t len = strlen (path);

path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19619,7 +19619,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
  static void
  name_pending_imports (cpp_reader *reader)
  {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
  
if (!vec_safe_length (pending_imports))

  /* Not doing anything.  */
@@ -20089,7 +20089,7 @@ init_modules (cpp_reader *reader)
  
if (!flag_module_lazy)

  /* Get the mapper now, if we're not being lazy.  */
-get_mapper (cpp_main_loc (reader));
+get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
  
if (!flag_preprocess_only)

  {
@@ -20299,7 +20299,7 @@ late_finish_module (cpp_reader *reader,  
module_processing_cookie 

[pushed] testsuite,objective-c++: Fix imported NSObjCRuntime.h.

2023-06-23 Thread Iain Sandoe via Gcc-patches
Tested on x86_64-darwin, pushed to trunk, thanks
Iain

--- 8< ---

We have imported some headers from the GNUStep project to allow us
to keep the testsuite independent of changing versions of system
headers.

One of these headers has a macro that (now we have support for
__has_feature) expands to a declaration that triggers a warning.

These headers are considered part of the implementation so that, in
this case, we can suppress the warning with the system_header pragma.

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h: Make
this header use pragma system_header.
---
 .../objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/gcc/testsuite/objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h 
b/gcc/testsuite/objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h
index 189af80436a..62556f9ac88 100644
--- a/gcc/testsuite/objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h
+++ b/gcc/testsuite/objc-obj-c++-shared/GNUStep/Foundation/NSObjCRuntime.h
@@ -29,6 +29,9 @@
 #ifndef __NSObjCRuntime_h_GNUSTEP_BASE_INCLUDE
 #define __NSObjCRuntime_h_GNUSTEP_BASE_INCLUDE
 
+/* Allow the elaborated enum use in _GS_NAMED_ENUM. */
+#pragma GCC system_header
+
 #ifdef __cplusplus
 #ifndef __STDC_LIMIT_MACROS
 #define __STDC_LIMIT_MACROS 1
-- 
2.39.2 (Apple Git-143)



[PATCH] text-art: remove explicit #include of C++ standard library headers

2023-06-23 Thread David Malcolm via Gcc-patches
On Fri, 2023-06-23 at 12:52 +0100, Alex Coplan wrote:
> Hi David,
> 
> It looks like this patch breaks bootstrap on Darwin. I tried a
> bootstrap on
> x86_64-apple-darwin and got errors building selftest-run-tests.cc:
> 
> In file included from
> /Users/alecop01/toolchain/src/gcc/gcc/selftest-run-tests.cc:31:
> In file included from
> /Users/alecop01/toolchain/src/gcc/gcc/text-art/selftests.h:25:
> In file included from
> /Users/alecop01/toolchain/src/gcc/gcc/text-art/types.h:26:
> In file included from
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/vector:276:
> In file included from
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__bit_reference:15:
> In file included from
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/algorithm:653:
> In file included from
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/memory:670:
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/typeinfo:377:5: error:
> no member named 'fancy_abort' in namespace 'std::__1'; did you mean
> simply 'fancy_abort'?
> _VSTD::abort();
> ^~~
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/
> Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:856:15: note:
> expanded from macro '_VSTD'
> #define _VSTD std::_LIBCPP_ABI_NAMESPACE
>   ^
> /Users/alecop01/toolchain/src/gcc/gcc/system.h:811:13: note:
> 'fancy_abort' declared here
> extern void fancy_abort (const char *, int, const char *)
> ^
> 
> Please could you take a look?
> 
> Thanks,
> Alex

Sorry about the breakage.

Does the following patch fix it for you?
(only tested lightly so far, on x86_64-pc-linux-gnu)
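
For context, the error in the log above has the shape one gets when a standard header is included after abort has been redefined as a function-like macro, as system.h does with fancy_abort. A minimal sketch of the working order, with a hypothetical stand-in for fancy_abort that returns instead of terminating so that it can be exercised:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>   // standard headers first, before abort is redefined

// Hypothetical stand-in for GCC's fancy_abort: it records the call and
// returns, so the sketch stays testable (the real one does not return).
static int fancy_abort_calls = 0;

void
fancy_abort (const char *file, int line, const char *func)
{
  ++fancy_abort_calls;
  std::fprintf (stderr, "abort at %s:%d in %s\n", file, line, func);
}

// Function-like macro in the style of the one the log points at.
#define abort() fancy_abort (__FILE__, __LINE__, __func__)

// Including <vector> (or any header that calls std::abort ()) below this
// point would expand that call to std::fancy_abort (...), which does not
// exist; that is the shape of the error in the log.

int
checked_front (const std::vector<int> &v)
{
  if (v.empty ())
    {
      abort ();   // expands to fancy_abort (...)
      return -1;
    }
  return v.front ();
}
```

Defining INCLUDE_VECTOR before including system.h has the same effect as the ordering above: the header is pulled in before the macro exists.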


Dave



gcc/analyzer/ChangeLog:
* access-diagram.cc: Add #define INCLUDE_VECTOR.
* bounds-checking.cc: Likewise.

gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add #define INCLUDE_VECTOR.
* diagnostic.cc: Likewise.
* text-art/box-drawing.cc: Likewise.
* text-art/canvas.cc: Likewise.
* text-art/ruler.cc: Likewise.
* text-art/selftests.cc: Likewise.
* text-art/selftests.h (text_art::canvas): New forward decl.
* text-art/style.cc: Add #define INCLUDE_VECTOR.
* text-art/styled-string.cc: Likewise.
* text-art/table.cc: Likewise.
* text-art/table.h: Remove #include .
* text-art/theme.cc: Add #define INCLUDE_VECTOR.
* text-art/types.h: Remove #include of  and .
* text-art/widget.cc: Add #define INCLUDE_VECTOR.
* text-art/widget.h: Remove #include .
---
 gcc/analyzer/access-diagram.cc  | 1 +
 gcc/analyzer/bounds-checking.cc | 1 +
 gcc/diagnostic-format-sarif.cc  | 1 +
 gcc/diagnostic.cc   | 1 +
 gcc/text-art/box-drawing.cc | 1 +
 gcc/text-art/canvas.cc  | 1 +
 gcc/text-art/ruler.cc   | 1 +
 gcc/text-art/selftests.cc   | 1 +
 gcc/text-art/selftests.h| 4 +++-
 gcc/text-art/style.cc   | 1 +
 gcc/text-art/styled-string.cc   | 1 +
 gcc/text-art/table.cc   | 1 +
 gcc/text-art/table.h| 1 -
 gcc/text-art/theme.cc   | 1 +
 gcc/text-art/types.h| 2 --
 gcc/text-art/widget.cc  | 1 +
 gcc/text-art/widget.h   | 1 -
 17 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index 968ff50a0b7..467c9bdd734 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #define INCLUDE_MEMORY
 #define INCLUDE_MAP
 #define INCLUDE_SET
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "coretypes.h"
diff --git a/gcc/analyzer/bounds-checking.cc b/gcc/analyzer/bounds-checking.cc
index 10632d12562..5e8de9a7aa5 100644
--- a/gcc/analyzer/bounds-checking.cc
+++ b/gcc/analyzer/bounds-checking.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "make-unique.h"
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index ac2f5b844e3..5e483988027 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "diagnostic.h"
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 7c2289f0634..c523f215bae 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see

[PATCH] IVOPTS: Add LEN_MASK_{LOAD, STORE} into 'get_alias_ptr_type_for_ptr_address'

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Add 
LEN_MASK_{LOAD,STORE}.

---
 gcc/tree-ssa-loop-ivopts.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 2b66fe66bc7..2eb19406e61 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -7563,6 +7563,8 @@ get_alias_ptr_type_for_ptr_address (iv_use *use)
 case IFN_MASK_STORE_LANES:
 case IFN_LEN_LOAD:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
   /* The second argument contains the correct alias type.  */
   gcc_assert (use->op_p = gimple_call_arg_ptr (call, 0));
   return TREE_TYPE (gimple_call_arg (call, 1));
-- 
2.36.3



[PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* tree-ssa-alias.cc (call_may_clobber_ref_p_1): Add LEN_MASK_STORE.

---
 gcc/tree-ssa-alias.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 92dc1bb9987..f31fd042c2a 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -3070,6 +3070,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool 
tbaa_p)
return false;
   case IFN_MASK_STORE:
   case IFN_LEN_STORE:
+  case IFN_LEN_MASK_STORE:
   case IFN_MASK_STORE_LANES:
{
  tree rhs = gimple_call_arg (call,
-- 
2.36.3



[PATCH] LOOP IVOPTS: Apply LEN_MASK_{LOAD,STORE}

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Apply 
LEN_MASK_{LOAD,STORE}.

---
 gcc/tree-ssa-loop-ivopts.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6671ff6db5a..2b66fe66bc7 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2442,6 +2442,7 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
 case IFN_MASK_LOAD:
 case IFN_MASK_LOAD_LANES:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   if (op_p == gimple_call_arg_ptr (call, 0))
return TREE_TYPE (gimple_call_lhs (call));
   return NULL_TREE;
@@ -2449,8 +2450,11 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
 case IFN_MASK_STORE:
 case IFN_MASK_STORE_LANES:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   if (op_p == gimple_call_arg_ptr (call, 0))
-   return TREE_TYPE (gimple_call_arg (call, 3));
+   return TREE_TYPE (
+ gimple_call_arg (call, internal_fn_stored_value_index (
+  gimple_call_internal_fn (call))));
   return NULL_TREE;
 
 default:
-- 
2.36.3



[PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Apply 
LEN_MASK_{LOAD,STORE}

---
 gcc/tree-ssa-alias.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index e1bc04b82ba..92dc1bb9987 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2815,11 +2815,13 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
   case IFN_SCATTER_STORE:
   case IFN_MASK_SCATTER_STORE:
   case IFN_LEN_STORE:
+  case IFN_LEN_MASK_STORE:
return false;
   case IFN_MASK_STORE_LANES:
goto process_args;
   case IFN_MASK_LOAD:
   case IFN_LEN_LOAD:
+  case IFN_LEN_MASK_LOAD:
   case IFN_MASK_LOAD_LANES:
{
  ao_ref rhs_ref;
-- 
2.36.3



[PATCH] GIMPLE_FOLD: Apply LEN_MASK_{LOAD,STORE} into GIMPLE_FOLD

2023-06-23 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, we are going to add LEN_MASK_{LOAD,STORE} to the loop vectorizer.

Currently:
1. we can fold MASK_{LOAD,STORE} into MEM when the mask is all ones.
2. we can fold LEN_{LOAD,STORE} into MEM when (len - bias) is VF.

I think it makes sense to also fold LEN_MASK_{LOAD,STORE} into MEM when
both the mask is all ones and (len - bias) is VF.
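
The combined condition can be sketched as follows. This is only a toy model, not GCC code: the PartialAccess struct and the helper names are hypothetical, and VF is treated as a plain integer rather than a poly_int.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a LEN_MASK_{LOAD,STORE} call; hypothetical, not a GCC type.
struct PartialAccess {
  std::int64_t len;        // requested length
  std::int64_t bias;       // target-specific bias applied to len
  std::vector<bool> mask;  // one flag per vector lane
};

// True when every lane is enabled by the mask.
bool mask_all_ones(const std::vector<bool>& mask) {
  for (bool lane : mask)
    if (!lane) return false;
  return true;
}

// The access is unconditional, and so can become a plain memory reference,
// only if the length covers the whole vector AND no lane is masked off.
bool can_fold_to_mem(const PartialAccess& a, std::int64_t vf) {
  assert(vf > 0);
  return (a.len - a.bias) == vf && mask_all_ones(a.mask);
}
```

With vf = 4, a call with len = 4, bias = 0 and an all-true mask folds; clearing any mask lane or shortening len blocks the fold.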
 
gcc/ChangeLog:

* gimple-fold.cc (arith_overflowed_p): Apply LEN_MASK_{LOAD,STORE}.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_partial_store): Ditto.
(gimple_fold_call): Ditto.

---
 gcc/gimple-fold.cc | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 55e80567708..3d46b76edeb 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5370,10 +5370,10 @@ arith_overflowed_p (enum tree_code code, const_tree 
type,
   return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
 }
 
-/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
-   for the memory it references, otherwise return null.  VECTYPE is the
-   type of the memory vector.  MASK_P indicates it's for MASK if true,
-   otherwise it's for LEN.  */
+/* If IFN_{MASK,LEN,LEN_MASK}_LOAD/STORE call CALL is unconditional,
+   return a MEM_REF for the memory it references, otherwise return null.
+   VECTYPE is the type of the memory vector.  MASK_P indicates it's for
+   MASK if true, otherwise it's for LEN.  */
 
 static tree
 gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
@@ -5400,6 +5400,16 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, 
tree vectype, bool mask_p)
   if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
GET_MODE_SIZE (TYPE_MODE (vectype))))
return NULL_TREE;
+
+  /* For LEN_MASK_{LOAD,STORE}, we should also check whether
+ the mask is all ones mask.  */
+  internal_fn ifn = gimple_call_internal_fn (call);
+  if (ifn == IFN_LEN_MASK_LOAD || ifn == IFN_LEN_MASK_STORE)
+   {
+ tree mask = gimple_call_arg (call, internal_fn_mask_index (ifn));
+ if (!integer_all_onesp (mask))
+   return NULL_TREE;
+   }
 }
 
   unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
@@ -5438,7 +5448,8 @@ static bool
 gimple_fold_partial_store (gimple_stmt_iterator *gsi, gcall *call,
   bool mask_p)
 {
-  tree rhs = gimple_call_arg (call, 3);
+  internal_fn ifn = gimple_call_internal_fn (call);
+  tree rhs = gimple_call_arg (call, internal_fn_stored_value_index (ifn));
   if (tree lhs
   = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (rhs), mask_p))
 {
@@ -5676,9 +5687,11 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
inplace)
  changed |= gimple_fold_partial_store (gsi, stmt, true);
  break;
case IFN_LEN_LOAD:
+   case IFN_LEN_MASK_LOAD:
  changed |= gimple_fold_partial_load (gsi, stmt, false);
  break;
case IFN_LEN_STORE:
+   case IFN_LEN_MASK_STORE:
  changed |= gimple_fold_partial_store (gsi, stmt, false);
  break;
default:
-- 
2.36.3




Re: Do not account __builtin_unreachable guards in inliner

2023-06-23 Thread Jan Hubicka via Gcc-patches
> 
> So you need to feed it with extra info on the optimized out stmts because
> as-is it will not remove __builtin_unreachable ().  That means you're

My plan was to add an entry point to tree-ssa-dce that will take a
set of stmts declared dead by external force and will do the usual mark
stage, bypassing mark_stmt_if_necessary if the stmt is in the set of
dead stmts.

> doing the find_obviously_necessary_stmts manually, skipping the
> conditional and all stmts it controls to the __builtin_unreachable () path?
> 
> I also think you want something cheaper than non-cd-dce mark, you also don't
> want to bother with stores/loads?

You are probably right. cd-dce marking became a bit of a monster and I do
not want to care about memory.
One can add an extra flag to avoid processing of memory, but the code I
would re-use is quite small.

I can do my own mark, just considering phis, pre-identified
conditionals and basic gimple_assigns with no side effects as possibly
unnecessary stmts.  I can completely ignore debug stmts.

So it should be one pass through the statements to populate the worklist
& a simple walk of the SSA graph to propagate it.
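
A minimal sketch of such a mark phase, on a toy statement list rather than on gimple, could look like this. The Stmt struct, its fields and mark_live are all hypothetical names chosen for illustration, and memory and debug stmts are not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement, not GCC's gimple. `uses` holds the indices of the
// statements whose results this statement reads.
struct Stmt {
  std::vector<std::size_t> uses;
  bool has_side_effects;  // stores, calls, returns, control flow, ...
  bool known_dead;        // e.g. a conditional guarding __builtin_unreachable
};

// Returns a flag per statement: true if it has to be kept (and accounted).
std::vector<bool> mark_live(const std::vector<Stmt>& body) {
  std::vector<bool> live(body.size(), false);
  std::vector<std::size_t> worklist;

  // One pass through the statements to populate the worklist.
  for (std::size_t i = 0; i < body.size(); ++i)
    if (body[i].has_side_effects && !body[i].known_dead) {
      live[i] = true;
      worklist.push_back(i);
    }

  // Simple walk of the use-def graph to propagate liveness.
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    for (std::size_t def : body[i].uses) {
      assert(def < body.size());
      if (!live[def] && !body[def].known_dead) {
        live[def] = true;
        worklist.push_back(def);
      }
    }
  }
  return live;
}
```

In this model a statement that only feeds pre-identified dead conditionals is never reached from a live statement, so it stays unmarked even if it has more than one use.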

> 
> Also when you only do this conditional how do you plan to use the result?

Well, the analysis is a loop that walks all basic blocks and then all
stmts.  I can keep track of whether the computation of live stmts was done
and in that case query the flag, assuming it is true otherwise.

Honza


RE: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-23 Thread Li, Pan2 via Gcc-patches
Thanks Robin for the explanation; it is very clear to me. I totally agree
with the parts below, and I think we can leave it to the maintainers of the
RTL/machine descriptions.

> Now we could argue that combine's behavior should change here and an
> insn without any alternatives is not actually available but that's not
> a battle I'm willing to fight 

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, June 22, 2023 10:31 PM
To: Li, Pan2 ; 钟居哲 ; gcc-patches 
; palmer ; kito.cheng 
; Jeff Law 
Cc: rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

> Just curious about the combine pass you mentioned; I'm not sure my
> understanding is correct, but it looks like the combine pass totally
> ignores the iterator requirement?
> 
> It is a bit of a surprise to me, as the combine pass may also need the
> information from the iterators.

combine tries to match instructions (with fitting modes of course).
It does not look at the insn constraints that reload/lra later can
use to switch between alternatives depending on the register situation
and other factors.

We e.g. have an instruction
 (define_insn "bla"
   (set (match_operand:VF 1   "=vd")
(match_operand:VF 2   "vr"))
   ...
and implicitly
  [(set_attr "enabled" "true")]

This instruction gets multiplexed via the VF iterator into (among others)
  (define_insn "bla"
(set (match_operand:VNx4HF 1   "=vd")
 (match_operand:VNx4HF 2   "vr"))
...
  [(set_attr "enabled" "true")]

When we set "enabled" to "false" via "fp_vector_disabled", we have:
  (define_insn "bla"
(set (match_operand:VNx4HF 1   "=vd")
 (match_operand:VNx4HF 2   "vr"))
...
  [(set_attr "enabled" "false")]

This means the only available alternative is disabled but the insn
itself is still there, particularly for combine which does not look
into the constraints.

So in our case the iterator "allowed" the instruction (leading combine
to think it is available) and we later masked it out with "enabled = false".
Now we could argue that combine's behavior should change here and an
insn without any alternatives is not actually available but that's not
a battle I'm willing to fight :D
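
To illustrate the difference between "the pattern exists" and "the pattern has a usable alternative", here is a toy model. It is not how GCC represents machine descriptions; the Insn struct and both helper functions are made up for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one instantiated pattern; hypothetical, not GCC's representation.
struct Insn {
  std::string name;
  std::string mode;
  std::vector<bool> alternative_enabled;  // one flag per constraint alternative
};

// What a combine-like matcher asks: is there a pattern for this mode at all?
bool pattern_exists(const std::vector<Insn>& md, const std::string& mode) {
  for (const Insn& insn : md)
    if (insn.mode == mode) return true;
  return false;
}

// What register allocation needs later: at least one enabled alternative.
bool has_enabled_alternative(const std::vector<Insn>& md,
                             const std::string& mode) {
  for (const Insn& insn : md)
    if (insn.mode == mode)
      for (bool enabled : insn.alternative_enabled)
        if (enabled) return true;
  return false;
}
```

For a table containing "bla" in mode VNx4HF with its only alternative disabled, pattern_exists returns true while has_enabled_alternative returns false, which is the mismatch described above. Splitting the iterator corresponds to never putting the VNx4HF entry into the table.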

Regards
 Robin


[PATCH] narrowing initializers and initializer_constant_valid_p_1

2023-06-23 Thread Richard Biener via Gcc-patches
initializer_constant_valid_p_1 attempts to handle narrowing
differences and sums but fails to handle when the overall
value looks like

  VIEW_CONVERT_EXPR(NON_LVALUE_EXPR 
-  VEC_COND_EXPR < { 0, 0 } == { 0, 0 } , { -1, -1 } , { 0, 0 } > )

where endtype is scalar integer but value is a vector type.
In this particular case all is good and we recurse since
two vector lanes are more than the 64 bits of long long.  But still
it compares apples and oranges.

Fixed by appropriately also requiring the type of the
value to be scalar integral.
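
As a rough illustration of the guard, using a toy type model rather than GCC trees (Kind, Type and may_check_narrowing are hypothetical names for this sketch):

```cpp
#include <cassert>

// Toy type description; for Kind::Vector the precision value is not a bit
// width and must not be compared against one.
enum class Kind { Integer, Float, Vector };
struct Type {
  Kind kind;
  unsigned precision;
};

// True when the narrowing-specific path may be taken: both types must be
// scalar integral before their precisions are compared at all.
bool may_check_narrowing(const Type& endtype, const Type& valtype) {
  if (endtype.kind != Kind::Integer) return false;
  if (valtype.kind != Kind::Integer) return false;  // the added condition
  return endtype.precision < valtype.precision;
}
```

Without the second test, an integer endtype paired with a vector value would reach the precision comparison, which is the apples-and-oranges case.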

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.  This
seems to be the last fallout in the testsuite for the
TYPE_PRECISION checking patch.

* varasm.cc (initializer_constant_valid_p_1): Also
constrain the type of value to be scalar integral
before dispatching to narrowing_initializer_constant_valid_p.
---
 gcc/varasm.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index f2a19aa6dbd..542315f88cd 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4944,6 +4944,7 @@ initializer_constant_valid_p_1 (tree value, tree endtype, 
tree *cache)
   if (cache && cache[0] == value)
return cache[1];
   if (! INTEGRAL_TYPE_P (endtype)
+ || ! INTEGRAL_TYPE_P (TREE_TYPE (value))
  || TYPE_PRECISION (endtype) >= TYPE_PRECISION (TREE_TYPE (value)))
{
  tree ncache[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
@@ -4980,6 +4981,7 @@ initializer_constant_valid_p_1 (tree value, tree endtype, 
tree *cache)
   if (cache && cache[0] == value)
return cache[1];
   if (! INTEGRAL_TYPE_P (endtype)
+ || ! INTEGRAL_TYPE_P (TREE_TYPE (value))
  || TYPE_PRECISION (endtype) >= TYPE_PRECISION (TREE_TYPE (value)))
{
  tree ncache[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
-- 
2.35.3


[PATCH] Fix initializer_constant_valid_p_1 TYPE_PRECISION use

2023-06-23 Thread Richard Biener via Gcc-patches
initializer_constant_valid_p_1 is letting through all conversions
of float vector types that have the same number of elements but
that's of course not valid.  The following restricts the code
to scalar floating point types as was probably intended (only
scalar integer types are handled as well).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* varasm.cc (initializer_constant_valid_p_1): Only
allow conversions between scalar floating point types.
---
 gcc/varasm.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index dd84754a283..f2a19aa6dbd 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -4885,7 +4885,8 @@ initializer_constant_valid_p_1 (tree value, tree endtype, 
tree *cache)
/* Allow length-preserving conversions between integer types and
   floating-point types.  */
if (((INTEGRAL_TYPE_P (dest_type) && INTEGRAL_TYPE_P (src_type))
-|| (FLOAT_TYPE_P (dest_type) && FLOAT_TYPE_P (src_type)))
+|| (SCALAR_FLOAT_TYPE_P (dest_type)
+&& SCALAR_FLOAT_TYPE_P (src_type)))
&& (TYPE_PRECISION (dest_type) == TYPE_PRECISION (src_type)))
  return initializer_constant_valid_p_1 (src, endtype, cache);
 
-- 
2.35.3


[PATCH] Deal with vector typed operands in conversions

2023-06-23 Thread Richard Biener via Gcc-patches
The following avoids using TYPE_PRECISION on VECTOR_TYPE when
looking for bit-precision changes in vectorizable_assignment.
We didn't anticipate a stmt like

  _21 = VIEW_CONVERT_EXPR(vect__1.7_28);

and the following makes sure to handle that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (vectorizable_assignment):
Properly handle non-integral operands when analyzing
conversions.
---
 gcc/tree-vect-stmts.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e6649789540..01cb19ce933 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5833,12 +5833,15 @@ vectorizable_assignment (vec_info *vinfo,
   /* We do not handle bit-precision changes.  */
   if ((CONVERT_EXPR_CODE_P (code)
|| code == VIEW_CONVERT_EXPR)
-  && INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
-  && (!type_has_mode_precision_p (TREE_TYPE (scalar_dest))
- || !type_has_mode_precision_p (TREE_TYPE (op)))
+  && ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+  && !type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
+ || (INTEGRAL_TYPE_P (TREE_TYPE (op))
+ && !type_has_mode_precision_p (TREE_TYPE (op))))
   /* But a conversion that does not change the bit-pattern is ok.  */
-  && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
-   > TYPE_PRECISION (TREE_TYPE (op)))
+  && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+  && INTEGRAL_TYPE_P (TREE_TYPE (op))
+  && (TYPE_PRECISION (TREE_TYPE (scalar_dest))
+  > TYPE_PRECISION (TREE_TYPE (op)))
   && TYPE_UNSIGNED (TREE_TYPE (op))))
 {
   if (dump_enabled_p ())
-- 
2.35.3


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-23 Thread Nathan Sidwell via Gcc-patches

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the 
ones with imports directly in the TU would suffice (i.e. check the 'outermost' arg)


nathan


--
Nathan Sidwell



Re: Do not account __builtin_unreachable guards in inliner

2023-06-23 Thread Richard Biener via Gcc-patches
On Fri, Jun 23, 2023 at 12:11 PM Jan Hubicka  wrote:
>
> > On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
> > >
> > > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > > > C++ has an assume attribute that expands into
> > > > > >   if (conditional)
> > > > > > __builtin_unreachable ()
> > > > > > We do not want to account the conditional in inline heuristics since
> > > > > > we know that it is going to be optimized out.
> > > > > >
> > > > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > > > there are no complaints.
> > > >
> > > > I think we also had the request to not account the condition feeding
> > > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > > complex range comparisons here.  Also ...
> > >
> > > I was thinking of this: it depends on how smart do we want to get.
> > > We also have dead conditionals guarding clobbers, predicts and other
> > > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > > those statements and then use its result in the analysis.
> >
> > Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> > which might be a way to do this (of course we cannot use that 1:1).  Also
> > then consider
> >
> >  a = a + 1;
> >  if (a > 10)
> >__builtin_unreachable ();
> >  if (a < 5)
> >__builtin_unreachable ();
> >
> > and a has more than one use but both are going away.  So indeed a
> > more global analysis would be needed to get the full benefit.
>
> I was looking into simple_dce_from_worklist and if I understand it
> right, it simply walks list of SSA names which probably lost some uses
> by the consuming pass. If they have zero non-debug uses and the defining
> statement has no side effects, then they are removed.
>
> I think this is not really fitting the bill here since the example above
> is likely to be common and also if we want one assign filling
> conditional optimized out, we probably want to handle case with multiple
> assignments.  What about
>  1) walk function body and see if there are conditionals we know will be
> optimized out (at the beginning those can be only those which have one
> arm reaching __builtin_unreachable)
>  2) if there are none, just proceed with fnsummary construction
>  3) if there were some, do non-cd-dce mark stage which will skip those
> dead conditionals identified in 1
> and proceed to fnsummary construction with additional bitmap of
> marked stmts.

So you need to feed it with extra info on the optimized out stmts because
as-is it will not remove __builtin_unreachable ().  That means you're
doing the find_obviously_necessary_stmts manually, skipping the
conditional and all stmts it controls to the __builtin_unreachable () path?

I also think you want something cheaper than non-cd-dce mark, you also don't
want to bother with stores/loads?

Also when you only do this conditional how do you plan to use the result?

Richard.

>
> This should be cheaper than unconditionally doing cd-dce and should
> handle common cases?
> Honza


GCC 10.4.1 Status Report (2023-06-23)

2023-06-23 Thread Richard Biener via Gcc-patches


Status
======

The gcc-10 branch is open for regression and documentation fixes.

The last release from the branch before it is closed, GCC 10.5, is
due.  There will be a release candidate next week, on Friday,
June 30th, followed by the actual release a week later on July 7th.

Please check whether you have any important and safe regression fixes
to backport, and verify that your port is in good shape to build from
the branch so we do not end up with a non-working last release
from that branch.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              457   +  16
P3               56   +  11
P4              179   -  27
P5               23   -   1
--------        ---   -----------------------
Total P1-P3     513   +  27
Total           715   -   1


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2022-October/239683.html


Re: [PATCH, OpenACC 2.7] Implement default clause support for data constructs

2023-06-23 Thread Thomas Schwinge
Hi Chung-Lin!

On 2023-06-06T23:11:55+0800, Chung-Lin Tang  wrote:
> this patch implements the OpenACC 2.7 addition of default(none|present) 
> support
> for data constructs.

Thanks!

It wasn't clear to me what is supposed to happen, for example, for:

#pragma acc data default(none)
{
  #pragma acc data // no 'default' clause
  {
#pragma acc parallel

Specifically, does the "no 'default' clause" inner 'data' construct
invalidate the 'default(none)' clause of the outer 'data' construct?

In later revisions of the OpenACC specification, wording for 'default'
clause etc. generally has been changed; for example, OpenACC 3.3,
2.6.2 "Variables with Implicitly Determined Data Attributes" defines:

*Visible 'default' clause*: The nearest 'default' clause appearing on the 
compute construct or a lexically containing 'data' construct.

Therefore, in the example above, the 'default(none)' still holds.
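
The lookup implied by that definition can be sketched as a walk over the lexically enclosing constructs. This is a toy model with hypothetical names (Construct, visible_default), not the gimplifier's data structures:

```cpp
#include <cassert>

enum class DefaultKind { Unspecified, None, Present };
enum class Region { Data, Compute };

// Toy construct node; `outer` points to the lexically containing construct.
struct Construct {
  Region region;
  DefaultKind default_clause;
  const Construct* outer;
};

// Nearest 'default' clause on the compute construct itself or on a
// lexically containing 'data' construct; Unspecified if there is none.
DefaultKind visible_default(const Construct& compute) {
  if (compute.default_clause != DefaultKind::Unspecified)
    return compute.default_clause;
  for (const Construct* c = compute.outer; c != nullptr; c = c->outer)
    if (c->region == Region::Data
        && c->default_clause != DefaultKind::Unspecified)
      return c->default_clause;
  return DefaultKind::Unspecified;
}
```

For the nesting in the example (outer 'data' with 'default(none)', inner 'data' without a clause, then 'parallel'), the walk skips the inner 'data' and returns None.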

> Apart from adjusting the front-ends for allowed clauses masks (for acc data),
> mostly implemented in gimplify.

ACK ('s%mostly%%') -- but a little bit differently, please:

> From 101305aee9b27c6df00d7c403e469bdf8d7f45a4 Mon Sep 17 00:00:00 2001
> From: Chung-Lin Tang 
> Date: Tue, 6 Jun 2023 03:46:29 -0700
> Subject: [PATCH 2/2] OpenACC 2.7: default clause support for data constructs
>
> This patch implements the OpenACC 2.7 addition of default(none|present) 
> support
> for data constructs.
>
> Now, specifying "default(none|present)" on a data construct turns on same
> default clause behavior for all enclosed compute constructs (which don't
> already themselves have a default clause).

Please say "lexically enclosed" -- it's that only, not any dynamic
extent.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -225,6 +225,7 @@ struct gimplify_omp_ctx
>vec<tree> loop_iter_var;
>location_t location;
>enum omp_clause_default_kind default_kind;
> +  enum omp_clause_default_kind oacc_data_default_kind;
>enum omp_region_type region_type;
>enum tree_code code;
>bool combined_loop;
> @@ -459,6 +460,8 @@ new_omp_context (enum omp_region_type region_type)
>  c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
>else
>  c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
> +  if (gimplify_omp_ctxp)
> +c->oacc_data_default_kind = gimplify_omp_ctxp->oacc_data_default_kind;
>c->defaultmap[GDMK_SCALAR] = GOVD_MAP;
>c->defaultmap[GDMK_SCALAR_TARGET] = GOVD_MAP;
>c->defaultmap[GDMK_AGGREGATE] = GOVD_MAP;

Instead of adding a new 'oacc_data_default_kind' to 'gimplify_omp_ctx',
let's please do this the other way round:

> @@ -12050,6 +12053,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>
>   case OMP_CLAUSE_DEFAULT:
> ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);

Here, we already preserve 'default' for whichever OMP construct.

> +   if (code == OACC_DATA)
> + ctx->oacc_data_default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
> break;
>
>   case OMP_CLAUSE_INCLUSIVE:
> @@ -12098,6 +12103,21 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   list_p = &OMP_CLAUSE_CHAIN (c);
>  }
>
> +  if ((code == OACC_PARALLEL
> +   || code == OACC_KERNELS
> +   || code == OACC_SERIAL)
> +  && ctx->default_kind == OMP_CLAUSE_DEFAULT_SHARED
> +  && ctx->oacc_data_default_kind != OMP_CLAUSE_DEFAULT_UNSPECIFIED)
> +{
> +  ctx->default_kind = ctx->oacc_data_default_kind;
> +
> +  /* Append actual default clause on compute construct. Not really needed
> +  for omp_notice_variable to work properly, but for debug dump files.  */
> +  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_DEFAULT);
> +  OMP_CLAUSE_DEFAULT_KIND (c) = ctx->oacc_data_default_kind;
> +  *list_p = c;
> +}
> +
>ctx->clauses = *orig_list_p;
>gimplify_omp_ctxp = ctx;
>  }

Instead of this, in 'gimplify_omp_workshare', before the
'gimplify_scan_omp_clauses' call, do something like:

    if ((ort & ORT_ACC)
        && !omp_find_clause (OMP_CLAUSES (expr), OMP_CLAUSE_DEFAULT))
      {
        /* Determine effective 'default' clause for OpenACC compute construct.  */
        for (struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; ctx;
             ctx = ctx->outer_context)
          {
            if (ctx->region_type == ORT_ACC_DATA
                && ctx->default_kind != OMP_CLAUSE_DEFAULT_SHARED)
              {
                [Append actual default clause on compute construct.]
                break;
              }
          }
      }

That seems conceptually simpler to me?

For the 'build_omp_clause', does using 'ctx->location' instead of
'UNKNOWN_LOCATION' help diagnostics in any way?  Like if we add in
'gcc/gimplify.cc:oacc_default_clause',
'if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)' another 'inform' to
point to the 'data' construct's 'default' clause?  (But not sure if
that's easily done; otherwise don't.)

Similar to the ones you've already got, please also add a few test cases
for 

Re: [PATCH][RFC] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-23 Thread Richard Biener via Gcc-patches
On Fri, 23 Jun 2023, Richard Biener wrote:

> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
> 
> This patch requires (at least) the series of patches I will
> followup this with.  I have to re-bootstrap / test to look
> for further fallout (I've picked this up again after some weeks).
> 
> Opinions?

Bootstrapped on x86_64-unknown-linux-gnu with all languages enabled,
but there's still testsuite fallout.
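
For readers skimming the thread, the idea of a checked accessor next to a raw one can be sketched on a toy type node. The names below are hypothetical, and the sketch throws an exception where GCC's tree checking would ICE:

```cpp
#include <cassert>
#include <stdexcept>

enum class Code { Integer, Vector };

// Toy type node; for vectors the field is reused for something that is not
// a bit precision, which is why reading it blindly is a bug.
struct TypeNode {
  Code code;
  unsigned precision_field;
};

// Raw access, for code that only copies, hashes or streams the field.
unsigned precision_raw(const TypeNode& t) { return t.precision_field; }

// Checked access: refuses vector types instead of returning a misleading value.
unsigned precision(const TypeNode& t) {
  if (t.code == Code::Vector)
    throw std::logic_error("precision queried on a vector type");
  return t.precision_field;
}
```

Streaming and hashing code would use the raw form, while analysis code would use the checked form and fail loudly on vectors.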

Richard.

> Thanks,
> Richard.
> 
>   * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
>   (TYPE_PRECISION_RAW): Provide raw access to the precision
>   field.
>   * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
>   (gimple_canonical_types_compatible_p): Likewise.
>   * tree-streamer-out.cc (pack_ts_type_common_value_fields):
>   Stream TYPE_PRECISION_RAW.
>   * tree-streamer-in.cc (unpack_ts_type_common_value_fields):
>   Likewise.
>   * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
> ---
>  gcc/lto-streamer-out.cc  | 2 +-
>  gcc/tree-streamer-in.cc  | 2 +-
>  gcc/tree-streamer-out.cc | 2 +-
>  gcc/tree.cc  | 6 +++---
>  gcc/tree.h   | 4 +++-
>  5 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
> index 5ab2eb4301e..3432dd434e2 100644
> --- a/gcc/lto-streamer-out.cc
> +++ b/gcc/lto-streamer-out.cc
> @@ -1373,7 +1373,7 @@ hash_tree (struct streamer_tree_cache_d *cache, 
> hash_map *map,
>if (AGGREGATE_TYPE_P (t))
>   hstate.add_flag (TYPE_TYPELESS_STORAGE (t));
>hstate.commit_flag ();
> -  hstate.add_int (TYPE_PRECISION (t));
> +  hstate.add_int (TYPE_PRECISION_RAW (t));
>hstate.add_int (TYPE_ALIGN (t));
>hstate.add_int (TYPE_EMPTY_P (t));
>  }
> diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
> index c803800862c..e6919e463c0 100644
> --- a/gcc/tree-streamer-in.cc
> +++ b/gcc/tree-streamer-in.cc
> @@ -387,7 +387,7 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, 
> tree expr)
>  TYPE_TYPELESS_STORAGE (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_EMPTY_P (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_NO_NAMED_ARGS_STDARG_P (expr) = (unsigned) bp_unpack_value (bp, 1);
> -  TYPE_PRECISION (expr) = bp_unpack_var_len_unsigned (bp);
> +  TYPE_PRECISION_RAW (expr) = bp_unpack_var_len_unsigned (bp);
>SET_TYPE_ALIGN (expr, bp_unpack_var_len_unsigned (bp));
>  #ifdef ACCEL_COMPILER
>if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
> index 5751f77273b..719cbeacf99 100644
> --- a/gcc/tree-streamer-out.cc
> +++ b/gcc/tree-streamer-out.cc
> @@ -356,7 +356,7 @@ pack_ts_type_common_value_fields (struct bitpack_d *bp, 
> tree expr)
>  bp_pack_value (bp, TYPE_TYPELESS_STORAGE (expr), 1);
>bp_pack_value (bp, TYPE_EMPTY_P (expr), 1);
>bp_pack_value (bp, TYPE_NO_NAMED_ARGS_STDARG_P (expr), 1);
> -  bp_pack_var_len_unsigned (bp, TYPE_PRECISION (expr));
> +  bp_pack_var_len_unsigned (bp, TYPE_PRECISION_RAW (expr));
>bp_pack_var_len_unsigned (bp, TYPE_ALIGN (expr));
>  }
>  
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 8e144bc090e..58288efa2e2 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -13423,7 +13423,7 @@ verify_type_variant (const_tree t, tree tv)
>   }
>verify_variant_match (TYPE_NEEDS_CONSTRUCTING);
>  }
> -  verify_variant_match (TYPE_PRECISION);
> +  verify_variant_match (TYPE_PRECISION_RAW);
>if (RECORD_OR_UNION_TYPE_P (t))
>  verify_variant_match (TYPE_TRANSPARENT_AGGR);
>else if (TREE_CODE (t) == ARRAY_TYPE)
> @@ -13701,8 +13701,8 @@ gimple_canonical_types_compatible_p (const_tree t1, 
> const_tree t2,
>|| TREE_CODE (t1) == OFFSET_TYPE
>|| POINTER_TYPE_P (t1))
>  {
> -  /* Can't be the same type if they have different recision.  */
> -  if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
> +  /* Can't be the same type if they have different precision.  */
> +  if (TYPE_PRECISION_RAW (t1) != TYPE_PRECISION_RAW (t2))
>   return false;
>  
>/* In some cases the signed and unsigned types are required to be
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 1854fe4a7d4..1b791335d38 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -2191,7 +2191,9 @@ class auto_suppress_location_wrappers
>  #define TYPE_SIZE_UNIT(NODE) (TYPE_CHECK (NODE)->type_common.size_unit)
>  #define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to)
>  #define TYPE_REFERENCE_TO(NODE) (TYPE_CHECK (NODE)->type_common.reference_to)
> -#define TYPE_PRECISION(NODE) (TYPE_CHECK (NODE)->type_common.precision)
> +#define TYPE_PRECISION(NODE) \
> +  (TREE_NOT_CHECK (TYPE_CHECK 

Re: Ping [PATCH v4] Add condition coverage profiling

2023-06-23 Thread Jan Hubicka via Gcc-patches
> > 
> > gcc/ChangeLog:
> > 
> > * builtins.cc (expand_builtin_fork_or_exec): Check
> > profile_condition_flag.
> > * collect2.cc (main): Add -fno-profile-conditions to OBSTACK.
> > * common.opt: Add new options -fprofile-conditions and
> > * doc/gcov.texi: Add --conditions documentation.
> > * doc/invoke.texi: Add -fprofile-conditions documentation.
> > * gcc.cc: Link gcov on -fprofile-conditions.
> > * gcov-counter.def (GCOV_COUNTER_CONDS): New.
> > * gcov-dump.cc (tag_conditions): New.
> > * gcov-io.h (GCOV_TAG_CONDS): New.
> > (GCOV_TAG_CONDS_LENGTH): Likewise.
> > (GCOV_TAG_CONDS_NUM): Likewise.
> > * gcov.cc (class condition_info): New.
> > (condition_info::condition_info): New.
> > (condition_info::popcount): New.
> > (struct coverage_info): New.
> > (add_condition_counts): New.
> > (output_conditions): New.
> > (print_usage): Add -g, --conditions.
> > (process_args): Likewise.
> > (output_intermediate_json_line): Output conditions.
> > (read_graph_file): Read conditions counters.
> > (read_count_file): Read conditions counters.
> > (file_summary): Print conditions.
> > (accumulate_line_info): Accumulate conditions.
> > (output_line_details): Print conditions.
> > * ipa-inline.cc (can_early_inline_edge_p): Check
> > profile_condition_flag.
> > * ipa-split.cc (pass_split_functions::gate): Likewise.
> > * passes.cc (finish_optimization_passes): Likewise.
> > * profile.cc (find_conditions): New declaration.
> > (cov_length): Likewise.
> > (cov_blocks): Likewise.
> > (cov_masks): Likewise.
> > (cov_free): Likewise.
> > (instrument_decisions): New.
> > (read_thunk_profile): Control output to file.
> > (branch_prob): Call find_conditions, instrument_decisions.
> > (init_branch_prob): Add total_num_conds.
> > (end_branch_prob): Likewise.
> > * tree-profile.cc (struct conds_ctx): New.
> > (CONDITIONS_MAX_TERMS): New.
> > (EDGE_CONDITION): New.
> > (cmp_index_map): New.
> > (index_of): New.
> > (block_conditional_p): New.
> > (edge_conditional_p): New.
> > (single): New.
> > (single_edge): New.
> > (contract_edge): New.
> > (contract_edge_up): New.
> > (ancestors_of): New.
> > (struct outcomes): New.
> > (conditional_succs): New.
> > (condition_index): New.
> > (masking_vectors): New.
> > (cond_reachable_from): New.
> > (neighborhood): New.
> > (isolate_expression): New.
> > (emit_bitwise_op): New.
> > (make_index_map_visit): New.
> > (make_index_map): New.
> > (collect_conditions): New.
> > (yes): New.
> > (struct condcov): New.
> > (cov_length): New.
> > (cov_blocks): New.
> > (cov_masks): New.
> > (cov_free): New.
> > (find_conditions): New.
> > (instrument_decisions): New.
> > (tree_profiling): Check profile_condition_flag.
> > (pass_ipa_tree_profile::gate): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * lib/gcov.exp: Add condition coverage test function.
> > * g++.dg/gcov/gcov-18.C: New test.
> > * gcc.misc-tests/gcov-19.c: New test.
> > * gcc.misc-tests/gcov-20.c: New test.
> > * gcc.misc-tests/gcov-21.c: New test.
> > ---
> > 
> > v2:
> > * Moved the docs to rst/sphinx
> > * Output and message uses the 'conditions outcomes' vocabulary
> > * Fixed errors reported by contrib/style-check. Note that a few
> >   warnings persist but are either in comments (ascii art) or because
> >   the surrounding code (typically lists) are formatted the same way
> > v3:
> > * Revert docs from rst/sphinx to texinfo
> > v4:
> > * Rebased on trunk, removed @gol from texi
> > 
> >  gcc/builtins.cc|2 +-
> >  gcc/collect2.cc|7 +-
> >  gcc/common.opt |8 +
> >  gcc/doc/gcov.texi  |   37 +
> >  gcc/doc/invoke.texi|   19 +
> >  gcc/gcc.cc |4 +-
> >  gcc/gcov-counter.def   |3 +
> >  gcc/gcov-dump.cc   |   24 +
> >  gcc/gcov-io.h  |3 +
> >  gcc/gcov.cc|  200 +++-
> >  gcc/ipa-inline.cc  |2 +-
> >  gcc/ipa-split.cc   |3 +-
> >  gcc/passes.cc  |3 +-
> >  gcc/profile.cc |   84 +-
> >  gcc/testsuite/g++.dg/gcov/gcov-18.C|  234 +
> >  gcc/testsuite/gcc.misc-tests/gcov-19.c | 1250 
> >  gcc/testsuite/gcc.misc-tests/gcov-20.c |   22 +
> >  gcc/testsuite/gcc.misc-tests/gcov-21.c |   16 +
> >  gcc/testsuite/lib/gcov.exp |  191 +++-
> >  gcc/tree-profile.cc| 1048 +++-
> >  libgcc/libgcov-merge.c |5 +
> >  21 files changed, 3137 insertions(+), 28 

[PATCH] tree-optimization/96208 - SLP of non-grouped loads

2023-06-23 Thread Richard Biener via Gcc-patches
The following extends SLP discovery to handle non-grouped loads
in loop vectorization in the case the same load appears in all
lanes.
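
For illustration only (a sketch, not the PR's testcase; the names
`scale_pairs`, `out`, `in` and `scale` are hypothetical), a loop of the
following shape has two SLP lanes whose `in[]` loads and `out[]` stores
form groups, while the `scale[i]` load is a single non-grouped load that
appears in both lanes:

```c
#include <assert.h>
#include <stddef.h>

/* Multiply each (even, odd) pair of IN by one shared factor.
   The two statements in the loop body are the two SLP lanes; both
   read the same scalar SCALE[i].  Whether a given GCC version
   vectorizes this with SLP depends on the target and flags.  */
void
scale_pairs (float *restrict out, const float *restrict in,
             const float *restrict scale, size_t n)
{
  assert (out != NULL && in != NULL && scale != NULL);
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = in[2 * i] * scale[i];
      out[2 * i + 1] = in[2 * i + 1] * scale[i];
    }
}
```

With n = 2, in = {1, 2, 3, 4} and scale = {2, 0.5} the function writes
{2, 4, 1.5, 2} to out.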

Code generation is adjusted to mimic what we do for the case
of single element interleaving (when the load is not unit-stride)
which is already handled by SLP.  There are some limits we
run into because peeling for gap cannot cover all cases and
we choose VMAT_CONTIGUOUS.  The patch does not try to address
these issues yet.

The main obstacle is that these loads are not
STMT_VINFO_GROUPED_ACCESS and that's a new thing with SLP.
I know from the past that it's not a good idea to make them
grouped.  Instead the following massages places to deal
with SLP loads that are not STMT_VINFO_GROUPED_ACCESS.

There's already a testcase testing for the case the PR
is after, just XFAILed, the following adjusts that instead
of adding another.

I do expect to have missed some so I don't plan to push this
on a Friday.  Still there may be feedback, so posting this
now.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/96208
* tree-vect-slp.cc (vect_build_slp_tree_1): Allow
a non-grouped load if it is the same for all lanes.
(vect_build_slp_tree_2): Handle not grouped loads.
(vect_optimize_slp_pass::remove_redundant_permutations):
Likewise.
(vect_transform_slp_perm_load_1): Likewise.
* tree-vect-stmts.cc (vect_model_load_cost): Likewise.
(get_group_load_store_type): Likewise.  Handle
invariant accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/slp-46.c: Adjust for new vectorizations.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c |  16 ++-
 gcc/testsuite/gcc.dg/vect/slp-46.c |   2 +-
 gcc/tree-vect-slp.cc   |  51 +---
 gcc/tree-vect-stmts.cc | 128 +
 4 files changed, 127 insertions(+), 70 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index ee121364910..8cefa7f52af 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -24,11 +24,17 @@ void rephase (void)
   struct site *s;
   for(i=0,s=lattice;ilink[dir].e[j][k].real *= s->phase[dir];
- s->link[dir].e[j][k].imag *= s->phase[dir];
-   }
+  {
+   for(j=0;j<3;j++)
+ for(k=0;k<3;k++)
+   {
+ s->link[dir].e[j][k].real *= s->phase[dir];
+ s->link[dir].e[j][k].imag *= s->phase[dir];
+   }
+   /* Avoid loop vectorizing the outer loop after unrolling
+  the inners.  */
+   __asm__ volatile ("" : : : "memory");
+  }
 }
 
 int main()
diff --git a/gcc/testsuite/gcc.dg/vect/slp-46.c 
b/gcc/testsuite/gcc.dg/vect/slp-46.c
index 18476a43d3f..79ed0bb9f6b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-46.c
@@ -94,4 +94,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
xfail vect_load_lanes } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
xfail vect_load_lanes } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ab89a82f1b3..4481d43e3d7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1286,15 +1286,19 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
{
  if (load_p
  && rhs_code != CFN_GATHER_LOAD
- && rhs_code != CFN_MASK_GATHER_LOAD)
+ && rhs_code != CFN_MASK_GATHER_LOAD
+ /* Not grouped loads are handled as externals for BB
+vectorization.  For loop vectorization we can handle
+splats the same we handle single element interleaving.  */
+ && (is_a <bb_vec_info> (vinfo)
+ || stmt_info != first_stmt_info))
{
  /* Not grouped load.  */
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: not grouped load %G", stmt);
 
- /* FORNOW: Not grouped loads are not supported.  */
- if (is_a <bb_vec_info> (vinfo) && i != 0)
+ if (i != 0)
continue;
  /* Fatal mismatch.  */
  matches[0] = false;
@@ -1302,7 +1306,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
}
 
  /* Not memory operation.  */
- if (!phi_p
+ if (!load_p
+ && !phi_p
  && rhs_code.is_tree_code ()
  && TREE_CODE_CLASS (tree_code (rhs_code)) != tcc_binary
  && TREE_CODE_CLASS (tree_code (rhs_code)) != tcc_unary
@@ -1774,7 +1779,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 return NULL;
 
   /* If the SLP node is a load, terminate the recursion unless 

Re: [PATCH V1] RISC-V:Add float16 tuple type support

2023-06-23 Thread Andreas Schwab
../../gcc/lto-streamer-out.cc: In function 'void lto_output_init_mode_table()':
../../gcc/lto-streamer-out.cc:3177:10: error: 'void* memset(void*, int, 
size_t)' forming offset [256, 283] is out of the bounds [0, 256] of object 
'streamer_mode_table' with type 'unsigned char [256]' [-Werror=array-bounds=]
 3177 |   memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
  |   ~~~^
In file included from ../../gcc/gimple-streamer.h:25,
 from ../../gcc/lto-streamer-out.cc:33:
../../gcc/tree-streamer.h:78:22: note: 'streamer_mode_table' declared here
   78 | extern unsigned char streamer_mode_table[1 << 8];
  |  ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1180: lto-streamer-out.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-06-23 Thread Andre Vieira (lists) via Gcc-patches

+  if (insn != arm_mve_get_loop_vctp (body))
+{

It's probably a good idea to invert the condition here and return false; 
that helps reduce the indentation in this function.



+   /* Starting from the current insn, scan backwards through the insn
+  chain until BB_HEAD: "for each insn in the BB prior to the current".
+   */

There's a trailing whitespace after insn, but also I'd rewrite this bit. 
The "for each insn in the BB prior to the current" is superfluous and 
even confusing to me. How about:
"Scan backwards from the current INSN through the instruction chain 
until the start of the basic block.  "



 I find 'that previous insn' to be confusing as you don't mention any 
previous insn before. So how about something along the lines of:
'If a previous insn defines a register that INSN uses then return true 
if...'



Do we need to check: 'insn != prev_insn' ? Any reason why you can't 
start the loop with:

'for (rtx_insn *prev_insn = PREV_INSN (insn);'

Now I also found a case where things might go wrong in:
+   /* Look at all the DEFs of that previous insn: if one of them is on
+  the same REG as our current insn, then recurse in order to check
+  that insn's USEs.  If any of these insns return true as
+  MVE_VPT_UNPREDICATED_INSN_Ps, then the whole chain is affected
+  by the change in behaviour from being placed in dlstp/letp loop.
+   */
+   df_ref prev_insn_defs = NULL;
+   FOR_EACH_INSN_DEF (prev_insn_defs, prev_insn)
+ {
+   if (DF_REF_REGNO (insn_uses) == DF_REF_REGNO (prev_insn_defs)
+   && insn != prev_insn
+   && body == BLOCK_FOR_INSN (prev_insn)
+   && !arm_mve_vec_insn_is_predicated_with_this_predicate
+(insn, vctp_vpr_generated)
+   && arm_mve_check_df_chain_back_for_implic_predic
+(prev_insn, vctp_vpr_generated))
+ return true;
+ }

The body == BLOCK_FOR_INSN (prev_insn) check is what hinted at it: if a 
def comes from outside of the BB (so outside of the loop's body) then 
it's by definition unpredicated by vctp.  I think you want to check that 
if prev_insn defines a register used by insn then return true if 
prev_insn isn't in the same BB or has a chain that is not predicated, 
i.e. in addition to 
'!arm_mve_vec_insn_is_predicated_with_this_predicate (insn, 
vctp_vpr_generated) && arm_mve_check_df_chain_back_for_implic_predic 
(prev_insn, vctp_vpr_generated)' you check 'body != BLOCK_FOR_INSN 
(prev_insn)'.



I also found some other issues, this currently loloops:

uint16_t  test (uint16_t *a, int n)
{
  uint16_t res =0;
  while (n > 0)
{
  mve_pred16_t p = vctp16q (n);
  uint16x8_t va = vldrhq_u16 (a);
  res = vaddvaq_u16 (res, va);
  res = vaddvaq_p_u16 (res, va, p);
  a += 8;
  n -= 8;
}
  return res;
}

But it shouldn't; this is because across-vector instructions are not 
handled. Luckily in MVE all across-vector instructions have the 
side-effect that they write to a scalar register, even the vshlcq 
instruction (it writes to a scalar carry output).


This did lead me to find an ICE with:

uint16x8_t  test (uint16_t *a, int n)
{
  uint16x8_t res = vdupq_n_u16 (0);
  while (n > 0)
{
  uint16_t carry = 0;
  mve_pred16_t p = vctp16q (n);
  uint16x8_t va = vldrhq_u16 (a);
  res = vshlcq_u16 (va, &carry, 1);
  res = vshlcq_m_u16 (res, &carry, 1 , p);
  a += 8;
  n -= 8;
}
  return res;
}

This is because:
+ /* If the USE is outside the loop body bb, or it is inside, but
+is an unpredicated store to memory.  */
+ if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_INSN (next_use_insn)
+|| (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate
+(next_use_insn, vctp_vpr_generated)
+   && mve_memory_operand
+   (SET_DEST (single_set (next_use_insn)),
+GET_MODE (SET_DEST (single_set (next_use_insn))
+   return true;

Assumes single_set doesn't return 0.

Let's deal with these issues and I'll continue to review.

On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:

     Hi all,

     This is the 2/2 patch that contains the functional changes needed
     for MVE Tail Predicated Low Overhead Loops.  See my previous email
     for a general introduction of MVE LOLs.

     This support is added through the already existing loop-doloop
     mechanisms that are used for non-MVE dls/le looping.

     Mid-end changes are:

     1) Relax the loop-doloop mechanism in the mid-end to allow for
    decrement numbers other than -1 and for `count` to be an
    rtx containing a simple REG (which in this case will contain
    the number of elements to be processed), rather
    than an expression for calculating the number of iterations.
  

Ping [PATCH v4] Add condition coverage profiling

2023-06-23 Thread Jørgen Kvalsvik via Gcc-patches
On 13/06/2023 09:59, Jørgen Kvalsvik wrote:
> This patch adds support in gcc+gcov for modified condition/decision
> coverage (MC/DC) with the -fprofile-conditions flag. MC/DC is a type of
> test/code coverage and it is particularly important in the aviation and
> automotive industries for safety-critical applications. MC/DC is
> required for or recommended by:
> 
> * DO-178C for the most critical software (Level A) in avionics
> * IEC 61508 for SIL 4
> * ISO 26262-6 for ASIL D
> 
> From the SQLite webpage:
> 
> Two methods of measuring test coverage were described above:
> "statement" and "branch" coverage. There are many other test
> coverage metrics besides these two. Another popular metric is
> "Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
> MC/DC as follows:
> 
> * Each decision tries every possible outcome.
> * Each condition in a decision takes on every possible outcome.
> * Each entry and exit point is invoked.
> * Each condition in a decision is shown to independently affect
>   the outcome of the decision.
> 
> In the C programming language where && and || are "short-circuit"
> operators, MC/DC and branch coverage are very nearly the same thing.
> The primary difference is in boolean vector tests. One can test for
> any of several bits in bit-vector and still obtain 100% branch test
> coverage even though the second element of MC/DC - the requirement
> that each condition in a decision take on every possible outcome -
> might not be satisfied.
> 
> https://sqlite.org/testing.html#mcdc
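
To make the bit-vector remark concrete, here is a small sketch
(hypothetical names `FLAG_A`, `FLAG_B`, `any_masked` and `any_split`; not
taken from SQLite or from the patch). Both functions compute the same
result, but the first contains a single condition while the second
exposes two conditions to a coverage tool:

```c
#include <assert.h>

enum { FLAG_A = 1u << 0, FLAG_B = 1u << 1 };

/* One condition: the inputs 0 and FLAG_A already take the branch
   both ways, although FLAG_B on its own is never exercised.  */
int
any_masked (unsigned flags)
{
  assert ((flags & ~(unsigned) (FLAG_A | FLAG_B)) == 0);
  if (flags & (FLAG_A | FLAG_B))
    return 1;
  return 0;
}

/* Two conditions: covering both outcomes of the second one
   additionally needs an input where only FLAG_B is set.  */
int
any_split (unsigned flags)
{
  if ((flags & FLAG_A) || (flags & FLAG_B))
    return 1;
  return 0;
}
```

Testing `any_masked` with 0 and `FLAG_A` is enough for full branch
coverage of that function; the same two inputs leave the second
condition of `any_split` without a true outcome.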
> 
> Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
> MC/DC" describes an algorithm for adding instrumentation by carrying
> over information from the AST, but my algorithm analyses the control
> flow graph to instrument for coverage. This has the benefit of being
> programming language independent and faithful to compiler decisions
> and transformations. I have tested it primarily on C and C++,
> see testsuite/gcc.misc-tests and testsuite/g++.dg, and run some manual
> tests using D, Rust, and go. D and rust mostly behave as you would
> expect, although sometimes conditions are really expanded and therefore
> instrumented in another module than the one with the source, which also
> applies to the branch coverage. It does not work as expected for go as
> the go front end evaluates multi-conditional expressions by folding
> results into temporaries.
> 
> Like Whalen et al. this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret. This is very fast, but
> introduces a limit on the number of terms in a single boolean
> expression, the number of bits in a gcov_unsigned_type (which is
> typedef'd to uint64_t), so for most practical purposes this would be
> acceptable. This limitation is in the implementation and not the
> algorithm, so support for more conditions can be added by also
> introducing arbitrary-sized bitsets.
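
A much simplified sketch of the bitset idea follows. It only records
which outcomes each term took and omits the masking that MC/DC requires,
so it is not the patch's algorithm; the names `cond_cov`, `cond_record`
and `cond_covered` are hypothetical. It does show why the word width caps
the number of terms per expression:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TERMS 64   /* one bit per term in a 64-bit accumulator */

struct cond_cov
{
  uint64_t seen_true;   /* bit i set: term i evaluated to true */
  uint64_t seen_false;  /* bit i set: term i evaluated to false */
};

/* Record that term INDEX evaluated to OUTCOME.  */
void
cond_record (struct cond_cov *cov, unsigned index, int outcome)
{
  assert (index < MAX_TERMS);
  if (outcome)
    cov->seen_true |= UINT64_C (1) << index;
  else
    cov->seen_false |= UINT64_C (1) << index;
}

/* Count the set bits of X without relying on compiler builtins.  */
static unsigned
popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Number of recorded outcomes, out of 2 * terms possible.  */
unsigned
cond_covered (const struct cond_cov *cov)
{
  return popcount64 (cov->seen_true) + popcount64 (cov->seen_false);
}
```

Recording a false outcome for terms 1, 2 and 3 of a four-term expression
gives 3 covered outcomes out of 8, and recording the same outcome twice
does not change the count.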
> 
> For space overhead, the instrumentation needs two accumulators
> (gcov_unsigned_type) per condition in the program which will be written
> to the gcov file. In addition, every function gets a pair of local
> accumulators, but these accumulators are reused between conditions in the
> same function.
> 
> For time overhead, there is a zeroing of the local accumulators for
> every condition and one or two bitwise operations on every edge taken in
> an expression.
> 
> In action it looks pretty similar to the branch coverage. The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case free too.
> 
> gcov --conditions:
> 
> 3:   17:void fn (int a, int b, int c, int d) {
> 3:   18:if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true)
> condition  0 not covered (false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
> 1:   19:x = 1;
> -:   20:else
> 2:   21:x = 2;
> 3:   22:}
> 
> gcov --conditions --json-format:
> 
> "conditions": [
> {
> "not_covered_false": [
> 0
> ],
> "count": 8,
> "covered": 3,
> "not_covered_true": [
> 0,
> 1,
> 2,
> 3
> ]
> }
> ],
> 
> Some expressions, mostly those without else-blocks, are effectively
> "rewritten" in the CFG construction making the algorithm unable to
> distinguish them:
> 
> and.c:
> 
> if (a && b && c)
> x = 1;
> 
> ifs.c:
> 
> if (a)
> if (b)
> if (c)
> x = 1;
> 
> gcc will build the same graph for both these programs, and gcov will
> report both as 3-term expressions. It is vital that it is not
> interpreted the other way 

Re: Do not account __builtin_unreachable guards in inliner

2023-06-23 Thread Jan Hubicka via Gcc-patches
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka  wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > C++ has assume attribute that expands into
> > > >   if (conditional)
> > > > __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > there are no complaints.
> > >
> > > I think we also had the request to not account the condition feeding
> > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart do we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > those statements and then use its result in the analysis.
> 
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
> 
>  a = a + 1;
>  if (a > 10)
>__builtin_unreachable ();
>  if (a < 5)
>__builtin_unreachable ();
> 
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.

I was looking into simple_dce_from_worklist and if I understand it
right, it simply walks the list of SSA names which probably lost some uses
by the consuming pass.  If they have zero non-debug uses and the defining
statement has no side effects, then they are removed.

I think this is not really fitting the bill here since the example above
is likely to be common and also if we want one assignment feeding a
conditional optimized out, we probably want to handle the case with multiple
assignments.  What about
 1) walk function body and see if there are conditionals we know will be
    optimized out (at the beginning those can only be the ones which have
    one arm reaching __builtin_unreachable)
 2) if there are none, just proceed with fnsummary construction
 3) if there were some, do a non-cd-dce mark stage which will skip those
    dead conditionals identified in 1
    and proceed to fnsummary construction with an additional bitmap of
    marked stmts.

This should be cheaper than unconditionally doing cd-dce and should
handle common cases?
Honza
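
For reference, the guard pattern from the quoted example can be written
as compilable C along these lines (a sketch; `bucket` is a hypothetical
name, and `__builtin_unreachable` is the GCC/Clang builtin). The two
conditionals exist only to feed the guards, which is why counting them in
the inline size estimate would overstate the cost of the function:

```c
/* Callers promise that A + 1 lies in [5, 10].  The two guards only
   communicate that range to the optimizer; reaching either
   __builtin_unreachable call is undefined behaviour.  */
int
bucket (int a)
{
  a = a + 1;
  if (a > 10)
    __builtin_unreachable ();
  if (a < 5)
    __builtin_unreachable ();
  return a - 5;   /* in [0, 5] given the promised range */
}
```

Only arguments from 4 to 9 are valid here; anything else violates the
stated assumption.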


Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 23 Jun 2023 at 14:58, Richard Biener  wrote:
>
> On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 22 Jun 2023 at 18:06, Richard Biener  
> > wrote:
> > >
> > > On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Tue, 20 Jun 2023 at 16:47, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi Richard,
> > > > > > For the following reduced test-case taken from PR:
> > > > > >
> > > > > > #include "arm_sve.h"
> > > > > > svuint32_t l() {
> > > > > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > > > > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > > > > }
> > > > > >
> > > > > > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > > > > > during GIMPLE pass: fre
> > > > > > pr110280.c: In function 'l':
> > > > > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > > > > > tree-ssa-sccvn.cc:6890
> > > > > > 5 | }
> > > > > >   | ^
> > > > > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > > > > > gimple_stmt_iterator*)
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > > > > 0x120bf4d 
> > > > > > eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > > > > 0x120bf4d 
> > > > > > eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > > > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > > > > ../../gcc/gcc/domwalk.cc:311
> > > > > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > > > > 0x1214664 do_rpo_vn_1
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > > > > 0x1215ba5 execute
> > > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > > > > >
> > > > > > cc1 simplifies:
> > > > > >   lanes[0] = 0;
> > > > > >   lanes[1] = 0;
> > > > > >   lanes[2] = 0;
> > > > > >   lanes[3] = 0;
> > > > > >   _1 = { -1, ... };
> > > > > >   _7 = svld1rq_u32 (_1, );
> > > > > >
> > > > > > to:
> > > > > >   _9 = MEM  [(unsigned int * 
> > > > > > {ref-all})];
> > > > > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > > > > >
> > > > > > and then fre1 dump shows:
> > > > > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > > > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to 
> > > > > > {
> > > > > > 0, 0, 0, 0 }
> > > > > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 
> > > > > > 0, 0, 0 }
> > > > > >
> > > > > > The issue seems to be with the following pattern:
> > > > > > (simplify
> > > > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > >  @0)
> > > > > >
> > > > > > which simplifies above VEC_PERM_EXPR to:
> > > > > > _7 = {0, 0, 0, 0}
> > > > > > which is incorrect since _9 and mask have different vector lengths.
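
The length mismatch can be illustrated with a toy model in plain C (fixed
array lengths stand in for vector types; `is_uniform` and
`fold_uniform_perm` are hypothetical names, not GCC functions). The
result of a permute has as many elements as the mask, so a uniform
operand can only be returned unchanged when the lengths agree; otherwise
a splat of the result length has to be built:

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if all LEN elements of V are equal (LEN > 0).  */
int
is_uniform (const unsigned *v, size_t len)
{
  assert (len > 0);
  for (size_t i = 1; i < len; i++)
    if (v[i] != v[0])
      return 0;
  return 1;
}

/* Toy fold of a permute whose two inputs are the same uniform vector
   OP of OP_LEN elements.  The result has RESULT_LEN elements, which
   is the mask's length and may differ from OP_LEN.  Writes the splat
   into RESULT and returns 1 on success, 0 if OP is not uniform.  */
int
fold_uniform_perm (const unsigned *op, size_t op_len,
                   unsigned *result, size_t result_len)
{
  if (!is_uniform (op, op_len))
    return 0;
  for (size_t i = 0; i < result_len; i++)
    result[i] = op[0];
  return 1;
}
```

In the SVE case the result length is not a compile-time constant, which
this fixed-length model does not capture.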
> > > > > >
> > > > > > The attached patch amends the pattern to simplify above 
> > > > > > VEC_PERM_EXPR
> > > > > > only if operand and mask have same number of elements, which seems 
> > > > > > to fix
> > > > > > the issue, and we're left with the following in .optimized dump:
> > > > > >[local count: 1073741824]:
> > > > > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, 
> > > > > > ... }>;
> > > > >
> > > > > it would be nice to have this optimized.
> > > > >
> > > > > -
> > > > >  (simplify
> > > > >   (vec_perm vec_same_elem_p@0 @0 @1)
> > > > > - @0)
> > > > > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > > > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1
> > > > > +  @0))
> > > > >
> > > > > that looks good I think.  Maybe even better use 'type' instead of 
> > > > > TREE_TYPE (@1)
> > > > > since that's more obviously the return type in which case
> > > > >
> > > > >   (if (types_match (type, TREE_TYPE (@0))
> > > > >
> > > > > would be more to the point.
> > > > >
> > > > > But can't you to simplify this in the !known_eq case do a simple
> > > > >
> > > > >   { build_vector_from_val (type, the-element); }
> > > > >
> > > > > ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
> > > > >
> > > > >  (with { tree el = uniform_vector_p (@0); }
> > > > >   (if (el)
> > > > >{ build_vector_from_val (type, el); })))
> > > > >
> > > > > would be the cheapest workaround.
> > > > Hi Richard,
> > > > Thanks for the suggestions. Using build_vector_from_val simplifies it 
> > > > to:
> > > >[local count: 1073741824]:
> > > >   return { 0, ... };
> > > >
> > > > Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> > > > x86_64-linux-gnu.
> > > > OK to commit ?
> > >
> > > Can you retain the case of matching type?  Like
> > >
> > >   (if (types_match (type, TREE_TYPE (@0))
> > >@0
> > >(with
> > > {
> > >tree elem = uniform_vector_p (@0);
> > > }

GCC 13.1.1 Status Report (2023-06-23)

2023-06-23 Thread Richard Biener via Gcc-patches


Status
======

The gcc-13 branch is open for regression and documentation fixes.

It's time to plan for a GCC 13.2 release which should follow
roughly two to three months after the .1 release.  The plan is
to do a release candidate for GCC 13.2 on Thursday, Jul 20th
with the release following a week after that on Thursday, Jul 27th.

That leaves plenty of time to backport regression fixes from trunk.
Please make sure to go over the list of your assigned bugreports and
consider backporting where that seems safe.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                2   +   2
P2              514   +  22
P3              109   +  53
P4              216   -  25
P5               24
--------        ---   -----------------------
Total P1-P3     625   +  76
Total           865   +  51

Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2023-April/241140.html


Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Bernhard Reutner-Fischer via Gcc-patches
On 23 June 2023 10:03:45 CEST, Richard Sandiford  
wrote:

>> Fuse the block below into the one above as the condition seems to be 
>> identical?
>
>Yeah, true, but I think the idea is that the code above “Arguments are
>ready” is calculating argument values, and the code after it is creating
>code.  These are two separate steps, and the fact that the two final_len
>blocks end up being consecutive is something of a coincidence.
>
>So personally I think we should keep the structure in the patch.

Sure, works for me.
thanks,


Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Richard Biener via Gcc-patches
On Fri, Jun 23, 2023 at 11:09 AM Prathamesh Kulkarni
 wrote:
>
> On Thu, 22 Jun 2023 at 18:06, Richard Biener  
> wrote:
> >
> > On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 20 Jun 2023 at 16:47, Richard Biener  
> > > wrote:
> > > >
> > > > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi Richard,
> > > > > For the following reduced test-case taken from PR:
> > > > >
> > > > > #include "arm_sve.h"
> > > > > svuint32_t l() {
> > > > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > > > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > > > }
> > > > >
> > > > > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > > > > during GIMPLE pass: fre
> > > > > pr110280.c: In function 'l':
> > > > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > > > > tree-ssa-sccvn.cc:6890
> > > > > 5 | }
> > > > >   | ^
> > > > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > > > > gimple_stmt_iterator*)
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > > > ../../gcc/gcc/domwalk.cc:311
> > > > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > > > 0x1214664 do_rpo_vn_1
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > > > 0x1215ba5 execute
> > > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > > > >
> > > > > cc1 simplifies:
> > > > >   lanes[0] = 0;
> > > > >   lanes[1] = 0;
> > > > >   lanes[2] = 0;
> > > > >   lanes[3] = 0;
> > > > >   _1 = { -1, ... };
> > > > >   _7 = svld1rq_u32 (_1, );
> > > > >
> > > > > to:
> > > > >   _9 = MEM  [(unsigned int * 
> > > > > {ref-all})];
> > > > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > > > >
> > > > > and then fre1 dump shows:
> > > > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > > > > 0, 0, 0, 0 }
> > > > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 
> > > > > 0, 0 }
> > > > >
> > > > > The issue seems to be with the following pattern:
> > > > > (simplify
> > > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > > >  @0)
> > > > >
> > > > > which simplifies above VEC_PERM_EXPR to:
> > > > > _7 = {0, 0, 0, 0}
> > > > > which is incorrect since _9 and mask have different vector lengths.
> > > > >
> > > > > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > > > > only if operand and mask have same number of elements, which seems to 
> > > > > fix
> > > > > the issue, and we're left with the following in .optimized dump:
> > > > >[local count: 1073741824]:
> > > > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, 
> > > > > ... }>;
> > > >
> > > > it would be nice to have this optimized.
> > > >
> > > > -
> > > >  (simplify
> > > >   (vec_perm vec_same_elem_p@0 @0 @1)
> > > > - @0)
> > > > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1
> > > > +  @0))
> > > >
> > > > that looks good I think.  Maybe even better use 'type' instead of 
> > > > TREE_TYPE (@1)
> > > > since that's more obviously the return type in which case
> > > >
> > > >   (if (types_match (type, TREE_TYPE (@0))
> > > >
> > > > would be more to the point.
> > > >
> > > > But can't you, to simplify this in the !known_eq case, do a simple
> > > >
> > > >   { build_vector_from_val (type, the-element); }
> > > >
> > > > ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
> > > >
> > > >  (with { tree el = uniform_vector_p (@0); }
> > > >   (if (el)
> > > >{ build_vector_from_val (type, el); })))
> > > >
> > > > would be the cheapest workaround.
> > > Hi Richard,
> > > Thanks for the suggestions. Using build_vector_from_val simplifies it to:
> > >[local count: 1073741824]:
> > >   return { 0, ... };
> > >
> > > Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> > > x86_64-linux-gnu.
> > > OK to commit ?
> >
> > Can you retain the case of matching type?  Like
> >
> >   (if (types_match (type, TREE_TYPE (@0))
> >@0
> >(with
> > {
> >tree elem = uniform_vector_p (@0);
> > }
> >(if (elem)
> > { build_vector_from_val (type, elem); }
> >
> > ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p 
> > ...)
> >
> > OK with that change.
> Thanks, does the attached patch look OK ?

OK.

> Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >
> > >
> > > 

Re: [SVE][match.pd] Fix ICE observed in PR110280

2023-06-23 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 22 Jun 2023 at 18:06, Richard Biener  wrote:
>
> On Thu, Jun 22, 2023 at 11:08 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 20 Jun 2023 at 16:47, Richard Biener  
> > wrote:
> > >
> > > On Tue, Jun 20, 2023 at 11:56 AM Prathamesh Kulkarni via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi Richard,
> > > > For the following reduced test-case taken from PR:
> > > >
> > > > #include "arm_sve.h"
> > > > svuint32_t l() {
> > > >   alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
> > > >   return svld1rq_u32(svptrue_b8(), lanes);
> > > > }
> > > >
> > > > compiling with -O3 -mcpu=generic+sve results in following ICE:
> > > > during GIMPLE pass: fre
> > > > pr110280.c: In function 'l':
> > > > pr110280.c:5:1: internal compiler error: in eliminate_stmt, at
> > > > tree-ssa-sccvn.cc:6890
> > > > 5 | }
> > > >   | ^
> > > > 0x865fb1 eliminate_dom_walker::eliminate_stmt(basic_block_def*,
> > > > gimple_stmt_iterator*)
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:6890
> > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7324
> > > > 0x120bf4d eliminate_dom_walker::before_dom_children(basic_block_def*)
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7257
> > > > 0x1aeec77 dom_walker::walk(basic_block_def*)
> > > > ../../gcc/gcc/domwalk.cc:311
> > > > 0x11fd924 eliminate_with_rpo_vn(bitmap_head*)
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:7504
> > > > 0x1214664 do_rpo_vn_1
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8616
> > > > 0x1215ba5 execute
> > > > ../../gcc/gcc/tree-ssa-sccvn.cc:8702
> > > >
> > > > cc1 simplifies:
> > > >   lanes[0] = 0;
> > > >   lanes[1] = 0;
> > > >   lanes[2] = 0;
> > > >   lanes[3] = 0;
> > > >   _1 = { -1, ... };
> > > > >   _7 = svld1rq_u32 (_1, &lanes);
> > > >
> > > > to:
> > > >   _9 = MEM  [(unsigned int * {ref-all})];
> > > >   _7 = VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }>;
> > > >
> > > > and then fre1 dump shows:
> > > > Applying pattern match.pd:8675, generic-match-5.cc:9025
> > > > Match-and-simplified VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> to {
> > > > 0, 0, 0, 0 }
> > > > RHS VEC_PERM_EXPR <_9, _9, { 0, 1, 2, 3, ... }> simplified to { 0, 0, 
> > > > 0, 0 }
> > > >
> > > > The issue seems to be with the following pattern:
> > > > (simplify
> > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > >  @0)
> > > >
> > > > which simplifies above VEC_PERM_EXPR to:
> > > > _7 = {0, 0, 0, 0}
> > > > which is incorrect since _9 and mask have different vector lengths.
> > > >
> > > > The attached patch amends the pattern to simplify above VEC_PERM_EXPR
> > > > only if operand and mask have same number of elements, which seems to 
> > > > fix
> > > > the issue, and we're left with the following in .optimized dump:
> > > >[local count: 1073741824]:
> > > >   _2 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 0, 0, 0, 0 }, { 0, 1, 2, 3, ... 
> > > > }>;
> > >
> > > it would be nice to have this optimized.
> > >
> > > -
> > >  (simplify
> > >   (vec_perm vec_same_elem_p@0 @0 @1)
> > > - @0)
> > > + (if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)),
> > > +   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))))
> > > +  @0))
> > >
> > > that looks good I think.  Maybe even better use 'type' instead of 
> > > TREE_TYPE (@1)
> > > since that's more obviously the return type in which case
> > >
> > >   (if (types_match (type, TREE_TYPE (@0))
> > >
> > > would be more to the point.
> > >
> > > But can't you, to simplify this in the !known_eq case, do a simple
> > >
> > >   { build_vector_from_val (type, the-element); }
> > >
> > > ?  The 'vec_same_elem_p' predicate doesn't get you at the element,
> > >
> > >  (with { tree el = uniform_vector_p (@0); }
> > >   (if (el)
> > >{ build_vector_from_val (type, el); })))
> > >
> > > would be the cheapest workaround.
> > Hi Richard,
> > Thanks for the suggestions. Using build_vector_from_val simplifies it to:
> >[local count: 1073741824]:
> >   return { 0, ... };
> >
> > Patch is bootstrapped+tested on aarch64-linux-gnu, in progress on
> > x86_64-linux-gnu.
> > OK to commit ?
>
> Can you retain the case of matching type?  Like
>
>   (if (types_match (type, TREE_TYPE (@0))
>@0
>(with
> {
>tree elem = uniform_vector_p (@0);
> }
>(if (elem)
> { build_vector_from_val (type, elem); }
>
> ?  Because uniform_vector_p is strictly less powerful than (vec_same_elem_p 
> ...)
>
> OK with that change.
Thanks, does the attached patch look OK ?
Bootstrapped+tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Prathamesh
>
> Richard.
>
>
> >
> > Thanks,
> > Prathamesh
> > >
> > > >   return _2;
> > > >
> > > > code-gen:
> > > > l:
> > > > mov z0.b, #0
> > > > ret
> > > >
> > > > Patch is bootstrapped+tested on aarch64-linux-gnu.
> > > > OK to commit ?
> > > >
> > > > Thanks,
> > > > Prathamesh
[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
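
As a rough illustration of the pattern change discussed in this thread (this is not GCC code), the following C++ sketch models VEC_PERM_EXPR <v, v, sel> on plain std::vector values.  The names uniform_element, vec_perm_same and simplify_perm are hypothetical, and a fixed 8-lane selector stands in for the variable-length SVE mask.  It is meant to show why handing back the 4-lane operand is wrong when the selector (and hence the result) has a different number of lanes, and why splatting the uniform element to the result length is safe.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for uniform_vector_p: yields the repeated element
// when every lane of V holds the same value, otherwise nothing.
std::optional<int> uniform_element(const std::vector<int>& v) {
  if (v.empty())
    return std::nullopt;
  for (int x : v)
    if (x != v.front())
      return std::nullopt;
  return v.front();
}

// Toy model of VEC_PERM_EXPR <v, v, sel>: the result has one lane per
// selector entry.  With both inputs equal, index i selects v[i % lanes].
std::vector<int> vec_perm_same(const std::vector<int>& v,
                               const std::vector<std::size_t>& sel) {
  assert(!v.empty());
  std::vector<int> out;
  out.reserve(sel.size());
  for (std::size_t i : sel)
    out.push_back(v[i % v.size()]);
  return out;
}

// Sketch of the amended simplification: reuse the operand only when the
// lane counts agree, otherwise build a uniform vector of the result length.
std::vector<int> simplify_perm(const std::vector<int>& v,
                               const std::vector<std::size_t>& sel) {
  std::optional<int> el = uniform_element(v);
  if (el && sel.size() == v.size())
    return v;                                  // "types match": reuse operand
  if (el)
    return std::vector<int>(sel.size(), *el);  // splat to the result length
  return vec_perm_same(v, sel);                // nothing to simplify
}
```

In this toy model the old behaviour corresponds to returning v unconditionally, which would give a 4-lane value where an 8-lane one is expected.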
 

Re: Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread 钟居哲
Oh, OK. Thanks so much, Richard.
I will merge V6 after I finish regression testing.

Previously, I didn't understand whether you wanted a V7 (I tried using Google
Translate to translate your words :)
Now I understand you are happy with V6.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-23 16:03
To: Bernhard Reutner-Fischer
CC: juzhe.zhong; gcc-patches; rguenther
Subject: Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
Bernhard Reutner-Fischer  writes:
> On 23 June 2023 01:51:12 CEST, juzhe.zh...@rivai.ai wrote:
>>From: Ju-Zhe Zhong 
>
> I am sorry but I somehow overlooked a trivial spot in V5.
> Nit which does not warrant an immediate next version, but please consider it 
> before pushing iff approved:
>
>>+   if (final_len)
>>+ {
>>+   signed char biasval
>>+ = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>>+
>>+   bias = build_int_cst (intQI_type_node, biasval);
>>+ }
>>+
>>+   /* Arguments are ready.  Create the new vector stmt.  */
>>+   if (final_len)
>>+ {
>
> Fuse the block below into the one above as the condition seems to be 
> identical?
 
Yeah, true, but I think the idea is that the code above “Arguments are
ready” is calculating argument values, and the code after it is creating
code.  These are two separate steps, and the fact that the two final_len
blocks end up being consecutive is something of a coincidence.
 
So personally I think we should keep the structure in the patch.
 
Thanks,
Richard
 


[PATCH 5/6] Bogus and missed folding on vector compares

2023-06-23 Thread Richard Biener via Gcc-patches
fold_binary tries to transform (double)float1 CMP (double)float2
into float1 CMP float2 but ends up using TYPE_PRECISION on the
argument types.  For vector types that compares the number of
lanes, which should always be equal (so it is harmless in the
sense that no wrong code is generated).  The following instead
properly uses element_precision.

The same happens in the corresponding match.pd pattern.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, will
push once that has finished.

* fold-const.cc (fold_binary_loc): Use element_precision
when trying (double)float1 CMP (double)float2 to
float1 CMP float2 simplification.
* match.pd: Likewise.
---
 gcc/fold-const.cc | 4 ++--
 gcc/match.pd  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 7e35eda7140..ac90a594fcc 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12564,10 +12564,10 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
tree targ1 = strip_float_extensions (arg1);
tree newtype = TREE_TYPE (targ0);
 
-   if (TYPE_PRECISION (TREE_TYPE (targ1)) > TYPE_PRECISION (newtype))
+   if (element_precision (TREE_TYPE (targ1)) > element_precision (newtype))
  newtype = TREE_TYPE (targ1);
 
-   if (TYPE_PRECISION (newtype) < TYPE_PRECISION (TREE_TYPE (arg0)))
+   if (element_precision (newtype) < element_precision (TREE_TYPE (arg0)))
  return fold_build2_loc (loc, code, type,
  fold_convert_loc (loc, newtype, targ0),
  fold_convert_loc (loc, newtype, targ1));
diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd23826034..85d562a531d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6034,10 +6034,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 type1 = double_type_node;
 }
   tree newtype
-= (TYPE_PRECISION (TREE_TYPE (@00)) > TYPE_PRECISION (type1)
+= (element_precision (TREE_TYPE (@00)) > element_precision (type1)
   ? TREE_TYPE (@00) : type1);
  }
- (if (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (newtype))
+ (if (element_precision (TREE_TYPE (@0)) > element_precision (newtype))
   (cmp (convert:newtype @00) (convert:newtype @10
 
 
-- 
2.35.3
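
To make the TYPE_PRECISION versus element_precision distinction concrete, here is a small C++ sketch.  It is a toy model, not GCC's tree representation: Type, raw_precision and element_precision_model are hypothetical names, and storing a plain lane count in the shared field is a simplifying assumption.  It shows that the raw field of two vector types with equal lane counts compares equal, whereas looking through to the element type exposes the float versus double difference the folding is after.

```cpp
#include <cassert>

// Toy type descriptor; none of these names are GCC's real data structures.
enum class Kind { Scalar, Vector };

struct Type {
  Kind kind;
  unsigned raw_field;   // scalars: precision in bits; vectors: lane count
  const Type* element;  // element type for vectors, nullptr for scalars
};

// Reads the shared field without looking at the kind, which is what a
// TYPE_PRECISION-based comparison effectively does in this model.
unsigned raw_precision(const Type& t) { return t.raw_field; }

// Modelled on the idea behind element_precision: look through a vector
// type to its element before reading the precision.
unsigned element_precision_model(const Type& t) {
  if (t.kind == Kind::Vector) {
    assert(t.element != nullptr);
    return t.element->raw_field;
  }
  return t.raw_field;
}
```

With a 4-lane float vector and a 4-lane double vector, raw_precision reports 4 for both, so a "narrower than" test never fires; element_precision_model reports 32 and 64.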



[PATCH 6/6] Use element_precision for match.pd arith conversion optimization

2023-06-23 Thread Richard Biener via Gcc-patches
The simplification (outertype)((innertype0)a+(innertype1)b) to
((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check
whether it can elide a conversion but in some paths there can
be VECTOR_TYPEs where this instead compares the number of lanes.
The following fixes the missed optimizations and uses
element_precision in those places.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, will
push once that has finished.

* match.pd ((outertype)((innertype0)a+(innertype1)b)
-> ((newtype)a+(newtype)b)): Use element_precision
where appropriate.
---
 gcc/match.pd | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 85d562a531d..48b76e6a051 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7428,9 +7428,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  && newtype == type
  && types_match (newtype, type))
(op (convert:newtype @1) (convert:newtype @2))
-   (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
+   (with { if (element_precision (ty1) > element_precision (newtype))
  newtype = ty1;
-   if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
+   if (element_precision (ty2) > element_precision (newtype))
  newtype = ty2; }
   /* Sometimes this transformation is safe (cannot
  change results through affecting double rounding
@@ -7453,9 +7453,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  exponent range for the product or ratio of two
  values representable in the TYPE to be within the
  range of normal values of ITYPE.  */
- (if (TYPE_PRECISION (newtype) < TYPE_PRECISION (itype)
+ (if (element_precision (newtype) < element_precision (itype)
   && (flag_unsafe_math_optimizations
-  || (TYPE_PRECISION (newtype) == TYPE_PRECISION (type)
+  || (element_precision (newtype) == element_precision 
(type)
   && real_can_shorten_arithmetic (TYPE_MODE (itype),
   TYPE_MODE (type))
   && !excess_precision_type (newtype)))
-- 
2.35.3


[PATCH 3/6] Properly guard vect_look_through_possible_promotion

2023-06-23 Thread Richard Biener via Gcc-patches
The function ends up getting called on VECTOR_TYPEs, which it
really isn't prepared for, and with the TYPE_PRECISION checking
changes it will ICE.  The following exits early when the type
to work on isn't scalar integral.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, will
push once that has finished.

* tree-vect-patterns.cc (vect_look_through_possible_promotion):
Exit early when the type isn't scalar integral.
---
 gcc/tree-vect-patterns.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819..a04accf3b03 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -398,8 +398,11 @@ vect_look_through_possible_promotion (vec_info *vinfo, 
tree op,
  vect_unpromoted_value *unprom,
  bool *single_use_p = NULL)
 {
-  tree res = NULL_TREE;
   tree op_type = TREE_TYPE (op);
+  if (!INTEGRAL_TYPE_P (op_type))
+return NULL_TREE;
+
+  tree res = NULL_TREE;
   unsigned int orig_precision = TYPE_PRECISION (op_type);
   unsigned int min_precision = orig_precision;
   stmt_vec_info caster = NULL;
@@ -3881,6 +3884,7 @@ vect_recog_vector_vector_shift_pattern (vec_info *vinfo,
   if (TREE_CODE (oprnd0) != SSA_NAME
   || TREE_CODE (oprnd1) != SSA_NAME
   || TYPE_MODE (TREE_TYPE (oprnd0)) == TYPE_MODE (TREE_TYPE (oprnd1))
+  || !INTEGRAL_TYPE_P (TREE_TYPE (oprnd0))
   || !type_has_mode_precision_p (TREE_TYPE (oprnd1))
   || TYPE_PRECISION (TREE_TYPE (lhs))
 != TYPE_PRECISION (TREE_TYPE (oprnd0)))
-- 
2.35.3



[PATCH 4/6] Fix tree_simple_nonnegative_warnv_p for VECTOR_TYPEs

2023-06-23 Thread Richard Biener via Gcc-patches
tree_simple_nonnegative_warnv_p ends up being called on VECTOR_TYPEs,
where I think it even gets the wrong answer for tcc_comparison
since vector bools are signed.  The following properly guards
that with !VECTOR_TYPE_P.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, will
push once that has finished.

* fold-const.cc (tree_simple_nonnegative_warnv_p): Guard
the truth_value_p case with !VECTOR_TYPE_P.
---
 gcc/fold-const.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 3aa6851acd5..7e35eda7140 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -14530,7 +14530,8 @@ tree_expr_maybe_real_minus_zero_p (const_tree x)
 static bool
 tree_simple_nonnegative_warnv_p (enum tree_code code, tree type)
 {
-  if ((TYPE_PRECISION (type) != 1 || TYPE_UNSIGNED (type))
+  if (!VECTOR_TYPE_P (type)
+  && (TYPE_PRECISION (type) != 1 || TYPE_UNSIGNED (type))
   && truth_value_p (code))
 /* Truth values evaluate to 0 or 1, which is nonnegative unless we
have a signed:1 type (where the value is -1 and 0).  */
-- 
2.35.3
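
The two facts this guard relies on can be checked with a short C++ sketch.  The helper names sign_extend and vec_less are hypothetical; vec_less only imitates the convention of GCC's vector extensions, where a true comparison lane is all-ones (-1) and a false lane is 0.  The sketch shows that a signed 1-bit value holding the bit 1 reads back as -1, and that a lane-wise comparison result contains negative lanes, so "truth values are nonnegative" does not carry over to vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interpret the low BITS bits of VALUE as a two's-complement signed number.
// Restricted to fewer than 32 bits to keep the arithmetic simple.
std::int32_t sign_extend(std::uint32_t value, unsigned bits) {
  assert(bits >= 1 && bits < 32);
  const std::uint32_t mask = (1u << bits) - 1u;
  const std::uint32_t sign = 1u << (bits - 1);
  value &= mask;
  return static_cast<std::int32_t>(value ^ sign)
         - static_cast<std::int32_t>(sign);
}

// Lane-wise "a < b" in the style of GCC's vector extensions: a true lane
// is all-ones (-1), a false lane is 0.
std::array<std::int32_t, 4> vec_less(const std::array<std::int32_t, 4>& a,
                                     const std::array<std::int32_t, 4>& b) {
  std::array<std::int32_t, 4> r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] < b[i] ? -1 : 0;
  return r;
}
```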



[PATCH 2/6] Fix TYPE_PRECISION use in hashable_expr_equal_p

2023-06-23 Thread Richard Biener via Gcc-patches
While the checks look unnecessary they probably are quick and
thus done early.  The following avoids using TYPE_PRECISION
on VECTOR_TYPEs by making the code match the comment which
talks about precision and signedness.  An alternative would
be to only retain the ERROR_MARK and TYPE_MODE checks or
use TYPE_PRECISION_RAW (but I like that least).

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, OK?

* tree-ssa-scopedtables.cc (hashable_expr_equal_p):
Use element_precision.
---
 gcc/tree-ssa-scopedtables.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-scopedtables.cc b/gcc/tree-ssa-scopedtables.cc
index 528ddf2a2ab..e698ef97343 100644
--- a/gcc/tree-ssa-scopedtables.cc
+++ b/gcc/tree-ssa-scopedtables.cc
@@ -574,7 +574,7 @@ hashable_expr_equal_p (const struct hashable_expr *expr0,
   && (TREE_CODE (type0) == ERROR_MARK
  || TREE_CODE (type1) == ERROR_MARK
  || TYPE_UNSIGNED (type0) != TYPE_UNSIGNED (type1)
- || TYPE_PRECISION (type0) != TYPE_PRECISION (type1)
+ || element_precision (type0) != element_precision (type1)
  || TYPE_MODE (type0) != TYPE_MODE (type1)))
 return false;
 
-- 
2.35.3



[PATCH 1/6] Avoid shorten_binary_op on VECTOR_TYPE

2023-06-23 Thread Richard Biener via Gcc-patches
When we disallow TYPE_PRECISION on VECTOR_TYPEs it shows that
shorten_binary_op performs some such checks that are likely
harmless in the end.  The following bails out early for
VECTOR_TYPE operations to avoid those questionable checks.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

OK?

gcc/c-family/
* c-common.cc (shorten_binary_op): Exit early for VECTOR_TYPE
operations.
---
 gcc/c-family/c-common.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9c8eed5442a..34566a342bd 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -1338,6 +1338,10 @@ shorten_binary_op (tree result_type, tree op0, tree op1, 
bool bitwise)
   int uns;
   tree type;
 
+  /* Do not shorten vector operations.  */
+  if (VECTOR_TYPE_P (result_type))
+return result_type;
+
   /* Cast OP0 and OP1 to RESULT_TYPE.  Doing so prevents
  excessive narrowing when we call get_narrower below.  For
  example, suppose that OP0 is of unsigned int extended
-- 
2.35.3



[PATCH][RFC] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-23 Thread Richard Biener via Gcc-patches
The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled.  This should avoid wrong-code
in cases like PR110182 and instead ICE.

It also introduces a TYPE_PRECISION_RAW accessor and adjusts
places I found that are eligible to use that.

This patch requires (at least) the series of patches I will
follow up with.  I have to re-bootstrap / test to look
for further fallout (I've picked this up again after some weeks).

Opinions?

Thanks,
Richard.

* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
(TYPE_PRECISION_RAW): Provide raw access to the precision
field.
* tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
(gimple_canonical_types_compatible_p): Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Stream TYPE_PRECISION_RAW.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
---
 gcc/lto-streamer-out.cc  | 2 +-
 gcc/tree-streamer-in.cc  | 2 +-
 gcc/tree-streamer-out.cc | 2 +-
 gcc/tree.cc  | 6 +++---
 gcc/tree.h   | 4 +++-
 5 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ab2eb4301e..3432dd434e2 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -1373,7 +1373,7 @@ hash_tree (struct streamer_tree_cache_d *cache, 
hash_map *map,
   if (AGGREGATE_TYPE_P (t))
hstate.add_flag (TYPE_TYPELESS_STORAGE (t));
   hstate.commit_flag ();
-  hstate.add_int (TYPE_PRECISION (t));
+  hstate.add_int (TYPE_PRECISION_RAW (t));
   hstate.add_int (TYPE_ALIGN (t));
   hstate.add_int (TYPE_EMPTY_P (t));
 }
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c803800862c..e6919e463c0 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -387,7 +387,7 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, 
tree expr)
 TYPE_TYPELESS_STORAGE (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_EMPTY_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_NO_NAMED_ARGS_STDARG_P (expr) = (unsigned) bp_unpack_value (bp, 1);
-  TYPE_PRECISION (expr) = bp_unpack_var_len_unsigned (bp);
+  TYPE_PRECISION_RAW (expr) = bp_unpack_var_len_unsigned (bp);
   SET_TYPE_ALIGN (expr, bp_unpack_var_len_unsigned (bp));
 #ifdef ACCEL_COMPILER
   if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index 5751f77273b..719cbeacf99 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -356,7 +356,7 @@ pack_ts_type_common_value_fields (struct bitpack_d *bp, 
tree expr)
 bp_pack_value (bp, TYPE_TYPELESS_STORAGE (expr), 1);
   bp_pack_value (bp, TYPE_EMPTY_P (expr), 1);
   bp_pack_value (bp, TYPE_NO_NAMED_ARGS_STDARG_P (expr), 1);
-  bp_pack_var_len_unsigned (bp, TYPE_PRECISION (expr));
+  bp_pack_var_len_unsigned (bp, TYPE_PRECISION_RAW (expr));
   bp_pack_var_len_unsigned (bp, TYPE_ALIGN (expr));
 }
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8e144bc090e..58288efa2e2 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13423,7 +13423,7 @@ verify_type_variant (const_tree t, tree tv)
}
   verify_variant_match (TYPE_NEEDS_CONSTRUCTING);
 }
-  verify_variant_match (TYPE_PRECISION);
+  verify_variant_match (TYPE_PRECISION_RAW);
   if (RECORD_OR_UNION_TYPE_P (t))
 verify_variant_match (TYPE_TRANSPARENT_AGGR);
   else if (TREE_CODE (t) == ARRAY_TYPE)
@@ -13701,8 +13701,8 @@ gimple_canonical_types_compatible_p (const_tree t1, 
const_tree t2,
   || TREE_CODE (t1) == OFFSET_TYPE
   || POINTER_TYPE_P (t1))
 {
-  /* Can't be the same type if they have different recision.  */
-  if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
+  /* Can't be the same type if they have different precision.  */
+  if (TYPE_PRECISION_RAW (t1) != TYPE_PRECISION_RAW (t2))
return false;
 
   /* In some cases the signed and unsigned types are required to be
diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..1b791335d38 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2191,7 +2191,9 @@ class auto_suppress_location_wrappers
 #define TYPE_SIZE_UNIT(NODE) (TYPE_CHECK (NODE)->type_common.size_unit)
 #define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to)
 #define TYPE_REFERENCE_TO(NODE) (TYPE_CHECK (NODE)->type_common.reference_to)
-#define TYPE_PRECISION(NODE) (TYPE_CHECK (NODE)->type_common.precision)
+#define TYPE_PRECISION(NODE) \
+  (TREE_NOT_CHECK (TYPE_CHECK (NODE), VECTOR_TYPE)->type_common.precision)
+#define TYPE_PRECISION_RAW(NODE) (TYPE_CHECK (NODE)->type_common.precision)
 #define TYPE_NAME(NODE) (TYPE_CHECK (NODE)->type_common.name)
 #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
 #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
-- 
2.35.3
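
The split between a checked and a raw accessor can be sketched in a few lines of C++.  This is only a model of the idea: TypeNode, precision_checked, precision_raw and rejects are hypothetical names, and where GCC would ICE through tree checking this sketch throws an exception instead.

```cpp
#include <cassert>
#include <stdexcept>

// Toy type node; the field's meaning depends on the kind, as in the RFC.
enum class NodeKind { Integer, Real, Vector };

struct TypeNode {
  NodeKind kind;
  unsigned precision_field;
};

// Models TYPE_PRECISION_RAW: plain access to the field for any type, for
// users such as streaming or hashing that only copy or compare the bits.
unsigned precision_raw(const TypeNode& t) { return t.precision_field; }

// Models the checked TYPE_PRECISION: refuses vector types outright.
unsigned precision_checked(const TypeNode& t) {
  if (t.kind == NodeKind::Vector)
    throw std::logic_error("precision requested for a vector type");
  return t.precision_field;
}

// Small helper: true if the checked accessor rejects T.
bool rejects(const TypeNode& t) {
  try {
    (void) precision_checked(t);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

The point of the design is that an accidental use on a vector type fails loudly at the call site rather than silently comparing lane information.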


[PATCH] Optimize vector codegen for invariant loads, fix SLP support

2023-06-23 Thread Richard Biener via Gcc-patches
The following avoids creating duplicate stmts for invariant loads
which was necessary when the vector stmts were in a linked list.
It also fixes SLP support which didn't correctly create the
appropriate number of copies.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (vectorizable_load): Avoid useless
copies of VMAT_INVARIANT vectorized stmts, fix SLP support.
---
 gcc/tree-vect-stmts.cc | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1b160cecfce..b1b08238dc3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9612,27 +9612,26 @@ vectorizable_load (vec_info *vinfo,
  gimple_set_vuse (new_stmt, vuse);
  gsi_insert_on_edge_immediate (pe, new_stmt);
}
-  /* These copies are all equivalent, but currently the representation
-requires a separate STMT_VINFO_VEC_STMT for each one.  */
-  gimple_stmt_iterator gsi2 = *gsi;
-  gsi_next (&gsi2);
-  for (j = 0; j < ncopies; j++)
+  /* These copies are all equivalent.  */
+  if (hoist_p)
+   new_temp = vect_init_vector (vinfo, stmt_info, scalar_dest,
+vectype, NULL);
+  else
{
- if (hoist_p)
-   new_temp = vect_init_vector (vinfo, stmt_info, scalar_dest,
-vectype, NULL);
- else
-   new_temp = vect_init_vector (vinfo, stmt_info, scalar_dest,
-vectype, &gsi2);
- gimple *new_stmt = SSA_NAME_DEF_STMT (new_temp);
- if (slp)
-   SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
- else
-   {
- if (j == 0)
-   *vec_stmt = new_stmt;
- STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
-   }
+ gimple_stmt_iterator gsi2 = *gsi;
+ gsi_next (&gsi2);
+ new_temp = vect_init_vector (vinfo, stmt_info, scalar_dest,
+  vectype, &gsi2);
+   }
+  gimple *new_stmt = SSA_NAME_DEF_STMT (new_temp);
+  if (slp)
+   for (j = 0; j < (int) SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); ++j)
+ SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+  else
+   {
+ for (j = 0; j < ncopies; ++j)
+   STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+ *vec_stmt = new_stmt;
}
   return true;
 }
-- 
2.35.3


Re: Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread 钟居哲
Hi, Richard.

I saw Bernhard's comments.

Should I send a V7 that fuses the two if (final_len) blocks together as the
final version to be merged?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-23 16:08
To: juzhe.zhong
CC: gcc-patches; rguenther; rep.dot.nop
Subject: Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Address comments from Richard and Bernhard on the V5 patch.
> V6 fixes all issues raised in their comments.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
 
Given Richard was happy with the previous version and this addresses
my comments from V5: OK, thanks.
 
Richard
 
>
> ---
>  gcc/internal-fn.cc |  37 ++-
>  gcc/optabs-tree.cc |  86 ++--
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 221 +
>  4 files changed, 267 insertions(+), 83 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..1c2fd487e2a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
>|| get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,33 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode;
> +  if (!targetm.vectorize.get_mask_mode (mode).exists (&mask_mode))
> + return VECT_PARTIAL_BIAS_UNSUPPORTED;
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>  return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>  return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 

Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Address comments from Richard and Bernhard on the V5 patch.
> V6 fixes all issues raised in their comments.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.

Given Richard was happy with the previous version and this addresses
my comments from V5: OK, thanks.

Richard

>
> ---
>  gcc/internal-fn.cc |  37 ++-
>  gcc/optabs-tree.cc |  86 ++--
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 221 +
>  4 files changed, 267 insertions(+), 83 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..1c2fd487e2a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
> || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,33 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode;
> +  if (!targetm.vectorize.get_mask_mode (mode).exists (&mask_mode))
> + return VECT_PARTIAL_BIAS_UNSUPPORTED;
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>   return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>   return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..e6ae15939d3 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,19 +543,50 @@ target_supports_op_p (tree type, enum tree_code code,
> && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
>  }
>  
> -/* Return true if target supports vector masked load/store for mode.  
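
The internal_len_load_store_bias hunk quoted above can be modelled in a few lines of stand-alone C++. This is only a sketch under stated assumptions, not GCC code: `Insn`, `Target`, `probe_bias` and `kUnsupported` are hypothetical stand-ins for insn_code, the optab handlers and VECT_PARTIAL_BIAS_UNSUPPORTED, and the final "unsupported" return is assumed because the quoted hunk does not show the end of the function. It illustrates the two points visible in the diff: the bias operand moves from index 3 to index 4 when falling back to the len_mask pattern, and V6 reports "unsupported" when no mask mode exists where V5 called require ().

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical stand-in for VECT_PARTIAL_BIAS_UNSUPPORTED (value is arbitrary here).
constexpr int kUnsupported = 127;

// Hypothetical model of one instruction pattern: which operand holds the
// bias and which constant bias values that operand accepts.
struct Insn {
  int bias_opno;
  std::set<int> accepted_biases;

  bool operand_matches(int opno, int value) const {
    assert(opno >= 0);  // operand numbers are never negative
    return opno == bias_opno && accepted_biases.count(value) != 0;
  }
};

// Hypothetical model of a target: an optional len_{load,store} pattern, an
// optional len_mask_{load,store} pattern, and whether a mask mode exists.
struct Target {
  std::optional<Insn> len_insn;
  std::optional<Insn> len_mask_insn;
  bool has_mask_mode;
};

int probe_bias(const Target& t) {
  std::optional<Insn> insn = t.len_insn;
  int bias_opno = 3;

  if (!insn) {
    // Modelled on V6: give up gracefully when there is no mask mode.
    if (!t.has_mask_mode)
      return kUnsupported;
    // Fall back to the len_mask pattern, whose bias is one operand later.
    insn = t.len_mask_insn;
    bias_opno = 4;
  }

  if (insn) {
    // Only biases of 0 or -1 are tried, in that order.
    if (insn->operand_matches(bias_opno, 0))
      return 0;
    if (insn->operand_matches(bias_opno, -1))
      return -1;
  }
  return kUnsupported;
}
```

A target that only provides the len_mask pattern with a bias of -1 would be modelled as `Target{std::nullopt, Insn{4, {-1}}, true}`, for which `probe_bias` returns -1.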

Re: [PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

2023-06-23 Thread Mikael Morin

On 22/06/2023 at 22:23, Harald Anlauf via Fortran wrote:

> Dear all,
>
> gfortran's ABI specifies that actual arguments to CHARACTER(LEN=1),VALUE
> dummy arguments are passed by value in the scalar case.  That did work
> for constant strings being passed, but not in several other cases, where
> pointers were passed, resulting in subsequent random junk...
>
> The attached patch fixes this for the case of a non-constant string
> argument.
>
> It does not touch the character,value bind(c) case - this is a different
> thing and may need separate work, as Mikael pointed out - and there is
> a missed optimization for the case of actual constant string arguments
> of length larger than 1: it appears that the full string is pushed to
> the stack.  I did not address that, as the primary aim here is to get
> correctly working code.  (I added a TODO in a comment.)
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?

OK, thanks.

> Thanks,
> Harald
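
As a rough C++ analogy for the failure described in this message (it is not gfortran's generated code, and `callee`, `pass_by_value` and `low_byte_of_address` are hypothetical names): a callee that expects a single character by value reads whatever bits arrive in the argument slot. If the caller places an address there, the callee sees part of that address and not the character.

```cpp
#include <cassert>
#include <cstdint>

// Callee in the style of a CHARACTER(LEN=1),VALUE dummy argument: it
// receives the character itself, not a reference to it.
char callee(char c) { return c; }

// Correct call: load the character and pass its value.
char pass_by_value(const char* s) {
  assert(s != nullptr);
  return callee(*s);
}

// Model of the faulty call: the address ends up in the argument slot, so
// the callee effectively reads one byte of the pointer.  The result depends
// on where the string happens to live in memory.
char low_byte_of_address(const char* s) {
  return callee(static_cast<char>(reinterpret_cast<std::uintptr_t>(s) & 0xff));
}
```

For a buffer holding "x", `pass_by_value` yields 'x', while the value from `low_byte_of_address` varies with the buffer's address, which corresponds to the "random junk" mentioned in the report.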





Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Richard Sandiford via Gcc-patches
Bernhard Reutner-Fischer  writes:
> On 23 June 2023 01:51:12 CEST, juzhe.zh...@rivai.ai wrote:
>>From: Ju-Zhe Zhong 
>
> I am sorry but I somehow overlooked a trivial spot in V5.
> Nit which does not warrant an immediate next version, but please consider it 
> before pushing iff approved:
>
>>+   if (final_len)
>>+ {
>>+   signed char biasval
>>+ = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>>+
>>+   bias = build_int_cst (intQI_type_node, biasval);
>>+ }
>>+
>>+   /* Arguments are ready.  Create the new vector stmt.  */
>>+   if (final_len)
>>+ {
>
> Fuse the block below into the one above as the condition seems to be 
> identical?

Yeah, true, but I think the idea is that the code above “Arguments are
ready” is calculating argument values, and the code after it is creating
code.  These are two separate steps, and the fact that the two final_len
blocks end up being consecutive is something of a coincidence.

So personally I think we should keep the structure in the patch.

Thanks,
Richard
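
The distinction drawn here between calculating argument values and creating code can be illustrated with a small stand-alone C++ sketch. All names (`Args`, `compute_args`, `emit_stmt`) and the output strings are hypothetical and do not appear in the patch; the point is only that each phase may test the same condition without the two tests belonging to one block.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical container for the values that phase 1 computes.
struct Args {
  std::optional<int> len;   // models final_len
  std::optional<int> bias;  // models the bias operand, needed only with a length
};

// Phase 1: calculate argument values.
Args compute_args(std::optional<int> final_len, int target_bias) {
  Args a;
  a.len = final_len;
  if (final_len)
    a.bias = target_bias;
  return a;
}

// Phase 2: arguments are ready, create the statement (modelled as a string).
std::string emit_stmt(const Args& a) {
  if (a.len) {
    assert(a.bias.has_value());  // phase 1 must have supplied a bias
    return "partial(len=" + std::to_string(*a.len) +
           ", bias=" + std::to_string(*a.bias) + ")";
  }
  return "full";
}
```

Keeping the phases apart means that a later change to phase 1 (for example, another argument computed under a different condition) does not disturb the emission code.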


Re: [PATCH V6] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Bernhard Reutner-Fischer via Gcc-patches
On 23 June 2023 01:51:12 CEST, juzhe.zh...@rivai.ai wrote:
>From: Ju-Zhe Zhong 

I am sorry but I somehow overlooked a trivial spot in V5.
Nit which does not warrant an immediate next version, but please consider it 
before pushing iff approved:

>+if (final_len)
>+  {
>+signed char biasval
>+  = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>+
>+bias = build_int_cst (intQI_type_node, biasval);
>+  }
>+
>+/* Arguments are ready.  Create the new vector stmt.  */
>+if (final_len)
>+  {

Fuse the block below into the one above as the condition seems to be identical?
thanks,

>+gcall *call;
> tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT);
> /* Need conversion if it's wrapped with VnQI.  */
> if (vmode != new_vmode)


Re: [PATCH v5] tree-ssa-sink: Improve code sinking pass

2023-06-23 Thread Ajit Agarwal via Gcc-patches



On 23/06/23 7:44 am, Peter Bergner wrote:
> On 6/1/23 11:54 PM, Ajit Agarwal via Gcc-patches wrote:
>>
>>
>> On 01/06/23 2:06 pm, Bernhard Reutner-Fischer wrote:
>>> On 1 June 2023 09:20:08 CEST, Ajit Agarwal  wrote:
>>>> Hello All:
>>>>
>>>> This patch improves code sinking pass to sink statements before call to
>>>> reduce register pressure.
>>>> Review comments are incorporated.
>>>
>>> Hi Ajit!
>>>
>>> I had two comments for v4 that you neither addressed in v5 nor followed up on.
>>> thanks,
>>
>> Which comments didn't I address? Please let me know.
> 
> I believe he's referring to these two comments:
> 
>   > +   && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb))
>   > + {
>   > +   if (def_use_same_block (use))
>   > + return best_bb;
>   > +
>   > +   return new_best_bb;
>   > + }
>   > + return best_bb;
>   > +}
>   >  
> 
>   Many returns.
>   I'd have said
> && !def_use_same_block (use)
>   return new_best_bb;
>   else
>   return best_bb;
> 
>   and rephrase the comment above list of Things to consider accordingly.
> 
> 
> I agree with Bernhard's comment that it could be rewritten to be clearer.
> Although, the "else" isn't really required.  So Bernhard's version would
> look like:
> 
>   if (new_best_bb
>   && use
>   && new_best_bb != best_bb
>   && new_best_bb != early_bb
>   && !is_gimple_call (stmt)
>   && gsi_end_p (gsi_start_phis (new_best_bb))
>   && gimple_bb (use) != early_bb
>   && !is_gimple_call (use)
>   && dominated_by_p (CDI_POST_DOMINATORS, new_best_bb, gimple_bb (use))
>   && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb)
>   && !def_use_same_block (use))
> return new_best_bb;
>   else
> return best_bb;
> 
> ...or just:
> 
>   if (new_best_bb
>   && use
>   && new_best_bb != best_bb
>   && new_best_bb != early_bb
>   && !is_gimple_call (stmt)
>   && gsi_end_p (gsi_start_phis (new_best_bb))
>   && gimple_bb (use) != early_bb
>   && !is_gimple_call (use)
>   && dominated_by_p (CDI_POST_DOMINATORS, new_best_bb, gimple_bb (use))
>   && dominated_by_p (CDI_DOMINATORS, new_best_bb, early_bb)
>   && !def_use_same_block (use))
> return new_best_bb;
> 
>   return best_bb;
> 
> 
> Either works.

Thanks Peter. I will incorporate this and send a new version of the patch.

> 
> 
> Peter
> 
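
The rewrite discussed in this message relies on the nested and the flattened conditions selecting the same block in every case. A minimal stand-alone C++ sketch of that equivalence follows; `guard` is a hypothetical stand-in for the whole chain of earlier conditions and `same_block` for the result of def_use_same_block (use), so none of this is the pass's actual code.

```cpp
#include <cassert>

enum class Block { Best, NewBest };

// Shape of the code as quoted from the patch: an inner test with its own
// return inside the guarded block.
Block nested_form(bool guard, bool same_block) {
  if (guard) {
    if (same_block)
      return Block::Best;
    return Block::NewBest;
  }
  return Block::Best;
}

// Shape suggested in review: fold the inner test into the condition.
Block flat_form(bool guard, bool same_block) {
  if (guard && !same_block)
    return Block::NewBest;
  return Block::Best;
}

// Exhaustively compare the two shapes over every input combination.
void check_forms_agree() {
  for (int g = 0; g < 2; ++g)
    for (int s = 0; s < 2; ++s)
      assert(nested_form(g != 0, s != 0) == flat_form(g != 0, s != 0));
}
```

Because `&&` short-circuits, the flattened form still evaluates the last test only when the earlier conditions hold, as the nested form does.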


Re: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread 钟居哲
Thanks Richi so much.

I am going to wait for Richard's final approval of V6:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622560.html 
which has some small fixes for his comments on V5.

Bootstrap has passed and regression testing is running. I will wait for the
regression results too.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-23 14:21
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
On Thu, 22 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
 
OK.
 
Thanks,
Richard.
 
> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
> 
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  85 +---
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 265 insertions(+), 82 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
>|| get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>  return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>  return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..ab9514fc8e0 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,19 +543,49 @@ 

[PATCH] Improve vector_vector_composition_type

2023-06-23 Thread Richard Biener via Gcc-patches
We are sometimes asked to decompose, say, V2DFmode into two halves.
Currently this results in composing it from two DImode pieces
instead of the obvious two DFmode pieces.  The following adjusts
vector_vector_composition_type for this trivial case and avoids
a VIEW_CONVERT_EXPR in the initial code generation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (vector_vector_composition_type):
Handle composition of a vector from a number of elements that
happens to match its number of lanes.
---
 gcc/tree-vect-stmts.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae24f3e66e6..9e046ced7c6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2143,6 +2143,14 @@ vector_vector_composition_type (tree vtype, poly_uint64 
nelts, tree *ptype)
   if (!VECTOR_MODE_P (vmode))
 return NULL_TREE;
 
+  /* When we are asked to compose the vector from its components let
+ that happen directly.  */
+  if (known_eq (TYPE_VECTOR_SUBPARTS (vtype), nelts))
+{
+  *ptype = TREE_TYPE (vtype);
+  return vtype;
+}
+
   poly_uint64 vbsize = GET_MODE_BITSIZE (vmode);
   unsigned int pbsize;
   if (constant_multiple_p (vbsize, nelts, &pbsize))
-- 
2.35.3
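
The effect of the early-out added by this patch can be sketched with a much-simplified stand-alone C++ model. `VecType`, `Composition` and `composition_type` are hypothetical names, and the fallback shown (always an integer piece of the right width) is an assumption that leaves out the other choices the real function considers; only the first branch mirrors the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of a vector type: element kind ("DF" for double,
// "SF" for float), element width in bits, and number of lanes.
struct VecType {
  std::string elem;
  unsigned elem_bits;
  unsigned lanes;
};

// Hypothetical result: the kind of each piece, and whether the pieces must
// be reinterpreted (a VIEW_CONVERT_EXPR in GCC terms) to form the vector.
struct Composition {
  std::string piece;
  bool needs_view_convert;
};

// Compose `v` from `nelts` equally sized pieces.
std::optional<Composition> composition_type(const VecType& v, unsigned nelts) {
  assert(nelts != 0);
  // Early-out modelled on the patch: as many pieces as lanes means each
  // piece is simply one element of the vector.
  if (v.lanes == nelts)
    return Composition{v.elem, false};

  unsigned total_bits = v.elem_bits * v.lanes;
  if (total_bits % nelts != 0)
    return std::nullopt;

  // Simplified fallback: an integer piece of the required width.
  return Composition{"int" + std::to_string(total_bits / nelts), true};
}
```

With the early-out, a two-lane double vector split into two pieces is described by double pieces directly. Without it, this model would take the fallback path and report 64-bit integer pieces that need reinterpreting, which is the situation the commit message describes.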


Re: [PATCH V5] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-23 Thread Richard Biener via Gcc-patches
On Thu, 22 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for 
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
> (internal_fn_mask_index): Add LEN_MASK_{LOAD,STORE}.
> (internal_fn_stored_value_index): Add LEN_MASK_STORE.
> (internal_len_load_store_bias):  Add LEN_MASK_{LOAD,STORE}.
> * optabs-tree.cc (can_vec_mask_load_store_p): Adapt for 
> LEN_MASK_{LOAD,STORE}.
> (get_len_load_store_mode): Ditto.
> * optabs-tree.h (can_vec_mask_load_store_p): Ditto.
> (get_len_load_store_mode): Ditto.
> * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
> (get_all_ones_mask): New function.
> (vectorizable_store): Apply LEN_MASK_{LOAD,STORE} into vectorizer.
> (vectorizable_load): Ditto.
> 
> ---
>  gcc/internal-fn.cc |  36 ++-
>  gcc/optabs-tree.cc |  85 +---
>  gcc/optabs-tree.h  |   6 +-
>  gcc/tree-vect-stmts.cc | 220 +
>  4 files changed, 265 insertions(+), 82 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..b90bd85df2c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>   * OPTAB.  */
>  
>  static void
> -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
> optab)
>  {
>class expand_operand ops[5];
>tree type, lhs, rhs, maskt, biast;
> @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> -  rhs = gimple_call_arg (stmt, 3);
> +  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
>  case IFN_GATHER_LOAD:
>  case IFN_MASK_GATHER_LOAD:
>  case IFN_LEN_LOAD:
> +case IFN_LEN_MASK_LOAD:
>return true;
>  
>  default:
> @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
>  case IFN_SCATTER_STORE:
>  case IFN_MASK_SCATTER_STORE:
>  case IFN_LEN_STORE:
> +case IFN_LEN_MASK_STORE:
>return true;
>  
>  default:
> @@ -4498,6 +4500,10 @@ internal_fn_mask_index (internal_fn fn)
>  case IFN_MASK_SCATTER_STORE:
>return 4;
>  
> +case IFN_LEN_MASK_LOAD:
> +case IFN_LEN_MASK_STORE:
> +  return 3;
> +
>  default:
>return (conditional_internal_fn_code (fn) != ERROR_MARK
> || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
>  case IFN_LEN_STORE:
>return 3;
>  
> +case IFN_LEN_MASK_STORE:
> +  return 4;
> +
>  default:
>return -1;
>  }
> @@ -4583,13 +4592,32 @@ internal_len_load_store_bias (internal_fn ifn, 
> machine_mode mode)
>  {
>optab optab = direct_internal_fn_optab (ifn);
>insn_code icode = direct_optab_handler (optab, mode);
> +  int bias_opno = 3;
> +
> +  if (icode == CODE_FOR_nothing)
> +{
> +  machine_mode mask_mode
> + = targetm.vectorize.get_mask_mode (mode).require ();
> +  if (ifn == IFN_LEN_LOAD)
> + {
> +   /* Try LEN_MASK_LOAD.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
> + }
> +  else
> + {
> +   /* Try LEN_MASK_STORE.  */
> +   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
> + }
> +  icode = convert_optab_handler (optab, mode, mask_mode);
> +  bias_opno = 4;
> +}
>  
>if (icode != CODE_FOR_nothing)
>  {
>/* For now we only support biases of 0 or -1.  Try both of them.  */
> -  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (0)))
>   return 0;
> -  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
> +  if (insn_operand_matches (icode, bias_opno, GEN_INT (-1)))
>   return -1;
>  }
>  
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 77bf745ae40..ab9514fc8e0 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -543,19 +543,49 @@ target_supports_op_p (tree type, enum tree_code code,
> && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
>  }
>  
> -/* Return true if target supports vector masked load/store for mode.  */
> +/* Return true if the target has support for masked load/store.
> +   We can support masked load/store by either mask{load,store}
> +   or len_mask{load,store}.
> +   This helper function checks whether target supports masked
