Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 4, 2022 at 4:19 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Aug 4, 2022 at 6:29 AM liuhongt via Gcc-patches
>  wrote:
> >
> > For neg, the patch creates a vec_init as [ a, -a, a, -a, ... ] and no
> > vec_step is needed to update the vectorized iv, since vf is always a
> > multiple of 2 (negative * negative is positive).
> >
> > For shift, the patch creates a vec_init as [ a, a >> c, a >> 2*c, .. ]
> > and a vec_step as [ c * nunits, c * nunits, c * nunits, ... ]; the
> > vectorized iv is updated as vec_def = vec_init >>/<< vec_step.
> >
> > For mul, the patch creates a vec_init as [ a, a * c, a * pow(c, 2), .. ]
> > and a vec_step as [ pow(c, nunits), pow(c, nunits), ... ]; the iv is
> > updated as vec_def = vec_init * vec_step.
> >
> > The patch handles nonlinear ivs for:
> > 1. Integer types only; floating point is not handled.
> > 2. No slp_node.
> > 3. iv_loop should be the same as the vector loop, not a nested loop.
> > 4. No undefined behavior is created: for mul, no overflow of pow (c, vf);
> >for shift, the shift count should be less than the type precision.
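[Illustrative sketch only, following the mul scheme described above; the
function name and the nunits value are invented for the example and are not
part of the patch:]

  /* Scalar form of the geometric induction the mul case covers.  */
  void
  scalar_mul_iv (int *a, int b, int c, int n)
  {
    for (int i = 0; i != n; i++)
      {
        a[i] = b;
        b *= c;          /* b follows b0 * pow (c, i) */
      }
  }

  /* Conceptually, with nunits == 4 lanes the vector loop keeps
       viv  = { b, b*c, b*c*c, b*c*c*c }   -- the vec_init above
       step = pow (c, 4)                   -- vec_step, identical in each lane
     and every vector iteration stores viv and then multiplies each lane
     by step, so lane j of iteration k holds b * pow (c, 4*k + j).  */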
> >
> > Bootstrapped and regression tested on x86_64-pc-linux-gnu{-m32,}.
> > There are some cases observed in SPEC2017, but no big performance impact.
> >
> > Any comments?
>
> Looks good overall - a few comments inline.  Also can you please add
> SLP support?
> I've tried hard to fill in gaps where SLP support is missing since my
> goal is still to get
> rid of non-SLP.
>
> > gcc/ChangeLog:
> >
> > PR tree-optimization/103144
> > * tree-vect-loop.cc (vect_is_nonlinear_iv_evolution): New function.
> > (vect_analyze_scalar_cycles_1): Detect nonlinear iv by the above
> > function.
> > (vect_create_nonlinear_iv_init): New function.
> > (vect_create_nonlinear_iv_step): Ditto.
> > (vect_create_nonlinear_iv_vec_step): Ditto.
> > (vect_update_nonlinear_iv): Ditto.
> > (vectorizable_nonlinear_induction): Ditto.
> > (vectorizable_induction): Call
> > vectorizable_nonlinear_induction when induction_type is not
> > vect_step_op_add.
> > * tree-vectorizer.h (enum vect_induction_op_type): New enum.
> > (STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE): New Macro.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr103144-mul-1.c: New test.
> > * gcc.target/i386/pr103144-mul-2.c: New test.
> > * gcc.target/i386/pr103144-neg-1.c: New test.
> > * gcc.target/i386/pr103144-neg-2.c: New test.
> > * gcc.target/i386/pr103144-shift-1.c: New test.
> > * gcc.target/i386/pr103144-shift-2.c: New test.
> > ---
> >  .../gcc.target/i386/pr103144-mul-1.c  |  25 +
> >  .../gcc.target/i386/pr103144-mul-2.c  |  43 ++
> >  .../gcc.target/i386/pr103144-neg-1.c  |  25 +
> >  .../gcc.target/i386/pr103144-neg-2.c  |  36 ++
> >  .../gcc.target/i386/pr103144-shift-1.c|  34 +
> >  .../gcc.target/i386/pr103144-shift-2.c|  61 ++
> >  gcc/tree-vect-loop.cc | 604 +-
> >  gcc/tree-vectorizer.h |  11 +
> >  8 files changed, 834 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c 
> > b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> > new file mode 100644
> > index 000..2357541d95d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> > -fdump-tree-vect-details -mprefer-vector-width=256" } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > +
> > +#define N 1
> > +void
> > +foo_mul (int* a, int b)
> > +{
> > +  for (int i = 0; i != N; i++)
> > +{
> > +  a[i] = b;
> > +  b *= 3;
> > +}
> > +}
> > +
> > +void
> > +foo_mul_const (int* a)
> > +{
> > +  int b = 1;
> > +  for (int i = 0; i != N; i++)
> > +{
> > +  a[i] = b;
> > +  b *= 3;
> > +}
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c 
> > b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> > new file mode 100644
> > index 000..4ea53e44658
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> > @@ -0,0 +1,43 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> > -mprefer-vector-width=256" } */
> > +/* { dg-require-effective-target avx2 } */
> > +
> > +#include "avx2-check.h"
> > +#include 
> > +#include "pr103144-mul-1.c"

[r13-1956 Regression] FAIL: libstdc++-prettyprinters/cxx11.cc print tpl on Linux/x86_64

2022-08-04 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

075683767abe15b936ad41792da6ee71e9eda449 is the first bad commit
commit 075683767abe15b936ad41792da6ee71e9eda449
Author: Ulrich Drepper 
Date:   Thu Aug 4 13:18:05 2022 +0200

Adjust index number of tuple pretty printer

caused

FAIL: libstdc++-prettyprinters/48362.cc print t2
FAIL: libstdc++-prettyprinters/cxx11.cc print rtpl
FAIL: libstdc++-prettyprinters/cxx11.cc print tpl

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-1956/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/48362.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/48362.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/48362.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/48362.cc 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/cxx11.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/cxx11.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/cxx11.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="prettyprinters.exp=libstdc++-prettyprinters/cxx11.cc 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Lulu Cheng



On 2022/8/5 11:45 AM, Xi Ruoyao wrote:

On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote:


Or maybe we should just use PC-relative addressing with 4 instructions
instead of GOT for -fno-PIC?

Not possible, Glibc does not support R_LARCH_PCALA* relocations in
ld.so.  So we still need a -mno-got (or something) option to disable GOT
for special cases like the kernel.


Either way consumes 16 bytes (4 instructions
for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC-
relative may be more cache-friendly.  But such a major change cannot be
backported for 12.2 IMO.


I'm very sorry, my understanding of the percpu variable was wrong. I just 
read the kernel code you submitted: this percpu variable not only has a 
large offset but also has an address that is unknown at compile time, so 
whether it is addressed with pcrel or through the GOT, dynamic relocation 
is needed at load time. It seems that accessing it through the GOT table 
is the better choice.


The name movable is quite vivid for describing this feature in the 
kernel, indicating that the address of the variable can be changed at 
will. But the name is harder to understand in GCC. I have no opinion on 
the rest; can this name be changed?




Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote:

> Or maybe we should just use PC-relative addressing with 4 instructions
> instead of GOT for -fno-PIC?

Not possible, Glibc does not support R_LARCH_PCALA* relocations in
ld.so.  So we still need a -mno-got (or something) option to disable GOT
for special cases like the kernel.

> Either way consumes 16 bytes (4 instructions
> for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC-
> relative may be more cache-friendly.  But such a major change cannot be
> backported for 12.2 IMO.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-08-05 at 10:51 +0800, Xi Ruoyao via Gcc-patches wrote:

> > If it is accessed through the GOT table, dynamic relocation is required 
> > when the module is loaded.
> 
> Dynamic relocation is required when the module is loaded anyway.  The
> .ko modules are actually relocatable ELF objects (produced by ld -r) and
> the module loader has to perform some work of a normal linker.
> 
> > And accessing the got table may have a cache miss.
> 
> /* snip */
> 
> > So my idea is "model(normal)","model (large)" etc.
> 
> Then should we have an option to disable GOT globally?  Maybe for kernel
> we'll just "-mno-got -mcmodel=large" (or "extreme"?  The kernel image is
> loaded at 0x9000 and the modules are above
> 0x so we need to handle 64-bit offset).

Or maybe we should just use PC-relative addressing with 4 instructions
instead of GOT for -fno-PIC?  Either way consumes 16 bytes (4 instructions
for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC-
relative may be more cache-friendly.  But such a major change cannot be
backported for 12.2 IMO.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] configure: respect --with-build-time-tools [PR43301]

2022-08-04 Thread Alexandre Oliva via Gcc-patches
On Aug  3, 2022, Eric Gallager  wrote:

> OK, after reviewing the surrounding code, I think it makes sense; do
> you want to commit it, or should I?

Please proceed, if you have the cycles to give it a round of meaningful
testing (as in, including reconfigure with configure-generated as, and
also with external as in place).

Adjusting the analogous test patterns covering the other tools and
generated scripts would surely be welcome as well ;-)

Thanks!

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,
  Thanks so much for your explanation. Now I have a clear picture of how
the return value is used. The patch was committed as r13-1971.

Thanks
Gui Haochen


On 5/8/2022 1:09 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 04, 2022 at 05:55:20PM +0800, HAO CHEN GUI wrote:
>>   This patch corrects return value of check_p9modulo_hw_available. It should
>> return 0 when p9modulo is supported.
> 
> It would be harder to make such mistakes if it used exit() explicitly,
> so that the reader is reminded the shell semantics are used here instead
> of the C conventions.
> 
>> -return (r == 2);
>> +return (r != 2);
> 
> so that then would be smth like
> 
>   if (r == 2)
>   exit (0);
>   else
>   exit (1);
> 
> (which makes the exit code for failure explicit as well).
> 
> Terse is good.  Explicit is good as well :-)
> 
> (You don't have to make this change here of course, but keep it in mind
> for the future :-) )
> 
> 
> Segher


Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-08-05 at 10:38 +0800, Lulu Cheng wrote:

> > > I'm working on the implementation of specifing attributes of variables 
> > > for other architectures.
> > > If the address is obtained through the GOT table and 4 instructions, 
> > > there is not much difference in performance.
> > In this case I still prefer a GOT table entry because for 4-instruction
> > absolute addressing sequence we'll need to implement 4 new relocation
> > types in the kernel module loader.
> If it is accessed through the GOT table, dynamic relocation is required when 
> the module is loaded.

Dynamic relocation is required when the module is loaded anyway.  The
.ko modules are actually relocatable ELF objects (produced by ld -r) and
the module loader has to perform some work of a normal linker.

> And accessing the got table may have a cache miss.

/* snip */

> So my idea is "model(normal)","model (large)" etc.

Then should we have an option to disable GOT globally?  Maybe for kernel
we'll just "-mno-got -mcmodel=large" (or "extreme"?  The kernel image is
loaded at 0x9000 and the modules are above
0x so we need to handle 64-bit offset).
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[COMMITTED] [RISC-V] Fix 32-bit riscv with zbs extension enabled

2022-08-04 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here was a disconnect between the splittable_const_int_operand
predicate and the function riscv_build_integer_1 for 32 bits with zbs enabled.
The splittable_const_int_operand predicate had a check for TARGET_64BIT which
was not needed, so this patch removes it.

Committed as obvious after a build for riscv32-elf configured with 
--with-arch=rv32imac_zba_zbb_zbc_zbs.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* config/riscv/predicates.md (splittable_const_int_operand):
Remove the check for TARGET_64BIT for single bit const values.
---
 gcc/config/riscv/predicates.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 90db5dfcdd5..e98db2cb574 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -76,7 +76,7 @@ (define_predicate "splittable_const_int_operand"
 
   /* Check whether the constant can be loaded in a single
  instruction with zbs extensions.  */
-  if (TARGET_64BIT && TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (INTVAL (op)))
+  if (TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (INTVAL (op)))
 return false;
 
   /* Otherwise check whether the constant can be loaded in a single
-- 
2.27.0



Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Lulu Cheng



On 2022/8/5 9:28 AM, Xi Ruoyao wrote:

On Fri, 2022-08-05 at 09:05 +0800, Lulu Cheng wrote:

I'm working on the implementation of specifying attributes of variables for 
other architectures.
If the address is obtained either through the GOT or via 4 instructions, there is 
not much difference in performance.

In this case I still prefer a GOT table entry because for 4-instruction
absolute addressing sequence we'll need to implement 4 new relocation
types in the kernel module loader.


If it is accessed through the GOT table, dynamic relocation is required 
when the module is loaded. And accessing the GOT may cause a cache 
miss.



Is it more reasonable for us to refer to the implementation of the model 
attribute under the IA64 architecture?

Maybe we can use "model(got)", "model(abs)", "model(pcrel)" etc.


We have an instruction sequence that can load a 64-bit PC-relative 
offset:

  "pcalau12i %1,%%pc_hi20(%3);"
  "addi.d %2,$r0,%%pc_lo12(%3);"
  "lu32i.d %2,%%pc64_lo20(%3);"
  "lu52i.d %2,%2,%%pc64_hi12(%3);"
  "add.d %1,%1,%2;",

This set of instructions can be selected according to the size of the 
offset:

  "pcalau12i %1,%%pc_hi20(%3);"
  "addi.d %2,$r0,%%pc_lo12(%3);"
  "lu32i.d %2,%%pc64_lo20(%3);"
  "add.d %1,%1,%2;",

for offset within signed 52 bits.

or

  "pcalau12i %1,%%pc_hi20(%3);"
  "addi.d %2,$r0,%%pc_lo12(%3);"
  "lu32i.d %2,%%pc64_lo20(%3);"
  "lu52i.d %2,%2,%%pc64_hi12(%3);"
  "add.d %1,%1,%2;"

for offset within signed 64 bits.

So my idea is "model(normal)", "model(large)", etc.


I will compare the performance of the two soon. Do you know the approximate 
release date of GCC 12.2?
I also want to fix this before 12.2 is released.

GCC 12.2 rc1 will be frozen on Aug 12th.





Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-08-05 at 09:05 +0800, Lulu Cheng wrote:
> I'm working on the implementation of specifying attributes of variables for 
> other architectures.
> If the address is obtained either through the GOT or via 4 instructions, there is 
> not much difference in performance.

In this case I still prefer a GOT table entry because for 4-instruction
absolute addressing sequence we'll need to implement 4 new relocation
types in the kernel module loader.

> Is it more reasonable for us to refer to the implementation of the model 
> attribute under the IA64 architecture?

Maybe we can use "model(got)", "model(abs)", "model(pcrel)" etc.
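[Purely an illustration of the naming being discussed; none of these
spellings exist in GCC yet, and the attribute, its argument, and the
declaration below are all hypothetical:]

  /* Hypothetical per-variable code-model override, in the style of the
     IA-64 "model" attribute mentioned above.  */
  extern int counter __attribute__ ((model ("got")));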

> I will compare the performance of the two soon. Do you know the approximate 
> release date of GCC 12.2?
> I also want to fix this before 12.2 is released.

GCC 12.2 rc1 will be frozen on Aug 12th. 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] place `const volatile' objects in read-only sections

2022-08-04 Thread Jose E. Marchesi via Gcc-patches


Hi people!

First of all, a bit of context.

It is common for C BPF programs to use variables that are implicitly set
by the underlying BPF machinery and not by the program itself.  It is
also necessary for these variables to be stored in read-only storage so
the BPF verifier recognizes them as such.  This leads to declarations
using both `const' and `volatile' qualifiers, like this:

  const volatile unsigned char is_allow_list = 0;

Where `volatile' is used to prevent the compiler from optimizing out the
variable or turning it into a constant, and `const' to make sure it is
placed in .rodata.

Now, it happens that:

- GCC places `const volatile' objects in the .data section, under the
  assumption that `volatile' somehow voids the `const'.

- LLVM places `const volatile' objects in .rodata, under the
  assumption that `volatile' is orthogonal to `const'.

So there is a divergence, and this divergence has practical
consequences: it makes BPF programs compiled with GCC not work
properly.

When looking into this, I found this bugzilla:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521
  "change semantics of const volatile variables"

which was filed back in 2005.  This report was already asking to put
`const volatile' objects in .rodata, questioning the current behavior.

While discussing this in the #gcc IRC channel I was pointed out to the
following excerpt from the C18 spec:

   6.7.3 Type qualifiers / 5  The properties associated with qualified
 types are meaningful only for expressions that are
 lvalues [note 135]


   135) The implementation may place a const object that is not
volatile in a read-only region of storage.  Moreover, the
implementation need not allocate storage for such an object if
its address is never used.

This footnote may be interpreted as if const objects that are volatile
shouldn't be put in read-only storage.  Even though I was not very convinced
by that interpretation (see my earlier comment in BZ 25521) I filed the
following issue in the LLVM tracker in order to discuss the matter:

  https://github.com/llvm/llvm-project/issues/56468

As you can see, Aaron Ballman, one of the LLVM hackers, asked the WG14
reflectors about this.  He reported back that the reflectors consider
that footnote 135 has no normative value.

So, not having a normative mandate on either direction, there are two
options:

a) To change GCC to place `const volatile' objects in .rodata instead
   of .data.

b) To change LLVM to place `const volatile' objects in .data instead
   of .rodata.

Considering that:

- One target (bpf-unknown-none) breaks with the current GCC behavior.

- No target/platform relies on the GCC behavior, that we know of.  (And it
  is unlikely there is any, at least for targets also supported by
  LLVM.)

- Changing the LLVM behavior at this point would be very severely
  traumatic for the BPF people and their users.

I think the right thing to do is a).
Therefore this patch.

A note about the patch itself:

I am not that familiar with the middle-end and in this patch I am
assuming that a `var|constructor + SIDE_EFFECTS' is the result of
`volatile' (or an equivalent language construction) and nothing else.
It would be good if some middle-end wizard could confirm this.

Regtested in x86_64-linux-gnu and bpf-unknown-none.
No regressions observed.

gcc/ChangeLog:

PR middle-end/25521
* varasm.cc (categorize_decl_for_section): Place `const volatile'
objects in read-only sections.
(default_select_section): Likewise.
---
 gcc/varasm.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 4db8506b106..7864db11faf 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -6971,7 +6971,6 @@ default_select_section (tree decl, int reloc,
 {
   if (! ((flag_pic && reloc)
 || !TREE_READONLY (decl)
-|| TREE_SIDE_EFFECTS (decl)
 || !TREE_CONSTANT (decl)))
return readonly_data_section;
 }
@@ -7005,7 +7004,6 @@ categorize_decl_for_section (const_tree decl, int reloc)
   if (bss_initializer_p (decl))
ret = SECCAT_BSS;
   else if (! TREE_READONLY (decl)
-  || TREE_SIDE_EFFECTS (decl)
   || (DECL_INITIAL (decl)
   && ! TREE_CONSTANT (DECL_INITIAL (decl
{
@@ -7046,7 +7044,6 @@ categorize_decl_for_section (const_tree decl, int reloc)
   else if (TREE_CODE (decl) == CONSTRUCTOR)
 {
   if ((reloc & targetm.asm_out.reloc_rw_mask ())
- || TREE_SIDE_EFFECTS (decl)
  || ! TREE_CONSTANT (decl))
ret = SECCAT_DATA;
   else
-- 
2.30.2



Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Lulu Cheng
I'm working on the implementation of specifying attributes of variables 
for other architectures. If the address is obtained either through the GOT 
or via 4 instructions, there is not much difference in performance. 
Is it more reasonable for us to refer to the implementation of the model 
attribute under the IA64 architecture? I will compare the performance of 
the two soon. Do you know the approximate release date of GCC 12.2? I 
also want to fix this before 12.2 is released. Thanks!


On 2022/8/4 3:47 PM, Xi Ruoyao wrote:

On Wed, 2022-08-03 at 11:10 +0800, Xi Ruoyao via Gcc-patches wrote:


I'd like to wait for the kernel team to test the performance data of
the two implementations before deciding whether to support this
attribute.

What do you think?

Perhaps, I can't access my dev system now anyway (I've configured the
SSH access, but then a sudden power surge happened and I hadn't
configured automatic power-on :( )

Hi folks,

Can someone perform a bench to see if a four-instruction immediate load
sequence can outperform GOT or vice versa?  I cannot access my test
system in at least 1 week, and I may be busy preparing Linux From
Scratch 11.2 release in the remaining of August.

Note: if the four-instruction immediate load sequence outperforms GOT,
we should consider use immediate load instead of GOT for -fno-PIC by
default.

P.S. It seems I have trouble accessing gcc400.fsffrance.org.  I have a C
Farm account and I've already put

Host gcc400.fsffrance.org
Port 25465

in ~/.ssh/config, and I can access other C farm machines w/o problem.
But:

$ ssh gcc400.fsffrance.org
xry...@gcc400.fsffrance.org: Permission denied 
(publickey,keyboard-interactive).

If you know the administrator of the C farm machine, can you tell him to
check the configuration?  If I can access it I may use some time to
perform the bench (in userspace of course) myself.  Thanks.



[PATCH][_GLIBCXX_DEBUG] Refine singular iterator state

2022-08-04 Thread François Dumont via Gcc-patches
This is an old patch I had prepared a long time ago; I don't think I ever 
submitted it.


    libstdc++: [_GLIBCXX_DEBUG] Do not consider detached iterators as 
value-initialized

    An attached iterator has its _M_version set to something != 0.  This 
value shall be preserved when detaching it so that the iterator does not 
look like a value-initialized one.


    libstdc++-v3/ChangeLog:

    * include/debug/formatter.h (__singular_value_init): New 
_Iterator_state enum entry.
    (_Parameter<>(const _Safe_iterator<>&, const char*, 
_Is_iterator)): Check if iterator

    parameter is value-initialized.
    (_Parameter<>(const _Safe_local_iterator<>&, const char*, 
_Is_iterator)): Likewise.
    * include/debug/safe_iterator.h 
(_Safe_iterator<>::_M_value_initialized()): New. Adapt

    checks.
    * include/debug/safe_local_iterator.h 
(_Safe_local_iterator<>::_M_value_initialized()): New.

    Adapt checks.
    * src/c++11/debug.cc (_Safe_iterator_base::_M_reset): Do 
not reset _M_version.
    (print_field(PrintContext&, const _Parameter&, const 
char*)): Adapt state_names.
    * testsuite/23_containers/deque/debug/iterator1_neg.cc: New 
test.
    * testsuite/23_containers/deque/debug/iterator2_neg.cc: New 
test.
    * 
testsuite/23_containers/forward_list/debug/iterator1_neg.cc: New test.
    * 
testsuite/23_containers/forward_list/debug/iterator2_neg.cc: New test.


Tested under Linux x86_64 _GLIBCXX_DEBUG mode.

Ok to commit ?

François
diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index 80e8ba46d1e..748d4fbfea4 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -185,6 +185,7 @@ namespace __gnu_debug
   __rbegin,		// dereferenceable, and at the reverse-beginning
   __rmiddle,	// reverse-dereferenceable, not at the reverse-beginning
   __rend,		// reverse-past-the-end
+  __singular_value_init,	// singular, value initialized
   __last_state
 };
 
@@ -280,7 +281,12 @@ namespace __gnu_debug
 	  _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence);
 
 	  if (__it._M_singular())
-	_M_variant._M_iterator._M_state = __singular;
+	{
+	  if (__it._M_value_initialized())
+		_M_variant._M_iterator._M_state = __singular_value_init;
+	  else
+		_M_variant._M_iterator._M_state = __singular;
+	}
 	  else
 	{
 	  if (__it._M_is_before_begin())
@@ -308,7 +314,12 @@ namespace __gnu_debug
 	  _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence);
 
 	  if (__it._M_singular())
-	_M_variant._M_iterator._M_state = __singular;
+	{
+	  if (__it._M_value_initialized())
+		_M_variant._M_iterator._M_state = __singular_value_init;
+	  else
+		_M_variant._M_iterator._M_state = __singular;
+	}
 	  else
 	{
 	  if (__it._M_is_end())
diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index d613933e236..33f7a86478a 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -41,8 +41,8 @@
 
 #define _GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, _BadMsgId, _DiffMsgId) \
   _GLIBCXX_DEBUG_VERIFY(!_Lhs._M_singular() && !_Rhs._M_singular()	\
-			|| (_Lhs.base() == _Iterator()			\
-			&& _Rhs.base() == _Iterator()),		\
+			|| (_Lhs._M_value_initialized()			\
+			&& _Rhs._M_value_initialized()),		\
 			_M_message(_BadMsgId)\
 			._M_iterator(_Lhs, #_Lhs)			\
 			._M_iterator(_Rhs, #_Rhs));			\
@@ -177,7 +177,7 @@ namespace __gnu_debug
 	// _GLIBCXX_RESOLVE_LIB_DEFECTS
 	// DR 408. Is vector > forbidden?
 	_GLIBCXX_DEBUG_VERIFY(!__x._M_singular()
-			  || __x.base() == _Iterator(),
+			  || __x._M_value_initialized(),
 			  _M_message(__msg_init_copy_singular)
 			  ._M_iterator(*this, "this")
 			  ._M_iterator(__x, "other"));
@@ -193,7 +193,7 @@ namespace __gnu_debug
   : _Iter_base()
   {
 	_GLIBCXX_DEBUG_VERIFY(!__x._M_singular()
-			  || __x.base() == _Iterator(),
+			  || __x._M_value_initialized(),
 			  _M_message(__msg_init_copy_singular)
 			  ._M_iterator(*this, "this")
 			  ._M_iterator(__x, "other"));
@@ -220,7 +220,7 @@ namespace __gnu_debug
 	  // _GLIBCXX_RESOLVE_LIB_DEFECTS
 	  // DR 408. Is vector > forbidden?
 	  _GLIBCXX_DEBUG_VERIFY(!__x._M_singular()
-|| __x.base() == _MutableIterator(),
+|| __x._M_value_initialized(),
 _M_message(__msg_init_const_singular)
 ._M_iterator(*this, "this")
 ._M_iterator(__x, "other"));
@@ -236,7 +236,7 @@ namespace __gnu_debug
 	// _GLIBCXX_RESOLVE_LIB_DEFECTS
 	// DR 408. Is vector > forbidden?
 	_GLIBCXX_DEBUG_VERIFY(!__x._M_singular()
-			  || __x.base() == _Iterator(),
+			  || __x._M_value_initialized(),
 			  _M_message(__msg_copy_singular)
 			  ._M_iterator(*this, "this")
 			  

[committed] MAINTAINERS: Add myself as AutoFDO maintainer

2022-08-04 Thread Eugene Rozenfeld via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself as AutoFDO maintainer.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1a37f4419b9..02ced0c43aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
 tree-ssa		Andrew MacLeod  

 tree browser/unparser  Sebastian Pop   
 scev, data dependence  Sebastian Pop   
 profile feedback   Jan Hubicka 
+AutoFDO		Eugene Rozenfeld
 reload Ulrich Weigand  
 RTL optimizers Eric Botcazou   
 instruction combiner   Segher Boessenkool  
@@ -603,7 +604,6 @@ Craig Rodrigues 

 Erven Rohou
 Ira Rosen  
 Yvan Roux  
-Eugene Rozenfeld   
 Silvius Rus
 Matthew Sachs  
 Ankur Saini
-- 
2.25.1


Re: [PATCH] Add _GLIBCXX_DEBUG backtrace generation

2022-08-04 Thread François Dumont via Gcc-patches

Gentle reminder.

On 13/07/22 19:26, François Dumont wrote:

libstdc++: [_GLIBCXX_DEBUG] Add backtrace generation on demand

  Add a _GLIBCXX_DEBUG_BACKTRACE macro to activate backtrace generation 
on _GLIBCXX_DEBUG assertions. The prerequisite is to have configured the 
library with:


  --enable-libstdcxx-backtrace=yes
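[An illustrative sketch of how this would be used, assuming libstdc++ was
built with the configure option above; the file name, the exact compile
command and the link requirements are my assumptions, not text from the
patch:]

  // backtrace_demo.cc -- build with debug mode plus the new macro, e.g.
  //   g++ -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_BACKTRACE backtrace_demo.cc
  // (plus whatever backtrace support library the updated docs require).
  #include <vector>

  int main()
  {
    std::vector<int> v;
    return v[0];   // out-of-range access: the debug assertion fires, and
                   // with the macro defined it should also print a backtrace
  }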

  libstdc++-v3/ChangeLog:

  * include/debug/formatter.h
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_state): Declare.
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_create_state): Declare.
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full_callback): Define.
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_error_callback): Define.
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full_func): Define.
  [_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full): Declare.
[_GLIBCXX_HAVE_STACKTRACE](_Error_formatter::_M_backtrace_state): New.
  [_GLIBCXX_HAVE_STACKTRACE](_Error_formatter::_M_backtrace_full): New.
  * src/c++11/debug.cc (pretty_print): Rename into...
  (print_function): ...that.
  [_GLIBCXX_HAVE_STACKTRACE](print_backtrace): New.
  (_Error_formatter::_M_error()): Adapt.
  * src/libbacktrace/Makefile.am: Add backtrace.c.
  * src/libbacktrace/Makefile.in: Regenerate.
  * src/libbacktrace/backtrace-rename.h (backtrace_full): New.
  * testsuite/23_containers/vector/debug/assign4_neg.cc: Add backtrace
    generation.
  * doc/xml/manual/debug_mode.xml: Document _GLIBCXX_DEBUG_BACKTRACE.
  * doc/xml/manual/using.xml: Likewise.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes.

Ok to commit ?

François





Question about submitting an article

2022-08-04 Thread Angela Anderson via Gcc-patches
 Hi there,

I'm a member of the content team at ATI Restoration, a disaster recovery
firm. I have a connection that works with a number of related
organizations, and they mentioned that you might be open to publishing
submitted articles. Is this true, and if so, what are the guidelines?

Thank you! Hoping we can work together soon.
Angela
ATI Restoration


[PATCH] c++: Tweak for -Wpessimizing-move in templates [PR89780]

2022-08-04 Thread Marek Polacek via Gcc-patches
In my previous patches I've been extending our std::move warnings,
but this tweak actually dials it down a little bit.  As reported in
bug 89780, it's questionable to warn about expressions in templates
that were type-dependent, but aren't anymore because we're instantiating
the template.  As in,

  template 
  Dest withMove() {
T x;
return std::move(x);
  }

  template Dest withMove(); // #1
  template Dest withMove(); // #2

Saying that the std::move is pessimizing for #1 is not incorrect, but
it's not useful, because removing the std::move would then pessimize #2.
So the user can't really win.  At the same time, disabling the warning
just because we're in a template would be going too far, I still want to
warn for

  template 
  Dest withMove() {
Dest x;
return std::move(x);
  }

because the std::move therein will be pessimizing for any instantiation.

So I'm using the suppress_warning machinery to that effect.
Problem: I had to add a new group to nowarn_spec_t, otherwise
suppressing the -Wpessimizing-move warning would disable a whole bunch
of other warnings, which we really don't want.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/89780

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build) : Maybe suppress
-Wpessimizing-move.
* typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings
if they are suppressed.
(check_return_expr): Disable -Wpessimizing-move when returning
a dependent expression.

gcc/ChangeLog:

* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle
OPT_Wpessimizing_move and OPT_Wredundant_move.
* diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning.
* g++.dg/cpp0x/Wpessimizing-move7.C: Likewise.
* g++.dg/cpp0x/Wredundant-move2.C: Likewise.
* g++.dg/cpp0x/Wpessimizing-move9.C: New test.
---
 gcc/cp/pt.cc  |  3 +
 gcc/cp/typeck.cc  | 20 +++--
 gcc/diagnostic-spec.cc|  7 +-
 gcc/diagnostic-spec.h |  4 +-
 .../g++.dg/cpp0x/Wpessimizing-move3.C |  2 +-
 .../g++.dg/cpp0x/Wpessimizing-move7.C |  2 +-
 .../g++.dg/cpp0x/Wpessimizing-move9.C | 89 +++
 gcc/testsuite/g++.dg/cpp0x/Wredundant-move2.C |  4 +-
 8 files changed, 119 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wpessimizing-move9.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 6c581fe0fb7..fe7e809fc2d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21215,6 +21215,9 @@ tsubst_copy_and_build (tree t,
  CALL_EXPR_ORDERED_ARGS (call) = ord;
  CALL_EXPR_REVERSE_ARGS (call) = rev;
}
+   if (warning_suppressed_p (t, OPT_Wpessimizing_move))
+ /* This also suppresses -Wredundant-move.  */
+ suppress_warning (ret, OPT_Wpessimizing_move);
  }
 
RETURN (ret);
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 2650beb780e..70a5efc45de 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10430,9 +10430,10 @@ maybe_warn_pessimizing_move (tree expr, tree type, 
bool return_p)
  if (can_do_rvo_p (arg, type))
{
  auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wpessimizing_move,
- "moving a temporary object prevents copy "
- "elision"))
+ if (!warning_suppressed_p (expr, OPT_Wpessimizing_move)
+ && warning_at (loc, OPT_Wpessimizing_move,
+"moving a temporary object prevents copy "
+"elision"))
inform (loc, "remove % call");
}
  /* The rest of the warnings is only relevant for when we are
@@ -10443,14 +10444,16 @@ maybe_warn_pessimizing_move (tree expr, tree type, 
bool return_p)
  else if (can_do_nrvo_p (arg, type))
{
  auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wpessimizing_move,
- "moving a local object in a return statement "
- "prevents copy elision"))
+ if (!warning_suppressed_p (expr, OPT_Wpessimizing_move)
+ && warning_at (loc, OPT_Wpessimizing_move,
+"moving a local object in a return statement "
+"prevents copy elision"))
inform (loc, "remove % call");
}
  /* Warn if the move is redundant.  It is redundant when we would
 do maybe-rvalue overload resolution even without std::move.  */
  else if (warn_redundant_move
+  && !warning_suppressed_p (expr, OPT_Wredundant_move)
   && (moved = 

[committed] libstdc++: Make std::string_view(Range&&) constructor explicit

2022-08-04 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

The P2499R0 paper was recently approved for C++23.

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view(Range&&)): Add
explicit as per P2499R0.
* testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc:
Adjust implicit conversions. Check implicit conversions fail.
* testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc:
Likewise.
---
 libstdc++-v3/include/std/string_view  |  2 +-
 .../cons/char/range_c++20.cc  | 28 ++---
 .../cons/wchar_t/range_c++20.cc   | 30 +++
 3 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index bccf4d1847f..30ff136b1cb 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -162,7 +162,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  })
  && (!requires { typename _DRange::traits_type; }
  || is_same_v)
-   constexpr
+   constexpr explicit
basic_string_view(_Range&& __r)
noexcept(noexcept(ranges::size(__r)) && noexcept(ranges::data(__r)))
: _M_len(ranges::size(__r)), _M_str(ranges::data(__r))
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc
index bd50c3058e6..a5745fcb603 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/char/range_c++20.cc
@@ -36,7 +36,7 @@ test01()
   };
 
   R r;
-  std::string_view s = r;
+  std::string_view s{r};
   VERIFY( s == r.str );
   VERIFY( s.data() == std::ranges::data(r) );
   VERIFY( s.size() == std::ranges::size(r) );
@@ -50,10 +50,15 @@ test01()
   static_assert( std::ranges::contiguous_range );
   static_assert( std::ranges::sized_range );
   R2 r2;
-  std::string_view s2 = r2; // uses conversion to string_view
+  std::string_view s2(r2); // uses conversion to string_view
   VERIFY( s2 == "Out of range" );
   VERIFY( std::string_view(const_cast(r2)) == s2 );
 
+  // And again using copy-initialization instead of direct-initialization.
+  std::string_view s2_implicit = r2; // uses conversion to string_view
+  VERIFY( s2_implicit == "Out of range" );
+  VERIFY( std::string_view(const_cast(r2)) == s2_implicit );
+
   struct R3 : R
   {
 using R::begin;
@@ -91,7 +96,7 @@ test01()
   static_assert( std::ranges::contiguous_range );
   static_assert( std::ranges::sized_range );
   R5 r5;
-  std::string_view s5 = r5; // Uses range constructor
+  std::string_view s5(r5); // Uses range constructor
   VERIFY( s5 == r5.str );
   s5 = std::string_view(std::move(r5)); // In C++20 this used conversion op.
   VERIFY( s5 == r5.str );  // In C++23 it uses range constructor.
@@ -156,15 +161,30 @@ test04()
   };
 
   R r;
-  std::basic_string_view s = r; // Use deduction guide.
+  std::basic_string_view s(r); // Use deduction guide.
 
   static_assert( std::is_same_v );
 }
 
+void
+test05()
+{
+  struct R
+  {
+const char* begin() const { return nullptr; }
+const char* end() const { return nullptr; }
+  };
+
+  // P2499R0 string_view range constructor should be explicit
+  // P2516R0 string_view is implicitly convertible from what?
+  static_assert( ! std::is_convertible_v );
+}
+
 int main()
 {
   test01();
   test02();
   test03();
   test04();
+  test05();
 }
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc
index 0b28220d862..af3c986e56f 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/cons/wchar_t/range_c++20.cc
@@ -36,7 +36,7 @@ test01()
   };
 
   R r;
-  std::wstring_view s = r;
+  std::wstring_view s{r};
   VERIFY( s == r.str );
   VERIFY( s.data() == std::ranges::data(r) );
   VERIFY( s.size() == std::ranges::size(r) );
@@ -50,10 +50,15 @@ test01()
   static_assert( std::ranges::contiguous_range );
   static_assert( std::ranges::sized_range );
   R2 r2;
-  std::wstring_view s2 = r2; // uses conversion to wstring_view
+  std::wstring_view s2(r2); // uses conversion to wstring_view
   VERIFY( s2 == L"Out of range" );
   VERIFY( std::wstring_view(const_cast(r2)) == s2 );
 
+  // And again using copy-initialization instead of direct-initialization.
+  std::wstring_view s2_implicit = r2; // uses conversion to wstring_view
+  VERIFY( s2_implicit == L"Out of range" );
+  VERIFY( std::wstring_view(const_cast(r2)) == s2_implicit );
+
   struct R3 : R
   {
 using R::begin;
@@ -91,10 +96,10 @@ test01()
   static_assert( std::ranges::contiguous_range );
   static_assert( std::ranges::sized_range );
   R5 

[committed] libstdc++: Add comparisons to std::default_sentinel_t (LWG 3719)

2022-08-04 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This library defect was recently approved for C++23.

libstdc++-v3/ChangeLog:

* include/bits/fs_dir.h (directory_iterator): Add comparison
with std::default_sentinel_t. Remove redundant operator!= for
C++20.
(recursive_directory_iterator): Likewise.
* include/bits/iterator_concepts.h [!__cpp_lib_concepts]
(default_sentinel_t, default_sentinel): Define even if concepts
are not supported.
* include/bits/regex.h (regex_iterator): Add comparison with
std::default_sentinel_t. Remove redundant operator!= for C++20.
(regex_token_iterator): Likewise.
(regex_token_iterator::_M_end_of_seq()): Add noexcept.
* testsuite/27_io/filesystem/iterators/lwg3719.cc: New test.
* testsuite/28_regex/iterators/regex_iterator/lwg3719.cc:
New test.
* testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc:
New test.
---
 libstdc++-v3/include/bits/fs_dir.h| 33 
 libstdc++-v3/include/bits/iterator_concepts.h | 28 ++---
 libstdc++-v3/include/bits/regex.h | 24 +++-
 .../27_io/filesystem/iterators/lwg3719.cc | 39 +++
 .../iterators/regex_iterator/lwg3719.cc   | 29 ++
 .../iterators/regex_token_iterator/lwg3719.cc | 29 ++
 6 files changed, 169 insertions(+), 13 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/lwg3719.cc
 create mode 100644 
libstdc++-v3/testsuite/28_regex/iterators/regex_iterator/lwg3719.cc
 create mode 100644 
libstdc++-v3/testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc

diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index ca37952ec17..bec2b7674ef 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -36,8 +36,9 @@
 # include 
 # include 
 
-#if __cplusplus > 201703L
+#if __cplusplus >= 202002L
 # include // std::strong_ordering
+# include// std::default_sentinel_t
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -420,9 +421,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   return __pr;
 }
 
-  private:
-directory_iterator(const path&, directory_options, error_code*);
-
 friend bool
 operator==(const directory_iterator& __lhs,
const directory_iterator& __rhs) noexcept
@@ -431,10 +429,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
&& !__lhs._M_dir.owner_before(__rhs._M_dir);
 }
 
+#if __cplusplus >= 202002L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3719. Directory iterators should be usable with default sentinel
+  bool operator==(default_sentinel_t) const noexcept
+  { return !_M_dir; }
+#endif
+
+#if __cpp_impl_three_way_comparison < 201907L
 friend bool
 operator!=(const directory_iterator& __lhs,
   const directory_iterator& __rhs) noexcept
 { return !(__lhs == __rhs); }
+#endif
+
+  private:
+directory_iterator(const path&, directory_options, error_code*);
 
 friend class recursive_directory_iterator;
 
@@ -519,9 +529,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 void disable_recursion_pending() noexcept;
 
-  private:
-recursive_directory_iterator(const path&, directory_options, error_code*);
-
 friend bool
 operator==(const recursive_directory_iterator& __lhs,
const recursive_directory_iterator& __rhs) noexcept
@@ -530,10 +537,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
&& !__lhs._M_dirs.owner_before(__rhs._M_dirs);
 }
 
+#if __cplusplus >= 202002L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3719. Directory iterators should be usable with default sentinel
+  bool operator==(default_sentinel_t) const noexcept
+  { return !_M_dirs; }
+#endif
+
+#if __cpp_impl_three_way_comparison < 201907L
 friend bool
 operator!=(const recursive_directory_iterator& __lhs,
const recursive_directory_iterator& __rhs) noexcept
 { return !(__lhs == __rhs); }
+#endif
+
+  private:
+recursive_directory_iterator(const path&, directory_options, error_code*);
 
 struct _Dir_stack;
 std::__shared_ptr<_Dir_stack> _M_dirs;
diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index a04c970b03b..cf66c63f395 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -32,15 +32,35 @@
 
 #pragma GCC system_header
 
+#if __cplusplus >= 202002L
 #include 
 #include// to_address
 #include// identity, ranges::less
 
-#if __cpp_lib_concepts
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  /** A sentinel type that can be used to check for the end of a range.
+   *
+   * For some iterator types the past-the-end sentinel value is independent
+   * of the underlying sequence, and a default sentinel can be used with them.
+   

[COMMITTED] PR tree-optimization/106514 - Loop over intersected bitmaps.

2022-08-04 Thread Andrew MacLeod via Gcc-patches
compute_ranges_in_block loops over all the imports, checks whether 
each one is exported, and then calculates an outgoing range for those that are.


The outer loop loops over the import list, whereas the export list is 
usually smaller, and empty half the time.  This means we were doing a 
lot of busy work looping over the imports for no reason.


The export list was extracted on each iteration of the loop, and because 
it's a self-sizing thing with a call, it doesn't get hauled out of the 
loop.  More extra work.


This patch changes the loop to use EXECUTE_IF_AND_IN_BITMAP to only look 
at the intersection of imports and exports, thus only doing the work it 
needs to do.


With a performance build, running with --param 
max-jump-thread-duplication-stmts=50 for good measure:


before patch:

backwards jump threading   :   5.91 ( 90%)   0.01 ( 20%)   5.93 ( 89%)    72  (  0%)
TOTAL                      :   6.58          0.05          6.66          47M

after patch:

backwards jump threading   :   4.67 ( 88%)   0.01 ( 14%)   4.67 ( 86%)    72  (  0%)
TOTAL                      :   5.31          0.07          5.42          47M

Bootstrapped on x86_64-pc-linux-gnu with no regressions, and no change 
in the thread count.  Pushed.


Andrew
commit 8e34d92ef29a175b84cc7f5185db43656ae762bb
Author: Andrew MacLeod 
Date:   Thu Aug 4 12:22:59 2022 -0400

Loop over intersected bitmaps.

compute_ranges_in_block loops over the import list and then checks the
same bit in exports.  It is more efficient to loop over the intersection
of the 2 bitmaps.

PR tree-optimization/106514
* gimple-range-path.cc (path_range_query::compute_ranges_in_block):
Use EXECUTE_IF_AND_IN_BITMAP to loop over 2 bitmaps.

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index e1b9683c1e4..43e7526b6fc 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -479,32 +479,28 @@ path_range_query::compute_ranges_in_block (basic_block bb)
   p->set_root_oracle (nullptr);
 }
 
-  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
+  gori_compute  = m_ranger->gori ();
+  bitmap exports = g.exports (bb);
+  EXECUTE_IF_AND_IN_BITMAP (m_imports, exports, 0, i, bi)
 {
   tree name = ssa_name (i);
-  gori_compute  = m_ranger->gori ();
-  bitmap exports = g.exports (bb);
-
-  if (bitmap_bit_p (exports, i))
+  Value_Range r (TREE_TYPE (name));
+  if (g.outgoing_edge_range_p (r, e, name, *this))
 	{
-	  Value_Range r (TREE_TYPE (name));
-	  if (g.outgoing_edge_range_p (r, e, name, *this))
+	  Value_Range cached_range (TREE_TYPE (name));
+	  if (get_cache (cached_range, name))
+	r.intersect (cached_range);
+
+	  set_cache (r, name);
+	  if (DEBUG_SOLVER)
 	{
-	  Value_Range cached_range (TREE_TYPE (name));
-	  if (get_cache (cached_range, name))
-		r.intersect (cached_range);
-
-	  set_cache (r, name);
-	  if (DEBUG_SOLVER)
-		{
-		  fprintf (dump_file, "outgoing_edge_range_p for ");
-		  print_generic_expr (dump_file, name, TDF_SLIM);
-		  fprintf (dump_file, " on edge %d->%d ",
-			   e->src->index, e->dest->index);
-		  fprintf (dump_file, "is ");
-		  r.dump (dump_file);
-		  fprintf (dump_file, "\n");
-		}
+	  fprintf (dump_file, "outgoing_edge_range_p for ");
+	  print_generic_expr (dump_file, name, TDF_SLIM);
+	  fprintf (dump_file, " on edge %d->%d ",
+		   e->src->index, e->dest->index);
+	  fprintf (dump_file, "is ");
+	  r.dump (dump_file);
+	  fprintf (dump_file, "\n");
 	}
 	}
 }


Re: [PATCH, rs6000] TARGET_MADDLD should include TARGET_POWERPC64

2022-08-04 Thread Segher Boessenkool
Hi!

On Thu, Aug 04, 2022 at 11:17:48AM +0800, HAO CHEN GUI wrote:
> On 4/8/2022 12:54 AM, Segher Boessenkool wrote:
> > Hrm.  But the maddld insn is useful for SImode as well, in 32-bit mode,
> > it is just its name that is a bit confusing then.  Sorry for confusing
> > things :-(
> > 
> > Add a test for SImode maddld as well?
> 
>  Thanks for your comments.
> 
>  Just want to double confirm that a maddld test case with "-m32" and
> "-mpowerpc64" is needed. As far as I understand, maddld is a 64-bit
> instruction and it should be used with "-mpowerpc64" in a 32-bit register
> environment.

Nope.  The instruction is fine in pure 32 bit as well.

Almost all instructions actually work on 64 bits, but for many
(including this maddld) the low 32 bits of the result make sense on
their own, as a 32-bit operation done on the low 32 bits of the input
registers.

We have
  (define_mode_iterator GPR [SI (DI "TARGET_POWERPC64")])
so that :DI will not be used for plain -m32 compilations, but it can
(and will, and should) be used for -m32 -mpowerpc64, and :SI can be used
for -m32 in every case.

Instructions that look at the top 32 bits of a GPR need an explicit
TARGET_POWERPC64.


Segher


Re: [PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread Segher Boessenkool
Hi!

On Thu, Aug 04, 2022 at 05:55:20PM +0800, HAO CHEN GUI wrote:
>   This patch corrects return value of check_p9modulo_hw_available. It should
> return 0 when p9modulo is supported.

It would be harder to make such mistakes if it used exit() explicitly,
so that the reader is reminded the shell semantics are used here instead
of the C conventions.

> - return (r == 2);
> + return (r != 2);

so that then would be smth like

if (r == 2)
exit (0);
else
exit (1);

(which makes the exit code for failure explicit as well).

Terse is good.  Explicit is good as well :-)

(You don't have to make this change here of course, but keep it in mind
for the future :-) )


Segher


[PATCH] libstdc++: Fixing Error: invalid type argument of unary '*' (have 'int')

2022-08-04 Thread Seija Kijin via Gcc-patches
Had an error compiling tiny-cuda-nn using gcc 12.1. With this minor
patch, I recompiled and the build succeeded.

No behavioral change.

---
 libstdc++-v3/include/bits/locale_facets_nonio.tcc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 17a2c8d4486e..fc35a9e693e7 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -1474,8 +1474,8 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   // calls.  So e.g. if __fmt is "%p %I:%M:%S", we can't handle it
   // properly, because we first handle the %p am/pm specifier and only
   // later the 12-hour format specifier.
-  if ((void*)(this->*(_get::do_get)) == (void*)(_get::do_get))
- __use_state = true;
+  if ((void*)(this->*(_get::do_get)) == (_get::do_get))
+__use_state = true;
 #pragma GCC diagnostic pop
 #endif
   __time_get_state __state = __time_get_state();


Re: [PATCH V2] arm: add -static-pie support

2022-08-04 Thread Lance Fredrickson via Gcc-patches
Just a follow up trying to get some eyes on my static-pie patch 
submission for arm.

Feedback welcome.
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html

thanks,
Lance Fredrickson


Re: [PATCH 1/1 v2] c++/106423: Fix pragma suppression of -Wc++20-compat diagnostics.

2022-08-04 Thread Tom Honermann via Gcc-patches
Are there any further concerns with this patch? If not, I extend my 
gratitude to anyone so kind as to commit this for me as I don't have 
commit access.


I just noticed that I neglected to add a ChangeLog entry for the comment 
addition to gcc/cp/parser.cc. Noted inline below. I can re-send the 
patch with that update if desired.


Tom.

On 8/1/22 2:49 PM, Tom Honermann wrote:

Gcc's '#pragma GCC diagnostic' directives are processed in "early mode"
(see handle_pragma_diagnostic_early) for the C++ frontend and, as such,
require that the target diagnostic option be enabled for the preprocessor
(see c_option_is_from_cpp_diagnostics).  This change modifies the
-Wc++20-compat option definition to register it as a preprocessor option
so that its associated diagnostics can be suppressed.  The changes also
implicitly disable the option in C++20 and later modes.  These changes
are consistent with the definition of the -Wc++11-compat option.

This support is motivated by the need to suppress the following diagnostic
otherwise issued in C++17 and earlier modes due to the char8_t typedef
present in the uchar.h header file in glibc 2.36.
   warning: identifier ‘char8_t’ is a keyword in C++20 [-Wc++20-compat]
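[For illustration only: the kind of suppression this change is meant to
enable in pre-C++20 code.  The typedef below stands in for the glibc
uchar.h declaration and is not copied from it:]

  // Compiled as e.g. -std=c++17 -Wall (which enables -Wc++20-compat).
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wc++20-compat"
  typedef unsigned char char8_t;   // should no longer warn once suppression works
  #pragma GCC diagnostic pop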

Tests are added to validate suppression of both -Wc++11-compat and
-Wc++20-compat related diagnostics (fixes were only needed for the C++20
case).

Fixes https://gcc.gnu.org/PR106423.

gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Disable -Wc++20-compat diagnostics
in C++20 and later.
* c.opt (Wc++20-compat): Enable hooks for the preprocessor.


gcc/cp/ChangeLog:
    * parser.cc (cp_lexer_saving_tokens): Add comment regarding 
diagnostic requirements.




gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/keywords2.C: New test.
* g++.dg/cpp2a/keywords2.C: New test.

libcpp/ChangeLog:
* include/cpplib.h (cpp_warning_reason): Add CPP_W_CXX20_COMPAT.
* init.cc (cpp_create_reader): Add cpp_warn_cxx20_compat.
---
  gcc/c-family/c-opts.cc |  7 +++
  gcc/c-family/c.opt |  2 +-
  gcc/cp/parser.cc   |  5 -
  gcc/testsuite/g++.dg/cpp0x/keywords2.C | 16 
  gcc/testsuite/g++.dg/cpp2a/keywords2.C | 13 +
  libcpp/include/cpplib.h|  4 
  libcpp/init.cc |  1 +
  7 files changed, 46 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/keywords2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/keywords2.C

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index b9f01a65ed7..1ea37ba9742 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1046,6 +1046,13 @@ c_common_post_options (const char **pfilename)
else if (warn_narrowing == -1)
  warn_narrowing = 0;
  
+  if (cxx_dialect >= cxx20)
+{
+  /* Don't warn about C++20 compatibility changes in C++20 or later.  */
+  warn_cxx20_compat = 0;
+  cpp_opts->cpp_warn_cxx20_compat = 0;
+}
+
/* C++17 has stricter evaluation order requirements; let's use some of them
   for earlier C++ as well, so chaining works as expected.  */
if (c_dialect_cxx ()
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44e1a60ce24..dfdebd596ef 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -455,7 +455,7 @@ Wc++2a-compat
  C++ ObjC++ Warning Alias(Wc++20-compat) Undocumented
  
  Wc++20-compat

-C++ ObjC++ Var(warn_cxx20_compat) Warning LangEnabledBy(C++ ObjC++,Wall)
+C++ ObjC++ Var(warn_cxx20_compat) Warning LangEnabledBy(C++ ObjC++,Wall) 
Init(0) CPP(cpp_warn_cxx20_compat) CppReason(CPP_W_CXX20_COMPAT)
  Warn about C++ constructs whose meaning differs between ISO C++ 2017 and ISO 
C++ 2020.
  
  Wc++11-extensions

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4f67441eeb1..c3584446827 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -924,7 +924,10 @@ cp_lexer_saving_tokens (const cp_lexer* lexer)
  /* Store the next token from the preprocessor in *TOKEN.  Return true
 if we reach EOF.  If LEXER is NULL, assume we are handling an
 initial #pragma pch_preprocess, and thus want the lexer to return
-   processed strings.  */
+   processed strings.
+
+   Diagnostics issued from this function must have their controlling option (if
+   any) in c.opt annotated as a libcpp option via the CppReason property.  */
  
  static void

  cp_lexer_get_preprocessor_token (unsigned flags, cp_token *token)
diff --git a/gcc/testsuite/g++.dg/cpp0x/keywords2.C 
b/gcc/testsuite/g++.dg/cpp0x/keywords2.C
new file mode 100644
index 000..d67d01e31ed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/keywords2.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++98_only } }
+// { dg-options "-Wc++11-compat" }
+
+// Validate suppression of -Wc++11-compat diagnostics.
+#pragma GCC diagnostic ignored "-Wc++11-compat"
+int alignof;
+int alignas;
+int constexpr;
+int decltype;
+int noexcept;
+int nullptr;
+int 

Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-08-04 Thread Koning, Paul via Gcc-patches



> On Aug 4, 2022, at 9:17 AM, Chung-Lin Tang  wrote:
> 
> On 2022/6/28 10:06 PM, Jakub Jelinek wrote:
>> On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:
>>> with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:
>>> 
>>> (1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
>>> (2) chunk_size == 0:  infinite loop
>>> 
>>> The (2) behavior is obviously not desired. This patch fixes this by changing
>> Why?  It is a user error, undefined behavior, we shouldn't slow down valid
>> code for users who don't bother reading the standard.
> 
> This is loop init code, not per-iteration. The overhead really isn't that 
> much.
> 
> The question should be whether GCC having infinite-loop behavior is reasonable,
> even if it is undefined in the spec.

I wouldn't think so.  The way I see "undefined code" is that you can't complain 
about "wrong code" produced by the compiler.  But for the compiler to 
malfunction on wrong input is an entirely different matter.  For one thing, 
it's hard to fix your code if the compiler fails.  How would you locate the 
offending source line?

paul




Re: [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule

2022-08-04 Thread Chung-Lin Tang

On 2022/6/28 10:06 PM, Jakub Jelinek wrote:

On Thu, Jun 23, 2022 at 11:47:59PM +0800, Chung-Lin Tang wrote:

with the way that chunk_size < 1 is handled for gomp_iter_dynamic_next:

(1) chunk_size <= -1: wraps into large unsigned value, seems to work though.
(2) chunk_size == 0:  infinite loop

The (2) behavior is obviously not desired. This patch fixes this by changing


Why?  It is a user error, undefined behavior, we shouldn't slow down valid
code for users who don't bother reading the standard.


This is loop init code, not per-iteration. The overhead really isn't that much.

The question should be whether GCC having infinite-loop behavior is reasonable,
even if it is undefined in the spec.


E.g. OpenMP 5.1 [132:14] says clearly:
"chunk_size must be a loop invariant integer expression with a positive
value."
and omp_set_schedule for chunk_size < 1 should use a default value (which it
does).

For OMP_SCHEDULE the standard says it is implementation-defined what happens
if the format isn't the specified one, so I guess the env.c change
could be acceptable (though without it it is fine too), but the
loop.c change is wrong.  Note, if the loop.c change would be ok, you'd
need to also change loop_ull.c too.


I've updated the patch to add the same changes for libgomp/loop_ull.c and 
updated
the testcase too. Tested on mainline trunk without regressions.

Thanks,
Chung-Lin

libgomp/ChangeLog:

* env.c (parse_schedule): Make negative values invalid for chunk_size.
* loop.c (gomp_loop_init): For non-STATIC schedule and chunk_size <= 0,
set initialized chunk_size to 1.
* loop_ull.c (gomp_loop_ull_init): Likewise.

* testsuite/libgomp.c/loop-28.c: New test.

diff --git a/libgomp/env.c b/libgomp/env.c
index 1c4ee894515..dff07617e15 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -182,6 +182,8 @@ parse_schedule (void)
 goto invalid;
 
   errno = 0;
+  if (*env == '-')
+goto invalid;
   value = strtoul (env, &end, 10);
   if (errno || end == env)
 goto invalid;
diff --git a/libgomp/loop.c b/libgomp/loop.c
index be85162bb1e..018b4e9a8bd 100644
--- a/libgomp/loop.c
+++ b/libgomp/loop.c
@@ -41,7 +41,7 @@ gomp_loop_init (struct gomp_work_share *ws, long start, long 
end, long incr,
enum gomp_schedule_type sched, long chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size = chunk_size;
+  ws->chunk_size = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end = ((incr > 0 && start > end) || (incr < 0 && start < end))
? start : end;
diff --git a/libgomp/loop_ull.c b/libgomp/loop_ull.c
index 602737296d4..74ddb1bd623 100644
--- a/libgomp/loop_ull.c
+++ b/libgomp/loop_ull.c
@@ -43,7 +43,7 @@ gomp_loop_ull_init (struct gomp_work_share *ws, bool up, 
gomp_ull start,
gomp_ull chunk_size)
 {
   ws->sched = sched;
-  ws->chunk_size_ull = chunk_size;
+  ws->chunk_size_ull = (sched == GFS_STATIC || chunk_size > 1) ? chunk_size : 
1;
   /* Canonicalize loops that have zero iterations to ->next == ->end.  */
   ws->end_ull = ((up && start > end) || (!up && start < end))
? start : end;
diff --git a/libgomp/testsuite/libgomp.c/loop-28.c 
b/libgomp/testsuite/libgomp.c/loop-28.c
new file mode 100644
index 000..664842e27aa
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/loop-28.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+void __attribute__((noinline))
+foo (int a[], int n, int chunk_size)
+{
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (int i = 0; i < n; i++)
+a[i] = i;
+
+  #pragma omp parallel for schedule (dynamic,chunk_size)
+  for (unsigned long long i = 0; i < n; i++)
+a[i] = i;
+}
+
+int main (void)
+{
+  int a[100];
+  foo (a, 100, 0);
+  return 0;
+}


Re: [PATCH 1/2] Allow subtarget customization of CC1_SPEC

2022-08-04 Thread Sebastian Huber

On 22/07/2022 15:02, Sebastian Huber wrote:

gcc/ChangeLog:

* gcc.cc (SUBTARGET_CC1_SPEC): Define if not defined.
(CC1_SPEC): Define to SUBTARGET_CC1_SPEC.
* config/arm/arm.h (CC1_SPEC): Remove.
* config/arc/arc.h (CC1_SPEC): Append SUBTARGET_CC1_SPEC.
* config/cris/cris.h (CC1_SPEC): Likewise.
* config/frv/frv.h (CC1_SPEC): Likewise.
* config/i386/i386.h (CC1_SPEC): Likewise.
* config/ia64/ia64.h (CC1_SPEC): Likewise.
* config/lm32/lm32.h (CC1_SPEC): Likewise.
* config/m32r/m32r.h (CC1_SPEC): Likewise.
* config/mcore/mcore.h (CC1_SPEC): Likewise.
* config/microblaze/microblaze.h: Likewise.
* config/nds32/nds32.h (CC1_SPEC): Likewise.
* config/nios2/nios2.h (CC1_SPEC): Likewise.
* config/pa/pa.h (CC1_SPEC): Likewise.
* config/rs6000/sysv4.h (CC1_SPEC): Likewise.
* config/rx/rx.h (CC1_SPEC): Likewise.
* config/sparc/sparc.h (CC1_SPEC): Likewise.


Could someone please have a look at this patch set?

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH]middle-end: Fix phi-ssa assertion triggers. [PR106519]

2022-08-04 Thread Richard Biener via Gcc-patches
On Thu, 4 Aug 2022, Tamar Christina wrote:

> Hi All,
> 
> The failures on -m32 on x86 show that the checks at the top level in
> tree_ssa_phiopt_worker aren't enough for diamond_p.
> 
> In minmax_replacement we perform the additional validation of the shape but by
> then it's too late to catch these cases.
> 
> This patch changes it so that for a diamond shape we check that the
> edges we're operating on result in the same destination BB.
> 
> We also enforce that for a diamond the middle BBs must have a single successor;
> this is because the remainder of the code always follows EDGE_SUCC 0.  If there
> is more than one successor then the edge we follow might not be part of the
> diamond, so just reject it early on.
> 
> I also remove the assert after the use of gimple_phi_arg_def as this function
> can't return NULL.  If the index is out of range it already breaks on an assert
> inside gimple_phi_arg_def, so we never hit this assert.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR middle-end/106519
>   * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Check final phi edge for
>   diamond shapes.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/106519
>   * gcc.dg/pr106519.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/pr106519.c b/gcc/testsuite/gcc.dg/pr106519.c
> new file mode 100644
> index 
> ..3d4662d8a02c6501560abb71ac53320f093620d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106519.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +
> +int bytestart, bytemem_0_0, modlookup_l_p;
> +
> +void
> +modlookup_l() {
> +  long j;
> +  while (modlookup_l_p)
> +while (bytestart && j && bytemem_0_0)
> +  j = j + 1;
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 
> a8e55e040649e17f83a2fc3340e368cf9c4c5e70..1c4942b5b5c18732a8b18789c04e2685437404dd
>  100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -268,7 +268,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads, bool early_p)
> continue;
>   }
>else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
> -&& !empty_block_p (bb1))
> +&& !empty_block_p (bb1)
> +&& single_succ_p (bb1) && single_succ_p (bb2))
>   diamond_p = true;
>else
>   continue;
> @@ -311,6 +312,12 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads, bool early_p)
>   for (gsi = gsi_start (phis); !gsi_end_p (gsi); gsi_next ())
> {
>   phi = as_a  (gsi_stmt (gsi));
> +
> + /* A diamond must continue to the same destination node, 
> otherwise
> +by definition it's not a diamond.  */
> + if (diamond_p && e1->dest != e2->dest)

this and the check below looks redundant since we already check
EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest when setting
diamond_p = true?  Which means we already verify that following
edge 0 we have a diamond?

In fact for the testcase we _do_ have a diamond but we failed to
adjust e2 to point to the merge block early, we only do it here:

  e2 = diamond_p ? EDGE_SUCC (bb2, 0) : e2;
  phi = single_non_singleton_phi_for_edges (phis, e1, e2);

That also means that the do_store_elim case possibly wants a
bail out on diamond_p?  Not sure for value_replacement.

At least

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index a8e55e04064..ef4c0b78f4e 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -269,7 +269,10 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
}
   else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
   && !empty_block_p (bb1))
-   diamond_p = true;
+   {
+ diamond_p = true;
+ e2 = EDGE_SUCC (bb2, 0);
+   }
   else
continue;
 
@@ -324,7 +327,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  if (!candorest)
continue;
 
- e2 = diamond_p ? EDGE_SUCC (bb2, 0) : e2;
  phi = single_non_singleton_phi_for_edges (phis, e1, e2);
  if (!phi)
continue;

fixes the ICE for me but I did no further checking.

> +   continue;
> +
>   arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>   arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
>   if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1) == 2)
> @@ -329,12 +336,15 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads, bool early_p)
> if (!phi)
>   continue;
>  
> -   arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
> -   arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> +   /* A diamond must continue to the same 

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Takayuki 'January June' Suwa via Gcc-patches
(sorry repost due to the lack of cc here)
Hi!

On 2022/08/04 18:49, Richard Sandiford wrote:
> Takayuki 'January June' Suwa  writes:
>> Thanks for your response.
>>
>> On 2022/08/03 16:52, Richard Sandiford wrote:
>>> Takayuki 'January June' Suwa via Gcc-patches  
>>> writes:
 Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
 data flow consistent, but it also increases register allocation pressure
 and thus often creates many unwanted register-to-register moves that
 cannot be optimized away.
>>>
>>> There are two things here:
>>>
>>> - If emit_move_complex_parts emits a clobber of a hard register,
>>>   then that's probably a bug/misfeature.  The point of the clobber is
>>>   to indicate that the register has no useful contents.  That's useful
>>>   for wide pseudos that are written to in parts, since it avoids the
>>>   need to track the liveness of each part of the pseudo individually.
>>>   But it shouldn't be necessary for hard registers, since subregs of
>>>   hard registers are simplified to hard registers wherever possible
>>>   (which on most targets is "always").
>>>
>>>   So I think the emit_move_complex_parts clobber should be restricted
>>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>>   (if only partly) then it would be worth doing as its own patch.
>>>
>>> - I think it'd be worth looking into more detail why a clobber makes
>>>   a difference to register pressure.  A clobber of a pseudo register R
>>>   shouldn't make R conflict with things that are live at the point of
>>>   the clobber.
>>
>> I agree with its worth.
>> In fact, aside from other ports, on the xtensa one, RA in code with frequent 
>> D[FC]mode pseudos is terribly bad.
>> For example, in __muldc3 on libgcc2, the size of the stack frame reserved 
>> will almost double depending on whether or not this patch is applied.
> 
> Yeah, that's a lot.

So it is a lot, though "almost double" might be an overstatement :)

BTW after some quick experimentation, I found that turning on 
-fsplit-wide-types-early would roughly (but not completely) solve the problem.  
Surely, the output was not so bad in the past...

> 
  It seems just analogous to partial register
 stall which is a famous problem on processors that do register renaming.

 In my opinion, when the register to be clobbered is a composite of hard
 ones, we should clobber the individual elements separately, otherwise
 clear the entire to zero prior to use as the "init-regs" pass does (like
 partial register stall workarounds on x86 CPUs).  Such redundant zero
 constant assignments will be removed later in the "cprop_hardreg" pass.
>>>
>>> I don't think we should rely on the zero being optimised away later.
>>>
>>> Emitting the zero also makes it harder for the register allocator
>>> to elide the move.  For example, if we have:
>>>
>>>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>>
>>> then there is at least a chance that the RA could assign hard registers
>>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>>
>>>   (set (reg:DI P) (const_int 0))
>>>
>>> beforehand then that becomes impossible, since R0 and R1 would then
>>> conflict with P.
>>
>> Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to 
>> one hard register.
> 
> I was thinking here about the case where (reg:DI …) corresponds to
> 2 hard registers.  Each subreg move is then a single hard register
> copy, but assigning P to the combination R0:R1 can remove both of
> the subreg moves.
> 
>>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>>> a plan to stop doing that, but perhaps I misremember.
>>
>> Sorry I am not sure about the status of LRA... because the xtensa port is 
>> still using reload.
> 
> Ah, hadn't realised that.  If you have time to work on it, it would be
> really good to move over to LRA.  There are plans to remove old reload.

Alas you do overestimate me :) I've only been working on GCC development 
for a little over a year.
It's not that I'm not interested, but it's too much for me.

> 
> It might be that old reload *does* treat a pseudo clobber as a conflict.
> I can't remember now.  If so, then zeroing the register wouldn't be
> too bad (for old reload only).
> 
>> As conclusion, trying to tweak the common code side may have been a bit 
>> premature.
>> I'll consider if I can deal with those issues on the side of the 
>> target-specific code.
> 
> It's likely to be at least partly a target-independent issue, so tweaking
> the common code makes sense in principle.
> 
> Does adding !HARD_REGISTER_P (x) to:
> 
>   /* Show the output dies here.  This is necessary for SUBREGs
>  of pseudos since we cannot track their lifetimes correctly;
>  hard regs shouldn't appear here except as return values.  */
>   if (!reload_completed && !reload_in_progress
>   && REG_P (x) && 

Re: [PATCH] match.pd: Add bitwise and pattern [PR106243]

2022-08-04 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 3 Aug 2022, Jeff Law via Gcc-patches wrote:

> > .optimized dump shows:
> >_1 = ABS_EXPR ;
> >_3 = _1 & 1;
> >return _3;
> > 
> > altho combine simplifies it to x & 1 on RTL, resulting in code-gen:
> > f1:
> >  and w0, w0, 1
> >  ret
> Doesn't the abs(x) & mask simplify to x & mask for any mask where the sign bit
> of x is off?

No.  Only the lowest bit remains the same between x and -x, all others 
might or might not be inverted (first counter example: x=-3, mask=3).
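
A tiny check (illustrative only) makes the counterexample concrete:

  // Illustrative only: verifies the counterexample (x = -3, mask = 3).
  #include <cstdio>
  #include <cstdlib>

  int main ()
  {
    int x = -3, mask = 3;
    std::printf ("abs(x) & mask = %d, x & mask = %d\n",
                 std::abs (x) & mask, x & mask);   // prints 3 and 1
    return 0;
  }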


Ciao,
Michael.


[committed] libstdc++: Rename data members of std::unexpected and std::bad_expected_access

2022-08-04 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

The P2549R1 paper was accepted for C++23. I already implemented it for
our <expected>, but I didn't rename the private data members, only the
public member functions. This renames the data members for consistency
with the working draft.

libstdc++-v3/ChangeLog:

* include/std/expected (unexpected::_M_val): Rename to _M_unex.
(bad_expected_access::_M_val): Likewise.
---
 libstdc++-v3/include/std/expected | 32 +++
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/std/expected 
b/libstdc++-v3/include/std/expected
index 3446d6dbaed..3ee13aa95f6 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -95,32 +95,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class bad_expected_access : public bad_expected_access {
 public:
   explicit
-  bad_expected_access(_Er __e) : _M_val(std::move(__e)) { }
+  bad_expected_access(_Er __e) : _M_unex(std::move(__e)) { }
 
   // XXX const char* what() const noexcept override;
 
   [[nodiscard]]
   _Er&
   error() & noexcept
-  { return _M_val; }
+  { return _M_unex; }
 
   [[nodiscard]]
   const _Er&
   error() const & noexcept
-  { return _M_val; }
+  { return _M_unex; }
 
   [[nodiscard]]
   _Er&&
   error() && noexcept
-  { return std::move(_M_val); }
+  { return std::move(_M_unex); }
 
   [[nodiscard]]
   const _Er&&
   error() const && noexcept
-  { return std::move(_M_val); }
+  { return std::move(_M_unex); }
 
 private:
-  _Er _M_val;
+  _Er _M_unex;
 };
 
   /// Tag type for constructing unexpected values in a std::expected
@@ -175,7 +175,7 @@ namespace __expected
constexpr explicit
unexpected(_Err&& __e)
noexcept(is_nothrow_constructible_v<_Er, _Err>)
-   : _M_val(std::forward<_Err>(__e))
+   : _M_unex(std::forward<_Err>(__e))
{ }
 
   template
@@ -183,7 +183,7 @@ namespace __expected
constexpr explicit
unexpected(in_place_t, _Args&&... __args)
noexcept(is_nothrow_constructible_v<_Er, _Args...>)
-   : _M_val(std::forward<_Args>(__args)...)
+   : _M_unex(std::forward<_Args>(__args)...)
{ }
 
   template
@@ -192,7 +192,7 @@ namespace __expected
unexpected(in_place_t, initializer_list<_Up> __il, _Args&&... __args)
noexcept(is_nothrow_constructible_v<_Er, initializer_list<_Up>&,
_Args...>)
-   : _M_val(__il, std::forward<_Args>(__args)...)
+   : _M_unex(__il, std::forward<_Args>(__args)...)
{ }
 
   constexpr unexpected& operator=(const unexpected&) = default;
@@ -201,33 +201,33 @@ namespace __expected
 
   [[nodiscard]]
   constexpr const _Er&
-  error() const & noexcept { return _M_val; }
+  error() const & noexcept { return _M_unex; }
 
   [[nodiscard]]
   constexpr _Er&
-  error() & noexcept { return _M_val; }
+  error() & noexcept { return _M_unex; }
 
   [[nodiscard]]
   constexpr const _Er&&
-  error() const && noexcept { return std::move(_M_val); }
+  error() const && noexcept { return std::move(_M_unex); }
 
   [[nodiscard]]
   constexpr _Er&&
-  error() && noexcept { return std::move(_M_val); }
+  error() && noexcept { return std::move(_M_unex); }
 
   constexpr void
   swap(unexpected& __other) noexcept(is_nothrow_swappable_v<_Er>)
   {
static_assert( is_swappable_v<_Er> );
using std::swap;
-   swap(_M_val, __other._M_val);
+   swap(_M_unex, __other._M_unex);
   }
 
   template
[[nodiscard]]
friend constexpr bool
operator==(const unexpected& __x, const unexpected<_Err>& __y)
-   { return __x._M_val == __y.error(); }
+   { return __x._M_unex == __y.error(); }
 
   friend constexpr void
   swap(unexpected& __x, unexpected& __y)
@@ -236,7 +236,7 @@ namespace __expected
   { __x.swap(__y); }
 
 private:
-  _Er _M_val;
+  _Er _M_unex;
 };
 
   template unexpected(_Er) -> unexpected<_Er>;
-- 
2.37.1



[committed] libstdc++: Update value of __cpp_lib_ios_noreplace macro

2022-08-04 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

My P2467R1 proposal was accepted for C++23 so there's an official value
for this macro now.

libstdc++-v3/ChangeLog:

* include/bits/ios_base.h (__cpp_lib_ios_noreplace): Update
value to 202207L.
* include/std/version (__cpp_lib_ios_noreplace): Likewise.
* testsuite/27_io/basic_ofstream/open/char/noreplace.cc: Check
for new value.
* testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc:
Likewise.
---
 libstdc++-v3/include/bits/ios_base.h  | 2 +-
 libstdc++-v3/include/std/version  | 2 +-
 .../testsuite/27_io/basic_ofstream/open/char/noreplace.cc | 4 ++--
 .../testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc  | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/ios_base.h 
b/libstdc++-v3/include/bits/ios_base.h
index e34097171a5..5b554548ecd 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -474,7 +474,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 static const openmode __noreplace =_S_noreplace;
 
 #if __cplusplus >= 202100L
-#define __cpp_lib_ios_noreplace 202200L
+#define __cpp_lib_ios_noreplace 202207L
 /// Open a file in exclusive mode.
 static const openmode noreplace =  _S_noreplace;
 #endif
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 5edca2f3007..6b6187973a2 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -316,7 +316,7 @@
 # define __cpp_lib_expected 202202L
 #endif
 #define __cpp_lib_invoke_r 202106L
-#define __cpp_lib_ios_noreplace 202200L
+#define __cpp_lib_ios_noreplace 202207L
 #if __cpp_lib_concepts
 # undef __cpp_lib_optional
 # define __cpp_lib_optional 202110L
diff --git a/libstdc++-v3/testsuite/27_io/basic_ofstream/open/char/noreplace.cc 
b/libstdc++-v3/testsuite/27_io/basic_ofstream/open/char/noreplace.cc
index e39f5928a1f..56ff2d7cead 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ofstream/open/char/noreplace.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ofstream/open/char/noreplace.cc
@@ -2,10 +2,10 @@
 
 #include 
 
-#if __cplusplus >= 202200L
+#if __cplusplus >= 202207L
 #ifndef __cpp_lib_ios_noreplace
 # error "Feature-test macro for ios::noreplace missing in "
-#elif __cpp_lib_ios_noreplace < 202200L
+#elif __cpp_lib_ios_noreplace < 202207L
 # error "Feature-test macro for ios::noreplace has wrong value in "
 #endif
 #endif
diff --git 
a/libstdc++-v3/testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc 
b/libstdc++-v3/testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc
index 77f11865ac4..f0425cdab3d 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ofstream/open/wchar_t/noreplace.cc
@@ -2,10 +2,10 @@
 
 #include 
 
-#if __cplusplus >= 202200L
+#if __cplusplus >= 202207L
 #ifndef __cpp_lib_ios_noreplace
 # error "Feature-test macro for ios::noreplace missing in "
-#elif __cpp_lib_ios_noreplace < 202200L
+#elif __cpp_lib_ios_noreplace < 202207L
 # error "Feature-test macro for ios::noreplace has wrong value in "
 #endif
 #endif
-- 
2.37.1



[committed] libstdc++: Unblock atomic wait on non-futex platforms [PR106183]

2022-08-04 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, powerpc64le-linux and sparc-solaris2.11, pushed to trunk.

We want this on the gcc-12 and gcc-11 branches too.

-- >8 --

When using a mutex and condition variable, the notifying thread needs to
increment _M_ver while holding the mutex lock, and the waiting thread
needs to re-check after locking the mutex. This avoids a missed
notification as described in the PR.
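
In outline the resulting protocol is the classic one; a rough sketch using a
plain mutex, condition variable and counter (stand-ins, not the actual
_M_mtx/_M_cv/_M_ver members):

  #include <condition_variable>
  #include <mutex>

  std::mutex mtx;
  std::condition_variable cv;
  unsigned ver = 0;

  void notify ()
  {
    {
      std::lock_guard<std::mutex> l (mtx);
      ++ver;                       // bump the version while holding the lock
    }
    cv.notify_all ();              // then wake all waiters
  }

  void wait (unsigned old)
  {
    std::unique_lock<std::mutex> l (mtx);
    if (ver == old)                // re-check after locking: no lost wake-up
      cv.wait (l);
  }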

By moving the increment of _M_ver to the base _M_notify we can make the
use of the mutex local to the use of the condition variable, and
simplify the code a little. We can use a relaxed store because the mutex
already provides sequential consistency. Also we don't need to check
whether __addr == &_M_ver because we know that's always true for
platforms that use a condition variable, and so we also know that we
always need to use notify_all() not notify_one().

Reviewed-by: Thomas Rodgers 

libstdc++-v3/ChangeLog:

PR libstdc++/106183
* include/bits/atomic_wait.h (__waiter_pool_base::_M_notify):
Move increment of _M_ver here.
[!_GLIBCXX_HAVE_PLATFORM_WAIT]: Lock mutex around increment.
Use relaxed memory order and always notify all waiters.
(__waiter_base::_M_do_wait) [!_GLIBCXX_HAVE_PLATFORM_WAIT]:
Check value again after locking mutex.
(__waiter_base::_M_notify): Remove increment of _M_ver.
---
 libstdc++-v3/include/bits/atomic_wait.h | 42 -
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index a6d55d3af8a..76ed7409937 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -221,18 +221,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) 
noexcept
+  _M_notify(__platform_wait_t* __addr, [[maybe_unused]] bool __all,
+   bool __bare) noexcept
   {
-   if (!(__bare || _M_waiting()))
- return;
-
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-   __platform_notify(__addr, __all);
+   if (__addr == &_M_ver)
+ {
+   __atomic_fetch_add(__addr, 1, __ATOMIC_SEQ_CST);
+   __all = true;
+ }
+
+   if (__bare || _M_waiting())
+ __platform_notify(__addr, __all);
 #else
-   if (__all)
+   {
+ lock_guard __l(_M_mtx);
+ __atomic_fetch_add(__addr, 1, __ATOMIC_RELAXED);
+   }
+   if (__bare || _M_waiting())
  _M_cv.notify_all();
-   else
- _M_cv.notify_one();
 #endif
   }
 
@@ -259,7 +266,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (__val == __old)
  {
lock_guard __l(_M_mtx);
-   _M_cv.wait(_M_mtx);
+   __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+   if (__val == __old)
+ _M_cv.wait(_M_mtx);
  }
 #endif // __GLIBCXX_HAVE_PLATFORM_WAIT
   }
@@ -297,20 +306,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
, _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
  { }
 
-   bool
-   _M_laundered() const
-   { return _M_addr == &_M_w._M_ver; }
-
void
-   _M_notify(bool __all, bool __bare = false)
-   {
- if (_M_laundered())
-   {
- __atomic_fetch_add(_M_addr, 1, __ATOMIC_SEQ_CST);
- __all = true;
-   }
- _M_w._M_notify(_M_addr, __all, __bare);
-   }
+   _M_notify(bool __all, bool __bare = false) noexcept
+   { _M_w._M_notify(_M_addr, __all, __bare); }
 
template
-- 
2.37.1



[PATCH]middle-end: Fix phi-ssa assertion triggers. [PR106519]

2022-08-04 Thread Tamar Christina via Gcc-patches
Hi All,

The failures on -m32 on x86 show that the checks at the top level in
tree_ssa_phiopt_worker aren't enough for diamond_p.

In minmax_replacement we perform the additional validation of the shape but by
then it's too late to catch these cases.

This patch changes it so that for a diamond shape we check that the
edges we're operating on result in the same destination BB.

We also enforce that for a diamond the middle BBs must have a single successor;
this is because the remainder of the code always follows EDGE_SUCC 0.  If there
is more than one successor then the edge we follow might not be part of the
diamond, so just reject it early on.
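
For reference, the kind of source that produces such a diamond looks roughly
like this (illustrative sketch only, not the failing testcase):

  /* Illustrative only: a classic diamond.  bb1 and bb2 are the two arms,
     both falling through to the same join block where a PHI merges x.  */
  int diamond (int c, int a, int b)
  {
    int x;
    if (c)
      x = a + 1;   /* bb1 */
    else
      x = b - 1;   /* bb2 */
    return x;      /* join block: x = PHI <a+1(bb1), b-1(bb2)> */
  }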

I also remove the assert after the use of gimple_phi_arg_def as this function
can't return NULL.  If the index is out of range it already breaks on an assert
inside gimple_phi_arg_def, so we never hit this assert.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR middle-end/106519
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Check final phi edge for
diamond shapes.

gcc/testsuite/ChangeLog:

PR middle-end/106519
* gcc.dg/pr106519.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/pr106519.c b/gcc/testsuite/gcc.dg/pr106519.c
new file mode 100644
index 
..3d4662d8a02c6501560abb71ac53320f093620d8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106519.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+int bytestart, bytemem_0_0, modlookup_l_p;
+
+void
+modlookup_l() {
+  long j;
+  while (modlookup_l_p)
+while (bytestart && j && bytemem_0_0)
+  j = j + 1;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 
a8e55e040649e17f83a2fc3340e368cf9c4c5e70..1c4942b5b5c18732a8b18789c04e2685437404dd
 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -268,7 +268,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  continue;
}
   else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
-  && !empty_block_p (bb1))
+  && !empty_block_p (bb1)
+  && single_succ_p (bb1) && single_succ_p (bb2))
diamond_p = true;
   else
continue;
@@ -311,6 +312,12 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
for (gsi = gsi_start (phis); !gsi_end_p (gsi); gsi_next ())
  {
phi = as_a  (gsi_stmt (gsi));
+
+   /* A diamond must continue to the same destination node, 
otherwise
+  by definition it's not a diamond.  */
+   if (diamond_p && e1->dest != e2->dest)
+ continue;
+
arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1) == 2)
@@ -329,12 +336,15 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  if (!phi)
continue;
 
- arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
- arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
+ /* A diamond must continue to the same destination node, otherwise
+by definition it's not a diamond.  */
+ if (diamond_p && e1->dest != e2->dest)
+   continue;
 
  /* Something is wrong if we cannot find the arguments in the PHI
 node.  */
- gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
+ arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
+ arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
 
  gphi *newphi;
  if (single_pred_p (bb1)




-- 
diff --git a/gcc/testsuite/gcc.dg/pr106519.c b/gcc/testsuite/gcc.dg/pr106519.c
new file mode 100644
index 
..3d4662d8a02c6501560abb71ac53320f093620d8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106519.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+int bytestart, bytemem_0_0, modlookup_l_p;
+
+void
+modlookup_l() {
+  long j;
+  while (modlookup_l_p)
+while (bytestart && j && bytemem_0_0)
+  j = j + 1;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 
a8e55e040649e17f83a2fc3340e368cf9c4c5e70..1c4942b5b5c18732a8b18789c04e2685437404dd
 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -268,7 +268,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  continue;
}
   else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
-  && !empty_block_p (bb1))
+  && !empty_block_p (bb1)
+  && single_succ_p (bb1) && single_succ_p (bb2))
diamond_p = true;
   else
continue;
@@ -311,6 +312,12 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)

Re: [PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/8/4 17:55, HAO CHEN GUI wrote:
> Hi,
>   This patch corrects return value of check_p9modulo_hw_available. It should
> return 0 when p9modulo is supported.

Good catch!  There is no case using p9modulo_hw for now, no coverage, sigh...

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

This patch is OK, thanks!

BR,
Kewen

> 
> ChangeLog
> 2022-08-04  Haochen Gui  
> 
> gcc/testsuite/
>   * lib/target-supports.exp (check_p9modulo_hw_available): Correct return
>   value.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 4ed7b25b9a4..04a2a8e8659 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2288,7 +2288,7 @@ proc check_p9modulo_hw_available { } {
>   {
>   int i = 5, j = 3, r = -1;
>   asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
> - return (r == 2);
> + return (r != 2);
>   }
>   } $options
>   }




Re: tuple pretty printer

2022-08-04 Thread Jonathan Wakely via Gcc-patches
CC gcc-patches

On Wed, 27 Jul 2022 at 17:40, Ulrich Drepper via Libstdc++
 wrote:
>
> The current tuple pretty printer shows for this variable
>
> std::tuple a{1,2,3};
>
> the following output:
>
> (gdb) p a
> $1 = std::tuple containing = {[1] = 1, [2] = 2, [3] = 3}
>
> I find this quite irritating because the indices don't match the
> std::get template parameters.  In a large tuple or arrays of tuples
> which are less readable than this simple example this becomes an even
> larger problem.  How about the following simple patch which brings the
> indices in line?

I think this makes sense, want to push it?


>
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -611,9 +611,9 @@ class StdTuplePrinter:
>  # the value "as is".
>  fields = impl.type.fields ()
>  if len (fields) < 1 or fields[0].name != "_M_head_impl":
> -return ('[%d]' % self.count, impl)
> +return ('[%d]' % (self.count - 1), impl)
>  else:
> -return ('[%d]' % self.count, impl['_M_head_impl'])
> +return ('[%d]' % (self.count - 1), impl['_M_head_impl'])
>
>  def __init__ (self, typename, val):
>  self.typename = strip_versioned_namespace(typename)
>



[PATCH] tree-optimization/106521 - unroll-and-jam LC SSA rewrite

2022-08-04 Thread Richard Biener via Gcc-patches
The LC SSA rewrite performs SSA verification at start but the VN
run performed on the unrolled-and-jammed body can leave us with
invalid SSA form until CFG cleanup is run.  So make sure we do that
before rewriting into LC SSA.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/106521
* gimple-loop-jam.cc (tree_loop_unroll_and_jam): Perform
CFG cleanup manually before rewriting into LC SSA.

* gcc.dg/torture/pr106521.c: New testcase.
---
 gcc/gimple-loop-jam.cc  |  8 
 gcc/testsuite/gcc.dg/torture/pr106521.c | 17 +
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106521.c

diff --git a/gcc/gimple-loop-jam.cc b/gcc/gimple-loop-jam.cc
index 8cde6c7c5ce..41ba4e42819 100644
--- a/gcc/gimple-loop-jam.cc
+++ b/gcc/gimple-loop-jam.cc
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-vectorizer.h"
 #include "tree-ssa-sccvn.h"
+#include "tree-cfgcleanup.h"
 
 /* Unroll and Jam transformation

@@ -609,6 +610,13 @@ tree_loop_unroll_and_jam (void)
 
   if (todo)
 {
+  /* We need to cleanup the CFG first since otherwise SSA form can
+be not up-to-date from the VN run.  */
+  if (todo & TODO_cleanup_cfg)
+   {
+ cleanup_tree_cfg ();
+ todo &= ~TODO_cleanup_cfg;
+   }
   rewrite_into_loop_closed_ssa (NULL, 0);
   scev_reset ();
   free_dominance_info (CDI_DOMINATORS);
diff --git a/gcc/testsuite/gcc.dg/torture/pr106521.c 
b/gcc/testsuite/gcc.dg/torture/pr106521.c
new file mode 100644
index 000..05c8ce54d0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106521.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-floop-unroll-and-jam --param 
unroll-jam-min-percent=0" } */
+
+short a, b, e;
+volatile long c;
+long d;
+int main() {
+  for (; d; d++) {
+long g = a = 1;
+for (; a; a++) {
+  g++;
+  c;
+}
+g && (b = e);
+  }
+  return 0;
+}
-- 
2.35.3


Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
>> +/* Create vector init for vectorized iv.  */
>> +static tree
>> +vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
>> +  tree step_expr, poly_uint64 nunits,
>> +  tree vectype,
>> +  enum vect_induction_op_type induction_type)
>> +{
>> +  unsigned HOST_WIDE_INT const_nunits;
>> +  tree vec_shift, vec_init, new_name;
>> +  unsigned i;
>> +
>> +  /* iv_loop is the loop to be vectorized. Create:
>> + vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr).  */
>> +  new_name = init_expr;
>> +  switch (induction_type)
>> +{
>> +case vect_step_op_shr:
>> +case vect_step_op_shl:
>> +  /* Build the Initial value from shift_expr.  */
>> +  vec_init = gimple_build_vector_from_val (stmts,
>> +  vectype,
>> +  new_name);
>> +  vec_shift = gimple_build (stmts, VEC_SERIES_EXPR, vectype,
>> +   build_zero_cst (TREE_TYPE (step_expr)),
>> +   step_expr);
>
> There might be a more canonical way to build the series expr - Richard?

build_vec_series is shorter if step_expr is known to be a constant.
The above looks good for the general case.

Thanks,
Richard


[PATCH, rs6000] Correct return value of check_p9modulo_hw_available

2022-08-04 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch corrects return value of check_p9modulo_hw_available. It should
return 0 when p9modulo is supported.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-08-04  Haochen Gui  

gcc/testsuite/
* lib/target-supports.exp (check_p9modulo_hw_available): Correct return
value.


patch.diff
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4ed7b25b9a4..04a2a8e8659 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2288,7 +2288,7 @@ proc check_p9modulo_hw_available { } {
{
int i = 5, j = 3, r = -1;
asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
-   return (r == 2);
+   return (r != 2);
}
} $options
}


Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Takayuki 'January June' Suwa  writes:
> Thanks for your response.
>
> On 2022/08/03 16:52, Richard Sandiford wrote:
>> Takayuki 'January June' Suwa via Gcc-patches  
>> writes:
>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>> data flow consistent, but it also increases register allocation pressure
>>> and thus often creates many unwanted register-to-register moves that
>>> cannot be optimized away.
>> 
>> There are two things here:
>> 
>> - If emit_move_complex_parts emits a clobber of a hard register,
>>   then that's probably a bug/misfeature.  The point of the clobber is
>>   to indicate that the register has no useful contents.  That's useful
>>   for wide pseudos that are written to in parts, since it avoids the
>>   need to track the liveness of each part of the pseudo individually.
>>   But it shouldn't be necessary for hard registers, since subregs of
>>   hard registers are simplified to hard registers wherever possible
>>   (which on most targets is "always").
>> 
>>   So I think the emit_move_complex_parts clobber should be restricted
>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>   (if only partly) then it would be worth doing as its own patch.
>> 
>> - I think it'd be worth looking into more detail why a clobber makes
>>   a difference to register pressure.  A clobber of a pseudo register R
>>   shouldn't make R conflict with things that are live at the point of
>>   the clobber.
>
> I agree with its worth.
> In fact, aside from other ports, on the xtensa one, RA in code with frequent 
> D[FC]mode pseudos is terribly bad.
> For example, in __muldc3 on libgcc2, the size of the stack frame reserved 
> will almost double depending on whether or not this patch is applied.

Yeah, that's a lot.

>>>  It seems just analogous to partial register
>>> stall which is a famous problem on processors that do register renaming.
>>>
>>> In my opinion, when the register to be clobbered is a composite of hard
>>> ones, we should clobber the individual elements separately, otherwise
>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>> 
>> I don't think we should rely on the zero being optimised away later.
>> 
>> Emitting the zero also makes it harder for the register allocator
>> to elide the move.  For example, if we have:
>> 
>>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>> 
>> then there is at least a chance that the RA could assign hard registers
>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>> 
>>   (set (reg:DI P) (const_int 0))
>> 
>> beforehand then that becomes impossible, since R0 and R1 would then
>> conflict with P.
>
> Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to 
> one hard register.

I was thinking here about the case where (reg:DI …) corresponds to
2 hard registers.  Each subreg move is then a single hard register
copy, but assigning P to the combination R0:R1 can remove both of
the subreg moves.

>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>> a plan to stop doing that, but perhaps I misremember.
>
> Sorry I am not sure about the status of LRA... because the xtensa port is 
> still using reload.

Ah, hadn't realised that.  If you have time to work on it, it would be
really good to move over to LRA.  There are plans to remove old reload.

It might be that old reload *does* treat a pseudo clobber as a conflict.
I can't remember now.  If so, then zeroing the register wouldn't be
too bad (for old reload only).

> As conclusion, trying to tweak the common code side may have been a bit 
> premature.
> I'll consider if I can deal with those issues on the side of the 
> target-specific code.

It's likely to be at least partly a target-independent issue, so tweaking
the common code makes sense in principle.

Does adding !HARD_REGISTER_P (x) to:

  /* Show the output dies here.  This is necessary for SUBREGs
 of pseudos since we cannot track their lifetimes correctly;
 hard regs shouldn't appear here except as return values.  */
  if (!reload_completed && !reload_in_progress
  && REG_P (x) && !reg_overlap_mentioned_p (x, y))
emit_clobber (x);

in emit_move_complex_parts help?  If so, I think we should do at
least that much.

Thanks,
Richard


Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote:
>> Takayuki 'January June' Suwa via Gcc-patches  
>> writes:
>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>> data flow consistent, but it also increases register allocation pressure
>>> and thus often creates many unwanted register-to-register moves that
>>> cannot be optimized away.
>> There are two things here:
>>
>> - If emit_move_complex_parts emits a clobber of a hard register,
>>then that's probably a bug/misfeature.  The point of the clobber is
>>to indicate that the register has no useful contents.  That's useful
>>for wide pseudos that are written to in parts, since it avoids the
>>need to track the liveness of each part of the pseudo individually.
>>But it shouldn't be necessary for hard registers, since subregs of
>>hard registers are simplified to hard registers wherever possible
>>(which on most targets is "always").
>>
>>So I think the emit_move_complex_parts clobber should be restricted
>>to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>(if only partly) then it would be worth doing as its own patch.
> Agreed.
>
>>
>> - I think it'd be worth looking into more detail why a clobber makes
>>a difference to register pressure.  A clobber of a pseudo register R
>>shouldn't make R conflict with things that are live at the point of
>>the clobber.
> Also agreed.
>
>>
>>>   It seems just analogous to partial register
>>> stall which is a famous problem on processors that do register renaming.
>>>
>>> In my opinion, when the register to be clobbered is a composite of hard
>>> ones, we should clobber the individual elements separately, otherwise
>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>> I don't think we should rely on the zero being optimised away later.
>>
>> Emitting the zero also makes it harder for the register allocator
>> to elide the move.  For example, if we have:
>>
>>(set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>(set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>
>> then there is at least a chance that the RA could assign hard registers
>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>
>>(set (reg:DI P) (const_int 0))
>>
>> beforehand then that becomes impossible, since R0 and R1 would then
>> conflict with P.
>>
>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>> a plan to stop doing that, but perhaps I misremember.
> I have vague memories of dealing with some of this nonsense a few 
> release cycles.  I don't recall all the details, but init-regs + 
> lower-subreg + regcprop + splitting all conspired to generate poor code 
> on the MIPS targets.  See pr87761, though it doesn't include all my 
> findings -- I can't recall if I walked through the entire tortured 
> sequence in the gcc-patches discussion or not.
>
> I ended up working around in the mips backend in conjunction with some 
> changes to regcprop IIRC.

Thanks for the pointer, hadn't seen that.  And yeah, for the early-ish
passes, I guess the interaction between lower-subreg and init-regs is
important too, not just the interaction between lower-subreg and RA.
It probably also ties into the problems with overly-scalarised register
moves, like in PR 106106.

So maybe I was being too optimistic :-)

Richard


Re: [PATCH] Backwards threader greedy search TLC

2022-08-04 Thread Richard Biener via Gcc-patches
On Thu, 4 Aug 2022, Aldy Hernandez wrote:

> On Thu, Aug 4, 2022 at 9:15 AM Richard Biener  wrote:
> >
> > On Thu, 4 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Wed, Aug 3, 2022 at 11:53 AM Richard Biener  wrote:
> > > >
> > > > I've tried to understand how the greedy search works seeing the
> > > > bitmap dances and the split into resolve_phi.  I've summarized
> > > > the intent of the algorithm as
> > > >
> > > >   // For further greedy searching we want to remove interesting
> > > >   // names defined in BB but add ones on the PHI edges for the
> > > >   // respective edges.
> > > >
> > > > but the implementation differs in detail.  In particular when
> > > > there is more than one interesting PHI in BB it seems to only consider
> > > > the first for translating defs across edges.  It also only applies
> > > > the loop crossing restriction when there is an interesting PHI.
> > > > I've also noticed that while the set of interesting names is rolled
> > > > back, m_imports just keeps growing - is that a bug?
> > >
> > > I've never quite liked how I was handling PHIs.  I had some
> > > improvements in this space, especially the problem with only
> > > considering the first def across a PHI, but I ran out of time last
> > > cycle, and we were already into stage3.
> > >
> > > The loop crossing restriction I inherited from the original
> > > implementation, though I suppose I restricted it further by only
> > > looking at interesting PHIs.  In practice I don't think it mattered,
> > > because we cap loop crossing violations in cancel_invalid_paths in the
> > > registry.  Does anything break if you keep the restriction across the
> > > board?
> >
> > Probably not - I've also spotted all these "late" invalidates all over
> > the place, some of them should be done earlier to limit the exponential
> > search.  I plan to go find them and divide them into things that can
> > be checked locally per BB (good to do at greedy search time) and those
> > that need the full path (we should simply not register such path).
> 
> That would be great.  The earlier the better.
> 
> One of the reasons for the current code doing them late, is because we
> still have the DOM/forward threader, so we need a place to catch
> violations from both threaders.  With the impending demise of the
> forward threader, I suppose we could move as much as possible earlier.
> 
> >
> > I'll keep this as is for now.
> >
> > > Good spotting on the imports growing.  Instinctively that seems like a 
> > > bug.
> > >
> > > [As a suggestion, check the total threadable paths as a sanity check
> > > if you make any changes in this space.  What I've been doing is
> > > counting "Jumps threaded" in the *.stat dump file across the .ii files
> > > in a bootstrap, and making sure you're not getting the same # of
> > > threads because we missed one in the backward threader which then got
> > > picked up by DOM.  I divide up the threads by ethread, thread,
> > > threadfull, DOM, etc, to make sure I'm not just shuffling threading
> > > opportunities around.]
> >
> > I'll do this experiment and will roll this fix into the patch if it
> > works out.
> >
> > > But yeah, we shouldn't be growing imports unnecessarily, if nothing
> > > else because of the bitmap explosion you're noticing in other places.
> > > In practice, I'm not so sure it slows the solver itself down:
> > >
> > > // They are hints for the solver.  Adding more elements doesn't slow
> > > // us down, because we don't solve anything that doesn't appear in the
> > > // path.  On the other hand, not having enough imports will limit what
> > > // we can solve.
> > >
> > > But that comment may be old ;-).
> >
> > I hoped so, yes.
> >
> > > >
> > > > The following preserves the loop crossing restriction to the case
> > > > of interesting PHIs but merges resolve_phi back, changing interesting
> > > > as outlined with the intent above.  It should get more threading
> > > > cases when there are multiple interesting PHI defs in a block.
> > > > It might be a bit faster due to less bitmap operations but in the
> > > > end the main intent was to make what happens more obvious.
> > >
> > > Sweet!
> > >
> > > >
> > > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> > > >
> > > > Aldy - you wrote the existing implementation, is the following OK?
> > >
> > > I'm tickled pink you're cleaning and improving this.  Looks good.
> >
> > OK, I'll re-test and push after doing the m_imports experiment.

So it turns out we cannot simply clear the same bits from m_imports
as we do from new_interesting since the bits are not the same.
In fact we do not prune names from m_imports that are locally defined
in BB (and the fact that cc1files sees one fewer threading hints that
we shouldn't for some reason).  I've amended the comment
simply mentioning it's not worth the trouble keeping track of which
bits we can clear and which we should keep.

We can revisit this later if wanted.

Richard.


Re: [PATCH] Add condition coverage profiling

2022-08-04 Thread Jørgen Kvalsvik via Gcc-patches
On 04/08/2022 09:43, Sebastian Huber wrote:
> On 02/08/2022 09:58, Jørgen Kvalsvik wrote:
>> Based on this established terminology I can think of a few good candidates:
>>
>> condition outcomes covered n/m
>> outcomes covered n/m
>>
>> What do you think?
> 
> Both are fine, but I would prefer "condition outcomes covered n/m".
> 

I'll update the patch. Maybe we should review the other outputs too? Something 
like:

condition N missing outcome (true false)


Re: [PATCH] RFC: Extend SLP permutation optimisations

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 2 Aug 2022, Richard Sandiford wrote:
>
>> Currently SLP tries to force permute operations "down" the graph
>> from loads in the hope of reducing the total number of permutes
>> needed or (in the best case) removing the need for the permutes
>> entirely.  This patch tries to extend it as follows:
>> 
>> - Allow loads to take a different permutation from the one they
>>   started with, rather than choosing between "original permutation"
>>   and "no permutation".
>> 
>> - Allow changes in both directions, if the target supports the
>>   reverse permute operation.
>> 
>> - Treat the placement of permute operations as a two-way dataflow
>>   problem: after propagating information from leaves to roots (as now),
>>   propagate information back up the graph.
>> 
>> - Take execution frequency into account when optimising for speed,
>>   so that (for example) permutes inside loops have a higher cost
>>   than permutes outside loops.
>> 
>> - Try to reduce the total number of permutes when optimising for
>>   size, even if that increases the number of permutes on a given
>>   execution path.
>> 
>> See the big block comment above vect_optimize_slp_pass for
>> a detailed description.
>> 
>> A while back I posted a patch to extend the existing optimisation
>> to non-consecutive loads.  This patch doesn't include that change
>> (although it's a possible future addition).
>> 
>> The original motivation for doing this was to add a framework that would
>> allow other layout differences in future.  The two main ones are:
>> 
>> - Make it easier to represent predicated operations, including
>>   predicated operations with gaps.  E.g.:
>> 
>>  a[0] += 1;
>>  a[1] += 1;
>>  a[3] += 1;
>> 
>>   could be a single load/add/store for SVE.  We could handle this
>>   by representing a layout such as { 0, 1, _, 2 } or { 0, 1, _, 3 }
>>   (depending on what's being counted).  We might need to move
>>   elements between lanes at various points, like with permutes.
>> 
>>   (This would first mean adding support for stores with gaps.)
>> 
>> - Make it easier to switch between an even/odd and unpermuted layout
>>   when switching between wide and narrow elements.  E.g. if a widening
>>   operation produces an even vector and an odd vector, we should try
>>   to keep operations on the wide elements in that order rather than
>>   force them to be permuted back "in order".
>> 
>> To give some examples of what the current patch does:
>> 
>> int f1(int *__restrict a, int *__restrict b, int *__restrict c,
>>int *__restrict d)
>> {
>>   a[0] = (b[1] << c[3]) - d[1];
>>   a[1] = (b[0] << c[2]) - d[0];
>>   a[2] = (b[3] << c[1]) - d[3];
>>   a[3] = (b[2] << c[0]) - d[2];
>> }
>> 
>> continues to produce the same code as before when optimising for
>> speed: b, c and d are permuted at load time.  But when optimising
>> for size we instead permute c into the same order as b+d and then
>> permute the result of the arithmetic into the same order as a:
>> 
>> ldr q1, [x2]
>> ldr q0, [x1]
>> ext v1.16b, v1.16b, v1.16b, #8 // <--
>> sshlv0.4s, v0.4s, v1.4s
>> ldr q1, [x3]
>> sub v0.4s, v0.4s, v1.4s
>> rev64   v0.4s, v0.4s   // <--
>> str q0, [x0]
>> ret
>> 
>> The following function:
>> 
>> int f2(int *__restrict a, int *__restrict b, int *__restrict c,
>>int *__restrict d)
>> {
>>   a[0] = (b[3] << c[3]) - d[3];
>>   a[1] = (b[2] << c[2]) - d[2];
>>   a[2] = (b[1] << c[1]) - d[1];
>>   a[3] = (b[0] << c[0]) - d[0];
>> }
>> 
>> continues to push the reverse down to just before the store,
>> like the current code does.
>> 
>> In:
>> 
>> int f3(int *__restrict a, int *__restrict b, int *__restrict c,
>>int *__restrict d)
>> {
>>   for (int i = 0; i < 100; ++i)
>> {
>>   a[0] = (a[0] + c[3]);
>>   a[1] = (a[1] + c[2]);
>>   a[2] = (a[2] + c[1]);
>>   a[3] = (a[3] + c[0]);
>>   c += 4;
>> }
>> }
>> 
>> the loads of a are hoisted and the stores of a are sunk, so that
>> only the load from c happens in the loop.  When optimising for
>> speed, we prefer to have the loop operate on the reversed layout,
>> changing on entry and exit from the loop:
>> 
>> mov x3, x0
adrp    x0, .LC0
>> add x1, x2, 1600
>> ldr q2, [x0, #:lo12:.LC0]
>> ldr q0, [x3]
>> mov v1.16b, v0.16b
>> tbl v0.16b, {v0.16b - v1.16b}, v2.16b// <
>> .p2align 3,,7
>> .L6:
>> ldr q1, [x2], 16
>> add v0.4s, v0.4s, v1.4s
>> cmp x2, x1
>> bne .L6
>> mov v1.16b, v0.16b
adrp    x0, .LC0
>> ldr q2, [x0, #:lo12:.LC0]
>> tbl v0.16b, {v0.16b - v1.16b}, v2.16b// <
>> str q0, [x3]
>> ret
>> 
>> Similarly, for the very artificial testcase:
>> 
>> int f4(int 

[PATCH] lto: support --jobserver-style=fifo for recent GNU make

2022-08-04 Thread Martin Liška

After a long time, GNU make has finally implemented named-pipe support
for --jobserver-auth. The traditional approach passes already-opened
file descriptors, which causes trouble:
https://savannah.gnu.org/bugs/index.php?57242

GNU make commit:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=7ad2593b2d2bb5b9332fd8bf93ac6f958bc6
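For reference, here is a minimal standalone sketch of the two MAKEFLAGS
forms the driver has to tell apart (the fifo path below is made up; GNU
make picks its own temporary name).  The point is that the fifo form
still contains the plain "--jobserver-auth=" prefix, so the fifo needle
has to be checked first:

#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* Classic style: a pair of already-opened pipe file descriptors.  */
  const char *classic = " -j16 --jobserver-auth=3,4";
  /* GNU make 4.4 fifo style: a named pipe in the filesystem.  */
  const char *fifo = " -j16 --jobserver-auth=fifo:/tmp/GMfifo1234";

  printf ("fifo style:    %d\n", strstr (fifo, "--jobserver-auth=fifo:") != NULL);
  printf ("classic style: %d\n", strstr (classic, "--jobserver-auth=fifo:") != NULL);
  return 0;
}

which prints 1 for the fifo string and 0 for the classic one.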

I tested that locally with TOT GNU make and it works:

$ cat Makefile
all:
	g++ tramp3d-v4.ii -c -flto -O2
	g++ tramp3d-v4.o -flto=jobserver

$ MAKE=/tmp/bin/bin/make /tmp/bin/bin/make -j16 --jobserver-style=fifo
g++ tramp3d-v4.ii -c -flto -O2
g++ tramp3d-v4.o -flto=jobserver
(ltrans run in parallel)

Ready to be installed after tests?
Martin

gcc/ChangeLog:

* gcc.cc (driver::detect_jobserver): Support --jobserver-style=fifo.
* lto-wrapper.cc (jobserver_active_p): Likewise.
---
 gcc/gcc.cc | 15 ---
 gcc/lto-wrapper.cc | 20 +++-
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 5cbb38560b2..c98407fe03d 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -9182,15 +9182,24 @@ driver::detect_jobserver () const
   const char *makeflags = env.get ("MAKEFLAGS");
   if (makeflags != NULL)
 {
-  const char *needle = "--jobserver-auth=";
-  const char *n = strstr (makeflags, needle);
+  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
+e.g. --jobserver-auth=3,4.  */
+  const char *pipe_needle = "--jobserver-auth=";
+
+  /* Starting with GNU make 4.4, one can use --jobserver-style=fifo
+and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
+  const char *fifo_needle = "--jobserver-auth=fifo:";
+  if (strstr (makeflags, fifo_needle) != NULL)
+   return;
+
+  const char *n = strstr (makeflags, pipe_needle);
   if (n != NULL)
{
  int rfd = -1;
  int wfd = -1;
 
 	  bool jobserver

-   = (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
+   = (sscanf (n + strlen (pipe_needle), "%d,%d", &rfd, &wfd) == 2
   && rfd > 0
   && wfd > 0
   && is_valid_fd (rfd)
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 795ab74555c..756350d5ace 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -1342,27 +1342,37 @@ static const char *
 jobserver_active_p (void)
 {
   #define JS_PREFIX "jobserver is not available: "
-  #define JS_NEEDLE "--jobserver-auth="
+
+  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
+ e.g. --jobserver-auth=3,4.  */
+  #define JS_PIPE_NEEDLE "--jobserver-auth="
+
+  /* Starting with GNU make 4.4, one can use --jobserver-style=fifo
+ and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
+  #define JS_FIFO_NEEDLE "--jobserver-auth=fifo:"
 
   const char *makeflags = getenv ("MAKEFLAGS");

   if (makeflags == NULL)
    return JS_PREFIX "%<MAKEFLAGS%> environment variable is unset";
 
-  const char *n = strstr (makeflags, JS_NEEDLE);

+  if (strstr (makeflags, JS_FIFO_NEEDLE) != NULL)
+return NULL;
+
+  const char *n = strstr (makeflags, JS_PIPE_NEEDLE);
   if (n == NULL)
-return JS_PREFIX "%<" JS_NEEDLE "%> is not present in %<MAKEFLAGS%>";
+return JS_PREFIX "%<" JS_PIPE_NEEDLE "%> is not present in %<MAKEFLAGS%>";
 
   int rfd = -1;

   int wfd = -1;
 
-  if (sscanf (n + strlen (JS_NEEDLE), "%d,%d", &rfd, &wfd) == 2

+  if (sscanf (n + strlen (JS_PIPE_NEEDLE), "%d,%d", &rfd, &wfd) == 2
   && rfd > 0
   && wfd > 0
   && is_valid_fd (rfd)
   && is_valid_fd (wfd))
 return NULL;
   else
-return JS_PREFIX "cannot access %<" JS_NEEDLE "%> file descriptors";
+return JS_PREFIX "cannot access %<" JS_PIPE_NEEDLE "%> file descriptors";
 }
 
 /* Print link to -flto documentation with a hint message.  */

--
2.37.1



Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Richard Biener via Gcc-patches
On Thu, Aug 4, 2022 at 6:29 AM liuhongt via Gcc-patches
 wrote:
>
> For neg, the patch create a vec_init as [ a, -a, a, -a, ...  ] and no
> vec_step is needed to update vectorized iv since vf is always multiple
> of 2(negative * negative is positive).
>
> For shift, the patch create a vec_init as [ a, a >> c, a >> 2*c, ..]
> as vec_step as [ c * nunits, c * nunits, c * nunits, ... ], vectorized iv is
> updated as vec_def = vec_init >>/<< vec_step.
>
> For mul, the patch create a vec_init as [ a, a * c, a * pow(c, 2), ..]
> as vec_step as [ pow(c,nunits), pow(c,nunits),...] iv is updated as vec_def =
> vec_init * vec_step.
>
> The patch handles nonlinear iv for
> 1. Integer type only, floating point is not handled.
> 2. No slp_node.
> 3. iv_loop should be same as vector loop, not nested loop.
> 4. No UD is created, for mul, no UD overlow for pow (c, vf), for
>shift, shift count should be less than type precision.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu{-m32,}.
> There's some cases observed in SPEC2017, but no big performance impact.
>
> Any comments?

Looks good overall - a few comments inline.  Also can you please add
SLP support?
I've tried hard to fill in gaps where SLP support is missing since my
goal is still to get
rid of non-SLP.

> gcc/ChangeLog:
>
> PR tree-optimization/103144
> * tree-vect-loop.cc (vect_is_nonlinear_iv_evolution): New function.
> (vect_analyze_scalar_cycles_1): Detect nonlinear iv by upper function.
> (vect_create_nonlinear_iv_init): New function.
> (vect_create_nonlinear_iv_step): Ditto
> (vect_create_nonlinear_iv_vec_step): Ditto
> (vect_update_nonlinear_iv): Ditto
> (vectorizable_nonlinear_induction): Ditto.
> (vectorizable_induction): Call
> vectorizable_nonlinear_induction when induction_type is not
> vect_step_op_add.
> * tree-vectorizer.h (enum vect_induction_op_type): New enum.
> (STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE): New Macro.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103144-mul-1.c: New test.
> * gcc.target/i386/pr103144-mul-2.c: New test.
> * gcc.target/i386/pr103144-neg-1.c: New test.
> * gcc.target/i386/pr103144-neg-2.c: New test.
> * gcc.target/i386/pr103144-shift-1.c: New test.
> * gcc.target/i386/pr103144-shift-2.c: New test.
> ---
>  .../gcc.target/i386/pr103144-mul-1.c  |  25 +
>  .../gcc.target/i386/pr103144-mul-2.c  |  43 ++
>  .../gcc.target/i386/pr103144-neg-1.c  |  25 +
>  .../gcc.target/i386/pr103144-neg-2.c  |  36 ++
>  .../gcc.target/i386/pr103144-shift-1.c|  34 +
>  .../gcc.target/i386/pr103144-shift-2.c|  61 ++
>  gcc/tree-vect-loop.cc | 604 +-
>  gcc/tree-vectorizer.h |  11 +
>  8 files changed, 834 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> new file mode 100644
> index 000..2357541d95d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> +
> +#define N 1
> +void
> +foo_mul (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +{
> +  a[i] = b;
> +  b *= 3;
> +}
> +}
> +
> +void
> +foo_mul_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != N; i++)
> +{
> +  a[i] = b;
> +  b *= 3;
> +}
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> new file mode 100644
> index 000..4ea53e44658
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include 
> +#include "pr103144-mul-1.c"
> +
> +typedef int v8si __attribute__((vector_size(32)));
> +
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 8;
> +  v8si init = __extension__(v8si) { b, 

Re: 回复:[PATCH v5] LoongArch: add movable attribute

2022-08-04 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-08-03 at 11:10 +0800, Xi Ruoyao via Gcc-patches wrote:

> > I'd like to wait for the kernel team to test the performance data of
> > the two implementations before deciding whether to support this
> > attribute.
> > 
> > What do you think?
> 
> Perhaps, I can't access my dev system now anyway (I've configured the
> SSH access, but then a sudden power surge happened and I hadn't
> configured it to power on automatically :( )

Hi folks,

Can someone run a benchmark to see whether a four-instruction immediate
load sequence can outperform GOT access, or vice versa?  I cannot access
my test system for at least a week, and I may be busy preparing the
Linux From Scratch 11.2 release for the remainder of August.

Note: if the four-instruction immediate load sequence outperforms GOT,
we should consider using an immediate load instead of GOT for -fno-PIC
by default.

P.S. It seems I have trouble accessing gcc400.fsffrance.org.  I have a C
Farm account and I've already put

   Host gcc400.fsffrance.org
   Port 25465

in ~/.ssh/config, and I can access other C farm machines w/o problem. 
But:

   $ ssh gcc400.fsffrance.org 
   xry...@gcc400.fsffrance.org: Permission denied 
(publickey,keyboard-interactive).
   
If you know the administrator of the C farm machine, can you tell him to
check the configuration?  If I can access it I may use some time to
perform the bench (in userspace of course) myself.  Thanks.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Backwards threader greedy search TLC

2022-08-04 Thread Aldy Hernandez via Gcc-patches
On Thu, Aug 4, 2022 at 9:15 AM Richard Biener  wrote:
>
> On Thu, 4 Aug 2022, Aldy Hernandez wrote:
>
> > On Wed, Aug 3, 2022 at 11:53 AM Richard Biener  wrote:
> > >
> > > I've tried to understand how the greedy search works seeing the
> > > bitmap dances and the split into resolve_phi.  I've summarized
> > > the intent of the algorithm as
> > >
> > >   // For further greedy searching we want to remove interesting
> > >   // names defined in BB but add ones on the PHI edges for the
> > >   // respective edges.
> > >
> > > but the implementation differs in detail.  In particular when
> > > there is more than one interesting PHI in BB it seems to only consider
> > > the first for translating defs across edges.  It also only applies
> > > the loop crossing restriction when there is an interesting PHI.
> > > I've also noticed that while the set of interesting names is rolled
> > > back, m_imports just keeps growing - is that a bug?
> >
> > I've never quite liked how I was handling PHIs.  I had some
> > improvements in this space, especially the problem with only
> > considering the first def across a PHI, but I ran out of time last
> > cycle, and we were already into stage3.
> >
> > The loop crossing restriction I inherited from the original
> > implementation, though I suppose I restricted it further by only
> > looking at interesting PHIs.  In practice I don't think it mattered,
> > because we cap loop crossing violations in cancel_invalid_paths in the
> > registry.  Does anything break if you keep the restriction across the
> > board?
>
> Probably not - I've also spotted all these "late" invalidates all over
> the place, some of them should be done earlier to limit the exponential
> search.  I plan to go find them and divide them into things that can
> be checked locally per BB (good to do at greedy search time) and those
> that need the full path (we should simply not register such path).

That would be great.  The earlier the better.

One of the reasons for the current code doing them late, is because we
still have the DOM/forward threader, so we need a place to catch
violations from both threaders.  With the impending demise of the
forward threader, I suppose we could move as much as possible earlier.

>
> I'll keep this as is for now.
>
> > Good spotting on the imports growing.  Instinctively that seems like a bug.
> >
> > [As a suggestion, check the total threadable paths as a sanity check
> > if you make any changes in this space.  What I've been doing is
> > counting "Jumps threaded" in the *.stat dump file across the .ii files
> > in a bootstrap, and making sure you're not getting the same # of
> > threads because we missed one in the backward threader which then got
> > picked up by DOM.  I divide up the threads by ethread, thread,
> > threadfull, DOM, etc, to make sure I'm not just shuffling threading
> > opportunities around.]
>
> I'll do this experiment and will roll this fix into the patch if it
> works out.
>
> > But yeah, we shouldn't be growing imports unnecessarily, if nothing
> > else because of the bitmap explosion you're noticing in other places.
> > In practice, I'm not so sure it slows the solver itself down:
> >
> > // They are hints for the solver.  Adding more elements doesn't slow
> > // us down, because we don't solve anything that doesn't appear in the
> > // path.  On the other hand, not having enough imports will limit what
> > // we can solve.
> >
> > But that comment may be old ;-).
>
> I hoped so, yes.
>
> > >
> > > The following preserves the loop crossing restriction to the case
> > > of interesting PHIs but merges resolve_phi back, changing interesting
> > > as outlined with the intent above.  It should get more threading
> > > cases when there are multiple interesting PHI defs in a block.
> > > It might be a bit faster due to less bitmap operations but in the
> > > end the main intent was to make what happens more obvious.
> >
> > Sweet!
> >
> > >
> > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> > >
> > > Aldy - you wrote the existing implementation, is the following OK?
> >
> > I'm tickled pink you're cleaning and improving this.  Looks good.
>
> OK, I'll re-test and push after doing the m_imports experiment.

Thanks.
Aldy



Re: [PATCH] Add condition coverage profiling

2022-08-04 Thread Sebastian Huber

On 02/08/2022 09:58, Jørgen Kvalsvik wrote:

Based on this established terminology I can think of a few good candidates:

condition outcomes covered n/m
outcomes covered n/m

What do you think?


Both are fine, but I would prefer "condition outcomes covered n/m".

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08



[Patch] OpenMP: Fix folding with simd's linear clause [PR106492]

2022-08-04 Thread Tobias Burnus

Rather obvious fix and similar to PR106449.

OK for mainline and backporting (how far?). I would like to backport it
at least to GCC 12.

Tobias
OpenMP: Fix folding with simd's linear clause [PR106492]

gcc/ChangeLog:

	PR middle-end/106492
	* omp-low.cc (lower_rec_input_clauses): Add missing folding
	to data type of linear-clause list item.

gcc/testsuite/ChangeLog:

	PR middle-end/106492
	* g++.dg/gomp/pr106492.C: New test.

 gcc/omp-low.cc   |  6 ++---
 gcc/testsuite/g++.dg/gomp/pr106492.C | 49 
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..3c4b8593c8b 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6241,10 +6241,10 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 			}
 
 		  if (POINTER_TYPE_P (TREE_TYPE (x)))
-			x = fold_build2 (POINTER_PLUS_EXPR,
-	 TREE_TYPE (x), x, t);
+			x = fold_build_pointer_plus (x, t);
 		  else
-			x = fold_build2 (PLUS_EXPR, TREE_TYPE (x), x, t);
+			x = fold_build2 (PLUS_EXPR, TREE_TYPE (x), x,
+	 fold_convert (TREE_TYPE (x), t));
 		}
 
 		  if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_LINEAR
diff --git a/gcc/testsuite/g++.dg/gomp/pr106492.C b/gcc/testsuite/g++.dg/gomp/pr106492.C
new file mode 100644
index 000..f263bb42710
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr106492.C
@@ -0,0 +1,49 @@
+/* PR middle-end/106492 */
+
+template 
+struct S {
+  T a : 12;
+  S () : a(0)
+  {
+#pragma omp for simd linear(a)
+for (int k = 0; k < 64; ++k)
+  a++;
+  }
+};
+struct U {
+  int a : 12;
+  U () : a(0)
+  {
+#pragma omp for simd linear(a)
+for (int k = 0; k < 64; ++k)
+  a++;
+  }
+};
+
+S s;
+U u;
+
+
+template 
+struct Sptr {
+  T a;
+  Sptr (T init) : a(init)
+  {
+#pragma omp for simd linear(a)
+for (int k = 0; k < 64; ++k)
+  a++;
+  }
+};
+struct Uptr {
+  int *a;
+  Uptr (int *init) : a(init)
+  {
+#pragma omp for simd linear(a)
+for (int k = 0; k < 64; ++k)
+  a++;
+  }
+};
+
+int i[1024];
+Sptr sptr(i);
Uptr uptr(&i[100]);


Re: [PATCH] Backwards threader greedy search TLC

2022-08-04 Thread Richard Biener via Gcc-patches
On Thu, 4 Aug 2022, Aldy Hernandez wrote:

> On Wed, Aug 3, 2022 at 11:53 AM Richard Biener  wrote:
> >
> > I've tried to understand how the greedy search works seeing the
> > bitmap dances and the split into resolve_phi.  I've summarized
> > the intent of the algorithm as
> >
> >   // For further greedy searching we want to remove interesting
> >   // names defined in BB but add ones on the PHI edges for the
> >   // respective edges.
> >
> > but the implementation differs in detail.  In particular when
> > there is more than one interesting PHI in BB it seems to only consider
> > the first for translating defs across edges.  It also only applies
> > the loop crossing restriction when there is an interesting PHI.
> > I've also noticed that while the set of interesting names is rolled
> > back, m_imports just keeps growing - is that a bug?
> 
> I've never quite liked how I was handling PHIs.  I had some
> improvements in this space, especially the problem with only
> considering the first def across a PHI, but I ran out of time last
> cycle, and we were already into stage3.
> 
> The loop crossing restriction I inherited from the original
> implementation, though I suppose I restricted it further by only
> looking at interesting PHIs.  In practice I don't think it mattered,
> because we cap loop crossing violations in cancel_invalid_paths in the
> registry.  Does anything break if you keep the restriction across the
> board?

Probably not - I've also spotted all these "late" invalidates all over
the place, some of them should be done earlier to limit the exponential
search.  I plan to go find them and divide them into things that can
be checked locally per BB (good to do at greedy search time) and those
that need the full path (we should simply not register such path).

I'll keep this as is for now.

> Good spotting on the imports growing.  Instinctively that seems like a bug.
> 
> [As a suggestion, check the total threadable paths as a sanity check
> if you make any changes in this space.  What I've been doing is
> counting "Jumps threaded" in the *.stat dump file across the .ii files
> in a bootstrap, and making sure you're not getting the same # of
> threads because we missed one in the backward threader which then got
> picked up by DOM.  I divide up the threads by ethread, thread,
> threadfull, DOM, etc, to make sure I'm not just shuffling threading
> opportunities around.]

I'll do this experiment and will roll this fix into the patch if it
works out.

> But yeah, we shouldn't be growing imports unnecessarily, if nothing
> else because of the bitmap explosion you're noticing in other places.
> In practice, I'm not so sure it slows the solver itself down:
> 
> // They are hints for the solver.  Adding more elements doesn't slow
> // us down, because we don't solve anything that doesn't appear in the
> // path.  On the other hand, not having enough imports will limit what
> // we can solve.
> 
> But that comment may be old ;-).

I hoped so, yes.

> >
> > The following preserves the loop crossing restriction to the case
> > of interesting PHIs but merges resolve_phi back, changing interesting
> > as outlined with the intent above.  It should get more threading
> > cases when there are multiple interesting PHI defs in a block.
> > It might be a bit faster due to less bitmap operations but in the
> > end the main intent was to make what happens more obvious.
> 
> Sweet!
> 
> >
> > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> >
> > Aldy - you wrote the existing implementation, is the following OK?
> 
> I'm tickled pink you're cleaning and improving this.  Looks good.

OK, I'll re-test and push after doing the m_imports experiment.

Thanks,
Richard.


Re: [PATCH] match.pd: Add bitwise and pattern [PR106243]

2022-08-04 Thread Richard Biener via Gcc-patches
On Thu, Aug 4, 2022 at 12:40 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/3/2022 2:44 PM, Prathamesh Kulkarni via Gcc-patches wrote:
> > On Thu, 4 Aug 2022 at 00:41, Sam Feifer via Gcc-patches
> >  wrote:
> >> This patch adds a new optimization to match.pd. The pattern, -x & 1,
> >> now gets simplified to x & 1, reducing the number of instructions
> >> produced.
> > Hi Sam,
> > No comments on patch, but wondering if we can similarly add another pattern 
> > to
> > simplify abs(x) & 1 -> x & 1 ?
> > Currently we don't appear to do it on GIMPLE:
> >
> > __attribute__((noipa))
> > int f1 (int x)
> > {
> >return __builtin_abs (x) & 1;
> > }
> >
> > .optimized dump shows:
> >_1 = ABS_EXPR ;
> >_3 = _1 & 1;
> >return _3;
> >
> > altho combine simplifies it to x & 1 on RTL, resulting in code-gen:
> > f1:
> >  and w0, w0, 1
> >  ret
> Doesn't the abs(x) & mask simplify to x & mask for any mask where the
> sign bit of x is off -- including cases where mask isn't necessarily a
> compile-time constant, but we have range data which allows us to know
> that x's sign bit is off in mask.

You can use tree_expr_nonnegative_p, but then that already simplifies
abs(x) to x.  But sure, handling abs() like negate sounds like it
should work.
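Something along these lines in match.pd, perhaps (completely untested,
just a sketch alongside the negate rule from this patch):

/* abs(x) & 1 -> x & 1, since negation does not change the low bit.  */
(simplify
 (bit_and (abs @0) integer_onep@1)
 (bit_and @0 @1))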

Richard,

> jeff
>
>
>


Re: [PATCH] Backwards threader greedy search TLC

2022-08-04 Thread Aldy Hernandez via Gcc-patches
On Wed, Aug 3, 2022 at 11:53 AM Richard Biener  wrote:
>
> I've tried to understand how the greedy search works seeing the
> bitmap dances and the split into resolve_phi.  I've summarized
> the intent of the algorithm as
>
>   // For further greedy searching we want to remove interesting
>   // names defined in BB but add ones on the PHI edges for the
>   // respective edges.
>
> but the implementation differs in detail.  In particular when
> there is more than one interesting PHI in BB it seems to only consider
> the first for translating defs across edges.  It also only applies
> the loop crossing restriction when there is an interesting PHI.
> I've also noticed that while the set of interesting names is rolled
> back, m_imports just keeps growing - is that a bug?

I've never quite liked how I was handling PHIs.  I had some
improvements in this space, especially the problem with only
considering the first def across a PHI, but I ran out of time last
cycle, and we were already into stage3.

The loop crossing restriction I inherited from the original
implementation, though I suppose I restricted it further by only
looking at interesting PHIs.  In practice I don't think it mattered,
because we cap loop crossing violations in cancel_invalid_paths in the
registry.  Does anything break if you keep the restriction across the
board?

Good spotting on the imports growing.  Instinctively that seems like a bug.

[As a suggestion, check the total threadable paths as a sanity check
if you make any changes in this space.  What I've been doing is
counting "Jumps threaded" in the *.stat dump file across the .ii files
in a bootstrap, and making sure you're not getting the same # of
threads because we missed one in the backward threader which then got
picked up by DOM.  I divide up the threads by ethread, thread,
threadfull, DOM, etc, to make sure I'm not just shuffling threading
opportunities around.]

But yeah, we shouldn't be growing imports unnecessarily, if nothing
else because of the bitmap explosion you're noticing in other places.
In practice, I'm not so sure it slows the solver itself down:

// They are hints for the solver.  Adding more elements doesn't slow
// us down, because we don't solve anything that doesn't appear in the
// path.  On the other hand, not having enough imports will limit what
// we can solve.

But that comment may be old ;-).

>
> The following preserves the loop crossing restriction to the case
> of interesting PHIs but merges resolve_phi back, changing interesting
> as outlined with the intent above.  It should get more threading
> cases when there are multiple interesting PHI defs in a block.
> It might be a bit faster due to less bitmap operations but in the
> end the main intent was to make what happens more obvious.

Sweet!

>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
>
> Aldy - you wrote the existing implementation, is the following OK?

I'm tickled pink you're cleaning and improving this.  Looks good.

Thanks.
Aldy



Re: [PATCH] match.pd: Add bitwise and pattern [PR106243]

2022-08-04 Thread Richard Biener via Gcc-patches
On Wed, Aug 3, 2022 at 9:11 PM Sam Feifer via Gcc-patches
 wrote:
>
> This patch adds a new optimization to match.pd. The pattern, -x & 1,
> now gets simplified to x & 1, reducing the number of instructions
> produced.
>
> This patch also adds tests for the optimization rule.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> PR tree-optimization/106243
>
> gcc/ChangeLog:
>
> * match.pd (-x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr106243-1.c: New test.
> * gcc.dg/pr106243.c: New test.
> ---
>  gcc/match.pd  |  5 
>  gcc/testsuite/gcc.dg/pr106243-1.c | 18 +
>  gcc/testsuite/gcc.dg/pr106243.c   | 43 +++
>  3 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr106243-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr106243.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 562138a8034..78b32567836 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8061,3 +8061,8 @@ and,
>(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
>  (bit_and @0 @1)
>(cond (le @0 @1) @0 (bit_and @0 @1))
> +
> +/* -x & 1 -> x & 1.  */
> +(simplify
> +  (bit_and:c (negate @0) integer_onep@1)

Note the bit_and doesn't need :c because constant operands are always
canonicalized second.
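I.e. with the requested change (nothing else modified) the pattern
would simply read:

/* -x & 1 -> x & 1.  */
(simplify
  (bit_and (negate @0) integer_onep@1)
  (bit_and @0 @1))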

OK with that change.
Thanks,
Richard.

> +  (bit_and @0 @1))
> diff --git a/gcc/testsuite/gcc.dg/pr106243-1.c 
> b/gcc/testsuite/gcc.dg/pr106243-1.c
> new file mode 100644
> index 000..b1dbe5cbe44
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106243-1.c
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/106243 */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "pr106243.c"
> +
> +int main () {
> +
> +if (foo(3) != 1
> +|| bar(-6) != 0
> +|| baz(17) != 1
> +|| qux(-128) != 0
> +|| foo(127) != 1) {
> +__builtin_abort();
> +}
> +
> +return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr106243.c b/gcc/testsuite/gcc.dg/pr106243.c
> new file mode 100644
> index 000..ee2706f2bf9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106243.c
> @@ -0,0 +1,43 @@
> +/* PR tree-optimization/106243 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#define vector __attribute__((vector_size(4*sizeof(int))))
> +
> +/* Test from PR.  */
> +__attribute__((noipa)) int foo (int x) {
> +return -x & 1;
> +}
> +
> +/* Other test from PR.  */
> +__attribute__((noipa)) int bar (int x) {
> +return (0 - x) & 1;
> +}
> +
> +/* Forward propagation.  */
> +__attribute__((noipa)) int baz (int x) {
> +x = -x;
> +return x & 1;
> +}
> +
> +/* Commutative property.  */
> +__attribute__((noipa)) int qux (int x) {
> +return 1 & -x;
> +}
> +
> +/* Vector test case.  */
> +__attribute__((noipa)) vector int waldo (vector int x) {
> +return -x & 1;
> +}
> +
> +/* Should not simplify.  */
> +__attribute__((noipa)) int thud (int x) {
> +return -x & 2;
> +}
> +
> +/* Should not simplify.  */
> +__attribute__((noipa)) int corge (int x) {
> +return -x & -1;
> +}
> +
> +/* { dg-final {scan-tree-dump-times "-" 2 "optimized" } } */
>
> base-commit: 388fbbd895e72669909173c3003ae65c6483a3c2
> --
> 2.31.1
>


RE: [PATCH 1/2]middle-end: Simplify subtract where both arguments are being bitwise inverted.

2022-08-04 Thread Richard Biener via Gcc-patches
On Wed, 3 Aug 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, June 21, 2022 8:43 AM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; Richard Biener via Gcc-
> > patches ; Richard Guenther
> > ; nd 
> > Subject: Re: [PATCH 1/2]middle-end: Simplify subtract where both
> > arguments are being bitwise inverted.
> > 
> > On Mon, Jun 20, 2022 at 10:49 AM Tamar Christina
> >  wrote:
> > >
> > > > -Original Message-
> > > > From: Richard Sandiford 
> > > > Sent: Monday, June 20, 2022 9:19 AM
> > > > To: Richard Biener via Gcc-patches 
> > > > Cc: Tamar Christina ; Richard Biener
> > > > ; Richard Guenther
> > ;
> > > > nd 
> > > > Subject: Re: [PATCH 1/2]middle-end: Simplify subtract where both
> > > > arguments are being bitwise inverted.
> > > >
> > > > Richard Biener via Gcc-patches  writes:
> > > > > On Thu, Jun 16, 2022 at 1:10 PM Tamar Christina via Gcc-patches
> > > > >  wrote:
> > > > >>
> > > > >> Hi All,
> > > > >>
> > > > >> This adds a match.pd rule that drops the bitwwise nots when both
> > > > >> arguments to a subtract is inverted. i.e. for:
> > > > >>
> > > > >> float g(float a, float b)
> > > > >> {
> > > > >>   return ~(int)a - ~(int)b;
> > > > >> }
> > > > >>
> > > > >> we instead generate
> > > > >>
> > > > >> float g(float a, float b)
> > > > >> {
> > > > >>   return (int)a - (int)b;
> > > > >> }
> > > > >>
> > > > >> We already do a limited version of this from the fold_binary fold
> > > > >> functions but this makes a more general version in match.pd that
> > > > >> applies
> > > > more often.
> > > > >>
> > > > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >>
> > > > >> Ok for master?
> > > > >>
> > > > >> Thanks,
> > > > >> Tamar
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >> * match.pd: New bit_not rule.
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>
> > > > >> * gcc.dg/subnot.c: New test.
> > > > >>
> > > > >> --- inline copy of patch --
> > > > >> diff --git a/gcc/match.pd b/gcc/match.pd index
> > > > >>
> > > >
> > a59b6778f661cf9121dd3503f43472871e4da445..51b0a1b562409af535e53828a1
> > > > 0
> > > > >> c30b8a3e1ae2e 100644
> > > > >> --- a/gcc/match.pd
> > > > >> +++ b/gcc/match.pd
> > > > >> @@ -1258,6 +1258,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN
> > (RINT)
> > > > >> (simplify
> > > > >>   (bit_not (plus:c (bit_not @0) @1))
> > > > >>   (minus @0 @1))
> > > > >> +/* (~X - ~Y) -> X - Y.  */
> > > > >> +(simplify
> > > > >> + (minus (bit_not @0) (bit_not @1))  (minus @0 @1))
> > > > >
> > > > > It doesn't seem correct.
> > > > >
> > > > > (gdb) p/x ~-1 - ~0x80000000
> > > > > $3 = 0x80000001
> > > > > (gdb) p/x -1 - 0x80000000
> > > > > $4 = 0x7fffffff
> > > > >
> > > > > where I was looking for a case exposing undefined integer overflow.
> > > >
> > > > Yeah, shouldn't it be folding to (minus @1 @0) instead?
> > > >
> > > >   ~X = (-X - 1)
> > > >   ~Y = (-Y - 1)
> > > >
> > > > so:
> > > >
> > > >   ~X - ~Y = (-X - 1) - (-Y - 1)
> > > >   = -X - 1 + Y + 1
> > > >   = Y - X
> > > >
> > >
> > > You're right, sorry, I should have paid more attention when I wrote the
> > patch.
> > 
> > You still need to watch out for undefined overflow cases in the result that
> > were well-defined in the original expression I think.
> 
> The only special thing we do for signed numbers is to do the subtract as 
> unsigned.  As I mentioned
> before, GCC already does this transformation as part of the fold machinery, 
> but that only happens
> when a very simple tree is matched and only when single use. i.e. 
> https://godbolt.org/z/EWsdhfrKj
> 
> I'm only attempting to make it apply more generally as the result is always 
> beneficial.
> 
> I've respun the patch to the same as we already do.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: New bit_not rule.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/subnot.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 330c1db0c8e12b0fb010b1958729444672403866..00b3e07b2a5216b19ed58500923680d83c67d8cf
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1308,6 +1308,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (bit_not (plus:c (bit_not @0) @1))
>   (minus @0 @1))
> +/* (~X - ~Y) -> Y - X.  */
> +(simplify
> + (minus (bit_not @0) (bit_not @1))
> +  (with { tree utype = unsigned_type_for (type); }
> +   (convert (minus (convert:utype @1) (convert:utype @0)))))
>  
>  /* ~(X - Y) -> ~X + Y.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/subnot.c b/gcc/testsuite/gcc.dg/subnot.c
> new file mode 100644
> index 
> ..d621bacd27bd3d19a010e4c9f831aa77d28bd02d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/subnot.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O