date:20220803

[Bug c++/106502] Three calls to attribute((const)) function

2022-08-03 Thread egallager at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106502

Eric Gallager  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487
 CC||egallager at gcc dot gnu.org

--- Comment #6 from Eric Gallager  ---
(In reply to Jonathan Wakely from comment #5)
> I noticed this by adding a printf statement to the const function for
> temporary debugging purposes, which is obviously incorrect

Seems related to bug 18487 IMO.
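
For context, a minimal sketch of what the attribute promises, assuming GCC or
Clang.  The names `square' and `sum_of_three' are hypothetical and do not come
from the bug report.  A function declared with __attribute__((const)) must
depend only on its arguments and have no observable side effects, so the
compiler may merge repeated calls; a printf inside such a function breaks that
contract, and how often the message appears is then not something to rely on.

```cpp
// Hypothetical example; `square` does not appear in the bug report.
// A const function may depend only on its arguments and must not have
// side effects (no I/O, no reads of mutable global state).
__attribute__((const)) int square(int x) {
    return x * x;
}

// Three source-level calls with the same argument.  The compiler is allowed,
// but not required, to evaluate square(v) once and reuse the result.
int sum_of_three(int v) {
    return square(v) + square(v) + square(v);
}
```

Whether one, two or three calls survive optimization is a code-quality
question, not a correctness one, as long as the function really is const.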

[Bug bootstrap/43301] top-level configure script ignores --with-build-time-tools

2022-08-03 Thread egallager at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301

Eric Gallager  changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu.org

--- Comment #7 from Eric Gallager  ---
(In reply to Iain Sandoe from comment #6)
> JFTR, I had cause to use this today on powerpc-darwin9, and it seemed to
> DTRT - so it would be useful to establish what it was that did not work
> before, that was fixed by the patch.
> 
> /src-local/gcc-git-11/configure
> --prefix=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d
> --build=powerpc-apple-darwin9 --enable-languages=all --with-tune-cpu=G5
> --enable-libphobos --with-libphobos-druntime-only
> CC=powerpc-apple-darwin-gcc CXX=powerpc-apple-darwin-g++
> --with-build-time-tools=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d/bin
> 
> Without the
> "--with-build-time-tools=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d/bin"
> the system linker and assembler are found and used (which fails to work with
> D, causing a bootstrap fail) with the option, the relevant tools are found
> and bootstrap succeeded
> 
> (so I am not sure what the original problem was
> since $build is not specified in the summary, I guess we must assume it was
> i686-pc-cygwin so perhaps the problem is specific to that setup?)

Alexandre Oliva's assessment is that the issue was just an old build being
left over, and that all the patch did was force a rebuild:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599169.html
(so we might be going with his patch instead)

[Bug middle-end/106519] [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950

2022-08-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-08-03
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Tamar Christina  ---
The condition checks that the two BBs share the same successor but forgot to
check that both BBs have only one successor.

It looks like with -m32 (and powerpc) the order of the edges just happens to
match and the assert triggers.

Testing a patch overnight and will post tomorrow.
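
A rough illustration of the kind of check described above, using a toy CFG.
The `Block' type and both function names are hypothetical; GCC's real
basic_block and edge structures look different, and this is only a guess at
the shape of the problem, not the actual code or the actual patch.

```cpp
#include <vector>

// Hypothetical toy CFG node; not GCC's basic_block.
struct Block {
    std::vector<const Block*> succs;  // outgoing edges, in edge order
};

// Incomplete check: compares only the first successor of each block, so the
// answer depends on the order in which the edges happen to be stored.
bool share_first_successor(const Block& a, const Block& b) {
    return !a.succs.empty() && !b.succs.empty() && a.succs[0] == b.succs[0];
}

// Stricter check: each block must have exactly one successor, and it must be
// the same block for both.
bool share_single_successor(const Block& a, const Block& b) {
    return a.succs.size() == 1 && b.succs.size() == 1 &&
           a.succs[0] == b.succs[0];
}
```

With a block that has two successors, the first check can still succeed when
the matching edge happens to come first, while the second one rejects it.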

[Bug tree-optimization/106521] New: ICE at -O1 with "-floop-unroll-and-jam --param unroll-jam-min-percent=0": verify_ssa failed

2022-08-03 Thread zhendong.su at inf dot ethz.ch via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106521

Bug ID: 106521
   Summary: ICE at -O1 with "-floop-unroll-and-jam --param
unroll-jam-min-percent=0": verify_ssa failed
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

It appears to be a recent regression (and possibly related to PR106249).

Compiler Explorer: https://godbolt.org/z/Tanf9axav


[545] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk
--enable-sanitizers --enable-languages=c,c++ --disable-werror --enable-multilib
--with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20220803 (experimental) [master r13-1950-g9bb19e143cf] (GCC) 
[546] % 
[546] % gcctk -O1 -floop-unroll-and-jam --param unroll-jam-min-percent=0 small.c
small.c: In function ‘main’:
small.c:4:5: error: definition in block 30 does not dominate use in block 33
4 | int main() {
  | ^~~~
for SSA_NAME: b_lsm.15_82 in statement:
b_lsm.15_23 = PHI 
PHI argument
b_lsm.15_82
for PHI node
b_lsm.15_23 = PHI 
during GIMPLE pass: unrolljam
small.c:4:5: internal compiler error: verify_ssa failed
0x11356ef verify_ssa(bool, bool)
../../gcc-trunk/gcc/tree-ssa.cc:1211
0x105fb5b rewrite_into_loop_closed_ssa_1
../../gcc-trunk/gcc/tree-ssa-loop-manip.cc:576
0x105fb5b rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
../../gcc-trunk/gcc/tree-ssa-loop-manip.cc:626
0x1c94c2d tree_loop_unroll_and_jam
../../gcc-trunk/gcc/gimple-loop-jam.cc:612
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[547] % 
[547] % cat small.c
short a, b, e;
volatile long c;
long d;
int main() {
  for (; d; d++) {
    long g = a = 1;
    for (; a; a++) {
      g++;
      c;
    }
    g && (b = e);
  }
  return 0;
}

[Bug c++/106520] New: 2+ index expressions in build_op_subscript are incorrectly interpreted as comma expression

2022-08-03 Thread mkretz at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106520

Bug ID: 106520
   Summary: 2+ index expressions in build_op_subscript are
incorrectly interpreted as comma expression
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---

Commit b38c9cf6d570f6c4c1109e00c8b81d82d0f24df3 implemented Multidimensional
subscript operator [PR102611]. However, the backwards compatibility leads to
surprising results. E.g.:

struct A
{
  void operator[](unsigned);
  void operator[](unsigned, unsigned);
};

struct B
{
  explicit operator unsigned() const;
};

void f(A a, B b)
{
  a[1];
  a[b, 2];
}

Compiles to two calls to A::operator[](unsigned) with the following
diagnostics:

<source>: In function 'void f(A, B)':
<source>:15:4: warning: top-level comma expression in array subscript changed
meaning in C++23 [-Wcomma-subscript]
   15 |   a[b, 2];
      |    ^

[https://godbolt.org/z/f6vf3x5Gv]

The user probably intended to call the two-index subscript overload, but
there's no indication why that call failed. The warning is likely to puzzle
most users, and it is not obvious that the "wrong" function gets called.

I'm not sure the compatibility issue is worth it. I think it would be better to
call build_op_subscript with unmodified complain and let code that turns on
-std=c++23 break if it relies on comma expressions in subscripts.

[Bug middle-end/106519] [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950

2022-08-03 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||seurer at gcc dot gnu.org

--- Comment #1 from seurer at gcc dot gnu.org ---
Seeing this on powerpc64 as well for 

FAIL: gcc.dg/torture/pr61346.c   -O1  (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: gcc.dg/torture/pr61346.c   -O1  (test for excess errors)
FAIL: gfortran.dg/make_unit.f90   -O1  (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: gfortran.dg/make_unit.f90   -O1  (test for excess errors)

[Bug middle-end/106519] New: [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950

2022-08-03 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519

Bug ID: 106519
   Summary: [13 Regression] internal compiler error: in
gimple_phi_arg, at gimple.h:4594 by r13-1950
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: tamar.christina at arm dot com
  Target Milestone: ---

On x86-64, r13-1950 caused

FAIL: gcc.dg/analyzer/pr96653.c (internal compiler error: in gimple_phi_arg, at
gimple.h:4594)
FAIL: gcc.dg/analyzer/pr96653.c (test for excess errors)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++14 (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++17 (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++20 (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: g++.dg/warn/uninit-pr105562.C  -std=gnu++20 (test for excess errors)
FAIL: gfortran.dg/make_unit.f90   -O1  (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: gfortran.dg/make_unit.f90   -O1  (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: gfortran.dg/make_unit.f90   -O1  (internal compiler error: in
gimple_phi_arg, at gimple.h:4594)
FAIL: gfortran.dg/make_unit.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/make_unit.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/make_unit.f90   -O1  (test for excess errors)

with -m32.

[Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload)

2022-08-03 Thread roger at nextmovesoftware dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518

Bug ID: 106518
   Summary: Exchange/swap aware register allocation (generate xchg
in reload)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: roger at nextmovesoftware dot com
  Target Milestone: ---

This enhancement request is a proposal for improving/tweaking GCC's register
allocation, by assuming/making use of a register exchange/swap operation as a
useful abstraction.  Currently reload/lra is (solely) "move"-based, so when the
contents of regA need to be placed in regB and the original contents of regB
need to be placed in regA, they make use of a temporary register (or a spill)
and generate the classic sequence: tmp=regA; regA=regB; regB=tmp.

A small improvement is to tweak register allocation to assume, as a higher
level abstraction, the existence of an exchange/swap instruction, like x86's
xchg, much as is assumed/used during the reg-stack pass (with i387's fxch).
[https://gcc.gnu.org/legacy-ml/gcc-patches/2004-12/msg00815.html]

During early register allocation, we introduce virtual exchange operations
that can be lowered in a later pass, either to real exchange operations on
targets that support them, or to the standard three-move shuffle sequence
above, if there's a spare suitable temporary register, or alternatively to the
sequence regA^=regB; regB^=regA; regA^=regB, which implements an exchange using
three fast instructions without requiring an additional register.  These three
alternatives guarantee that register allocation is no worse than at present,
but has the flexibility to use fewer registers and perhaps fewer instructions.
On modern hardware, xchg is sometimes zero latency (using register renaming),
and on older architectures, a three-xor sequence has the same latency as three
moves, but requires one less register, helpfully reducing register pressure.
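
The two software fallbacks named above can be sketched at the source level as
follows.  The function names are hypothetical, and this only illustrates the
sequences themselves, not how reload/LRA would emit them on RTL.

```cpp
#include <cstdint>

// Classic three-move shuffle: tmp=regA; regA=regB; regB=tmp.
// Needs one temporary.
void swap_with_temp(std::uint32_t& a, std::uint32_t& b) {
    std::uint32_t tmp = a;
    a = b;
    b = tmp;
}

// Three-XOR exchange: regA^=regB; regB^=regA; regA^=regB.
// Needs no temporary, but a and b must be distinct objects; XORing a value
// with itself would zero it, hence the guard.
void swap_with_xor(std::uint32_t& a, std::uint32_t& b) {
    if (&a == &b) return;
    a ^= b;
    b ^= a;
    a ^= b;
}
```

For distinct registers the aliasing guard is unnecessary; it is only there
because references in C++ can refer to the same object.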

An example application/benefit of this is PR rtl-optimization/97756, which
demonstrates that the x86_64 ABI frequently places (TImode double word)
registers in locations that then need the high and low parts to be swapped
(or moved) to place them in the (reg X) and (reg X+1) locations required by
GCC's multi-word register allocation requirements.

Interestingly, GCC's middle-end doesn't have a standard named pattern for an
exchange/swap instruction, i.e. an optab, so currently it has no (easy) way of
deciding whether a target has an xchg-like instruction, which helps explain why
it doesn't currently use/generate them.

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

--- Comment #3 from Segher Boessenkool  ---
Your second option isn't correct: all these nops should be consecutive.  Your
option 1 is fine :-)

[Bug middle-end/25521] change semantics of const volatile variables

2022-08-03 Thread jose.marchesi at oracle dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521

--- Comment #9 from Jose E. Marchesi  ---
So I got feedback from the clang/llvm folks on this.

As you can see in [1] they asked the WG14 reflectors about footnote 135 in
the C18 spec and their conclusion is that there is no normative objection to
placing `const volatile' variables in read-only sections, much like non-volatile
consts.

This matches my earlier impression (before I got pointed to that footnote) and
since there is at least one target being impacted by this GCC/LLVM discrepancy
(bpf-unknown-none) I intend to prepare a patch to change the place where GCC
places the `const volatiles'.

[1] https://github.com/llvm/llvm-project/issues/56468

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #28 from Segher Boessenkool  ---
(In reply to rsand...@gcc.gnu.org from comment #25)
> - On big-endian targets, vector loads and stores are assumed to put the
>   first memory element at the most significant end of the vector register.

I agree with everything here, except calling this "most significant".  That
just makes no sense for vectors.  It is element 0, but that is not more
significant than any other element :-)  Vectors aren't integers.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #27 from Segher Boessenkool  ---
IMO what vec_select calls element 0 is always in the first argument of the
vec_concat it works on, in BE as well as LE.  But yes this is quite underdefined
in our documentation, and who knows what is actually implemented, in targets as
well as in generic code :-(

[Bug target/106517] New: RISC-V: Inefficient Generated Code for Floating Point to Integer Rounds

2022-08-03 Thread palmer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106517

Bug ID: 106517
   Summary: RISC-V: Inefficient Generated Code for Floating Point
to Integer Rounds
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

RISC-V has a handful of floating-point conversion instructions that we don't
appear to be taking advantage of.  For example

long f(double in)
{
return __builtin_floor(in);
}

generates a call to the floor() library routine, while I believe we can
implement it via just a "fcvt.l.d a0, fa0, rdn" (RISC-V clang and arm64 GCC).
There are a bunch of similar patterns; the aarch64 test suite seems to have
pretty good coverage of them.
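
A sketch of what such similar patterns might look like at the source level.
The helper names are hypothetical and only the floor case appears in this
report; which instruction, if any, each conversion can map to on RISC-V is not
stated here and would need checking against the ISA manual and the MD
patterns.

```cpp
// Hypothetical helpers; only the floor variant is taken from the report.
// Each rounds a double in a fixed direction and converts the result to long.
long floor_to_long(double in) { return __builtin_floor(in); }  // toward -inf
long ceil_to_long(double in)  { return __builtin_ceil(in); }   // toward +inf
long trunc_to_long(double in) { return __builtin_trunc(in); }  // toward zero
```

Compiling these with -O2 -S and looking for a call to floor(), ceil() or
trunc() in the output is one way to see whether the conversion was done
inline.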

We should port those tests over to RISC-V, figure out which conversions we can
implement directly, and then fix whatever's broken.  I started poking around a
bit and found that even some of the conversions where we have MD file patterns
aren't behaving as expected, so there might be some deeper issue going on.

This has come up in a handful of forums lately and while we're still hoping to
find some time to look into it, I figured it'd be best to open at least a basic
bug so we have one place to track the issues.

[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++

2022-08-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090

Richard Earnshaw  changed:

   What|Removed |Added

   Target Milestone|--- |13.0

[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++

2022-08-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090

Richard Earnshaw  changed:

   What|Removed |Added

 CC||andij.cr at gmail dot com

--- Comment #7 from Richard Earnshaw  ---
*** Bug 91674 has been marked as a duplicate of this bug. ***

[Bug target/91674] [ARM/thumb] redundant memcpy does not get optimized away on thumb

2022-08-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91674

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Richard Earnshaw  ---
This is essentially a dup of PR105090, which is now fixed on master.  The code
generation in both Arm and Thumb2 state is essentially the same early on, but
in Thumb we were unable to optimize away all the byte manipulations.
The unused stack slot was needed at the time of early expansion to RTL, and once
created there's no mechanism for getting rid of it if it is no longer needed.

*** This bug has been marked as a duplicate of bug 105090 ***

[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jose E. Marchesi:

https://gcc.gnu.org/g:f0688c82ba8206a3d8960eb1d4821dc6a5f2a9f4

commit r13-1951-gf0688c82ba8206a3d8960eb1d4821dc6a5f2a9f4
Author: Jose E. Marchesi 
Date:   Wed Aug 3 18:50:05 2022 +0200

testsuite: btf: fix regexps in btf-int-1.c

The regexps in the test btf-int-1.c were not working properly with the
commenting style of at least one target: powerpc64le-linux-gnu.  This
patch changes the test to use better regexps.

Tested in bpf-unknown-none, x86_64-linux-gnu and powerpc64le-linux-gnu.
Pushed to master as obvious.

gcc/testsuite/ChangeLog:

PR testsuite/106515
* gcc.dg/debug/btf/btf-int-1.c: Fix regexps in
scan-assembler-times.

[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query

2022-08-03 Thread amacleod at redhat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

--- Comment #1 from Andrew Macleod  ---
(In reply to Richard Biener from comment #0)

> 
> Part of the sub-optimality is probably the equiv chain becoming very long
> (can we simply limit that?) and clearing bits in all the very many
> bitmaps linked.  N

We certainly could, especially in the path version, which has that killing-def
issue that doesn't exist in the normal oracle.  The path oracle basically
takes a normal oracle, then bolts the path-following code onto it, and has to
deal with new definitions invalidating any existing equivalences.  I'd first
look for inefficiencies elsewhere, as we didn't spend a lot of time tweaking it
once it was working.

Given the way equivalences have to be matched, I'm not sure that we even need to
walk the list.  The new equivalence set for the killed def will only contain
itself, or any new equivalences encountered since the kill.  In order to be
equivalent, two names must be in each other's set, which they won't be.  I'm not
convinced we need to remove them at all.

I'm also not sure why the path oracle changes the root oracle requirement that
they be the same equivalence set, not just in each other's.  I think it has
something to do with the transitory nature of the path equivalence/relations
vs the root oracle's "permanent" sets.  I think we can do better here too.

And finally, Aldy has a list of all the ssa-names in the path that are relevant
to the calculations in the path.  I suspect we can reduce any equivalence sets
immediately to just those names, as any on-entry ranges should reflect existing
equivalences.  In theory :-)

We'll see if any or all of those have any effect and get back to you.

[Bug tree-optimization/104992] [missed optimization] x / y * y == x not optimized to x % y == 0

2022-08-03 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104992

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||seurer at gcc dot gnu.org

--- Comment #3 from seurer at gcc dot gnu.org ---
The fix for this causes an issue on power 10.  See PR106516
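
For readers without the original report at hand, a minimal sketch of the
equivalence named in the bug title.  The function names are hypothetical; both
predicates assume y != 0 and exclude the overflowing INT_MIN / -1 case.

```cpp
// Both functions ask "is x a multiple of y?".
// Precondition (assumed, not checked): y != 0, and not (x == INT_MIN && y == -1).
bool divisible_via_div_mul(int x, int y) { return x / y * y == x; }
bool divisible_via_mod(int x, int y)     { return x % y == 0; }
```

Because C and C++ integer division truncates toward zero and
x == (x / y) * y + x % y holds, the two forms agree for negative operands as
well.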

[Bug testsuite/106516] New test case gcc.dg/pr104992.c fails on power 10

2022-08-03 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106516

seurer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org
Summary|New test case   |New test case
   |gcc.dg/pr104992.c fails |gcc.dg/pr104992.c fails on
   ||power 10

--- Comment #1 from seurer at gcc dot gnu.org ---
I should have said it ONLY fails on power 10.  Works fine on power 9 and
earlier.

I can't find a valid email address for Sam Feifer to use for bugzilla.

[Bug testsuite/106516] New: New test case gcc.dg/pr104992.c fails

2022-08-03 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106516

Bug ID: 106516
   Summary: New test case gcc.dg/pr104992.c fails
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:388fbbd895e72669909173c3003ae65c6483a3c2, r13-1916-g388fbbd895e726

I only saw this on a power 10 machine.

make  -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/pr104992.c"
FAIL: gcc.dg/pr104992.c scan-tree-dump-times optimized " % " 9
# of expected passes            1
# of unexpected failures        1


commit 388fbbd895e72669909173c3003ae65c6483a3c2 (HEAD, refs/bisect/bad)
Author: Sam Feifer 
Date:   Fri Jul 29 09:44:48 2022 -0400

match.pd: Add new division pattern [PR104992]


Executing on host: /home/seurer/gcc/git/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/git/build/gcc-test/gcc/
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/pr104992.c   
-fdiagnostics-plain-output   -O2 -fdump-tree-optimized -S -o pr104992.s   
(timeout = 300)
spawn -ignore SIGHUP /home/seurer/gcc/git/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/git/build/gcc-test/gcc/
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/pr104992.c
-fdiagnostics-plain-output -O2 -fdump-tree-optimized -S -o pr104992.s
PASS: gcc.dg/pr104992.c (test for excess errors)
gcc.dg/pr104992.c: pattern found 6 times
FAIL: gcc.dg/pr104992.c scan-tree-dump-times optimized " % " 9

[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1

2022-08-03 Thread jose.marchesi at oracle dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515

--- Comment #3 from Jose E. Marchesi  ---
This is due to insufficiently robust regular expressions in the test
btf-int-1.c and to the slightly different way the powerpc backend comments
lines in assembly.

Working on a fix.

[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1

2022-08-03 Thread jose.marchesi at oracle dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515

--- Comment #2 from Jose E. Marchesi  ---
Don't bother, I just reproduced the issue in powerpc64le-linux-gnu.

[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1

2022-08-03 Thread jose.marchesi at oracle dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515

Jose E. Marchesi  changed:

   What|Removed |Added

 CC||jose.marchesi at oracle dot com

--- Comment #1 from Jose E. Marchesi  ---
Hello.  Thanks for reporting this.

Could you please attach the $top_builddir/gcc/testsuite/gcc/gcc.log file you
get after running the testsuite?

Thanks.

[Bug testsuite/106515] New: [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1

2022-08-03 Thread seurer at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515

Bug ID: 106515
   Summary: [13 regression] gcc.dg/debug/btf/btf-int-1.c fails
after r13-1937-g5df04a7aa837a1
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:5df04a7aa837a13b0e14d269c37bd3871d86bf08, r13-1937-g5df04a7aa837a1
make  -k check-gcc RUNTESTFLAGS="btf.exp=gcc.dg/debug/btf/btf-int-1.c"
FAIL: gcc.dg/debug/btf/btf-int-1.c scan-assembler-times [\t ]0x..[\t ]+[^\n]*bti_encoding 3
# of expected passes            13
# of unexpected failures        1


commit 5df04a7aa837a13b0e14d269c37bd3871d86bf08 (HEAD, refs/bisect/bad)
Author: Jose E. Marchesi 
Date:   Fri Jul 22 12:40:50 2022 +0200

btf: do not use the CHAR `encoding' bit for BTF

[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query

2022-08-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.2
 CC||amacleod at redhat dot com
   Keywords||compile-time-hog

[Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query

2022-08-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514

Bug ID: 106514
   Summary: [12/13 Regression] ranger slowness in path query
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When you bump --param max-jump-thread-duplication-stmts only so slightly, like
by a factor of two to 30, making the effective backwards threading limit 15,
the gcc.dg/pr69592.c -O2 compile-time explodes and you'll see

 backwards jump threading   :   9.17 ( 92%)   0.01 ( 20%)   9.18 ( 92%)    24  (  0%)

Note the testcase is somewhat degenerate, with a series of diamonds also
"nicely" showing the backwards threader's exponential behavior in exploring
the threading path space (plus, on top of that, the quadratic cost of starting
on every condition).  The current effective limit of 7 copied stmts limits the
effective thread length to a single diamond, avoiding the issue.

perf shows you

Samples: 143K of event 'cycles:u', Event count (approx.): 127516678963
Overhead  Samples  Command  Shared Object  Symbol
  24.36%    34962  cc1      cc1            [.] bitmap_bit_p
  18.78%    26962  cc1      cc1            [.] bitmap_list_find_element
  14.98%    21505  cc1      cc1            [.] path_oracle::killing_def
   1.94%     2791  cc1      cc1            [.] path_range_query::compute_ranges_in_block

so it's not the exponential search space per se but the high overhead of
ranger, specifically the relation oracle which seems to be unbounded.

path_range_query::range_defined_in_block calling path_oracle::killing_def
87 000 times gets you 600 000 000 bitmap_bit_p queries (resulting in
10 billion(!) bitmap list walk steps).

Part of the sub-optimality is probably the equiv chain becoming very long
(can we simply limit that?) and clearing bits in all the very many
bitmaps linked.  Not to mention that linked lists (for relations and
equivalences) are hardly a good data structure for anything but
inserts/removals :/

The bitmap (on the list) + linked list combos should probably be all replaced
with splay trees.  There's (unused) splay-tree-utils.h that seems to be
splay tree "adaptors" on top of something with links, but libiberty splay-trees
of course work as well.  I think it's worth optimizing for a small number of
elements, thus favoring a balanced tree over hashing.

Note for the testcase at hand it's walking of the m_equiv list, not the
m_relations one so it might be a bit more difficult to fix this than
if the issue were the m_relations chain.

I'm also seeing missed micro-optimizations like

  // Walk the equivalency list and remove SSA from any equivalencies.
  if (bitmap_bit_p (m_equiv.m_names, v))
    ...
  else
    bitmap_set_bit (m_equiv.m_names, v);

which can be written as

  if (!bitmap_set_bit (m_equiv.m_names, v))


likewise for bitmap_clear_bit.  Both return whether the bit changed
with the operation.  Of course that's just a constant factor; the issue
here is the complexity involving the linear list walks.
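
The test-then-set versus combined-set idiom above can be illustrated with a
toy bitmap.  `ToyBitmap' and the two helper functions are hypothetical
stand-ins, not GCC's bitmap API; only the convention that set/clear return
whether the bit changed is taken from the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bitmap whose set/clear report whether the
// bit actually changed.
class ToyBitmap {
public:
    explicit ToyBitmap(std::size_t nbits) : bits_(nbits, false) {}

    bool test(std::size_t i) const { return bits_[i]; }

    // Sets bit i; returns true if it was previously clear.
    bool set(std::size_t i) {
        bool changed = !bits_[i];
        bits_[i] = true;
        return changed;
    }

    // Clears bit i; returns true if it was previously set.
    bool clear(std::size_t i) {
        bool changed = bits_[i];
        bits_[i] = false;
        return changed;
    }

private:
    std::vector<bool> bits_;
};

// Two lookups: test first, then set if absent.
bool was_present_two_step(ToyBitmap& m, std::size_t v) {
    if (m.test(v)) return true;
    m.set(v);
    return false;
}

// One combined operation: the return value of set() already says whether
// the bit was there before.
bool was_present_one_step(ToyBitmap& m, std::size_t v) {
    return !m.set(v);
}
```

Both helpers leave the bit set and return the same answer; the second simply
avoids the extra lookup, which matters more when each lookup is a list walk.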

[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Jonathan Wakely  ---
Backported for 12.2

[Bug libstdc++/104443] common_iterator::operator-> is not correctly implemented

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104443

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|13.0|12.2

--- Comment #7 from Jonathan Wakely  ---
Backported for 12.2

[Bug libstdc++/105995] QoI: constexpr basic_string variable must use all of its SSO buffer

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105995

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|13.0|12.2

--- Comment #8 from Jonathan Wakely  ---
Backported for 12.2

[Bug libstdc++/105844] [10/11 Regression] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

--- Comment #10 from Jonathan Wakely  ---
Backported for 12.2

[Bug libstdc++/106248] [11 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248

--- Comment #11 from Jonathan Wakely  ---
Backported for 12.2

[Bug libstdc++/106248] [11/12 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:7a0ed28d4feb450f1ede5b52b57793a5df5b19fe

commit r12-8659-g7a0ed28d4feb450f1ede5b52b57793a5df5b19fe
Author: Jonathan Wakely 
Date:   Tue Jul 12 11:18:47 2022 +0100

libstdc++: Check for EOF if extraction avoids buffer overflow [PR106248]

In r11-2581-g17abcc77341584 (for LWG 2499) I added overflow checks to
the pre-C++20 operator>>(istream&, char*) overload.  Those checks can
cause extraction to stop after filling the buffer, where previously it
would have tried to extract another character and stopped at EOF. When
that happens we no longer set eofbit in the stream state, which is
consistent with the behaviour of the new C++20 overload, but is an
observable and unexpected change in the C++17 behaviour. What makes it
worse is that the behaviour change is dependent on optimization, because
__builtin_object_size is used to detect the buffer size and that only
works when optimizing.

To avoid the unexpected and optimization-dependent change in behaviour,
set eofbit manually if we stopped extracting because of the buffer size
check, but had reached EOF anyway. If the stream's rdstate() != goodbit
or width() is non-zero and smaller than the buffer, there's nothing to
do. Otherwise, we filled the buffer and need to check for EOF, and maybe
set eofbit.

The new check is guarded by #ifdef __OPTIMIZE__ because otherwise
__builtin_object_size is useless. There's no point compiling and
emitting dead code that can't be eliminated because we're not
optimizing.

We could add extra checks that the next character in the buffer is not
whitespace, to detect the case where we stopped early and prevented a
buffer overflow that would have happened otherwise. That would allow us
to assert or set badbit in the stream state when undefined behaviour was
prevented. However, those extra checks would increase the size of the
function, potentially reducing the likelihood of it being inlined, and
so making the buffer size detection less reliable. It seems preferable
to prevent UB and silently truncate, rather than miss the UB and allow
the overflow to happen.

libstdc++-v3/ChangeLog:

PR libstdc++/106248
* include/std/istream [C++17] (operator>>(istream&, char*)):
Set eofbit if we stopped extracting at EOF.
* testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc:
New test.

(cherry picked from commit 5ae74944af1de032d4a27fad4a2287bd3a2163fd)
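
A minimal sketch of the kind of extraction this commit message discusses, assuming a caller that reads whitespace-delimited words into a small fixed-size buffer. The names `WordResult` and `read_word` and the buffer size are illustrative and not taken from the patch. Setting the width explicitly keeps the extraction bounded regardless of optimization level; whether eofbit ends up set afterwards is the detail the patch adjusts, so the sketch reports it instead of assuming a value.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: what was extracted and the eofbit afterwards.
struct WordResult {
  std::string word;  // characters extracted into the buffer
  bool eof;          // stream's eofbit after the extraction
};

// Hypothetical helper: extract one word into a 4-byte buffer.
// std::setw limits the extraction to sizeof(buf) - 1 characters plus the
// terminating null, independent of any __builtin_object_size check.
inline WordResult read_word(std::istream& in)
{
  char buf[4] = {};
  in >> std::setw(sizeof buf) >> buf;
  return {buf, in.eof()};
}
```

Calling `read_word` twice on a stream holding "abcdef gh" should yield "abc" and then "def", because each call stops once three characters have been stored.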

[Bug libstdc++/104443] common_iterator::operator-> is not correctly implemented

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104443

--- Comment #6 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:1a9681e60964c7f7ce0892e14745e6dcf6100157

commit r12-8660-g1a9681e60964c7f7ce0892e14745e6dcf6100157
Author: Jonathan Wakely 
Date:   Thu Jul 28 20:55:51 2022 +0100

libstdc++: Tweak common_iterator::operator-> return type [PR104443]

This adjusts the return type to match the resolution of LWG 3672. There
is no functional difference, because decltype(auto) always deduced a
value anyway, but this makes it simpler and consistent with the working
draft.

libstdc++-v3/ChangeLog:

PR libstdc++/104443
* include/bits/stl_iterator.h (common_iterator::operator->):
Change return type to just auto.

(cherry picked from commit b5f5d1b36edbcd7d923f2e2653e54e52637c715b)
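
A small sketch of why switching from decltype(auto) to auto makes no functional difference when the returned expression is a prvalue. The `Proxy` type and the function names are hypothetical stand-ins, not the code in stl_iterator.h.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical proxy type standing in for the object operator-> returns.
struct Proxy {
  int value;
  const int* operator->() const { return &value; }
};

// Returning a prvalue: decltype(auto) deduces Proxy, exactly as auto does.
inline decltype(auto) with_decltype_auto(int x) { return Proxy{x}; }
inline auto with_auto(int x) { return Proxy{x}; }

static_assert(std::is_same<decltype(with_decltype_auto(1)),
                           decltype(with_auto(1))>::value,
              "both deduce a value type");

// The two differ only when the returned expression is an lvalue, such as a
// parenthesized name; that is not what happens in the prvalue case above.
inline decltype(auto) ref_case(int& x) { return (x); }  // deduces int&
inline auto value_case(int& x) { return (x); }          // deduces int

static_assert(std::is_same<decltype(ref_case(std::declval<int&>())),
                           int&>::value,
              "decltype(auto) keeps the reference");
static_assert(std::is_same<decltype(value_case(std::declval<int&>())),
                           int>::value,
              "auto drops the reference");
```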

[Bug libstdc++/105995] QoI: constexpr basic_string variable must use all of its SSO buffer

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105995

--- Comment #7 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:e562236851e06091256593aa0d3fbda60a28e45b

commit r12-8657-ge562236851e06091256593aa0d3fbda60a28e45b
Author: Jonathan Wakely 
Date:   Thu Jun 16 14:57:32 2022 +0100

libstdc++: Support constexpr global std::string for size < 15 [PR105995]

I don't think this is required by the standard, but it's easy to
support.

libstdc++-v3/ChangeLog:

PR libstdc++/105995
* include/bits/basic_string.h (_M_use_local_data): Initialize
the entire SSO buffer.
* testsuite/21_strings/basic_string/cons/char/105995.cc: New test.

(cherry picked from commit 98a0d72a610a87e8e383d366e50253ddcc9a51dd)

[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:2ef2de76dae5cac14e0de77ca7205e43be03ab22

commit r12-8655-g2ef2de76dae5cac14e0de77ca7205e43be03ab22
Author: Jonathan Wakely 
Date:   Tue Jun 14 14:37:25 2022 +0100

libstdc++: Check for size overflow in constexpr allocation [PR105957]

libstdc++-v3/ChangeLog:

PR libstdc++/105957
* include/bits/allocator.h (allocator::allocate): Check for
overflow in constexpr allocation.
* testsuite/20_util/allocator/105975.cc: New test.

(cherry picked from commit 0a9af7b4ef1b8aa85cc8820acf54d41d1569fc10)
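
For illustration, one portable way to express the kind of size check named in the subject line. This is a sketch only: `allocation_size_overflows` is a made-up name, and the actual check in allocator.h may be written differently.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper, not libstdc++'s code: true when a request for n
// objects of type T cannot be expressed as a std::size_t byte count,
// i.e. when n * sizeof(T) would wrap around.
template <typename T>
constexpr bool allocation_size_overflows(std::size_t n)
{
  return n > std::numeric_limits<std::size_t>::max() / sizeof(T);
}

static_assert(!allocation_size_overflows<int>(1024), "small requests fit");
static_assert(allocation_size_overflows<int>(
                  std::numeric_limits<std::size_t>::max()),
              "SIZE_MAX ints would wrap");
```

Dividing the maximum by sizeof(T) avoids performing the multiplication that might overflow in the first place.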

[Bug libstdc++/92978] std::gcd mishandles mixed-signedness

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92978

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:8a57deb926cd660c2eae7ed621d61a301ae0d523

commit r12-8654-g8a57deb926cd660c2eae7ed621d61a301ae0d523
Author: Jonathan Wakely 
Date:   Fri Jun 10 14:39:13 2022 +0100

libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]

When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.

To fix 92978 I made std::gcd and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.

libstdc++-v3/ChangeLog:

PR libstdc++/105844
* include/experimental/numeric (experimental::gcd): Simplify
assertions. Use __abs_r instead of __absu.
(experimental::lcm): Likewise. Remove use of __detail::__lcm so
overflow can be detected.
* include/std/numeric (__detail::__absu): Rename to __abs_r and
change to allow signed result type, so overflow can be detected.
(__detail::__lcm): Remove.
(gcd): Simplify assertions. Use __abs_r instead of __absu.
(lcm): Likewise. Remove use of __detail::__lcm so overflow can
be detected.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/105844.cc: New test.
* testsuite/26_numerics/lcm/105844.cc: New test.

(cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)
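
A sketch of the overflow detection technique mentioned at the end of the commit message, written as a standalone helper. The name `checked_lcm` and the std::optional return type are assumptions made for this example; libstdc++ itself asserts instead of returning an empty optional. It needs C++17 and a compiler providing `__builtin_mul_overflow` (GCC or Clang), and it assumes a 32-bit unsigned int.

```cpp
#include <numeric>
#include <optional>

// Hypothetical helper, not the libstdc++ implementation: returns the LCM,
// or an empty optional when the result does not fit in unsigned.
constexpr std::optional<unsigned> checked_lcm(unsigned m, unsigned n)
{
  if (m == 0 || n == 0)
    return 0u;
  unsigned result = 0;
  // (m / gcd(m, n)) * n is the LCM; the builtin reports whether that
  // product wrapped instead of silently returning the wrapped value.
  if (__builtin_mul_overflow(m / std::gcd(m, n), n, &result))
    return std::nullopt;
  return result;
}

static_assert(*checked_lcm(4u, 6u) == 12u, "ordinary case");
static_assert(!checked_lcm(500000u, 499999u).has_value(),
              "product exceeds 32 bits, so the overflow is reported");
```

Plain unsigned multiplication would have wrapped here without any diagnostic, which is why an explicit check is needed for unsigned operands.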

[Bug libstdc++/105844] [10/11/12 Regression] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:8a57deb926cd660c2eae7ed621d61a301ae0d523

commit r12-8654-g8a57deb926cd660c2eae7ed621d61a301ae0d523
Author: Jonathan Wakely 
Date:   Fri Jun 10 14:39:13 2022 +0100

libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]

When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.

To fix 92978 I made std::gcd and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.

libstdc++-v3/ChangeLog:

PR libstdc++/105844
* include/experimental/numeric (experimental::gcd): Simplify
assertions. Use __abs_r instead of __absu.
(experimental::lcm): Likewise. Remove use of __detail::__lcm so
overflow can be detected.
* include/std/numeric (__detail::__absu): Rename to __abs_r and
change to allow signed result type, so overflow can be detected.
(__detail::__lcm): Remove.
(gcd): Simplify assertions. Use __abs_r instead of __absu.
(lcm): Likewise. Remove use of __detail::__lcm so overflow can
be detected.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/105844.cc: New test.
* testsuite/26_numerics/lcm/105844.cc: New test.

(cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)

[Bug libstdc++/98421] std::span does not detect invalid range in Debug Mode

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98421

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|11.3|10.5

--- Comment #5 from Jonathan Wakely  ---
Fixed for 11.3 and 10.5 too.

[Bug testsuite/100748] [12 regression] 30_threads/jthread/95989.cc fails after r12-843

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100748

--- Comment #13 from Jonathan Wakely  ---
Fixed for 11.3 and 10.5 too.

[Bug libstdc++/103133] Binary built with -static using std::thread crashes

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103133

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|11.3|10.5

--- Comment #12 from Jonathan Wakely  ---
Also fixed for 10.5

[Bug tree-optimization/106513] bswap is incorrectly generated

2022-08-03 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

--- Comment #2 from Krister Walfridsson  ---
(In reply to Andreas Schwab from comment #1)
> This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).

I thought that was allowed in GCC as the manual says
(https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Integers-implementation.html#Integers-implementation)
"As an extension to the C language, GCC does not use the latitude given in C99
and C11 only to treat certain aspects of signed ‘<<’ as undefined."

If not, what behavior does the manual refer to?

[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-03 Thread malat at debian dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #14 from Mathieu Malaterre  ---
I can make the symptom go away with a single function attribute:

```
% diff -u *
--- /tmp/ii/mul_test.cc.ii.bad  2022-08-03 12:29:41.192263306 +0000
+++ /tmp/ii/mul_test.cc.ii.good 2022-08-03 12:29:41.196263281 +0000
@@ -124932,7 +124932,7 @@
}
   template 
   __attribute__((noinline)) void
-
+  __attribute__((optimize("no-tree-vectorize")))
   operator()(T , D d) {

 const size_t N = Lanes(d);
```
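
For readers unfamiliar with the attribute used in the diff, here is a self-contained sketch of the same per-function workaround on an unrelated toy function. `scale_in_place` is a made-up example and has nothing to do with the code in mul_test.cc. The GCC manual describes the optimize attribute as intended for debugging purposes only, which matches how it is used here: to narrow down which function is affected.

```cpp
#include <cassert>
#include <cstddef>

// GCC-specific: noinline keeps the function a separate unit, and
// optimize("no-tree-vectorize") asks GCC to compile just this function
// as if -fno-tree-vectorize had been given on the command line.
__attribute__((noinline, optimize("no-tree-vectorize")))
void scale_in_place(float* data, std::size_t n, float factor)
{
  assert(data != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}
```

Compilers that do not know the optimize attribute are expected to ignore it with a warning, so the function still behaves the same there.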

[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-03 Thread malat at debian dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #13 from Mathieu Malaterre  ---
Created attachment 53407
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53407&action=edit
main function with no-tree-optimize attribute

[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)

2022-08-03 Thread malat at debian dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #12 from Mathieu Malaterre  ---
Created attachment 53406
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53406&action=edit
main function with no-tree-optimize attribute

[Bug libstdc++/98421] std::span does not detect invalid range in Debug Mode

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98421

--- Comment #4 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:de802e4736613a585dcfd508acf73033f18aa4da

commit r10-10932-gde802e4736613a585dcfd508acf73033f18aa4da
Author: Jonathan Wakely 
Date:   Tue Aug 31 17:34:51 2021 +0100

libstdc++: Add valid range checks to std::span constructors [PR98421]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/98421
* include/std/span (span(Iter, size_type), span(Iter, Iter)):
Add valid range checks.
* testsuite/23_containers/span/cons_1_assert_neg.cc: New test.
* testsuite/23_containers/span/cons_2_assert_neg.cc: New test.

(cherry picked from commit ef7becc9c8a48804d3fd9dac032f7b33e561a612)

[Bug testsuite/100748] [12 regression] 30_threads/jthread/95989.cc fails after r12-843

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100748

--- Comment #12 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:18eecb8c4a97716d4bc4890b05c91f172fadc7b3

commit r10-10928-g18eecb8c4a97716d4bc4890b05c91f172fadc7b3
Author: Jonathan Wakely 
Date:   Tue Nov 9 23:45:36 2021 +0000

libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]

Since Glibc 2.34 all pthreads symbols are defined directly in libc not
libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
avoid unnecessary locking in single-threaded programs. This means there
is no reason to avoid linking to libpthread now, and so no reason to use
weak symbols defined in gthr-posix.h for all the pthread_xxx functions.

libstdc++-v3/ChangeLog:

PR libstdc++/100748
PR libstdc++/103133
* config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK):
Define for glibc 2.34 and later.

(cherry picked from commit 80fe172ba9820199c2bbce5d0611ffca27823049)

[Bug libstdc++/103133] Binary built with -static using std::thread crashes

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103133

--- Comment #11 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:18eecb8c4a97716d4bc4890b05c91f172fadc7b3

commit r10-10928-g18eecb8c4a97716d4bc4890b05c91f172fadc7b3
Author: Jonathan Wakely 
Date:   Tue Nov 9 23:45:36 2021 +0000

libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]

Since Glibc 2.34 all pthreads symbols are defined directly in libc not
libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
avoid unnecessary locking in single-threaded programs. This means there
is no reason to avoid linking to libpthread now, and so no reason to use
weak symbols defined in gthr-posix.h for all the pthread_xxx functions.

libstdc++-v3/ChangeLog:

PR libstdc++/100748
PR libstdc++/103133
* config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK):
Define for glibc 2.34 and later.

(cherry picked from commit 80fe172ba9820199c2bbce5d0611ffca27823049)

[Bug tree-optimization/106513] bswap is incorrectly generated

2022-08-03 Thread schwab--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

--- Comment #1 from Andreas Schwab  ---
This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).

[Bug tree-optimization/106513] New: bswap is incorrectly generated

2022-08-03 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

Bug ID: 106513
   Summary: bswap is incorrectly generated
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC may incorrectly generate bswap instructions for code not doing a correct
swap. This can be seen by running the function from testsuite/gcc.dg/pr40501.c
as

typedef long int int64_t;

__attribute__((noinline)) int64_t
swap64 (int64_t n)
{
  return (((n & (((int64_t) 0xff) )) << 56) |
  ((n & (((int64_t) 0xff) << 8)) << 40) |
  ((n & (((int64_t) 0xff) << 16)) << 24) |
  ((n & (((int64_t) 0xff) << 24)) << 8) |
  ((n & (((int64_t) 0xff) << 32)) >> 8) |
  ((n & (((int64_t) 0xff) << 40)) >> 24) |
  ((n & (((int64_t) 0xff) << 48)) >> 40) |
  ((n & (((int64_t) 0xff) << 56)) >> 56));
}

int main (void)
{
  volatile int64_t n = 0x8000000000000000l;

  if (swap64(n) != 0xffffffffffffff80l)
    __builtin_abort ();

  return 0;
}

This fails at -Os and higher optimization levels.
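
For comparison, a sketch of the same byte swap written with unsigned arithmetic only. This is not part of the report; the name `swap64_unsigned` is made up. With unsigned operands no shift touches a sign bit and every right shift is a logical shift, so this version is a true byte reversal. In the signed function above, the last term right-shifts a value whose sign bit may be set, which on GCC sign-extends the result instead of yielding a plain 0x80.

```cpp
#include <cstdint>

// Byte swap using unsigned masks and shifts only.
constexpr std::uint64_t swap64_unsigned(std::uint64_t n)
{
  return ((n & (std::uint64_t{0xff} << 0)) << 56) |
         ((n & (std::uint64_t{0xff} << 8)) << 40) |
         ((n & (std::uint64_t{0xff} << 16)) << 24) |
         ((n & (std::uint64_t{0xff} << 24)) << 8) |
         ((n & (std::uint64_t{0xff} << 32)) >> 8) |
         ((n & (std::uint64_t{0xff} << 40)) >> 24) |
         ((n & (std::uint64_t{0xff} << 48)) >> 40) |
         ((n & (std::uint64_t{0xff} << 56)) >> 56);
}

static_assert(swap64_unsigned(0x8000000000000000ull) == 0x80ull,
              "top byte moves to the bottom without sign extension");
static_assert(swap64_unsigned(0x0102030405060708ull) == 0x0807060504030201ull,
              "full byte reversal");
```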

[Bug lto/106499] LTO runs forever in libfabric 1.15.1 linking

2022-08-03 Thread kloczko.tomasz at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106499

--- Comment #24 from Tomasz Kłoczko  ---
Thank you :)

[Bug tree-optimization/105651] [12/13 Regression] bogus "may overlap" memcpy warning with std::string and operator+ at -O3

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105651

Jonathan Wakely  changed:

   What|Removed |Added

 CC||hi at jdoubleu dot de

--- Comment #16 from Jonathan Wakely  ---
*** Bug 106512 has been marked as a duplicate of this bug. ***

[Bug c++/106512] String optimization underflows in std::string::operator+ inlining

2022-08-03 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106512

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Jonathan Wakely  ---
Another dup of PR 105651 or PR 105329

*** This bug has been marked as a duplicate of bug 105651 ***

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-03 Thread linkw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

--- Comment #2 from Kewen Lin  ---
Created attachment 53405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53405&action=edit
untested patch

With the attached patch, for -fpatchable-function-entry=5,2 it gets:

foo:
.LFB0:
        .cfi_startproc
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .section        __patchable_function_entries,"awo",@progbits,foo
        .align 3
        .8byte  .LPFE1
        .section        ".text"
.LPFE1:
        nop
        nop
        .localentry     foo,.-foo
        .section        __patchable_function_entries,"awo",@progbits,foo
        .align 3
        .8byte  .LPFE2
        .section        ".text"
.LPFE2:
        nop
        nop
        nop
        std 31,-8(1)

for -fpatchable-function-entry=5,1, it emits error msg:

test.c:4:1: error: ‘-fpatchable-function-entry=M,N’ N NOPs can cause assembler
error due to invalid .localentry expression.

[Bug c++/106512] New: String optimization underflows in std::string::operator+ inlining

2022-08-03 Thread hi at jdoubleu dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106512

Bug ID: 106512
   Summary: String optimization underflows in
std::string::operator+ inlining
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hi at jdoubleu dot de
  Target Milestone: ---

Live example: https://godbolt.org/z/zMqG8W7WE

Given the following code:
```cpp
#include <string>

std::string GetHello()
{
return std::string{"ello"};
}

int main()
{
("H" + GetHello());
}
```

This fails to compile when all of the following hold:
1. GCC version 12.1 or newer,
2. compiling with -std=gnu++20 or later,
3. all warnings enabled,
4. warnings treated as errors,
5. -O3 turned on.


I get the following error:
```
In file included from
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/string:40,
                 from <source>:1:
In static member function 'static constexpr std::char_traits<char>::char_type*
std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)',
inlined from 'static constexpr void std::__cxx11::basic_string<_CharT,
_Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT =
char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]' at
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:423:21,
inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits,
_Allocator>& std::__cxx11::basic_string<_CharT, _Traits,
_Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with
_CharT = char; _Traits = std::char_traits<char>; _Alloc =
std::allocator<char>]' at
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.tcc:532:22,
inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits,
_Alloc>& std::__cxx11::basic_string<_CharT, _Traits,
_Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT =
char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]' at
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:2171:19,
inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits,
_Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type,
const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc =
std::allocator<char>]' at
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:1928:22,
inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits,
_Allocator> std::operator+(const _CharT*, __cxx11::basic_string<_CharT,
_Traits, _Allocator>&&) [with _CharT = char; _Traits = char_traits<char>;
_Alloc = allocator<char>]' at
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:3541:36,
inlined from 'int main()' at <source>:10:10:
/opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/char_traits.h:431:56:
error: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
accessing 9223372036854775810 or more bytes at offsets [18,
9223372036854775807] and 17 may overlap up to 9223372036854775813 bytes at
offset -3 [-Werror=restrict]
  431 | return static_cast<char_type*>(__builtin_memcpy(__s1, __s2,
__n));
  |   
^
```

I'm not sure what the issue is here exactly. From the error message, it looks
like some underflow (of `long long`) when trying to inline
std::string::operator+?

It doesn't seem like a bug in libstdc++, since it compiles with gcc11.

Furthermore, if you just change the `"H" + ...` in the example to `"He" + ...`
it suddenly works.

The symptoms of this one look similar:
https://gcc.gnu.org/bugzilla//show_bug.cgi?id=85651

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread rsandifo at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #26 from rsandifo at gcc dot gnu.org  
---
> describes a different option on big-endian and little-endian

should have said: describes a different instruction.  In other words,
the mapping of gimple to RTL operations is fixed, but the mapping of
those RTL operations to machine instructions varies by endianness
(if registers are involved).

[Bug libfortran/106079] [12/13 regression] gfortran.dg/boz_15.f90 fails after r12-6498-g07c60b8e33

2022-08-03 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106079

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Jakub Jelinek  ---
Fixed for 12.2 and later.

[Bug libfortran/106079] [12/13 regression] gfortran.dg/boz_15.f90 fails after r12-6498-g07c60b8e33

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106079

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek
:

https://gcc.gnu.org/g:4e5ca7ff8c9afd3c38245aa6b939cd3ae49bf1fe

commit r12-8653-g4e5ca7ff8c9afd3c38245aa6b939cd3ae49bf1fe
Author: Jakub Jelinek 
Date:   Mon Aug 1 08:26:03 2022 +0200

libfortran: Fix up boz_15.f90 on powerpc64le with -mabi=ieeelongdouble
[PR106079]

The boz_15.f90 test FAILs on powerpc64le-linux when -mabi=ieeelongdouble
is used (either default through --with-long-double-format=ieee or
when used explicitly).
The problem is that the read/write transfer routines are called with
BT_REAL (or BT_COMPLEX) type and kind 17 which is magic we use to say
it is the IEEE quad real(kind=16) rather than the IBM double double
real(kind=16).  For the floating point input/output we then handle kind
17 specially, but for B/O/Z we just treat the bytes of the floating point
value as binary blob and using 17 in that case results in unexpected
behavior, for write it means we don't estimate right how many chars we'll
need and print asterisks etc. rather than what we should, and
even with explicit size we'd print one further byte than intended.
For read it would even mean overwriting some unrelated byte after the
floating point object.

Fixed by using 16 instead of 17 in the read_radix and write_{b,o,z} calls.

2022-08-01  Jakub Jelinek  

PR libfortran/106079
* io/transfer.c (formatted_transfer_scalar_read,
formatted_transfer_scalar_write): For type BT_REAL with kind 17
change kind to 16 before calling read_radix or write_{b,o,z}.

(cherry picked from commit 82ac4cd213867be939aedee15347e8fd3f200b6a)

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread rsandifo at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #25 from rsandifo at gcc dot gnu.org  
---
AIUI the rules are:

- GCC vector lane numbers always correspond to memory array indices.
  For example, lane 0 always comes first in memory.

- On big-endian targets, vector loads and stores are assumed to put the
  first memory element at the most significant end of the vector register.

So lane 0 refers to the most-significant register element on big-endian
targets and to the least-significant register element on little-endian
targets.  So:

  (vec_select:V4SI (reg:V4SI R)
[(const_int 2) (const_int 6) (const_int 3) (const_int 7)])

describes a different option on big-endian and little-endian but:

  (vec_select:V4SI (mem:V4SI M)
[(const_int 2) (const_int 6) (const_int 3) (const_int 7)])

is endian-independent.
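
The first rule above (lane numbers follow memory order) can be illustrated at the source level with the GCC/Clang vector extension. This is only a sketch: `lane_after_load` is a made-up helper, it assumes a 32-bit int, and it says nothing about which end of a machine register a lane lands in, which is the endian-dependent part.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(int) == 4, "sketch assumes 32-bit int");

// GCC/Clang vector extension: four 32-bit ints in one 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical helper: load four ints from memory and read one lane back.
inline int lane_after_load(const int (&mem)[4], int lane)
{
  assert(lane >= 0 && lane < 4);
  v4si v;
  std::memcpy(&v, mem, sizeof v);  // plain memory copy, no lane shuffling
  return v[lane];                  // lane i is memory element i
}
```

On both big-endian and little-endian targets, lane 0 read this way is the first element in memory.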

[Bug rtl-optimization/106187] armhf: Miscompilation at O2 level (O0 / O1 are working)

2022-08-03 Thread rearnsha at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187

Richard Earnshaw  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rearnsha at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #50 from Richard Earnshaw  ---
Fixed on master so far.

[Bug tree-optimization/106511] [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e

2022-08-03 Thread marxin at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2022-08-03
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

[Bug tree-optimization/106511] New: [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e

2022-08-03 Thread marxin at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511

Bug ID: 106511
   Summary: [13 Regression] New -Werror=maybe-uninitialized since
r13-1268-g8c99e307b20c502e
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: aldyh at gcc dot gnu.org
  Target Milestone: ---

Created attachment 53404
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53404&action=edit
test-case

I know the warning has a significant false-positive rate, but this may still be
an interesting test-case. It's reduced from the xen package:

$ gcc bunzip.i -Werror=maybe-uninitialized -O1
bunzip2.c: In function ‘get_next_block’:
bunzip2.c:261:27: error: ‘length’ may be used uninitialized
[-Werror=maybe-uninitialized]
bunzip2.c:224:17: note: ‘length’ declared here
cc1: some warnings being treated as errors

[Bug rtl-optimization/106187] armhf: Miscompilation at O2 level (O0 / O1 are working)

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187

--- Comment #49 from CVS Commits  ---
The master branch has been updated by Richard Earnshaw :

https://gcc.gnu.org/g:64ce76d940501cb04d14a0d36752b4f93473531c

commit r13-1948-g64ce76d940501cb04d14a0d36752b4f93473531c
Author: Richard Earnshaw 
Date:   Wed Aug 3 10:01:51 2022 +0100

cselib: add function to check if SET is redundant [PR106187]

A SET operation that writes memory may have the same value as an
earlier store but if the alias sets of the new and earlier store do
not conflict then the set is not truly redundant.  This can happen,
for example, if objects of different types share a stack slot.

To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.

The routine is used in two places elsewhere in the compiler:
cfgcleanup and postreload.

gcc/ChangeLog:

PR rtl-optimization/106187
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.

[Bug lto/106499] LTO runs forever in libfabric 1.15.1 linking

2022-08-03 Thread marxin at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106499

--- Comment #23 from Martin Liška  ---
> If may I ask yet another question 

Sure, don't hesitate ;)

> Martin can you tell how did you manage to diagnose that it was exactly that
> cause in this case?

I noticed we spent time in the inliner (perf top), and then I suspected a
flatten attribute. 'git grep flatten' confirmed that.
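
For context, a toy sketch of what the flatten attribute asks for. The functions below are made up and unrelated to libfabric. The attribute requests that every call inside the marked function be inlined into it where possible, including calls made by the inlined callees, which is why a flatten function sitting on top of a large call tree can keep the inliner busy for a long time.

```cpp
static inline int square(int x) { return x * x; }

static inline int sum_of_squares(int a, int b)
{
  return square(a) + square(b);
}

// flatten: inline the whole call tree below this function into its body,
// so sum_of_squares and both square calls are expected to be folded in.
__attribute__((flatten))
int hypot_squared(int a, int b)
{
  return sum_of_squares(a, b);
}
```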

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

Richard Biener  changed:

   What|Removed |Added

 CC||rearnsha at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org

--- Comment #24 from Richard Biener  ---
Richards, how is this handled on arm BE vs LE?  We don't have a specific
VECTOR_LANES_BIG_ENDIAN, but we are already using BYTES_BIG_ENDIAN for some
of the VEC_*_{LO,HI}_EXPR tree codes (but IIRC not for anything related to
VEC_PERM_EXPR, for example, which looks most closely related to select/concat
on RTL).

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread linkw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #23 from Kewen Lin  ---
> Ideally we would avoid semantic difference of RTL depending on the target.
> If that's not avoidable there should be target macros/hooks that specify
> the desired semantics.  

Not sure, IMHO it seems it doesn't depend on the target but on endianness
(BYTES_BIG_ENDIAN)? Segher and Mike may have more insights on this.

> I assume the semantic difference is in
> vec_concat behavior but that's just documented as
> 
> @findex vec_concat
> @item (vec_concat:@var{m} @var{x1} @var{x2})
> Describes a vector concat operation.  The result is a concatenation of the
> vectors or scalars @var{x1} and @var{x2}; its length is the sum of the
> lengths of the two inputs.
> 
> which is a bit unspecific.  To me it implies that
> vec_select of a single lane N of the concat result can be distributed
> to the operands of the vec_concat in the obvious way (if N >=
> GET_MODE_NUNITS (x1) subtract GET_MODE_NUNITS and use x2)

Yeah, the documentation isn't clear, and neither is it for vec_select. I guess
vec_select also matters here: would the indexes for vec_select have the LE
ordering, like the subreg byte offset on LE?

[Bug fortran/105924] false floating point exception when evaluating exponential function

2022-08-03 Thread yelinhui at hotmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105924

Lin-Hui Ye  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #2 from Lin-Hui Ye  ---
Sorry, I didn't know this is an underflow. I was expecting exp(-160000) to give
a value close to zero. Thanks for the explanation.

Linhui

(In reply to kargl from comment #1)
> Why do you think that you should not get an exception?
> 
> e = -400
> e*e = 160000
> -e*e = -160000
> exp(-e*e) = exp(-160000)  <-- This is going to underflow to zero.
> 
> You specifically asked gfortran to signal an exception if
> underflow occurs with the -ffpe-trap=underflow option.  The
> underflow threshold occurs at x = -745 for exp(x).
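
As a rough illustration of the threshold mentioned in the quoted reply, the
sketch below evaluates exp(x) just above and just below x = -745 in IEEE 754
double precision. It uses Python's math.exp rather than gfortran, and Python
does not trap on underflow, so it only shows the returned values and not the
exception that -ffpe-trap=underflow would signal.

```python
import math
import sys

# Smallest positive *normal* double. Results below this are subnormal,
# which is the range where IEEE 754 underflow is signalled.
smallest_normal = sys.float_info.min

tiny = math.exp(-745.0)   # still a nonzero subnormal value
zero = math.exp(-746.0)   # too small to represent, rounds to 0.0

print(0.0 < tiny < smallest_normal)  # True
print(zero == 0.0)                   # True
```

Any argument well below -745 therefore returns exactly zero in double
precision.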

[Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)

2022-08-03 Thread malat at debian dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #11 from Mathieu Malaterre  ---
(In reply to Uroš Bizjak from comment #10)
> The reason the test fails with gcc-12 is that gcc-12 enabled
> auto-vectorisation for -O2.

I can make the symptoms go away by using `-O2 -fno-tree-vectorize`. Since this
also affects arm5 and powerpc, it seems the bug is somewhere in the shared
32-bit code (the bug does not affect 64-bit architectures for some reason).

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread rguenther at suse dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #22 from rguenther at suse dot de  ---
On Wed, 3 Aug 2022, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069
> 
> --- Comment #21 from Kewen Lin  ---
> I didn't look into this in detail, but something in the culprit commit caught
> my eye; take altivec_vmrghh as an example:
> 
> Before the patch, the pattern
> 
>[(set (match_operand:V8HI 0 "register_operand" "=v")
>  (vec_select:V8HI
>(vec_concat:V16HI
>  (match_operand:V8HI 1 "register_operand" "v")
>  (match_operand:V8HI 2 "register_operand" "v"))
>(parallel [(const_int 0) (const_int 8)
>   (const_int 1) (const_int 9)
>   (const_int 2) (const_int 10)
>   (const_int 3) (const_int 11)])))]
> 
> can match vmrghh on BE but vmrglh on LE. This indicates that the pattern has
> different semantics from the perspective of the underlying instructions.
> 
> After the patch, this pattern only matches vmrghh.
> 
> IMHO, this part has a semantic change before and after the patch. The code
> before the patch looks more reasonable to me, since the pattern can have
> different meanings on BE and LE (underlying behavior).

Ideally we would avoid semantic difference of RTL depending on the target.
If that's not avoidable there should be target macros/hooks that specify
the desired semantics.  I assume the semantic difference is in
vec_concat behavior but that's just documented as

@findex vec_concat
@item (vec_concat:@var{m} @var{x1} @var{x2})
Describes a vector concat operation.  The result is a concatenation of the
vectors or scalars @var{x1} and @var{x2}; its length is the sum of the
lengths of the two inputs.

which is a bit unspecific.  To me it implies that
vec_select of a single lane N of the concat result can be distributed
to the operands of the vec_concat in the obvious way (if N >=
GET_MODE_NUNITS (x1) subtract GET_MODE_NUNITS and use x2)
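
The "obvious way" of distributing a single-lane vec_select over a vec_concat
can be sketched as below. This is only a model of the rule as worded above,
in RTL element order; the function name and the "x1"/"x2" labels are
hypothetical, and it deliberately says nothing about endianness.

```python
def distribute_lane(n, nunits_x1):
    """Map lane n of (vec_concat x1 x2) to an (operand, lane) pair.

    nunits_x1 plays the role of GET_MODE_NUNITS (x1).
    """
    if n >= nunits_x1:
        # The lane falls in the second operand: subtract and use x2.
        return ("x2", n - nunits_x1)
    return ("x1", n)


# With two 8-element inputs, lane 3 comes from x1 and lane 11 from x2.
print(distribute_lane(3, 8))   # ('x1', 3)
print(distribute_lane(11, 8))  # ('x2', 3)
```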

[Bug target/47949] Missed optimization for -Os using xchg instead of mov.

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:fc6ef90173478521982e9df3831a06ea85b4f41e

commit r13-1945-gfc6ef90173478521982e9df3831a06ea85b4f41e
Author: Roger Sayle 
Date:   Wed Aug 3 09:07:36 2022 +0100

PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86.

This patch adds a peephole2 to i386.md to implement the suggestion in
PR target/47949, of using xchg instead of mov for moving values to/from
the %rax/%eax register, controlled by -Oz, as the xchg instruction is
one byte shorter than the move it is replacing.

The new test case is taken from the PR:
int foo(int x) { return x; }

where previously we'd generate:
foo:    mov %edi,%eax  // 2 bytes
        ret

but with this patch, using -Oz, we generate:
foo:    xchg %eax,%edi  // 1 byte
        ret

On the CSiBE benchmark, this saves a total of 10238 bytes (reducing
the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving).

Interestingly, some modern architectures (such as Zen 3) implement
xchg using zero latency register renaming (just like mov), so in theory
this transformation could be enabled when optimizing for speed, if
benchmarking shows the improved code density produces consistently
better performance.  However, this is architecture dependent, and
there may be interactions using xchg (instead of a single_set) in the
late RTL passes (such as cprop_hardreg), so for now I've restricted
this to -Oz.

2022-08-03  Roger Sayle  
Uroš Bizjak

gcc/ChangeLog
PR target/47949
* config/i386/i386.md (peephole2): New peephole2 to convert
SWI48 moves to/from %rax/%eax where the src is dead to xchg,
when optimizing for minimal size with -Oz.

gcc/testsuite/ChangeLog
PR target/47949
* gcc.target/i386/pr47949.c: New test case.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread linkw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #21 from Kewen Lin  ---
I didn't look into this in detail, but something in the culprit commit caught
my eye; take altivec_vmrghh as an example:

Before the patch, the pattern

   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (vec_select:V8HI
   (vec_concat:V16HI
 (match_operand:V8HI 1 "register_operand" "v")
 (match_operand:V8HI 2 "register_operand" "v"))
   (parallel [(const_int 0) (const_int 8)
  (const_int 1) (const_int 9)
  (const_int 2) (const_int 10)
  (const_int 3) (const_int 11)])))]

can match vmrghh on BE but vmrglh on LE. This indicates that the pattern has
different semantics from the perspective of the underlying instructions.

After the patch, this pattern only matches vmrghh.

IMHO, this part has a semantic change before and after the patch. The code before
the patch looks more reasonable to me, since the pattern can have different
meanings on BE and LE (underlying behavior).
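
To make the pattern concrete, here is a small model of what the
vec_select/vec_concat expression computes when read purely in RTL element
order. The helper functions and the a0..a7 / b0..b7 element names are
illustrative assumptions; the sketch does not model how elements map onto
register bytes on BE or LE, which is the point under discussion.

```python
def vec_concat(x1, x2):
    """Concatenate two vectors; the length is the sum of both lengths."""
    return list(x1) + list(x2)


def vec_select(v, indices):
    """Pick the listed lanes of v, in order."""
    return [v[i] for i in indices]


# Index list taken from the pattern's (parallel [...]) operand.
MERGE_INDICES = [0, 8, 1, 9, 2, 10, 3, 11]

a = [f"a{i}" for i in range(8)]  # stands in for operand 1 (V8HI)
b = [f"b{i}" for i in range(8)]  # stands in for operand 2 (V8HI)

merged = vec_select(vec_concat(a, b), MERGE_INDICES)
print(merged)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

In element-order terms the pattern interleaves the first four elements of
each input; whether that corresponds to the "high" or "low" half of a
register is what differs between the two instructions.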

[Bug rtl-optimization/71775] Redundant move instruction for sign extension

2022-08-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71775

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f

commit r13-1942-gc23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
Author: Roger Sayle 
Date:   Wed Aug 3 08:55:35 2022 +0100

Some additional zero-extension related optimizations in simplify-rtx.

This patch implements some additional zero-extension and sign-extension
related optimizations in simplify-rtx.cc.  The original motivation comes
from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:

Failed to match this instruction:
(set (reg:DI 88 [ _1 ])
(sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))

On many platforms the result of DImode CTZ is constrained to be a
small unsigned integer (between 0 and 64), hence the truncation to
32-bits (using a SUBREG) and the following sign extension back to
64-bits are effectively a no-op, so the above should ideally (often)
be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
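
The no-op claim can be checked with a small model of 64-bit arithmetic. This
is only a sketch: the helper names are hypothetical, and ctz64 is assumed to
return 64 for a zero input, which (as the message goes on to say) is not
guaranteed on every target.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ctz64(x):
    """Count trailing zeros of a 64-bit value (assumed to be 64 for zero)."""
    x &= MASK64
    if x == 0:
        return 64
    return (x & -x).bit_length() - 1


def sign_extend_32_to_64(v):
    """Sign-extend the low 32 bits of v to a 64-bit bit pattern."""
    v &= MASK32
    if v & 0x80000000:
        v |= MASK64 ^ MASK32  # fill the upper 32 bits with ones
    return v


def truncate_then_extend(v):
    """Model (sign_extend:DI (subreg:SI v 0))."""
    return sign_extend_32_to_64(v & MASK32)


# Every possible ctz result (0..64) survives the round trip unchanged.
inputs = [0] + [1 << k for k in range(64)]
print(all(truncate_then_extend(ctz64(x)) == ctz64(x) for x in inputs))  # True

# A value with bit 31 set does not, so the range restriction matters.
print(hex(truncate_then_extend(0x80000000)))  # 0xffffffff80000000
```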

To implement this, and some closely related transformations, we build
upon the existing val_signbit_known_clear_p predicate.  In the first
chunk, nonzero_bits knows that FFS and ABS can't leave the sign bit
set, so the simplification of ABS (ABS (x)) and ABS (FFS (x))
can itself be simplified.  The second transformation is that we can
canonicalize SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
when the operand's sign-bit is known to be clear.  The final two chunks
are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
truncating SUBREG respectively.  The nonzero_bits of a truncating
SUBREG pessimistically thinks that the upper bits may have an
arbitrary value (by taking the SUBREG), so we need to look deeper at the
SUBREG's operand to confirm that the high bits are known to be zero.

Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
default architecture options is undefined at zero, so we can't be sure
the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
nonzero_bits knows this, so the above transformations don't trigger,
but the transformations themselves are perfectly valid for other
operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
settings where CTZ is defined at zero.

2022-08-03  Roger Sayle  
Segher Boessenkool  
Richard Sandiford  

gcc/ChangeLog
* simplify-rtx.cc (simplify_unary_operation_1) : Add
optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT
that are all positive to complement the existing FFS and
idempotent ABS simplifications.
: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
val_signbit_known_clear_p is true of the operand.
Simplify sign extensions of SUBREG truncations of operands
that are already suitably (zero) extended.
: Simplify zero extensions of SUBREG truncations
of operands that are already suitably zero extended.

[Bug target/106481] [13 Regression] ICE: in native_encode_rtx, at simplify-rtx.cc:6884 with -O2 -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop

2022-08-03 Thread roger at nextmovesoftware dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106481

Roger Sayle  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Roger Sayle  ---
This should now be fixed on mainline.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread yinyuefengyi at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #20 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Another data point: manually change the source register and index of each
vspltw in the generated assembly to verify:

luoxhu@gcc135 build $ diff q.bad.s q.good.s -U12
--- q.bad.s 2022-08-03 06:30:08.298451116 +
+++ q.good.s 2022-08-03 06:30:52.887250451 +
@@ -18,31 +18,31 @@
addi 2,2,.TOC.-.LCF0@l
.localentry _Z3fooPhPjDv4_jS1_S1_S1_,.-_Z3fooPhPjDv4_jS1_S1_S1_
mflr %r0
std %r0,16(%r1)
std %r30,-16(%r1)
std %r31,-8(%r1)
stdu %r1,-112(%r1)
.cfi_def_cfa_offset 112
.cfi_offset 65, 16
.cfi_offset 30, -16
.cfi_offset 31, -8
mr %r30,%r3
-   vspltw %v0,%v2,0
+   vspltw %v0,%v5,3
mfvsrwz %r7,%vs32
-   vspltw %v0,%v3,0
+   vspltw %v0,%v4,3
mfvsrwz %r6,%vs32
-   vspltw %v0,%v4,0
+   vspltw %v0,%v3,3
mfvsrwz %r5,%vs32
-   vspltw %v0,%v5,0
+   vspltw %v0,%v2,3
mfvsrwz %r31,%vs32
rldicl %r7,%r7,0,32
rldicl %r6,%r6,0,32
rldicl %r5,%r5,0,32
rldicl %r4,%r31,0,32
addis %r3,%r2,.LC0@toc@ha
addi %r3,%r3,.LC0@toc@l
bl printf
nop
stb %r31,0(%r30)
addi %r1,%r1,112
.cfi_def_cfa_offset 0

luoxhu@gcc135 build $ gcc q.good.s -o q.good
luoxhu@gcc135 build $ ./q.good
B0: 41fcef98, 91648e8b,7dca18c6,61707865

This means that both the register and the index are used incorrectly in the
nested vec_select optimization on LE.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread yinyuefengyi at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #19 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
(In reply to Xionghu Luo (luo...@gcc.gnu.org) from comment #15)
> In combine, the vec_select(vec_concat) and the following vec_select are
> combined into a single extract instruction, which seems reasonable for both
> LE and BE?
> 
> R146:   0 1 2 3
> R141:   4 5 6 7
> R150:   2 6 3 7// vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
> R151:   R150[3]// vec_select(r150:V4SI,3)
> 
> => 
> 
> R151:   R141[3]   //  vec_select(r141:V4SI,3)
> 
>   
> 
> Trying 21 -> 24:
>21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>   REG_DEAD r146:V4SI
>   REG_DEAD r141:V4SI
>24: {r151:SI=vec_select(r150:V4SI,parallel);clobber scratch;}
> Failed to match this instruction:
> (parallel [
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> (clobber (scratch:SI))
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> ])
> Failed to match this instruction:
> (parallel [
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> ])
> Successfully matched this instruction:
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> Successfully matched this instruction:
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> allowing combination of insns 21 and 24
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> modifying insn i2    21:
> r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>   REG_DEAD r146:V4SI
> deferring rescan insn with uid = 21.
> modifying insn i3    24: {r151:SI=vec_select(r141:V4SI,parallel);clobber
> scratch;}
>   REG_DEAD r141:V4SI
> deferring rescan insn with uid = 24.
> 
> 
> I guess the previous unspec implementation bypassed the LE + LE swap check,
> so now in split2, we should generate vextuwlx instead of vextuwrx on little
> endian?


This nested vec_select+vec_select+vec_concat optimization was introduced by Uros
in simplify-rtx.c for PR32661. Unfortunately it only works for Power BE
platforms; disabling that piece of code could work, because the nested
vec_selects would then no longer be combined...
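
The fold shown in the quoted combine dump can be modelled in plain RTL
element order as follows. This is a sketch with a hypothetical helper name,
not the simplify-rtx code itself, and it ignores endianness entirely, which
is why it reproduces the BE-style result from the quoted table.

```python
def fold_nested_select(outer_lane, inner_indices, nunits_x1):
    """Fold vec_select(vec_select(vec_concat(x1, x2), inner), [outer_lane]).

    Returns ("x1" or "x2", lane) for the operand the lane is taken from.
    """
    concat_lane = inner_indices[outer_lane]
    if concat_lane >= nunits_x1:
        return ("x2", concat_lane - nunits_x1)
    return ("x1", concat_lane)


INNER = [2, 6, 3, 7]                  # selector of insn 21 in the dump
REGS = {"x1": "R146", "x2": "R141"}   # operands of the vec_concat

for lane in range(4):
    operand, idx = fold_nested_select(lane, INNER, 4)
    print(f"R150[{lane}] -> {REGS[operand]}[{idx}]")
# R150[0] -> R146[2]
# R150[1] -> R141[2]
# R150[2] -> R146[3]
# R150[3] -> R141[3]
```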

For Power LE, firstly:

Trying 21 -> 24:

 R146:   3 2 1 0
 R141:   7 6 5 4
 R150:   7 3 6 2// vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
 R151:   R150[3]// vec_select(r150:V4SI,3)

 => 

currently:
 R151:   R141[3]   //  vec_select(r141:V4SI,3)

But it should be:
 R151:   R146[3]   //  vec_select(r146:V4SI,3)

This means the current result:

R151: R150[3] R141[3]
R153: R150[2] R146[3]
R155: R150[1] R141[2]
R157: R150[0] R146[2]

should instead become the following after the first nested vec_select optimization:

R151: R150[3] R146[3]
R153: R150[2] R141[3]
R155: R150[1] R146[2]
R157: R150[0] R141[2]

A little-endian check and swap (swapping op00 and op01) could achieve this
result.  But secondly, there is another "nested vec_select" optimisation which
produces R151=R165[3]:

Trying 21 -> 26:
...

R146 R165 R163 [7 3 6 2]
R151: R146[3]   =>  R165[3]  (this is wrong!)

R162, R163, R164 and R165 hold the input values R0, R1, R2 and R3, and the
vsx_extract_v4si_di_p9 index should be "0" instead of "3".

The correct result should be:

R151: R165[0]
R153: R164[0]
R155: R163[0]
R157: R162[0]


(insn 44 2 4 2 (set (reg:V4SI 162)
(reg:V4SI 66 2 [ R0 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
 (expr_list:REG_DEAD (reg:V4SI 66 2 [ R0 ])
(nil)))
(note 4 44 45 2 NOTE_INSN_DELETED)
(insn 45 4 5 2 (set (reg:V4SI 163)
(reg:V4SI 67 3 [ R1 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
 (expr_list:REG_DEAD (reg:V4SI 67 3 [ R1 ])
(nil)))
(note 5 45 46 2 NOTE_INSN_DELETED)
(insn 46 5 6 2 (set (reg:V4SI 164)
(reg:V4SI 68 4 [ R2 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
