[Bug tree-optimization/111595] New: detection of MIN/MAX with truncation and sign change for the result

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111595

Bug ID: 111595
   Summary: detection of MIN/MAX with truncation and sign change
for the result
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
unsigned short f(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  if (as < bs) return asu;
  return bsu;
}

unsigned short f0(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  if (as < bs) return as;
  return bs;
}

unsigned short f1(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  signed short t;
  if (as < bs) t = as;
  else t = bs;
  return t;
}
```

Currently MIN is detected only in f1. All three functions should produce the
same IR in the end.
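
As a reference point for what "the same IR" would compute, here is a minimal sketch of the intended semantics: truncate both inputs to `short`, take the signed minimum, then convert the result to `unsigned short`. The names `min_then_convert` and `check_min_then_convert` are made up for illustration. The sketch assumes a 16-bit `short`, and note that converting an out-of-range `long` to `short` is implementation-defined in C (GCC reduces modulo 2^N), so the spot checks use in-range values only.

```c
#include <assert.h>

/* Hypothetical reference for f, f0 and f1: a signed MIN on the
   truncated values followed by a sign-changing conversion.  */
unsigned short min_then_convert(long a, long b)
{
    short as = (short)a;            /* truncation of the first input */
    short bs = (short)b;            /* truncation of the second input */
    short m = as < bs ? as : bs;    /* the MIN we would like detected */
    return (unsigned short)m;       /* sign change applied to the result */
}

/* Spot checks with in-range inputs (assumes 16-bit short).  */
void check_min_then_convert(void)
{
    assert(min_then_convert(3, 5) == 3);
    assert(min_then_convert(-1, 1) == 65535);
}
```

Comparing each of f, f0 and f1 against such a reference over a range of inputs is one way to confirm that they are interchangeable before expecting the optimizers to treat them alike.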

[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

--- Comment #4 from Andrew Pinski  ---
(In reply to JuzheZhong from comment #3)
> (In reply to Andrew Pinski from comment #1)
> > The SVE one was added with r12-4402-g62b505a4d5fc89:
> > ```
> > /* Detect simplication for a conditional reduction where
> > 
> >    a = mask1 ? b : 0
> >    c = mask2 ? d + a : d
> > 
> >    is turned into
> > 
> >    c = mask1 && mask2 ? d + b : d.  */
> > (simplify
> >   (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> >    (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> > ```
> > Most likely should do the similar thing for IFN_COND_LEN_ADD too.
> 
> Hi, I see that ARM SVE fails to fold VEC_COND + COND_ADD into COND_ADD on
> float vectors, since they can't satisfy integer_zerop.
> 
> Is it reasonable for the same optimization to also work for float vectors?

I suspect it would only be valid if `!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS
(type)` is true. So it could use (match on) zerop instead but would need to
check the above condition too.
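
To illustrate the signed-zero half of that condition, here is a scalar sketch of a single lane before and after the fold. It is only a model: the function names are hypothetical, and it assumes IEEE 754 doubles, the default rounding mode, and a build without -ffast-math. It does not try to demonstrate the NaN half of the condition.

```c
#include <assert.h>
#include <math.h>   /* signbit */

/* One lane before the fold:
     a = mask1 ? b : 0
     c = mask2 ? d + a : d  */
double lane_before(int mask1, int mask2, double b, double d)
{
    double a = mask1 ? b : 0.0;
    return mask2 ? d + a : d;
}

/* The same lane after the fold:
     c = (mask1 && mask2) ? d + b : d  */
double lane_after(int mask1, int mask2, double b, double d)
{
    return (mask1 && mask2) ? d + b : d;
}

/* With mask1 clear and mask2 set, the unfolded form computes d + 0.0.
   For d == -0.0 that sum is +0.0, while the folded form returns d
   unchanged, so the two results differ in sign.  */
void check_signed_zero(void)
{
    assert(!signbit(lane_before(0, 1, 1.0, -0.0)));
    assert(signbit(lane_after(0, 1, 1.0, -0.0)));
}
```

For every other input the two forms agree in this model, which is why the integer version of the pattern needs no such guard.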

[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD

2023-09-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

--- Comment #3 from JuzheZhong  ---
(In reply to Andrew Pinski from comment #1)
> The SVE one was added with r12-4402-g62b505a4d5fc89:
> ```
> /* Detect simplication for a conditional reduction where
> 
>    a = mask1 ? b : 0
>    c = mask2 ? d + a : d
> 
>    is turned into
> 
>    c = mask1 && mask2 ? d + b : d.  */
> (simplify
>   (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
>    (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> ```
> Most likely should do the similar thing for IFN_COND_LEN_ADD too.

Hi, I see that ARM SVE fails to fold VEC_COND + COND_ADD into COND_ADD on
float vectors, since they can't satisfy integer_zerop.

Is it reasonable for the same optimization to also work for float vectors?

[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD

2023-09-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

--- Comment #2 from JuzheZhong  ---
Oh, I see. Thanks a lot! I will have a try.

[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-26

--- Comment #1 from Andrew Pinski  ---
The SVE one was added with r12-4402-g62b505a4d5fc89:
```
/* Detect simplication for a conditional reduction where

   a = mask1 ? b : 0
   c = mask2 ? d + a : d

   is turned into

   c = mask1 && mask2 ? d + b : d.  */
(simplify
  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
```
Most likely should do the similar thing for IFN_COND_LEN_ADD too.
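
A rough way to convince oneself that the same rewrite carries over to the length-controlled form is a lane-by-lane model in plain C. The sketch below is an assumption-laden illustration, not GCC code: the function names are hypothetical, it assumes that lanes at or beyond the length simply keep the else value `d`, and it ignores the bias operand.

```c
#include <assert.h>
#include <stdint.h>

enum { NLANES = 4 };

/* Unfolded form, one lane at a time:
     a = mask1 ? b : 0
     c = (lane active && mask2) ? d + a : d
   "Lane active" means i < len, which is the assumed LEN behaviour.  */
void cond_len_add_unfolded(const int *mask1, const int *mask2,
                           const uint64_t *b, const uint64_t *d,
                           int len, uint64_t *c)
{
    assert(len >= 0 && len <= NLANES);
    for (int i = 0; i < NLANES; i++) {
        uint64_t a = mask1[i] ? b[i] : 0;
        c[i] = (i < len && mask2[i]) ? d[i] + a : d[i];
    }
}

/* Folded form: c = (lane active && mask1 && mask2) ? d + b : d.  */
void cond_len_add_folded(const int *mask1, const int *mask2,
                         const uint64_t *b, const uint64_t *d,
                         int len, uint64_t *c)
{
    assert(len >= 0 && len <= NLANES);
    for (int i = 0; i < NLANES; i++)
        c[i] = (i < len && mask1[i] && mask2[i]) ? d[i] + b[i] : d[i];
}

/* Compare both forms over every mask combination and every length.
   Returns 1 if they always agree, 0 otherwise.  */
int fold_is_equivalent(void)
{
    const uint64_t b[NLANES] = {1, 2, 3, 4};
    const uint64_t d[NLANES] = {10, 20, 30, 40};

    for (int len = 0; len <= NLANES; len++) {
        for (int m = 0; m < (1 << (2 * NLANES)); m++) {
            int mask1[NLANES], mask2[NLANES];
            uint64_t x[NLANES], y[NLANES];

            for (int i = 0; i < NLANES; i++) {
                mask1[i] = (m >> i) & 1;
                mask2[i] = (m >> (NLANES + i)) & 1;
            }
            cond_len_add_unfolded(mask1, mask2, b, d, len, x);
            cond_len_add_folded(mask1, mask2, b, d, len, y);
            for (int i = 0; i < NLANES; i++)
                if (x[i] != y[i])
                    return 0;
        }
    }
    return 1;
}
```

With `len` equal to `NLANES` the model reduces to the plain COND_ADD case covered by the existing pattern.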

[Bug c/111594] New: RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD

2023-09-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

Bug ID: 111594
   Summary: RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

#include <stdint.h>

void single_loop_with_if_condition(uint64_t * restrict a,
                                   uint64_t * restrict b,
                                   int loop_size) {
  uint64_t result = 0;

  for (int i = 0; i < loop_size; i++) {
    if (b[i] <= a[i]) {
      result += a[i];
    }
  }

  a[0] = result;
}

In ARM SVE:

vect__ifc__33.15_48 = VEC_COND_EXPR ;
vect__34.16_49 = .COND_ADD (loop_mask_41, vect_result_19.7_38,
vect__ifc__33.15_48, vect_result_19.7_38);

will be folded into:

vect__34.16_49 = .COND_ADD (_50, vect_result_19.7_38, vect__7.13_45,
vect_result_19.7_38);

However, for RVV, it fails to fold VEC_COND_EXPR + COND_LEN_ADD:

vect__ifc__44.30_96 = VEC_COND_EXPR ;
  vect__45.31_97 = .COND_LEN_ADD ({ -1, ... }, vect_result_35.22_78,
vect__ifc__44.30_96, vect_result_35.22_78, _104, 0);

I am not sure where to do this optimization?

[Bug target/111533] [14 Regression] ICE: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934

2023-09-25 Thread xuli1 at eswincomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111533

--- Comment #3 from xuli1 at eswincomputing dot com  ---
The problem has been reproduced, thank you.

[Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8

2023-09-25 Thread lili.cui at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #7 from cuilili  ---
(In reply to Martin Jambor from comment #6)
> I believe this has been fixed?

Yes.

[Bug target/111545] [14 Regression] RISC-V gfortran.dg/host_assoc_function_7.f09 Illegal instruction error

2023-09-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111545

--- Comment #4 from JuzheZhong  ---
Confirmed: this is a latent bug in the VSETVL pass that has existed for a
long time.

Lehua is working on refactoring Phase 1 and Phase 2 of the VSETVL pass, which
will fix all potential issues in that pass.

[Bug middle-end/94267] Missed folding of _MEM_REF

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94267

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> Right now we depend on not doing the folding, PR 110702.

Well rather we depend on not folding *(_MEM_REF) ...

[Bug middle-end/94267] Missed folding of _MEM_REF

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94267

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110702

--- Comment #3 from Andrew Pinski  ---
Right now we depend on not doing the folding, PR 110702.

[Bug middle-end/111497] [11/12/13/14 Regression] ICE building mariadb on i686 since r8-470

2023-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111497

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov :

https://gcc.gnu.org/g:3c23defed384cf17518ad6c817d94463a445d21b

commit r14-4256-g3c23defed384cf17518ad6c817d94463a445d21b
Author: Vladimir N. Makarov 
Date:   Mon Sep 25 16:19:50 2023 -0400

[PR111497][LRA]: Copy substituted equivalence

When we substitute the equivalence and it becomes shared, we can fail
to correctly update reg info used by LRA.  This can result in wrong
code generation, e.g. because of incorrect live analysis.  It can also
result in compiler crash as the pseudo survives RA.  This is what
exactly happened for the PR.  This patch solves this problem by
unsharing substituted equivalences.

gcc/ChangeLog:

PR middle-end/111497
* lra-constraints.cc (lra_constraints): Copy substituted
equivalence.
* lra.cc (lra): Change comment for calling unshare_all_rtl_again.

gcc/testsuite/ChangeLog:

PR middle-end/111497
* g++.target/i386/pr111497.C: new test.

[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization

2023-09-25 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #2 from Jonathan Wakely  ---
This needs numbers, not opinions.

[Bug target/111593] New: wrong code for 128-bit multiplication on MIPS64R6

2023-09-25 Thread mikulas at artax dot karlin.mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111593

Bug ID: 111593
   Summary: wrong code for 128-bit multiplication on MIPS64R6
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mikulas at artax dot karlin.mff.cuni.cz
  Target Milestone: ---

MIPS64R6 has new instructions for multiplication and division. GCC uses them,
however it miscompiles 128-bit multiplication.

When you compile and run this program with -O1 or -O2 on mips64r6, you get
incorrect result 9F172AF9AEE4FDB2FD12E7537CC82A0F. The correct result is
60E3DC5DAC542B19FD12E7537CC82A0F.

#include <stdio.h>

__attribute__((noinline,noclone)) static unsigned __int128
power(unsigned __int128 a, unsigned __int128 b)
{
    unsigned __int128 c = 1;
    while (b) {
        if (b & 1)
            c *= a;
        a *= a;
        b >>= 1;
    }
    return c;
}

int main(void)
{
    int i;
    unsigned __int128 a = 0x1234567890abcdefULL;
    unsigned __int128 b = 0x1234567890abcdefULL;
    unsigned __int128 c = power(a, b);
    for (i = 124; i >= 0; i -= 4) {
        printf("%X", (unsigned)(c >> i) & 0xf);
    }
    printf("\n");
    return 0;
}

How to reproduce:

On Debian SID, install the packages gcc-13-mipsisa64r6-linux-gnuabi64,
libc6-dev-mips64r6-cross and qemu-user.

Run mipsisa64r6-linux-gnuabi64-gcc-13 -O2 power.c && /usr/bin/qemu-mips64 -L
/usr/mipsisa64r6-linux-gnuabi64/ a.out

The bug happens with gcc-10, gcc-11, gcc-12 and gcc-13 (I didn't try older
releases).
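
For anyone wanting to cross-check the result without relying on the target's `__int128` multiplication, a limb-based reference is one option. The sketch below is only an illustration: the type `u128_parts` and all function names are hypothetical helpers, not part of the test case above, and the exponent is taken as a 64-bit value for brevity since the one in the report fits.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value stored as two 64-bit halves.  */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Full 64x64 -> 128 product built from 32-bit limbs.  */
u128_parts mul_64x64(uint64_t x, uint64_t y)
{
    uint64_t x0 = x & 0xffffffffu, x1 = x >> 32;
    uint64_t y0 = y & 0xffffffffu, y1 = y >> 32;
    uint64_t p00 = x0 * y0, p01 = x0 * y1, p10 = x1 * y0, p11 = x1 * y1;
    /* Middle column: a sum of three 32-bit values, so it cannot overflow.  */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    u128_parts r;

    r.lo = (p00 & 0xffffffffu) | (mid << 32);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

/* Low 128 bits of a 128x128 product.  */
u128_parts mul_128(u128_parts a, u128_parts b)
{
    u128_parts r = mul_64x64(a.lo, b.lo);

    /* The cross terms only affect the high half (wrapping is intended).  */
    r.hi += a.hi * b.lo + a.lo * b.hi;
    return r;
}

/* Same square-and-multiply loop as power() in the report.  */
u128_parts power_ref(uint64_t base, uint64_t exp)
{
    u128_parts a = { 0, base };
    u128_parts c = { 0, 1 };

    while (exp) {
        if (exp & 1)
            c = mul_128(c, a);
        a = mul_128(a, a);
        exp >>= 1;
    }
    return c;
}

/* Sanity checks on products that are easy to verify by hand.  */
void check_mul_64x64(void)
{
    u128_parts r = mul_64x64(UINT64_MAX, UINT64_MAX);  /* (2^64 - 1)^2 */
    assert(r.hi == UINT64_MAX - 1 && r.lo == 1);

    r = mul_64x64((uint64_t)1 << 32, (uint64_t)1 << 32);  /* 2^64 */
    assert(r.hi == 1 && r.lo == 0);
}
```

Calling `power_ref(0x1234567890abcdefULL, 0x1234567890abcdefULL)` and comparing the two halves with the output of the native `power()` on the affected target should show whether the `__int128` multiply is at fault.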

[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov

2023-09-25 Thread mwd at md5i dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #10 from Michael Duggan  ---
To sum up what I have figured out, C++ transforms the coroutine "function" into
a trio of functions: a ramp function, an actor function, and a destruction
function.  The ramp function acts as the actual function (by name).  The actor
function contains the original body of the written function (with some
transformations), and thus contains the code associated with most of the lines
that need coverage information.

Since the actor function is generated artificially, it is marked as artificial.
 The gcov program explicitly ignores functions that are marked as artificial. 
Also, even if that were not the case, it looks to me like the line coverage
information for the actor function only includes the initial line of the
function.  This seems to be due to the way the artificial function gets
inserted into the list of functions of the program.

In order to solve this problem, we would need to do at least the following:
  Find a way to not ignore the actor function.  This would involve either not
marking it as artificial or marking it in some other way that would be
recognized by gcov.
  Ensure that the actor function properly includes the line number information
from the original coroutine body.

Most of this work would probably need to be done in the c++ code (where the
coroutine transformation happens) rather than in the coverage code.  Should
this be reassigned to the c++ component?

[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu

2023-09-25 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967

Xi Ruoyao  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294
 CC||rguenther at suse dot de

--- Comment #9 from Xi Ruoyao  ---
Bisect shows r14-4089 (the fix for PR111294) either fixes or "covers up" the
issue.

[Bug fortran/59298] ICE when initialising PARAMETER array of derived-type (containing an array) using array constructor

2023-09-25 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59298

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED
   Keywords||ice-on-valid-code
  Known to work||10.5.0
  Known to fail||7.5.0, 8.5.0, 9.5.0
   Target Milestone|--- |10.5

--- Comment #16 from anlauf at gcc dot gnu.org ---
Fixed in gcc-10.

[Bug fortran/84693] scalar DT not broadcast across an array in an initialization expression

2023-09-25 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84693
Bug 84693 depends on bug 59298, which changed state.

Bug 59298 Summary: ICE when initialising PARAMETER array of derived-type 
(containing an array) using array constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59298

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug target/111570] -march=generic prints error

2023-09-25 Thread brjd_epdjq36 at kygur dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111570

--- Comment #2 from Brjd  ---
Thank you; I have also read this guide. My point is that a generic arch might
be possible in theory. If GCC builds for the oldest x86_64 CPU, will that code
run on all modern CPUs, since their instruction sets include those of their
predecessors?

How about making that generic or baseline build for the oldest CPU the
default?

If I may ask some more questions: the guide doesn't say anything about the
specific arch. With -march=cpu, which is better: -mtune=cpu, where cpu is the
same as in -march, or -mtune=generic, so that the code is tuned for all CPUs of
this family? If -mtune is not given, is the default generic or native? The
default for -march is not clear either.

One more question: I can't find a guide to the GCC build, or any information
on whether GCC can be built target by target, like LLVM and clang. For example,
is it possible to build only LLVM first, then stop and resume with clang, etc.;
or, for GCC, to build only the C module and its submodules first, then stop and
resume with the g++ module and its submodules, then libgcc, libstdc++, etc.?

That would be great, especially for long bootstraps and stage 2, but I can find
only make all-gcc and target-libgcc, which build almost all of the compiler.

[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T

2023-09-25 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166

--- Comment #9 from Hans-Peter Nilsson  ---
(In reply to Richard Earnshaw from comment #8)
> I'm going to close this as WONTFIX.

I guess I'll have to find another PR to lean on, for fixing the underlying
cause for the nonatomic code.

[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST

2023-09-25 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831

Patrick O'Neill  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from Patrick O'Neill  ---
This has been resolved on trunk:
https://inbox.sourceware.org/gcc-patches/20230427162301.1151333-1-patr...@rivosinc.com/
The cover letter there contains a lot more context about why the mappings are
wrong and why we implemented a strengthened version of Table A.6.
These mappings are included in the RISC-V PSABI doc:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378

And this series has been backported to be included in GCC 13.3 (along with a
bugfix):
https://inbox.sourceware.org/gcc-patches/20230725180206.284777-1-patr...@rivosinc.com/

[Bug target/111533] [14 Regression] ICE: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934

2023-09-25 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111533

--- Comment #2 from Patrick O'Neill  ---
Hi,

I believe the issue is that you're using rv64gc, not rv64gcv.

I haven't tried building with multilib, so my commands are:

../configure --with-arch=rv64gcv --with-abi=lp64d --enable-gcc-checking=rtl

make linux -j32

[Bug target/111546] [14 Regression] ICE: gfortran.dg/overload_5.f90:53:2: internal compiler error: in emit_move_insn, at expr.cc:4219 since r14-4163-gbea89f78f2f

2023-09-25 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111546

Patrick O'Neill  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Patrick O'Neill  ---
gfortran.dg/overload_5.f90 failures have been resolved!

[Bug c++/111592] [11/12/13/14 Regression] ICE on expanding argument pack into variadic constructor

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111592

Andrew Pinski  changed:

   What|Removed |Added

    Summary|ICE on expanding argument pack into variadic constructor |[11/12/13/14 Regression] ICE on expanding argument pack into variadic constructor
   Last reconfirmed||2023-09-25
   Target Milestone|--- |11.5
  Known to work||5.1.0, 5.5.0
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
  Known to fail||6.1.0, 6.2.0
   Keywords||ice-on-valid-code

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #1 from Andrew Pinski  ---
>for programs that know they are effectively always multithreaded they pay for 
>a runtime branch and .text segment bloat for an optimization that never 
>applies.

The bloat is not much, and the overhead of a branch compared to atomics is
still not going to make a dent.


I suspect you are looking into the wrong place for optimizations really.

[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu

2023-09-25 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #8 from Xi Ruoyao  ---
(In reply to Shaohua Li from comment #7)
> This test case does not reproduce anymore on the current trunk. Maybe one of
> the recent fixes fixed the underlying issue as well.

But we still need to ensure the fix is backported to 11/12/13.  And there is
still a chance that the issue is merely covered up by an unrelated change.

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

Mathieu Malaterre  changed:

   What|Removed |Added

  Known to work||11.4.0

--- Comment #5 from Mathieu Malaterre  ---
(In reply to Mathieu Malaterre from comment #3)
> I can make the upstream code fail using g++-11 / g++-12
> (Debian/sid).

Never mind, it seems g++ 11.4.0 can handle the original test case.

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

Mathieu Malaterre  changed:

   What|Removed |Added

  Known to work||10.5.0

--- Comment #4 from Mathieu Malaterre  ---
g++-10 seems to handle -O3/-mstrict-align

[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu

2023-09-25 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967

--- Comment #7 from Shaohua Li  ---
This test case does not reproduce anymore on the current trunk. Maybe one of
the recent fixes fixed the underlying issue as well.

[Bug modula2/111530] Unable to build GM2 standard library on BSD due to a `getopt_long_only' GNU extension dependency

2023-09-25 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111530

Gaius Mulley  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2023-09-25

--- Comment #1 from Gaius Mulley  ---
Many thanks for the bug report and hints on how to fix it.

[Bug c++/111592] New: ICE on expanding argument pack into variadic constructor

2023-09-25 Thread yankel-pro at scialom dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111592

Bug ID: 111592
   Summary: ICE on expanding argument pack into variadic
constructor
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yankel-pro at scialom dot org
  Target Milestone: ---

GCC raises an Internal Compiler Error in c_common_parse_file() when
(indirectly, see source) expanding an argument pack into a variadic
constructor.

$ g++ --version
g++
(Compiler-Explorer-Build-gcc-1eb80f78f114f6582c349f75e08b361a0a582091-binutils-2.40)
14.0.0 20230925 (experimental)

$ cat source
struct ignore
{ ignore(...) {} };

template<typename... Args>
void
InternalCompilerError(Args... args)
{ ignore{ ignore(args) ... }; }

int
main()
{ InternalCompilerError(0, 0); }

$ g++ -c source
: In instantiation of 'void InternalCompilerError(Args ...) [with Args
= {int, int}]':
:11:24:   required from here
:7:3: internal compiler error: in finish_expr_stmt, at
cp/semantics.cc:910
7 | { ignore{ ignore(args) ... }; }
  |   ^~
0x251c8ee internal_error(char const*, ...)
???:0
0xae8dda fancy_abort(char const*, int, char const*)
???:0
0xcfa8f8 instantiate_decl(tree_node*, bool, bool)
???:0
0xd2dcbb instantiate_pending_templates(int)
???:0
0xbded50 c_parse_final_cleanups()
???:0
0xe149d8 c_common_parse_file()
???:0

Found on Compiler Explorer <https://godbolt.org/z/M788xE44z>.

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #3 from Mathieu Malaterre  ---
I can make the upstream code fail using g++-11 / g++-12 (Debian/sid).

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

Richard Biener  changed:

   What|Removed |Added

   Keywords||needs-bisection

--- Comment #2 from Richard Biener  ---
Does it work with older GCC?

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #1 from Mathieu Malaterre  ---
Created attachment 55989
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55989&action=edit
cvise reduced test case

% g++ -std=c++11 -o works -DHWY_COMPILE_ONLY_EMU128 -DHWY_BROKEN_EMU128=0
-maltivec -mcpu=power8  -g -O3 alt.cc  -Wall -Wextra -Werror -Wfatal-errors

% g++ -std=c++11 -o fails -DHWY_COMPILE_ONLY_EMU128 -DHWY_BROKEN_EMU128=0
-maltivec -mcpu=power8 -mstrict-align -g -O3 alt.cc  -Wall -Wextra -Werror
-Wfatal-errors

should give:

% ./works
-> success

but:

% ./fails 
fails: alt.cc:395: void hwy::detail::AssertArrayEqual(const TypeInfo&, const
void*, const void*, size_t, const char*, const char*, int): Assertion
`memcmp(a, b, c * ti.sizeof_t) == 0' failed.
zsh: abort  ./fails

[Bug target/111591] New: ppc64be: miscompilation with -mstrict-align / -O3

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

Bug ID: 111591
   Summary: ppc64be: miscompilation with -mstrict-align / -O3
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: malat at debian dot org
  Target Milestone: ---

I am seeing a regression in highway unit test on ppc64be when using
-mstrict-align / -O3

454/530 Test #454:
HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128  #
GetParam() = 2305843009213693952 .Subprocess aborted***Exception:  
0.00 sec
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter =
HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128
[==] Running 1 test from 1 test suite.
[--] Global test environment set-up.
[--] 1 test from HwyWidenMulTestGroup/HwyWidenMulTest
[ RUN  ]
HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128


i16x4 expect [0+ ->]:
  0x7FFF,0x7FFF,0x7FFF,0x7FFF,
i16x4 actual [0+ ->]:
  0x7FFF,0x01A5,0x7FFF,0x7FFF,
Abort at ./hwy/tests/widen_mul_test.cc:205: EMU128, i16x4 lane 1 mismatch:
expected '0x7FFF', got '0x01A5'.



ref:
https://buildd.debian.org/status/fetch.php?pkg=highway&arch=ppc64&ver=1.0.8%7Egit20230918.1e3a3d7-4&stamp=1695113957&raw=0

[Bug c/111590] New: RISC-V: Multiple ICE in gfortran regression with 'V' Extension enabled

2023-09-25 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111590

Bug ID: 111590
   Summary: RISC-V: Multiple ICE in gfortran regression with 'V'
Extension enabled
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

FAIL: gfortran.dg/assumed_rank_24.f90   -O2  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/assumed_rank_24.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/assumed_rank_24.f90   -O3 -g  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03   -O2  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03   -O2  (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03   -O3 -g  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/class_array_4.f03   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/cshift_bounds_4.f90   -O2  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/cshift_bounds_4.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/cshift_bounds_4.f90   -O3 -g  (internal compiler error: in
smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90   -O3 -g  (test for excess errors)

One of the cases:

program main
  integer, dimension(:,:), allocatable :: a, b
  integer, dimension(:), allocatable :: sh
  allocate (a(2,2))
  allocate (b(2,2))
  allocate (sh(3))
  a = 1
  b = cshift(a,sh)
end program main

[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T

2023-09-25 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #8 from Richard Earnshaw  ---
I'm going to close this as WONTFIX.

There are several reasons for this.

There's no SWPH operation, so it's impossible to generalize atomic operations
for all basic data types.  It's not possible to synthesize a 16-bit atomic type
with either SWP or SWPB.

There's no support in Thumb state for SWP[B].

The instruction was removed in later versions of the architecture, which makes
code non-portable.

Finally, Armv4, which dates to around 1995, is essentially in maintenance only
mode and this is really a new feature request.  In fact, I don't think we'd
really want to add new features for anything before Armv7 these days (even that
is more than 10 years old).
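
For context, the builtin under discussion is typically used as the core of a simple lock, which is why its atomicity matters. The following is a minimal sketch using GCC's documented `__atomic_test_and_set` and `__atomic_clear` builtins; the `tas_lock` type and the function names are hypothetical, and whether the operation is actually atomic depends on the target, as this bug shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type wrapping the flag byte.  */
typedef struct { bool locked; } tas_lock;

/* Try to take the lock once; returns true if it was acquired.
   __atomic_test_and_set returns the previous contents of the flag.  */
bool tas_try_lock(tas_lock *l)
{
    return !__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE);
}

/* Release the lock by clearing the flag.  */
void tas_unlock(tas_lock *l)
{
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* Single-threaded check of the lock protocol.  */
void check_tas_lock(void)
{
    tas_lock l = { false };

    assert(tas_try_lock(&l));    /* first attempt succeeds */
    assert(!tas_try_lock(&l));   /* second attempt sees it held */
    tas_unlock(&l);
    assert(tas_try_lock(&l));    /* free again after unlock */
}
```

On a target without a suitable atomic instruction, the read and the write inside `tas_try_lock` can be interleaved with another thread's, which is the non-atomic behaviour the original report describes.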

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500

Luke  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #8 from Luke  ---


*** This bug has been marked as a duplicate of bug 104773 ***

[Bug rtl-optimization/104773] compare with 1 not merged with subtract 1

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773

Luke  changed:

   What|Removed |Added

 CC||cptarse-luke at yahoo dot com

--- Comment #3 from Luke  ---
*** Bug 111500 has been marked as a duplicate of this bug. ***

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #10 from Mathieu Malaterre  ---
for reference:

% c++ --verbose  -O2 -flto   base2.cc  && ./a.out
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/powerpc64le-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-4'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=powerpc64le-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --with-libphobos-druntime-only=yes
--enable-objc-gc=auto --enable-secureplt --enable-targets=powerpcle-linux
--disable-multilib --enable-multiarch --disable-werror --with-long-double-128
--enable-offload-targets=nvptx-none=/build/reproducible-path/gcc-13-13.2.0/debian/tmp-nvptx/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu
--target=powerpc64le-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=4
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Debian 13.2.0-4) 
COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a-'
 /usr/libexec/gcc/powerpc64le-linux-gnu/13/cc1plus -quiet -v -imultiarch
powerpc64le-linux-gnu -D_GNU_SOURCE base2.cc -msecure-plt -quiet -dumpdir a-
-dumpbase base2.cc -dumpbase-ext .cc -O2 -version -flto
-fasynchronous-unwind-tables -o /tmp/cc1cimSD.s
GNU C++17 (Debian 13.2.0-4) version 13.2.0 (powerpc64le-linux-gnu)
compiled by GNU C version 13.2.0, GMP version 6.3.0, MPFR version
4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory
"/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../include/powerpc64-linux-gnu/c++/13"
ignoring nonexistent directory "/usr/local/include/powerpc64le-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc64le-linux-gnu/13/include-fixed/powerpc64le-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc64le-linux-gnu/13/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../powerpc64le-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/13
 /usr/include/powerpc64le-linux-gnu/c++/13
 /usr/include/c++/13/backward
 /usr/lib/gcc/powerpc64le-linux-gnu/13/include
 /usr/local/include
 /usr/include/powerpc64le-linux-gnu
 /usr/include
End of search list.
Compiler executable checksum: 403ce0768541423839c6b7d8fd9dfeff
COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a-'
 as -v -a64 -mpower8 -many -mlittle -o /tmp/ccFzBgtQ.o /tmp/cc1cimSD.s
GNU assembler version 2.41 (powerpc64le-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.41
COMPILER_PATH=/usr/libexec/gcc/powerpc64le-linux-gnu/13/:/usr/libexec/gcc/powerpc64le-linux-gnu/13/:/usr/libexec/gcc/powerpc64le-linux-gnu/:/usr/lib/gcc/powerpc64le-linux-gnu/13/:/usr/lib/gcc/powerpc64le-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/powerpc64le-linux-gnu/13/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../lib/:/lib/powerpc64le-linux-gnu/:/lib/../lib/:/usr/lib/powerpc64le-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a.'
 /usr/libexec/gcc/powerpc64le-linux-gnu/13/collect2 -plugin
/usr/libexec/gcc/powerpc64le-linux-gnu/13/liblto_plugin.so
-plugin-opt=/usr/libexec/gcc/powerpc64le-linux-gnu/13/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccSvdAAw.res -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -flto
--build-id --eh-frame-hdr -V -m elf64lppc --hash-style=gnu --as-needed
-dynamic-linker /lib64/ld64.so.2 -pie
/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/Scrt1.o
/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/crti.o
/usr/lib/gcc/powerpc64le-linux-gnu/13/crtbeginS.o
-L/usr/lib/gcc/powerpc64le-linux-gnu/13
-L/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu
-L/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../lib
-L/lib/powerpc64le-linux-gnu -L/lib/../lib -L/usr/lib/powerpc64le-linux-gnu
-L/usr/lib/../lib 

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #9 from Mathieu Malaterre  ---
If you download pr111522.cc from comment #8, you should be able to reproduce
exactly the original upstream issue.

Steps:

% c++ -O2 -flto pr111522.cc  && ./a.out


vs

% c++ -O2 pr111522.cc && ./a.out

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #8 from Mathieu Malaterre  ---
Created attachment 55988
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55988&action=edit
gcc -E -P

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #7 from Mathieu Malaterre  ---
Created attachment 55987
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55987&action=edit
gcc -E -P

[Bug target/104611] memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0 on aarch64

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611

Mathias Stearn  changed:

   What|Removed |Added

 CC||redbeard0531 at gmail dot com

--- Comment #4 from Mathias Stearn  ---
clang has already been using the optimized memcmp code since v16, even at -O1:
https://www.godbolt.org/z/qEd768TKr. Older versions (at least since v9) were
still branch-free, but via a less optimal sequence of instructions.

GCC's code gets even more ridiculous at 32 bytes, because it does a branch
after every 8-byte compare, while the clang code is fully branch-free (not that
branch-free is always better, but it seems clearly so in this case).

Judging by the codegen, there seem to be three deficiencies in GCC: 1) an
inability to take advantage of the load-pair instructions to load 16 bytes at a
time, 2) an inability to use ccmp to combine comparisons, and 3) using
branching rather than cset to fill the output register. Ideally these could all
be done in the general case by the low-level instruction optimizer, but even
getting them special-cased for memcmp (and friends) would be an improvement.
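
A minimal C++ sketch of the branch-free shape described above, for the 16-byte equality case only. The helper name `equal16` is hypothetical, and this is neither GCC's nor clang's actual expansion: it only illustrates loading both halves and combining the comparisons without a branch between them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the 16 bytes at a and b are identical.
bool equal16(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    std::uint64_t a_lo, a_hi, b_lo, b_hi;
    // memcpy is the portable way to express unaligned 8-byte loads.
    std::memcpy(&a_lo, pa, 8);
    std::memcpy(&a_hi, pa + 8, 8);
    std::memcpy(&b_lo, pb, 8);
    std::memcpy(&b_hi, pb + 8, 8);
    // XOR is zero only for identical words; OR merges both halves so a
    // single comparison against zero decides the result.
    return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0;
}
```

Whether a given compiler turns this into a load-pair plus ccmp/cset sequence on aarch64 is not guaranteed; the sketch only shows the source-level intent.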

[Bug tree-optimization/111563] Missed optimization of LICM

2023-09-25 Thread 652023330028 at smail dot nju.edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #5 from Yi <652023330028 at smail dot nju.edu.cn> ---
(In reply to Andrew Pinski from comment #3)

> So this is again reassociation with LIM, the same issue as PR 111560.

For this similar code, GCC works as expected:
https://godbolt.org/z/3TaqfeTqb

```c++
extern int var_24;
int t;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
{
var_24 += (-(200 / var_10)) + (-var_8);
var_24 += var_14 + var_2;

i_2+=i_2/3;
}
}
```
So it seems that this and PR 111560 may not have the same cause, because the
statement "Our re-association only produces a canonical order within a single
expression." does not seem to be relevant here.



Meanwhile, in Example 2, 'if(var_3)' is actually optimized out of the Loop by
Loop Unswitch. So maybe the rest of the loop should be optimized as expected
like this similar code?

[Bug libstdc++/111589] New: Use relaxed atomic increment (but not decrement!) in shared_ptr

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111589

Bug ID: 111589
   Summary: Use relaxed atomic increment (but not decrement!) in
shared_ptr
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

The atomic increment when copying a shared_ptr can be relaxed because it is
never actually used as a synchronization operation. The current thread must
already have sufficient synchronization to access the memory because it can
already deref the pointer. All synchronization is done either via whatever
program-provided code makes the shared_ptr object available to the thread, or
in the atomic decrement (where the decrements to non-zero are releases that
ensure all uses of the object happen before the final decrement to zero
acquires and destroys the object).

As an argument from authority, libc++ already uses relaxed for increments
and acq/rel for decrements:
https://github.com/llvm/llvm-project/blob/c649fd34e928ad01951cbff298c5c44853dd41dd/libcxx/include/__memory/shared_ptr.h#L101-L121

This will have no impact on x86 where all atomic RMWs are effectively
sequentially consistent, but it will enable the use of ldadd rather than
ldaddal on aarch64, and similar optimizations on other weaker architectures.
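
A minimal sketch of the proposed ordering, using a hypothetical `RefCount` class rather than libstdc++'s real control block. It assumes the reasoning above: the increment only needs atomicity, while the decrement carries the release/acquire synchronization.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count; not libstdc++'s implementation.
class RefCount {
public:
    // Copying a handle: the caller already holds a reference, so the
    // increment needs to be atomic but does not need to order anything.
    void acquire() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a handle: returns true when the caller released the last
    // reference and is therefore responsible for destroying the object.
    bool release() noexcept {
        long prev = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev > 0);
        return prev == 1;
    }

    long use_count() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> count_{1};  // starts with one owner
};
```

A caller would pair each `acquire()` with one `release()` and destroy the managed object when `release()` returns true.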

[Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization

2023-09-25 Thread redbeard0531 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

Bug ID: 111588
   Summary: Provide opt-out of shared_ptr single-threaded
optimization
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Right now there is a fast path for single-threaded programs to avoid the
overhead of atomics in shared_ptr, but there is no equivalent for a program that
knows it is multi-threaded to remove the check and branch. If __GTHREADS is not
defined then no atomic code is emitted.

There are two issues with this: 1) programs that know they are effectively
always multithreaded pay for a runtime branch and .text segment bloat for
an optimization that never applies. This may have knock-on effects of making
functions that use shared_ptr less likely to be inlined by pushing them
slightly over the complexity threshold. 2) It invalidates single-threaded
microbenchmarks of code that uses shared_ptr because the performance of the
code may be very different from when run in the real multithreaded program.

I understand the value of making a fast mode for single-threaded code, and I
can even accept having the runtime branch by default, rather than as an opt-in,
when it is unknown if the program will be run with multiple threads. But an
opt-out would be nice to have. If it had to be a GCC build-time option rather
than a #define, that would be acceptable for us since we always use our own
build of gcc, but it seems like a worse option for other users.

FWIW, neither llvm libc++
(https://github.com/llvm/llvm-project/blob/0bfaed8c612705cfa8c5382d26d8089a0a26386b/libcxx/include/__memory/shared_ptr.h#L103-L110)
nor MS-STL
(https://github.com/microsoft/STL/blob/main/stl/inc/memory#L1171-L1173) ever
use runtime branching to detect multithreading.
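
A rough sketch of what such an opt-out could look like. Everything named here is an assumption for illustration: `SKETCH_ASSUME_MULTITHREADED` is a hypothetical macro, `process_is_single_threaded()` is a hypothetical stand-in for the library's runtime check, and `add_ref` is not a libstdc++ function. Only `__atomic_fetch_add` is a real GCC built-in.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's "is the process single-threaded?"
// query; a real implementation would ask the threading layer.
inline bool process_is_single_threaded() noexcept { return true; }

// Hypothetical opt-out: define this before including the header to compile
// the runtime check away.
// #define SKETCH_ASSUME_MULTITHREADED 1

inline void add_ref(long& count) noexcept {
    assert(count > 0);  // an existing owner must already hold a reference
#ifndef SKETCH_ASSUME_MULTITHREADED
    if (process_is_single_threaded()) {
        ++count;  // fast path: plain increment, no atomic instruction
        return;
    }
#endif
    // Atomic path; with the macro defined this is the only code emitted,
    // so there is no branch and no duplicated increment in .text.
    __atomic_fetch_add(&count, 1, __ATOMIC_ACQ_REL);
}
```

With the macro defined, the function body reduces to the single atomic operation, which is the behaviour the report asks to be able to select.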

[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST

2023-09-25 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |patrick at rivosinc dot com
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2023-09-25

--- Comment #10 from palmer at gcc dot gnu.org ---
This should be fixed, looks like we just forgot to close the bug.  I've
assigned it to Patrick to make sure everything's finished.

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #6 from Mathieu Malaterre  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Mathieu Malaterre from comment #4)
> > > So the original
> > > (upstream) code is somewhat buggy as it relies on lazy init for global var.
> > 
> > Those global vars are in different namespace, I actually fail to understand
> > why the definition with ",cpu=power10" gets pulled in...
> 
> Because `#pragma GCC target targets_str` is global state and unrelated to
> namespace ...

Forgot to mention that each per-namespace `#pragma GCC target` is inside
`#pragma GCC push_options` / `#pragma GCC pop_options`. This implements "per
namespace" target-specific options AFAIK.

[Bug ipa/59948] Optimize std::function

2023-09-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59948

--- Comment #8 from Jan Hubicka  ---
Trunk now optimizes the function to return 0, but fails to remove functions
which become unused after indirect inlining.
With -fno-early-inlining we end up with:

int m ()
{
  void * D.48296;
  int __args#0;
  struct function h;
  int _12;
  bool (*) (union _Any_data & {ref-all}, const union _Any_data &
{ref-all}, _Manager_operation) _24;
  bool (*) (union _Any_data & {ref-all}, const union _Any_data &
{ref-all}, _Manager_operation) _27;
  long unsigned int _29;
  long unsigned int _35;
  vector(2) long unsigned int _37;
  void * _42;

   [local count: 1073741824]:
  _29 = (long unsigned int) _M_invoke;
  _35 = (long unsigned int) _M_manager;
  _37 = {_35, _29};
  h ={v} {CLOBBER};
  MEM  [(struct _Function_base *) + 8B] = {};
  MEM[(int (*) (int) *)] = f;
  MEM  [(void *) + 16B] = _37;
  __args#0 = 1;
  _12 = std::_Function_handler::_M_invoke
(_M_functor, &__args#0);

   [local count: 1073312329]:
  __args#0 ={v} {CLOBBER(eol)};
  _24 = MEM[(struct _Function_base *)]._M_manager;
  if (_24 != 0B)
goto ; [70.00%]
  else
goto ; [30.00%]

   [local count: 751318634]:
  _24 ([(struct _Function_base *)]._M_functor, [(struct
_Function_base *)]._M_functor, 3);

   [local count: 1073312329]:
  h ={v} {CLOBBER};
  h ={v} {CLOBBER(eol)};
  return _12;

   [count: 0]:
:
  _27 = MEM[(struct _Function_base *)]._M_manager;
  if (_27 != 0B)
goto ; [0.00%]
  else
goto ; [0.00%]

   [count: 0]:
  _27 ([(struct _Function_base *)]._M_functor, [(struct
_Function_base *)]._M_functor, 3);

   [count: 0]:
  h ={v} {CLOBBER};
  _42 = __builtin_eh_pointer (2);
  __builtin_unwind_resume (_42);

}

ipa-prop fails to track the pointer passed around:

IPA function summary for int m()/288 inlinable
  global time: 41.256800
  self size:   16
  global size: 41
  min size:   38
  self stack:  32
  global stack:32
size:19.00, time:8.66
size:3.00, time:2.00,  executed if:(not inlined)
  calls:
std::function::~function()/286 inlined
  freq:0.00
  Stack frame offset 32, callee self size 0
  std::_Function_base::~_Function_base()/71 inlined
freq:0.00
Stack frame offset 32, callee self size 0
indirect call loop depth: 0 freq:0.00 size: 6 time: 18
std::function::~function()/404 inlined
  freq:1.00
  Stack frame offset 32, callee self size 0
  std::_Function_base::~_Function_base()/405 inlined
freq:1.00
Stack frame offset 32, callee self size 0
indirect call loop depth: 0 freq:0.70 size: 6 time: 18
_Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const
[with _Res = int; _ArgTypes = {int}]/304 inlined
  freq:1.00
  Stack frame offset 32, callee self size 0
  void std::__throw_bad_function_call()/374 function body not available
freq:0.00 loop depth: 0 size: 1 time: 10
  _M_empty.isra/384 inlined 
freq:1.00
Stack frame offset 32, callee self size 0
  indirect call loop depth: 0 freq:1.00 size: 6 time: 18
std::function<_Res(_ArgTypes ...)>::function(_Functor&&) [with _Functor =
int (&)(int); _Constraints = void; _Res = int; _ArgTypes = {int}]/302 inlined 
  freq:1.00
  Stack frame offset 32, callee self size 0
  std::function<_Res(_ArgTypes ...)>::function(_Functor&&) [with _Functor =
int (&)(int); _Constraints = void; _Res = int; _ArgTypes = {int}]/375 inlined
freq:0.33
Stack frame offset 32, callee self size 0
static void
std::_Function_base::_Base_manager<_Functor>::_M_init_functor(std::_Any_data&,
_Fn&&) [with _Fn = int (&)(int); _Functor = int (*)(int)]/310 inlined
  freq:0.33
  Stack frame offset 32, callee self size 0
  _M_create.isra/383 inlined
freq:0.33
Stack frame offset 32, callee self size 0
void* std::_Any_data::_M_access()/388 inlined
  freq:0.33
  Stack frame offset 32, callee self size 0
operator new.isra/386 inlined
  freq:0.33
  Stack frame offset 32, callee self size 0
  static bool
std::_Function_base::_Base_manager<_Functor>::_M_not_empty_function(_Tp*) [with
_Tp = int(int); _Functor = int (*)(int)]/308 inlined
freq:1.00
Stack frame offset 32, callee self size 0
  constexpr std::_Function_base::_Function_base()/299 inlined
freq:1.00
Stack frame offset 32, callee self size 0

[Bug c++/111512] GCC's __builtin_memcpy can trigger ADL

2023-09-25 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111512

--- Comment #3 from Jonathan Wakely  ---
The library has a workaround, but the front end still does unwanted ADL for
__builtin_memcpy (and probably other built-ins).

[Bug libstdc++/111511] Incorrect ADL in std::to_array in GCC 11/12/13

2023-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111511

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:77cf3773021b0a20d89623e09d620747a05588ec

commit r14-4252-g77cf3773021b0a20d89623e09d620747a05588ec
Author: Jonathan Wakely 
Date:   Thu Sep 21 09:14:57 2023 +0100

libstdc++: Prevent unwanted ADL in std::to_array [PR111512]

As noted in PR c++/111512, GCC does ADL for __builtin_memcpy if it is
unqualified, which can cause errors for template argument types which
cannot be completed.

Casting the memcpy arguments to void* prevents ADL from considering the
problem type.

libstdc++-v3/ChangeLog:

PR libstdc++/111511
PR c++/111512
* include/std/array (to_array): Cast memcpy arguments to void*.
* testsuite/23_containers/array/creation/111512.cc: New test.
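
A small sketch of the workaround described in the commit message, outside of std::to_array. The names `Incomplete`, `Holder` and `copy_pointers` are hypothetical; the point is only that after the casts the call to __builtin_memcpy sees plain void pointers, so argument-dependent lookup has no class type to examine.

```cpp
#include <cassert>
#include <cstddef>

struct Incomplete;  // deliberately never defined

// Holder<Incomplete> can be named and pointed to, but not completed.
template <class T> struct Holder { T value; };

// Copies n pointers from src to dst. Casting both arguments to void*
// keeps T out of the argument types of the built-in call.
template <class T>
void copy_pointers(T** dst, T* const* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    __builtin_memcpy(static_cast<void*>(dst),
                     static_cast<const void*>(src),
                     n * sizeof(T*));
}
```

Note that the caller has the same problem one level up: an unqualified `copy_pointers(dst, src, n)` with `Holder<Incomplete>*` arrays would itself perform ADL and may try to complete `Holder<Incomplete>`, so the call should be qualified or parenthesized, e.g. `(copy_pointers)(dst, src, n)`.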

[Bug c++/111512] GCC's __builtin_memcpy can trigger ADL

2023-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111512

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:77cf3773021b0a20d89623e09d620747a05588ec

commit r14-4252-g77cf3773021b0a20d89623e09d620747a05588ec
Author: Jonathan Wakely 
Date:   Thu Sep 21 09:14:57 2023 +0100

libstdc++: Prevent unwanted ADL in std::to_array [PR111512]

As noted in PR c++/111512, GCC does ADL for __builtin_memcpy if it is
unqualified, which can cause errors for template argument types which
cannot be completed.

Casting the memcpy arguments to void* prevents ADL from considering the
problem type.

libstdc++-v3/ChangeLog:

PR libstdc++/111511
PR c++/111512
* include/std/array (to_array): Cast memcpy arguments to void*.
* testsuite/23_containers/array/creation/111512.cc: New test.

[Bug tree-optimization/110982] (unsigned)(signed_char) != (unsigned)-1 is never changed back into signed_char != -1

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110982

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-09-25

--- Comment #2 from Andrew Pinski  ---
Another case where int_fits_type_p use causes a missed optimization is:
```
unsigned f(int a)
{
unsigned t = a;
if (a == -1)
return t;
return 0;
}
```

This should be caught in phiopt2 but currently is not due to the
int_fits_type_p usage.

I noticed this in PR 110131.
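
For illustration, a sketch comparing the function above with a hand-folded form. `f_folded` and `check_f_agrees` are hypothetical names, and the folded form is only one shape the optimized result could take, not what phiopt2 is known to emit.

```cpp
#include <cassert>
#include <climits>

// Function from the comment above.
unsigned f(int a) {
    unsigned t = a;
    if (a == -1)
        return t;
    return 0;
}

// Hand-folded form: on the taken branch a is known to be -1, so the
// converted value t is the constant UINT_MAX.
unsigned f_folded(int a) {
    return a == -1 ? UINT_MAX : 0u;
}

// Spot-checks that both forms agree on boundary values.
void check_f_agrees() {
    const int samples[] = {INT_MIN, -2, -1, 0, 1, INT_MAX};
    for (int a : samples)
        assert(f(a) == f_folded(a));
}
```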

[Bug target/111570] -march=generic prints error

2023-09-25 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111570

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2023-09-25
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html is very clear:

"There is no -march=generic option because -march indicates the instruction set
the compiler can use, and there is no generic instruction set applicable to all
processors. In contrast, -mtune indicates the processor (or, in this case,
collection of processors) for which the code is optimized."

This is just a bug in the list of valid arguments printed.

[Bug target/111584] [aarch64] Redundant movprfx with ptrue

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111584

--- Comment #1 from Andrew Pinski  ---
Created attachment 55986
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55986&action=edit
Full testcase

`-march=armv8.2-a+sve -O2 -msve-vector-bits=256`

[Bug tree-optimization/111583] [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-09-25

--- Comment #1 from Andrew Pinski  ---
Confirmed.

The problem is a latent bug in ldist.
It turns:
```
   [local count: 955630224]:
  a_5 = a_4 + 1;
  a.4_6 = (char *) a_4;
  *a.4_6 = 0;

   [local count: 1073741824]:
  # a_4 = PHI 
  # j_7 = PHI 
  j_8 = j_7 + 18446744073709551615;
  if (j_7 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]
```

Into:
```
  a_2 = (long int) k_1(D);
  j_3 = (long unsigned int) k_1(D);
  _23 = (sizetype) k_1(D);
  _25 = (char *) a_2;
  __builtin_memset (_25, 0, _23);
```

Which then basically says k!=0 as _25 can't be a null pointer.
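
A source-level sketch of the hazard, with hypothetical helper names. It assumes the usual rule that the pointer passed to memset is treated as non-null even when the length is zero, which is what lets later passes conclude k != 0 in the dump above. The loop form has no such implication, so a hand-written replacement needs a guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Byte-store loop in the shape of the testcase: with n == 0 it never
// dereferences p, so p is allowed to be null.
inline void zero_loop(char* p, std::size_t n) {
    while (n--)
        *p++ = 0;
}

// Hand-written memset equivalent. The n != 0 guard keeps a possibly-null p
// away from memset, so nothing can be inferred about p when n is zero.
inline void zero_memset(char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    if (n != 0)
        std::memset(p, 0, n);
}
```

Both helpers accept `(nullptr, 0)`; an unguarded memset call in `zero_memset` would not be equivalent to the loop for that input.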

[Bug c/111584] New: [aarch64] Redundant movprfx with ptrue

2023-09-25 Thread zhongyunde at huawei dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111584

Bug ID: 111584
   Summary: [aarch64] Redundant movprfx with ptrue
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

* test: https://gcc.godbolt.org/z/E6Eez81jh
```
#include <arm_sve.h>

typedef svfloat32_t fvec32 __attribute__((arm_sve_vector_bits(256)));

typedef svfloat32_t __m256_;

 __m256_ _mm256_mul_ps2_z(__m256_ a, __m256_ b)
{
 __m256_ res;
 res = svmul_f32_z(svptrue_b32(), a, b);
 return res;
}
```

* LLVM generates the same output for _mm256_mul_ps2_x and _mm256_mul_ps2_z,
while GCC's output for _mm256_mul_ps2_z is less efficient

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500

--- Comment #7 from Luke  ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > This is most likely a dup of bug 104773.
> 
> Or of bug 3507.

i concur...
but i do not know which one to choose...
they both look the same to me... somehow...

[Bug tree-optimization/111583] [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug tree-optimization/111583] New: [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507

2023-09-25 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583

Bug ID: 111583
   Summary: [13/14 Regression] Wrong code at -Os on
x86_64-linux-gnu since r13-3281-g6cc3394507
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: shaohua.li at inf dot ethz.ch
CC: amacleod at redhat dot com
  Target Milestone: ---

gcc at -Os produced the wrong code.

Bisected to r13-3281-g6cc3394507

Compiler explorer: https://godbolt.org/z/8GM9YvMKb

$ cat a.c
int printf(const char *, ...);
int b, c, d;
char e;
short f;
const unsigned short **g;
char h(char k) {
  if (k)
return '0';
  return 0;
}
int l() {
  b = 0;
  return 1;
}
static short m(unsigned k) {
  const unsigned short *n[65];
  g = &n[4];
  k || l();
  long a = k;
  char i = 0;
  unsigned long j = k;
  while (j--)
*(char *)a++ = i;
  c = h(d);
  f = k;
  return 0;
}
int main() {
  long o = (e < 0) << 5;
  m(o);
  printf("%d\n", f);
}
$
$ gcc -O0 -fsanitize=address,undefined a.c && ./a.out
0
$ gcc -Os a.c && ./a.out
32
$

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #5 from Andrew Pinski  ---
(In reply to Mathieu Malaterre from comment #4)
> > So the original
> > (upstream) code is somewhat buggy as it relies on lazy init for global var.
> 
> Those global vars are in different namespace, I actually fail to understand
> why the definition with ",cpu=power10" gets pulled in...

Because `#pragma GCC target targets_str` is global state and unrelated to
namespace ...

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #4 from Mathieu Malaterre  ---
> So the original
> (upstream) code is somewhat buggy as it relies on lazy init for global var.

Those global vars are in different namespace, I actually fail to understand why
the definition with ",cpu=power10" gets pulled in...

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500
Bug 111500 depends on bug 111581, which changed state.

Bug 111581 Summary: [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

[Bug rtl-optimization/60749] combine is overly cautious when operating on volatile memory references

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60749

Luke  changed:

   What|Removed |Added

 CC||cptarse-luke at yahoo dot com

--- Comment #3 from Luke  ---
*** Bug 111581 has been marked as a duplicate of this bug. ***

[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581

Luke  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Luke  ---
nope... i didn't know bug 60749... and it does not happen, when i omit the
"volatile"...

*** This bug has been marked as a duplicate of bug 60749 ***

[Bug tree-optimization/110386] [11/12/13 Regression] ICE with ABSU in backprop

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110386

Andrew Pinski  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression] ICE with ABSU in backprop|[11/12/13 Regression] ICE with ABSU in backprop
  Known to work||14.0

--- Comment #9 from Andrew Pinski  ---
Fixed on the trunk so far.

[Bug tree-optimization/110386] [11/12/13/14 Regression] ICE with ABSU in backprop

2023-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110386

--- Comment #8 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f

commit r14-4249-g2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f
Author: Andrew Pinski 
Date:   Sat Sep 23 21:53:09 2023 -0700

Fix PR 110386: backprop vs ABSU_EXPR

The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions so this does not cause any missed optimizations either.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110386

gcc/ChangeLog:

* gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr110386-1.c: New test.
* gcc.c-torture/compile/pr110386-2.c: New test.
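
To make the type change concrete, here is a C++ sketch of the semantics the commit message attributes to ABSU_EXPR: an absolute value whose result is in the unsigned type. The function name `absu` is hypothetical and this is a source-level analogue, not GCC's internal representation.

```cpp
#include <climits>

// Absolute value computed in the unsigned type. Unlike a signed abs, the
// result is well-defined for INT_MIN because unsigned arithmetic wraps.
constexpr unsigned absu(int x) {
    return x < 0 ? 0u - static_cast<unsigned>(x)
                 : static_cast<unsigned>(x);
}

static_assert(absu(-5) == 5u, "magnitude of a negative value");
static_assert(absu(INT_MIN) == static_cast<unsigned>(INT_MAX) + 1u,
              "INT_MIN has a representable unsigned magnitude");
```

Because the result type differs from the operand type, treating the operation as a pure sign-stripping step (as backprop did before the fix) mixes up signed and unsigned values.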

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #3 from Mathieu Malaterre  ---
For reference:

*
https://github.com/google/highway/commit/fea3dba9cfec3a74ddcd8ecac3a5d4d8429191e4

[Bug target/111522] Different code path for static initialization with flto

2023-09-25 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

--- Comment #2 from Mathieu Malaterre  ---
(In reply to Andrew Pinski from comment #1)
> I think this is just broken code.
> 
> It does:
> #define HWY_BEFORE_NAMESPACE()                      \
>   HWY_PUSH_ATTRIBUTES("altivec,vsx,power8-vector"   \
>   ",cpu=power10")
> 
> But does not do a pop before the main function.
> 
> And then you are testing on power8 which obviously does not have all of the
> instructions of power10 ...
> Why it works without -flto is just pure accident not using the instructions
> that are not in power8.
> 
> Anyways I suspect this is too much reduced testcase. So you might need to
> provide the original one.

I reported this one after reading #111380. Honestly there is no "wrong-code"
here. The LTO case is simply an eager init of a global variable, while the
non-LTO case is a lazy loading of the global var. So the original (upstream)
code is somewhat buggy as it relies on lazy init for global var.

Could someone please confirm that eager init of global vars is expected in the
LTO case? Then we could just close this one.

[Bug ada/111578] GNAT ada.textio.setline gives incorrect result

2023-09-25 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111578

Eric Botcazou  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 CC||ebotcazou at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Eric Botcazou  ---
No, see the A.10.5 clause of the Ada Reference Manual.

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> This is most likely a dup of bug 104773.

Or of bug 3507.

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500

--- Comment #5 from Andrew Pinski  ---
(In reply to Luke from comment #4)
> the a.i file for example #1a is:
> # 1 "a.c"
> # 1 "/tmp//"
> # 1 ""
> # 1 ""
> # 1 "a.c"
> void artiSUBS() {
>  for (int i=100; i>0; i--)
>   *(volatile int*)0xE000E014 = i;
> }
> 
> the command-line was:
> > arm-none-eabi-gcc -save-temps -S a.c -O3 -g -mcpu=cortex-m0plus -mthumb 
> > -Wall --specs=nosys.specs -nostdlib -fdata-sections -ffunction-sections 
> > -ffreestanding -Winline
> 
> and the resulting a.s file contains that subs/cmp sequence...

This is most likely a dup of bug 104773.

[Bug target/111582] [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> Fixed in GCC 10.

artiSP:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movs    r2, #14
@ sp needed
ldr     r3, .L3
ldrb    r0, [r3]
bics    r0, r2
subs    r2, r2, #10
orrs    r0, r2
strb    r0, [r3]
lsls    r0, r0, #16
bx      lr
.L4:
.align  2
.L3:
.word   -536870742

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500
Bug 111500 depends on bug 111582, which changed state.

Bug 111582 Summary: [arm-none-eabi-gcc] / suboptimal optimization / bitfield / 
superfluous stack write
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/111582] [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
  Known to work||10.4.0, 14.0

--- Comment #1 from Andrew Pinski  ---
Fixed in GCC 10.

[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2023-09-25
 Ever confirmed|0   |1
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60749

--- Comment #2 from Andrew Pinski  ---
Is there a testcase without pointers to a volatile location?

If not then this is a dup of bug 60749.

[Bug ada/111579] gnatpp error at start

2023-09-25 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111579

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID
 CC||ebotcazou at gcc dot gnu.org

--- Comment #1 from Eric Botcazou  ---
gnatpp is not part of the GNU Compiler Collection.

[Bug target/40499] [missed optimization] branch to return not threaded on thumb

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40499

Andrew Pinski  changed:

   What|Removed |Added

 CC||cptarse-luke at yahoo dot com

--- Comment #7 from Andrew Pinski  ---
*** Bug 111580 has been marked as a duplicate of this bug. ***

[Bug target/111580] [arm-none-eabi-gcc] / suboptimal optimization / b.n to bx lr

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111580

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 40499.

*** This bug has been marked as a duplicate of bug 40499 ***

[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)

2023-09-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500
Bug 111500 depends on bug 111580, which changed state.

Bug 111580 Summary: [arm-none-eabi-gcc] / suboptimal optimization / b.n to bx lr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111580

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581

--- Comment #1 from Luke  ---
In the unsigned case, furthermore, the ldrh has already cleared the high
half-word, so a uxth would be superfluous
even if a str followed...
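
The point about `ldrh` can be illustrated in C. This is a sketch under the assumption that the load in question reads a 16-bit unsigned object; `load_u16` is a hypothetical helper, not code from the report. Widening an unsigned 16-bit value gives a result whose upper 16 bits are already zero, so truncating it to 16 bits again leaves it unchanged.

```c
#include <stdint.h>

/* Hypothetical helper, not from the bug report: reads a 16-bit unsigned
   value from memory and widens it to 32 bits. */
uint32_t load_u16(const uint16_t *p)
{
    uint32_t wide = *p;      /* widening an unsigned 16-bit value zero-extends */
    return (uint16_t)wide;   /* truncating again cannot change the value */
}
```

The cast in the return statement stands for the extra zero-extension: it is a no-op at the source level, which is why an explicit instruction for it would be redundant after a zero-extending load.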

[Bug target/111582] New: [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write

2023-09-25 Thread cptarse-luke at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582

Bug ID: 111582
   Summary: [arm-none-eabi-gcc] / suboptimal optimization /
bitfield / superfluous stack write
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cptarse-luke at yahoo dot com
  Target Milestone: ---

When I use a struct with a bitfield,
GCC writes to the stack without ever reading the value back:

> arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-none-eabi/9.3.0/lto-wrapper
Target: arm-none-eabi
Configured with: ../configure --disable-decimal-float --disable-libffi
--disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp
--disable-libstdcxx-pch --disable-libstdc__-v3 --disable-nls --disable-shared
--disable-threads --disable-tls --disable-werror --enable-__cxa_atexit
--enable-c99 --enable-gnu-indirect-function --enable-interwork
--enable-languages=c,c++ --enable-long-long --enable-multilib --enable-plugins
--host= --libdir=/usr/lib --libexecdir=/usr/lib --prefix=/usr
--target=arm-none-eabi --with-gmp --with-gnu-as --with-gnu-ld
--with-headers=/usr/arm-none-eabi/include --with-host-libstdcxx='-static-libgcc
-Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-isl --with-libelf --with-mpc
--with-mpfr --with-multilib-list=rmprofile
--with-native-system-header-dir=/include --with-newlib
--with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/usr/arm-none-eabi
--with-system-zlib
Thread model: single
gcc version 9.3.0 (GCC)
# arm-none-eabi-gcc -save-temps -S a.c -O3 -g -mcpu=cortex-m0plus -mthumb -Wall
--specs=nosys.specs -nostdlib -fdata-sections -ffunction-sections
-ffreestanding -Winline
> cat a.i
# 1 "a.c"
# 1 "/tmp//"
# 1 ""
# 1 ""
# 1 "a.c"

typedef unsigned char u8;
typedef unsigned int u32;
extern int fatal();
__attribute__((always_inline)) inline u32 lsb(const u8 l) { return (1U<> (i*8);
 if (R.rs || msk==~R.msk) return (((volatile u8*)R.a)[i] = v) << (i*8);
 else if (R.v==~R.msk) return (((volatile u8*)R.a)[i] |= v) << (i*8);
 return (((volatile u8*)R.a)[i] = (((volatile u8*)R.a)[i] &
(R.msk>>(i*8))) | v) << (i*8);
  }
 return 0;
}
__attribute__((always_inline)) inline Reg GU(Reg R, u32 A, u32 N, u8 o, u8 w,
u32 v) {
   const u32 msk=~(lsb(w)<
> cat a.s
artiSP:
sub sp, sp, #16
mov r2, sp
movs r3, #2
strb r3, [r2, #12]
...
add sp, sp, #16
bx  lr

I compile it on an
Intel(R) Pentium(R) Silver J5040 CPU @ 2.00GHz
running Void Linux (kernel: 6.3.13_1)
for an STM32G030.

[Bug middle-end/111548] RISC-V Vector: ICE in validate_change_or_fail (vsetvl pass)

2023-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111548

--- Comment #1 from CVS Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:9d5f20fc4a6b3254d2d379309193da4be2747987

commit r14-4248-g9d5f20fc4a6b3254d2d379309193da4be2747987
Author: Juzhe-Zhong 
Date:   Sun Sep 24 11:17:01 2023 +0800

RISC-V: Fix AVL/VL bug of VSETVL PASS[PR111548]

This patch fixes the incorrect fetch of the AVL/VL reg in VSETVL PASS.

C/C++ regression passed.

But gfortran didn't run yet. I am still finding a way to run it.

Will commit it when I pass the fortran regression.

PR target/111548

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (earliest_pred_can_be_fused_p):
Bugfix

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111548.c: New test.