[Bug target/110435] New: ICE in in convert_move, at expr.cc:297 on Aarch64 with -Ofast

2023-06-27 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110435

Bug ID: 110435
   Summary: ICE in in convert_move, at expr.cc:297 on Aarch64 with
-Ofast
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux
Target: aarch64-linux

Using a cross compiler (revision r14-2079-g9326a49c9e9d63) configured with

  /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/configure
--enable-languages=c,c++,fortran,rust,m2 --disable-bootstrap
--disable-libsanitizer --disable-multilib --enable-checking=release
--prefix=/home/worker/cross --target=aarch64-linux-gnu
--with-as=/usr/bin/aarch64-suse-linux-as

to compile our own testcase gcc/testsuite/gfortran.dg/pr68251.f90 with -Ofast
results in an ICE:

$ ~/cross/bin/aarch64-linux-gnu-gfortran gcc/testsuite/gfortran.dg/pr68251.f90
-Ofast -o /tmp/aaa.out
during RTL pass: expand
gcc/testsuite/gfortran.dg/pr68251.f90:1043:57:

 1043 | kbc((mc-1)*15+mb) = kbc((mc-1)*15+mb) - ks_bc
  | ^
internal compiler error: in convert_move, at expr.cc:297
0x74393a convert_move(rtx_def*, rtx_def*, int)
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/expr.cc:297
0xa5ffa5 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/expr.cc:9368
0x961290 expand_gimple_stmt_1
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:3983
0x961290 expand_gimple_stmt
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:4044
0x9661c7 expand_gimple_basic_block
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:6096
0x967e2e execute
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:6831
Please submit a full bug report, with preprocessed source (by using
-freport-bug).

[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

--- Comment #5 from Iain Sandoe  ---
(In reply to Sascha Scandella from comment #4)
> I found also this issue regarding init_priority:
> https://github.com/llvm/llvm-project/issues/15363

So that is the intentional behaviour (upstream clang definitely used to reject
it) - as noted it actually works fine with LTO too (or within one module if
not).

I was investigating whether we could do the work in collect2, but that gets
quite complex when considering the interactions between LTO and non-LTO
objects.

For now, IMO, we should adopt a fix of the nature Jonathan suggests and then it
will "just work" if/when we get init prio on Darwin.

In slower time, we might consider the option of following clang's behaviour for
Darwin (possibly with a warning that it does not work between TUs).

[Bug tree-optimization/110420] [12/13/14 Regression] internal compiler error: in gimple_redirect_edge_and_branch due to simple_dce_from_worklist removing `asm goto`

2023-06-27 Thread jbglaw--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

--- Comment #7 from Jan-Benedict Glaw  ---
Confirmed: This patch fixes the issue for me with the Linux PPC builds.

[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread sascha.scandella at dentsplysirona dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

--- Comment #4 from Sascha Scandella  ---
I found also this issue regarding init_priority:
https://github.com/llvm/llvm-project/issues/15363

[Bug tree-optimization/110434] New: tree-nrv introduces incorrect CLOBBER(eol)

2023-06-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434

Bug ID: 110434
   Summary: tree-nrv introduces incorrect CLOBBER(eol)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The tree-nrv pass may introduce incorrect CLOBBER(eol) of the form
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

One example of this can be seen by compiling gcc.c-torture/execute/921204-1.c
for x86 using the flags "-O -m32", where it changes the IR

  union bu o;
  ...
  o = i;
  MEM[(union  *)&o].b18 = _11;
  MEM[(union  *)&o].b20 = _11;
  <retval> = o;
  o ={v} {CLOBBER(eol)};
  return <retval>;

to just use <retval> instead of o

  union bu o [value-expr: <retval>];
  ...
  <retval> = i;
  MEM[(union  *)&<retval>].b18 = _11;
  MEM[(union  *)&<retval>].b20 = _11;
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

so the CLOBBER(eol) now refers to <retval>.
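
For orientation, the source-level pattern that a named-return-value pass
targets looks roughly like the sketch below: one local is built up and then
returned by value, so the local can be made an alias of the return slot. The
type `Pair` and the function `with_second` are hypothetical stand-ins, not the
union from 921204-1.c, and whether the rewrite actually fires depends on the
ABI and the type.

```cpp
#include <cassert>

// Hypothetical aggregate; the real testcase uses a union with bit-fields.
struct Pair {
  unsigned first;
  unsigned second;
};

// 'o' is the only object ever returned, so a named-return-value rewrite may
// let it live directly in the caller's return slot. After such a rewrite, an
// end-of-life marker that used to name 'o' would name the return slot
// instead, which is the situation the report describes.
Pair with_second(Pair i, unsigned v) {
  Pair o;
  o = i;
  o.second = v;
  return o;
}
```

The point of the sketch is only the shape "copy in, modify, return the same
local"; the value returned must still be intact when the caller reads it.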

[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

--- Comment #3 from Iain Sandoe  ---


interesting - Apple clang does seem to accept __attribute__((init_priority))
but it still does not actually work **between TUs** unless LTO is engaged. 

Actually GCC for Darwin could adopt a similar scheme (perhaps we should to be
*** compatible).

The issue is not whether GCC can do it - it is whether the linker (ld64)
honours the ordering information and can generate a new global initialiser
(which it seems still not to).

AFAIR upstream clang rejects the attribute for Darwin.



@Jonathan is there a patch for that proposed solution?

[Bug tree-optimization/110428] missed CSE with VLA vectors

2023-06-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

--- Comment #4 from JuzheZhong  ---
(In reply to JuzheZhong from comment #3)
> Hi, I think for VLS vectors, we should be able to enhance CSE for the
> following case:
> 
> #include 
> 
> void __attribute__((noinline,noclone))
> foo (int *out, int *res, unsigned int n)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < n+16; ++i)
> {
>   if (mask[i])
> out[i] = i;
> }
>   int o0 = out[0];
>   int o7 = out[7];
>   int o14 = out[14];
>   int o15 = out[15];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> Since n is an unsigned int and the loop bound is i < n + 16, ARM SVE fails to
> CSE. Is that right?


Maybe that case is too complicated, so I tried the following one:


void __attribute__((noinline,noclone))
foo (int *out, int *res, unsigned int n)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
{
  if (mask[i])
out[i] = i;
}
  for (i = 16; i < n + 16; ++i)
{
  if (mask[i])
out[i] = i;
}
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

This case is simpler, so it should be CSEd? I tried it on SVE, and GCC failed to CSE.

[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread sascha.scandella at dentsplysirona dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

--- Comment #2 from Sascha Scandella  ---
> Still libstdc++ ;-)

True that ;-)

> Patrick, we talked about this and IIRC your suggestion was to move the
> __has_attribute check into configure, so that it depends on GCC, not on
> whichever compiler happens to include <iostream> later.

I think this would also be a solution. Would this then be included in a future
GCC 13.2? Took quite a while until I figured out the reason for the segfault.

Just for completeness' sake, I also posted it on one of the brew repositories
for GCC. This could probably also be patched on macOS for GCC 13.1.
https://github.com/iains/gcc-13-branch/issues/6

[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2023-06-27
 Status|UNCONFIRMED |NEW
   Keywords||ABI
 CC||ppalka at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
(In reply to Sascha Scandella from comment #0)
> This leads to a segmentation fault with a simple sample application when
> using clang-16 in combination with stdlibc++.

It's libstdc++

> Would it be possible to change the #if statement such that it would also
> work on macOS when using clang in combination with the stdlibc++?

Still libstdc++ ;-)

> #if !__has_attribute(__init_priority__)
>   static ios_base::Init __ioinit;
> #elif defined(_GLIBCXX_SYMVER_GNU)
>   __extension__ __asm (".globl _ZSt21ios_base_library_initv");
> #endif

Patrick, we talked about this and IIRC your suggestion was to move the
__has_attribute check into configure, so that it depends on GCC, not on
whichever compiler happens to include <iostream> later.
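
The difference between the two checks can be sketched as follows. The macro
`MYLIB_BUILT_WITH_INIT_PRIORITY` is a hypothetical name for a value that
configure would record when the library is built; it is not claimed to be the
macro libstdc++ uses, and both helper functions are illustrative only.

```cpp
#include <cassert>

// Fallback for compilers that do not provide __has_attribute.
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

// Hypothetical: written by configure when the library itself was built,
// e.g. 0 for a GCC targeting Darwin. Hard-coded here for illustration.
#define MYLIB_BUILT_WITH_INIT_PRIORITY 0

// Include-time check: answers "does the compiler reading this header support
// the attribute?", which may differ from the compiler that built the library.
inline bool header_compiler_has_init_priority() {
#if __has_attribute(__init_priority__)
  return true;
#else
  return false;
#endif
}

// Configure-time check: answers "did the library set up its own initializer?"
// If it did not, the header has to fall back to a per-TU static object.
inline bool header_needs_static_init() {
  return !MYLIB_BUILT_WITH_INIT_PRIORITY;
}
```

With the configure-time value, the header's decision no longer changes when a
different compiler includes it, which is the property the suggestion is after.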

[Bug tree-optimization/110428] missed CSE with VLA vectors

2023-06-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

JuzheZhong  changed:

   What|Removed |Added

 CC||juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong  ---
Hi, I think for VLS vectors, we should be able to enhance CSE for the
following case:

#include 

void __attribute__((noinline,noclone))
foo (int *out, int *res, unsigned int n)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < n+16; ++i)
{
  if (mask[i])
out[i] = i;
}
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

Since n is an unsigned int and the loop bound is i < n + 16, ARM SVE fails to
CSE. Is that right?

[Bug middle-end/110431] Incorrect disambiguation of wide accesess from store-merging or SLP

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-06-27
 CC||rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
I guess the provenance people would say this violates some rules since
pa + 1 is used to access 'b'.  GCC itself is prone to introduce such
issues when propagating equivalences though.

[Bug c/110430] Fail to CSE for LEN_MASK_STORE

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430

--- Comment #1 from Richard Biener  ---
*** Bug 110428 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/110428] missed CSE with VLA vectors

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Richard Biener  ---
.

*** This bug has been marked as a duplicate of bug 110430 ***

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #22 from rguenther at suse dot de  ---
On Tue, 27 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
> 
> --- Comment #21 from Alexander Monakov  ---
> (In reply to rguent...@suse.de from comment #19)
> > But the size argument doesn't have anything to do with TBAA (and
> > may_alias is about TBAA).  I don't think we have any way to circumvent
> > C object access rules.  That is, for example, with -fno-strict-aliasing
> > the following isn't going to work.
> > 
> > int a;
> > int b;
> > 
> > int main()
> > {
> >   a = 1;
> >   b = 2;
> > >   if (&a + 1 == &b) // equality compare of unrelated pointers OK
> > > {
> > >   long x = *(long *)&a; // access outside of 'a' not OK
> >   if (x != 0x00010002)
> > abort ();
> > }
> > }
> > 
> > there's no command-line flag or attribute to form a pointer
> > to an object composing 'a' and 'b' besides changing how the
> > storage is declared.
> 
> But store-merging and SLP can introduce a wide long-sized access where on
> source level you had two adjacent loads or even memcpy's, so we really seem to
> have a problem here and might need to be able to annotate types or individual
> accesses as "may-alias-with-oob-ok" in the IR: PR 110431.

But 'a' and 'b' above are not declared adjacent; they are only verified to be
adjacent at runtime.  The only thing we do IIRC is use wider loads to access
properly aligned storage as we know the load wouldn't trap.  That
can lead us to the case you pointed out originally - we load stuff
we will ignore, but that might cause alias disambiguation to disambiguate
against a store of the original non-widened size.

[Bug analyzer/110433] New: ASAN reports mismatching new/delete when compiling analyzer testcases

2023-06-27 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110433

Bug ID: 110433
   Summary: ASAN reports mismatching new/delete when compiling
analyzer testcases
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: dmalcolm at gcc dot gnu.org
Blocks: 86656
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

With a bootstrapped compiler configured with
--with-build-config=bootstrap-asan I get errors about new/delete
mismatching types when compiling testcases:

  - gcc.dg/analyzer/out-of-bounds-diagram-13.c
  - gcc.dg/analyzer/out-of-bounds-diagram-15.c
  - gcc.dg/analyzer/out-of-bounds-diagram-4.c
  - gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c
  - gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c and
  - gcc.dg/analyzer/out-of-bounds-diagram-7.c

The errors all look like:

Executing on host: /home/worker/buildworker/tiber-gcc-asan/objdir/gcc/xgcc
-B/home/worker/buildworker/tiber-gcc-asan/objdir/gcc/ 
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c
   -fdiagnostics-plain-output   -fanalyzer -Wanalyzer-too-complex
-fanalyzer-call-summaries -fdiagnostics-text-art-charset=unicode -S -o
out-of-bounds-diagram-13.s(timeout = 300)
spawn -ignore SIGHUP /home/worker/buildworker/tiber-gcc-asan/objdir/gcc/xgcc
-B/home/worker/buildworker/tiber-gcc-asan/objdir/gcc/
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c
-fdiagnostics-plain-output -fanalyzer -Wanalyzer-too-complex
-fanalyzer-call-summaries -fdiagnostics-text-art-charset=unicode -S -o
out-of-bounds-diagram-13.s
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:
In function 'test_non_ascii':
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3:
warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:8:8:
note: (1) capacity: 9 bytes
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3:
note: (2) out-of-bounds write at byte 9 but 'buf' ends at byte 9
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3:
note: write of 1 byte to beyond the end of 'buf'
/home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3:
note: valid subscripts for 'buf' are '[0]' to '[8]'
=
==58507==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x50d00a00 in
thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   136 bytes;
  size of the deallocated type: 104 bytes.
#0 0x83eba8 in operator delete(void*, unsigned long)
/home/worker/buildworker/tiber-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:164
#1 0x51e6e45 in
std::default_delete::operator()(ana::svalue_spatial_item*)
const
/home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:99
#2 0x51e6e45 in std::unique_ptr >::~unique_ptr()
/home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:404
#3 0x51e6e45 in ana::access_diagram_impl::~access_diagram_impl()
/home/worker/buildworker/tiber-gcc-asan/build/gcc/analyzer/access-diagram.cc:1728
#4 0x51e703c in ana::access_diagram_impl::~access_diagram_impl()
/home/worker/buildworker/tiber-gcc-asan/build/gcc/analyzer/access-diagram.cc:1728
#5 0x4e97142 in
std::default_delete::operator()(text_art::widget*) const
/home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:99
#6 0x4e97142 in std::unique_ptr >::~unique_ptr()
/home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:404
--
==58507==HINT: if you don't care about these errors you may set
ASAN_OPTIONS=new_delete_type_mismatch=0
==58507==ABORTING
compiler exited with status 1
PASS: gcc.dg/analyzer/out-of-bounds-diagram-13.c  (test for warnings, line 9)
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c  at line 10 (test for
warnings, line 9)
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c expected multiline pattern
lines 17-42
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c 2 blank line(s) in output
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c (test for excess errors)
Excess errors:
=
==58507==ERROR: AddressSanitizer: new-delete-type-mismatch 
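
As general background (an assumption about this class of error, not a
diagnosis of the analyzer code), ASan's new-delete-type-mismatch check
compares the size passed to sized operator delete with the size that was
allocated. One way to keep the two in agreement when a std::unique_ptr to a
base class owns a derived object is a virtual destructor. The types `Item` and
`WideItem` and the helper `destroy_through_base` below are hypothetical.

```cpp
#include <cassert>
#include <memory>

static int destroyed = 0;  // counts destructor runs for the check below

struct Item {
  virtual ~Item() { ++destroyed; }  // virtual: delete uses the dynamic type
  long payload[4] = {};
};

struct WideItem : Item {
  ~WideItem() override { ++destroyed; }
  long extra[4] = {};  // makes sizeof(WideItem) larger than sizeof(Item)
};

// Owns a WideItem through a base-class unique_ptr and releases it.
// Because ~Item is virtual, destruction goes through WideItem's destructor,
// so the deallocation is performed for the type that was allocated.
inline int destroy_through_base() {
  destroyed = 0;
  {
    std::unique_ptr<Item> p = std::make_unique<WideItem>();
  }
  return destroyed;  // both ~WideItem and ~Item have run
}
```

Without the `virtual`, the same code would delete through the static type
`Item`, which is undefined behaviour and the kind of size disagreement the
sanitizer reports.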

[Bug libstdc++/110432] New: macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16

2023-06-27 Thread sascha.scandella at dentsplysirona dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432

Bug ID: 110432
   Summary: macOS: Segmentation fault when using stdlibc++ from
gcc 13.1 in combination with clang-16
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sascha.scandella at dentsplysirona dot com
  Target Milestone: ---

As you probably all know, GCC 13.1 changed how the global iostream objects are
created. This is described on the official page:

"For C++, construction of the global iostream objects std::cout, std::cin, etc.
is now done inside the standard library, instead of in every source file that
includes the header. This change improves the start-up performance of C++
programs, but it means that code compiled with GCC 13.1 will crash if the
correct version of libstdc++.so is not used at runtime. See the documentation
about using the right libstdc++.so at runtime. Future GCC releases will
mitigate the problem so that the program cannot be run at all with an older
libstdc++.so."

More details can also be found here:
https://developers.redhat.com/articles/2023/04/03/leaner-libstdc-gcc-13

On macOS SUPPORTS_INIT_PRIORITY within gcc is set to 0. This means that the
global iostream object is not initialized and the fallback will be taken (i.e.
static initialization of the iostream object).

The problem is that when the iostream include is used, the expression
__has_attribute(__init_priority__) is true, since clang-16 supports
__init_priority__ and the static initialization is not done. See here:

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/iostream#L78

This leads to a segmentation fault with a simple sample application when using
clang-16 in combination with stdlibc++.

Sample application:

#include <iostream>

int main()
{
  std::cout << "Hello" << std::endl;
}

$HOMEBREW_PREFIX/opt/llvm@16/bin/clang++ \
  -v \
  -stdlib=libstdc++ \
  -stdlib++-isystem $HOMEBREW_PREFIX/opt/gcc@13/include/c++/13 \
  -cxx-isystem $HOMEBREW_PREFIX/opt/gcc@13/include/c++/13/x86_64-apple-darwin22 \
  -L $HOMEBREW_PREFIX/opt/gcc@13/lib/gcc/13/ \
  -L $HOMEBREW_PREFIX/opt/llvm/lib \
  -o test main.cpp
Execute test -> segfault.

➜  ~ ./test
[1]7965 segmentation fault  ./test

Would it be possible to change the #if statement such that it would also work
on macOS when using clang in combination with the stdlibc++?

#if !__has_attribute(__init_priority__)
  static ios_base::Init __ioinit;
#elif defined(_GLIBCXX_SYMVER_GNU)
  __extension__ __asm (".globl _ZSt21ios_base_library_initv");
#endif

Remarks: When compiling with gcc everything works as expected since the
iostream object gets initialized properly with the fallback.

gcc -v

Using built-in specs.
COLLECT_GCC=gcc-13
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/13.1.0/bin/../libexec/gcc/x86_64-apple-darwin22/13/lto-wrapper
Target: x86_64-apple-darwin22
Configured with: ../configure --prefix=/usr/local/opt/gcc
--libdir=/usr/local/opt/gcc/lib/gcc/current --disable-nls
--enable-checking=release --with-gcc-major-version-only
--enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13
--with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr
--with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl
--with-zstd=/usr/local/opt/zstd --with-pkgversion='Homebrew GCC 13.1.0'
--with-bugurl=https://github.com/Homebrew/homebrew-core/issues
--with-system-zlib --build=x86_64-apple-darwin22
--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Homebrew GCC 13.1.0) 

OS: macOS Ventura 13.4 (Intel)

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #21 from Alexander Monakov  ---
(In reply to rguent...@suse.de from comment #19)
> But the size argument doesn't have anything to do with TBAA (and
> may_alias is about TBAA).  I don't think we have any way to circumvent
> C object access rules.  That is, for example, with -fno-strict-aliasing
> the following isn't going to work.
> 
> int a;
> int b;
> 
> int main()
> {
>   a = 1;
>   b = 2;
> >   if (&a + 1 == &b) // equality compare of unrelated pointers OK
> > {
> >   long x = *(long *)&a; // access outside of 'a' not OK
>   if (x != 0x00010002)
> abort ();
> }
> }
> 
> there's no command-line flag or attribute to form a pointer
> to an object composing 'a' and 'b' besides changing how the
> storage is declared.

But store-merging and SLP can introduce a wide long-sized access where on
source level you had two adjacent loads or even memcpy's, so we really seem to
have a problem here and might need to be able to annotate types or individual
accesses as "may-alias-with-oob-ok" in the IR: PR 110431.

[Bug middle-end/110431] New: Incorrect disambiguation of wide accesess from store-merging or SLP

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431

Bug ID: 110431
   Summary: Incorrect disambiguation of wide accesess from
store-merging or SLP
   Product: gcc
   Version: 12.3.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

Inspired by bug 110237 comment 19:

int b, a;

int main()
{
    int *pa = &a, *pb = &b;
    asm("" : "+r"(pa));
    asm("" : "+r"(pb));
    if (pa + 1 == pb) {
        a = 1, b = 2;
        long x;
        __builtin_memcpy(&x, pa, 4);
        __builtin_memcpy(4 + (char *)&x, pa+1, 4);
        return (x - 0x00020001) * 131 >> 32;
    }
}

https://godbolt.org/z/b67zxMv54

On GIMPLE, both store-merging and SLP vectorization are capable of introducing
merged long-sized access in place of individual int-sized memcpy's, which is
then disambiguated against initial stores on the RTL level, leading to a
miscompilation.
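
For contrast, here is a sketch of the same two narrow copies in a form where
a merged 8-byte access stays inside one declared object. The array `pair` and
the helper names are hypothetical, and the sketch assumes a 4-byte int; it
shows the well-defined baseline only, not the miscompiled case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

// Two 4-byte copies, as written at the source level in the report.
inline std::uint64_t load_narrow(const int *p) {
  std::uint64_t x;
  std::memcpy(&x, p, 4);
  std::memcpy(reinterpret_cast<char *>(&x) + 4, p + 1, 4);
  return x;
}

// One 8-byte copy, the shape a merging transformation may produce.
inline std::uint64_t load_wide(const int *p) {
  std::uint64_t x;
  std::memcpy(&x, p, 8);
  return x;
}

// Both ints live in one array object, so the wide read stays in bounds and
// must observe the same bytes as the two narrow reads.
inline bool merged_read_matches() {
  int pair[2] = {1, 2};
  return load_narrow(pair) == load_wide(pair);
}
```

In the report the two ints are separate globals that only happen to be
adjacent, which is why a later pass can wrongly conclude that the wide load
cannot overlap the store to the second one.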

[Bug middle-end/110379] Unnecessary copies after early opts

2023-06-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110379

--- Comment #3 from Jan Hubicka  ---
I thought that ADDR_EXPR of a reference is just a fancy way to represent NOP_EXPR
or POINTER_PLUS in today's gimple. How does that affect builtin_object_size? :)

However, I think ipa-sra will eventually also need to handle ADDR_EXPRs that
correspond to a non-zero offset.

[Bug tree-optimization/110414] [14 Regression] Dead Code Elimination Regression since r14-1127-g9e2017ae6ac

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110414

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-06-27

--- Comment #1 from Richard Biener  ---
DOM3 removes the call in GCC 13 and the cause looks pretty similar to PR110413,
we lose a __builtin_unreachable () early.

[Bug tree-optimization/110413] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-1880-g827e208fa64

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110413

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
  Known to work||11.4.0, 12.3.0
   Last reconfirmed||2023-06-27

--- Comment #1 from Richard Biener  ---
GCC 13 eliminates the call in DOM3.  The differences going into that pass are
quite big, with trunk having performed many more optimizations; in particular
we have already elided the __builtin_unreachable () call in DOM2, where
the first difference appears.  We now manage to simplify the _24 != 0
branch.

@@ -79,22 +80,15 @@
   _22 = _2 <= 
   _23 = _2 != 0B;
   _24 = _23 & _22;
-  if (_24 != 0)
-goto ; [100.00%]
-  else
-goto ; [0.00%]
+  goto ; [100.00%]

[local count: 850510901]:
   k = _2;
   _3 = 1;
   _1 = 1;
   _15 = 1;
-  goto ; [100.00%]
-
-   [count: 0]:
-  __builtin_unreachable ();

-   [local count: 955630225]:
+   [local count: 955630225]:

So we manage to optimize

   [local count: 105119324]:
  k = _2;
  _22 = _2 <= 
  _23 = _2 != 0B;
  _24 = _23 & _22;
  if (_24 != 0)
goto ; [100.00%]
  else
goto ; [0.00%]

   [local count: 850510901]:
  k = _2;
  _3 = _2 <= 
  _1 = _2 != 0B;
  _15 = _1 & _3;
  if (_15 != 0)
goto ; [100.00%]
  else
goto ; [0.00%]

   [count: 0]:
  __builtin_unreachable ();

   [local count: 955630225]:
  # h.4_25 = PHI 
  _4 = h.4_25 + -1;
  h = _4;
  h.4_5 = h;
  if (h.4_5 != 0)
goto ; [89.00%]


It seems we're now doing this based on the exported global range table
since we correctly first arrive at

Optimizing block #3

Optimizing statement k = _2; 
LKUP STMT k = _2 with .MEM_11
LKUP STMT _2 = k with .MEM_11
LKUP STMT _2 = k with .MEM_21 
2>>> STMT _2 = k with .MEM_21
Optimizing statement _22 = _2 <=  
LKUP STMT _22 = _2 le_expr 
2>>> STMT _22 = _2 le_expr 
LKUP STMT _2 ge_expr 
Optimizing statement _23 = _2 != 0B;
LKUP STMT _23 = _2 ne_expr 0B
2>>> STMT _23 = _2 ne_expr 0B
Optimizing statement _24 = _23 & _22;
LKUP STMT _24 = _23 bit_and_expr _22
2>>> STMT _24 = _23 bit_and_expr _22
Optimizing statement if (_24 != 0)

Visiting conditional with predicate: if (_24 != 0)

With known ranges
_24: [irange] _Bool VARYING

Predicate evaluates to: DON'T KNOW

but then we register global ranges from the __builtin_unreachable () CFG:

  # RANGE [irange] _Bool [1, 1]
  _24 = _23 & _22;

and CFG cleanup scheduled by DOM does

static bool
cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
{
...
case GIMPLE_COND:
  {
gimple_match_op res_op;
if (gimple_simplify (stmt, &res_op, NULL, no_follow_ssa_edges,
 no_follow_ssa_edges)
&& res_op.code == INTEGER_CST)
  val = res_op.ops[0];

which now picks this up and elides the branch.

For some reason the "dead" stmts

   [local count: 118111600]:
  # PT = nonlocal null 
  _2 = j (); 
  i = _2;
  k = _2;
  # RANGE [irange] _Bool [1, 1]
  _22 = _2 <=  
  # RANGE [irange] _Bool [1, 1]
  _23 = _2 != 0B;
  # RANGE [irange] _Bool [1, 1]
  _24 = _23 & _22;

allow us to optimize

  k = _2;
  # PT = nonlocal escaped null
  k.5_6 = k;
  _16 = k.5_6 == 
  _17 = k.5_6 != 0B;
  _18 = _17 | _16;
  if (_18 != 0)

via

Optimizing statement _16 = k.5_6 == 
  Replaced 'k.5_6' with variable '_2'
LKUP STMT _16 = _2 eq_expr 
2>>> STMT _16 = _2 eq_expr 
Optimizing statement _17 = k.5_6 != 0B;
  Replaced 'k.5_6' with variable '_2'
LKUP STMT _17 = _2 ne_expr 0B
FIND: _23
  Replaced redundant expr '_2 != 0B' with '_23'
 ASGN _17 = _23
Optimizing statement _18 = _17 | _16;
  Replaced '_17' with variable '_23'
  Folded to: _18 = _16 | _23;
LKUP STMT _18 = _16 bit_ior_expr _23
2>>> STMT _18 = _16 bit_ior_expr _23
Optimizing statement if (_18 != 0)
  Replaced '_18' with constant '1'

so it's kind of a missed optimization in the first DOM that elides the
stmts and the inability of ranger to capture the relations in the global
ranges.
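
As background for the analysis above, this is the usual source-level pattern
by which __builtin_unreachable () hands a range to the optimizer. The function
`low_nibble` is a hypothetical example, and __builtin_unreachable is a
GCC/Clang builtin rather than standard C++.

```cpp
#include <cassert>

// Precondition (an assumption of this sketch): 0 <= i <= 15.
// The unreachable branch lets the compiler record that range for 'i', so the
// mask below may be folded away. Calling this with a value outside the range
// is undefined behaviour.
inline int low_nibble(int i) {
  if (i < 0 || i > 15)
    __builtin_unreachable();
  return i & 15;
}
```

If an early pass removes the unreachable branch before the range it implies
has been used everywhere it could be, later passes lose that information,
which is the kind of loss the comment describes.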

[Bug target/110406] d: Wrong code-gen returning POD structs by value

2023-06-27 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110406

--- Comment #12 from Iain Sandoe  ---
OTOH there was a second issue with zero-sized objects which was fixed thus:

diff --git a/gcc/d/types.cc b/gcc/d/types.cc
index a1f69bb02b7..020cc7de83f 100644
--- a/gcc/d/types.cc
+++ b/gcc/d/types.cc
@@ -581,6 +581,11 @@ finish_aggregate_mode (tree type)
 {
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 {
+  /* Fields of type `typeof(*null)' have no size, so let them force the
+record type mode to be computed as BLKmode.  */
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (field)) == noreturn_type_node)
+   break;
+
   if (DECL_SIZE (field) == NULL_TREE)
return;
 }

[Bug c/110430] New: Fail to CSE for LEN_MASK_STORE

2023-06-27 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430

Bug ID: 110430
   Summary: Fail to CSE for LEN_MASK_STORE
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

void __attribute__((noinline,noclone))
foo (int *out, int *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
{
  if (mask[i])
out[i] = i;
}
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

-O3 -march=rv64gcv_zvl512b --param riscv-autovec-preference=fixed-vlmax
Current RVV auto-vectorization codegen:

foo:
lui a5,%hi(.LANCHOR0)
vsetivli zero,16,e32,m1,ta,ma
addi    a5,a5,%lo(.LANCHOR0)
vid.v   v1
vlm.v   v0,0(a5)
vsetvli a5,zero,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
lw  a2,0(a0)
lw  a3,28(a0)
lw  a4,56(a0)
lw  a5,60(a0)
sw  a2,0(a1)
sw  a3,8(a1)
sw  a4,16(a1)
sw  a5,24(a1)
ret

However, with this patch:
https://patchwork.sourceware.org/project/gcc/patch/20230627064737.16257-1-juzhe.zh...@rivai.ai/

We will end up with better codegen with CSE:

foo:
lui a5,%hi(.LANCHOR0)
vsetivli zero,16,e32,m1,ta,ma
addi    a5,a5,%lo(.LANCHOR0)
vid.v   v1
vlm.v   v0,0(a5)
vsetvli a5,zero,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
lw  a4,0(a0)
lw  a5,56(a0)
sw  a4,0(a1)
sw  a5,16(a1)
li  a4,7
li  a5,15
sw  a4,8(a1)
sw  a5,24(a1)
ret

Two of the "lw" loads are CSEd into "li" instructions. The GIMPLE IR:

.LEN_MASK_STORE (out_10(D), 32B, 16, { 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0,
-1, 0, -1, 0, -1 }, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
0);
  o0_11 = *out_10(D);
  o14_13 = MEM[(int *)out_10(D) + 56B];
  *res_15(D) = o0_11;
  MEM[(int *)res_15(D) + 8B] = 7;
  MEM[(int *)res_15(D) + 16B] = o14_13;
  MEM[(int *)res_15(D) + 24B] = 15;
  mask ={v} {CLOBBER(eol)};

After discussion with Richi, the current candidate fix can only handle VLS
(fixed-length) vectors; it cannot handle VLA (variable-length) vectors.

It's hard for us to create a C testcase that produces a CSE opportunity for
VL vectors.

So I am opening a bug for now so that I won't forget this issue.
I will enhance LEN_MASK_STORE handling in CSE after I have finished all RVV
auto-vectorization support.
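
To illustrate the opportunity, here is a scalar C model of the masked store
in the gimple above.  It is only a sketch: len_mask_store_model and
stored_lanes_are_known are hypothetical helpers, not GCC functions, and the
model ignores the bias operand.  Lanes whose mask element is set have known
contents after the store, so loads from out[7] and out[15] could be replaced
by the constants 7 and 15, while lanes 0 and 14 are masked off and still have
to be loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a length- and mask-controlled store:
   lane i of VALUES is written to DST only when i < LEN and MASK[i] is
   nonzero.  This is an illustration, not GCC's implementation.  */
static void
len_mask_store_model (int *dst, size_t len, const int *mask,
                      const int *values, size_t nlanes)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      dst[i] = values[i];
}

/* Mirror of the store in foo from the report.  Returns 1 if the two
   loads the report wants CSEd yield the stored constants.  */
static int
stored_lanes_are_known (int *out)
{
  int mask[16], values[16];
  for (int i = 0; i < 16; ++i)
    {
      mask[i] = i & 1;   /* { 0, 1, 0, 1, ... } */
      values[i] = i;     /* { 0, 1, 2, ..., 15 } */
    }
  len_mask_store_model (out, 16, mask, values, 16);
  return out[7] == 7 && out[15] == 15;
}
```

The even lanes keep whatever the caller had stored there, which is why the
loads of out[0] and out[14] cannot be folded.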

[Bug target/110429] New: Redundant vector extract instruction on P9

2023-06-27 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110429

Bug ID: 110429
   Summary: Redundant vector extract instruction on P9
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guihaoc at gcc dot gnu.org
  Target Milestone: ---

//test.c
#include <altivec.h>
void extract_int_2 (int *p, vector int a) { *p = vec_extract (a, 2); }

On P9 LE, it generates
xxextractuw 34,34,4
stxsiwx 34,0,3

The xxextractuw is unnecessary as the extracted int is just at word[1].

[Bug middle-end/106081] missed vectorization

2023-06-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

--- Comment #7 from rsandifo at gcc dot gnu.org ---
I don't think the splat creates a new layout, but instead a
splat should be allowed to change its layout at zero cost.
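
The zero-cost claim can be checked on the permutation vectors themselves.
The following is a minimal sketch, not vectorizer code: compose_perm and
is_splat_of_zero are hypothetical helpers that model a load permutation as
an array of element indices.  Composing a { 0 0 0 0 } splat with any lane
permutation gives back the same splat, so changing its layout needs no
extra permute.

```c
#include <assert.h>
#include <stddef.h>

/* Compose a load permutation with a later lane permutation: lane i of
   the result reads the element that lane LANE_PERM[i] of the permuted
   load would have read.  */
static void
compose_perm (const unsigned *load_perm, const unsigned *lane_perm,
              unsigned *result, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; ++i)
    {
      assert (lane_perm[i] < nlanes);
      result[i] = load_perm[lane_perm[i]];
    }
}

/* Return 1 if every lane of PERM selects element 0.  */
static int
is_splat_of_zero (const unsigned *perm, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; ++i)
    if (perm[i] != 0)
      return 0;
  return 1;
}
```

For example, composing { 0 0 0 0 } with { 3 2 1 0 } is still { 0 0 0 0 },
whereas composing the identity { 0 1 2 3 } with { 3 2 1 0 } is not.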

[Bug target/110406] d: Wrong code-gen returning POD structs by value

2023-06-27 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110406

Iain Sandoe  changed:

   What|Removed |Added

 CC||iains at gcc dot gnu.org

--- Comment #11 from Iain Sandoe  ---
If I remember correctly, the underlying issue is that D always has a vtable
pointer for a "class" whereas C++ only adds one if needed (i.e. there are
actual virtual methods)

So we really need to use the 'struct' tag to D for classes without virtual
methods that need to interoperate with C++.  I think that then D will lay them
out without the vtable pointer.

We had a fix for this for Darwin, which does not seem to have made it upstream
just yet.  Retesting (there's an unrelated bootstrap regression to work
around).

[Bug testsuite/110419] [14 regression] new test case gfortran.dg/value_9.f90 in r14-2050-gd130ae8499e0c6 fails

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110419

Richard Biener  changed:

   What|Removed |Added

   Keywords||testsuite-fail
  Component|other   |testsuite
   Target Milestone|--- |14.0

[Bug tree-optimization/110428] missed CSE with VLA vectors

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

Richard Biener  changed:

   What|Removed |Added

 Target||aarch64
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
On x86_64 for example with -march=znver4 we can perform the required CSE.

[Bug tree-optimization/110428] New: missed CSE with VLA vectors

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428

Bug ID: 110428
   Summary: missed CSE with VLA vectors
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <stdint.h>

void __attribute__((noinline,noclone))
foo (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 1, 1, 1, 1, 1, 1 };
  int i;
  for (i = 0; i < 8; ++i)
    {
      if (mask[i])
        out[i] = 33;
    }
  uint16_t o0 = out[0];
  uint16_t o7 = out[3];
  uint16_t o14 = out[6];
  uint16_t o15 = out[7];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

With -march=armv9.3-a -O3 -g0 -fno-vect-cost-model we fail to CSE the
out[] loads after vectorization.

[Bug c/110427] a

2023-06-27 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427

Andreas Schwab  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Andreas Schwab  ---
The behaviour is undefined.

[Bug middle-end/106081] missed vectorization

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
So what's interesting is that we now get as of r14-2117-gdd86a5a69cbda4 the
following.  The odd thing is that we fail to eliminate the load permutation
{ 3 2 1 0 } even though this is a reduction group.

I _suppose_ the reason is the { 0 0 0 0 } load permutation (the "splat")
which we don't "support".  In vect_optimize_slp_pass::start_choosing_layouts
there's

  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
    {
      /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
         unpermuted, record a layout that reverses this permutation.

         We would need more work to cope with loads that are internally
         permuted and also have inputs (such as masks for
         IFN_MASK_LOADs).  */
      gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
      if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
        continue;

which means we'll keep the permute there (well, that's OK - any permute
of the permute will retain it ...).  I suspect this prevents the optimization
here.  Massaging start_choosing_layouts to allow a splat on element zero
for a non-grouped access breaks things as we try to move that permute.
So I guess this needs a new kind of layout constraint?  The permute
can absorb any permute but we cannot "move" it.

Richard?


t.c:14:18: note:   === scheduling SLP instances ===
t.c:14:18: note:   Vectorizing SLP tree:
t.c:14:18: note:   node 0x4304170 (max_nunits=16, refcnt=2) vector(4) double
t.c:14:18: note:   op template: _21 = _20 + results$d_60;
t.c:14:18: note:stmt 0 _21 = _20 + results$d_60;
t.c:14:18: note:stmt 1 _17 = _16 + results$c_58;
t.c:14:18: note:stmt 2 _13 = _12 + results$b_56;
t.c:14:18: note:stmt 3 _9 = _8 + results$a_54;
t.c:14:18: note:children 0x43041f8 0x4304418
t.c:14:18: note:   node 0x43041f8 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _20 = _1 * _19;
t.c:14:18: note:stmt 0 _20 = _1 * _19;
t.c:14:18: note:stmt 1 _16 = _1 * _15;
t.c:14:18: note:stmt 2 _12 = _1 * _11;
t.c:14:18: note:stmt 3 _8 = _1 * _7;
t.c:14:18: note:children 0x4304280 0x4304308
t.c:14:18: note:   node 0x4304280 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _1 = *k_50;
t.c:14:18: note:stmt 0 _1 = *k_50;
t.c:14:18: note:stmt 1 _1 = *k_50;
t.c:14:18: note:stmt 2 _1 = *k_50;
t.c:14:18: note:stmt 3 _1 = *k_50;
t.c:14:18: note:load permutation { 0 0 0 0 }
t.c:14:18: note:   node 0x4304308 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note:   op template: _19 = (double) _18;
t.c:14:18: note:stmt 0 _19 = (double) _18;
t.c:14:18: note:stmt 1 _15 = (double) _14;
t.c:14:18: note:stmt 2 _11 = (double) _10;
t.c:14:18: note:stmt 3 _7 = (double) _6;
t.c:14:18: note:children 0x4304390
t.c:14:18: note:   node 0x4304390 (max_nunits=16, refcnt=1) vector(16) short int
t.c:14:18: note:   op template: _18 = _5->d;
t.c:14:18: note:stmt 0 _18 = _5->d;
t.c:14:18: note:stmt 1 _14 = _5->c;
t.c:14:18: note:stmt 2 _10 = _5->b;
t.c:14:18: note:stmt 3 _6 = _5->a;
t.c:14:18: note:load permutation { 3 2 1 0 }
t.c:14:18: note:   node 0x4304418 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note:   op template: results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:stmt 0 results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:stmt 1 results$c_58 = PHI <_17(5), 0.0(6)>
t.c:14:18: note:stmt 2 results$b_56 = PHI <_13(5), 0.0(6)>
t.c:14:18: note:stmt 3 results$a_54 = PHI <_9(5), 0.0(6)>
t.c:14:18: note:children 0x4304170 (nil)

[Bug c/110427] a

2023-06-27 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #1 from Arsen Arsenović  ---
: In function 'main':
:4:18: warning: operation on 'a' may be undefined [-Wsequence-point]
4 | if (a < a--) {
  | ~^~

the result is simply undefined (is the first `a' pre- or post-decrement?)
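
To make the ambiguity concrete, here is a small sketch with two
well-defined rewrites of the condition.  The function names are made up
for illustration.  Each variant fixes one of the two plausible orderings
explicitly, and they give different results for a = 0.  Because the
original expression is undefined, a compiler is not obliged to behave like
either of them.

```c
/* Variant 1: read the left operand before the decrement.  */
static int
left_operand_first (int a)
{
  int left = a;
  int right = a--;      /* yields the old value, then a becomes a - 1 */
  if (left < right)
    a = 1;
  return a;
}

/* Variant 2: perform the decrement before reading the left operand.  */
static int
decrement_first (int a)
{
  int right = a--;      /* yields the old value, then a becomes a - 1 */
  int left = a;
  if (left < right)
    a = 1;
  return a;
}
```

Starting from a = 0, the first variant compares 0 < 0 and returns -1,
while the second compares -1 < 0 and returns 1.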

[Bug c/110427] New: a

2023-06-27 Thread qurong at ios dot ac.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427

Bug ID: 110427
   Summary: a 
int main() {
int a = 0;
if (a < a--) {
a = 1;
}
printf("%d\n", a);
return 0;
}

[Bug middle-end/106081] missed vectorization

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
Bug 106081 depends on bug 96208, which changed state.

Bug 96208 Summary: non-grouped load can be SLP vectorized for 2-element vectors case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 96208, which changed state.

Bug 96208 Summary: non-grouped load can be SLP vectorized for 2-element vectors case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case

2023-06-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:dd86a5a69cbda40cf76388a65d3317c91cb2b501

commit r14-2117-gdd86a5a69cbda40cf76388a65d3317c91cb2b501
Author: Richard Biener 
Date:   Thu Jun 22 11:40:46 2023 +0200

tree-optimization/96208 - SLP of non-grouped loads

The following extends SLP discovery to handle non-grouped loads
in loop vectorization in the case the same load appears in all
lanes.

Code generation is adjusted to mimic what we do for the case
of single element interleaving (when the load is not unit-stride)
which is already handled by SLP.  There are some limits we
run into because peeling for gap cannot cover all cases and
we choose VMAT_CONTIGUOUS.  The patch does not try to address
these issues yet.

The main obstacle is that these loads are not
STMT_VINFO_GROUPED_ACCESS and that's a new thing with SLP.
I know from the past that it's not a good idea to make them
grouped.  Instead the following massages places to deal
with SLP loads that are not STMT_VINFO_GROUPED_ACCESS.

There's already a testcase testing for the case the PR
is after, just XFAILed, the following adjusts that instead
of adding another.

I do expect to have missed some so I don't plan to push this
on a Friday.  Still there may be feedback, so posting this
now.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/96208
* tree-vect-slp.cc (vect_build_slp_tree_1): Allow
a non-grouped load if it is the same for all lanes.
(vect_build_slp_tree_2): Handle not grouped loads.
(vect_optimize_slp_pass::remove_redundant_permutations):
Likewise.
(vect_transform_slp_perm_load_1): Likewise.
* tree-vect-stmts.cc (vect_model_load_cost): Likewise.
(get_group_load_store_type): Likewise.  Handle
invariant accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/slp-46.c: Adjust for new vectorizations.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.

[Bug middle-end/110377] Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable

2023-06-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377

Jan Hubicka  changed:

   What|Removed |Added

   Last reconfirmed||2023-06-27
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #5 from Jan Hubicka  ---
OK,
I think we want to use ranger in the analysis stage then. I am testing
the following.

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 704fe01b02c..4bc142e1471 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2339,7 +2339,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, value_range *tmp)

 static void
 ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
-struct cgraph_edge *cs)
+struct cgraph_edge *cs,
+gimple_ranger *ranger)
 {
   ipa_node_params *info = ipa_node_params_sum->get (cs->caller);
   ipa_edge_args *args = ipa_edge_args_sum->get_create (cs);
@@ -2384,7 +2385,7 @@ ipa_compute_jump_functions_for_edge (struct
ipa_func_body_info *fbi,

  if (TREE_CODE (arg) == SSA_NAME
  && param_type
- && get_range_query (cfun)->range_of_expr (vr, arg)
+ && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
  && vr.nonzero_p ())
addr_nonzero = true;
  else if (tree_single_nonzero_warnv_p (arg, _overflow))
@@ -2407,7 +2408,7 @@ ipa_compute_jump_functions_for_edge (struct
ipa_func_body_info *fbi,
 integers and pointers.  */
  && irange::supports_p (TREE_TYPE (arg))
  && irange::supports_p (param_type)
- && get_range_query (cfun)->range_of_expr (vr, arg)
+ && ranger->range_of_expr (vr, arg, cs->call_stmt)
  && !vr.undefined_p ())
{
  value_range resvr = vr;
@@ -2516,7 +2517,8 @@ ipa_compute_jump_functions_for_edge (struct
ipa_func_body_info *fbi,
from BB.  */

 static void
-ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block
bb)
+ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block
bb,
+  gimple_ranger *ranger)
 {
   struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb);
   int i;
@@ -2535,7 +2537,7 @@ ipa_compute_jump_functions_for_bb (struct
ipa_func_body_info *fbi, basic_block b
  && !gimple_call_fnspec (cs->call_stmt).known_p ())
continue;
}
-  ipa_compute_jump_functions_for_edge (fbi, cs);
+  ipa_compute_jump_functions_for_edge (fbi, cs, ranger);
 }
 }

@@ -3109,19 +3111,27 @@ class analysis_dom_walker : public dom_walker
 {
 public:
   analysis_dom_walker (struct ipa_func_body_info *fbi)
-: dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
+: dom_walker (CDI_DOMINATORS), m_fbi (fbi)
+  {
+m_ranger = enable_ranger (cfun, false);
+  }
+  ~analysis_dom_walker ()
+  {
+disable_ranger (cfun);
+  }

   edge before_dom_children (basic_block) final override;

 private:
   struct ipa_func_body_info *m_fbi;
+  gimple_ranger *m_ranger;
 };

 edge
 analysis_dom_walker::before_dom_children (basic_block bb)
 {
   ipa_analyze_params_uses_in_bb (m_fbi, bb);
-  ipa_compute_jump_functions_for_bb (m_fbi, bb);
+  ipa_compute_jump_functions_for_bb (m_fbi, bb, m_ranger);
   return NULL;
 }

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
new file mode 100644
index 000..d770f8babba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
@@ -0,0 +1,17 @@
+/* { dg-do compile */
+/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
+int test3(int);
+__attribute__ ((noinline))
+void test2(int a)
+{
+   test3(a);
+}
+void
+test(int n)
+{
+if (n > 5)
+  __builtin_unreachable ();
+test2(n);
+}
+/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } }  */

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #20 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:dbf8ab449417aa24669f6ccf50be8c17f8c1278e

commit r14-2116-gdbf8ab449417aa24669f6ccf50be8c17f8c1278e
Author: liuhongt 
Date:   Mon Jun 26 21:07:09 2023 +0800

Refine maskstore patterns with UNSPEC_MASKMOV.

Similar to r14-2070-gc79476da46728e.

If mem_addr points to a memory region with less than a whole vector's size
in bytes of accessible memory, and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent it
from being transformed into any other whole-memory-access instruction.

gcc/ChangeLog:

PR rtl-optimization/110237
* config/i386/sse.md (_store_mask): Refine with
UNSPEC_MASKMOV.
(maskstore_store_mask): New define_insn, it's renamed
from original _store_mask.
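
The hazard described above can be sketched with a scalar C model.  This is
an illustration under stated assumptions, not the i386 backend's code:
NLANES, masked_store_model and whole_store_extent are hypothetical names,
and the model uses four int lanes.  A masked store only needs the bytes of
its active lanes to be accessible, whereas a whole-vector store needs all
of them.

```c
#include <stddef.h>

#define NLANES 4

/* Scalar model of a masked vector store: only lanes whose bit in K is
   set touch memory.  Returns the number of bytes starting at MEM_ADDR
   that the store needs to be accessible.  */
static size_t
masked_store_model (int *mem_addr, unsigned k, const int *src)
{
  size_t extent = 0;
  for (size_t i = 0; i < NLANES; ++i)
    if (k & (1u << i))
      {
        mem_addr[i] = src[i];
        extent = (i + 1) * sizeof (int);
      }
  return extent;
}

/* A whole-vector store always needs all NLANES elements accessible.  */
static size_t
whole_store_extent (void)
{
  return NLANES * sizeof (int);
}
```

With a two-element buffer and k = 0x3 the masked store stays in bounds, but
rewriting it as a whole-vector store would touch bytes past the end of the
buffer.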

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #19 from rguenther at suse dot de  ---
On Mon, 26 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
> 
> --- Comment #18 from Alexander Monakov  ---
> (In reply to rguent...@suse.de from comment #17)
> > Yes, we do the same to loads.  I hope that's not a common technique
> > though but I have to admit the vectorizer itself assesses whether it's
> > safe to access "gaps" by looking at alignment so its code generation
> > is prone to this same "mistake".
> > 
> > Now, is "alignment to 16 is ensured externally" good enough here?
> > If we consider
> > 
> > static int a[2];
> > 
> > and code doing
> > 
> >  if (is_aligned (a))
> >{
> >  __v4si v = (__attribute__((may_alias)) __v4si *) 
> >}
> > 
> > then we cannot even use a DECL_ALIGN that's insufficient for decls
> > that bind locally.
> 
> I agree. I went with the 'extern' example because there it should be more
> obvious the construction ought to work.
> 
> 
> > Note we have similar arguments with aggregate type sizes (and TBAA)
> > where when we infer a dynamic type from one access we check if
> > the other access would fit.  Wouldn't the above then extend to that
> > as well given we could also do aggregate copies of "padding" and
> > ignore the bits if we'd have ensured the larger access wouldn't trap?
> 
> I think a read via a may_alias type just tells you that N bytes are accessible
> for reading, not necessarily for writing. So I don't see a problem, but maybe
> I didn't quite catch what you are saying.

I wasn't sure how to phrase it.  What I was saying is that we have this
"the access is too large for the object in consideration, so it cannot
alias it" reasoning in places where we just work with types within the TBAA
framework.  So I wondered if one can construct a similar case to
support that we should not do this.  (tree-ssa-alias.cc:
aliasing_component_refs_p)

> 
> > So supporting the above might be a bit of a stretch (though I think
> > we have to fix the vectorizer here).
> 
> What would the solution be? Using a may_alias type for such accesses?

But the size argument doesn't have anything to do with TBAA (and
may_alias is about TBAA).  I don't think we have any way to circumvent
C object access rules.  That is, for example, with -fno-strict-aliasing
the following isn't going to work.

int a;
int b;

int main()
{
  a = 1;
  b = 2;
  if ( + 1 == ) // equality compare of unrelated pointers OK
{
  long x = *(long *) // access outside of 'a' not OK
  if (x != 0x00010002)
abort ();
}
}

there's no command-line flag or attribute to form a pointer
to an object composing 'a' and 'b' besides changing how the
storage is declared.

I don't think we should make an exception for "padding" after
an object and I don't see any sensible way to constrain
the size of the supported "padding" either?  Pad to the
largest possible alignment of the object?  That would be
MAX_OFILE_ALIGNMENT ...

> 
> > > > If the v4si store is masked we cannot do this anymore, but the IL
> > > > we seed the alias oracle with doesn't know the store is partial.
> > > > The only way to "fix" it is to take away all of the information from it.
> > > 
> > > But that won't fix the trapping issue? I think we need a distinct RTX for
> > > memory accesses where hardware does fault suppression for masked-out
> > > elements.
> > 
> > Yes, it doesn't fix that part.  The idea of using BLKmode instead of
> > a vector mode for the MEMs would, I guess, together with specifying
> > MEM_SIZE as not known.
> 
> Unfortunate if that works for the trapping side, but not for the aliasing
> side.

It should work for both I think, but MEM_EXPR would need changing
as well - we do have a perfectly working representation there, it
would just be the first CALL_EXPR in such context ...

[Bug ada/110398] internal error on call with parameter of predicated subtype

2023-06-27 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110398

Eric Botcazou  changed:

   What|Removed |Added

 Status|WAITING |NEW
Summary|Program_Error   |internal error on call with
   |sem_eval.adb:4635 explicit  |parameter of predicated
   |raise   |subtype

--- Comment #4 from Eric Botcazou  ---
Thanks.  Confirmed on the mainline with an assertion failure:

eric@fomalhaut:~/build/gcc/native> gcc/gnat1 -quiet example.adb 
+===GNAT BUG DETECTED==+
| 14.0.0 20230626 (experimental) [master r14-2083-g068eba260fa]
(x86_64-suse-linux) |
| Assert_Failure sem.adb:650   |
| Error detected at example.adb:3:42   |
| Compiling example.adb

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334

--- Comment #12 from rguenther at suse dot de  ---
On Mon, 26 Jun 2023, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334
> 
> --- Comment #11 from Jan Hubicka  ---
> Hi,
> what about this. It should make at least quite basic inlining to happen
> to always_inline. I do not think many critical always_inlines have
> indirect calls in them.  The test for lto is quite bad and I can
> work on solving this incrementally (it would be nice to have this
> tested and possibly backport it).
> 
> diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
> index efc8df7d4e0..dcec07e49e1 100644
> --- a/gcc/ipa-inline.cc
> +++ b/gcc/ipa-inline.cc
> @@ -702,6 +702,38 @@ can_early_inline_edge_p (struct cgraph_edge *e)
>if (!can_inline_edge_p (e, true, true)
>|| !can_inline_edge_by_limits_p (e, true, false, true))
>  return false;
> +  /* When inlining regular functions into always-inline functions
> + during early inlining watch for possible inline cycles.  */
> +  if (DECL_DISREGARD_INLINE_LIMITS (caller->decl)
> +  && lookup_attribute ("always_inline", DECL_ATTRIBUTES (caller->decl))
> +  && (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> + || !lookup_attribute ("always_inline", DECL_ATTRIBUTES
> (callee->decl
> +{
> +  /* If there are indirect calls, inlining may produce direct call.
> +TODO: We may lift this restriction if we avoid errors on formely
> +indirect calls to always_inline functions.  Taking address
> +of always_inline function is generally bad idea and should
> +have been declared as undefined, but sadly we allow this.  */
> +  if (caller->indirect_calls || e->callee->indirect_calls)

why disallow caller->indirect_calls?

> +   return false;
> +  for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee)

I don't think this flies - it looks quadratic.  Can we compute this
in the inline summary once instead?

As for indirect calls, can we maybe mark initial direct GIMPLE call
stmts as "always-inline" and only look at that marking, thus an
indirect call will never become "always-inline"?  Iff cgraph edges
prevail during all early inlining we could mark call edges for
this purpose?

> +   {
> + struct cgraph_node *callee2 = e2->callee->ultimate_alias_target ();
> + /* As early inliner runs in RPO order, we will see uninlined
> +always_inline calls only in the case of cyclic graphs.  */
> + if (DECL_DISREGARD_INLINE_LIMITS (callee2->decl)
> + || lookup_attribute ("always_inline", callee2->decl))
> +   return false;
> + /* With LTO watch for case where function is later replaced
> +by always_inline definition.
> +TODO: We may either stop treating noninlined cross-module always
> +inlines as errors, or we can extend decl merging to produce
> +syntacic alias and honor always inline only in units it has
> +been declared as such.  */
> + if (flag_lto && callee2->externally_visible)
> +   return false;
> +   }
> +}
>return true;
>  }
> 
> @@ -3034,18 +3066,7 @@ early_inliner (function *fun)
> 
>if (!optimize
>|| flag_no_inline
> -  || !flag_early_inlining
> -  /* Never inline regular functions into always-inline functions
> -during incremental inlining.  This sucks as functions calling
> -always inline functions will get less optimized, but at the
> -same time inlining of functions calling always inline
> -function into an always inline function might introduce
> -cycles of edges to be always inlined in the callgraph.
> -
> -We might want to be smarter and just avoid this type of inlining.  */
> -  || (DECL_DISREGARD_INLINE_LIMITS (node->decl)
> - && lookup_attribute ("always_inline",
> -  DECL_ATTRIBUTES (node->decl
> +  || !flag_early_inlining)
>  ;
>else if (lookup_attribute ("flatten",
>  DECL_ATTRIBUTES (node->decl)) != NULL)
> 
>

[Bug tree-optimization/110414] [14 Regression] Dead Code Elimination Regression since r14-1127-g9e2017ae6ac

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110414

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Blocks||109849
   Keywords||missed-optimization


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
[Bug 109849] suboptimal code for vector walking loop

[Bug tree-optimization/110413] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-1880-g827e208fa64

2023-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110413

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |14.0
 Blocks||110269


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110269
[Bug 110269] [13 Regression] Missed Dead Code Elimination when using
__builtin_unreachable since r13-4607-g2dc5d6b1e7e

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2023-06-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #21 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:a90f558bbb87c0b5d2b1e07d55bd585b2285cf3d

commit r14-2114-ga90f558bbb87c0b5d2b1e07d55bd585b2285cf3d
Author: liuhongt 
Date:   Mon Jun 26 13:59:29 2023 +0800

Don't issue vzeroupper for vzeroupper call_insn.

gcc/ChangeLog:

PR target/82735
* config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit
vzeroupper for vzeroupper call_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-30.c: New test.

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-27 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #6 from Hongyu Wang  ---
Thanks for the fix.  For the attached test, the main loop no longer has any
loads.

There is a remaining issue: the loop epilogue still contains loads from the
stack and the constant pool.

.L9:
movslq  %edx, %rax
movss   72(%rsp), %xmm5
salq    $2, %rax
leaq    (%rbx,%rax), %rcx
movaps  %xmm5, %xmm1
subss   (%rcx), %xmm1
andps   .LC4(%rip), %xmm1
movss   %xmm1, (%rcx)
leal    1(%rdx), %ecx
addss   %xmm1, %xmm0
cmpl    %ecx, %r12d
jle     .L8

The IRA dump shows that the pseudos do not conflict, but they still fail to be
allocated a register.  This issue does not exist on aarch64.
