[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
On x86_64 the #c6 testcase with -O3 -fno-vect-cost-model started to ICE with
r14-5603-g2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf.
For the same testcase on aarch64, my bet is
r12-1551-g3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c,
though I've only verified that r12-1529 works and r12-1573 ICEs, and there are
no IL differences before slp2, which newly ICEs.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #14 from Andreas Krebbel  ---
If my analysis from comment #1 is correct, combine does superfluous steps here.
Getting rid of this should not cause any harm, but should be beneficial for
other targets as well. I agree that the patch I've proposed is kind of a hack.
Do you think this could be turned into a proper fix?

[Bug target/114233] Newly-introduced pr113617.C test fails on Darwin

2024-03-04 Thread fxcoudert at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114233

Francois-Xavier Coudert  changed:

   What|Removed |Added

 Target||x86_64-apple-darwin23
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-03-05
 CC||iains at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug target/114233] New: Newly-introduced pr113617.C test fails on Darwin

2024-03-04 Thread fxcoudert at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114233

Bug ID: 114233
   Summary: Newly-introduced pr113617.C test fails on Darwin
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fxcoudert at gcc dot gnu.org
  Target Milestone: ---

FAIL: g++.dg/other/pr113617.C  -std=gnu++14 (test for excess errors)
Excess errors:
ld: Undefined symbols:
  R::Y>::operator->(), referenced from:
  A::foo(long long, long long) in cci8MVgO.o
  R::Y>::operator->(), referenced from:
  A::foo(long long, long long) in cci8MVgO.o
  N1::N2::N3::AB::bleh(), referenced from:
  A::foo(long long, long long) in cci8MVgO.o
  N1::N2::N3::AC::m1(R::S), referenced from:
  void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous
namespace)::D, false>&) in cci8MVgO.o
  void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous
namespace)::D, false>&) in ccjwgqSE.o
  N1::N2::N3::AC::AC(int), referenced from:
  void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous
namespace)::D, false>&) in cci8MVgO.o
  void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous
namespace)::D, false>&) in ccjwgqSE.o
  _main, referenced from:
  

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #3 from Sam James  ---
I am reducing it now but it's slow going.

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #2 from Sam James  ---
Created attachment 57610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57610&action=edit
TraceStream.cc.ii.xz

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

--- Comment #1 from Sam James  ---
Created attachment 57609
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57609&action=edit
Task.cc.ii.xz

[Bug target/114232] New: [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232

Bug ID: 114232
   Summary: [14 regression] ICE when building rr-5.7.0 with LTO on
x86
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57608
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57608&action=edit
RecordSession.cc.ii.xz

Hit this when building rr-5.7.0 with LTO on x86.

```
$ cat list.txt

RecordSession.cc.ii
Task.cc.ii
TraceStream.cc.ii

$ g++ -O3 -pipe -march=i686 -mfpmath=sse -msse -msse2 -fno-vect-cost-model
-rdynamic -flto=auto @list.txt
/var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc: In member
function ‘close’:
/var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc:1467:1:
error: unrecognizable insn:
 1467 | }
  | ^
(insn 160 159 161 26 (parallel [
(set (reg:V2QI 250 [ vect_patt_207.470_183 ])
(minus:V2QI (reg:V2QI 251)
(reg:V2QI 249 [ vect__4.468_451 ])))
(clobber (reg:CC 17 flags))
])
"/var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc":254:16 -1
 (nil))
during RTL pass: vregs
/var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc:1467:1:
internal compiler error: in extract_insn, at recog.cc:2812
0x5799263a _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/rtl-error.cc:108
0x579927e8 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/rtl-error.cc:116
0x56eadade extract_insn(rtx_insn*)
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/recog.cc:2812
0x58ac1379 instantiate_virtual_regs_in_insn
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:1611
0x58ac1379 instantiate_virtual_regs
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:1994
0x58ac1379 execute
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:2041
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.
make: *** [/tmp/ccCI1g9e.mk:17: /tmp/ccZVEvZf.ltrans5.ltrans.o] Error 1
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: error:
lto-wrapper failed
collect2: error: ld returned 1 exit status
```

```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-pc-linux-gnu/14/lto-wrapper
Target: i686-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-14.0./work/gcc-14.0./configure
--host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/usr
--bindir=/usr/i686-pc-linux-gnu/gcc-bin/14
--includedir=/usr/lib/gcc/i686-pc-linux-gnu/14/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/14
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/14/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/14/info
--with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/14/include/g++-v14
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/i686-pc-linux-gnu/14/python
--enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls --without-included-gettext
--disable-libunwind-exceptions --enable-checking=yes,extra,rtl,df
--with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 14.0. p,
commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa' --with-gcc-major-version-only
--enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--disable-multilib --disable-fixed-point --with-arch=i686 --enable-targets=all
--enable-libgomp --disable-libssp --disable-libada --disable-cet
--disable-systemtap --enable-valgrind-annotations --disable-vtable-verify
--disable-libvtv --with-zstd --without-isl --enable-default-pie
--enable-host-pie --disable-host-bind-now --enable-default-ssp
--disable-fixincludes --with-build-config='bootstrap-O3 bootstrap-lto'
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240304 (experimental)
a89c5df317d1de74871e2a05c36aed9cbbb21f42 (Gentoo 14.0. p, commit
c8305c9bdf09abe3e2f89783fe62f2e4049468fa)
```

[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #9 from Andrew Pinski  ---
The testcases are kind of flaky because the following fails only on the trunk
for aarch64:
```

void f(long*);
int ff[2];
long tt[4];
unsigned long ttt;
void k(long x, long y) {
  long t = x >> ff[0];
  long t1 = ff[1];
  long t2 = y >> ff[0];
  tt[0] = t1;
  tt[1] = t+t2;
  tt[2] = t2;
}
```

[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #8 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #7)
> So my reduced testcase fails for aarch64 since GCC 12 but for x86_64 only on
> the trunk. I suspect the commit that it will bisect to on x86_64 is just
> enabling the pattern for x86_64.
> 
> So if anyone does a bisect, please try on aarch64.

Also, use `-O3 -fno-vect-cost-model` as the options when bisecting, since with
-O2 the bisect might instead land on the commit that turned the vectorizer on
for -O2.

[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|14.0|12.4
   Keywords||needs-bisection
  Known to work||11.1.0
Summary|[14 regression] ICE when|[12/13/14 regression] ICE
   |building libjxl |when building libjxl
 Target||aarch64 x86_64

--- Comment #7 from Andrew Pinski  ---
So my reduced testcase fails for aarch64 since GCC 12 but for x86_64 only on
the trunk. I suspect the commit that it will bisect to on x86_64 is just
enabling the pattern for x86_64.

So if anyone does a bisect, please try on aarch64.

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #6 from Andrew Pinski  ---
A little better reduced (this time only 1 BB even):
```

void f(long*);
int ff[2];
void f2(long, long, unsigned long);
void k(unsigned long x, unsigned long y) {
  long t = x >> ff[0];
  long t1 = ff[1];
  unsigned long t2 = y >> ff[0];
  f2(t1, t+t2, t2);
}
```

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #5 from Andrew Pinski  ---
A little more reduced:
```

void f(long*);
int ff[2];
void f1(long, long);
void k(unsigned long x, unsigned long y) {
  long t = x >> ff[0];
  long t1 = ff[1];
  unsigned long t2 = y >> ff[0];
  long t3 = t+t2 ? t2 : 0;
  f1(t1, t3);
}
```

/app/example.cpp:9:14: missed: unusable type for last operand in vector/vector
shift/rotate.

Note if you change the type of ff to long, this works.

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-03-05

--- Comment #4 from Andrew Pinski  ---
Further reduced:
```
static inline
long ClampedSize(long begin, unsigned long size_max) {
  return begin + size_max ? size_max : 0;
}

void f(long*);
int ff[2];
void k(unsigned long x, unsigned long y) {
  long t = x >> ff[0];
  long t1 = ff[1];
  long t2 = y >> ff[0];
  long t3 = ClampedSize(t, t2);
  long t4[2];
  t4[0] = t1;
  t4[1] = t3;
  f(t4);
}
```

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #3 from Andrew Pinski  ---
vectorizable_shift

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #2 from Sam James  ---
(In reply to Sam James from comment #1)
> Created attachment 57607 [details]
> reduced.ii
> 
> `g++ -c reduced.ii -march=sapphirerapids -O2 -fno-vect-cost-model` is enough
> for the reduced version.

In fact, `g++ -c reduced.ii -O2 -fno-vect-cost-model` is enough.

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Target Milestone|--- |14.0
 CC||pinskia at gcc dot gnu.org

[Bug tree-optimization/114231] [14 regression] ICE when building libjxl

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

--- Comment #1 from Sam James  ---
Created attachment 57607
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57607&action=edit
reduced.ii

`g++ -c reduced.ii -march=sapphirerapids -O2 -fno-vect-cost-model` is enough
for the reduced version.

[Bug tree-optimization/114231] New: [14 regression] ICE when building libjxl

2024-03-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231

Bug ID: 114231
   Summary: [14 regression] ICE when building libjxl
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57606
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57606&action=edit
enc_modular.ii.xz

Originally reported downstream in Gentoo by tdr.

```
$ g++ -c enc_modular.ii -mrtm -mshstk -march=sapphirerapids -O3
-fno-vect-cost-model
during GIMPLE pass: slp
/var/tmp/portage/media-libs/libjxl-0.9.1-r1/work/libjxl-0.9.1/lib/jxl/enc_modular.cc:
In member function ‘jxl::Status
jxl::ModularFrameEncoder::PrepareStreamParams(const jxl::Rect&, const
jxl::CompressParams&, int, int, const jxl::ModularStreamId&, bool)’:
/var/tmp/portage/media-libs/libjxl-0.9.1-r1/work/libjxl-0.9.1/lib/jxl/enc_modular.cc:1294:8:
internal compiler error: in vect_transform_stmt, at tree-vect-stmts.cc:13361
 1294 | Status ModularFrameEncoder::PrepareStreamParams(const Rect& rect,
  |^~~
0x55bdf7edf2b0 vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-stmts.cc:13361
0x55bdf99cd64d vect_schedule_slp_node
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9410
0x55bdf99cd64d vect_schedule_slp_node
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9203
0x55bdf99ccaa2 vect_schedule_scc
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9645
0x55bdf96e6382 vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap,
vl_ptr> const&)
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9790
0x55bdf9468b27 vect_slp_region
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:7911
0x55bdf94648c3 vect_slp_bbs
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:8011
0x55bdf94626be vect_slp_function(function*)
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:8127
0x55bdf9461e1c execute
   
/usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vectorizer.cc:1533
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.
```

```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/14/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-14.0./work/gcc-14.0./configure
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/14
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/14/python
--enable-languages=c,c++,fortran,rust --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls --without-included-gettext
--disable-libunwind-exceptions --enable-checking=yes,extra,rtl
--with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened
14.0. p, commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa'
--with-gcc-major-version-only --enable-libstdcxx-time --enable-lto
--disable-libstdcxx-pch --enable-shared --enable-threads=posix
--enable-__cxa_atexit --enable-clocale=gnu --enable-multilib
--with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all
--enable-libgomp --disable-libssp --disable-libada --disable-cet
--disable-systemtap --enable-valgrind-annotations --disable-vtable-verify
--disable-libvtv --with-zstd --with-isl --disable-isl-version-check
--enable-default-pie --enable-host-pie --enable-host-bind-now
--enable-default-ssp --disable-fixincludes --with-build-config='bootstrap-O3
bootstrap-lto'
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240304 (experimental)
eae6b63b5b5426f943f58b5ae0bf0a6068ca8ad6 (Gentoo Hardened 14.0. p, commit
c8305c9bdf09abe3e2f89783fe62f2e4049468fa)
```

[Bug c/8960] invalid error mode `SI' applied to inappropriate type

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8960

--- Comment #14 from Andrew Pinski  ---
I am not 100% sure if this is actually valid.
The question is whether the attribute in this case applies to the return type
or to the type of the function.

The manual is not clear here either.

[Bug libfortran/93550] Implement control of leading zero in formatted numeric output

2024-03-04 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93550

--- Comment #4 from Jerry DeLisle  ---
The LEADING_ZERO specifiers are now included in the 2023 standard, so away we
go! In support of lazy programmers, let's make the compiler do it. ;)

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-04 Thread bruce at hoult dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

--- Comment #6 from Bruce Hoult  ---
The ICE also happens with bzero().

The ICE does NOT happen with a constant length of 16 or greater, in which case
a function call is made instead of expanding inline.

With rv64gv or rv64gcv a series of N `sb` are generated (N < 16)

With rv64gc_xtheadvector, N >= 6, and -Os a tail call to memset is generated,
no ICE. With N < 6 ... ICE.

So the problem is only trying to expand memset() or bzero() inline. Does it try
to use a vectorised memset? That doesn't happen with rv64gcv.

memcpy() does not ICE for any N.

I assume the originally reported C++ code is generating a memset() to
initialise one of the classes/structs.
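
The shape of code being described can be sketched as below. This is only an illustration of the two cases from the analysis above (a small constant length versus a length of 16); the function names are hypothetical, and it has not been verified that this exact file reproduces the ICE. Per the comment, the interesting configuration would be something like `-march=rv64gc_xtheadvector -Os`.

```c
#include <assert.h>
#include <string.h>

/* Constant length below 16: the case the comment says is expanded
   inline (and where the ICE was observed with xtheadvector). */
void clear_small(unsigned char *p)
{
    memset(p, 0, 5);
}

/* Constant length of 16: the case the comment says becomes a call
   to memset instead of an inline expansion. */
void clear_large(unsigned char *p)
{
    memset(p, 0, 16);
}

/* Host-side sanity check that both helpers clear what they claim to. */
void check_clears(void)
{
    unsigned char buf[16];
    memset(buf, 0xff, sizeof buf);
    clear_small(buf);
    assert(buf[4] == 0 && buf[5] == 0xff);
    clear_large(buf);
    assert(buf[15] == 0);
}
```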

[Bug tree-optimization/114230] Missed optimization of loop deletion: `a!=0`

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114230

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Summary|Missed optimization of loop |Missed optimization of loop
   |deletion: a=0||a|deletion: `a!=0`
 CC||pinskia at gcc dot gnu.org
   Keywords||missed-optimization
   Severity|normal  |enhancement
   Last reconfirmed||2024-03-05

--- Comment #1 from Andrew Pinski  ---
Confirmed.
we have:
```
   [local count: 1063004408]:
  # i_11 = PHI 
  # a_lsm.4_13 = PHI <_3(5), a_lsm.4_5(2)>
  _2 = a_lsm.4_13 != 0;
  _3 = (int) _2;
  i_8 = i_11 + 1;
  if (i_8 != 10)
goto ; [98.99%]
  else
goto ; [1.01%]

   [local count: 1052266995]:
  goto ; [100.00%]
```

sccp does not currently handle `(int)a != 0`. It does handle `a|=b;`,
`a^=b;`, and `a&=b;` though.

[Bug tree-optimization/114230] New: Missed optimization of loop deletion: a=0||a

2024-03-04 Thread 652023330028 at smail dot nju.edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114230

Bug ID: 114230
   Summary: Missed optimization of loop deletion: a=0||a
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 652023330028 at smail dot nju.edu.cn
  Target Milestone: ---

Hello, we noticed that in the code below, looping is not necessary (the value
of 0||a doesn't change), but gcc seems to have missed this optimization.

https://godbolt.org/z/bx9jEfb63

int a;
void func(){
for(int i=0;i<10;i++){
a=0||a;
}
}

GCC -O3:
func():
mov edx, DWORD PTR a[rip]
mov eax, 10
.L2:
test edx, edx
setne   dl
movzx   edx, dl
sub eax, 1
jne .L2
mov DWORD PTR a[rip], edx
ret

Expected code (Clang):
func():   # @func()
xor eax, eax
cmp dword ptr [rip + a], 0
setne   al
mov dword ptr [rip + a], eax
ret

Thank you very much for your time and effort! We look forward to hearing from
you.
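
The claim in the report can be checked with a small sketch: after the first iteration the value is already 0 or 1, so the remaining iterations change nothing and the loop collapses to a single normalization. The function names below are hypothetical, and the value is passed as a parameter rather than kept in a global so the two forms can be compared directly.

```c
#include <assert.h>

/* The loop from the report, operating on a parameter. */
int func_loop(int a)
{
    for (int i = 0; i < 10; i++)
        a = 0 || a;          /* same as a = (a != 0) */
    return a;
}

/* What the loop reduces to: one normalization to 0 or 1. */
int func_folded(int a)
{
    return a != 0;
}

/* Compare both forms on a few representative inputs. */
void check_loop_folding(void)
{
    int samples[] = { 0, 1, -1, 2, 42, -2147483647 - 1, 2147483647 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(func_loop(samples[i]) == func_folded(samples[i]));
}
```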

[Bug target/113859] popcount HI can be vectorized for non-SVE

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Last reconfirmed||2024-03-05

--- Comment #2 from Andrew Pinski  ---
Mine.

[Bug other/79469] Feature request: provide `__builtin_assume` builtin function to allow more aggressive optimizations and to match clang

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79469

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #6 from Andrew Pinski  ---
So even though clang/LLVM has not implemented the attribute yet
(https://github.com/llvm/llvm-project/pull/81014), adding another extension is
not a good idea for GCC, so closing as won't fix. The reason this got
standardized is so that it can be implemented in a cross-compiler way.

Also the builtin has an odd definition when it comes to the whole no side
effects rule (though the attribute has that too, it is not directly part of a
function call).

[Bug target/54284] -mabi=ieeelongdouble problems

2024-03-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54284

Peter Bergner  changed:

   What|Removed |Added

 CC|bergner at vnet dot ibm.com,   |bergner at gcc dot gnu.org,
   |dje.gcc at gmail dot com   |dje at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Peter Bergner  ---
I'm pretty sure this was fixed long ago, so I'm going to close this as
FIXED.

[Bug target/50329] [PowerPC] Unnecessary stack frame set up

2024-03-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50329

Peter Bergner  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #2)
> Current trunk (to be GCC 6) optimises "c" perfectly.  Not the other
> two, alas.

Current trunk (to be GCC 14) optimizes all of them now.  Marking as FIXED.

a:
li 9,-1
rldicr 9,9,0,0
std 9,0(3)
blr
b:
li 9,-1
rldicr 9,9,0,0
std 9,0(3)
blr
c:
li 9,0
li 10,-1
rldimi 9,10,63,0
std 9,0(3)
blr

[Bug target/36557] -m32 -mpowerpc64 produces better code than -m64 for a!=0

2024-03-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36557

Peter Bergner  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||bergner at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #5 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #4)
> We now do
> 
> cntlzw 3,3
> srwi 3,3,5
> xori 3,3,0x1
> blr
> 
> which is still not optimal (and not what -m32 / -m32 -mpowerpc64 do).

My GCC 10 and later compiles show we now generate:

addic 9,3,-1
subfe 3,9,3
blr

Marking as FIXED.
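
For readers unfamiliar with the carry trick in the two-instruction sequence, here is a rough C emulation of it on 64-bit values. It assumes the usual semantics of `addic` (add immediate, recording the carry out in CA) and `subfe` (compute `~RA + RB + CA`); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates  addic 9,3,-1 ; subfe 3,9,3  and returns (a != 0). */
uint64_t ne_zero_carry(uint64_t a)
{
    uint64_t t = a + UINT64_MAX;   /* addic: t = a - 1 (mod 2^64) */
    uint64_t ca = t < a;           /* carry out: set unless a == 0 */
    return ~t + a + ca;            /* subfe: ~t + a + CA, which equals CA */
}

/* Check the emulation against the plain comparison. */
void check_ne_zero_carry(void)
{
    uint64_t samples[] = { 0, 1, 5, UINT64_C(1) << 63, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ne_zero_carry(samples[i]) == (uint64_t)(samples[i] != 0));
}
```

Since `~t + a` is `a - t - 1`, which is 0 when `t = a - 1`, the result is just the carry bit, so the comparison costs two instructions and no branch.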

[Bug target/33236] -mminimal-toc register should be psedu-register

2024-03-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33236

Peter Bergner  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED
 CC||bergner at gcc dot gnu.org

--- Comment #5 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #4)
> Still happens.

I'm marking this as WONTFIX since -mminimal-toc is basically never used since
the introduction of -mcmodel=medium, which is now the default and results in
ideal code for this testcase.

[Bug target/31557] return 0x80000000UL code gen can be improved

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31557

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0

[Bug target/31557] return 0x80000000UL code gen can be improved

2024-03-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31557

Peter Bergner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||bergner at gcc dot gnu.org
 Status|REOPENED|RESOLVED

--- Comment #7 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #6)
> Actually, huh, *not* fixed on trunk yet.

This was fixed in GCC 13.  Marking it as FIXED.

[Bug target/113001] [14 Regression] RISCV Zicond ICE: in extract_insn, at recog.cc:2812 with -O2 rv64gcv_zicond

2024-03-04 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113001

--- Comment #3 from Jeffrey A. Law  ---
*** Bug 112871 has been marked as a duplicate of this bug. ***

[Bug target/112871] [14 Regression] RISCV ICE: in extract_insn, at recog.cc:2804 (unrecognizable insn) with -01 rv32gc_zicond

2024-03-04 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112871

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Jeffrey A. Law  ---
Same path through the conditional move expansion code.

*** This bug has been marked as a duplicate of bug 113001 ***

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-04 Thread bruce at hoult dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

--- Comment #5 from Bruce Hoult  ---
oops .. 379 lines .. I grep'd wrong. Anyway...

gcc/config/riscv/riscv-vector-switch.def

-ENTRY (RVVMF2QI, true, LMUL_F2, 16)
-ENTRY (RVVMF4QI, true, LMUL_F4, 32)
-ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32, LMUL_F8, 64)
+ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16)
+ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32)
+ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64)

Fractional LMUL (including RVVMF8QI) is removed. Correct, 0.7.1 doesn't have
it.

But something still tries to use it.

[Bug c++/98356] [11/12/13/14 Regression] ICE in cp_parser_dot_deref_incomplete, at cp/parser.c:7899 since r9-4841-g2139fd74f31449c0

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98356

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org
   Keywords||patch

--- Comment #7 from Marek Polacek  ---
This has a patch now:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647157.html

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-04 Thread bruce at hoult dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

--- Comment #4 from Bruce Hoult  ---
I've bisected this and the problem is introduced in 2d7205eb2c3 "RISC-V: Handle
differences between XTheadvector and Vector"

Fortunately this commit touches only 136 lines of code, unlike the later two
xtheadvector commits which are 1119 and 204 touched lines.

[Bug target/114224] popcount RTL cost seems wrong with cssc

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114224

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Last reconfirmed||2024-03-04
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
Interesting:
```
int h1(unsigned a)
{
return  __builtin_popcountg(a) == 1;
}
```
works.


Anyways I will be adding POPCOUNT's rtl cost here.

We don't even handle POPCOUNT for vector modes either ...

[Bug middle-end/106727] Missed fold / canonicalization for checking if a number is a power of 2

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106727

--- Comment #3 from Andrew Pinski  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  Expanding __builtin_popcount (n) <= 1 as (n & (n - 1)) == 0
> might be already done.  The canonicalization could be applied if .POPCOUNT
> is available.

No, it is not already done. Expanding `__builtin_popcount (n) == 1` is done
(including the case where n is known to be nonzero, which is expanded as
`(n & (n - 1)) == 0`).
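
As a quick sanity check of the identity under discussion, the sketch below compares `__builtin_popcount (n) <= 1` with the mask form. The helper names are hypothetical. Note that the mask form needs the outer parentheses in C, because `==` binds tighter than `&`.

```c
#include <assert.h>

/* Zero-or-power-of-two test via popcount. */
int le1_popcount(unsigned n)
{
    return __builtin_popcount(n) <= 1;
}

/* Same test via the mask: clearing the lowest set bit leaves nothing. */
int le1_mask(unsigned n)
{
    return (n & (n - 1)) == 0;
}

/* Exhaustive check over 16 bits plus a few high-bit values. */
void check_le1_forms(void)
{
    for (unsigned n = 0; n < (1u << 16); n++)
        assert(le1_popcount(n) == le1_mask(n));
    assert(le1_mask(0x80000000u) == 1);
    assert(le1_mask(0xC0000000u) == 0);
    assert(le1_mask(0xFFFFFFFFu) == 0);
}
```

When n is known to be nonzero, the same mask form also answers `__builtin_popcount (n) == 1`, which is the case the comment mentions.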

[Bug middle-end/106727] Missed fold / canonicalization for checking if a number is a power of 2

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106727

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Andrew Pinski  ---
Mine.

[Bug libstdc++/97759] Could std::has_single_bit be faster?

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Last reconfirmed||2024-03-04
 Status|UNCONFIRMED |ASSIGNED

--- Comment #15 from Andrew Pinski  ---
>popcount (x) == 1 || x == 0

That could be optimized to just `popcount (x) <= 1`.

I am going to look to see what is left in GCC 15.
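
The simplification works because the population count is 0 exactly when x is 0, so the `x == 0` arm only adds the count-0 case to the count-1 case. A minimal sketch, with hypothetical helper names and the 64-bit builtin:

```c
#include <assert.h>
#include <stdint.h>

/* The two-condition form quoted above. */
int single_bit_or_zero(uint64_t x)
{
    return __builtin_popcountll(x) == 1 || x == 0;
}

/* The single-comparison form it can be folded to. */
int at_most_one_bit(uint64_t x)
{
    return __builtin_popcountll(x) <= 1;
}

/* Compare both forms on zero, every single-bit value, and some others. */
void check_single_bit_forms(void)
{
    assert(single_bit_or_zero(0) == at_most_one_bit(0));
    for (int b = 0; b < 64; b++) {
        uint64_t one = UINT64_C(1) << b;
        assert(single_bit_or_zero(one) == at_most_one_bit(one));
        assert(single_bit_or_zero(one | 1) == at_most_one_bit(one | 1));
        assert(single_bit_or_zero(~one) == at_most_one_bit(~one));
    }
}
```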

[Bug tree-optimization/90693] Missing popcount simplifications

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|wilco at gcc dot gnu.org   |pinskia at gcc dot gnu.org

--- Comment #14 from Andrew Pinski  ---
> __builtin_popcount (x) == 1 into x == (x & -x)

Actually that should be `__builtin_popcount (x) <= 1`

Anyways I am going to implement the rest here due to PR 94787.
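
The correction can be seen with x = 0: `x & -x` isolates the lowest set bit, so `x == (x & -x)` holds for zero as well as for powers of two, which matches `<= 1` rather than `== 1`. A small sketch with hypothetical helper names:

```c
#include <assert.h>

/* x equals its own lowest set bit (true for 0 and for powers of two). */
int lowbit_form(unsigned x)
{
    return x == (x & -x);
}

int popcount_le1(unsigned x)
{
    return __builtin_popcount(x) <= 1;
}

int popcount_eq1(unsigned x)
{
    return __builtin_popcount(x) == 1;
}

/* The low-bit form agrees with "<= 1" everywhere checked,
   and disagrees with "== 1" at zero. */
void check_lowbit_form(void)
{
    for (unsigned x = 0; x < (1u << 16); x++)
        assert(lowbit_form(x) == popcount_le1(x));
    assert(lowbit_form(0u) != popcount_eq1(0u));
}
```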

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

Richard Sandiford  changed:

   What|Removed |Added

  Attachment #57602|0   |1
is obsolete||

--- Comment #42 from Richard Sandiford  ---
Created attachment 57605
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57605&action=edit
proof-of-concept patch to suppress peeling for gaps

How about the attached?  It records whether all accesses that require peeling
for gaps could instead have used gathers, and only retries when that's true. 
It means that we retry for only 0.034% of calls to vect_analyze_loop_1 in a
build of SPEC2017 with -mcpu=neoverse-v1 -Ofast -fomit-frame-pointer.

The figures exclude wrf, which failed for me with:

module_mp_gsfcgce.fppized.f90:852:23:

  852 |REAL FUNCTION ggamma(X)
  |   ^
Error: definition in block 18 does not dominate use in block 13
for SSA_NAME: stmp_pf_6.5657_140 in statement:
pf_81 = PHI 
PHI argument
stmp_pf_6.5657_140
for PHI node
pf_81 = PHI 
during GIMPLE pass: vect
module_mp_gsfcgce.fppized.f90:852:23: internal compiler error: verify_ssa
failed

Will look at that tomorrow.

[Bug middle-end/114198] [14] RISC-V fixed-length vector -flto ICE: in vectorizable_load, at tree-vect-stmts.cc:10570

2024-03-04 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114198

--- Comment #2 from Patrick O'Neill  ---
(In reply to Richard Biener from comment #1)
> Probably also with -fwhole-program instead of -flto

Thanks! Updated args (--param=riscv-autovec-preference=fixed-vlmax was recently
removed):

-march=rv64gcv -fwhole-program -O3 -mrvv-vector-bits=zvl
or
-march=rv64gcv -flto -O3 -mrvv-vector-bits=zvl

Updated godbolt: https://godbolt.org/z/qb9bK61xM

[Bug target/114083] Possible word play on conditional/unconditional

2024-03-04 Thread roland.illig at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114083

--- Comment #6 from Roland Illig  ---
(In reply to Maciej W. Rozycki from comment #4)
> The flag enables the use of the conditional-move operations even with
> hardware that has no support for such operations, hence unconditionally.

Thank you for your explanation, that made the intention much clearer to me.

There's a problem with the wording though. On a platform that doesn't support
conditional-move operations, it's not possible to _use_ conditional-move
operations. Period. It's only possible to _emulate_ the behavior of these
operations.

I'm not sure how consistently the words 'operation' and 'instruction' are used
in the GCC code base and documentation, but I mixed them up in my mind when I
tried to translate this option.

> if someone has
> a better proposal, then please feel free to submit a patch.  Or would:
> 
> Enable conditional-move operations unconditionally.
> 
> be preferable?

No. Above, you wrote that the branchless instructions would be selected _if_
they are cheaper than the equivalent branch instructions. This is a condition,
thus the word 'unconditionally' doesn't fit.

What about this?
> Prefer branchless move instructions where cheaper.

[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso

2024-03-04 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227

Gaius Mulley  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Gaius Mulley  ---
Closing now that the patch has been applied.

[Bug sanitizer/114217] -fsanitize=alignment false positive with intended unaligned struct member access

2024-03-04 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #14 from Fangrui Song  ---
I agree with Jakub and Andrew.

The relevant rules: C11 6.3.2.3 says

> An integer may be converted to any pointer type. Except as previously 
> specified, the result is implementation-defined, might not be correctly 
> aligned, might not point to an entity of the referenced type, and might be a 
> trap representation.
>
> A pointer to an object type may be converted to a pointer to a different 
> object type. If the resulting pointer is not correctly aligned for the 
> referenced type, the behavior is undefined. ...

C++ [expr.static.cast]p14 says of conversions from a misaligned pointer:

> A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type 
> “pointer to cv2 T”, where T is an object type and cv2 is the same 
> cv-qualification as, or greater cv-qualification than, cv1. If the original 
> pointer value represents the address A of a byte in memory and A does not 
> satisfy the alignment requirement of T, then the resulting pointer value is 
> unspecified. ...

Which is allowed to be an invalid pointer value, which the compiler is then
permitted to give whatever semantics we like, such as disallowing it being
passed to memcpy.

---

memcpy is preferred for expressing an unaligned access.

typedef struct dir_entry dir_entry_u __attribute__((aligned(1)));
// In C++, there is an alternative:
// using dir_entry_u __attribute__((aligned(1))) = dir_entry;

u64 gu(dir_entry_u *entry)
{
    return entry->offset;
}
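
For comparison, the memcpy approach mentioned above might look like the following. This is only a sketch: the layout of `dir_entry` is not shown in the thread, so the hypothetical helpers below work on a raw address rather than on the struct itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from an address that may not be
   8-byte aligned.  memcpy places no alignment requirement on its operands. */
static uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: the matching store. */
static void store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Because no misaligned `uint64_t *` is ever formed, neither of the pointer-conversion rules quoted above comes into play.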

[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso

2024-03-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Gaius Mulley :

https://gcc.gnu.org/g:d646db0e35ad9d235635b204349f5d960072f9fe

commit r14-9308-gd646db0e35ad9d235635b204349f5d960072f9fe
Author: Gaius Mulley 
Date:   Mon Mar 4 21:46:32 2024 +

PR modula2/114227 InstallTerminationProcedure does not work with -fiso

This patch moves the initial/termination user procedure functionality in
pim and iso versions of M2RTS into M2Dependent.  This ensures that
finalization/initialization procedures will always be invoked for both
-fiso and -fpim.  Prior to this patch M2Dependent called M2RTS for
termination procedure cleanup and always invoked the pim M2RTS.

gcc/m2/ChangeLog:

PR modula2/114227
* gm2-libs-iso/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.
* gm2-libs/M2Dependent.def (InstallTerminationProcedure):
New procedure.
(ExecuteTerminationProcedures): New procedure.
(InstallInitialProcedure): New procedure.
(ExecuteInitialProcedures): New procedure.
* gm2-libs/M2Dependent.mod (ProcedureChain): New type.
(ProcedureList): New type.
(ExecuteReverse): New procedure.
(ExecuteTerminationProcedures): New procedure.
(ExecuteInitialProcedures): New procedure.
(AppendProc): New procedure.
(InstallTerminationProcedure): New procedure.
(InstallInitialProcedure): New procedure.
(InitProcList): New procedure.
* gm2-libs/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.

Signed-off-by: Gaius Mulley 

[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso

2024-03-04 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227

--- Comment #2 from Gaius Mulley  ---
Created attachment 57604
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57604&action=edit
Proposed fix

Here is the proposed patch which moves the initial/termination user procedure
functionality in pim and iso versions of M2RTS into M2Dependent.  This ensures
that finalization/initialization procedures will always be invoked for both
-fiso and -fpim.  Prior to this patch M2Dependent called M2RTS for termination
procedure cleanup and always invoked the pim M2RTS.

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #13 from Segher Boessenkool  ---
(In reply to Sarah Julia Kriesch from comment #12)
> I expect also, that this bug is a bigger case.

A bigger case of what?  What do you mean?

[Bug c++/114183] [11/12/13/14 Regression] Lambda constexpr works in msvc but not in gcc

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114183

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Andrew Pinski  ---
Invalid for the same reason as the clang issue is invalid.

[Bug libstdc++/114147] [11/12/13/14 Regression] tuple allocator-extended constructor requires non-explicit default constructor

2024-03-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114147

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:0a545ac7000501844670add0b3560ebdbcb123c6

commit r14-9307-g0a545ac7000501844670add0b3560ebdbcb123c6
Author: Jonathan Wakely 
Date:   Fri Mar 1 11:16:58 2024 +

libstdc++: Add missing std::tuple constructor [PR114147]

I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.

libstdc++-v3/ChangeLog:

PR libstdc++/114147
* include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)):
Add missing overload of allocator-extended default constructor.
(tuple::tuple(allocator_arg_t, const Alloc&)): Likewise.
* testsuite/20_util/tuple/cons/114147.cc: New test.

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776

--- Comment #18 from Jakub Jelinek  ---
I was looking at the sysdeps/ieee754/ldbl-128/ version, i.e. what is used for
hypotf128.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Created attachment 57603
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57603&action=edit
gcc14-pr114211.patch

Untested fix.

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-04 Thread bruce at hoult dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

--- Comment #3 from Bruce Hoult  ---
Simpler example, found independently.

void *memset();
void a(void *b){ memset(b, 0, 1lu); }

There might be a lot of code that triggers this. Fortunately the source file
this happened in didn't actually use RVV (others did) so I was able to simply
use rv64gc for it.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

--- Comment #5 from Jakub Jelinek  ---
Anyway, the actual bug is in the
r9-4082-g38e601118ca88adf0a472750b0da83f0ef1798a7
PR87507 change.
Either we need to punt if the rotate input and output overlaps, or handle that
case correctly.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[13/14 Regression] wrong|[13/14 Regression] wrong
   |code with -O|code with -O
   |-fno-tree-coalesce-vars |-fno-tree-coalesce-vars
   ||since r13-1907
 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Started with r13-1907-g525a1a73a5a563c829a5f76858fe122c9b39f254

[Bug target/113010] [RISCV] sign-extension lost in comparison with constant embedded in comma-op expression

2024-03-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113010

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:901e7bdab70e2275723ac31dacbbce0b6f68f4f4

commit r14-9304-g901e7bdab70e2275723ac31dacbbce0b6f68f4f4
Author: Jakub Jelinek 
Date:   Mon Mar 4 19:23:02 2024 +0100

combine: Fix recent WORD_REGISTER_OPERATIONS check [PR113010]

On Mon, Mar 04, 2024 at 05:18:39PM +0100, Rainer Orth wrote:
> unfortunately, the patch broke Solaris/SPARC bootstrap
> (sparc-sun-solaris2.11):
>
> .../gcc/combine.cc: In function 'rtx_code simplify_comparison(rtx_code, rtx_def**, rtx_def**)':
> .../gcc/combine.cc:12101:25: error: '*(unsigned int*)((char*)&inner_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))' may be used uninitialized [-Werror=maybe-uninitialized]
> 12101 |   scalar_int_mode mode, inner_mode, tmode;
>   | ^~

I don't see how it could ever work properly, inner_mode in that spot is
just uninitialized.

I think we shouldn't worry about paradoxical subregs of non-scalar_int_mode
REGs/MEMs and for the scalar_int_mode ones should initialize inner_mode
before we use it.
Another option would be to use
maybe_lt (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))), BITS_PER_WORD)
and
load_extend_op (GET_MODE (SUBREG_REG (op0))) == ZERO_EXTEND,
or set machine_mode smode = GET_MODE (SUBREG_REG (op0)); and use it in
those two spots.

2024-03-04  Jakub Jelinek  

PR rtl-optimization/113010
* combine.cc (simplify_comparison): Guard the
WORD_REGISTER_OPERATIONS check on scalar_int_mode of SUBREG_REG
and initialize inner_mode.

[Bug c++/114114] [11/12/13/14 Regression] Internal compiler error on function-local conditional noexcept

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114114

Marek Polacek  changed:

   What|Removed |Added

   Priority|P3  |P2
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org
 Status|NEW |ASSIGNED
 CC||mpolacek at gcc dot gnu.org

[Bug c++/114183] [11/12/13/14 Regression] Lambda constexpr works in msvc but not in gcc

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114183

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
https://github.com/llvm/llvm-project/issues/83569 was closed so this is not a
bug?

[Bug tree-optimization/114206] [11/12/13/14 Regression] recursive function call vs local variable addresses

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114206

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||4.5.3
Summary|recursive function call vs  |[11/12/13/14 Regression]
   |local variable addresses|recursive function call vs
   ||local variable addresses
   Target Milestone|--- |11.5
  Known to fail||4.6.3, 4.7.3, 5.1.0

[Bug c++/110031] [11/12/13/14 Regression] ICE with deprecated attribute and NTTP and diagnostic for deprecated printed out so much

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110031

Marek Polacek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug libstdc++/77776] C++17 std::hypot implementation is poor

2024-03-04 Thread mkretz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776

--- Comment #17 from Matthias Kretz (Vir)  ---
hypotf(a, b) is implemented using double precision and hypot(a, b) uses 80-bit
long double on i386 and x86_64 hypot does what you describe, right?

std::experimental::simd benchmarks of hypot(a, b), where simd_abi::scalar uses
the  implementation (i.e. glibc):


-march=skylake-avx512 -ffast-math -O3 -lmvec:
  TYPE                            Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
 float, simd_abi::scalar          37.5           1            11.5           1
 float,                           37.6           0.999        10.2           1.13
 float, simd_abi::__sse           34             4.42         6.46           7.15
 float, simd_abi::__avx           34.1           8.79         6.56           14.1
 float, simd_abi::_Avx512<32>     34.3           8.76         6.01           15.4
 float, simd_abi::_Avx512<64>     44.1           13.6         12             15.4
 float, [[gnu::vector_size(16)]]  58.3           2.57         47.5           0.974
 float, [[gnu::vector_size(32)]]  132            2.27         104            0.892
 float, [[gnu::vector_size(64)]]  240            2.5          222            0.832
--
  TYPE                            Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
double, simd_abi::scalar          81             1            21.5           1
double,                           80.1           1.01         21.3           1.01
double, simd_abi::__sse           39.9           4.06         6.47           6.64
double, simd_abi::__avx           40.2           8.05         12             7.14
double, simd_abi::_Avx512<32>     40.3           8.04         12             7.14
double, simd_abi::_Avx512<64>     56.2           11.5         24             7.14
double, [[gnu::vector_size(16)]]  89.3           1.81         42.5           1.01
double, [[gnu::vector_size(32)]]  150            2.16         110            0.777
double, [[gnu::vector_size(64)]]  297            2.18         242            0.71
--

-march=skylake-avx512 -O3 -lmvec:
  TYPE                            Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
 float, simd_abi::scalar          37.6           1            10.4           1
 float,                           37.7           0.998        10.2           1.02
 float, simd_abi::__sse           37.6           4            8.83           4.71
 float, simd_abi::__avx           37.5           8.01         9.42           8.82
 float, simd_abi::_Avx512<64>     47.8           12.6         12             13.8
 float, [[gnu::vector_size(16)]]  98.7           1.52         57.2           0.727
 float, [[gnu::vector_size(32)]]  151            2            114            0.728
 float, [[gnu::vector_size(64)]]  260            2.31         230            0.722
--
  TYPE                            Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
double, simd_abi::scalar          79.7           1            21.7           1
double,                           80.1           0.995        21.6           1
double, simd_abi::__sse           44.2           3.6          9.99           4.33
double, simd_abi::__avx           43.6           7.32         12             7.21
double, simd_abi::_Avx512<64>     59.9           10.6         24             7.21
double, [[gnu::vector_size(16)]]  88.3           1.8          44.2           0.98
double, [[gnu::vector_size(32)]]  163            1.96         115            0.75
double, [[gnu::vector_size(64)]]  302            2.11         233            0.742
--

I have never ported my SIMD implementation back to scalar and benchmarked it
against glibc.

[Bug c++/110031] [11/12/13/14 Regression] ICE with deprecated attribute and NTTP and diagnostic for deprecated printed out so much

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110031

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #5 from Marek Polacek  ---
Started with r8-4678-g6296cf8e099aae:

commit 6296cf8e099aae43c86a773f93d83a19df85d7e7
Author: Jason Merrill 
Date:   Thu Nov 16 15:13:48 2017 -0500

PR c++/79092 - non-type args of different types are different

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread sarah.kriesch at opensuse dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #12 from Sarah Julia Kriesch  ---
Raise your hand if you need anything new from my side.
We have got enough use cases in our build system and upstream open source
projects gave warnings to remove the s390x support because of long building
time and the required resources.

I expect also, that this bug is a bigger case.

[Bug c++/114229] [modules] duplicate symbols when including stl in submodule

2024-03-04 Thread nickbegg at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114229

--- Comment #1 from Nick Begg  ---
gcc (GCC) 14.0.1 20240301 (experimental)

[Bug c++/103497] [11/12/13/14 Regression] ICE when decltype(auto)... as parameters

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103497

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Marek Polacek  ---
Fixed in GCC 14.

[Bug c++/114229] New: [modules] duplicate symbols when including stl in submodule

2024-03-04 Thread nickbegg at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114229

Bug ID: 114229
   Summary: [modules] duplicate symbols when including stl in
submodule
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nickbegg at gmail dot com
  Target Milestone: ---

using the same test src code as PR113930 -

// submod.mpp

module;

#include 

export module modA:submod;

// modA.mpp

module;

export module modA;

export import :submod;

// main.cpp

#include <string>

import modA;

std::string test_func() {
return "";
}

Note that this test code causes #113930 to check in a GCC debug build.
With a GCC release build, at link time numerous STL symbols become duplicated -

% /home/nick/inst/gcc-trunk-release/bin/g++ -freport-bug -g
CMakeFiles/moduleMin.dir/main.cpp.o CMakeFiles/moduleMin.dir/submod.mpp.o
CMakeFiles/moduleMin.dir/modA.mpp.o -o moduleMin

/usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0x40): multiple
definition of `vtable for std::basic_ios >';
CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x950): first defined here
/usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0x60): multiple
definition of `vtable for std::basic_ostream
>'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x8f0): first defined here
/usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0xb0): multiple
definition of `VTT for std::basic_ostream
>'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x940): first defined here
/usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0xc0): multiple
definition of `vtable for std::basic_istream
>'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x740): first defined here

[snip]

Note that #including  in both places (rather than string in main.cpp)
resolves the issue - Is the include guard mechanism failing?

[Bug middle-end/94787] Failure to detect single bit popcount pattern

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787

--- Comment #7 from Andrew Pinski  ---
And add:
```
int h(int a)
{
if (a == 0) return 0;
return __builtin_popcount(a) == 1;
}

int h1(int a)
{
if (a == 0) return 1;
return __builtin_popcount(a) == 1;
}
```

h should be just `__builtin_popcount(a) == 1`.
While h1 should be just `__builtin_popcount(a) <= 1`.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #13 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #12)
> (In reply to Piotr Siupa from comment #11)
> > However, I've noticed that:
> > bool foo(unsigned x)
> > {
> > if (x == 0)
> > return true;
> > else
> > return std::has_single_bit(x);
> > }
> 
> 
> Oh that is because expand does not use flow sensitive ranges/non-zero bits
> there. There is talk about adding the ability for that but nothing has been
> done yet.

Well that also should be transformed into `__builtin_popcount(a) <= 1` which
then gets expanded into `(v & (v - 1)) == 0`. I will be handling both of those
via PR 94787 .

[Bug middle-end/94787] Failure to detect single bit popcount pattern

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> Note the expansion part is handled by r14-5612, r14-5613, and r14-6940 .
> 
> So now we just need the match part which I will handle for 15.

Actually the expansion part is not fully complete.
```
int f(int a)
{
return __builtin_popcount(a) <= 1;
}

int f1(int a)
{
return __builtin_popcount(a) == 1;
}
```

f1 is handled but f is not.
f should expand to `!(v & (v - 1))`.
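
As a sanity check on the two comparisons, here is a minimal sketch of the bit tricks written as plain C. The helper names are invented for illustration. The `<= 1` form is the `!(v & (v - 1))` expression given above; the `== 1` form shown here is one standard way to write it and is not claimed to be what GCC emits.

```c
#include <stdbool.h>

/* popcount(v) <= 1: clearing the lowest set bit leaves nothing behind. */
static bool popcount_le_1(unsigned v)
{
    return !(v & (v - 1));
}

/* popcount(v) == 1: same test, but zero has to be excluded explicitly. */
static bool popcount_eq_1(unsigned v)
{
    return v != 0 && !(v & (v - 1));
}

/* Compare both bit tricks against the builtin for all values in [0, limit]. */
static bool bit_tricks_match_builtin(unsigned limit)
{
    for (unsigned v = 0; ; v++) {
        if (popcount_le_1(v) != (__builtin_popcount(v) <= 1))
            return false;
        if (popcount_eq_1(v) != (__builtin_popcount(v) == 1))
            return false;
        if (v == limit)
            return true;
    }
}
```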

The other match patterns needed:
```
int g(int a)
{
if (a == 0) return 0;
return __builtin_popcount(a) <= 1;
}

int g1(int a)
{
if (a == 0) return 1;
return __builtin_popcount(a) <= 1;
}
```

g should be transformed into just `__builtin_popcount(a) == 1`
and g1 should be transformed into just `__builtin_popcount(a) <= 1`.
Both during phi-opt.
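
The two folds can be verified exhaustively over a range. In this sketch `g` and `g1` are copied from above, while the `_folded` variants and the checking function are hypothetical names standing in for the result the transformation should produce.

```c
/* Original shapes, as written in the comment. */
static int g(int a)
{
    if (a == 0) return 0;
    return __builtin_popcount(a) <= 1;
}

static int g1(int a)
{
    if (a == 0) return 1;
    return __builtin_popcount(a) <= 1;
}

/* Expected results of the fold: the zero check is absorbed into the
   comparison against the popcount. */
static int g_folded(int a)  { return __builtin_popcount(a) == 1; }
static int g1_folded(int a) { return __builtin_popcount(a) <= 1; }

/* Returns 1 when both folds agree with the originals on [lo, hi].
   hi must be smaller than INT_MAX so the loop counter cannot overflow. */
static int folds_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        if (g(a) != g_folded(a) || g1(a) != g1_folded(a))
            return 0;
    }
    return 1;
}
```

Only `a == 0` separates the `== 1` and `<= 1` comparisons on a popcount, which is why the early return on zero decides which comparison the function folds to.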

[Bug rtl-optimization/101523] Huge number of combine attempts

2024-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #11 from Segher Boessenkool  ---
Okay, so it is a function with a huge BB, so this is not a regression at all,
there will have been incredibly many combination attempts since the day combine
has existed.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #41 from Richard Sandiford  ---
(In reply to Richard Biener from comment #40)
> So I wonder if we can use "local costing" to decide a gather is always OK
> compared to the alternative with peeling for gaps.  On x86 gather tends
> to be slow compared to open-coding it.
Yeah, on SVE gathers are generally “enabling” instructions rather than
something to use for their own sake.

I suppose one problem is that we currently only try to use gathers for
single-element groups.  If we make a local decision to use gathers while
keeping that restriction, we could end up using gathers “unnecessarily” while
still needing to peel for gaps for (say) a two-element group.

That is, it's only better to use gathers than contiguous loads if by doing that
we avoid all need to peel for gaps (and if the cost of peeling for gaps was
high enough to justify the cost of using gathers over consecutive loads).

One of the things on the list to do (once everything is SLP!) is to support
loads with gaps directly via predication, so that we never load elements that
aren't needed.  E.g. on SVE, a 64-bit predicate (PTRUE .D) can be used with a
32-bit load (LD1W .S) to load only even-indexed elements.  So a single-element
group with a group size of 2 could be done cheaply with just consecutive loads,
without peeling for gaps.
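
To make the predication idea concrete, here is a scalar model in C. It is only an illustration: the function names are invented, real SVE code would use the instructions named above, and inactive lanes are modelled as zeroed. The point it shows is that inactive lanes are never read, so a group of size 2 whose final odd element lies past the end of the array can still be loaded safely.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a predicated 32-bit load over `lanes` lanes.  Only lanes
   whose predicate entry is nonzero are read from memory; inactive lanes are
   set to zero and their source addresses are never touched. */
static void predicated_load_u32(uint32_t *dst, const uint32_t *src,
                                const unsigned char *pred, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = pred[i] ? src[i] : 0;
}

/* Build the predicate described in the comment: one active 32-bit lane per
   64-bit container, i.e. every even-indexed lane. */
static void even_lane_predicate(unsigned char *pred, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        pred[i] = (i % 2 == 0);
}
```

With a 7-element source and 8 lanes, lane 7 is inactive, so the model never indexes past the end of the source array.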

[Bug c++/106207] [11/12/13/14 Regression] ICE in apply_fixit, at edit-context.cc:769

2024-03-04 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106207

Marek Polacek  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|mpolacek at gcc dot gnu.org|unassigned at gcc dot gnu.org

[Bug c++/103994] Module ICE in write_var_def with global variable in global module fragment

2024-03-04 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103994

Patrick Palka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org

[Bug analyzer/106390] Support gsl::owner and/or [[gnu::owner]] attribute in -fanalyzer

2024-03-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106390

--- Comment #6 from Jonathan Wakely  ---
Related work: http://thradams.com/cake/ownership.html

[Bug rtl-optimization/114208] RTL DSE deletes a store that is not dead

2024-03-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

--- Comment #5 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #4)
> Did it ever work?
No.  I allowed -mfuse-add=3 to reproduce this PR because there seems to be a
problem with DSE, and in case someone is going to fix it before it
bites an important target.  The mfuse-add optimization tries to avoid the
broken parts of DSE and works around them; only -mfuse-add=0...2 are documented.
It was added in Feb 2024 as PR114100.

>  I suppose 'st Y+,r20 is' post-inc so maybe DSE mishandles this somehow.
That post-inc is only generated after .dse2: .split2 splits some move insns:
These cores don't have reg+offset addressing, so the backend must pretend to
support it.  Then .split2 generates pointer-adjust + mem-access +
undo-pointer-adjust.  The address adjustments are plain additions of the
address register (frame pointer in this case) and have according
REG_CFA_ADJUST_CFA notes.  Then .dse2 removes some non-dead stores.  The 'st
Y+,r20' you mentioned is only generated by .avr-fuse-add which runs after
.dse2.

I'd guess that GCC is not ready for targets with such tight addressing modes?
(without reg+offset addressing; stack-pointer cannot be used either, the only
SP accesses are PUSH and POP).

ad "needs-bisection": -mfuse-add is a new target optimization added as PR114100
in Feb 2024, so bi-secting won't work because -mfuse-add is not recognized
prior to that date.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #40 from Richard Biener  ---
So I wonder if we can use "local costing" to decide a gather is always OK
compared to the alternative with peeling for gaps.  On x86 gather tends
to be slow compared to open-coding it.

In the future we might want to explore whether we can re-do costing for
alternatives without re-running all of the analysis at least for decisions
we know have only "local" effect.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #39 from Richard Sandiford  ---
(In reply to Richard Sandiford from comment #38)
> (In reply to Richard Biener from comment #37)
> > Even more iteration looks bad.  I do wonder why when gather can avoid
> > peeling for GAPs using load-lanes cannot?
> Like you say, we don't realise that all the loads from array3[i] form a
> single group.
Oops, sorry, I shouldn't have gone off memory.  So yeah, it's array1[] where
that happens, not array3[].  The reason we don't use load-lanes is that we
don't have load-lane instructions for smaller elements in larger containers, so
we're forced to use load-and-permute instead.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars

2024-03-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Uroš Bizjak  changed:

   What|Removed |Added

  Component|target  |rtl-optimization
   Keywords|needs-bisection |

--- Comment #3 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #2)
> Possibly target independent rtl-optimization issue.

It is _subreg1 pass that converts:

(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
(rotate:TI (reg/v:TI 106 [ h ])
(const_int 64 [0x40]))) "pr114211.c":9:5 1042
{rotl64ti2_doubleword}
 (nil))

to:

(insn 39 7 40 2 (set (reg:DI 128 [ h+8 ])
(reg:DI 127 [ h ])) "pr114211.c":9:5 84 {*movdi_internal}
 (nil))
(insn 40 39 11 2 (set (reg:DI 127 [ h ])
(reg:DI 128 [ h+8 ])) "pr114211.c":9:5 84 {*movdi_internal}
 (nil))

Well... this won't swap. Either parallel should be emitted, or a temporary
should be used.

Adding -fno-split-wide-types fixes the testcase.

Re-confirmed as rtl-optimization problem.
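
The failure mode is easy to reproduce in plain C. The sketch below uses a hypothetical two-field struct to stand in for the two DImode halves; it is an analogy for the RTL above, not GCC code.

```c
#include <stdint.h>

/* Hypothetical two-register view of a 128-bit value. */
struct ti_pair {
    uint64_t lo;  /* plays the role of reg 127 [ h ]   */
    uint64_t hi;  /* plays the role of reg 128 [ h+8 ] */
};

/* Mirrors the two sequential moves: the second move reads the value the
   first one just wrote, so both halves end up holding the old `lo`. */
static struct ti_pair rotate64_sequential_moves(struct ti_pair v)
{
    v.hi = v.lo;
    v.lo = v.hi;
    return v;
}

/* With a temporary the halves really are exchanged, which is what rotating
   a 128-bit value by 64 bits amounts to. */
static struct ti_pair rotate64_with_temp(struct ti_pair v)
{
    uint64_t tmp = v.hi;
    v.hi = v.lo;
    v.lo = tmp;
    return v;
}
```

Starting from `lo = 1, hi = 2`, the sequential version leaves both halves at 1, while the version with a temporary yields `lo = 2, hi = 1`.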

[Bug tree-optimization/113632] Range info for a^CSTP2-1 could be improved in some cases

2024-03-04 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113632

Andrew Macleod  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #1 from Andrew Macleod  ---
(In reply to Andrew Pinski from comment #0)
> Take:
> ```
> void dummy();
> _Bool f(unsigned long a)
> {
> _Bool cmp = a > 8192;
> if (cmp) goto then; else goto e;
> then:
> unsigned long t = __builtin_clzl(a); // [0,50] 
> t^=63; // [13,63]
> return t >= 13;
> e:
>   dummy();
>   return 0;
> }
> ```
> 
> Currently after the t^=63; we get:
> ```
>   # RANGE [irange] int [1, 63] MASK 0x3f VALUE 0x0
>   _7 = _1 ^ 63;
> ```
> 
> But this could/should be improved to [13,63].
> 
> If we change to using minus instead:
> ```
> t = 63 - t;
> ```
> 
> We get the better range and the comparison (t >= 13) is optimized away.
> ```
> Folding statement: t_10 = 63 - t_9;
> Global Exported: t_10 = [irange] long unsigned int [13, 63] MASK 0x3f VALUE
> 0x0
> Not folded
> ```
> 
> Yes, this does show up in real code; see the LLVM issue for more information
> on that.

I think the current implementation of "operator_bitwise_xor::wi_fold ()" in
range-op.cc was simply ported from the original version we used in the old VRP
code, so it is neither multi-range aware nor has it been enhanced.

If you put a breakpoint there, you'll see it's getting:

(gdb) p lh_lb.dump()
[0], precision = 32
$1 = void
(gdb) p lh_ub.dump()
[0x32], precision = 32
$2 = void
(gdb) p rh_ub.dump()
[0x3f], precision = 32
$3 = void
(gdb) p rh_lb.dump()
[0x3f], precision = 32
$4 = void

One could conceivably do something much better than the general masking stuff
that goes on if rh_lb == rh_ub.  I suspect we could probably do a better job in
general, but have never looked at it.
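
As a rough sketch of what a special case for rh_lb == rh_ub might look like:
when the constant is an all-ones low mask (such as 63) and the left-hand upper
bound does not exceed it, `x ^ mask` equals `mask - x`, so the bounds simply
flip. The type and helper names below are hypothetical and are not part of
range-op.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unsigned range [lb, ub]. */
struct urange { uint64_t lb, ub; };

/* True if mask has the form 2^k - 1 (all low bits set), e.g. 0x3f. */
bool is_low_mask(uint64_t mask)
{
    return (mask & (mask + 1)) == 0;
}

/* If every x in lh fits under the low mask, then x ^ mask == mask - x,
   so the result range is [mask - ub, mask - lb].  Returns false when the
   shortcut does not apply and a general fold would be needed instead. */
bool xor_low_mask_range(struct urange lh, uint64_t mask, struct urange *out)
{
    if (!is_low_mask(mask) || lh.lb > lh.ub || lh.ub > mask)
        return false;
    out->lb = mask - lh.ub;
    out->ub = mask - lh.lb;
    return true;
}
```

With the values from the gdb session above (lh = [0, 0x32], rh = 0x3f) this
gives [13, 63], matching the range obtained for the `63 - t` form.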

It also looks like we make some minor attempts with signed values in
wi_optimize_signed_bitwise_op (), but again, I do not think anyone has tried
to make this code do anything new yet.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #38 from Richard Sandiford  ---
(In reply to Richard Biener from comment #37)
> Even more iteration looks bad.  I do wonder why when gather can avoid
> peeling for GAPs using load-lanes cannot?
Like you say, we don't realise that all the loads from array3[i] form a single
group.

Note that we're not using load-lanes in either case, since the group size (8)
is too big for that.  But load-lanes and load-and-permute have the same
restriction about when peeling for gaps is required.

In contrast, gather loads only ever load data that they actually need.
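
To make the difference concrete, here is a small sketch of the highest element
index each strategy touches. It assumes a hypothetical interleaved layout in
which each scalar iteration owns `group` consecutive elements and only the
member at offset `used` is actually needed; the function names are
illustrative only.

```c
#include <stddef.h>

/* Highest index read when whole groups are loaded contiguously
   (load-lanes or load-and-permute style).  Requires n >= 1. */
size_t last_index_contiguous(size_t n, size_t group)
{
    return group * n - 1;
}

/* Highest index read when only the needed member of each group is
   fetched (gather style).  Requires n >= 1 and used < group. */
size_t last_index_gather(size_t n, size_t group, size_t used)
{
    return group * (n - 1) + used;
}
```

For n = 4, group = 2 and used = 0, the contiguous form reads up to index 7
while the gather form stops at index 6. If the array only has 7 elements, the
contiguous form would read past the end in the final iteration, which is the
situation peeling for gaps guards against.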

> Also for the stores we seem to use elementwise stores rather than store-lanes.
What configuration are you trying?  The original report was about SVE, so I was
trying that.  There we use a scatter store.

> To me the most obvious thing to try optimizing in this testcase is DR
> analysis.  With -march=armv8.3-a I still see
> 
> t.c:26:22: note:   === vect_analyze_data_ref_accesses ===
> t.c:26:22: note:   Detected single element interleaving array1[0][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[1][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[2][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[3][_8] step 4
> t.c:26:22: note:   Detected single element interleaving array1[0][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[1][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[2][_1] step 4
> t.c:26:22: note:   Detected single element interleaving array1[3][_1] step 4
> t.c:26:22: missed:   not consecutive access array2[_4][_8] = _69;
> t.c:26:22: note:   using strided accesses
> t.c:26:22: missed:   not consecutive access array2[_4][_1] = _67;
> t.c:26:22: note:   using strided accesses
> 
> so we don't figure
> 
> Creating dr for array1[0][_1]
> base_address: 
> offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
> constant offset from base address: 0
> step: 4
> base alignment: 16
> base misalignment: 0
> offset alignment: 4
> step alignment: 4
> base_object: array1
> Access function 0: {m_111 * 2, +, 2}_4
> Access function 1: 0
> Creating dr for array1[0][_8]
> analyze_innermost: success.
> base_address: 
> offset from base address: (ssizetype) ((sizetype) (m_111 * 2 + 1) *
> 2)
> constant offset from base address: 0
> step: 4
> base alignment: 16
> base misalignment: 0
> offset alignment: 2
> step alignment: 4
> base_object: array1
> Access function 0: {m_111 * 2 + 1, +, 2}_4
> Access function 1: 0
> 
> belong to the same group (but the access functions tell us it worked out).
> Above we fail to split the + 1 to the constant offset.
OK, but this is moving the question on to how we should optimise the testcase
for Advanced SIMD rather than SVE, and how we should optimise the testcase in
general, rather than simply recover what we could do before.  (SVE is only
enabled for -march=armv9-a and above, in case armv8.3-a was intended to enable
SVE too.)

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #37 from Richard Biener  ---
(In reply to Richard Sandiford from comment #36)
> Created attachment 57602 [details]
> proof-of-concept patch to suppress peeling for gaps
> 
> This patch does what I suggested in the previous comment: if the loop needs
> peeling for gaps, try again without that, and pick the better loop.  It
> seems to restore the original style of code for SVE.
> 
> A more polished version would be a bit smarter about when to retry.  E.g.
> it's pointless if the main loop already operates on full vectors (i.e. if
> peeling 1 iteration is natural in any case).  Perhaps the condition should
> be that either (a) the number of epilogue iterations is known to be equal to
> the VF of the main loop or (b) the target is known to support partial
> vectors for the loop's vector_mode.
> 
> Any thoughts?

Even more iteration looks bad.  I do wonder why when gather can avoid
peeling for GAPs using load-lanes cannot?  Also for the stores we
seem to use elementwise stores rather than store-lanes.

To me the most obvious thing to try optimizing in this testcase is DR
analysis.  With -march=armv8.3-a I still see

t.c:26:22: note:   === vect_analyze_data_ref_accesses ===
t.c:26:22: note:   Detected single element interleaving array1[0][_8] step 4
t.c:26:22: note:   Detected single element interleaving array1[1][_8] step 4
t.c:26:22: note:   Detected single element interleaving array1[2][_8] step 4
t.c:26:22: note:   Detected single element interleaving array1[3][_8] step 4
t.c:26:22: note:   Detected single element interleaving array1[0][_1] step 4
t.c:26:22: note:   Detected single element interleaving array1[1][_1] step 4
t.c:26:22: note:   Detected single element interleaving array1[2][_1] step 4
t.c:26:22: note:   Detected single element interleaving array1[3][_1] step 4
t.c:26:22: missed:   not consecutive access array2[_4][_8] = _69;
t.c:26:22: note:   using strided accesses
t.c:26:22: missed:   not consecutive access array2[_4][_1] = _67;
t.c:26:22: note:   using strided accesses

so we don't figure

Creating dr for array1[0][_1]
base_address: 
offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2)
constant offset from base address: 0
step: 4
base alignment: 16
base misalignment: 0
offset alignment: 4
step alignment: 4
base_object: array1
Access function 0: {m_111 * 2, +, 2}_4
Access function 1: 0
Creating dr for array1[0][_8]
analyze_innermost: success.
base_address: 
offset from base address: (ssizetype) ((sizetype) (m_111 * 2 + 1) * 2)
constant offset from base address: 0
step: 4
base alignment: 16
base misalignment: 0
offset alignment: 2
step alignment: 4
base_object: array1
Access function 0: {m_111 * 2 + 1, +, 2}_4
Access function 1: 0

belong to the same group (but the access functions tell us it worked out).
Above we fail to split the + 1 to the constant offset.

See my hint to use int32_t m instead of uint32_t yielding

t.c:26:22: note:   Detected interleaving load of size 2
t.c:26:22: note:        _2 = array1[0][_1];
t.c:26:22: note:        _9 = array1[0][_8];
t.c:26:22: note:   Detected interleaving load of size 2
t.c:26:22: note:        _18 = array1[1][_1];
t.c:26:22: note:        _23 = array1[1][_8];
t.c:26:22: note:   Detected interleaving load of size 2
t.c:26:22: note:        _32 = array1[2][_1];
t.c:26:22: note:        _37 = array1[2][_8];
t.c:26:22: note:   Detected interleaving load of size 2
t.c:26:22: note:        _46 = array1[3][_1];
t.c:26:22: note:        _51 = array1[3][_8];
t.c:26:22: note:   Detected interleaving store of size 2
t.c:26:22: note:        array2[_4][_1] = _67;
t.c:26:22: note:        array2[_4][_8] = _69;

(and SLP being thrown away because we can use load/store lanes)
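
One plausible reading of why the signedness of m matters for splitting the
+ 1 (this is an assumption; the comment does not spell it out): with an
unsigned 32-bit index, adding 1 before widening is not the same as adding 1
after widening in general, because the narrow addition may wrap. The helpers
below are hypothetical and only illustrate that arithmetic.

```c
#include <stdint.h>

/* Widen first, then add: the +1 acts as a separate constant offset. */
uint64_t widen_then_add(uint32_t u)
{
    return (uint64_t)u + 1;
}

/* Add in 32 bits, then widen: the +1 can wrap to 0 before widening. */
uint64_t add_then_widen(uint32_t u)
{
    return (uint64_t)(u + 1u);
}
```

The two differ only for u == UINT32_MAX. A value of the form m * 2 is always
even, so it can never be UINT32_MAX and the split would in fact be safe here,
but proving that needs extra reasoning about the low bit. With a signed m,
overflow is undefined, so no such proof is needed.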

[Bug c++/107688] [C++23] P2615 - Meaningful exports

2024-03-04 Thread nshead at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107688

Nathaniel Shead  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |nshead at gcc dot gnu.org
 CC||nshead at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #1 from Nathaniel Shead  ---
Proposed patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647120.html

[Bug c/114226] ICE on valid vanilla code when RVV xtheadvector enabled

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114226

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 114194 ***

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

Andrew Pinski  changed:

   What|Removed |Added

 CC||bruce at hoult dot org

--- Comment #2 from Andrew Pinski  ---
*** Bug 114226 has been marked as a duplicate of this bug. ***

[Bug testsuite/114221] gcc.c-torture/execute/20101011-1.c fails for H8/300

2024-03-04 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114221

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug middle-end/114157] during GIMPLE pass: bitintlower ICE: in lower_stmt, at gimple-lower-bitint.cc:5577 with -O with _BitInt(256) / vector memmove

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114157

--- Comment #1 from Jakub Jelinek  ---
Ah, we need to handle BIT_FIELD_REF from some SSA_NAME to large/huge _BitInt:
void foo (vector(8) long int s)
{
  _BitInt(256) _2;

   [local count: 1073741824]:
  _2 = BIT_FIELD_REF ;
  MEM <_BitInt(256)> [(char * {ref-all})] = _2;

maybe also BIT_FIELD_REF from large/huge _BitInt to non-bitint and maybe also
from/to large/huge _BitInt.  Though, I really can't reproduce those cases right
now, so it would be purely theoretical.

[Bug middle-end/114197] [14 regression] ICE in verify_dominators

2024-03-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:8fdac08b4d5f65973164a476bd255533ed97a766

commit r14-9296-g8fdac08b4d5f65973164a476bd255533ed97a766
Author: Richard Biener 
Date:   Mon Mar 4 13:28:34 2024 +0100

tree-optimization/114197 - unexpected if-conversion for vectorization

The following avoids lowering a volatile bitfield access and in case
the if-converted and original loops end up in different outer loops
because of simplifications enabled scrap the result since that is not
how the vectorizer expects the loops to be laid out.

PR tree-optimization/114197
* tree-if-conv.cc (bitfields_to_lower_p): Do not lower if
there are volatile bitfield accesses.
(pass_if_conversion::execute): Throw away result if the
if-converted and original loops are not nested as expected.

* gcc.dg/torture/pr114197.c: New testcase.

[Bug middle-end/114197] [14 regression] ICE in verify_dominators

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #7 from Richard Biener  ---
Fixed both issues.

[Bug tree-optimization/114228] [14 Regression] memset/memcpy loop not always recognised with -Os

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Andrew Pinski  ---
Looks like this was on purpose, see PR 111583 for more analysis.

Basically, if either buff or input were NULL, this would have been an invalid
transformation.

So invalid.
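
A sketch of the hazard, as I understand the explanation above: in ISO C
through C23, passing a null pointer to memset is undefined even when the
length is zero, whereas a loop that runs zero times never touches the
pointer. The function names below are hypothetical and only contrast the two
forms.

```c
#include <stddef.h>
#include <string.h>

/* Loop form: never dereferences buff when max == 0, so buff may be NULL. */
void clear_loop(char *buff, size_t max)
{
    for (size_t i = 0; i < max; ++i)
        buff[i] = 0;
}

/* Call form: the guard ensures memset is only reached when at least one
   byte is written, so no non-NULL assumption leaks into the zero-trip case. */
void clear_guarded(char *buff, size_t max)
{
    if (max != 0)
        memset(buff, 0, max);
}
```

Both functions behave identically for non-empty ranges and both accept a null
pointer together with a zero length; an unguarded memset call would not offer
the second guarantee.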

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548

--- Comment #6 from Robin Dapp  ---
Honestly, I don't know how to analyze/debug this without a zen4, in particular
as it only seems to happen with PGO.  I tried locally but of course the
execution time doesn't change (same as with zen3 according to the database).
Is there a way to obtain the binaries in order to tell a difference?

[Bug debug/92387] [11/12/13 Regression] gcc generates wrong debug information at -O1 since r10-1907-ga20f263ba1a76a

2024-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92387

--- Comment #5 from Jan Hubicka  ---
The revision is changing inlining decisions, so it would probably be possible
to reproduce the problem without that change with the right always_inline and
noinline attributes.
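
For reference, pinning inlining decisions with those attributes looks roughly
like the sketch below. The functions are hypothetical placeholders, not taken
from the testcase in this bug.

```c
/* Hypothetical helpers: names and bodies are illustrative only. */
static inline __attribute__((always_inline)) int add_one(int x)
{
    return x + 1;
}

__attribute__((noinline)) int times_two(int x)
{
    return x * 2;
}

int combine(int x)
{
    /* add_one is forced inline here; times_two stays an out-of-line call. */
    return times_two(add_one(x));
}
```

Applying the attributes to the functions whose inlining changed would make the
reproduction independent of the inliner's heuristics.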

[Bug tree-optimization/114228] [14 Regression] memset/memcpy loop not always recognised with -Os

2024-03-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||needs-bisection
   Last reconfirmed||2024-03-04
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |14.0
Summary|memset/memcpy loop not  |[14 Regression]
   |always recognised with -Os  |memset/memcpy loop not
   ||always recognised with -Os

--- Comment #1 from Andrew Pinski  ---
Confirmed.

The IR is the same before coming into ldist.

ldist in 13.2.0 had:
```
ldist creates useful parallel partition:
  0, 1, 2, 3, 4
Applying pattern match.pd:365, generic-match.cc:23462
distribute loop <1> into partitions:
```

But the trunk:
```
ldist asked to generate code for vertex 3
ldist creates useful parallel partition:
  0, 1, 2, 3, 4
Loop 1 not distributed.
```

But the dump gives no reason why.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-03-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #36 from Richard Sandiford  ---
Created attachment 57602
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57602&action=edit
proof-of-concept patch to suppress peeling for gaps

This patch does what I suggested in the previous comment: if the loop needs
peeling for gaps, try again without that, and pick the better loop.  It seems
to restore the original style of code for SVE.

A more polished version would be a bit smarter about when to retry.  E.g. it's
pointless if the main loop already operates on full vectors (i.e. if peeling 1
iteration is natural in any case).  Perhaps the condition should be that either
(a) the number of epilogue iterations is known to be equal to the VF of the
main loop or (b) the target is known to support partial vectors for the loop's
vector_mode.

Any thoughts?

[Bug middle-end/114197] [14 regression] ICE in verify_dominators

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Richard Biener  ---
Fixed I assume.

[Bug rtl-optimization/114190] [14 regression] Wrong code with -O2 -fno-dce -fharden-compares -mvpclmulqdq --param=max-rtl-if-conversion-unpredictable-cost=136

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114190

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug rtl-optimization/114228] New: memset/memcpy loop not always recognised with -Os

2024-03-04 Thread denis.campredon at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228

Bug ID: 114228
   Summary: memset/memcpy loop not always recognised with -Os
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

typedef __SIZE_TYPE__ size_t;
void baz(char *);

void foo( char *__restrict buff, const char*__restrict input)
{
size_t max = __builtin_strlen (input);
for(size_t i = 0 ; i < max; ++i)
buff[i] = 0;

baz(buff);
}

void bar( char *__restrict buff, const char*__restrict input)
{
size_t max = __builtin_strlen (input);
for(size_t i = 0 ; i < max; ++i)
buff[i] = input[i];

baz(buff);
}
--

When the code above is compiled with -Os, the current trunk fails to convert
the two loops into memset/memcpy calls.

gcc 13.2 is able to convert the loops into a call.

[Bug tree-optimization/114108] [14 regression] ICE when building opencv-4.8.1 (error: type mismatch in binary expression) since r14-1833

2024-03-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114108

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1
