[Bug middle-end/88670] [meta-bug] generic vector extension issues

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670
Bug 88670 depends on bug 112787, which changed state.

Bug 112787 Summary: Codegen regression of large GCC vector extensions when 
enabling SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/112787] Codegen regression of large GCC vector extensions when enabling SVE

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112787

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |14.0

--- Comment #4 from Andrew Pinski  ---
Fixed.

[Bug libgomp/113192] [11/12/13/14 Regression] ERROR: couldn't execute "../../../gcc/libgomp/testsuite/flock": no such file or directory

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113192

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.5

[Bug target/113116] [14 Regression] ~11-17% exec time regression of 436.cactusADM on aarch64

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113116

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug middle-end/113199] [14 Regression][GCN] ICE (segfault) due to invalid 'loop_mask_46 = VEC_PERM_EXPR' when compiling Newlib's wcsftime.c

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113199

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug lto/113208] [14 Regression] lto1: error: Alias and target's comdat groups differs

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug lto/113204] [14 Regression] lto1: error: qsort comparator non-negative on sorted output: 64

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113204

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug sanitizer/105347] [12/13/14 Regression] Failed to build from source on FreeBSD 11.* due to using nonexistent sha224.h

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105347

--- Comment #2 from Andrew Pinski  ---
https://github.com/llvm/llvm-project/commit/18a7ebda99044473fdbce6376993714ff54e6690

[Bug sanitizer/105347] [12/13/14 Regression] Failed to build from source on FreeBSD 11.* due to using nonexistent sha224.h

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105347

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||build
Summary|Failed to build from source on FreeBSD 11.* due to using nonexistent sha224.h|[12/13/14 Regression] Failed to build from source on FreeBSD 11.* due to using nonexistent sha224.h
   Target Milestone|--- |12.4

[Bug other/113188] graphite-isl-ast-to-gimple.c: ‘isl_val_free’ was not declared in this scope

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113188

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86724

--- Comment #2 from Andrew Pinski  ---
https://gcc.gnu.org/pipermail/gcc-patches/2017-February/469402.html I think was
the fix here ...

[Bug bootstrap/113181] When compiling sanitizer_printf.cc, getting error: multiple definition of ‘enum fsconfig_command’

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113181

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106266,
   ||https://github.com/llvm/llvm-project/issues/56421

--- Comment #3 from Andrew Pinski  ---
libgo was fixed with PR 106266.

libsanitizer's upstream bug was
https://github.com/llvm/llvm-project/issues/56421 and the fix is referenced
there.

[Bug target/113248] RISC-V: Invalid vsetvli fusion using -mtune=generic-ooo

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113248

--- Comment #1 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:6cf47447f6fba84a17864fc7a19a532a62b6e736

commit r14-6967-g6cf47447f6fba84a17864fc7a19a532a62b6e736
Author: Juzhe-Zhong 
Date:   Sat Jan 6 13:10:38 2024 +0800

RISC-V: Update MAX_SEW for available vsevl info[VSETVL PASS]

This patch fixes a bug in the VSETVL pass in the following situation:

Ignore curr info since prev info available with it:
  prev_info: VALID (insn 8, bb 2)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=16, VLMUL=mf4, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 1 [0x1])
VL=(nil)
  curr_info: VALID (insn 12, bb 2)
Demand fields: demand_ge_sew demand_non_zero_avl
SEW=16, VLMUL=m1, RATIO=16, MAX_SEW=32
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 1 [0x1])
VL=(nil)

We should update prev_info MAX_SEW from 64 to 32.

Before this patch:
foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a1)
vmv.s.x v3,a0
vfmv.s.f v2,fa0
vadd.vv v1,v1,v1
ret

After this patch:
foo:
vsetivli zero,1,e16,mf4,ta,ma
vle64.v v1,0(a1)
vmv.s.x v3,a0
vfmv.s.f v2,fa0
vsetvli zero,zero,e64,m1,ta,ma
vadd.vv v1,v1,v1
ret

Tested on both RV32 and RV64 with no regressions. Committed.

PR target/113248

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::fuse_local_vsetvl_info):
Update the MAX_SEW.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr113248.c: New test.
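
The RATIO values in the dump above are consistent with RATIO = SEW / LMUL. The following is a toy model of that arithmetic and of the MAX_SEW update, not the code in riscv-vsetvl.cc: `toy_vsetvl_info`, `sew_lmul_ratio`, and `fuse_max_sew` are hypothetical names, and clamping with `std::min` is only an assumption about what "update MAX_SEW" amounts to in this one case.

```cpp
#include <algorithm>

// Toy model only: hypothetical names, not GCC's riscv-vsetvl.cc types.
struct toy_vsetvl_info {
    int sew;       // selected element width in bits
    int lmul_num;  // LMUL numerator   (m1 -> 1/1, mf4 -> 1/4)
    int lmul_den;  // LMUL denominator
    int max_sew;   // upper bound on SEW (assumed meaning of MAX_SEW)
};

// RATIO = SEW / LMUL, computed in integers as SEW * den / num.
int sew_lmul_ratio(const toy_vsetvl_info& info)
{
    return info.sew * info.lmul_den / info.lmul_num;
}

// Assumed fusion rule for MAX_SEW: when curr is folded into prev, prev
// must not keep a looser bound than the one curr carried.
void fuse_max_sew(toy_vsetvl_info& prev, const toy_vsetvl_info& curr)
{
    prev.max_sew = std::min(prev.max_sew, curr.max_sew);
}
```

With prev = {16, 1, 4, 64} and curr = {16, 1, 1, 32}, taken from the dump, the ratios come out as 64 and 16, and the clamp lowers prev's MAX_SEW to 32.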

[Bug libstdc++/106308] Consider using statx(2) for std::filesystem

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106308

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-01-06
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
.

[Bug libstdc++/113250] New: std::filesystem::equivalent("", "/") should throw

2024-01-05 Thread davidepesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113250

Bug ID: 113250
   Summary: std::filesystem::equivalent("", "/") should throw
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: davidepesa at gmail dot com
  Target Milestone: ---

According to https://eel.is/c++draft/fs.op.equivalent

> !exists(p1) || !exists(p2) is an error.

However, equivalent("", "/") returns false instead of throwing
filesystem_error.

This is possibly due to a typo here:
https://github.com/gcc-mirror/gcc/blob/5a0b3355d956f5d36f9b562e027b890cc5f61d88/libstdc%2B%2B-v3/src/c%2B%2B17/fs_ops.cc#L900
(&& instead of ||)
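
A minimal sketch of the quoted wording, for comparison with the reported behaviour. `strict_equivalent` is a hypothetical wrapper, not a libstdc++ function; it only shows what the `||` condition means for a caller.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper (not part of libstdc++): enforce the quoted
// wording before delegating to std::filesystem::equivalent.
bool strict_equivalent(const fs::path& p1, const fs::path& p2)
{
    // "!exists(p1) || !exists(p2) is an error": because of the ||,
    // a single missing path is already enough to report failure.
    if (!fs::exists(p1) || !fs::exists(p2))
        throw fs::filesystem_error(
            "strict_equivalent", p1, p2,
            std::make_error_code(std::errc::no_such_file_or_directory));
    return fs::equivalent(p1, p2);
}
```

Under this reading, `strict_equivalent("", "/")` throws because the empty path does not exist, whereas a check written with `&&` would only fail when both paths are missing.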

[Bug target/113249] New: RISC-V: regression testsuite errors -mtune=generic-ooo

2024-01-05 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249

Bug ID: 113249
   Summary: RISC-V: regression testsuite errors -mtune=generic-ooo
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ewlu at rivosinc dot com
  Target Milestone: ---

Configuration: 
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mtune=generic-ooo/-mcmodel=medlow

This is a tracker bug for all of the testsuite failures using the generic-ooo
tune instead of rocket

=== gfortran: Unexpected fails for rv64gcv lp64d medlow ===
FAIL: gfortran.dg/ieee/ieee_6.f90   -O0  execution test
FAIL: gfortran.dg/ieee/ieee_6.f90   -O1  execution test
FAIL: gfortran.dg/ieee/ieee_6.f90   -O2  execution test
FAIL: gfortran.dg/ieee/ieee_6.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/ieee_6.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/ieee_6.f90   -Os  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -O0  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -O1  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -O2  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -O3 -g  execution test
FAIL: gfortran.dg/ieee/modes_1.f90   -Os  execution test
FAIL: gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect "vectorized
2[234] loops" 1
=== gcc: Unexpected fails for rv64gcv lp64d medlow ===
FAIL: gcc.dg/debug/btf/btf-datasec-3.c scan-assembler-times bts_type 3
FAIL: gcc.dg/debug/btf/btf-datasec-3.c scan-assembler-times bts_type:
\\(BTF_KIND_VAR 'test_bss2'\\) 1
FAIL: gcc.dg/debug/btf/btf-datasec-3.c scan-assembler-times bts_type:
\\(BTF_KIND_VAR 'test_data2'\\) 1
FAIL: c-c++-common/spec-barrier-1.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/vector-subscript-4.c  -Wc++-compat   scan-tree-dump-not
optimized "vector"
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 72)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 (test for warnings, line 77)
FAIL: gcc.dg/Wstringop-overflow-47.c pr97027 note (test for warnings, line 68)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for short (test for
warnings, line 51)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for signed char (test for
warnings, line 50)
FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/pr30957-1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
FAIL: gcc.dg/signbit-2.c scan-tree-dump-not optimized "\\s+>>\\s+31"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "Not unrolling loop, doesn't
roll"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "likely upper bound: 6"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "realistic bound: -1"
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
FAIL: gcc.dg/tree-prof/val-prof-1.c scan-tree-dump optimized "if \\(n_[0-9]* !=
257\\)"
FAIL: gcc.dg/tree-prof/val-prof-3.c scan-tree-dump optimized "if \\(_[0-9]* \\<
n_[0-9]*"
FAIL: gcc.dg/tree-prof/val-prof-4.c scan-tree-dump optimized "if \\(n_[0-9]*
\\>"
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Conditional
combines static and invariant" 1
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Will duplicate
bb" 2
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump cunroll "optimized: loop with
[0-9]+ iterations completely unrolled"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump-not optimized "foo"
FAIL: gcc.dg/tree-ssa/gen-vect-34.c scan-tree-dump-times vect "vectorized 1
loops" 1
FAIL: gcc.dg/tree-ssa/ivopts-lt-2.c scan-tree-dump-times ivopts "PHI 

[Bug libstdc++/113246] Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread davidepesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

--- Comment #5 from Davide Pesavento  ---
(In reply to Jonathan Wakely from comment #2)
> > If there are no leading elements of p that exist, should canonical() be
> > called with an empty path? or should it not be called at all?
> 
> It makes no sense for weakly_canonical to ever call canonical with an empty
> path, since that would always report an error (i.e. throw or set ec and
> return an empty path). That would make it completely useless for paths with
> no prefix that already exists. So if there are no leading elements of p that
> exist, then obviously canonical should not be called. The alternative makes
> no sense.
> 
> So the behaviour of weakly_canonical seems correct to me. If any leading
> elements exist, then canonical is called on them, which returns an absolute
> path, and then the non-existing elements are appended to that. If there are
> no existing elements, then you get a relative path.

Thanks. Not the answer I was hoping for(*) but I guess it does make sense
logically.

(*) because it means that weakly_canonical("foo") and weakly_canonical("./foo")
give different results. I guess weakly_canonical(absolute(whatever)) is what I
need (in most cases).
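
The workaround mentioned in the footnote could be wrapped as follows. This is a sketch; `absolute_weakly_canonical` is a hypothetical helper name, and it assumes the current working directory exists.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: make the path absolute first, so the result does
// not depend on whether any leading element of p exists.
fs::path absolute_weakly_canonical(const fs::path& p)
{
    return fs::weakly_canonical(fs::absolute(p));
}
```

With this helper, "foo" and "./foo" should resolve to the same absolute path, because the existing prefix handed to canonical() always includes the working directory.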

[Bug target/113248] New: RISC-V: Invalid vsetvli fusion using -mtune=generic-ooo

2024-01-05 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113248

Bug ID: 113248
   Summary: RISC-V: Invalid vsetvli fusion using
-mtune=generic-ooo
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ewlu at rivosinc dot com
  Target Milestone: ---

Opening a new bug instead of reopening the other one, since the configuration
is different.

Same testcase as PR111037. Switching the cost model should not cause the
program to crash.

foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a1)
vmv.s.x v3,a0
vfmv.s.f v2,fa0 # illegal insn still
vadd.vv v1,v1,v1

Configuration:
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mtune=generic-ooo/-mcmodel=medlow

Compilation:
./build-gcc-linux-stage2/gcc/xgcc -B./build-gcc-linux-stage2/gcc/ 
../gcc/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c  -march=rv64gcv
-mabi=lp64d -mtune=generic-ooo -mcmodel=medlow   -fdiagnostics-plain-output 
-O0 --param=riscv-autovec-preference=scalable -march=rv32gc_zve64f_zvfh
-mabi=ilp32d -O3 -ffat-lto-objects -fno-ident -S   -o pr111037-3.s

Godbolt:
https://godbolt.org/z/q3779xnab

[Bug libstdc++/113246] Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

--- Comment #4 from Jonathan Wakely  ---
I think the exception for a deleted pwd is also correct.

weakly_canonical uses status(".") to check if the current directory exists.
That returns true, even if the directory has been deleted (it still exists
because there are open descriptors referring to it, you just can't find it by
name any longer). Because the directory exists, we call canonical(".") to get
its absolute name. But that fails, because it uses current_path()/p and
absolute_path uses getcwd() which fails with ENOENT.

The directory exists, but you can't get its name, so you can't canonicalize it.

So I think the libstdc++ behaviour conforms to the standard.

I think it might also be conforming if weakly_canonical detected that
exists(".") is insufficient to guarantee that we can call canonical. Since the
point of weakly_canonical is to handle non-existing directories, I think it
would be useful to add a special case for relative paths explicitly beginning
with "." if current_path() fails. But I don't think the standard requires us to
do that.
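
One possible reading of that special case, written as a user-side wrapper rather than a change to libstdc++. `weakly_canonical_with_fallback` is a hypothetical name, and falling back to `lexically_normal()` is an assumption about what a useful result would be when the working directory cannot be named.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper: behave like weakly_canonical, except that a
// relative path is normalized lexically when current_path() fails.
fs::path weakly_canonical_with_fallback(const fs::path& p)
{
    std::error_code ec;
    fs::path result = fs::weakly_canonical(p, ec);
    if (!ec)
        return result;

    // Fall back only when the working directory itself has no name,
    // e.g. because it was deleted while the process was still in it.
    std::error_code cwd_ec;
    fs::current_path(cwd_ec);
    if (p.is_relative() && cwd_ec)
        return p.lexically_normal();

    throw fs::filesystem_error("weakly_canonical_with_fallback", p, ec);
}
```

In an ordinary working directory the wrapper returns exactly what `weakly_canonical` returns; the fallback branch is only reached in the deleted-directory case described above.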

[Bug libstdc++/113246] Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

--- Comment #3 from Jonathan Wakely  ---
(In reply to Davide Pesavento from comment #1)
> Another interesting(?) behavior, when run from a non-existing (deleted)
> working directory:
> 
> weakly_canonical("foo") returns "foo", while weakly_canonical("./foo")
> throws:
> 
> > terminate called after throwing an instance of 
> > 'std::filesystem::__cxx11::filesystem_error'
> >   what():  filesystem error: cannot make canonical path: No such file or 
> > directory [.]

This case is a bit more interesting. I'll have to investigate further.

[Bug libstdc++/113246] Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

--- Comment #2 from Jonathan Wakely  ---
(In reply to Davide Pesavento from comment #0)
> note that canonical("") currently throws a filesystem_error)

That's clearly correct, as canonical says that !exists(p) is an error.

> If there are no leading elements of p that exist, should canonical() be
> called with an empty path? or should it not be called at all?

It makes no sense for weakly_canonical to ever call canonical with an empty
path, since that would always report an error (i.e. throw or set ec and return
an empty path). That would make it completely useless for paths with no prefix
that already exists. So if there are no leading elements of p that exist, then
obviously canonical should not be called. The alternative makes no sense.

So the behaviour of weakly_canonical seems correct to me. If any leading
elements exist, then canonical is called on them, which returns an absolute
path, and then the non-existing elements are appended to that. If there are no
existing elements, then you get a relative path.

[Bug c/113247] New: RISC-V: Performance bug in SHA256

2024-01-05 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247

Bug ID: 113247
   Summary: RISC-V: Performance bug in SHA256
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

We found a performance bug while testing various benchmarks.

This is the case from coremark-pro SHA256:

Dynamic instruction counts on spike, vector vs. scalar:

897210195 (vector) vs 418451694 (scalar)

The vector build executes more than twice as many dynamic instructions as the
scalar build (about 2.14x).

We also tested on our hardware board and on a Thead C908 (K230).

On both, vector performance drops by 60% or more on real hardware compared
with scalar.

After investigation, the performance problem occurs in the following case:

https://compiler-explorer.com/z/GcsnK7edn

#include <stdint.h>

#define Ch(x,y,z)   (z ^ (x & (y ^ z)))
#define Maj(x,y,z)  ((x & y) | (z & (x | y)))

#define SHR(x, n)   (x >> n)
#define ROTR(x,n)   (SHR(x,n) | (x << (32 - n)))
#define S1(x)       (ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))
#define S0(x)       (ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))

#define s1(x)       (ROTR(x,17) ^ ROTR(x,19) ^  SHR(x,10))
#define s0(x)       (ROTR(x, 7) ^ ROTR(x,18) ^  SHR(x, 3))

#define SHA256_STEP(a,b,c,d,e,f,g,h,x,K) \
{\
tmp1 = h + S1(e) + Ch(e,f,g) + K + x;\
tmp2 = S0(a) + Maj(a,b,c);   \
h  = tmp1 + tmp2;\
d += tmp1;   \
}

#define BE_LOAD32(n,b,i) (n) = byteswap(*(uint32_t *)(b + i))

static uint32_t byteswap(uint32_t x)
{
x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;  

return x;
}

void sha256 (const uint8_t *in, uint32_t out[8])
{
uint32_t tmp1, tmp2, a, b, c, d, e, f, g, h;
uint32_t w0, w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, w13, w14,
w15;

tmp1 = tmp2 = 0;
w0 = w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = w10 = w11 = w12 = w13 =
w14 = w15 = 0;

BE_LOAD32 (  w0, in,  0 );
BE_LOAD32 (  w1, in,  4 );
BE_LOAD32 (  w2, in,  8 );
BE_LOAD32 (  w3, in, 12 );
BE_LOAD32 (  w4, in, 16 );
BE_LOAD32 (  w5, in, 20 );
BE_LOAD32 (  w6, in, 24 );
BE_LOAD32 (  w7, in, 28 );
BE_LOAD32 (  w8, in, 32 );
BE_LOAD32 (  w9, in, 36 );
BE_LOAD32 ( w10, in, 40 );
BE_LOAD32 ( w11, in, 44 );
BE_LOAD32 ( w12, in, 48 );
BE_LOAD32 ( w13, in, 52 );
BE_LOAD32 ( w14, in, 56 );
BE_LOAD32 ( w15, in, 60 );

a = out[0];
b = out[1];
c = out[2];
d = out[3];
e = out[4];
f = out[5];
g = out[6];
h = out[7];

SHA256_STEP(a, b, c, d, e, f, g, h,  w0, 0x428a2f98);
SHA256_STEP(h, a, b, c, d, e, f, g,  w1, 0x71374491);
SHA256_STEP(g, h, a, b, c, d, e, f,  w2, 0xb5c0fbcf);
SHA256_STEP(f, g, h, a, b, c, d, e,  w3, 0xe9b5dba5);
SHA256_STEP(e, f, g, h, a, b, c, d,  w4, 0x3956c25b);
SHA256_STEP(d, e, f, g, h, a, b, c,  w5, 0x59f111f1);
SHA256_STEP(c, d, e, f, g, h, a, b,  w6, 0x923f82a4);
SHA256_STEP(b, c, d, e, f, g, h, a,  w7, 0xab1c5ed5);
SHA256_STEP(a, b, c, d, e, f, g, h,  w8, 0xd807aa98);
SHA256_STEP(h, a, b, c, d, e, f, g,  w9, 0x12835b01);
SHA256_STEP(g, h, a, b, c, d, e, f, w10, 0x243185be);
SHA256_STEP(f, g, h, a, b, c, d, e, w11, 0x550c7dc3);
SHA256_STEP(e, f, g, h, a, b, c, d, w12, 0x72be5d74);
SHA256_STEP(d, e, f, g, h, a, b, c, w13, 0x80deb1fe);
SHA256_STEP(c, d, e, f, g, h, a, b, w14, 0x9bdc06a7);
SHA256_STEP(b, c, d, e, f, g, h, a, w15, 0xc19bf174);

w0 = s1(w14) + w9 + s0(w1) + w0;
SHA256_STEP(a, b, c, d, e, f, g, h,  w0, 0xe49b69c1);
w1 = s1(w15) + w10 + s0(w2) + w1;
SHA256_STEP(h, a, b, c, d, e, f, g,  w1, 0xefbe4786);
w2 = s1(w0) + w11 + s0(w3) + w2;
SHA256_STEP(g, h, a, b, c, d, e, f,  w2, 0x0fc19dc6);
w3 = s1(w1) + w12 + s0(w4) + w3;
SHA256_STEP(f, g, h, a, b, c, d, e,  w3, 0x240ca1cc);
w4 = s1(w2) + w13 + s0(w5) + w4;
SHA256_STEP(e, f, g, h, a, b, c, d,  w4, 0x2de92c6f);
w5 = s1(w3) + w14 + s0(w6) + w5;
SHA256_STEP(d, e, f, g, h, a, b, c,  w5, 0x4a7484aa);
w6 = s1(w4) + w15 + s0(w7) + w6;
SHA256_STEP(c, d, e, f, g, h, a, b,  w6, 0x5cb0a9dc);
w7 = s1(w5) + w0 + s0(w8) + w7;
SHA256_STEP(b, c, d, e, f, g, h, a,  w7, 0x76f988da);
w8 = s1(w6) + w1 + s0(w9) + w8;
SHA256_STEP(a, b, c, d, e, f, g, h,  w8, 0x983e5152);
w9 = s1(w7) + w2 + s0(w10) + w9;
SHA256_STEP(h, a, b, c, d, e, f, g,  w9, 0xa831c66d);
w10 = s1(w8) + w3 + s0(w11) + w10;
SHA256_STEP(g, h, a, b, c, d, e, f, w10, 0xb00327c8);
w11 = s1(w9) + w4 + s0(w12) + w11;
SHA256_STEP(f, g, h, a, b, c, d, e, w11, 0xbf597fc7);
w12 = 

[Bug c++/109899] [12/13/14 Regression] ICE in check_noexcept_r, at cp/except.cc:1065

2024-01-05 Thread lozko.roma at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109899

Roman Lozko  changed:

   What|Removed |Added

 CC||lozko.roma at gmail dot com

--- Comment #8 from Roman Lozko  ---
Found this issue after reducing my example with GCC 13.2 to be an exact copy of
previous comment. Dunno how to help so just notifying that the bug still
exists, I guess.

[Bug tree-optimization/111268] [11/12/13/14 Regression] internal compiler error: in to_constant, at poly-int.h:504

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111268

Andrew Pinski  changed:

   What|Removed |Added

  Known to work|12.3.0  |
Summary|[14 Regression] internal compiler error: in to_constant, at poly-int.h:504|[11/12/13/14 Regression] internal compiler error: in to_constant, at poly-int.h:504
   Target Milestone|14.0|11.5
   Keywords||ice-checking

--- Comment #13 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #12)
> The #c8 testcase started to ICE with
> r13-926-g08afab6f8642f58f702010ec196dce3b00955627
> The #c3 one started to ICE with
> r11-1445-g502d63b6d6141597bb18fd23c87736a1b384cf8f
> So, not really sure why is this marked as [14 Regression] only.

Because I didn't notice the ICE is due to an assert which is only enabled with
checking:
  gcc_checking_assert (is_constant ());
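
To illustrate why such an ICE only shows up in checking-enabled builds, here is a toy stand-in for a checking-only assertion. `TOY_CHECKING`, `toy_checking_assert`, and `toy_to_constant` are hypothetical names; this is a sketch of the general pattern, not GCC's actual macro definition.

```cpp
#include <cstdlib>

// Toy stand-in for a checking-only assertion; TOY_CHECKING and
// toy_checking_assert are hypothetical names, not GCC's macros.
#ifndef TOY_CHECKING
#define TOY_CHECKING 0
#endif

#if TOY_CHECKING
#define toy_checking_assert(expr) ((expr) ? (void)0 : std::abort())
#else
// Still type-checks expr, but never evaluates it or aborts.
#define toy_checking_assert(expr) ((void)(0 && (expr)))
#endif

// Loosely mirrors the shape of a to_constant-style accessor: the
// precondition is only enforced in checking builds.
int toy_to_constant(bool is_constant, int value)
{
    toy_checking_assert(is_constant);
    return value;
}
```

Built with the default `TOY_CHECKING` of 0, `toy_to_constant(false, 7)` silently returns 7; built with `-DTOY_CHECKING=1`, the same call aborts. A violated precondition can therefore go unnoticed in non-checking builds.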

[Bug middle-end/113228] [14 Regression] ICE: recalculate_side_effects, at gimplify.cc:3347 since r14-6420

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113228

--- Comment #14 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #12)
> The reason why late gimplification/regimplification generally works fine
> with SSA_NAMEs is that the
> case SSA_NAME:
>   /* Allow callbacks into the gimplifier during optimization.  */
>   ret = GS_ALL_DONE;
>   break;
> case doesn't fall through into the recalculation.  It is just this new
> match.pd folding which can turn a tcc_comparison *expr_p (which is what
> generally wants to recalculate side-effects) into SSA_NAME (which isn't
> handled there).

Thanks for looking into this issue further and handling it.

[Bug target/113229] [14 Regression] gcc.dg/torture/pr70083.c ICEs when compiled with -march=armv9-a+sve2

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113229

--- Comment #7 from Andrew Pinski  ---
(In reply to avieira from comment #6)
> Oh forgot to mention, this is triggering because of the div optimization in:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;
> h=c69db3ef7f7d82a50f46038aa5457b7c8cc2d643
> 
> But I suspect that too is just an enabler and not the root cause? Unless we
> aren't supposed to use subregs for sve modes...

Note there are other paths which still lead to a crash in paradoxical_subreg_p
via simplify_const_vector_subreg (but without gen_divv4si3 in the trace) in a
different testcase: testsuite/gcc.dg/pr69896.c.



```
/home/apinski/src/upstream-full-cross/gcc/gcc/testsuite/gcc.dg/pr69896.c:22:1:
internal compiler error: in paradoxical_subreg_p, at rtl.h:3213
0x80c65b paradoxical_subreg_p(machine_mode, machine_mode)
../../gcc/rtl.h:3213
0x80cfc8 paradoxical_subreg_p(machine_mode, machine_mode)
../../gcc/poly-int.h:2179
0x80cfc8 simplify_const_vector_subreg
../../gcc/simplify-rtx.cc:7423
0x80cfc8 simplify_context::simplify_subreg(machine_mode, rtx_def*,
machine_mode, poly_int<2u, unsigned long>)
../../gcc/simplify-rtx.cc:7595
0xfae1c9 insn_propagation::apply_to_rvalue_1(rtx_def**)
../../gcc/recog.cc:1176
0xfadcab insn_propagation::apply_to_rvalue_1(rtx_def**)
../../gcc/recog.cc:1117
0xfade93 insn_propagation::apply_to_rvalue_1(rtx_def**)
../../gcc/recog.cc:1254
0xfae63f insn_propagation::apply_to_pattern(rtx_def**)
../../gcc/recog.cc:1396
0x1cfdb66 try_fwprop_subst_pattern
../../gcc/fwprop.cc:440
0x1cfdb66 try_fwprop_subst
../../gcc/fwprop.cc:613
0x1cfe500 forward_propagate_and_simplify
../../gcc/fwprop.cc:809
0x1cfe500 forward_propagate_into
../../gcc/fwprop.cc:872
0x1cfe89d forward_propagate_into
../../gcc/fwprop.cc:821
0x1cfe89d fwprop_insn
../../gcc/fwprop.cc:929
0x1cfe9c1 fwprop
../../gcc/fwprop.cc:981
```

[Bug libstdc++/113246] Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread davidepesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

--- Comment #1 from Davide Pesavento  ---
Another interesting(?) behavior, when run from a non-existing (deleted) working
directory:

weakly_canonical("foo") returns "foo", while weakly_canonical("./foo") throws:

> terminate called after throwing an instance of 
> 'std::filesystem::__cxx11::filesystem_error'
>   what():  filesystem error: cannot make canonical path: No such file or 
> directory [.]

[Bug sanitizer/113244] Potential thread sanitizer false positive with future exception

2024-01-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113244

--- Comment #1 from Andrew Pinski  ---
I suspect this is because libstdc++.so is NOT instrumented for TSAN.

[Bug fortran/96724] Bogus warnings with the repeat intrinsic and the flag -Wconversion-extra

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96724

--- Comment #4 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2024-January/060090.html

[Bug middle-end/79704] [meta-bug] Phoronix Test Suite compiler performance issues

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704
Bug 79704 depends on bug 109811, which changed state.

Bug 109811 Summary: libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from Jan Hubicka  ---
I think we can declare this one fixed.

[Bug libstdc++/113246] New: Behavior of std::filesystem::weakly_canonical with non-existing relative paths

2024-01-05 Thread davidepesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113246

Bug ID: 113246
   Summary: Behavior of std::filesystem::weakly_canonical with
non-existing relative paths
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: davidepesa at gmail dot com
  Target Milestone: ---

I initially reported this as a Boost.Filesystem issue
(https://github.com/boostorg/filesystem/issues/300), but std::filesystem in
libstdc++ has the same behavior.

With -std=c++17, weakly_canonical("foo/bar") returns "foo/bar" when "foo" does
not exist in the current working directory.

Conversely, when at least one leading element of the input path exists, the
returned path is absolute, e.g., weakly_canonical("existing/foo/bar") ==
"/home/existing/foo/bar".

It's not clear to me how to interpret the standard wording
(https://eel.is/c++draft/fs.op.weakly.canonical#1) in this case. It says:

> return a path composed by operator/= from the result of calling canonical() 
> with a path argument composed of the leading elements of p that exist, if 
> any, ...

If there are no leading elements of p that exist, should canonical() be called
with an empty path? or should it not be called at all? (if it's the former,
note that canonical("") currently throws a filesystem_error)

[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-01-05
 CC||hubicka at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #2 from Jan Hubicka  ---
On zen3 I get 0.75MP/s for GCC and 0.80MP/s for clang, so only 6.6%, but seems
reproducible.

Profile looks comparable:

gcc:

  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf

clang:

  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform


In the first loop clang seems to ifconvert while GCC doesn't:
  0.59 │       lea          kSLog2Table,%rdi
  3.69 │       vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:   vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │       vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │       vmovss       %xmm1,0x0(%r13)
  5.48 │       cmp          %r12d,0xc(%r13)
  0.06 │     ↓ jae          89
       │       mov          %r12d,0xc(%r13)
  0.99 │ 89:   mov          0x4(%r13),%edi
  0.96 │ 8d:   xor          %eax,%eax
  0.40 │       test         %r12d,%r12d
  0.60 │       setne        %al

       │       vcvtsd2ss    %xmm0,%xmm0,%xmm1
  0.02 │362:   mov          %r15d,%eax
  0.57 │       imul         %r12d,%eax
  0.00 │       cmp          %r12d,%r9d
  0.03 │       cmovbe       %r12d,%r9d
  0.02 │       vmovd        %eax,%xmm0
  0.08 │       vpinsrd      $0x1,%r15d,%xmm0,%xmm0
  1.50 │       vpaddd       %xmm0,%xmm4,%xmm4
  1.08 │       vcvtsi2ss    %r15d,%xmm5,%xmm0
  0.87 │       vfnmadd231ss %xmm0,%xmm1,%xmm3
  5.40 │       vmovaps      %xmm3,%xmm0
  0.02 │38c:   xor          %eax,%eax
  0.16 │       cmp          $0x4,%r15d
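
The first listing contains a compare, a `jae`, and a store; the second uses `cmovbe`. At the source level the difference can be sketched roughly as below. The struct, field, and function names are hypothetical and are not taken from libwebp; the code only illustrates the two shapes of a running-maximum update.

```cpp
#include <cstdint>

// Hypothetical record; field and function names are illustrative and
// are not taken from libwebp.
struct toy_stats {
    std::uint32_t max_val;
};

// Branchy form: a conditional store, the shape that a compare + jae +
// mov sequence corresponds to.
void update_max_branchy(toy_stats* s, std::uint32_t v)
{
    if (s->max_val < v)
        s->max_val = v;
}

// Select form: compute the maximum unconditionally and always store,
// which maps naturally onto a conditional move.
void update_max_select(toy_stats* s, std::uint32_t v)
{
    std::uint32_t cur = s->max_val;
    s->max_val = (cur < v) ? v : cur;
}
```

Both functions leave the same value in `max_val`. Whether a compiler turns the first form into the second (if-conversion) depends on its cost model and on whether it may introduce the unconditional store.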

[Bug fortran/96724] Bogus warnings with the repeat intrinsic and the flag -Wconversion-extra

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96724

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 CC||anlauf at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |anlauf at gcc dot gnu.org
 Status|WAITING |ASSIGNED

--- Comment #3 from anlauf at gcc dot gnu.org ---
I have an even simpler variant based on Jose's patch.

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235

--- Comment #6 from Jan Hubicka  ---
The internal loops are:

static const unsigned keccakf_rotc[24] = {
   1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18,
   39, 61, 20, 44
};

static const unsigned keccakf_piln[24] = {
   10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14,
   22, 9, 6, 1
};

static void keccakf(ulong64 s[25])
{
   int i, j, round;
   ulong64 t, bc[5];

   for(round = 0; round < SHA3_KECCAK_ROUNDS; round++) {
      /* Theta */
      for(i = 0; i < 5; i++)
         bc[i] = s[i] ^ s[i + 5] ^ s[i + 10] ^ s[i + 15] ^ s[i + 20];

      for(i = 0; i < 5; i++) {
         t = bc[(i + 4) % 5] ^ ROL64(bc[(i + 1) % 5], 1);
         for(j = 0; j < 25; j += 5)
            s[j + i] ^= t;
      }
      /* Rho Pi */
      t = s[1];
      for(i = 0; i < 24; i++) {
         j = keccakf_piln[i];
         bc[0] = s[j];
         s[j] = ROL64(t, keccakf_rotc[i]);
         t = bc[0];
      }
      /* Chi */
      for(j = 0; j < 25; j += 5) {
         for(i = 0; i < 5; i++)
            bc[i] = s[j + i];
         for(i = 0; i < 5; i++)
            s[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
      }
      s[0] ^= keccakf_rndc[round];
   }
}

I suppose that with complete unrolling this will propagate, partly stay in
registers, and fold. I think increasing the default limits, especially at -O3,
may make sense. The value of 16 has been there for a very long time (I think
since the initial implementation).
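
To make the trip counts concrete, here is a minimal sketch that isolates the Rho Pi step from the function quoted above. The function name `rho_pi`, the use of `uint64_t` in place of `ulong64`, and the local `ROL64` definition are assumptions made for illustration; they are not taken from the SMHasher sources. The loop runs 24 iterations, which is above the default limit of 16 mentioned in this thread, and both the lane index and the rotation count are table lookups that only become constants once every iteration is peeled.

```c
#include <stdint.h>

/* Rotate a 64-bit lane left by n bits; only valid for 0 < n < 64,
   which holds for every entry of keccakf_rotc below. */
#define ROL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

static const unsigned keccakf_rotc[24] = {
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
    27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44
};

static const unsigned keccakf_piln[24] = {
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1
};

/* Rho Pi step on its own (rho_pi is a hypothetical name).
   Each iteration depends on the previous one through t, and the
   store address s[j] is only known after loading keccakf_piln[i]. */
void rho_pi(uint64_t s[25])
{
    uint64_t t = s[1];
    int i;

    for (i = 0; i < 24; i++) {          /* 24 iterations, limit is 16 */
        unsigned j = keccakf_piln[i];
        uint64_t saved = s[j];
        s[j] = ROL64(t, keccakf_rotc[i]);
        t = saved;
    }
}
```

Building a file like this with and without `--param max-completely-peel-times=30` (the setting reported in comment #5) and comparing the assembly is one way to see whether the table loads disappear; that outcome is not verified here.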

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235

Jan Hubicka  changed:

   What|Removed |Added

Summary|SMHasher SHA3-256 benchmark |SMHasher SHA3-256 benchmark
   |is almost 40% slower vs.|is almost 40% slower vs.
   |Clang   |Clang (not enough complete
   ||loop peeling)

--- Comment #5 from Jan Hubicka  ---
On my zen3 machine the default build gets me 180MB/s.
-O3 -flto -funroll-all-loops gets me 193MB/s.
-O3 -flto --param max-completely-peel-times=30 gets me 382MB/s; the speedup is
gone with --param max-completely-peel-times=20. The default is 16.

[Bug fortran/67972] Substrings of arrays of unicode strings are of type DEFAULT rather than ISO_10646

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67972

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED
   Target Milestone|--- |11.3
 CC||anlauf at gcc dot gnu.org

--- Comment #4 from anlauf at gcc dot gnu.org ---
Definitely fixed long ago before 11.3.  Closing.

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka  ---
I keep mentioning to Larabel that he should use -fno-semantic-interposition,
but he doesn't.

Profile is very simple:

  96.75%  SMHasher  [.] keccakf.lto_priv.0

It all goes to a simple loop. On Zen3 with gcc 13 -march=native -Ofast -flto I get:

  3.85 │330:   mov    %r8,%rdi
  7.68 │       movslq (%rsi,%r9,1),%rcx
  3.85 │       lea    (%rax,%rcx,8),%r10
  3.86 │       mov    (%rdx,%r9,1),%ecx
  3.83 │       add    $0x4,%r9
  3.86 │       mov    (%r10),%r8
  7.37 │       rol    %cl,%rdi
  7.37 │       mov    %rdi,(%r10)
  4.76 │       cmp    $0x60,%r9
  0.00 │     ↑ jne    330


Clang seems to unroll it:

  0.25 │ d0:   mov    -0x48(%rsp),%rdx
  0.25 │       xor    %r12,%rcx
  0.25 │       mov    %r13,%r12
  0.25 │       mov    %r13,0x10(%rsp)
  0.25 │       mov    %rax,%r13
  0.26 │       xor    %r15,%r13
  0.23 │       mov    %r11,-0x70(%rsp)
  0.25 │       mov    %r8,0x8(%rsp)
  0.25 │       mov    %r15,-0x40(%rsp)
  0.25 │       mov    %r10,%r15
  0.26 │       mov    %r10,(%rsp)
  0.26 │       mov    %r14,%r10
  0.25 │       xor    %r12,%r10
  0.26 │       xor    %rsi,%r15
  0.24 │       mov    %rbp,-0x80(%rsp)
  0.25 │       xor    %rcx,%r15
  0.26 │       mov    -0x60(%rsp),%rcx
  0.25 │       xor    -0x68(%rsp),%r15
  0.26 │       xor    %rbp,%rdx
  0.25 │       mov    -0x30(%rsp),%rbp
  0.25 │       xor    %rdx,%r13
  0.24 │       mov    -0x10(%rsp),%rdx
  0.25 │       mov    %rcx,%r12
  0.24 │       xor    %rcx,%r13
  0.25 │       mov    $0x1,%ecx
  0.25 │       xor    %r11,%rdx
  0.24 │       mov    %r8,%r11
  0.25 │       mov    -0x28(%rsp),%r8
  0.26 │       xor    -0x58(%rsp),%r8
  0.24 │       xor    %rdx,%r8
  0.26 │       mov    -0x8(%rsp),%rdx
  0.25 │       xor    %rbp,%r8
  0.26 │       xor    %r11,%rdx
  0.25 │       mov    -0x20(%rsp),%r11
  0.25 │       xor    %rdx,%r10

[Bug fortran/93948] Surprising option processing of -fdec and -fdec-math in combination with -std

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93948

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED
   Target Milestone|--- |14.0
  Known to work||14.0

--- Comment #6 from anlauf at gcc dot gnu.org ---
(In reply to Dominique d'Humieres from comment #5)
> From the Steve's comments, could this PR closed as WONTFIX?

Yes.  Note that -fdec-math has been a no-op for quite some time, and
the functions that were added under that flag are finally available
under -std=f2023 and -std=gnu.

The example in comment#0 works as requested with gcc-14.

Closing.

[Bug fortran/113245] SIZE with optional DIM argument that has the OPTIONAL+VALUE attributes

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113245

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P4
   Keywords||wrong-code

[Bug fortran/113245] New: SIZE with optional DIM argument that has the OPTIONAL+VALUE attributes

2024-01-05 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113245

Bug ID: 113245
   Summary: SIZE with optional DIM argument that has the
OPTIONAL+VALUE attributes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anlauf at gcc dot gnu.org
  Target Milestone: ---

Related to pr30865, but with optional that has the VALUE attribute.
Testcase:

program p
  implicit none
  real :: a(2,3)
  call ref (a,2) ! works
  call val (a,2) ! works
  print *, "--"
  call ref (a)   ! works
  call val (a)   ! fails
contains
  subroutine ref (x, d)
    real,              intent(in) :: x(:,:)
    integer, optional, intent(in) :: d
    print *, "present (d) =", present (d)
    print *, "size (a, d) =", size (x, dim=d)
  end
  subroutine val (x, d)
    real,              intent(in) :: x(:,:)
    integer, optional, value      :: d
    print *, "present (d) =", present (d)
    print *, "size (a, d) =", size (x, dim=d)  ! <<< miscompiled
  end
end

The dump-tree shows that the presence test is miscompiled, leading to
always accessing 'd', which is 0 for an absent argument, and causing an
invalid access.
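
The intended semantics can be modelled in C as a rough sketch. Everything here is hypothetical: the function name, the explicit `d_present` flag, and the extent array are illustrative stand-ins, and the sketch does not claim to match gfortran's actual calling convention or generated code. It only shows why the presence test has to come before any use of `d`: an absent by-value argument carries 0, and a 1-based `DIM` of 0 would index one element before the start of the extents.

```c
#include <stdbool.h>

/* Hypothetical model of SIZE (x, dim=d) with an OPTIONAL, VALUE d.
   extent[] holds the extent of each dimension, rank is their count.
   d_present models the presence information; d is 0 when absent. */
long size_with_optional_dim(const long *extent, int rank,
                            int d, bool d_present)
{
    if (d_present)
        return extent[d - 1];   /* DIM is 1-based */

    /* DIM absent: total number of elements. */
    long total = 1;
    for (int i = 0; i < rank; i++)
        total *= extent[i];
    return total;
}
```

For the `a(2,3)` array of the testcase, the model returns 3 for `dim=2` and 6 when `d` is absent. Code that skipped the `d_present` check would instead read `extent[-1]`, which corresponds to the invalid access described above.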

[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

--- Comment #10 from Jonathan Wakely  ---
I think it would be good to backport it, what do you think?

[Bug debug/78100] DWARF symbols for an array sometimes missing the array length

2024-01-05 Thread gccbugs at dima dot secretsauce.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78100

--- Comment #4 from Dima Kogan  ---
I just tried again, and I see that this bug has been fixed. I'm using

  gcc (Debian 13.2.0-2) 13.2.0

Should we close this report?

[Bug target/113229] [14 Regression] gcc.dg/torture/pr70083.c ICEs when compiled with -march=armv9-a+sve2

2024-01-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113229

--- Comment #6 from avieira at gcc dot gnu.org ---
Oh forgot to mention, this is triggering because of the div optimization in:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=c69db3ef7f7d82a50f46038aa5457b7c8cc2d643

But I suspect that too is just an enabler and not the root cause? Unless we
aren't supposed to use subregs for sve modes...

[Bug target/113229] [14 Regression] gcc.dg/torture/pr70083.c ICEs when compiled with -march=armv9-a+sve2

2024-01-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113229

avieira at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from avieira at gcc dot gnu.org ---
So I can confirm this ICE and it was exposed rather than caused by my patch.

The problem arises because it seems we have never tried to simplify a:
(subreg: (subreg:<...> () N) M)

This makes simplify_subreg enter the if (GET_CODE (op) == SUBREG) branch, which
calls:
'paradoxical_subreg_p (VNx4SImode, OImode)'
which seems to assume, via an assert, that the sizes of these modes are ordered.

I am not sure what the right fix is here. I did check, and changing
paradoxical_subreg_p to return false if the mode sizes are not ordered leads to
some bizarre failure: it looks like simplify_gen_subreg then just returns 0 ...
rather than the original nested subregs.
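
As a simplified illustration of what "ordered" means here, the sketch below models a mode size as `c0 + c1 * x` bytes, where `x >= 0` is only known at run time. The struct, the function names, and the concrete sizes (16 + 16x bytes for VNx4SImode, a fixed 32 bytes for OImode) are assumptions of this sketch, not code from GCC's poly-int implementation.

```c
#include <stdbool.h>

/* A size of the form c0 + c1 * x bytes, with x >= 0 unknown at
   compile time (hypothetical two-coefficient model). */
struct poly_size {
    unsigned c0;
    unsigned c1;
};

/* a <= b for every possible x >= 0. */
static bool known_le(struct poly_size a, struct poly_size b)
{
    return a.c0 <= b.c0 && a.c1 <= b.c1;
}

/* Two sizes are ordered if one is known to be <= the other
   regardless of x. */
bool sizes_ordered(struct poly_size a, struct poly_size b)
{
    return known_le(a, b) || known_le(b, a);
}
```

Under these assumptions, `{16, 16}` against `{32, 0}` is not ordered: the first is smaller for x = 0, equal for x = 1, and larger for x = 2, so no compile-time answer exists for "which one is bigger".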

Before I dig deeper I'll get richi and Richard S to comment.

[Bug sanitizer/113244] New: Potential thread sanitizer false positive with future exception

2024-01-05 Thread gcc-bugzilla at mhxnet dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113244

Bug ID: 113244
   Summary: Potential thread sanitizer false positive with future
exception
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mhxnet dot de
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 56994
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56994&action=edit
C++ code to reproduce the issue

Using `g++-13 (Gentoo 13.2.1_p20230826 p7) 13.2.1 20230826` and compiling the
attached program with

g++-13 -std=c++20 -g -O2 -fsanitize=thread -fno-omit-frame-pointer -o packaged_task packaged_task.cpp

Thread Sanitizer will likely, but not always, report several data races related
to the exception stored in the future. The races are between reading from the
exception object in the catch block on the main thread and destroying the
exception as a result of packaged_task going out of scope in one of the worker
threads.

==
WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call)
(pid=7610)
  Write of size 8 at 0x7b2c00031460 by thread T9:
#0 ~error_base /home/mhx/src/c++/test/packaged_task.cpp:12
(packaged_task+0x5574)
#1 ~my_error /home/mhx/src/c++/test/packaged_task.cpp:22
(packaged_task+0x5574)
#2 std::__exception_ptr::exception_ptr::_M_release()
/var/tmp/portage/sys-devel/gcc-13.2.1_p20230826/work/gcc-13-20230826/libstdc++-v3/libsupc++/eh_ptr.cc:105
(libstdc++.so.6+0xb2f30)
#3 std::__future_base::_Result::_M_destroy()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:672
(packaged_task+0x5287)
#4
std::__future_base::_Result_base::_Deleter::operator()(std::__future_base::_Result_base*)
const /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:229
(packaged_task+0x5287)
#5 std::unique_ptr::~unique_ptr()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/unique_ptr.h:404
(packaged_task+0x5287)
#6 std::__future_base::_State_baseV2::~_State_baseV2()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:344
(packaged_task+0x5287)
#7 std::__future_base::_Task_state_base::~_Task_state_base()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:1450
(packaged_task+0x5287)
#8 ~_Task_state
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:1477
(packaged_task+0x5287)
#9 destroy_at,
std::allocator, void()> >
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_construct.h:88
(packaged_task+0x5287)
#10 destroy,
std::allocator, void()> >
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/alloc_traits.h:559
(packaged_task+0x5287)
#11 _M_dispose
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr_base.h:613
(packaged_task+0x5287)
#12
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr_base.h:175
(packaged_task+0x83bc)
#13 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr_base.h:361
(packaged_task+0x83bc)
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr_base.h:1071
(packaged_task+0x83bc)
#15 std::__shared_ptr,
(__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr_base.h:1524
(packaged_task+0x83bc)
#16 std::shared_ptr
>::~shared_ptr()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/shared_ptr.h:175
(packaged_task+0x83bc)
#17 std::packaged_task::~packaged_task()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/future:1594
(packaged_task+0x83bc)
#18 std::_Optional_payload_base
>::_M_destroy()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/optional:287
(packaged_task+0x645c)
#19 std::_Optional_payload_base
>::_M_reset() /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/optional:318
(packaged_task+0x645c)
#20 std::_Optional_payload, false, false,
false>::~_Optional_payload()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/optional:439
(packaged_task+0x645c)
#21 std::_Optional_base, false,
false>::~_Optional_base()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/optional:510
(packaged_task+0x645c)
#22 std::optional >::~optional()
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/optional:705
(packaged_task+0x645c)
#23 worker /home/mhx/src/c++/test/packaged_task.cpp:45
(packaged_task+0x645c)
#24 void 

[Bug c++/113242] g++ rejects-valid template argument of class type containing an lvalue reference

2024-01-05 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113242

Patrick Palka  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-01-05
   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org
 CC||ppalka at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Patrick Palka  ---
Confirmed, this never worked.

[Bug libstdc++/108976] codecvt for Unicode allows surrogate code points

2024-01-05 Thread dmjpp at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

--- Comment #9 from Dimitrij Mijoski  ---
I believe this bug report should be closed as resolved. Are there maybe plans
for backporting?

[Bug tree-optimization/113104] Suboptimal loop-based slp node splicing across iterations

2024-01-05 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113104

Richard Sandiford  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Richard Sandiford  ---
Fixed.  Thanks for the report.

[Bug tree-optimization/113104] Suboptimal loop-based slp node splicing across iterations

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113104

--- Comment #5 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:7328faf89e9b4953baaff10e18262c70fbd3e578

commit r14-6961-g7328faf89e9b4953baaff10e18262c70fbd3e578
Author: Richard Sandiford 
Date:   Fri Jan 5 16:25:16 2024 +

aarch64: Extend VECT_COMPARE_COSTS to !SVE [PR113104]

When SVE is enabled, we try vectorising with multiple different SVE and
Advanced SIMD approaches and use the cost model to pick the best one.
Until now, we've not done that for Advanced SIMD, since "the first mode
that works should always be the best".

The testcase is a counterexample.  Each iteration of the scalar loop
vectorises naturally with 64-bit input vectors and 128-bit output
vectors.  We do try that for SVE, and choose it as the best approach.
But the first approach we try is instead to use:

- a vectorisation factor of 2
- 1 128-bit vector for the inputs
- 2 128-bit vectors for the outputs

But since the stride is variable, the cost of marshalling the input
vector from two iterations outweighs the benefit of doing two iterations
at once.

This patch therefore generalises aarch64-sve-compare-costs to
aarch64-vect-compare-costs and applies it to non-SVE compilations.

gcc/
PR target/113104
* doc/invoke.texi (aarch64-sve-compare-costs): Replace with...
(aarch64-vect-compare-costs): ...this.
* config/aarch64/aarch64.opt (-param=aarch64-sve-compare-costs=):
Replace with...
(-param=aarch64-vect-compare-costs=): ...this new param.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Don't disable it when vectorizing for Advanced SIMD only.
(aarch64_autovectorize_vector_modes): Apply VECT_COMPARE_COSTS
whenever aarch64_vect_compare_costs is true.

gcc/testsuite/
PR target/113104
* gcc.target/aarch64/pr113104.c: New test.
* gcc.target/aarch64/sve/cond_arith_1.c: Update for new parameter
names.
* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
* gcc.target/aarch64/sve/load_const_offset_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_load_slp_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/pack_1.c: Likewise.
* gcc.target/aarch64/sve/reduc_4.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_signed_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_unsigned_1_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

[Bug tree-optimization/113210] [14 Regression] ICE: tree check: expected integer_cst, have cond_expr in get_len, at tree.h:6481

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113210

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #8 from Jakub Jelinek  ---
Created attachment 56993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56993&action=edit
gcc14-pr113210.patch

So, I'd go with this patch (so far untested).

[Bug tree-optimization/113210] [14 Regression] ICE: tree check: expected integer_cst, have cond_expr in get_len, at tree.h:6481

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113210

--- Comment #7 from Jakub Jelinek  ---
Or maybe just a bug in the PLUS_EXPR folding?
The code sets NITERSM1 to
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
and then calls fold_build2 for PLUS_EXPR of that and 1, and somehow it folds to
1; that doesn't sound right to me.
Now, when folding the + 1 addition with just the second operand, i.e.
~(short unsigned int) (a.0_1 + 255)
it correctly folds into
-(short unsigned int) (a.0_1 + 255)
and obviously the other one (0) to 1.
There is also the
/* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
   X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
(simplify
 (cond (gt (plus @0 integer_onep) @1) (negate @0) integer_onep@2)
 (if (TYPE_UNSIGNED (type))
  (cond (ge @0 @1) (negate @0) @2)))
match.pd rule, but I'd think that should just fold the whole thing to:
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1

Though a.0_1 is unsigned char, so (short unsigned int) (a.0_1 + 255) + 1 > 256
is actually never true.
So I guess the folding is correct.
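
The three facts used in this reasoning can be checked exhaustively with a small C sketch. The function names are hypothetical, and the second check assumes that the `a.0_1 + 255` addition is carried out in `unsigned char` (wrapping modulo 256) before the widening cast, which is what the "never true" conclusion relies on; plain C would promote to `int` first, so the sketch makes the narrowing explicit.

```c
#include <stdbool.h>

/* ~x + 1 == -x for every unsigned short x. */
bool not_plus_one_is_negate(void)
{
    for (unsigned v = 0; v <= 0xFFFFu; v++) {
        unsigned short x = (unsigned short)v;
        unsigned short lhs = (unsigned short)(~x + 1);
        unsigned short rhs = (unsigned short)(-x);
        if (lhs != rhs)
            return false;
    }
    return true;
}

/* Is (unsigned short)(a + 255) + 1 > 256 true for any unsigned char a,
   assuming the addition wraps in unsigned char? */
bool condition_ever_true(void)
{
    for (unsigned v = 0; v <= 0xFFu; v++) {
        unsigned char a = (unsigned char)v;
        unsigned char sum = (unsigned char)(a + 255); /* wraps mod 256 */
        unsigned short widened = sum;
        if ((unsigned short)(widened + 1) > 256)
            return true;
    }
    return false;
}

/* (X + 1) > Y ? -X : 1  versus  X >= Y ? -X : 1, checked here for
   8-bit unsigned X and Y only. */
bool match_rule_holds(void)
{
    for (unsigned xv = 0; xv <= 0xFFu; xv++) {
        for (unsigned yv = 0; yv <= 0xFFu; yv++) {
            unsigned char x = (unsigned char)xv;
            unsigned char y = (unsigned char)yv;
            unsigned char before =
                (unsigned char)(x + 1) > y ? (unsigned char)(-x) : 1;
            unsigned char after =
                x >= y ? (unsigned char)(-x) : 1;
            if (before != after)
                return false;
        }
    }
    return true;
}
```

If the condition is never true, the whole expression is the constant 0, and adding 1 gives the constant 1 that the folder produced.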

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #2)
> This is like PR 108007 but unlike that one, -fno-tree-dce is not used.

But the patch fixes it, so I guess it's time to make it pass ppc64le bootstrap.

(But I did not want to backport that patch, I wonder whether we can't figure
out something simpler :-/ )

[Bug target/113243] New: mips: Wrong code for pr91323.c

2024-01-05 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113243

Bug ID: 113243
   Summary: mips: Wrong code for pr91323.c
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xry111 at gcc dot gnu.org
  Target Milestone: ---

In pr91323.c:

int
__attribute__ ((noinline, noclone))
f3 (float a, float b)
{
  return a < b || a > b;
}

is compiled to:

test(float, float):
        c.ueq.s $fcc5,$f12,$f14
        li      $2,1
        jr      $31
        movt    $2,$0,$fcc5

for mips64r2, and

test(float, float):
        cmp.ne.s        $f12,$f12,$f14
        bc1nez  $f12,$L4
        li      $2,1
        move    $2,$0
$L4:
        jrc     $31

for mips64r6.  Both are incorrect because neither cmp.ne.s nor c.ueq.s raises
the INVALID exception for qNaN inputs, whereas in C "<" and ">" should raise
INVALID for qNaN inputs.
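
For comparison, C also has a quiet counterpart of this test: the `islessgreater` macro from `<math.h>` gives the same truth value as `a < b || a > b` but is specified not to raise INVALID for unordered operands. The sketch below puts the two side by side; the function names are hypothetical, and the code only compares return values, it does not inspect the floating-point exception flags.

```c
#include <math.h>

/* Relational operators: expected to raise INVALID when either
   operand is a NaN, including a quiet NaN. */
int lt_or_gt_signaling(float a, float b)
{
    return a < b || a > b;
}

/* Quiet comparison macro: same truth value, but specified not to
   raise INVALID for unordered operands. */
int lt_or_gt_quiet(float a, float b)
{
    return islessgreater(a, b) ? 1 : 0;
}
```

Seen this way, the generated MIPS sequences behave like the quiet variant even though the source uses the signaling operators.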

[Bug target/110796] builtin_iseqsig fails some tests in armv8l-linux-gnueabihf

2024-01-05 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110796

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #13 from Xi Ruoyao  ---
(In reply to Francois-Xavier Coudert from comment #8)
> (In reply to Richard Earnshaw from comment #6)
> > Is the exception status supposed to be in a defined state when the test
> > runs?  Shouldn't there be a call to feclearexcept (FE_ALL_EXCEPT) at the
> > start of the test?
> 
> Isn't the exception status guaranteed to be defined (and not signaling) when
> the program starts?

It should be guaranteed.  Otherwise it indicates a bug in kernel or libc.

> But adding feclearexcept (FE_ALL_EXCEPT); at the beginning of main() could
> not hurt, for sure.

[Bug ipa/112783] core dump on libxo when function is inlined

2024-01-05 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112783

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #10 from Xi Ruoyao  ---
(In reply to Jiang ChuanGang from comment #9)

> I've encountered the same bug, and your solution does fix it.
> But strangely enough, I can't reproduce it with code like the following.
> The inevitable condition of this bug still puzzles me. Do you have any
> thoughts on this.

If you invoke undefined behavior then anything can happen.  And the
condition is not "inevitable": if you use a different compiler or different
compile options it may change as well.

Do not try to predict the outcome of undefined behavior.  If you really want
an explanation you can try to trace how the compiler optimizes the program (by
adding -fdump-tree-all -fdump-rtl-all and investigating all the dumped files).
But such an explanation will only apply to the specific GCC version you are
using; with a different GCC release the optimization passes will just change.
So I don't think such an explanation would be useful or worth finding out.

If you want to catch such bugs more easily try things like -fsanitize=undefined
or -fsanitize=address.

[Bug tree-optimization/113210] [14 Regression] ICE: tree check: expected integer_cst, have cond_expr in get_len, at tree.h:6481

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113210

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
LOOP_VINFO_NITERS is INTEGER_CST 1 here, but LOOP_VINFO_NITERSM1 is that
complex expression and we assume that if LOOP_VINFO_NITERS_KNOWN_P then
NITERSM1 is also known, which is not the case here.

[Bug rtl-optimization/113048] [13/14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1862 (unable to find a register to spill) {*andndi3_doubleword_bmi} with -march=cascadelake since r13

2024-01-05 Thread manuel.lauss at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113048

--- Comment #6 from Manuel Lauss  ---
After be977db17c91ad6627dee70a1904a95d229aa1be I don't see this ICE any longer
on either x64 or mips bootstrap.

[Bug libstdc++/113241] [13/14 Regression] Unguarded use of __is_convertible built-in

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113241

--- Comment #1 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:57fa5b60bbbf8038b8a699d2bcebd2a9b2e29aa4

commit r14-6958-g57fa5b60bbbf8038b8a699d2bcebd2a9b2e29aa4
Author: Jonathan Wakely 
Date:   Fri Jan 5 12:03:22 2024 +

libstdc++: Do not use __is_convertible unconditionally [PR113241]

The new __is_convertible built-in should only be used after checking
that it's supported.

libstdc++-v3/ChangeLog:

PR libstdc++/113241
* include/std/type_traits (is_convertible_v): Guard use of
built-in with preprocessor check.
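
The general pattern of guarding a built-in with a preprocessor check can be sketched in C as follows. This is an analogous illustration only, not the libstdc++ change, which concerns the C++ `__is_convertible` built-in in `<type_traits>`; the function name `count_set_bits` and the choice of `__builtin_popcount` are assumptions made for the example.

```c
/* Compilers without __has_builtin get a definition that always
   answers "no", so the portable path below is used. */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/* Count the set bits in x (hypothetical helper). */
unsigned count_set_bits(unsigned x)
{
#if __has_builtin(__builtin_popcount)
    /* Built-in is known to be available: use it. */
    return (unsigned)__builtin_popcount(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```

Both paths return the same result, so callers do not need to know which one was compiled in.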

[Bug tree-optimization/113201] [14 Regression] internal compiler error: tree check: expected ssa_name, have integer_cst in replace_uses_by, at tree-cfg.cc:2058

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113201

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jakub Jelinek  ---
.

[Bug tree-optimization/111268] [14 Regression] internal compiler error: in to_constant, at poly-int.h:504

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111268

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #12 from Jakub Jelinek  ---
The #c8 testcase started to ICE with
r13-926-g08afab6f8642f58f702010ec196dce3b00955627
The #c3 one started to ICE with
r11-1445-g502d63b6d6141597bb18fd23c87736a1b384cf8f
So, not really sure why this is marked as [14 Regression] only.

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Can you reproduce it without -flto?

[Bug modula2/112923] gm2 runs out of memory

2024-01-05 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112923

--- Comment #2 from Gaius Mulley  ---
Created attachment 56992
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56992&action=edit
Tiny example code showing the problem

The attached source will compile but consumes vast amounts of RAM.

$ time gm2 -g -c  Blowup.mod

takes 43 secs on an amd64 3.6 GHz box and about 6GB of RAM.  The M2Diagnostic
module will generate resource data and will be called when -fmem-report or
-ftime-report is issued.

[Bug modula2/112923] gm2 runs out of memory

2024-01-05 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112923

Gaius Mulley  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Gaius Mulley  ---
Indeed - I'm writing an M2Diagnostic module to collect stats about memory and
time consumption - which hopefully will identify the modules consuming too much
resource.

In this particular case it occurs when building an array constant with
characters - I suspect too many temporary constants are being preserved from
the garbage collector.

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #9 from Tamar Christina  ---
*** Bug 113145 has been marked as a duplicate of this bug. ***

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113145, which changed state.

Bug 113145 Summary: [14 regression] ICE in verify_dominators when building 
mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/113145] [14 regression] ICE in verify_dominators when building mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Tamar Christina  ---
Oh, I had missed this one; both of those cases are already fixed in one of the
submitted patches. I think that is 113144, so I'll mark it as a dup.

With the submitted patches both vectorize correctly.

*** This bug has been marked as a duplicate of bug 113144 ***

[Bug tree-optimization/113145] [14 regression] ICE in verify_dominators when building mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8

2024-01-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

Martin Jambor  changed:

   What|Removed |Added

Summary|[14 regression] ICE when|[14 regression] ICE in
   |building mit-krb5-1.21.2|verify_dominators when
   ||building mit-krb5-1.21.2
   ||since
   ||r14-6822-g01f4251b8775c8
   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||jamborm at gcc dot gnu.org

--- Comment #2 from Martin Jambor  ---
This has been introduced with r14-6822-g01f4251b8775c8 (middle-end: Support
vectorization of loops with multiple exits).

I have reduced another testcase from 526.blender_r, which however requires
-Ofast -march=x86-64-v3 -fprofile-generate so the original is probably better:

void *check_for_dupid_lb_0;
char check_for_dupid_name;
int check_for_dupid_nr;
void BLI_split_name_num();
char check_for_dupid() {
  int a;
  while (1) {
    for (; check_for_dupid_lb_0;)
      BLI_split_name_num();
    a = 0;
    for (; a < 64; a++)
      if (a >= check_for_dupid_nr)
        break;
    if (a && check_for_dupid_name)
      return 1;
  }
}

[Bug middle-end/113228] [14 Regression] ICE: recalculate_side_effects, at gimplify.cc:3347 since r14-6420

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113228

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #13 from Jakub Jelinek  ---
Created attachment 56991
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56991&action=edit
gcc14-pr113228.patch

So I'd just go with a simple patch to recalculate_side_effects.
Changing the case SSA_NAME: in gimplify_expr obviously isn't needed, that works
fine as is.

[Bug c++/113242] New: g++ rejects-valid template argument of class type containing an lvalue reference

2024-01-05 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113242

Bug ID: 113242
   Summary: g++ rejects-valid template argument of class type
containing an lvalue reference
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janschultke at googlemail dot com
  Target Milestone: ---

## Code to Reproduce

struct wrapper {
    int& ref;
    constexpr wrapper(int& ref) : ref(ref) {}
};

template <wrapper X>
void fun1() {}

template <wrapper X>
void fun2() {
    fun1<X>();
}

int main() {
    static int val = 22;
    fun2<val>();
}


## Incorrect Output (GCC 14, -std=c++20) (https://godbolt.org/z/1jzqza73z)

: In instantiation of 'void fun2() [with wrapper X = wrapper{val}]':
:16:14:   required from here
   16 | fun2();
  | ~^~
:11:12: error: no matching function for call to 'fun1()'
   11 | fun1();
  | ~~~^~
:7:6: note: candidate: 'template void fun1()'
7 | void fun1() {}
  |  ^~~~
:7:6: note:   template argument deduction/substitution failed:
:11:12: error: the address of 'wrapper{val}' is not a valid template
argument
   11 | fun1();
  | ~~~^~

## Explanation

None. Clang compiles this, but GCC doesn't. The reference contained within
wrapper is a valid template argument (see
https://eel.is/c++draft/temp.arg.nontype#6), but GCC falsely treats it as
disqualifying X in fun1 from binding to X in fun2.

See https://stackoverflow.com/a/77764351/5740428 for a more detailed
explanation of the relevant standardese.

[Bug target/113217] [14 Regression][aarch64] ICE in rtl_verify_bb_insns, at cfgrtl.cc:2796 since r14-6605-gc0911c6b357ba9

2024-01-05 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113217

Alex Coplan  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #7 from Alex Coplan  ---
Should be fixed, thanks for the report.

[Bug target/113217] [14 Regression][aarch64] ICE in rtl_verify_bb_insns, at cfgrtl.cc:2796 since r14-6605-gc0911c6b357ba9

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113217

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:4b67ec7ff5b1aa9b3b70e9b58afc594b890abeb0

commit r14-6947-g4b67ec7ff5b1aa9b3b70e9b58afc594b890abeb0
Author: Alex Coplan 
Date:   Fri Jan 5 12:25:00 2024 +

aarch64: Further fix for throwing insns in ldp/stp pass [PR113217]

As the PR shows, the fix in
r14-6916-g057dc349021660c40699fb5c98fd9cac8e168653 was not complete.
That fix was enough to stop us trying to move throwing accesses above
nondebug insns, but due to this code in try_fuse_pair:

  // Placement strategy: push loads down and pull stores up, this should
  // help register pressure by reducing live ranges.
  if (load_p)
    range.first = range.last;
  else
    range.last = range.first;

we would still try to move stores up above any debug insns that occurred
immediately after the previous nondebug insn.  This patch fixes that by
narrowing the move range in the case that the second access is throwing
to exactly the range of that insn.

Note that we still need the fix to latest_hazard_before mentioned above
so as to ensure we select a suitable base and reject pairs if it isn't
viable to form the pair at the end of the BB.

gcc/ChangeLog:

PR target/113217
* config/aarch64/aarch64-ldp-fusion.cc
(ldp_bb_info::try_fuse_pair): If the second access can throw,
narrow the move range to exactly that insn.

gcc/testsuite/ChangeLog:

PR target/113217
* g++.dg/pr113217.C: New test.
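
The placement rule described in the commit message above can be illustrated with a small standalone sketch. This is not the code from aarch64-ldp-fusion.cc: MoveRange, choose_placement, and the use of plain integers as insn positions are all hypothetical stand-ins, and the real pass works on RTL-SSA insn ranges. The sketch only shows the idea that a throwing second access pins the pair to that insn before the usual "loads down, stores up" choice is applied.

```cpp
#include <cassert>

// Hypothetical stand-in for the pass's move range: insns are identified
// here by their integer position in the basic block.
struct MoveRange {
    int first;  // earliest position where the fused pair may be placed
    int last;   // latest position where the fused pair may be placed
};

// Assumed helper (not a GCC function): pick where to form the pair.
MoveRange choose_placement(MoveRange range, bool load_p,
                           bool second_can_throw, int second_insn_pos)
{
    if (second_can_throw) {
        // A throwing access has to stay where it is, so the range
        // collapses to exactly that insn.
        range.first = second_insn_pos;
        range.last = second_insn_pos;
        return range;
    }

    // Placement strategy quoted in the commit message:
    // push loads down and pull stores up.
    if (load_p)
        range.first = range.last;
    else
        range.last = range.first;
    return range;
}
```

With a range of {3, 9}, a non-throwing load pair lands at 9 and a non-throwing store pair at 3, while a store pair whose second access can throw at position 9 stays at 9 instead of being pulled up past intervening insns.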

[Bug middle-end/113228] [14 Regression] ICE: recalculate_side_effects, at gimplify.cc:3347 since r14-6420

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113228

--- Comment #12 from Jakub Jelinek  ---
The reason why late gimplification/regimplification generally works fine with
SSA_NAMEs is that the
    case SSA_NAME:
      /* Allow callbacks into the gimplifier during optimization.  */
      ret = GS_ALL_DONE;
      break;
case doesn't fall through into the recalculation.  It is just this new match.pd
folding which can turn a tcc_comparison *expr_p (which is what generally wants
to recalculate side-effects) into SSA_NAME (which isn't handled there).

[Bug libstdc++/113241] [13/14 Regression] Unguarded use of __is_convertible built-in

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113241

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |13.3
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-01-05
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org

[Bug libstdc++/113241] New: [13/14 Regression] Unguarded use of __is_convertible built-in

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113241

Bug ID: 113241
   Summary: [13/14 Regression] Unguarded use of __is_convertible
built-in
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

In r13-2883-gaf85ad891703db I made this change in <type_traits>:

 template <typename _Base, typename _Derived>
   inline constexpr bool is_base_of_v = __is_base_of(_Base, _Derived);
 template <typename _From, typename _To>
-  inline constexpr bool is_convertible_v = is_convertible<_From, _To>::value;
+  inline constexpr bool is_convertible_v = __is_convertible(_From, _To);
 template<typename _Fn, typename... _Args>
   inline constexpr bool is_invocable_v = is_invocable<_Fn, _Args...>::value;
 template


However, that should be guarded by a __has_builtin check, so that it doesn't
break older compilers using GCC 13 headers, for example Intel icc 2021.8.0 and
later on Compiler Explorer: https://gcc.godbolt.org/z/jWEfEz6bq
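
A minimal sketch of the kind of guard meant here, assuming a user-level header rather than libstdc++ itself. The macro SKETCH_HAVE_IS_CONVERTIBLE_BUILTIN and the namespace sketch are made up for illustration; the actual libstdc++ fix may be spelled differently. The built-in is used only when the compiler reports it through __has_builtin, and the library trait is the fallback.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical feature macro: set only if the compiler both understands
// __has_builtin and reports the __is_convertible built-in.
#if defined(__has_builtin)
# if __has_builtin(__is_convertible)
#  define SKETCH_HAVE_IS_CONVERTIBLE_BUILTIN 1
# endif
#endif

namespace sketch {

#ifdef SKETCH_HAVE_IS_CONVERTIBLE_BUILTIN
// Fast path: ask the compiler directly.
template <typename From, typename To>
constexpr bool is_convertible_v = __is_convertible(From, To);
#else
// Fallback for compilers that lack the built-in.
template <typename From, typename To>
constexpr bool is_convertible_v = std::is_convertible<From, To>::value;
#endif

} // namespace sketch

static_assert(sketch::is_convertible_v<int, double>, "int converts to double");
static_assert(!sketch::is_convertible_v<void*, int>, "void* does not convert to int");
```

Either branch gives the same answers, so a compiler without the built-in still accepts the header.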

[Bug libstdc++/113200] std::char_traits::move is not constexpr when the argument is a string literal

2024-01-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113200

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |12.4

[Bug middle-end/113228] [14 Regression] ICE: recalculate_side_effects, at gimplify.cc:3347 since r14-6420

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113228

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1
Summary|[14 Regression] ICE:|[14 Regression] ICE:
   |recalculate_side_effects,   |recalculate_side_effects,
   |at gimplify.cc:3347 |at gimplify.cc:3347 since
   ||r14-6420
 CC||jakub at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek  ---
Confirmed this started with r14-6420-g85c5efcffed19ca6160eeecc2d4faebd9fee63aa
Reproducible also on x86_64-linux with -O3.

[Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
(In reply to Roger Sayle from comment #3)
> This patch addresses the regression, but probably isn't the correct fix.

Why?  To me it looks like the correct fix.

> The issue is that the backend now has a way of representing the
> concatenation of two registers (for example, TI is constructed for two DI
> mode registers):
> 
> (set (reg:TI 111 [ bD.2764 ])
>     (ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
>             (const_int 64 [0x40]))
>         (zero_extend:TI (reg:DI 141))))
> 
> But combine is unable to cleanly extract the (original) DI mode components
> back out of this using SUBREGs.  Currently combine gets confused and
> attempts to match things like:
> 
> Trying 10 -> 74:
>10: r111:TI=zero_extend(r142:DI)<<0x40|zero_extend(r141:DI)
>   REG_DEAD r141:DI
>   REG_DEAD r142:DI
>74: r137:DI=r111:TI#0
> Failed to match this instruction:
> (parallel [
>         (set (reg:DI 137 [ bD.2764 ])
>             (reg:DI 141))
>         (set (reg:TI 111 [ bD.2764 ])
>             (ior:TI (ashift:TI (zero_extend:TI (reg:DI 142))
>                     (const_int 64 [0x40]))
>                 (zero_extend:TI (reg:DI 141))))
>     ])
> 
> which contains the simplification we want, "reg:DI 137 := reg:DI 141", but
> along with stuff that combine should really take care of (strip/duplicate).

How could it do anything else?  It simplifies the computation of r137, but
because r111 isn't dead but used later, it has to preserve the previous
computation.  And, combine only considers the 2-4 instructions being simplified
together, so doesn't know that the other use only extracts the highpart subreg
and can be therefore also simplified.
Combine simply never tries to simplify folding say 10 -> 74 + 75.

Does your patch help with the testcase?

[Bug libstdc++/113200] std::char_traits::move is not constexpr when the argument is a string literal

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113200

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:15cc291887dc9dd92b2c93f4545e20eb6c190122

commit r14-6944-g15cc291887dc9dd92b2c93f4545e20eb6c190122
Author: Jonathan Wakely 
Date:   Wed Jan 3 15:01:09 2024 +

libstdc++: Fix std::char_traits::move [PR113200]

The current constexpr implementation of std::char_traits::move relies
on being able to compare the pointer parameters, which is not allowed
for unrelated pointers. We can use __builtin_constant_p to determine
whether it's safe to compare the pointers directly. If not, then we know
the ranges must be disjoint and so we can use char_traits::copy to
copy forwards from the first character to the last. If the pointers can
be compared directly, then we can simplify the condition for copying
backwards to just two pointer comparisons.

libstdc++-v3/ChangeLog:

PR libstdc++/113200
* include/bits/char_traits.h (__gnu_cxx::char_traits::move): Use
__builtin_constant_p to check for unrelated pointers that cannot
be compared during constant evaluation.
* testsuite/21_strings/char_traits/requirements/113200.cc: New
test.
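
The approach in the commit message above can be sketched outside the library roughly as follows. This is an illustration, not the libstdc++ source: move_sketch and the two check functions are invented names, and the sketch assumes a compiler that provides __builtin_is_constant_evaluated, __builtin_constant_p and __builtin_memmove (recent GCC does). During constant evaluation, __builtin_constant_p(src < dest) is used as a probe: if the comparison cannot be evaluated, the pointers are unrelated, so the ranges cannot overlap and a forward copy is safe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for char_traits<char>::move.
constexpr char* move_sketch(char* dest, const char* src, std::size_t n)
{
    if (n == 0)
        return dest;

    if (__builtin_is_constant_evaluated()) {
        // Only compare the pointers if the comparison itself is a constant,
        // i.e. both point into the same object.
        if (__builtin_constant_p(src < dest)
            && dest > src && dest < src + n) {
            // dest starts inside [src, src + n): copy backwards.
            for (std::size_t i = n; i > 0; --i)
                dest[i - 1] = src[i - 1];
        } else {
            // Unrelated or non-harmful overlap: copy forwards.
            for (std::size_t i = 0; i < n; ++i)
                dest[i] = src[i];
        }
        return dest;
    }

    // At run time memmove handles every overlap case.
    __builtin_memmove(dest, src, n);
    return dest;
}

// Overlapping ranges within one array.
constexpr bool overlapping_move_ok()
{
    char buf[] = "abcdef";
    move_sketch(buf + 2, buf, 3);
    return buf[0] == 'a' && buf[1] == 'b' && buf[2] == 'a'
        && buf[3] == 'b' && buf[4] == 'c' && buf[5] == 'f';
}

// Source is a string literal, unrelated to the destination array.
constexpr bool unrelated_move_ok()
{
    char buf[5] = {};
    move_sketch(buf, "test", 4);
    return buf[0] == 't' && buf[1] == 'e' && buf[2] == 's'
        && buf[3] == 't' && buf[4] == '\0';
}

static_assert(overlapping_move_ok(), "overlapping move in constant evaluation");
static_assert(unrelated_move_ok(), "string literal source in constant evaluation");
```

The second static_assert corresponds to the case in the bug title: the source is a string literal, so its address cannot be ordered against the destination during constant evaluation.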

[Bug libstdc++/113099] locale without RTTI uses dynamic_cast before gcc 13.2 or has ODR violation since gcc 13.2

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113099

--- Comment #16 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:9f3eb93e72703f6ea30aa27d0b6fc6db62cb4a04

commit r14-6942-g9f3eb93e72703f6ea30aa27d0b6fc6db62cb4a04
Author: Jonathan Wakely 
Date:   Wed Jan 3 12:23:32 2024 +

libstdc++: Use if-constexpr in std::__try_use_facet [PR113099]

As noted in the PR, we can use if-constexpr for the explicit
instantiation definitions that are compiled with -std=gnu++11. We
just need to disable the -Wc++17-extensions diagnostics.

libstdc++-v3/ChangeLog:

PR libstdc++/113099
* include/bits/locale_classes.tcc (__try_use_facet): Use
if-constexpr for C++11 and up.
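
As a rough illustration of the technique (not the contents of locale_classes.tcc), the sketch below uses if-constexpr inside a region where the -Wc++17-extensions diagnostic is disabled, so that it is also accepted as an extension in older -std modes. FacetBase, KnownFacet, OtherFacet and try_use_facet_sketch are hypothetical names; the sketch assumes the caller guarantees that known_slot really points to a KnownFacet, which is what makes the static_cast valid without RTTI.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical facet hierarchy, standing in for the standard facets.
struct FacetBase {};
struct KnownFacet : FacetBase { int id() const { return 1; } };
struct OtherFacet : FacetBase { int id() const { return 2; } };

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wc++17-extensions"
// Assumption: known_slot is a slot reserved for KnownFacet objects.
template <typename Facet>
const Facet* try_use_facet_sketch(const FacetBase* known_slot)
{
    // The branch not taken is discarded at compile time, so the cast is
    // never instantiated for facet types other than KnownFacet.
    if constexpr (std::is_same<Facet, KnownFacet>::value)
        return static_cast<const Facet*>(known_slot);
    else
        return nullptr;  // unknown facet type: no dynamic_cast, no RTTI
}
#pragma GCC diagnostic pop
```

Because the decision is made at compile time, no dynamic_cast is emitted for the known facet type, which is the property the PR is concerned with.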

[Bug tree-optimization/113201] [14 Regression] internal compiler error: tree check: expected ssa_name, have integer_cst in replace_uses_by, at tree-cfg.cc:2058

2024-01-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113201

--- Comment #4 from Jakub Jelinek  ---
Should be fixed now.

[Bug tree-optimization/113201] [14 Regression] internal compiler error: tree check: expected ssa_name, have integer_cst in replace_uses_by, at tree-cfg.cc:2058

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113201

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b8faf1fca42a9b987fec0992ca5d63995b2640b3

commit r14-6941-gb8faf1fca42a9b987fec0992ca5d63995b2640b3
Author: Jakub Jelinek 
Date:   Fri Jan 5 11:18:17 2024 +0100

scev: Avoid ICE on results used in abnormal PHI args [PR113201]

The following testcase ICEs when rslt is SSA_NAME_OCCURS_IN_ABNORMAL_PHI
and we call replace_uses_by with an INTEGER_CST def, where it ICEs on:
  if (e->flags & EDGE_ABNORMAL
  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (val))
because val is not an SSA_NAME.  One way would be to add
  && TREE_CODE (val) == SSA_NAME
check in between the above 2 lines in replace_uses_by.

And/or the following patch just punts propagating constants to
SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt uses.

Or we could punt somewhere earlier in final value replacement (but dunno
where).

2024-01-05  Jakub Jelinek  

PR tree-optimization/113201
* tree-scalar-evolution.cc (final_value_replacement_loop): Don't
call replace_uses_by on SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt.

* gcc.c-torture/compile/pr113201.c: New test.

[Bug tree-optimization/90693] Missing popcount simplifications

2024-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:0152637c74c9eab7658483b1cca4e3d584dd4262

commit r14-6940-g0152637c74c9eab7658483b1cca4e3d584dd4262
Author: Jakub Jelinek 
Date:   Fri Jan 5 11:16:58 2024 +0100

Improve __builtin_popcount* (x) == 1 generation if x is known != 0
[PR90693]

We expand __builtin_popcount* (x) == 1 as
x ^ (x - 1) > x - 1, either unconditionally in tree-ssa-math-opts.cc
if we don't have a direct optab support for popcount, or during
expansion where we compare the costs of comparison of the popcount
against one vs. the above expression.
As mentioned in the PR, if we know from ranger that the argument is
not zero, we can emit an x & (x - 1) == 0 test, which is the same number of
GIMPLE statements, but on many targets cheaper (e.g. whenever an AND
instruction can also set flags on whether result was zero or not).

The following patch does that.

2024-01-05  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): If
tree_expr_nonzero_p (arg), remember it in the second argument to
IFN_POPCOUNT or lower it as arg & (arg - 1) == 0 rather than
arg ^ (arg - 1) > arg - 1.
* internal-fn.cc (expand_POPCOUNT): If second argument to
IFN_POPCOUNT suggests arg is non-zero, try to expand it as
arg & (arg - 1) == 0 rather than arg ^ (arg - 1) > arg - 1.

* gcc.target/i386/pr90693-2.c: New test.
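
The two expansions mentioned in the commit message can be checked against the popcount definition with a small standalone sketch. The function names are made up for illustration and 32-bit unsigned values are assumed; this is not the code in tree-ssa-math-opts.cc. Note that in C and C++ source the AND form needs explicit parentheses, (x & (x - 1)) == 0, because == binds tighter than &.

```cpp
#include <cassert>
#include <cstdint>

// Reference definition: exactly one bit set.
inline bool single_bit_popcount(std::uint32_t x)
{
    return __builtin_popcount(x) == 1;
}

// General form, correct for every x including zero.
inline bool single_bit_xor(std::uint32_t x)
{
    return (x ^ (x - 1)) > (x - 1);
}

// Cheaper form, only equivalent when x is known to be non-zero:
// for x == 0 it returns true although no bit is set.
inline bool single_bit_and_nonzero(std::uint32_t x)
{
    return (x & (x - 1)) == 0;
}
```

For x == 0 the XOR form compares 0xFFFFFFFF > 0xFFFFFFFF and correctly yields false, while the AND form yields true, which is why the cheaper test is only usable when the argument is known to be non-zero.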

[Bug rtl-optimization/113048] [13/14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1862 (unable to find a register to spill) {*andndi3_doubleword_bmi} with -march=cascadelake since r13

2024-01-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113048

Uroš Bizjak  changed:

   What|Removed |Added

  Component|target  |rtl-optimization

--- Comment #5 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #3)
> Started with r13-1716-gfd3d25d6df1cbd385d2834ff3059dfb6905dd75c

There is nothing wrong with the constraints in *andndi3_doubleword_bmi:

(define_insn_and_split "*andn3_doubleword_bmi"
  [(set (match_operand: 0 "register_operand" "=,r,r")
(and:
  (not: (match_operand: 1 "register_operand" "r,0,r"))
  (match_operand: 2 "nonimmediate_operand" "ro,ro,0")))
   (clobber (reg:CC FLAGS_REG))]

Reconfirmed as RA problem.