[Bug c/115104] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 --- Comment #2 from Robin Dapp --- Thanks, I was just about to open a PR.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-05-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #18 from Robin Dapp --- A bit of a follow-up: I'm working on a patch for reassociation that can handle the mentioned cases and some more, but it will still require a bit of time to get everything regression-free and correct. What

[Bug middle-end/114196] [13 Regression] Fixed length vector ICE: in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9454

2024-05-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196 --- Comment #7 from Robin Dapp --- I can barely build a compiler on gcc185 due to disk space. I'm going to set up a cross toolchain (that I need for other purposes as well) in order to test.

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #10 from Robin Dapp --- Yes, it helps. Great that get_gimple_for_ssa_name is right below get_rtx_for_ssa_name, which I stepped through several times while debugging, and I didn't realize the connection, g. But thanks! Good thing

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #8 from Robin Dapp --- Created attachment 58037 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58037&action=edit Expand dump Dump attached. Insn 209 is the problematic one. The change from _911 to 1078 happens in

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 Robin Dapp changed: CC: added rguenth at gcc dot gnu.org,

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #5 from Robin Dapp --- What happens is that code sinking does: Sinking # VUSE <.MEM_1235> vect__173.251_1238 = .MASK_LEN_LOAD (_911, 32B, { -1, -1, -1, -1 }, loop_len_1064, 0); from bb 3 to bb 4 so we have vect__173.251_1238 =
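For context on the statement being sunk: as far as I understand the GCC 14 internal-fn.def description, .MASK_LEN_LOAD takes an address, an alignment, a mask, a length and a bias, and only lanes below length + bias whose mask bit is set are read from memory. With the all-ones mask above, loop_len_1064 alone decides how many lanes come from _911. The scalar model below is a sketch under that reading; the function name is hypothetical, and zero-filling the inactive lanes is an assumption made for determinism, not the real else-value semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of .MASK_LEN_LOAD (ptr, align, mask, len, bias).
   Lane i is read from memory only if i < len + bias and mask[i] is set.
   Inactive lanes are zero-filled here (an assumption for determinism) and
   are never read from src.  Alignment is not modelled. */
static void mask_len_load_model(int32_t *dst, const int32_t *src,
                                const int *mask, size_t len, int bias,
                                size_t nlanes)
{
    assert(bias == 0 || bias == -1);   /* the two load/store bias values */
    long limit = (long)len + bias;     /* signed, so len 0 with bias -1 is safe */
    for (size_t i = 0; i < nlanes; i++)
        dst[i] = ((long)i < limit && mask[i]) ? src[i] : 0;
}
```

Because inactive lanes never touch memory in this model, the address operand only has to be valid for the first len + bias active lanes.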

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714 Robin Dapp changed: CC: added rdapp at gcc dot gnu.org --- Comment #5

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #4 from Robin Dapp --- Ok, it looks like we do 5 iterations with the last one being length-masked to length 2 and then in the "live extraction" phase use "iteration 6".
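The failure mode in the comment above can be pictured with a scalar model of a length-controlled loop. This sketch assumes a hypothetical VF of 4 and n = 18, which gives five iterations with the last one limited to two lanes. The value of the final scalar iteration then has to come from lane len - 1 of the last vector, because the lanes beyond it still hold data from the previous iteration. The macro and function names are illustrative only and do not correspond to GCC code.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* hypothetical number of lanes per vector iteration */

/* Scalar model of a length-controlled vector loop computing b[i] = a[i] * 2.
   Each iteration handles len = min(remaining, VF) lanes, similar in spirit
   to .SELECT_VL.  Returns the "live" value of the last scalar iteration and
   stores the number of vector iterations in *iters. */
static int vector_loop_live(const int *a, int *b, size_t n, size_t *iters)
{
    assert(n > 0);
    int vec[VF] = {0};
    size_t len = 0;
    *iters = 0;
    for (size_t i = 0; i < n; i += len) {
        size_t remaining = n - i;
        len = remaining < VF ? remaining : VF;
        for (size_t lane = 0; lane < len; lane++) {
            vec[lane] = a[i + lane] * 2;
            b[i + lane] = vec[lane];
        }
        ++*iters;
    }
    /* Lanes >= len still hold values from the previous iteration, so the
       live value must be taken from lane len - 1 of the final iteration. */
    return vec[len - 1];
}
```

Reading any lane at or beyond len, or acting as if a sixth iteration existed, would return stale data rather than a[n - 1] * 2.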

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #3 from Robin Dapp --- > probably -fwhole-program is enough, -flto not needed(?) Yes, -fwhole-program is sufficient. > > # vectp_g.248_1401 = PHI > ... > _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]); > .. >

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #1 from Robin Dapp --- Confirmed.

[Bug middle-end/114733] [14] Miscompile with -march=rv64gcv -O3 on riscv

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114733 --- Comment #1 from Robin Dapp --- Confirmed, also shows up here.

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #5 from Robin Dapp --- Weird, I tried your exact qemu version and still can't reproduce the problem. My results are always FFB5. Binutils difference? Very unlikely. Could you post your QEMU_CPU settings just to be sure?

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 Robin Dapp changed: Resolution: removed ---, added FIXED; Status: removed UNCONFIRMED

[Bug target/114686] Feature request: Dynamic LMUL should be the default for the RISC-V Vector extension

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686 --- Comment #3 from Robin Dapp --- I think we have always maintained that this can definitely be a per-uarch default but shouldn't be a generic default. > I don't see any reason why this wouldn't be the case for the vast majority of >

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 --- Comment #2 from Robin Dapp --- This, again, seems to be a problem with bit extraction from masks. For some reason I didn't add the VLS modes to the corresponding vec_extract patterns. With those in place the problem is gone because we go

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #2 from Robin Dapp --- Checked with the latest commit on a different machine but still cannot reproduce the error. PR114668 I can reproduce. Maybe a copy and paste problem?

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #1 from Robin Dapp --- Hmm, my local version is a bit older and seems to give the same result for both -O2 and -O3. At least a good starting point for bisection then.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #6 from Robin Dapp --- Testsuite looks unchanged on rv64gcv.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #5 from Robin Dapp --- This fixes the test case for me locally, thanks. I can run the testsuite with it later if you'd like.

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-04-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #8 from Robin Dapp --- I tried some things (for the related bug without -fwrapv), then got busy with other things. I'm going to have another look later this week.

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-04-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515 Robin Dapp changed: CC: added ewlu at rivosinc dot com,

[Bug tree-optimization/114485] [13/14 Regression] Wrong code with -O3 -march=rv64gcv on riscv or `-O3 -march=armv9-a` for aarch64

2024-03-27 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485 --- Comment #4 from Robin Dapp --- Yes, the vectorization looks ok. The extracted live values are not used afterwards and therefore the whole vectorized loop is being thrown away. Then we do one iteration of the epilogue loop, inverting the

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vector-cost-mode (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-03-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #5 from Robin Dapp --- So the result is -9 instead of 9 (or vice versa) and this happens (just) with vectorization. We only vectorize with -fwrapv. From a first quick look, the following is what we have before vect: (loop)

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #8 from Robin Dapp --- No fallout on x86 or aarch64. Of course using false instead of TYPE_SIGN (utype) is also possible and maybe clearer?

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #7 from Robin Dapp ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..f8f7ba0ccc1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init
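vect_peel_nonlinear_iv_init has to compute the value a nonlinear induction variable reaches after the peeled iterations. The following is a minimal sketch of that arithmetic, not GCC's implementation. It assumes 32-bit IVs and two of the nonlinear forms, x = -x and x = x * step, and does the work in unsigned arithmetic so that it wraps the way signed arithmetic does under -fwrapv. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x_{k+1} = -x_k: after niters steps the sign has flipped niters times.
   Negation is done on the unsigned value so INT32_MIN wraps instead of
   overflowing.  The conversion back to int32_t relies on GCC's documented
   modulo behaviour (implementation-defined in ISO C). */
static int32_t peel_neg_iv(int32_t init, uint32_t niters)
{
    uint32_t u = (uint32_t)init;
    return (int32_t)((niters & 1) ? 0u - u : u);
}

/* x_{k+1} = x_k * step: init * step^niters (mod 2^32), computed by
   square-and-multiply in unsigned arithmetic. */
static int32_t peel_mul_iv(int32_t init, int32_t step, uint32_t niters)
{
    uint32_t result = (uint32_t)init;
    uint32_t base = (uint32_t)step;
    while (niters) {
        if (niters & 1)
            result *= base;
        base *= base;
        niters >>= 1;
    }
    return (int32_t)result;
}
```

Doing the power in an unsigned type keeps every intermediate product well defined; the signedness only matters again at the final conversion.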

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #3 from Robin Dapp --- -O3 -mavx2 -fno-vect-cost-model -fwrapv seems to be sufficient.

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Robin Dapp changed: Target: removed riscv*-*-*, added x86_64-*-* riscv*-*-* --- Comment #2 from

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #29 from Robin Dapp --- Yes, that also appears to work here. There was no lto involved this time? Now we need to figure out what's different with SPEC.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #27 from Robin Dapp --- Can you try it with a simpler (non-SPEC) test? Maybe there is still something weird happening with SPEC's scripting.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #24 from Robin Dapp --- I rebuilt GCC from scratch with your options but still have the same problem. Could our sources differ? My SPEC version might not be the most recent but I'm not aware that mcf changed at some point. Just

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #22 from Robin Dapp --- Still the same problem unfortunately. I'm a bit out of ideas - maybe your compiler executables could help?

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #20 from Robin Dapp --- No change with -std=gnu99 unfortunately.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #18 from Robin Dapp --- Hmm, doesn't help unfortunately. A full command line for me looks like: x86_64-pc-linux-gnu-gcc -c -o pbeampp.o -DSPEC_CPU -DNDEBUG -DWANT_STDC_PROTO -Ofast -march=znver4 -mtune=znver4 -flto=32 -g

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #16 from Robin Dapp --- Thank you! I'm having a problem with the data, though. Compiling with -Ofast -march=znver4 -mtune=znver4 -flto -fprofile-use=/tmp. Would you mind showing your exact final options for compilation of e.g.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #10 from Robin Dapp --- (In reply to Sam James from comment #9) > (In reply to Filip Kastl from comment #8) > > I'd like to help but I'm afraid I cannot send you the SPEC binaries with PGO > > applied since SPEC is licensed nor can

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #7 from Robin Dapp --- I built executables with and without the commit (-Ofast -march=znver4 -flto). There is no difference, so it must really be something that happens with PGO. I'd really need access to a zen4 box or the pgo

[Bug target/114202] [14] RISC-V rv64gcv: miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114202 Robin Dapp changed: Status: removed UNCONFIRMED, added RESOLVED; Resolution: removed ---

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #3 from Robin Dapp --- *** Bug 114202 has been marked as a duplicate of this bug. ***

[Bug middle-end/114196] [13/14 Regression] Fixed length vector ICE: in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9454

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196 Robin Dapp changed: See Also: added https://gcc.gnu.org/bugzill

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #1 from Robin Dapp --- Took me a while to analyze this... needed more time than I'd like to admit to make sense of the somewhat weird code created by fully unrolling and peeling. I believe the problem is that we reload the output

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #6 from Robin Dapp --- Honestly, I don't know how to analyze/debug this without a zen4, in particular as it only seems to happen with PGO. I tried locally but of course the execution time doesn't change (same as with zen3 according

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #4 from Robin Dapp --- Yes, as mentioned, vectorization of the first loop is debatable.

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #2 from Robin Dapp --- It is vectorized with a higher zvl, e.g. zvl512b, see https://godbolt.org/z/vbfjYn5Kd.

[Bug middle-end/114109] New: x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 Bug ID: 114109 Summary: x264 satd vectorization vs LLVM Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement

[Bug target/114028] [14] RISC-V rv64gcv_zvl256b: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114028 --- Comment #2 from Robin Dapp --- This is a target issue. It looks like we try to construct a "superword" sequence when the element size is already == Pmode. Testing a patch.
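A rough picture of what a "superword" sequence means here: several narrow elements are packed into one word-sized integer so that a vector can be built from fewer, wider elements. That only works while the element is narrower than the word (Pmode, 64 bits on rv64). Once the element is already word-sized there is nothing left to combine, and a naive packing loop would even shift by the full width. The sketch below is hypothetical, its names are illustrative, and it does not mirror the RISC-V backend's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packing is only meaningful for a non-zero element size that is strictly
   smaller than the word and divides it evenly. */
static int superword_possible(unsigned elem_bits, unsigned word_bits)
{
    return elem_bits != 0 && elem_bits < word_bits
           && word_bits % elem_bits == 0;
}

/* Pack 64 / elem_bits elements into one 64-bit word, element 0 in the
   least-significant bits.  With elem_bits == 64 the mask computation
   would shift by 64, which is undefined, hence the precondition. */
static uint64_t pack_superword(const uint64_t *elems, unsigned elem_bits)
{
    assert(superword_possible(elem_bits, 64));
    uint64_t mask = (UINT64_C(1) << elem_bits) - 1;
    uint64_t word = 0;
    for (unsigned k = 0; k < 64 / elem_bits; k++)
        word |= (elems[k] & mask) << (k * elem_bits);
    return word;
}
```

Four 16-bit elements {1, 2, 3, 4} become the single word 0x0004000300020001, while 64-bit elements must take a different path.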

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 --- Comment #9 from Robin Dapp --- Argh, I actually just did a gcc -O3 -march=native pr114027.c -fno-vect-cost-model on cfarm188 with a recent-ish GCC but realized that I used my slightly modified version and not the original test case. long

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 Robin Dapp changed: CC: added rguenth at gcc dot gnu.org; Last

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-02-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #4 from Robin Dapp --- Judging by the graph it looks like it was slow before, then got faster and now slower again. Is there some more info on why it got faster in the first place? Did the patch reverse something or is it rather a

[Bug target/113827] MrBayes benchmark redundant load on riscv

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 --- Comment #1 from Robin Dapp --- x86 (-march=native -O3 on an i7 12th gen) looks pretty similar: .L3: movq (%rdi), %rax vmovups (%rax), %xmm1 vdivps %xmm0, %xmm1, %xmm1 vmovups %xmm1, (%rax) addq

[Bug target/113827] New: MrBayes benchmark redundant load

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 Bug ID: 113827 Summary: MrBayes benchmark redundant load Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #23 from Robin Dapp --- > this is: > > _429 = mask_patt_205.47_276[i] ? vect_cst__262[i] : (vect_cst__262 << > {0,..})[i]; > vect_iftmp.55_287 = mask_patt_209.54_286[i] ? _429 [i] : vect_cst__262[i] But isn't it rather _429 =

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-30 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #19 from Robin Dapp --- What seems odd to me is that in fre5 we simplify _429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0, ... }); vect_prephitmp_129.51_282 = _429; vect_iftmp.55_287 = VEC_COND_EXPR ;
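To read the fre5 statement above, it helps to spell out the lane-wise meaning of a conditional internal function. GCC documents .COND_<op> (cond, a, b, else) as cond ? a op b : else per lane. Under that reading, .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0, ... }) yields vect_cst__262 << vect_cst__262 in active lanes and 0 in inactive ones. The scalar model below sketches this; the function name and the 32-bit element type are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of .COND_SHL (mask, a, b, else): active lanes get
   a[i] << b[i], inactive lanes get else[i].  Shift amounts of active lanes
   are assumed to be in [0, 32) so the C shift is well defined. */
static void cond_shl_model(int32_t *lhs, const int *mask, const int32_t *a,
                           const int32_t *b, const int32_t *els, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(!mask[i] || (b[i] >= 0 && b[i] < 32));
        lhs[i] = mask[i] ? (int32_t)((uint32_t)a[i] << b[i]) : els[i];
    }
}
```

The else operand matters for every inactive lane, so folding it away is only correct when the consumer never observes those lanes.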

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #18 from Robin Dapp --- Hehe no it doesn't make sense... I wrongly read a v2 as a v1. Please disregard the last message.

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #17 from Robin Dapp --- Grasping for straws by blaming qemu ;) At some point we do the vector shift vsll.vv v1,v2,v2,v0.t but the mask v0 is all zeros: gdb: b = {0 } According to the mask-undisturbed policy set before
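The expectation behind "According to the mask-undisturbed policy" can be written as a scalar model. Under the RVV spec's mask-undisturbed (mu) policy, inactive elements keep the old contents of the destination, and vsll uses only the low log2(SEW) bits of each shift amount. With an all-zero v0, vsll.vv v1,v2,v2,v0.t should therefore leave v1 entirely unchanged. The sketch assumes SEW = 32 and ignores tail elements; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of vsll.vv vd, vs2, vs1, v0.t with SEW = 32 under the
   mask-undisturbed policy: only elements whose mask bit is set are
   written; the shift amount uses the low 5 bits of vs1[i]. */
static void vsll_vv_masked_mu(uint32_t *vd, const uint32_t *vs2,
                              const uint32_t *vs1, const uint8_t *v0,
                              size_t vl)
{
    assert(vd != NULL && vs2 != NULL && vs1 != NULL && v0 != NULL);
    for (size_t i = 0; i < vl; i++) {
        if (v0[i])
            vd[i] = vs2[i] << (vs1[i] & 31u);
        /* inactive element: vd[i] keeps its previous value */
    }
}
```

If the simulated destination changes even though every mask bit is zero, the emulator (or the policy setting in vtype) is the next thing to check.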

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #16 from Robin Dapp --- Disabling vec_extract makes us operate on non-partial vectors, though, so there are a lot of differences in codegen. I'm going to have a look.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #9 from Robin Dapp --- (In reply to rguent...@suse.de from comment #6) > t.c:47:21: missed: the size of the group of accesses is not a power of 2 > or not equal to 3 > t.c:47:21: missed: not falling back to elementwise
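The quoted dump message encodes a simple condition on the interleaving group size. The predicate below mirrors only the wording of that message ("a power of 2 or equal to 3"); the vectorizer's real checks involve more conditions. Treat this as a hypothetical illustration of which group sizes pass, not as GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the dump text: a group of grouped (interleaved) accesses is
   acceptable for this path if its size is a power of two or exactly 3. */
static bool group_size_ok(unsigned group_size)
{
    bool pow2 = group_size != 0 && (group_size & (group_size - 1)) == 0;
    return pow2 || group_size == 3;
}
```

Any other group size fails this check, and the vectorizer then needs a fallback such as the elementwise accesses mentioned in the following dump line.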

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #10 from Robin Dapp --- The compile farm machine I'm using doesn't have SVE. Compiling with -march=armv8-a -O3 pr113607.c -fno-vect-cost-model and running it returns 0 (i.e. ok). pr113607.c:35:5: note: vectorized 3 loops in

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #7 from Robin Dapp --- Yep, that one fails for me now, thanks.

[Bug target/113607] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3

2024-01-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607 --- Comment #4 from Robin Dapp --- I cannot reproduce it either, tried with -ftree-vectorize as well as -fno-vect-cost-model.

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #14 from Robin Dapp --- Ok, running tests with the adjusted version and going to post a patch afterwards. However, during a recent run compiling insn-recog took 2G and insn-emit-7 as well as insn-emit-10 required > 1.5G each.

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #12 from Robin Dapp --- Created attachment 57209 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57209&action=edit Tentative I tested the attached "fix". On my machine with 13.2 host compiler it reduced the build time for

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #2 from Robin Dapp --- > It's interesting, for Clang only RISC-V can vectorize it. The full loop can be vectorized on clang x86 as well when I remove the first conditional (which is not in the snippet I posted above). So that's

[Bug tree-optimization/113583] New: Main loop in 519.lbm not vectorized.

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 Bug ID: 113583 Summary: Main loop in 519.lbm not vectorized. Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #7 from Robin Dapp --- Ok, I'm going to check.

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 Robin Dapp changed: CC: added rdapp at gcc dot gnu.org --- Comment #5

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570 --- Comment #2 from Robin Dapp --- I'm pretty certain this is "works as intended" and -Ofast causes the precision to be different than with -O3 (and dependent on the target). See also: It has been reported that with gfortran -Ofast
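Why -Ofast can legitimately change results: -Ofast enables -ffast-math, which includes -fassociative-math, so the compiler may reorder floating-point sums (for example when vectorizing a reduction). Floating-point addition is not associative, as this small C example shows. It assumes IEEE single precision without excess precision (for instance x86-64 with SSE), and it must itself be compiled without -ffast-math.

```c
#include <assert.h>

/* The same three operands summed in two different orders. */
static float sum_left(float a, float b, float c)  { return (a + b) + c; }
static float sum_right(float a, float b, float c) { return a + (b + c); }
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left yields 1.0f. In sum_right, -1e8f + 1.0f rounds back to -1e8f because floats near 1e8 are spaced 8 apart, so the result is 0.0f.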

[Bug testsuite/113558] [14 regression] gcc.dg/vect/vect-outer-4c-big-array.c etc. FAIL

2024-01-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113558 --- Comment #2 from Robin Dapp --- Created attachment 57195 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57195&action=edit Tentative patch Ah, it looks like nothing is being vectorized at all and the second check just happened to match as

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087 --- Comment #38 from Robin Dapp --- deepsjeng also looks ok here.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2024-01-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087 --- Comment #37 from Robin Dapp --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113206#c9 > Using 4a0a8dc1b88408222b88e10278017189f6144602, the spec run failed on: > zvl128b (All runtime fails): > 527.cam4 (Runtime) > 531.deepsjeng (Runtime)

[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

2024-01-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495 --- Comment #27 from Robin Dapp --- Following up on this: I'm seeing the same thing Patrick does. We create a lot of large non-sparse sbitmaps that amount to around 33G in total. I did local experiments replacing all sbitmaps that are not

[Bug c/113474] RISC-V: Fail to use vmerge.vim for constant vector

2024-01-18 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 --- Comment #1 from Robin Dapp --- Good catch. Looks like the ifn expander always forces into a register. That's probably necessary on all targets except riscv. diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index
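The reason forcing every constant into a register hurts on RISC-V: vmerge.vim encodes its scalar operand as a 5-bit signed immediate (simm5), so constants in [-16, 15] can be merged directly. Other values need vmerge.vxm with a scalar register or vmerge.vvm with a vector. A sketch of the check an expander could make before forcing the operand into a register follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if c can be used as the simm5 operand of vmerge.vim, i.e. it is
   representable as a 5-bit two's-complement immediate. */
static bool fits_simm5(int64_t c)
{
    return c >= -16 && c <= 15;
}
```

Only when this returns false does the constant actually have to be materialized in a register first.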

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #9 from Robin Dapp --- I also noticed this (likely unwanted) vector snippet and wondered where it is being created. First I thought it's a vec_extract but doesn't look like it. I'm going to check why we create this. Pan, the test

[Bug middle-end/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2024-01-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #22 from Robin Dapp --- Yes, going to the thread soon.

[Bug target/113249] RISC-V: regression testsuite errors -mtune=generic-ooo

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249 --- Comment #4 from Robin Dapp --- > One of the reasons I've been testing things with generic-ooo is because > generic-ooo had initial vector pipelines defined. For cleaning up the > scheduler, I copied over the generic-ooo pipelines into

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #4 from Robin Dapp --- The other option is to assert that all tune models have at least a vector cost model rather than NULL... But not falling back to the builtin costs still makes sense.
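The fallback being discussed can be sketched as follows. A tune model may or may not provide its own vector costs. If it does not, the sketch uses a generic vector cost table rather than the builtin (scalar-oriented) costs. All type names, field names and cost values here are hypothetical placeholders, not GCC's riscv tuning structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tuning structures for illustration only. */
struct vector_costs_sketch { int load_cost; int store_cost; int alu_cost; };
struct tune_info_sketch {
    const char *name;
    const struct vector_costs_sketch *vec_costs; /* may be NULL */
};

/* Placeholder values; a real table would be tuned per microarchitecture. */
static const struct vector_costs_sketch generic_vector_costs = { 1, 1, 1 };

/* Use the tune model's vector costs if present, otherwise fall back to
   the generic vector table instead of the builtin costs. */
static const struct vector_costs_sketch *
get_vector_costs(const struct tune_info_sketch *tune)
{
    assert(tune != NULL);
    return tune->vec_costs ? tune->vec_costs : &generic_vector_costs;
}
```

The alternative mentioned in the comment, asserting that vec_costs is never NULL, would replace the conditional with an assert on tune->vec_costs.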

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #3 from Robin Dapp --- Yes, sure and I gave a bit of detail why the values chosen there (same as aarch64) make sense to me. Using this generic vector cost model by default without adjusting the latencies is possible. I would be OK

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #1 from Robin Dapp --- Hmm, so I tried reproducing this and without a vector cost model we indeed vectorize. My qemu dynamic instruction count results are not as abysmal as yours but still bad enough (20-30% increase in dynamic

[Bug target/113281] [14] RISC-V rv64gcv_zvl256b vector: Runtime mismatch with rv64gc

2024-01-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281 --- Comment #2 from Robin Dapp --- Confirmed. Funny, we shouldn't vectorize that but really optimize to "return 0". Costing might be questionable but we also haven't optimized away the loop when comparing costs. Disregarding that, of course

[Bug target/113249] RISC-V: regression testsuite errors -mtune=generic-ooo

2024-01-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249 --- Comment #1 from Robin Dapp --- Yes, several (most?) of those are expected because the tests rely on the default latency model. One option is to hard code the tune in those tests. On the other hand the dump tests checking for a more or less

[Bug target/112999] riscv: Infinite loop with mask extraction

2023-12-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 Robin Dapp changed: Resolution: removed ---, added FIXED; Status: removed UNCONFIRMED

[Bug target/112773] [14 Regression] RISC-V ICE: in force_align_down_and_div, at poly-int.h:1828 on rv32gcv_zvl256b

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773 --- Comment #16 from Robin Dapp --- I'd hope it was not fixed by this but just latent because we chose a VLS-mode vectorization instead. Hopefully we're better off with the fix than without :)

[Bug target/113014] RISC-V: Redundant zeroing instructions in reduction due to r14-3998-g6223ea766daf7c

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014 --- Comment #4 from Robin Dapp --- Richard has posted it and asked for reviews. I have tested it and we have several testsuite regressions with it but no severe ones. Most or all of them are dump fails because we combine into vx variants that

[Bug target/113014] RISC-V: Redundant zeroing instructions in reduction due to r14-3998-g6223ea766daf7c

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014 --- Comment #2 from Robin Dapp --- Yes, that's right.

[Bug target/112999] riscv: Infinite loop with mask extraction

2023-12-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 --- Comment #1 from Robin Dapp --- What actually gets in the way of vec_extract here is changing to a "better" vector mode (which is RVVMF4QI here). If we tried to extract from the mask directly, everything would work. I have a patch
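What extracting from a mask means at the bit level: per the RVV spec, mask element i occupies bit i of the mask register, so in a byte view (as a QImode reinterpretation such as RVVMF4QI provides) element i lives in bit i % 8 of byte i / 8. The sketch below packs a boolean mask that way and extracts one element from it. The helper names are hypothetical, and the code says nothing about how GCC's vec_extract patterns are implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n mask elements into bytes using the RVV layout (element i in
   bit i of the register, i.e. bit i % 8 of byte i / 8). */
static void pack_mask(uint8_t *bytes, const bool *elems, size_t n)
{
    for (size_t b = 0; b < (n + 7) / 8; b++)
        bytes[b] = 0;
    for (size_t i = 0; i < n; i++)
        if (elems[i])
            bytes[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Extract mask element i from the packed byte view. */
static int mask_extract_bit(const uint8_t *bytes, size_t i)
{
    assert(bytes != NULL);
    return (bytes[i / 8] >> (i % 8)) & 1;
}
```

Extracting from the byte view therefore needs a bit index, not just a byte index, which is easy to get wrong once the mask has been reinterpreted as a byte vector.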

[Bug target/112999] New: riscv: Infinite loop with mask extraction

2023-12-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 Bug ID: 112999 Summary: riscv: Infinite loop with mask extraction Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug middle-end/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #8 from Robin Dapp --- Yes, can confirm that this helps.

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #5 from Robin Dapp --- Yes, that's what I just tried. No infinite loop anymore then. But that's not a new simplification and it looks reasonable, so there must be something special about our backend.

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #3 from Robin Dapp --- In match.pd we do something like this: ;; Function e (e, funcdef_no=0, decl_uid=2751, cgraph_uid=1, symbol_order=4) Pass statistics of "forwprop": Matching expression match.pd:2771,

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #2 from Robin Dapp --- It doesn't look like the same issue to me. The other bug is related to TImode handling in combination with mask registers. I will also have a look at this one.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #15 from Robin Dapp --- I think we need to make sure that we're not writing out of bounds. In that case anything might happen, and if we just don't happen to overwrite this variable, we might hit another one, but the test can still

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #10 from Robin Dapp --- I just realized that I forgot to post the comparison recently. With the patch now upstream I don't see any differences for zvl128b and different vlens anymore. What I haven't fully tested yet is zvl256b or

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #13 from Robin Dapp --- I just built from the most recent commit and it still fails for me. Could there be a difference in qemu? I'm on qemu-riscv64 version 8.1.91 but yours is even newer so that might not explain it. You could

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #9 from Robin Dapp --- In the good version the length is 32 here because directly before the vsetvl we have: li a4,32. That seems to get lost somehow.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #7 from Robin Dapp --- Here, 0x105c6 vse8.v v8,(a5) is where we overwrite m. The vl is 128, but the preceding vsetvl gets a4 = 46912504507016 as AVL, which already seems broken.
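A note on why a garbage AVL produces vl = 128 rather than a trap: the RVV spec computes VLMAX = LMUL * VLEN / SEW and requires vl = VLMAX whenever AVL >= 2 * VLMAX, so a huge AVL simply saturates. The C sketch below uses vl = min(AVL, VLMAX), one spec-conforming choice. The function names are hypothetical, only integer LMUL is modelled, and VLEN = 256, SEW = 8, LMUL = 4 in the usage example is an assumed configuration that happens to give VLMAX = 128; the comment does not state the actual vtype.

```c
#include <assert.h>
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch). */
static uint64_t vlmax(uint64_t vlen, uint64_t sew, uint64_t lmul)
{
    assert(sew > 0 && lmul > 0);
    return lmul * vlen / sew;
}

/* One spec-conforming vl choice: min(AVL, VLMAX). For AVL >= 2 * VLMAX
   the spec mandates vl = VLMAX, so a garbage AVL saturates to VLMAX. */
static uint64_t vsetvl_vl(uint64_t avl, uint64_t vlmax_val)
{
    return avl < vlmax_val ? avl : vlmax_val;
}
```

Under these assumptions, `vsetvl_vl(32, vlmax(256, 8, 4))` yields 32 (the good version with li a4,32), while the bogus AVL yields 128, so the following vse8.v stores a full 128 bytes.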

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #6 from Robin Dapp --- This seems to be gone when simple vsetvl (instead of lazy) is used or with -fno-schedule-insns, which might indicate a vsetvl pass problem. We might have a few more of those. Maybe it would make sense to run

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #8 from Robin Dapp --- With Juzhe's latest fix that disables VLS modes >= 128 bit for zvl128b, x264 runs without issues here and some of the additional execution failures are gone. Will post the current comparison later.

[Bug middle-end/112872] [14 Regression] RISCV ICE: in store_integral_bit_field, at expmed.cc:1049 with -03 rv64gcv_zvl1024b --param=riscv-autovec-preference=fixed-vlmax

2023-12-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112872 --- Comment #2 from Robin Dapp --- Thanks. Yes that's similar and also looks fixed by the introduction of the vec_init expander. Added this test case to the patch and will push it soon.

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #7 from Robin Dapp --- Ah, forgot three tests:
FAIL: gcc.dg/vect/bb-slp-cond-1.c execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
On

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #6 from Robin Dapp --- I indeed see more failures with _zvl128b, vlen=256 (than with _zvl128b, vlen=128):
FAIL: gcc.dg/vect/pr66251.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr66251.c execution test
FAIL:
