[Bug target/111317] RISC-V: Incorrect COST model for RVV conversions

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317 --- Comment #1 from Robin Dapp --- I think the default cost model is not too bad for these simple cases. Our emitted instructions match gimple pretty well. The thing we don't model is vsetvl. We could ignore it under the assumption that it

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #12 from Robin Dapp --- Yes, as far as I know. I would also go ahead and merge the test suite patch now as there is already a v2 fix posted. Even if it's not the correct one it will be done soon so we should not let that block

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #8 from Robin Dapp --- Yes, I doubt we would get much below 4 instructions with riscv specifics. A quick grep yesterday didn't reveal any aarch64 or gcn patterns for those (as long as they are not hidden behind some pattern

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #10 from Robin Dapp --- I would be OK with the riscv implementation, then we don't need to touch isel. Maybe a future vector extension will also help us here so we could just switch the implementation then.

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #2 from Robin Dapp --- With the current trunk we don't spill anymore: (VLS) .L4: vle32.v v2,0(a5) vadd.vv v1,v1,v2 addi a5,a5,16 bne a5,a4,.L4 Considering just that loop I'd say costing works

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #4 from Robin Dapp --- Yes, with VLS reduction this will improve. On aarch64 + sve I see loop inside costs: 2 This is similar to our VLS costs. And their loop is indeed short: ld1w z30.s, p7/z, [x0, x2, lsl 2]

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #2

[Bug c/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #1

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 --- Comment #6 from Robin Dapp --- Created attachment 55902 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit Tentative You're referring to the case where we have init = -0.0, the condition is false and we end up wrongly doing

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 --- Comment #3 from Robin Dapp --- Several other things came up, so I'm just going to post the latest status here without having revised or tested it. Going to try fixing it and testing tomorrow. --- a/gcc/tree-vect-loop.cc +++

[Bug target/111311] New: RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable

2023-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311 Bug ID: 111311 Summary: RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794 --- Comment #10 from Robin Dapp --- From what I can tell with my barely working connection no regressions on x86, aarch64 or power10 with the adjusted check.

[Bug target/112109] New: Missing riscv vectorized strcmp (and other) expanders

2023-10-27 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109 Bug ID: 112109 Summary: Missing riscv vectorized strcmp (and other) expanders Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-11-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #30 from Robin Dapp --- On my machine it is not nearly as bad as insn-emit.cc. What dominates for me with a GCC 13 host compiler is the already fixed insn-opinit problem. How long does it take for you (maybe in % of the total

[Bug target/111311] RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable

2023-11-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311 --- Comment #10 from Robin Dapp --- As a general remark: Some of those are present on other backends as well, some have been introduced by recent common-code changes and some are bogus test prerequisites or checks. I'm not saying we are in

[Bug target/112363] GCN: 'FAIL: gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c execution test'

2023-11-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112363 --- Comment #1 from Robin Dapp --- This test was introduced in order to check that we correctly "reduce" with -0.0 as neutral element, i.e. a reduction preserves an initial -0.0 and doesn't turn it into 0.0 by adding 0.0. Kernel aborted means
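The signed-zero point in the comment above can be sketched in plain C. This is a minimal illustration assuming default IEEE 754 round-to-nearest arithmetic and no -ffast-math; the function names `masked_fold_left_plus` and `is_negative_zero` are hypothetical and are not taken from the GCC sources or from the test case.

```c
#include <math.h>
#include <stddef.h>

/* In-order (fold-left) conditional sum.  Elements whose mask entry is 0
   are replaced by NEUTRAL before being added, which is roughly what an
   if-converted reduction does.  Hypothetical helper, not GCC code.  */
static double
masked_fold_left_plus (double init, const double *a,
                       const unsigned char *mask, size_t n, double neutral)
{
  double acc = init;
  for (size_t i = 0; i < n; i++)
    acc += mask[i] ? a[i] : neutral;
  return acc;
}

/* Nonzero if X is a zero with the sign bit set.  */
static int
is_negative_zero (double x)
{
  return x == 0.0 && signbit (x);
}
```

With an all-false mask and `init = -0.0`, passing `-0.0` as the neutral element keeps the result at -0.0, because (-0.0) + (-0.0) is -0.0. Passing `0.0` instead yields +0.0, because (-0.0) + (+0.0) rounds to +0.0, which is the sign flip the test is meant to catch.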

[Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076

2023-11-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361 --- Comment #2 from Robin Dapp --- I can have a look. Of course I tested it but neither the compile farm machine (gcc188) I used nor my local device have AVX512 run capability. Anywhere else I can test it?

[Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076

2023-11-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361 --- Comment #6 from Robin Dapp --- So "before" we created vect__3.12_55 = MEM [(float *)vectp_a.10_53]; vect__ifc__43.13_57 = VEC_COND_EXPR ; // _ifc__43 = _24 ? _3 : 0.0; stmp__44.14_58 = BIT_FIELD_REF ; stmp__44.14_59 = r3_29 +

[Bug middle-end/112359] [14 Regression] ICE: in expand_fn_using_insn, at internal-fn.cc:215 with -O -ftree-loop-if-convert -mavx512fp16

2023-11-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112359 --- Comment #2 from Robin Dapp --- Would something like + bool allow_cond_op = flag_tree_loop_vectorize + && !gimple_bb (phi)->loop_father->dont_vectorize; in convert_scalar_cond_reduction be sufficient or are there more conditions to check

[Bug target/111488] New: ICE ion riscv gcc.dg/vect/vect-126.c

2023-09-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488 Bug ID: 111488 Summary: ICE ion riscv gcc.dg/vect/vect-126.c Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/111488] ICE ion riscv gcc.dg/vect/vect-126.c

2023-09-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488 Robin Dapp changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment

[Bug target/111428] RISC-V vector: Flaky segfault in {min|max}val_char_{1|2}.f90

2023-09-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428 --- Comment #2 from Robin Dapp --- Reproduced locally. The identical binary sometimes works and sometimes doesn't so it must be a race...

[Bug target/111506] RISC-V: Failed to vectorize conversion from INT64 -> _Float16

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506 --- Comment #5 from Robin Dapp --- Ah, thanks Joseph, so this at least means that we do not need !flag_trapping_math here. However, the vectorizer emulates the 64-bit integer to _Float16 conversion via an intermediate int32_t and now the riscv

[Bug target/111506] RISC-V: Failed to vectorize conversion from INT64 -> _Float16

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506 Robin Dapp changed: What|Removed |Added CC||joseph at codesourcery dot com ---

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #16 from Robin Dapp --- Confirming that it's the compilation of insn-emit.cc which takes > 10 minutes. The rest (including auto generating of files) is reasonably fast. Going to do some experiments with it and see which pass takes

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #18 from Robin Dapp --- Just finished an initial timing run, sorted, first 10: Time variable usr sys wall GGC phase opt and generate : 567.60 ( 97%) 38.23

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #20 from Robin Dapp --- Mhm, why is your profile so different from mine? I'm also on an x86_64 host with a 13.2.1 host compiler (Fedora). Is it because of the preprocessed source? Or am I just reading the timing report wrong?

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #22 from Robin Dapp --- Ah, then it's not that different, your machine is just faster ;) callgraph ipa passes : 69.77 ( 11%) 5.97 ( 13%) 76.05 ( 12%) 2409M ( 10%) integration: 91.95 (

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 Robin Dapp changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #12

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #23 from Robin Dapp --- For the lack of a better idea (and time constraints as looking for compiler bottlenecks is slow and tedious) I went with Kito's suggestion of splitting insn-emit.cc. This reduces this part of the compilation

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #25 from Robin Dapp --- At least here locally the maximum I saw was 1.4 GB of RES for insn-emit-10.cc. That's still not ideal (especially when 8 or 10 of those files compile in parallel) but at least no 8 GB for a single file

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-18 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791 --- Comment #4 from Robin Dapp --- This is a scalar popcount and as Kito already noted we will just emit cpop a0, a0 once the zbb extension is present. As to the question what is actually being vectorized here, I'm not so sure :D It looks
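For readers unfamiliar with the term, a scalar popcount of the kind mentioned above might look like the sketch below. The bug's actual test case is not shown in this excerpt, so both functions are hypothetical illustrations. `__builtin_popcountll` is the existing GCC/Clang builtin; per the comment above, a single `cpop` instruction is what the RISC-V port emits for a scalar popcount once the Zbb extension is enabled.

```c
#include <stdint.h>

/* Portable scalar popcount: each iteration clears the lowest set bit,
   so the loop runs once per set bit.  */
static unsigned
popcount_loop (uint64_t x)
{
  unsigned n = 0;
  while (x)
    {
      x &= x - 1;
      n++;
    }
  return n;
}

/* The same result through the compiler builtin, which the compiler may
   lower to one instruction where the target has one.  */
static unsigned
popcount_builtin (uint64_t x)
{
  return (unsigned) __builtin_popcountll (x);
}
```

Both functions return the number of set bits in their argument, so they can be compared against each other when checking generated code.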

[Bug tree-optimization/111760] risc-v regression: COND_LEN_* incorrect fold/simplify in middle-end

2023-10-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760 --- Comment #6 from Robin Dapp --- Yes, thanks for filing this bug separately. The patch doesn't disable all of those optimizations; of course I paid special attention not to mess with them. The difference here is that we valueize, add

[Bug target/111428] RISC-V vector: Flaky segfault in {min|max}val_char_{1|2}.f90

2023-10-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428 --- Comment #3 from Robin Dapp --- Still difficult to track down. The following is a smaller reproducer: program main implicit none integer, parameter :: n=5, m=3 integer, dimension(n,m) :: v real, dimension(n,m) :: r do call

[Bug tree-optimization/111760] risc-v regression: COND_LEN_* incorrect fold/simplify in middle-end

2023-10-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org,

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794 --- Comment #5 from Robin Dapp --- Disregarding the reasons for the precision adjustment, for this case here, we seem to fail at: /* We do not handle bit-precision changes. */ if ((CONVERT_EXPR_CODE_P (code) || code ==

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794 --- Comment #9 from Robin Dapp --- Yes, that's from pattern recog: slp.c:11:20: note: === vect_pattern_recog === slp.c:11:20: note: vect_recog_mask_conversion_pattern: detected: _5 = _2 & _4; slp.c:11:20: note: mask_conversion pattern

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794 --- Comment #7 from Robin Dapp --- vectp.4_188 = x_50(D); vect__1.5_189 = MEM [(int *)vectp.4_188]; mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189; mask_patt_156.7_191 = VIEW_CONVERT_EXPR>(mask__2.6_190); _1 = *x_50(D);

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #26 from Robin Dapp --- So insn-opinit.cc still takes 2-3 minutes to compile here, even though the file is not gigantic. With the same GCC 13.1 x86 host compiler I see: phase opt and generate : 170.28 ( 99%) 0.75 (

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794 --- Comment #4 from Robin Dapp --- Just to mention here as well. As this seems ninstance++ where the adjust_precision thing comes back to bite us, I'm going to go back and check if the issue why it was introduced (DCE?) cannot be solved

[Bug target/110559] Bad mask_load/mask_store codegen of RVV

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110559 --- Comment #3 from Robin Dapp --- I got back to this again today, now that pressure-aware scheduling is the default. As mentioned before, it helps but doesn't get rid of the spills. Testing with the "generic ooo" scheduling model it looks

[Bug tree-optimization/111136] New: ICE in RISC-V test case since r14-3441-ga1558e9ad85693

2023-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111136 Bug ID: 111136 Summary: ICE in RISC-V test case since r14-3441-ga1558e9ad85693 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #1 from Robin Dapp --- We seem to decide that a slightly more expensive loop (one instruction more) without an epilogue is better than a loop with an epilogue. This looks intentional in the vectorizer cost estimation and is not

[Bug tree-optimization/111136] ICE in RISC-V test case since r14-3441-ga1558e9ad85693

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111136 --- Comment #4 from Robin Dapp --- All gather-scatter tests pass for me again (the given example in particular) after applying this.

[Bug target/108271] Missed RVV cost model

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108271 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #3

[Bug rtl-optimization/108412] RISC-V: Negative optimization of GCSE && LOOP INVARIANTS

2023-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108412 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #3

[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076

2023-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464 --- Comment #4 from Robin Dapp --- Is there another way to make it more robust? Or does the existing void vect_finish_replace_stmt (vec_info *vinfo, stmt_vec_info stmt_info, gimple *vec_stmt) { gimple *scalar_stmt

[Bug tree-optimization/112464] [14 Regression] ICE avx512 with -ftrapv since r14-5076

2023-11-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464 --- Comment #2 from Robin Dapp --- I tested diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a544bc9b059..257fd40793e 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -7084,7 +7084,7 @@

[Bug middle-end/112406] [14 Regression] Several SPECCPU 2017 benchmarks fail with on internal compiler error: in expand_insn, at optabs.cc:8305 after g:01c18f58d37865d5f3bbe93e666183b54ec608c7

2023-11-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406 --- Comment #11 from Robin Dapp --- Thanks, this is helpful. I have a patch that I just bootstrapped and ran the testsuite with on aarch64. Going to post it soon, maybe Richi still has a better idea how to work around this.

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 rdapp at gcc dot gnu.org changed: What|Removed |Added CC||rdapp at gcc dot gnu.org ---

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 --- Comment #8 from rdapp at gcc dot gnu.org --- Hacked something together, inspired by the other cases that try two different sequences. Does this go in the right direction? Works for me on s390. I see some regressions related to predictive

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 --- Comment #9 from rdapp at gcc dot gnu.org --- The regressions are unrelated and due to another patch that I still had on the same branch.
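The identity in the bug title, (sub X Y) -> (xor X Y) when Y <= X and X + 1 is a power of two, can be checked with a small C sketch. When X consists only of low set bits (0, 1, 3, 7, ...) and Y <= X, every set bit of Y is also set in X, so the subtraction never borrows and simply clears those bits, which is what XOR does. The helper names below are hypothetical and are not the patch discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if X + 1 is a power of two, i.e. X is a mask of low bits
   such as 0, 1, 3, 7, 15, ...  */
static int
is_low_bit_mask (uint32_t x)
{
  return (x & (x + 1)) == 0;
}

/* Computes x - y as x ^ y; only valid under the stated precondition.  */
static uint32_t
sub_as_xor (uint32_t x, uint32_t y)
{
  assert (is_low_bit_mask (x) && y <= x);
  return x ^ y;
}

/* Exhaustively compares x - y with x ^ y for every mask x = 2^b - 1
   with b <= BITS (BITS must be below 32) and every y in [0, x].
   Returns 1 if they always agree, 0 otherwise.  */
static int
identity_holds (unsigned bits)
{
  for (unsigned b = 0; b <= bits; b++)
    {
      uint32_t x = (UINT32_C (1) << b) - 1;
      for (uint32_t y = 0;; y++)
        {
          if (x - y != sub_as_xor (x, y))
            return 0;
          if (y == x)
            break;
        }
    }
  return 1;
}
```

For example, with x = 7 and y = 5 both `7 - 5` and `7 ^ 5` give 2. The exhaustive check only covers small widths, so it is a sanity check rather than a proof.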

[Bug target/106701] Compiler does not take into account number range limitation to avoid subtract from immediate

2022-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106701 rdapp at gcc dot gnu.org changed: What|Removed |Added Target|s390|s390 x86_64-linux-gnu

[Bug target/106701] Compiler does not take into account number range limitation to avoid subtract from immediate

2022-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106701 --- Comment #3 from rdapp at gcc dot gnu.org --- I thought expand (or combine) were independent of value range. What would be the proper place for it then?

[Bug tree-optimization/100756] vect: Superfluous epilog created on s390x

2022-10-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756 rdapp at gcc dot gnu.org changed: What|Removed |Added CC||rdapp at gcc dot gnu.org ---

[Bug target/106919] [13 Regression] RTL check: expected code 'set' or 'clobber', have 'if_then_else' in s390_rtx_costs, at config/s390/s390.cc:3672 on s390x-linux-gnu since r13-2251-g1930c5d05ceff2

2022-09-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106919 --- Comment #8 from rdapp at gcc dot gnu.org --- Yes, one of dst and dest is superfluous. Looks good like that. I bootstrapped the same patch locally already, no regressions.

[Bug rtl-optimization/105988] [10/11/12/13 Regression] ICE in linemap_ordinary_map_lookup, at libcpp/line-map.cc:1064 since r6-4873-gebedc9a3414d8422

2022-08-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105988 rdapp at gcc dot gnu.org changed: What|Removed |Added Target|x86_64-pc-linux-gnu |x86_64-pc-linux-gnu s390 ---

[Bug middle-end/106527] New: ICE with modulo scheduling dump (-fdump-rtl-sms)

2022-08-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106527 Bug ID: 106527 Summary: ICE with modulo scheduling dump (-fdump-rtl-sms) Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug middle-end/107617] SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 rdapp at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4

[Bug middle-end/107617] New: SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 Bug ID: 107617 Summary: SCC-VN with len_store and big endian Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end

[Bug middle-end/107617] SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 --- Comment #1 from rdapp at gcc dot gnu.org --- For completeness, the mailing list thread is here: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602252.html

[Bug tree-optimization/100756] [12 Regression] vect: Superfluous epilog created on s390x

2023-02-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756 --- Comment #8 from rdapp at gcc dot gnu.org --- For completeness: haven't observed any fallout on s390 since and the regression is fixed.

[Bug target/110559] Bad mask_load/mask_store codegen of RVV

2023-07-07 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110559 --- Comment #1 from Robin Dapp --- This can be improved in parts by enabling register-pressure aware scheduling. The rest is due to the default issue rate of 1. Setting proper instruction latency will then obviously cause a bit more reordering

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #2 from Robin Dapp --- > It's interesting, for Clang only RISC-V can vectorize it. The full loop can be vectorized on clang x86 as well when I remove the first conditional (which is not in the snippet I posted above). So that's

[Bug tree-optimization/113583] New: Main loop in 519.lbm not vectorized.

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 Bug ID: 113583 Summary: Main loop in 519.lbm not vectorized. Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #5

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #7 from Robin Dapp --- Ok, I'm going to check.

[Bug target/113570] RISC-V: SPEC2017 549 fotonik3d miscompilation in autovec VLS 256 build

2024-01-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570 --- Comment #2 from Robin Dapp --- I'm pretty certain this is "works as intended" and -Ofast causes the precision to be different than with -O3 (and dependent on the target). See also: It has been reported that with gfortran -Ofast

[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)

2024-01-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #12 from Robin Dapp --- Created attachment 57209 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57209&action=edit Tentative I tested the attached "fix". On my machine with 13.2 host compiler it reduced the build time for

[Bug target/113827] MrBayes benchmark redundant load on riscv

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 --- Comment #1 from Robin Dapp --- x86 (-march=native -O3 on an i7 12th gen) looks pretty similar: .L3: movq (%rdi), %rax vmovups (%rax), %xmm1 vdivps %xmm0, %xmm1, %xmm1 vmovups %xmm1, (%rax) addq

[Bug target/113827] New: MrBayes benchmark redundant load

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 Bug ID: 113827 Summary: MrBayes benchmark redundant load Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-02-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #4 from Robin Dapp --- Judging by the graph it looks like it was slow before, then got faster and now slower again. Is there some more info on why it got faster in the first place? Did the patch reverse something or is it rather a

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 Robin Dapp changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Last

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 --- Comment #9 from Robin Dapp --- Argh, I actually just did a gcc -O3 -march=native pr114027.c -fno-vect-cost-model on cfarm188 with a recent-ish GCC but realized that I used my slightly modified version and not the original test case. long

[Bug target/113014] RISC-V: Redundant zeroing instructions in reduction due to r14-3998-g6223ea766daf7c

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014 --- Comment #4 from Robin Dapp --- Richard has posted it and asked for reviews. I have tested it and we have several testsuite regressions with it but no severe ones. Most or all of them are dump fails because we combine into vx variants that

[Bug target/113014] RISC-V: Redundant zeroing instructions in reduction due to r14-3998-g6223ea766daf7c

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014 --- Comment #2 from Robin Dapp --- Yes, that's right.

[Bug target/112773] [14 Regression] RISC-V ICE: in force_align_down_and_div, at poly-int.h:1828 on rv32gcv_zvl256b

2023-12-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773 --- Comment #16 from Robin Dapp --- I'd hope it was not fixed by this but just latent because we chose a VLS-mode vectorization instead. Hopefully we're better off with the fix than without :)

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #2 from Robin Dapp --- It doesn't look like the same issue to me. The other bug is related to TImode handling in combination with mask registers. I will also have a look at this one.

[Bug middle-end/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #8 from Robin Dapp --- Yes, can confirm that this helps.

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #5 from Robin Dapp --- Yes that's what I just tried. No infinite loop anymore then. But that's not a new simplification and looks reasonable so there must be something special for our backend.

[Bug target/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2023-12-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #3 from Robin Dapp --- In match.pd we do something like this: ;; Function e (e, funcdef_no=0, decl_uid=2751, cgraph_uid=1, symbol_order=4) Pass statistics of "forwprop": Matching expression match.pd:2771,

[Bug target/112999] riscv: Infinite loop with mask extraction

2023-12-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 --- Comment #1 from Robin Dapp --- What actually gets in the way of vec_extract here is changing to a "better" vector mode (which is RVVMF4QI here). If we tried to extract from the mask directly everything would work directly. I have a patch

[Bug target/112999] New: riscv: Infinite loop with mask extraction

2023-12-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 Bug ID: 112999 Summary: riscv: Infinite loop with mask extraction Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #13 from Robin Dapp --- I just built from the most recent commit and it still fails for me. Could there be a difference in qemu? I'm on qemu-riscv64 version 8.1.91 but yours is even newer so that might not explain it. You could

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #10 from Robin Dapp --- I just realized that I forgot to post the comparison recently. With the patch now upstream I don't see any differences for zvl128b and different vlens anymore. What I haven't fully tested yet is zvl256b or

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #9 from Robin Dapp --- In the good version the length is 32 here because directly before the vsetvl we have: li a4,32 That seems to get lost somehow.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #6 from Robin Dapp --- This seems to be gone when simple vsetvl (instead of lazy) is used or with -fno-schedule-insns which might indicate a vsetvl pass problem. We might have a few more of those. Maybe it would make sense to run

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #7 from Robin Dapp --- Here 0x105c6 vse8.v v8,(a5) is where we overwrite m. The vl is 128 but the preceding vsetvl gets a4 = 46912504507016 as AVL which seems already broken.

[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime

2023-12-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929 --- Comment #15 from Robin Dapp --- I think we need to make sure that we're not writing out of bounds. In that case anything might happen and if we just don't happen to overwrite this variable we might hit another one but the test can still

[Bug target/112999] riscv: Infinite loop with mask extraction

2023-12-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/113249] RISC-V: regression testsuite errors -mtune=generic-ooo

2024-01-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249 --- Comment #1 from Robin Dapp --- Yes, several (most?) of those are expected because the tests rely on the default latency model. One option is to hard code the tune in those tests. On the other hand the dump tests checking for a more or less

[Bug target/113281] [14] RISC-V rv64gcv_zvl256b vector: Runtime mismatch with rv64gc

2024-01-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281 --- Comment #2 from Robin Dapp --- Confirmed. Funny, we shouldn't vectorize that but really optimize to "return 0". Costing might be questionable but we also haven't optimized away the loop when comparing costs. Disregarding that, of course

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #9 from Robin Dapp --- I also noticed this (likely unwanted) vector snippet and wondered where it is being created. First I thought it's a vec_extract but doesn't look like it. I'm going to check why we create this. Pan, the test

[Bug middle-end/112971] [14] RISC-V rv64gcv_zvl256b vector -O3: internal compiler error: Segmentation fault signal terminated program cc1

2024-01-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971 --- Comment #22 from Robin Dapp --- Yes, going to the thread soon.

[Bug c/113474] RISC-V: Fail to use vmerge.vim for constant vector

2024-01-18 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474 --- Comment #1 from Robin Dapp --- Good catch. Looks like the ifn expander always forces into a register. That's probably necessary on all targets except riscv. diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #3 from Robin Dapp --- Yes, sure and I gave a bit of detail why the values chosen there (same as aarch64) make sense to me. Using this generic vector cost model by default without adjusting the latencies is possible. I would be OK

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #4 from Robin Dapp --- The other option is to assert that all tune models have at least a vector cost model rather than NULL... But not falling back to the builtin costs still makes sense.

[Bug target/113249] RISC-V: regression testsuite errors -mtune=generic-ooo

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249 --- Comment #4 from Robin Dapp --- > One of the reasons I've been testing things with generic-ooo is because > generic-ooo had initial vector pipelines defined. For cleaning up the > scheduler, I copied over the generic-ooo pipelines into

[Bug target/113247] RISC-V: Performance bug in SHA256 after enabling RVV vectorization

2024-01-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247 --- Comment #1 from Robin Dapp --- Hmm, so I tried reproducing this and without a vector cost model we indeed vectorize. My qemu dynamic instruction count results are not as abysmal as yours but still bad enough (20-30% increase in dynamic

[Bug target/112853] RISC-V: RVV: SPEC2017 525.x264 regression

2023-12-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853 --- Comment #5 from Robin Dapp --- Can confirm. The scalable build works with qemu vlen=128 but fails with vlen=256. That's a good data point as I'm not sure we're already covering this with the current runs? I'm going to start a testsuite
