[Bug tree-optimization/115073] New: RISC-V: Gimple fold not honor C[LT]Z_DEFINED_VALUE_AT_ZERO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115073

Bug ID: 115073
Summary: RISC-V: Gimple fold not honor C[LT]Z_DEFINED_VALUE_AT_ZERO
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kito at gcc dot gnu.org
Target Milestone: ---
Target: riscv64-unknown-linux-gnu

# What's up?

A loop induction variable is initialized from __builtin_ctz (x), the loop bound is 32 and the increment is one. GCC turns this into an infinite loop when x is 0.

However, RISC-V defines CTZ_DEFINED_VALUE_AT_ZERO as 32 for SImode, so IMO it's not UB, but it seems gimple-range-op.cc and match.pd do not handle that.

# Command to reproduce

```
$ riscv64-unknown-elf-gcc -O3 -march=rv64gc_zba_zbb_zbc
```

# Testcase

```c
void f();
void foo(unsigned int id, unsigned int x)
{
  for (unsigned int idx = __builtin_ctz(x); idx < 32; idx++) {
    f();
  }
}
```

# Asm output with comment:

```
foo:
        addi    sp,sp,-32
        sd      s0,16(sp)
        sd      s1,8(sp)
        sd      ra,24(sp)
        ctzw    s0,a1       # s0 is 32 if a1 is 0
        li      s1,32
.L2:
        addiw   s0,s0,1     # then s0 becomes 33 here
        call    f
        bne     s0,s1,.L2   # compare with 32, which never terminates
        ld      ra,24(sp)
        ld      s0,16(sp)
        ld      s1,8(sp)
        addi    sp,sp,32
        jr      ra
```

# What I tried?

I tried calling CTZ_DEFINED_VALUE_AT_ZERO in gimple-range-op.cc, but that does not seem to help for this test case. I then found that things get screwed up in match.pd during the ccp pass, which applies this CTZ simplification from match.pd:

```
(for op (eq ne)
 (simplify
  /* __builtin_ctz (x) == C -> (x & ((1 << (C + 1)) - 1)) == (1 << C). */
  (op (ctz:s @0) INTEGER_CST@1)
   (with { tree type0 = TREE_TYPE (@0);
           int prec = TYPE_PRECISION (type0);
         }
    (if (prec <= MAX_FIXED_MODE_SIZE)
     (if (tree_int_cst_sgn (@1) < 0 || wi::to_widest (@1) >= prec)
      { constant_boolean_node (op == EQ_EXPR ? false : true, type); }
      (op (bit_and @0 { wide_int_to_tree (type0,
                                          wi::mask (tree_to_uhwi (@1) + 1,
                                                    false, prec)); })
          { wide_int_to_tree (type0,
                              wi::shifted_mask (tree_to_uhwi (@1), 1,
                                                false, prec)); })))
```

I then found that this pattern used to check CTZ_DEFINED_VALUE_AT_ZERO (from g:75f8900159133ce069ef1d2edf3b67c7bc82e305) until g:7383cb56e1170789929201b0dadc156888928fdd, but I realize the check was dropped because CLZ_DEFINED_VALUE_AT_ZERO does not really work well here.

So I did an aggressive experiment: convert __builtin_ctz to IFN_CTZ with a second operand (ideally taken from C[LT]Z_DEFINED_VALUE_AT_ZERO). It works *IF* the backend provides patterns for ctz, but does NOT work when the backend does not provide them, which could be a problem for RISC-V since ctz is not included in the baseline ISA.

It might be arguable what should happen when the target has no ctz/clz pattern but the backend still provides C[LT]Z_DEFINED_VALUE_AT_ZERO, but I think middle-end optimizations should still honor it? Another thought is to convert that into a target macro to resolve the issue described in g:75f8900159133ce069ef1d2edf3b67c7bc82e305?

# Aggressive experiment:

```
diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 494da49791d..d84469a6dca 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -858,7 +858,16 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
                                    c, CALL_EXPR_ARG (*expr_p, 1));
            return GS_OK;
          }
-       break;
+       if (fndecl && fndecl_built_in_p(fndecl, BUILT_IN_CTZ) &&
+           call_expr_nargs(*expr_p) == 1) {
+tree a = save_expr(CALL_EXPR_ARG(*expr_p, 0));
+*expr_p = build_call_expr_internal_loc(
+EXPR_LOCATION(*expr_p), IFN_CTZ, TREE_TYPE(a), 2, a,
+build_int_cst(TREE_TYPE(a), 32));
+return GS_OK;
+       }
+
+break;
       }
 
     default:;
```
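
The mismatch described in this report can be modelled outside the compiler. The following is only a sketch: `ctz32_defined`, `folded_ctz_eq` and `trips_as_written` are hypothetical helper names, not GCC code, and the value 32 at zero is taken from the report's statement about CTZ_DEFINED_VALUE_AT_ZERO for SImode. It shows that the quoted fold agrees with a ctz that returns 32 at zero for every in-range constant, but answers "false" for the constant 32, which is exactly the value the loop in the testcase relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit ctz whose result at zero is defined as 32
   (the value the report gives for SImode on RISC-V).  Not GCC code.  */
static unsigned ctz32_defined(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of the quoted match.pd rewrite of `ctz (x) == c` for prec = 32:
   out-of-range constants fold to false, otherwise the low c + 1 bits of x
   are compared against 1 << c.  64-bit arithmetic avoids shifting by 32.  */
static bool folded_ctz_eq(uint32_t x, int c)
{
    if (c < 0 || c >= 32)
        return false;
    uint64_t mask = (UINT64_C(1) << (c + 1)) - 1;
    return ((uint64_t)x & mask) == (UINT64_C(1) << c);
}

/* Trip count of the loop in the testcase, evaluated as written.  */
static unsigned trips_as_written(uint32_t x)
{
    unsigned trips = 0;
    for (unsigned idx = ctz32_defined(x); idx < 32; idx++)
        trips++;
    return trips;
}
```

With these definitions `trips_as_written(0)` is 0, while `folded_ctz_eq(0, 32)` is false even though `ctz32_defined(0) == 32` holds. A transformation that assumes the result is never 32 can therefore turn the zero-trip loop into one that runs at least once.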
[Bug target/114988] RISC-V: ICE in intrinsic __riscv_vfwsub_wf_f32mf2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114988 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #3 from Kito Cheng --- > Can this fix backport to GCC-14 ? Sure, GCC 14.1 has been released, so it is open to accept fixes now :)
[Bug target/114747] [13 only] [RISC-V RVV] Wrong SEW set for mixed-size intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114747 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Kito Cheng --- Fixed on gcc 13 branch, and GCC 13.3 will have the fix :)
[Bug target/113095] [13 Regression] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Kito Cheng --- Fixed on both trunk and GCC 13
[Bug target/111234] [13] RISC-V: ICE in vsetvl pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111234 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Kito Cheng --- Backported to GCC 13
[Bug c/114885] RISC-V: ICE of unrecog insn when graphite for both the c/c++ and fortran
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114885 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |NEW CC||kito at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2024-04-29 --- Comment #1 from Kito Cheng --- I can reproduce on my side
[Bug target/114172] [13 only] RISC-V: ICE with riscv rvv VSETVL intrinsic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #4 from Kito Cheng --- Fixed; GCC 13.3 will contain the fix and should be released in the near future :)
[Bug target/111935] gcc ICE with risc-v vector intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111935 Kito Cheng changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED CC||kito at gcc dot gnu.org --- Comment #5 from Kito Cheng --- Checked that this has been fixed on trunk and the GCC 13 branch
[Bug target/111234] [13] RISC-V: ICE in vsetvl pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111234 --- Comment #4 from Kito Cheng --- Fixed on trunk, but still ICE on 13
[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2024-04-15 Status|UNCONFIRMED |NEW --- Comment #3 from Kito Cheng --- Reduced case, not the final result, but it already run 8+ hours... ``` typedef int a; typedef short b; typedef unsigned c; template < typename > using e = unsigned; template < typename > void ab(); #pragma riscv intrinsic "vector" template < typename f, int, int ac > struct g { using i = f; template < typename m > using j = g< m, 0, ac >; using k = g< i, 1, ac - 1 >; using ad = g< i, 1, ac + 1 >; }; namespace ae { struct af { using h = g< short, 6, 0 < 3 >; }; struct ag { using h = af::h; }; } template < typename, int > using ah = ae::ag::h; template < class ai > using aj = typename ai::i; template < class i, class ai > using j = typename ai::j< i >; template < class ai > using ak = j< e< ai >, ai >; template < class ai > using k = typename ai::k; template < class ai > using ad = typename ai::ad; template < a ap > vuint16m1_t ar(g< b, ap, 0 >, b); template < a ap > vuint16m2_t ar(g< b, ap, 1 >, b); template < a ap > vuint32m2_t ar(g< c, ap, 1 >, c); template < a ap > vuint32m4_t ar(g< c, ap, 2 >, c); template < class ai > using as = decltype(ar(ai(), aj< ai >())); template < class ai > as< ai > at(ai); namespace ae { template < int ap > vuint32m4_t au(g< c, ap, 1 + 1 >, vuint32m2_t l) { return __riscv_vlmul_ext_v_u32m2_u32m4(l); } } template < int ap > vuint32m2_t aw(g< c, ap, 1 >, vuint16m1_t l) { return __riscv_vzext_vf2_u32m2(l, 0); } namespace ae { vuint32m4_t ax(vuint32m4_t, vuint32m4_t, a); } template < class ay, class an > as< ay > az(ay ba, an bc) { an bb; return ae::ax(ae::au(ba, bc), ae::au(ba, bb), 2); } template < class bd > as< bd > be(bd, as< ad< bd > >); namespace ae { template < class bh, class bi > void bj(bh bk, bi bl) { ad< decltype(bk) > bn; az(bn, bl); } } template < int ap, int ac, class bp, class bq > void br(g< c, ap, 
ac > bk, bp, bq bl) { ae::bj(bk, bl); } template < class ai > using bs = decltype(at(ai())); struct bt; template < int ac = 1 > class bu { public: template < typename i > void operator()(i) { ah< i, ac > d; bt()(i(), d); } }; struct bt { template < typename bv, class bf > void operator()(bv, bf bw) { using bx = bv; ak< bf > by; k< bf > bz; using bq = bs< decltype(by) >; using bp = bs< decltype(bw) >; bp cb; ab< bx >(); for (;;) { bp cc; bq bl = aw(by, be(bz, cc)); br(by, cb, bl); } } }; void d() { bu()(b()); } ```
[Bug target/114130] [11 Regression] RISC-V: `__atomic_compare_exchange` does not use sign-extended value for RV64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114130 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #6 from Kito Cheng --- Fixed on trunk and also backported to the GCC 11 to 13 branches.
[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639

--- Comment #4 from Kito Cheng ---
Reduced case:

```c
typedef long c;
#pragma riscv intrinsic "vector"
template struct d {};
struct e { using f = d<0>; };
struct g { using f = e::f; };
template using h = g::f;
template long k(d);
vbool16_t j(vuint64m4_t a) {
  c b;
  return __riscv_vmsne_vx_u64m4_b16(a, b, k(h()));
}
```
[Bug target/114639] [riscv] ICE in create_pre_exit, at mode-switching.cc:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114639 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2024-04-08 --- Comment #1 from Kito Cheng --- Confirmed, and trying to reduce the testcase.
[Bug target/106530] RISCV documentation for -march= is very lacking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106530 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Kito Cheng --- g:19260a04ba6f75b1fae52afab50dcb43d44eb259 and g:5a22bb250d8f4ad239e12fea9828c18a0aa23e38 should address this issue :)
[Bug target/109349] riscv: Add --print-supported-extensions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109349 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #6 from Kito Cheng --- Implemented on trunk now :)
[Bug target/113742] ICE: RTL check: expected elt 1 type 'i' or 'n', have 'e' (rtx set) in riscv_macro_fusion_pair_p, at config/riscv/riscv.cc:8416 with -O2 -finstrument-functions -mtune=sifive-p600-se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113742 --- Comment #1 from Kito Cheng --- Thanks, forwarded and assigned this to our (SiFive) engineer :)
[Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #23 from Kito Cheng ---
> I am considering whether we should disable LICM for RISC-V by default if
> vector is enabled ?

That will cause regressions for other programs, and may also hurt programs that are not vectorized but benefit from LICM.
[Bug target/113240] Use wrong rule to pass fixed-length(size<=2*XLEN) vector argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113240

--- Comment #6 from Kito Cheng ---
> There needs to be a -Wabi warning for this too for the change between
> versions.

This bug only happened on trunk, and GCC 13 is OK, so I think it's not the case?
[Bug target/112929] [14] RISC-V vector: Variable clobbered at runtime
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929

Kito Cheng changed:

What|Removed |Added
CC||kito at gcc dot gnu.org

--- Comment #20 from Kito Cheng ---
```
.L15:
        li      a3,9
        lui     a4,%hi(s)
        sw      a3,%lo(j)(t2)
        sh      a5,%lo(s)(a4)            <-- a4 holds the address of s
        beq     t0,zero,.L42
        sw      t5,8(t4)
        vsetvli zero,a4,e8,m8,ta,ma      <<--- a4 as avl
```
[Bug target/112817] RISC-V: RVV: provide a preprocessor macro for VLS codegen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112817 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #8 from Kito Cheng --- This topic was raised at the last RISC-V GCC sync meeting, and one action item for me is to chat with JuzheZhong about the -mrvv-vector-bits=zvl / __riscv_v_fixed_vlen / riscv_rvv_vector_bits stuff
[Bug target/112478] riscv: asm clobbers not honored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478 Kito Cheng changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Kito Cheng --- Fixed on trunk :)
[Bug target/112109] Missing riscv vectorized strcmp (and other) expanders
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109

--- Comment #1 from Kito Cheng ---
Just a note: I would like to introduce `-mstringop-strategy=`, `-mmemcpy-strategy=` and `-mmemset-strategy=` options to control the behavior, like x86. The possible option list from my mind is:

- auto: current status, use scalar or vector
- libcall: always fall back to a library call
- scalar: only scalar
- vector: only vector

I guess we may need a few more options to control some details, but those could be added to --param later.
[Bug target/112537] Is there a way to disable cpymem pass for rvv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537 --- Comment #11 from Kito Cheng --- It's not in the scope of auto-vectorization, so I would suggest adding something like `-mstringop-strategy=*` or `-mmemcpy-strategy=*` (from x86) or `-param=riscv-mops-memcpy-size-threshold=` (from aarch64). Personally I prefer the x86 approach.
[Bug target/112537] Is there a way to disable cpymem pass for rvv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537 --- Comment #8 from Kito Cheng --- That reminds me that we may need an option like -mgeneral-regs-only in aarch64, and also a corresponding target attribute. BTW, clang has a generic option called -mno-implicit-float that can do a similar thing
[Bug target/112478] riscv: asm clobbers not honored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478 --- Comment #8 from Kito Cheng --- Proposed fix: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636466.html
[Bug target/112527] RVV integer vector instructions generated with rv64gc_zvfh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- Just some boring supplement for the damn arch string, since I guess not everyone knows that rule well: zvfh requires zve32f and zfhmin, which means rv64gc_zvfh is equivalent to rv64gc_zvfh_zfhmin_zve32f, so rv64gc_zvfh has vector, but only zve32f.
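
The expansion described in this comment amounts to taking a closure over "requires" rules. Below is a minimal sketch under stated assumptions: the rule table holds only the two implications named in the comment (a real toolchain knows many more), and `ext_set`, `ext_set_add` and `expand_implied` are hypothetical names, not GCC functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Only the two implications stated in the comment above.  */
struct implication { const char *ext; const char *implies; };

static const struct implication rules[] = {
    { "zvfh", "zve32f" },
    { "zvfh", "zfhmin" },
};

#define MAX_EXTS 16

struct ext_set { const char *names[MAX_EXTS]; size_t count; };

static bool ext_set_has(const struct ext_set *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

/* Returns true only if the name was newly added.  */
static bool ext_set_add(struct ext_set *s, const char *name)
{
    if (ext_set_has(s, name) || s->count >= MAX_EXTS)
        return false;
    s->names[s->count++] = name;
    return true;
}

/* Apply the rules repeatedly until no new extension is added.  */
static void expand_implied(struct ext_set *s)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++)
            if (ext_set_has(s, rules[r].ext) && ext_set_add(s, rules[r].implies))
                changed = true;
    }
}
```

Starting from a set that contains only `zvfh`, `expand_implied` leaves the set holding `zvfh`, `zve32f` and `zfhmin`, and nothing that would provide the full `v` extension.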
[Bug target/112478] riscv: asm clobbers not honored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112478

Kito Cheng changed:

What|Removed |Added
Ever confirmed|0 |1
CC||kito at gcc dot gnu.org
Last reconfirmed||2023-11-14
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |kito at gcc dot gnu.org

--- Comment #6 from Kito Cheng ---
Oh, I guess I know what happened. I was confused by the commit you referred to (but it is the root cause, as you pointed out!) since I thought it might be related to far jumps, but... actually not. The problem is what you describe in the title, and it can be demonstrated by the following small program:

```c
void foo() {
  asm volatile("# " : ::"ra");
}
```

Before that commit:

```asm
foo:
        addi    sp,sp,-16
        sd      ra,8(sp)
#APP
# 2 "x.c" 1
        #
# 0 "" 2
#NO_APP
        ld      ra,8(sp)
        addi    sp,sp,16
        jr      ra
```

After that commit:

```asm
foo:
.LFB0:
        .cfi_startproc
#APP
# 2 "x.c" 1
        #
# 0 "" 2
#NO_APP
        ret
```

But why? Because ra accidentally became a caller-saved register with the following change:

https://github.com/gcc-mirror/gcc/commit/71f906498ada9ec2780660b03bd6e27a93ad350c#diff-4083cffa971a940af1d435359a45dbfd4d5934384275b0ae5e0c71dece5fd866R331

So we no longer save it in the prologue and epilogue... anyway, I will take this and send a patch to fix that soon.
[Bug target/112433] RISC-V GCC-15 feature: Split register allocation into RVV and non-RVV, and make vsetvl PASS run between them
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112433 --- Comment #4 from Kito Cheng --- Yeah, the 3 major goals in LLVM are improving scheduling, partial spilling and re-materialization, but none of those points is an issue for RISC-V GCC :P Ref: https://docs.google.com/presentation/d/1BOYNYKe1T-u3Q5HXRrcObLUkdKSPASmnuQTkALvJXto/edit
[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 --- Comment #12 from Kito Cheng --- Oh, yeah, you are right: it already takes a5 for the splat, so it's right, and as you said it must be VLMAX, unless it does AVL propagation for both the splat and the following vadd.vv
[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #10 from Kito Cheng ---
(In reply to JuzheZhong from comment #9)
> I have a draft patch to fix it:
>
> foo:
>         ble     a0,zero,.L5
>         vsetvli a5,zero,e32,m1,ta,ma
>         vid.v   v2
> .L3:
>         vsetvli a5,a0,e32,m1,ta,ma
>         slli    a4,a5,2
>         vle32.v v3,0(a1)
>         sub     a0,a0,a5
>         vadd.vv v1,v2,v3
>         vse32.v v1,0(a2)
>         add     a1,a1,a4
>         add     a2,a2,a4
>         vsetvli a4,zero,e32,m1,ta,ma
>         vmv.v.x v1,a5

This splat must be under "vsetvli a5,a0,e32,m1,ta,ma" rather than "vsetvli a4,zero,e32,m1,ta,ma".

>         vadd.vv v2,v2,v1
>         bne     a0,zero,.L3
> .L5:
>         ret
>
> Seems correct ?
[Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #8 from Kito Cheng ---
> Oh. I understand it now. I think it's a bug.
>
> And.. I just take a look at my internal LLVM...
> Also has same issue
>
> I think we need to adapt the Gimple IR here:
>
> _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
> _21 = vect_vec_iv_.6_22 + { POLY_INT_CST [4, 4], ... };
>
> change it into:
>
> _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
> _21 = vect_vec_iv_.6_22 + _35;

Yeah, so... I guess the original report is still valid; it just brings up another potential bug :P

Personally I really hate that magic constraint for vl, but it's just too late.
[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #6 from Kito Cheng ---
The key is that the instruction splatting VLMAX needs to move into the loop body, but AVL propagation should still be possible:

```
foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vid.v   v2
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a2)
        add     a1,a1,a4
        vmv.v.x v4,a5                    # Moved to here; splat vl (a5) rather than VLMAX
        vsetvli a5,zero,e32,m1,ta,ma     # ---> redundant
        add     a2,a2,a4
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3
.L5:
        ret
```
[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #5 from Kito Cheng ---
Assume VLEN = 128, n = 5 and *in is {0, 0, 0, 0, 0}, so VLMAX = 4 for e32m1.

It can run with vl = 4 for the first iteration and vl = 1 for the second iteration.

But it could also be something like vl = 3 for the first iteration and vl = 2 for the second iteration. OK, let's run the code with that:

foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.x v4,a5                    # v4 = {4, 4, 4, 4}
        vid.v   v2                       # v2 = {0, 1, 2, 3}
.L3:
        vsetvli a5,a0,e32,m1,ta,ma       # first iteration got vl = 3
        slli    a4,a5,2
        vle32.v v1,0(a1)                 # v1 = {0, 0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2                 # v1 = {0, 0, 0} + {0, 1, 2}
        vse32.v v1,0(a2)                 # out = {0, 1, 2, 0, 0}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4                 # v2 = {0, 1, 2, 3} + {4, 4, 4, 4}
                                         #    = {4, 5, 6, 7}
        bne     a0,zero,.L3
.L5:
        ret

OK, let's run the second iteration:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma       # second iteration got vl = 2
        slli    a4,a5,2
        vle32.v v1,0(a1)                 # v1 = {0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2                 # v1 = {0, 0} + {4, 5}
        vse32.v v1,0(a2)                 # out = {0, 1, 2, 4, 5}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4                 # v2 = {4, 5, 6, 7} + {4, 4, 4, 4}
                                         #    = {8, 9, 10, 11}
        bne     a0,zero,.L3

And then you will get {0, 1, 2, 4, 5} rather than {0, 1, 2, 3, 4}.
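
The walk-through above can be replayed with a small scalar model. This is a sketch, not the vectorizer's code: `run_loop` is a hypothetical helper, the list of vl values stands in for whatever the hardware's vsetvli returns, and `step_by_vl` selects between advancing the induction vector by VLMAX (as in the quoted assembly) and advancing it by the vl actually used.

```c
#include <stddef.h>

#define VLMAX 4   /* VLEN = 128, e32, m1, as assumed in the comment above */

/* Scalar model of the vectorized loop computing out[i] = in[i] + i.
   vls[] lists the vl value each vsetvli returns (a hypothetical input;
   each entry must be <= VLMAX and the entries must sum to at most n).
   If step_by_vl is zero the induction vector advances by VLMAX, as in the
   quoted assembly; otherwise it advances by the vl of that iteration.  */
static void run_loop(const int *in, int *out, size_t n,
                     const size_t *vls, size_t nvls, int step_by_vl)
{
    int iv[VLMAX];
    size_t done = 0;

    for (size_t i = 0; i < VLMAX; i++)
        iv[i] = (int)i;                          /* vid.v */

    for (size_t k = 0; k < nvls && done < n; k++) {
        size_t vl = vls[k];
        for (size_t i = 0; i < vl; i++)          /* vle32.v, vadd.vv, vse32.v */
            out[done + i] = in[done + i] + iv[i];
        done += vl;

        int step = step_by_vl ? (int)vl : VLMAX;
        for (size_t i = 0; i < VLMAX; i++)       /* vadd.vv v2,v2,v4 */
            iv[i] += step;
    }
}
```

With `in` all zeros, n = 5 and vl values {3, 2}, stepping by VLMAX stores {0, 1, 2, 4, 5}, matching the trace above, while stepping by vl stores {0, 1, 2, 3, 4}. With vl values {4, 1} both variants store {0, 1, 2, 3, 4}, which is why the problem only shows up on implementations that pick a vl below VLMAX before the last iteration.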
[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 --- Comment #2 from Kito Cheng --- Oh, but the root cause might be a little bit deeper, not just a question of whether or not the AVL is propagated.
[Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- Actually I suspect this is a bug rather than a missed optimization, one that will only trigger on some CPU implementations, because the ISA spec doesn't guarantee that the penultimate iteration will always get VLMAX for vl... https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl
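
As a rough illustration of the point about the spec, the three main vl-setting rules from the linked section can be written as a predicate. This is a sketch from a reading of that section, not an authoritative restatement: `vl_is_permitted` is a hypothetical name, and the section's additional requirements (for example that the result be deterministic for a given input) are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if `vl` is a value vsetvli may return for the given AVL and VLMAX
   under the three main rules:
     1. vl = AVL                      if AVL <= VLMAX
     2. ceil(AVL / 2) <= vl <= VLMAX  if AVL < 2 * VLMAX
     3. vl = VLMAX                    if AVL >= 2 * VLMAX  */
static bool vl_is_permitted(size_t avl, size_t vlmax, size_t vl)
{
    if (avl <= vlmax)
        return vl == avl;
    if (avl < 2 * vlmax)
        return vl >= (avl + 1) / 2 && vl <= vlmax;
    return vl == vlmax;
}
```

For AVL = 5 and VLMAX = 4 both vl = 4 and vl = 3 are permitted, so an implementation may legitimately return less than VLMAX in the penultimate iteration.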
[Bug c/112431] RISC-V GCC-15 feature: Support register overlap on widen RVV instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431

--- Comment #3 from Kito Cheng ---
Sharing some thoughts from my end: we've tried at least 3 different approaches on the LLVM side before, and now we model that as "partial early clobber". We plan to upstream this on the LLVM side, but it just hasn't got high enough priority yet :(

What does that mean? Here are some practical examples to demo the idea:

1. A normal live range without early clobber.

vadd x, y z # y and z are dead after this use.

|-|
| read | yz |
| write | x |
|-|

2. A live range with early clobber.

vadd x, y z # y and z are dead after this use, and assume x is early clobber.

|-|
| read | x yz |
| write | x |
|-|

3. A live range with partial early clobber.

vwadd.vv x, y, z # x is two times larger than y and z

So we split x into xh and xl to represent the high part and the low part, and assume the high part can overlap with others.

||
| read |xl yz |
| write | xh xl |
||

And the following case assumes the high part can overlap with others:

||
| read | xh yz |
| write | xh xl |
||

Then the register allocator should be able to do the overlapping allocation naturally IF we build the live ranges this way.
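
For orientation, the hardware constraint that makes a "partial" early clobber sufficient can be sketched as a check on register numbers. This follows a reading of the RVV spec's operand-overlap rule for a destination wider than its source (overlap only in the highest-numbered part of the destination group, source EMUL at least 1); the representation of a group as a first register plus a register count, and the names `groups_overlap` and `widen_overlap_ok`, are assumptions made for illustration.

```c
#include <stdbool.h>

/* A register group is described by its first register number and the
   number of registers it spans (hypothetical representation).  */
static bool groups_overlap(int a, int an, int b, int bn)
{
    return a < b + bn && b < a + an;
}

/* Sketch of the rule for a destination wider than its source: the source
   group may overlap the destination only if it sits in the highest-numbered
   registers of the destination group.  Assumes source EMUL >= 1, i.e. the
   source occupies whole registers.  */
static bool widen_overlap_ok(int dst, int dst_n, int src, int src_n)
{
    if (!groups_overlap(dst, dst_n, src, src_n))
        return true;
    return src_n < dst_n && src == dst + dst_n - src_n;
}
```

With a destination group v2-v3, a one-register source in v3 is accepted and one in v2 is rejected, which corresponds to treating only the low part of the destination as early clobber.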
[Bug c/112433] RISC-V GCC-15 feature: Split register allocation into RVV and non-RVV, and make vsetvl PASS run between them
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112433 --- Comment #1 from Kito Cheng --- Some more background on why LLVM must do it that way: LLVM can't allocate new pseudo registers during the register allocation process, however spilling a vector register with a specific length may require a scratch register for setting the VL. And the benefit of more exact live ranges for GPRs is kind of a by-product which we weren't aware of during the discussion stage :P
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng changed:

What|Removed |Added
CC||kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng ---
The testcase itself looks tricky but is right; it could typically be used to optimize mixed-width (mixed-SEW) operations. You can refer to the EEW stuff in the v-spec[1]: most loads and stores encode a static EEW, and such a vsetvli fusion optimization can then be applied.

[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

Here is a (more) practical example:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out, size_t n, int cond, int avl) {
  size_t vl = __riscv_vsetvl_e16mf2(avl);
  vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
  vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
  vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
  vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
  __riscv_vse32_v_i32m1(out, x, vl);
}
```

> Is is guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
> vsetvli zero,a6,e32,m1,tu,ma
>
>and:
>
> vsetvli zero,a6,e16,mf2,ta,ma

This is another trick in this case: tail agnostic vs tail undisturbed. Tail undisturbed has stronger semantics than tail agnostic, so using tail undisturbed where agnostic was requested is always safe and satisfies the semantics; the same goes for mask agnostic vs mask undisturbed.

But performance is another story. As far as I know, some uArchs implement agnostic as undisturbed, which means there is not much difference between agnostic and undisturbed, so fusing those two vsetvli becomes a kind of optimization.

However, as you could imagine, that also means some uArchs implement agnostic in another way: agnostic MAY have better performance than undisturbed, and we should not fuse those vsetvli IF we are targeting such a target. Anyway, our cost model for RVV is still in an initial state, so personally I am fine with that for now, but I guess we need to add some more stuff to -mtune to handle those differences.
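
The "undisturbed is at least as strong as agnostic" argument can be expressed as a small compatibility check. This is only a sketch of that one rule: the type and function names are hypothetical, and it deliberately leaves out everything else a real fusion decision would look at (SEW, LMUL, AVL and tuning).

```c
#include <stdbool.h>

enum policy { AGNOSTIC, UNDISTURBED };

struct vtype_policy {
    enum policy tail;
    enum policy mask;
};

/* A demand for "agnostic" is met by either setting; a demand for
   "undisturbed" is met only by "undisturbed".  */
static bool policy_satisfies(enum policy have, enum policy want)
{
    return want == AGNOSTIC || have == UNDISTURBED;
}

/* True if a vsetvli configured with `have` can also serve an instruction
   that asked for `want`, as far as tail and mask policy are concerned.  */
static bool vtype_policy_satisfies(struct vtype_policy have,
                                   struct vtype_policy want)
{
    return policy_satisfies(have.tail, want.tail)
        && policy_satisfies(have.mask, want.mask);
}
```

In the quoted question, a `tu,ma` setting satisfies a `ta,ma` demand, but not the other way round. Whether using `tu` in place of `ta` is also profitable is the tuning question raised in the comment.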
[Bug target/111926] RISC-V: Use vsetvl insn replace csrr vlenb insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111926 --- Comment #2 from Kito Cheng --- Forgot to mention: personally I love the idea of simplifying code gen, and I could imagine that it's definitely an optimization for specific uarchs :)
[Bug target/111926] RISC-V: Use vsetvl insn replace csrr vlenb insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111926 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- Please leave an option so the user has a choice; for performance it's hard to say which is absolutely better for all uarchs. My thought is to let -mtune and a command line option control that.
[Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

Bug ID: 111791
Summary: RISC-V: Strange loop vectorizaion on popcount function
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kito at gcc dot gnu.org
Target Milestone: ---

Symptom:

This is a typical popcount implementation using Brian Kernighan’s algorithm. The vectorizer has recognized it as popcount, but... comes up with a strange vectorization result. I know that might be because I added -fno-vect-cost-model, but I still don't understand why it was vectorized, so I guess it's worth reporting.

NOTE: This bad/strange code gen goes away once a scalar popcount instruction is available.

Case:

```
int popcount(unsigned long value) {
  int nbits;
  for (nbits = 0; value != 0; value &= value - 1)
    nbits++;
  return nbits;
}
```

Command to reproduce:

```
$ riscv64-unknown-linux-gnu-gcc x.c -march=rv64gcv -o - -S -fno-vect-cost-model -O3
```

Sha1: g:faae30c49560f1481f036061fa2f894b0f7257f8 (some random point of top of trunk)

Current output:

```
        .globl  popcount
        .type   popcount, @function
popcount:
.LFB0:
        .cfi_startproc
        beq     a0,zero,.L4
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        sd      ra,8(sp)
        .cfi_offset 1, -8
        call    __popcountdi2
        csrr    a2,vlenb
        sext.w  a0,a0
        srli    a2,a2,2
        vsetvli a3,zero,e32,m1,ta,ma
        vid.v   v1
.L3:
        vsetvli a5,a0,e8,mf4,ta,ma
        sub     a0,a0,a5
        vsetvli a3,zero,e32,m1,ta,ma
        vmv1r.v v3,v1
        vmv.v.x v2,a2
        vadd.vv v1,v1,v2
        bne     a0,zero,.L3
        ld      ra,8(sp)
        .cfi_restore 1
        addi    a5,a5,-1
        vadd.vi v3,v3,1
        vslidedown.vx   v3,v3,a5
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        vmv.x.s a0,v3
        jr      ra
.L4:
        li      a0,0
        ret
        .cfi_endproc
.LFE0:
        .size   popcount, .-popcount
```
[Bug target/111600] [14 Regression] RISC-V bootstrap time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

--- Comment #14 from Kito Cheng ---
Some info for generated files:

File                    blank  comment     code
insn-output.cc           3532    35029  1631721
insn-emit.cc            37288    40240  1161790
insn-recog.cc           44203       23   428130
insn-attrtab.cc          1014       30   169934
gimple-match-2.cc          77        2    49303
gimple-match-9.cc          29        2    33073
insn-extract.cc           241        8    28934
gimple-match-1.cc         105        2    25578
gimple-match-8.cc         114        2    24348
options.cc                325        1    24175
insn-opinit.cc             12        6    20156
generic-match-9.cc         55        2    19080
gimple-match-3.cc          98        2    17433
gimple-match-7.cc         108        2    17105
gimple-match-10.cc        129        2    16888
gimple-match-6.cc         115        2    16836
gimple-match-4.cc          97        2    16830
gimple-match-5.cc          99        2    16377
generic-match-3.cc         57        2    16138
options-save.cc          1037       19    15121
generic-match-4.cc         70        2    14095
gtype-desc.cc             679       30    12597
insn-automata.cc           73       11    11735
generic-match-2.cc         60        2    11543
generic-match-1.cc         56        2    11504
generic-match-7.cc         66        2    10238
generic-match-5.cc         71        2    10231
generic-match-10.cc        66        2     9860
generic-match-6.cc         61        2     9853
generic-match-8.cc         53        2     9651
insn-modes.cc             750      410     7655
min-insn-modes.cc           9        2     2280
gengtype-lex.cc           398      424     2126
insn-preds.cc             146       32     1515
insn-dfatab.cc             31        3     1230
insn-latencytab.cc         26        3     1142
gcc-ranlib.cc              55       49      196
insn-enums.cc               6        2      173
insn-peep.cc                7        2       34
cc1-checksum.cc             0        0        3
cc1plus-checksum.cc         0        0        3
[Bug bootstrap/111664] [14 regression] Fails to build with mawk (error in gcc/opt-read.awk) after r14-4354-ge4a4b8e983bac8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111664 Kito Cheng changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2023-10-03 --- Comment #3 from Kito Cheng --- Proposed fix, and verified with mawk on my machine :) https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html
[Bug target/111600] [14 Regression] RISC-V bootstrap time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #13 from Kito Cheng --- I guess we may need something like g:703417a0 for those generators for the md files?
[Bug target/111412] RISC-V:ICE in phase 6 of vsetvl pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111412 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||kito at gcc dot gnu.org --- Comment #2 from Kito Cheng --- fixed
[Bug target/111372] libgcc: RISCV C++ exception handling stack usage grew in 13.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111372

--- Comment #5 from Kito Cheng ---
> Ok, but it's better to have configure option or something else just
> for toolchains that definitely do not use vector extension

I can understand that there would be such a demand in the embedded world, but that's not a critical issue, so this won't get high priority from most RISC-V GCC developers; it would be appreciated if you could send a patch for that.
[Bug target/110277] RISC-V: ICE when build RVV intrinsic float reduction with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110277 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||kito at gcc dot gnu.org Resolution|--- |FIXED --- Comment #3 from Kito Cheng --- Fixed on trunk
[Bug target/110299] RISC-V: ICE when build RVV intrinsic widen with "-march=rv32gc_zve64d -mabi=ilp32d", both GCC 14 and 13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110299 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||kito at gcc dot gnu.org Resolution|--- |FIXED --- Comment #2 from Kito Cheng --- Fixed on trunk
[Bug target/111037] RISC-V: Invalid vsetvli fusion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111037 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Kito Cheng --- Fixed
[Bug target/111074] RISC-V: segmentation fault during RTL pass: vsetvl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111074 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED CC||kito at gcc dot gnu.org --- Comment #2 from Kito Cheng --- Fixed
[Bug target/110560] internal compiler error: in extract_constrain_insn_cached, at recog.cc:2704
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110560 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from Kito Cheng --- Should be fixed now
[Bug target/109773] RISC-V: ICE when build RVV Intrinsic in Both GCC 13 && GCC 14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109773 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Kito Cheng --- This has been fixed upstream for a while.
[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #6 from Kito Cheng --- Ok for back port :)
[Bug target/111065] [RISCV] t-linux-multilib specifies incorrect multilib reuse patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111065 --- Comment #4 from Kito Cheng --- I guess I skipped too much detail here. The multilib for Linux hasn't really honored the reuse rules in the multilib config file for a while. That file just controls how the multilibs are built, e.g. which arch ilp32 is built with, and we then pick the multilib with the matching ABI. Why did we do that? To simplify the reuse rules: RISC-V has a huge number of extensions now, so enumerating all possible combinations is almost impossible. Why can't it use the same scheme as bare-metal? Because on Linux we encode only the ABI in the path, whereas on bare-metal we encode both the ABI and the arch. That is kind of a de facto ABI in Linux/glibc, and it also doesn't make much sense to have many different multilibs within a (RISC-V) Linux system.
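The path-encoding difference described above can be illustrated with a small sketch. This is not GCC code: `multilib_dir` and `same_multilib_dir` are hypothetical helpers, and the directory strings are simplified (a real toolchain adds prefixes such as the library directory). The only point is that a scheme keyed on the ABI alone cannot tell two arch strings apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of how a multilib directory name is derived.
   Linux: only the ABI is encoded.  Bare-metal: arch and ABI are encoded. */
void multilib_dir(char *buf, size_t len, const char *arch, const char *abi,
                  bool is_linux)
{
    assert(buf != NULL && arch != NULL && abi != NULL && len > 0);
    if (is_linux)
        snprintf(buf, len, "%s", abi);            /* e.g. "lp64d" */
    else
        snprintf(buf, len, "%s/%s", arch, abi);   /* e.g. "rv64gc/lp64d" */
}

/* Returns true when two arch strings with the same ABI land in the same
   multilib directory under the given scheme. */
bool same_multilib_dir(const char *arch_a, const char *arch_b,
                       const char *abi, bool is_linux)
{
    char a[128], b[128];
    multilib_dir(a, sizeof a, arch_a, abi, is_linux);
    multilib_dir(b, sizeof b, arch_b, abi, is_linux);
    return strcmp(a, b) == 0;
}
```

Under this model, `rv64gc` and `rv64gcv` with `lp64d` share one directory on Linux but get separate directories on bare-metal, which is why the Linux scheme is hard to extend per arch.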
[Bug target/111065] [RISCV] t-linux-multilib specifies incorrect multilib reuse patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111065 Kito Cheng changed: What|Removed |Added Version|og13 (devel/omp/gcc-13) |14.0 CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- One major issue with multilib for Linux is that we only encode the ABI in the path, so it is hard to extend it the way the bare-metal toolchain does.
[Bug target/111037] New: RISC-V: Invalid vsetvli fusion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111037

Bug ID: 111037
Summary: RISC-V: Invalid vsetvli fusion
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kito at gcc dot gnu.org
CC: juzhe.zhong at rivai dot ai
Target Milestone: ---
Target: riscv64

Reduced case:
```
#include <riscv_vector.h>

void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx = __riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy = __riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}
```

Command to reproduce:
$ riscv64-unknown-elf-gcc -O3 -march=rv64gczve64f_zvfh

foo:
        vsetivli        zero,1,e64,m1,ta,ma
        vle64.v v1,0(a0)
        vfmv.s.f        v2,fa0  # Will raise illegal instruction here, because we don't have F64 for vector
        vadd.vv v1,v1,v1
        ret
[Bug target/110812] Missing TARGET_OPTION_SAVE/RESTORE on riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110812 Kito Cheng changed: What|Removed |Added Status|NEW |ASSIGNED CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- Oops, I thought those target hooks only needed to be implemented once we have implemented the target attribute. Anyway, thanks for the hint!
[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751 --- Comment #4 from Kito Cheng ---
> OK, so TA is either merge or all-ones.

Yes, your understanding is correct. One more detail: merge and all-ones can be mixed within the same vector, element by element. E.g. a 4 x i32 vector with mask 1 0 1 0:

Op   = | a | b | c | d |
Mask = | 1 | 0 | 1 | 0 |

The result could be any of:

| a | b     | c | d     |
| a | all-1 | c | d     |
| a | all-1 | c | all-1 |
| a | b     | c | all-1 |

> Not sure how you can use MA at the moment since you specify an existing
> operand in your target hook. As far as I can see there's no value the
> target hook can provide that matches any of the implementation semantics?

That's the key point: we don't know how to return an undefined value there. We have an intrinsic that can generate an undefined value, but it seems impossible to generate one within the hook.
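To make the "mixed" behavior concrete, here is a minimal sketch that models the agnostic policy for a 4-element vector. It is not GCC or intrinsic code: `ma_result`, `ma_result_count` and the `choice` bitmap are hypothetical names, and the sketch separates the operation result (`op`) from the previous destination value (`old`), which the example above writes with the same letters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL 4
#define ALL_ONES UINT32_MAX

/* Hypothetical model: build one result the agnostic policy permits.
   Active elements (mask[i] != 0) take the operation result.  For inactive
   elements, bit i of `choice` selects all-ones (1) or the old value (0). */
void ma_result(const uint32_t op[VL], const uint32_t old[VL],
               const uint8_t mask[VL], unsigned choice, uint32_t out[VL])
{
    assert(choice < (1u << VL));
    for (size_t i = 0; i < VL; i++) {
        if (mask[i])
            out[i] = op[i];
        else
            out[i] = ((choice >> i) & 1u) ? ALL_ONES : old[i];
    }
}

/* Number of distinct results permitted: 2^(number of inactive elements),
   assuming no old value already equals all-ones. */
unsigned ma_result_count(const uint8_t mask[VL])
{
    unsigned inactive = 0;
    for (size_t i = 0; i < VL; i++)
        if (!mask[i])
            inactive++;
    return 1u << inactive;
}
```

With mask 1 0 1 0 there are two inactive elements, so the model yields four legal results, one per row of the table above. A hook that must return one concrete operand cannot express "any of these".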
[Bug target/110748] RISC-V: optimize store of DF 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748 --- Comment #2 from Kito Cheng --- It seems we have already had such a constraint for a while, so I'm not sure why GCC 13 does that. I saw the status has changed to ASSIGNED, so I assume, Vineet, that you are already spending time on this, and I will just stop here :)
[Bug target/110748] RISC-V: optimize store of DF 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng ---
Hmm, weird: GCC 12 did well, but something went wrong in GCC 13?

https://godbolt.org/z/ToM1qTxrq

void zd(double *d) { *d = 0.0; }
void zf(float *f) { *f = 0.0; }

GCC 12:
zd:
        sd      zero,0(a0)
        ret
zf:
        sw      zero,0(a0)
        ret

GCC 13:
zd:
        fmv.d.x fa5,zero
        fsd     fa5,0(a0)
        ret
zf:
        fmv.s.x fa5,zero
        fsw     fa5,0(a0)
        ret
[Bug target/110696] RISC-V: -march doesn't imply correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110696 Kito Cheng changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-07-17 Status|UNCONFIRMED |NEW --- Comment #2 from Kito Cheng --- Fixed on upstream, but will wait one more week for backporting to GCC 13 branch
[Bug target/110478] RISC-V multilib gcc zicsr in the -march causing incorrect libgcc to be used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110478 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #3 from Kito Cheng --- I've fixed a few multilib issues on Linux, but unfortunately the fixes landed after the GCC 13.1 release. Could you try trunk or the releases/gcc-13 branch to see if the issue is resolved? https://github.com/gcc-mirror/gcc/commit/6f0eb99c9bda726f953bdbe06dd3489a26af2823 https://github.com/gcc-mirror/gcc/commit/49d596e90deedbe9c7a1aa5824fb484fe3ad3193 https://github.com/gcc-mirror/gcc/commit/554aabc26786891ffb4d542c359eca0cef407ed1
[Bug target/110448] [RISC-V] RVV intrinsic api test error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110448 Kito Cheng changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #2 from Kito Cheng --- That might be annoying, but we (SiFive) promise that we won't make any incompatible changes after the RVV intrinsic 1.0 release. So I'm going to close this bug as resolved/invalid.
[Bug target/110448] [RISC-V] RVV intrinsic api test error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110448 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- That's an incompatible change on the RVV intrinsic spec side, see https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
[Bug target/110264] internal compiler error: riscv_vector::vector_insn_info::get_avl_reg_rtx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264 Kito Cheng changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Kito Cheng --- Fixed on trunk and backported to GCC 13
[Bug target/110188] gcc for RISC-V stack aligned error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110188 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #5 from Kito Cheng ---
Each stack area is aligned to 16 bytes. That could be optimized in theory, but it would complicate the frame layout implementation.

sp - 0   a9 / outgoing stack arguments area
sp - 4      / outgoing stack arguments area
sp - 8      / outgoing stack arguments area
sp - 12     / outgoing stack arguments area
sp - 16     / GPR save area
sp - 20     / GPR save area
sp - 24     / GPR save area
sp - 28  ra / GPR save area
sp - 32

The complete layout is documented in riscv.cc:

        +-------------------------------+
        |                               |
        |  incoming stack arguments     |
        |                               |
        +-------------------------------+ <-- incoming stack pointer
        |                               |
        |  callee-allocated save area   |
        |  for arguments that are       |
        |  split between registers and  |
        |  the stack                    |
        |                               |
        +-------------------------------+ <-- arg_pointer_rtx
        |                               |
        |  callee-allocated save area   |
        |  for register varargs         |
        |                               |
        +-------------------------------+ <-- hard_frame_pointer_rtx;
        |                               |     stack_pointer_rtx + gp_sp_offset
        |  GPR save area                |       + UNITS_PER_WORD
        |                               |
        +-------------------------------+ <-- stack_pointer_rtx + fp_sp_offset
        |                               |       + UNITS_PER_HWVALUE
        |  FPR save area                |
        |                               |
        +-------------------------------+ <-- frame_pointer_rtx (virtual)
        |                               |
        |  local variables              |
        |                               |
      P +-------------------------------+
        |                               |
        |  outgoing stack arguments     |
        |                               |
        +-------------------------------+ <-- stack_pointer_rtx
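The size difference between aligning each area and aligning only the total can be sketched as follows. This is a simplified model, not the riscv.cc implementation: `frame_size_per_area` and `frame_size_packed` are hypothetical names, only two areas are modeled, and 4-byte slots are assumed as in the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 16 /* assumption: 16-byte alignment, as described above */

/* Round `size` up to the next multiple of STACK_ALIGN. */
size_t align_up(size_t size)
{
    assert(size <= SIZE_MAX - (STACK_ALIGN - 1)); /* guard against overflow */
    return (size + STACK_ALIGN - 1) & ~(size_t)(STACK_ALIGN - 1);
}

/* Current behavior as described: every area is padded on its own. */
size_t frame_size_per_area(size_t gpr_save, size_t outgoing_args)
{
    return align_up(gpr_save) + align_up(outgoing_args);
}

/* The theoretical optimization: pad only the sum of the areas. */
size_t frame_size_packed(size_t gpr_save, size_t outgoing_args)
{
    return align_up(gpr_save + outgoing_args);
}
```

With one saved register (ra, 4 bytes) and one outgoing argument (a9, 4 bytes), the per-area model gives 32 bytes, matching the listing, while the packed model would give 16.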
[Bug target/109972] RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109972 --- Comment #3 from Kito Cheng --- We care, but it's lower priority compared to other configurations, so creating a bug to track it here is probably the best solution for now :P
[Bug target/109974] RISCV: RVV VSETVL Pass ICE in SLP auto-vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Kito Cheng --- Fixed on trunk
[Bug target/109547] [13] RISC-V: Multiple vsetvli for load/store loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Kito Cheng --- Fixed on both trunk and gcc 13
[Bug target/109743] RISC-V: Unnecessary VSETVLI of the RVV intrinsic in loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Kito Cheng --- Fixed on trunk
[Bug target/109748] RISC-V: Mis code gen for the RVV intrinsic VSETVL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Kito Cheng --- Should be resolved on trunk.
[Bug target/109748] RISC-V: Mis code gen for the RVV intrinsic VSETVL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109748 --- Comment #1 from Kito Cheng --- Does this also happen on the GCC 13 branch?
[Bug target/109535] [13 regression] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #16 from Kito Cheng --- Fixed both on trunk and GCC 13 branch :)
[Bug target/109617] RISC-V: ICE for vlmul_ext_v intrinsic API
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109617 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Kito Cheng --- fixed on trunk
[Bug target/109272] RISCV: vbool*_t opportunities of a better code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109272 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Kito Cheng --- Fixed on trunk
[Bug target/109547] [13] RISC-V: Multiple vsetvli for load/store loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547 Kito Cheng changed: What|Removed |Added Summary|RISC-V: Multiple vsetvli|[13] RISC-V: Multiple |for load/store loop |vsetvli for load/store loop Target Milestone|--- |13.2
[Bug target/109535] [13/14] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535 Kito Cheng changed: What|Removed |Added Target Milestone|--- |13.2 Summary|internal compiler error: in |[13/14] internal compiler |finalize_new_accesses, at |error: in |rtl-ssa/changes.cc:471 |finalize_new_accesses, at ||rtl-ssa/changes.cc:471
[Bug target/109547] RISC-V: Multiple vsetvli for load/store loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547 Kito Cheng changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2023-04-19 --- Comment #1 from Kito Cheng --- Confirmed.
[Bug target/109535] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535 Kito Cheng changed: What|Removed |Added Last reconfirmed||2023-04-17 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |juzhe.zhong at rivai dot ai
[Bug target/109104] [13/14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1171 with -fzero-call-used-regs=all -march=rv64gv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109104 Kito Cheng changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Kito Cheng --- Fixed on trunk
[Bug target/109535] internal compiler error: in finalize_new_accesses, at rtl-ssa/changes.cc:471
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109535 --- Comment #5 from Kito Cheng --- Confirmed that the output is a text file; it's just suffixed with .out
[Bug target/109479] [RISC-V] Build vint64m1_t with rv64gc_zve32x_zvl64b should promote information like "vint64m1_t requires the 'zve64x' extensions"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109479 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Kito Cheng --- Fixed on upstream now :)
[Bug target/109479] [RISC-V] Build with rv64gc_zve32x_zvl64b should fail but actually not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109479 Kito Cheng changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2023-04-12 --- Comment #3 from Kito Cheng --- The title might be a little misleading: -march=rv64gc_zve32x_zvl64b is a valid arch configuration. What is invalid is using vint64m*_t and vuint64m*_t with rv64gc_zve32x.
[Bug bootstrap/109461] build gcc for riscv target failed with `execvp: /bin/sh: Argument list too long error when using with --with-multilib-generator`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109461 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #2 from Kito Cheng --- https://github.com/gcc-mirror/gcc/commit/5ca9980fc86242505ffdaaf62bca1fd5db26550b https://github.com/gcc-mirror/gcc/commit/d72ca12b846a9f5c01674b280b1817876c77888f The new multilib selection scheme should improve this, so that you don't need to specify such a long multilib config. I guess I should write more documentation and add a release note to mention that.
[Bug target/109328] [13 Regression] Build fail in RISC-V port
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109328 Kito Cheng changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Kito Cheng --- Verified with crosstool-ng; also fixed several missing dependencies in t-riscv
[Bug target/109349] riscv: Add --print-supported-extensions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109349 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #4 from Kito Cheng --- MaskRay: I would prefer adding -march=help rather than --print-supported-extensions on the GNU toolchain side; that should satisfy GCC's conventions and also be consistent with clang. Personally I prefer -march=? over -march=help, but I know clang has renamed -mcpu=? and -mtune=? to -mcpu=help and -mtune=help; anyway, that's minor. BTW, 4vtomat is a member of our (SiFive) team, so we actually plan to add that on the GNU toolchain side, but since GCC is in stage 4 I'm holding it for now :P Andrew Pinski: Yeah, I plan to catch up on the documentation and release notes in April...
[Bug target/109104] [13 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1171 with -fzero-call-used-regs=all -march=rv64gv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109104 Kito Cheng changed: What|Removed |Added Status|NEW |ASSIGNED CC||kito at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |pan2.li at intel dot com --- Comment #4 from Kito Cheng --- Pan Li from Intel is working on fixing that
[Bug target/109328] [13 Regression] Build fail in RISC-V port
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109328 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Kito Cheng --- I can reproduce the problem with crosstool-ng, and it is resolved by Andrew Pinski's fix. I am reviewing the dependencies in the file and plan to post a complete version of the patch later :)
[Bug target/109312] Missing __riscv_v_intrinsic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109312 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Kito Cheng --- Fixed. Let me know if you hit any other issues with the RVV intrinsics, thanks :)
[Bug target/109228] warning: implicit declaration of function '__riscv_vlenb'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109228 Kito Cheng changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #5 from Kito Cheng --- Fixed!
[Bug target/109244] internal compiler error: in setup_preferred_alternate_classes_for_new_pseudos, at ira.cc:2892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109244 Kito Cheng changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Kito Cheng --- Fixed. Let us know if you hit any issues compiling or testing highway!
[Bug target/109244] internal compiler error: in setup_preferred_alternate_classes_for_new_pseudos, at ira.cc:2892
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109244 --- Comment #4 from Kito Cheng --- Gonna commit the fix soon, and following code is the reduced case which is reduced from your attachment. Reduced case (reduced by creduce) typedef int a; using c = float; template < typename > using e = int; #pragma riscv intrinsic "vector" template < typename, int, int f > struct aa { using g = int; template < typename > static constexpr int h() { return f; } template < typename i > using ab = aa< i, 0, h< i >() >; }; template < int f > struct p { using j = aa< float, 6, f >; }; template < int f > struct k { using j = typename p< f >::j; }; template < typename, int f > using ac = typename k< f >::j; template < class ad > using l = typename ad::g; template < class g, class ad > using ab = typename ad::ab< g >; template < class ad > using ae = ab< e< ad >, ad >; template < int m > vuint32mf2_t ai(aa< a, m, -1 >, a aj) { return __riscv_vmv_v_x_u32mf2(aj, 0); } template < int m > vfloat32mf2_t ai(aa< c, m, -1 >, c); template < class ad > using ak = decltype(ai(ad(), l< ad >())); template < class ad > ak< ad > al(ad d) { ae< decltype(d) > am; return an(d, ai(am, 0)); } template < typename g, int m > vuint8mf2_t ao(aa< g, m, -1 >, vuint32mf2_t n) { return __riscv_vreinterpret_v_u32mf2_u8mf2(n); } template < int m > vuint32mf2_t ap(aa< a, m, -1 >, vuint8mf2_t n) { return __riscv_vreinterpret_v_u8mf2_u32mf2(n); } template < typename g, int m > vuint8mf2_t ao(aa< g, m, -1 >, vfloat32mf2_t n) { return __riscv_vreinterpret_v_u32mf2_u8mf2( __riscv_vreinterpret_v_f32mf2_u32mf2(n)); } template < int m > vfloat32mf2_t ap(aa< c, m, -1 >, vuint8mf2_t); template < class ad, class aq > ak< ad > an(ad d, aq n) { return ap(d, ao(d, n)); } vbool64_t av(vuint32mf2_t, vuint32mf2_t); template < class ad > bool ba(ad, vbool64_t); template < class ad > using bb = decltype(al(ad())); template < typename g > using be = ac< g, -1 >; struct bf { template < class ad > bool bh(ad, bb< ad > bi) { ae< ad > am; 
return ba(am, av(an(am, bi), al(am))); } }; int bo; template < class ad, class bl, typename g > void o(ad d, bl bn, g) { bb< ad > bq = al(d); for (; bo;) { int br = bn.bh(d, bq); if (__builtin_expect(br, 0)) for (;;) ; } } template < class ad, class bl, typename g > void bs(ad d, bl bn, g) { g bu; o(d, bn, bu); } template < class ad, class bl, typename g > void bv(ad d, bl bn, g *, int, g *bt) { bs(d, bn, bt); } float by; int bz; float ca; void b() { be< float > d; bf bn; bv(d, bn, &by, bz, &ca); }
[Bug c/109228] warning: implicit declaration of function '__riscv_vlenb'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109228 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #1 from Kito Cheng --- Thanks for the report! We definitely missed that...
[Bug target/108185] [RISC-V] Sub-optimal code-gen for vsetvli: redundant stack store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185 Kito Cheng changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Kito Cheng --- Resolved by Pan's patch :)
[Bug target/108339] [11/10 only] riscv64-linux-gnu: fails to link libgcc_s.so on the GCC 10 branch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108339 Kito Cheng changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Kito Cheng --- Backported to GCC 10 branch.
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #3 from Kito Cheng ---
> I think one solution is to change the cost model of such complex instructions
> to the sum of the cost for each part. E.g.
> cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) #
> cost of addition

As far as I know, some RISC-V core implementations do execute shNadd in one cycle, but I know that's not true for every implementation. It's really uarch dependent, so I would prefer to keep it as it is for now, and then extend the cost model function to handle different uarchs (-mtune) more easily once GCC 14 opens.
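The uarch-dependent cost idea can be sketched like this. It is only an illustration: `struct tune_model`, its `fused_shnadd` field and `shnadd_cost` are hypothetical and are not part of the RISC-V backend; `COSTS_N_INSNS` is redefined locally to mirror GCC's macro, and the value of `SINGLE_SHIFT_COST` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COSTS_N_INSNS(N) ((N) * 4) /* local copy mirroring GCC's rtl.h macro */
#define SINGLE_SHIFT_COST 1        /* assumed value for this sketch */

/* Hypothetical per-uarch tuning record, selected by -mtune in this model. */
struct tune_model {
    bool fused_shnadd; /* true if the core executes shNadd in one cycle */
};

/* Cost of a shNadd: one instruction on cores that execute it in a single
   cycle, otherwise the sum of its shift and add parts as proposed above. */
int shnadd_cost(const struct tune_model *tune)
{
    assert(tune != NULL);
    if (tune->fused_shnadd)
        return COSTS_N_INSNS(1);
    return COSTS_N_INSNS(SINGLE_SHIFT_COST) + COSTS_N_INSNS(1);
}
```

Keeping the choice in a per-tune record lets both views in this thread coexist: the summed cost becomes the result for cores where shNadd is slow, without penalizing single-cycle implementations.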
[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 --- Comment #13 from Kito Cheng --- A patch was posted before, but it seems not everybody agrees: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603049.html
[Bug target/108185] [RISC-V] Sub-optimal code-gen for vsetvli: redundant stack store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185 Kito Cheng changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2023-01-03 --- Comment #4 from Kito Cheng --- So it's about the code gen quality instead of correctness, let me update the title.
[Bug target/108185] [RISC-V]RVV assemble not set vsetvli correct.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108185 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #2 from Kito Cheng --- It seems right to me?
```
$ riscv64-unknown-elf-gcc pr108185.c -march=rv64gcv -mabi=lp64d -O3 -S -o -
        .file   "pr108185.c"
        .option nopic
        .attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_v1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
        .attribute unaligned_access, 0
        .attribute stack_align, 16
        .text
        .align  1
        .globl  foo5_3
        .type   foo5_3, @function
foo5_3:
        csrr    t0,vlenb
        slli    t1,t0,1
        csrr    a5,vlenb
        sub     sp,sp,t1
        slli    a3,a5,1
        add     a3,a3,sp
        vl1re8.v        v25,0(a0)       # Load value from *(vint8m1_t*)in
        sub     a5,a3,a5
        vs1r.v  v25,0(a1)       # Store value to *(vint8m1_t*)out
        vs1r.v  v25,0(a5)       # Store value to stack, although it's unused.
        addi    a4,a1,800
        csrr    t0,vlenb
        slli    t1,t0,1
        vsetvli a5,zero,e8,m1,ta,ma     # Right vsetvli for vsm.v
        vsm.v   v25,0(a4)
        add     sp,sp,t1
        jr      ra
        .size   foo5_3, .-foo5_3
        .ident  "GCC: (g44b22ab81cf) 13.0.0 20221229 (experimental)"
```
[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #7 from Kito Cheng ---
We are hitting this issue on RISC-V and got some complaints from Linux kernel developers, but in a different form from the original report: we found that cold functions, including any function marked as cold by `-fguess-branch-probability`, do not honor the -falign-functions=N setting. That becomes a problem for some Linux kernel features, since they want to control the minimal alignment to make sure they can atomically update an instruction, which requires 4-byte alignment.

However, current GCC behavior can't guarantee that even when -falign-functions=4 is given. There are 4 options in my mind:

1. Fix -falign-functions=N, so that it works as expected with -Os and for all cold functions
2. Force 4-byte alignment if -fpatchable-function-entry is given; that should be doable by adjusting RISC-V's FUNCTION_BOUNDARY
3. Adjust RISC-V's FUNCTION_BOUNDARY to let it honor -falign-functions=N
4. Add a -malign-functions=N... Okay, I know that idea sucks; x86 already deprecated that.

But I think ideally this should be fixed by option 1 if possible.

Testcase from a RISC-V kernel developer:
```
/* { dg-do compile } */
/* { dg-options "-march=rv64gc -mabi=lp64d -O1 -falign-functions=128" } */
/* { dg-final { scan-assembler-times ".align 7" 2 } } */

// Using 128-byte alignment rather than 4-byte alignment since it is easier to observe.

__attribute__((__cold__)) void a() {} // This function isn't aligned to 128 bytes
void b() {} // This function is aligned to 128 bytes.
```

Proposed fix:
```
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 49d5cda122f..6f8ed85fea9 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1907,8 +1907,7 @@ assemble_start_function (tree decl, const char *fnname)
      Note that we still need to align to DECL_ALIGN, as above,
      because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
   if (! DECL_USER_ALIGN (decl)
-      && align_functions.levels[0].log > align
-      && optimize_function_for_speed_p (cfun))
+      && align_functions.levels[0].log > align)
     {
 #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
       int align_log = align_functions.levels[0].log;
```
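The effect of dropping the `optimize_function_for_speed_p` check can be modeled in a few lines. This is a simplified sketch of the condition touched by the diff, not the code in varasm.c: `function_align_log` and its parameters are hypothetical, and the real function also handles max-skip and secondary alignment levels that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the alignment decision in assemble_start_function.
   Returns the log2 alignment applied to the function.
   `patched` selects the condition after the proposed fix (true) or the
   original condition (false). */
int function_align_log(bool user_align, int decl_align_log,
                       int align_functions_log, bool optimize_for_speed,
                       bool patched)
{
    assert(decl_align_log >= 0 && align_functions_log >= 0);
    bool use_flag = !user_align
                    && align_functions_log > decl_align_log
                    && (patched || optimize_for_speed);
    return use_flag ? align_functions_log : decl_align_log;
}
```

For the testcase above (-falign-functions=128, so log2 is 7), a cold function is not optimized for speed: the original condition falls back to the declaration's own alignment, while the patched condition yields 7 for both `a` and `b`.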