https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430
Bug ID: 110430
Summary: Fail to CSE for LEN_MASK_STORE
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

Consider the following case:

void __attribute__((noinline,noclone))
foo (int *out, int *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

Compile options:

-O3 -march=rv64gcv_zvl512b --param riscv-autovec-preference=fixed-vlmax

Current RVV auto-vectorization codegen:

foo:
        lui     a5,%hi(.LANCHOR0)
        vsetivli        zero,16,e32,m1,ta,ma
        addi    a5,a5,%lo(.LANCHOR0)
        vid.v   v1
        vlm.v   v0,0(a5)
        vsetvli a5,zero,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        lw      a2,0(a0)
        lw      a3,28(a0)
        lw      a4,56(a0)
        lw      a5,60(a0)
        sw      a2,0(a1)
        sw      a3,8(a1)
        sw      a4,16(a1)
        sw      a5,24(a1)
        ret

However, with this patch:
https://patchwork.sourceware.org/project/gcc/patch/20230627064737.16257-1-juzhe.zh...@rivai.ai/

we get better codegen thanks to CSE:

foo:
        lui     a5,%hi(.LANCHOR0)
        vsetivli        zero,16,e32,m1,ta,ma
        addi    a5,a5,%lo(.LANCHOR0)
        vid.v   v1
        vlm.v   v0,0(a5)
        vsetvli a5,zero,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        lw      a4,0(a0)
        lw      a5,56(a0)
        sw      a4,0(a1)
        sw      a5,16(a1)
        li      a4,7
        li      a5,15
        sw      a4,8(a1)
        sw      a5,24(a1)
        ret

Two of the four "lw" instructions (the loads of out[7] and out[15], whose lanes are enabled in the mask) are replaced by "li" instructions. The loads of out[0] and out[14] must stay because those lanes are masked off. The gimple IR with the patch:

  .LEN_MASK_STORE (out_10(D), 32B, 16, { 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1 }, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }, 0);
  o0_11 = *out_10(D);
  o14_13 = MEM[(int *)out_10(D) + 56B];
  *res_15(D) = o0_11;
  MEM[(int *)res_15(D) + 8B] = 7;
  MEM[(int *)res_15(D) + 16B] = o14_13;
  MEM[(int *)res_15(D) + 24B] = 15;
  mask ={v} {CLOBBER(eol)};

After discussion with Richi, the current candidate patch can only handle VLS (fixed-length) vectors; it cannot handle VLA (variable-length) vectors.
It is hard for us to write a C testcase that produces a CSE opportunity for VL vectors, so I am opening this bug for now so that I don't forget the issue. I will enhance LEN_MASK_STORE handling in CSE after I have finished all the RVV auto-vectorization support.
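
For readers unfamiliar with the store being discussed, the following is a minimal scalar sketch of why the loads of out[7] and out[15] can become constants while the loads of out[0] and out[14] cannot. It is not GCC code: the helper names len_mask_store_model and forwarded_value are hypothetical, and the lane rule used here (lane i is written only when i < len + bias and mask[i] is nonzero) is my reading of the .LEN_MASK_STORE call in the dump above, stated as an assumption.

```c
#include <assert.h>

#define NLANES 16

/* Hypothetical scalar model of the masked store in the dump above.
   Assumption: lane i is written only when i < len + bias and mask[i] != 0. */
static void len_mask_store_model(int *ptr, int len, int bias,
                                 const int *mask, const int *val)
{
    assert(len + bias <= NLANES);   /* fixed-length (VLS) case only */
    for (int i = 0; i < len + bias; ++i)
        if (mask[i])
            ptr[i] = val[i];
}

/* Hypothetical query: can a later load of lane `lane` be replaced by a
   constant?  Returns 1 and sets *out when the lane is known to be stored
   with a known value; returns 0 when the old memory contents survive and
   a real load is still required. */
static int forwarded_value(int lane, int len, int bias,
                           const int *mask, const int *val, int *out)
{
    if (lane < 0 || lane >= NLANES || lane >= len + bias || !mask[lane])
        return 0;
    *out = val[lane];
    return 1;
}
```

With the mask and value vectors from the testcase, forwarded_value succeeds for lanes 7 and 15 (yielding 7 and 15) and fails for lanes 0 and 14, which matches the two "li" and two remaining "lw" instructions. In the VLA case neither the length nor the number of lanes is a compile-time constant, so this kind of per-lane lookup is not directly available.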