[Bug target/114809] New: [RISC-V RVV] Counting elements might be simpler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

            Bug ID: 114809
           Summary: [RISC-V RVV] Counting elements might be simpler
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider this simple procedure
---
#include <cstddef>
#include <cstdint>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i=0; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
---

Assembly for it (GCC 14.0, -march=rv64gcv -O3):

---
count_chars(char const*, unsigned long, char):
        beq     a1,zero,.L4
        vsetvli a4,zero,e8,mf8,ta,ma
        vmv.v.x v2,a2
        vsetvli zero,zero,e64,m1,ta,ma
        vmv.v.i v1,0
.L3:
        vsetvli a5,a1,e8,mf8,ta,ma
        vle8.v  v0,0(a0)
        sub     a1,a1,a5
        add     a0,a0,a5
        vmseq.vv        v0,v0,v2
        vsetvli zero,zero,e64,m1,tu,mu
        vadd.vi v1,v1,1,v0.t
        bne     a1,zero,.L3
        vsetvli a5,zero,e64,m1,ta,ma
        li      a4,0
        vmv.s.x v2,a4
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
.L4:
        li      a0,0
        ret
---

The counting procedure might use `vcpop.m` instead of updating a vector of counters (`v1`) and summing them at the end. This would move all mode switches outside the loop.

There is also a missing peephole optimization:

        li      a4,0
        vmv.s.x v2,a4

It can be:

        vmv.s.x v2,zero
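For illustration only, the `vcpop.m` idea could be written by hand with the RVV intrinsics roughly as below. This is a sketch, not what GCC emits or is expected to emit: the name `count_chars_vcpop` and the choice of LMUL=1 are assumptions, and a scalar fallback is included so the sketch also builds on targets without the vector extension.

---
#include <cstddef>
#include <cstdint>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical name: counts bytes equal to c, one mask popcount per strip.
size_t count_chars_vcpop(const char *src, size_t len, char c) {
    size_t count = 0;
#if defined(__riscv_vector)
    const uint8_t *p = reinterpret_cast<const uint8_t *>(src);
    const uint8_t needle = static_cast<uint8_t>(c);
    for (size_t vl; len > 0; len -= vl, p += vl) {
        vl = __riscv_vsetvl_e8m1(len);                          // strip length, SEW=8
        vuint8m1_t v = __riscv_vle8_v_u8m1(p, vl);              // load vl bytes
        vbool8_t eq = __riscv_vmseq_vx_u8m1_b8(v, needle, vl);  // mask of matches
        count += __riscv_vcpop_m_b8(eq, vl);                    // scalar popcount of the mask
    }
#else
    // Portable fallback so the sketch builds on non-RVV targets.
    for (size_t i = 0; i < len; i++) {
        count += src[i] == c;
    }
#endif
    return count;
}
---

The accumulator lives in a scalar register, so the loop body needs no e64 vector state and no final `vredsum.vs`.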
[Bug c++/114747] New: [RISC-V RVV] Wrong SEW set for mixed-size intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114747

            Bug ID: 114747
           Summary: [RISC-V RVV] Wrong SEW set for mixed-size intrinsics
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

This is a distilled procedure from the simdutf project:

---
#include <cstddef>
#include <cstdint>
#include <riscv_vector.h>

size_t convert_latin1_to_utf16le(const char *src, size_t len, char16_t *dst) {
    char16_t *beg = dst;
    for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
        vl = __riscv_vsetvl_e8m4(len);
        vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t*)src, vl);
        __riscv_vse16_v_u16m8((uint16_t*)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
    }

    return dst - beg;
}
---

When compiled with gcc 13.2.0 with flags "-march=rv64gcv -O2" it sets the wrong SEW:

---
convert_latin1_to_utf16le(char const*, unsigned long, char16_t*):
        beq     a1,zero,.L4
        mv      a4,a2
.L3:
        vsetvli a5,a1,e8,m4,ta,ma       # set SEW=8
        vle8.v  v8,0(a0)
        slli    a3,a5,1
        vzext.vf2       v24,v8          # illegal instruction, as SEW/2 < 8
        sub     a1,a1,a5
        vse16.v v24,0(a4)
        add     a0,a0,a5
        add     a4,a4,a3
        bne     a1,zero,.L3
        sub     a0,a4,a2
        srai    a0,a0,1
        ret
.L4:
        li      a0,0
        ret
---

The trunk available on godbolt.org (riscv64-unknown-linux-gnu-g++ 14.0.1 20240415) emits vsetvli with the e16 argument, which seems to be fine.
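The "SEW/2 < 8" remark in the listing can be stated as a small check. The sketch below only models the source-width part of the rule for `vzext.vf2`/`vf4`/`vf8` (source element width is SEW divided by the extension factor and cannot be narrower than 8 bits); the helper name is hypothetical and the LMUL constraints are deliberately left out.

---
// Hypothetical helper: is the source element width of vzext.vf<frac> usable
// when the current SEW is `sew`? Assumes sew is 8, 16, 32 or 64 and frac is 2, 4 or 8.
constexpr bool vzext_source_width_ok(unsigned sew, unsigned frac) {
    return sew / frac >= 8;
}

// e8 as set by gcc 13.2.0: the source would be 4 bits wide, so vzext.vf2 traps.
static_assert(!vzext_source_width_ok(8, 2), "e8 + vf2 is not encodable");
// e16 as set by trunk: the source is 8 bits wide, matching the vle8.v load.
static_assert(vzext_source_width_ok(16, 2), "e16 + vf2 is fine");
---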
[Bug target/114172] [13 only] ICE with riscv rvv VSETVL intrinsic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172

Wojciech Mula changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wojciech_mula at poczta dot onet.pl

--- Comment #2 from Wojciech Mula ---
Checked 13.2 from Debian:

$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0

For Bruce's testcase the following invocation triggers a segfault (-O2, -O1, -O2 - no error):

$ riscv64-linux-gnu-gcc -march=rv64gcv -c 1.c -O3

Below is just the bottom of the stack obtained by gdb. There's an infinite recursion somewhere around `riscv_vector::avl_info::operator==`.

#629078 0x00fa3372 in riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629079 0x00fa37f9 in ?? ()
#629080 0x00fa2543 in ?? ()
#629081 0x00fa3372 in riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629082 0x00fa37f9 in ?? ()
#629083 0x00fa2543 in ?? ()
#629084 0x00fa3372 in riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629085 0x00fa37f9 in ?? ()
#629086 0x00fa2543 in ?? ()
#629087 0x00fa3372 in riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629088 0x00fa37f9 in ?? ()
#629089 0x00fa2543 in ?? ()
#629090 0x00fa3372 in riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629091 0x00fa394b in ?? ()
#629092 0x00f9f588 in riscv_vector::vector_insn_info::compatible_p(riscv_vector::vector_insn_info const&) const ()
#629093 0x00fa0eb9 in pass_vsetvl::compute_local_backward_infos(rtl_ssa::bb_info const*) ()
#629094 0x00fa8c6b in pass_vsetvl::lazy_vsetvl() ()
#629095 0x00fa8e1f in pass_vsetvl::execute(function*) ()
#629096 0x00b5e21b in execute_one_pass(opt_pass*) ()
#629097 0x00b5eac0 in ?? ()
#629098 0x00b5ead2 in ?? ()
#629099 0x00b5ead2 in ?? ()
#629100 0x00b5eaf9 in execute_pass_list(function*, opt_pass*) ()
#629101 0x00822588 in cgraph_node::expand() ()
#629102 0x00823afb in ?? ()
#629103 0x00825fd8 in symbol_table::finalize_compilation_unit() ()
#629104 0x00c29bad in ?? ()
#629105 0x006a4c97 in toplev::main(int, char**) ()
#629106 0x006a6a8b in main ()
[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

--- Comment #8 from Wojciech Mula ---
Thank you for the answer. Thus my question is: is it possible to delay the conversion from kmasks into ints? I'm not a language lawyer, but I guess an `x binop y` has to be treated as `(int)x binop (int)y`. If that's true, we will have to prove that `(int)(x avx512-binop y)` is equivalent to the latter expression.
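To make the equivalence question concrete, here is a portable model for a 16-bit mask. It is only a sketch: the `model_*` functions are hypothetical stand-ins, not the real intrinsics, and they assume that a mask-register operation works on the low 16 bits and that the result reads back zero-extended. Under that assumption OR gives the same int either way, while NOT agrees only after truncating back to 16 bits.

---
#include <cstdint>

// Hypothetical models of 16-bit mask-register ops (result read back zero-extended).
constexpr int model_kor16(uint16_t x, uint16_t y) { return static_cast<uint16_t>(x | y); }
constexpr int model_knot16(uint16_t x)            { return static_cast<uint16_t>(~x); }

// The same expressions evaluated as `(int)x binop (int)y`.
constexpr int int_or(uint16_t x, uint16_t y) { return static_cast<int>(x) | static_cast<int>(y); }
constexpr int int_not(uint16_t x)            { return ~static_cast<int>(x); }

// OR: identical results, so delaying the conversion is safe.
static_assert(model_kor16(0x00F0, 0x0F00) == int_or(0x00F0, 0x0F00), "OR agrees");
// NOT: the int version sets the upper bits, the mask version does not.
static_assert(model_knot16(0x00F0) != int_not(0x00F0), "NOT differs as int");
// After truncating back to the mask width both agree again.
static_assert(model_knot16(0x00F0) == static_cast<uint16_t>(int_not(0x00F0)), "NOT agrees mod 2^16");
---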
[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

--- Comment #6 from Wojciech Mula ---
Hongtao, thank you for your patch and for pinging back!

I checked the code from this issue against version 11.2.0 (Debian 11.2.0-14), but still, there are KMOVQs before performing any bit ops. Here is the output from `gcc -O3 -march=icelake-server -S`:

        vpcmpub $0, .LC0(%rip), %zmm0, %k0
        vpcmpub $0, .LC1(%rip), %zmm0, %k1
        vpcmpub $0, .LC2(%rip), %zmm0, %k2
        kmovq   %k0, %rcx
        kmovq   %k1, %rax
        orq     %rcx, %rax
        kmovq   %k2, %rdx
        orq     %rdx, %rax
        ret