[Bug target/114809] New: [RISC-V RVV] Counting elements might be simpler

2024-04-22 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

            Bug ID: 114809
           Summary: [RISC-V RVV] Counting elements might be simpler
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider this simple procedure:

---
#include <cstddef>
#include <cstdint>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
---

Assembly for it (GCC 14.0, -march=rv64gcv -O3):

---
count_chars(char const*, unsigned long, char):
        beq         a1,zero,.L4
        vsetvli     a4,zero,e8,mf8,ta,ma
        vmv.v.x     v2,a2
        vsetvli     zero,zero,e64,m1,ta,ma
        vmv.v.i     v1,0
.L3:
        vsetvli     a5,a1,e8,mf8,ta,ma
        vle8.v      v0,0(a0)
        sub         a1,a1,a5
        add         a0,a0,a5
        vmseq.vv    v0,v0,v2
        vsetvli     zero,zero,e64,m1,tu,mu
        vadd.vi     v1,v1,1,v0.t
        bne         a1,zero,.L3
        vsetvli     a5,zero,e64,m1,ta,ma
        li          a4,0
        vmv.s.x     v2,a4
        vredsum.vs  v1,v1,v2
        vmv.x.s     a0,v1
        ret
.L4:
        li          a0,0
        ret
---

The counting procedure could use `vcpop.m` instead of updating a vector of
counters (`v1`) and summing them at the end. This would move all mode switches
outside the loop.
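
The suggestion can be sketched at the source level. This is a minimal sketch,
not the code GCC is expected to emit: when built for a target with the V
extension it uses the RVV intrinsics `__riscv_vmseq_vx_u8m8_b1` and
`__riscv_vcpop_m_b1`; elsewhere it falls back to a portable model that builds
one 64-bit match mask per chunk and counts its set bits. The function name
`count_chars_vcpop`, the LMUL choice and the chunk size of 64 are assumptions
made for illustration only.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical name. Counts occurrences of c by population-counting a
// per-chunk comparison mask and accumulating in a scalar, instead of
// keeping a vector of 64-bit counters.
std::size_t count_chars_vcpop(const char *src, std::size_t len, char c) {
    assert(src != nullptr || len == 0);
    std::size_t count = 0;
#if defined(__riscv_vector)
    const std::uint8_t *p = reinterpret_cast<const std::uint8_t *>(src);
    for (std::size_t vl; len > 0; len -= vl, p += vl) {
        vl = __riscv_vsetvl_e8m8(len);
        vuint8m8_t v = __riscv_vle8_v_u8m8(p, vl);
        vbool1_t eq = __riscv_vmseq_vx_u8m8_b1(v, static_cast<std::uint8_t>(c), vl);
        count += __riscv_vcpop_m_b1(eq, vl);  // scalar accumulation, SEW stays e8
    }
#else
    // Portable model: the 64-bit mask plays the role of the mask register v0.
    for (std::size_t i = 0; i < len; i += 64) {
        const std::size_t n = (len - i < 64) ? (len - i) : 64;
        std::uint64_t mask = 0;
        for (std::size_t j = 0; j < n; j++) {
            mask |= static_cast<std::uint64_t>(src[i + j] == c) << j;
        }
        count += std::bitset<64>(mask).count();
    }
#endif
    return count;
}
```

For example, `count_chars_vcpop("banana", 6, 'a')` counts the three `a`
characters. The accumulator is a plain scalar, so no e64 vector state is
needed inside the loop.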

And there's a missing peephole optimization:

li  a4,0
vmv.s.x v2,a4

It can be:

vmv.s.x v2,zero

[Bug c++/114747] New: [RISC-V RVV] Wrong SEW set for mixed-size intrinsics

2024-04-16 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114747

            Bug ID: 114747
           Summary: [RISC-V RVV] Wrong SEW set for mixed-size intrinsics
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

This is a distilled procedure from simdutf project:

---
#include <cstddef>
#include <cstdint>
#include <riscv_vector.h>

size_t convert_latin1_to_utf16le(const char *src, size_t len, char16_t *dst) {
  char16_t *beg = dst;
  for (size_t vl; len > 0; len -= vl, src += vl, dst += vl) {
    vl = __riscv_vsetvl_e8m4(len);
    vuint8m4_t v = __riscv_vle8_v_u8m4((uint8_t*)src, vl);
    __riscv_vse16_v_u16m8((uint16_t*)dst, __riscv_vzext_vf2_u16m8(v, vl), vl);
  }
  return dst - beg;
}
---

When compiled with gcc 13.2.0 and the flags "-march=rv64gcv -O2", the wrong
SEW is set:

---
convert_latin1_to_utf16le(char const*, unsigned long, char16_t*):
        beq         a1,zero,.L4
        mv          a4,a2
.L3:
        vsetvli     a5,a1,e8,m4,ta,ma  # set SEW=8
        vle8.v      v8,0(a0)
        slli        a3,a5,1
        vzext.vf2   v24,v8             # illegal instruction, as SEW/2 < 8
        sub         a1,a1,a5
        vse16.v     v24,0(a4)
        add         a0,a0,a5
        add         a4,a4,a3
        bne         a1,zero,.L3
        sub         a0,a4,a2
        srai        a0,a0,1
        ret
.L4:
        li          a0,0
        ret
---

The trunk available on godbolt.org (riscv64-unknown-linux-gnu-g++ 14.0.1
20240415) emits vsetvli with an e16 argument, which seems to be fine.
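
The rule behind the "illegal instruction" remark can be written down as a
small check. For `vzext.vf2` the source element width is SEW/2, and the
encoding is reserved when that width is not a supported one, so the active SEW
has to be at least 16. The sketch below models only this width rule; it
ignores the matching LMUL constraint and assumes element widths up to 64 bits.
The helper name `vext_sew_is_legal` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: true when vzext/vsext with the given divisor
// (2, 4 or 8) has a supported source element width at the active SEW.
// The source EEW is SEW/divisor and must be at least 8 bits.
constexpr bool vext_sew_is_legal(unsigned sew, unsigned divisor) {
    return (sew == 8 || sew == 16 || sew == 32 || sew == 64) &&
           (divisor == 2 || divisor == 4 || divisor == 8) &&
           (sew / divisor) >= 8;
}

// The two cases from the report.
inline void check_report_cases() {
    assert(!vext_sew_is_legal(8, 2));   // gcc 13.2.0: e8 + vzext.vf2
    assert(vext_sew_is_legal(16, 2));   // trunk: e16 + vzext.vf2
}
```

Under this model the 13.2.0 sequence (e8 followed by `vzext.vf2`) is rejected,
while the e16 setting emitted by trunk is accepted.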

[Bug target/114172] [13 only] ICE with riscv rvv VSETVL intrinsic

2024-03-28 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114172

Wojciech Mula changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wojciech_mula at poczta dot onet.pl

--- Comment #2 from Wojciech Mula  ---
Checked 13.2 from Debian:

$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0

For Bruce's testcase the following invocation triggers a segfault (with -O1
and -O2 there is no error):

$ riscv64-linux-gnu-gcc -march=rv64gcv -c 1.c -O3

Below is just the bottom of the stack obtained with gdb. There is an infinite
recursion somewhere around `riscv_vector::avl_info::operator==`.

#629078 0x00fa3372 in
riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629079 0x00fa37f9 in ?? ()
#629080 0x00fa2543 in ?? ()
#629081 0x00fa3372 in
riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629082 0x00fa37f9 in ?? ()
#629083 0x00fa2543 in ?? ()
#629084 0x00fa3372 in
riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629085 0x00fa37f9 in ?? ()
#629086 0x00fa2543 in ?? ()
#629087 0x00fa3372 in
riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629088 0x00fa37f9 in ?? ()
#629089 0x00fa2543 in ?? ()
#629090 0x00fa3372 in
riscv_vector::avl_info::operator==(riscv_vector::avl_info const&) const ()
#629091 0x00fa394b in ?? ()
#629092 0x00f9f588 in
riscv_vector::vector_insn_info::compatible_p(riscv_vector::vector_insn_info
const&) const ()
#629093 0x00fa0eb9 in
pass_vsetvl::compute_local_backward_infos(rtl_ssa::bb_info const*) ()
#629094 0x00fa8c6b in pass_vsetvl::lazy_vsetvl() ()
#629095 0x00fa8e1f in pass_vsetvl::execute(function*) ()
#629096 0x00b5e21b in execute_one_pass(opt_pass*) ()
#629097 0x00b5eac0 in ?? ()
#629098 0x00b5ead2 in ?? ()
#629099 0x00b5ead2 in ?? ()
#629100 0x00b5eaf9 in execute_pass_list(function*, opt_pass*) ()
#629101 0x00822588 in cgraph_node::expand() ()
#629102 0x00823afb in ?? ()
#629103 0x00825fd8 in symbol_table::finalize_compilation_unit() ()
#629104 0x00c29bad in ?? ()
#629105 0x006a4c97 in toplev::main(int, char**) ()
#629106 0x006a6a8b in main ()

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2022-02-07 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

--- Comment #8 from Wojciech Mula  ---
Thank you for the answer. Thus my question is: is it possible to delay the
conversion from kmasks into ints? I'm not a language lawyer, but I guess an
expression `x binop y` has to be treated as `(int)x binop (int)y`. If that's
true, we will have to prove that `(int)(x avx512-binop y)` is equivalent to the
latter expression.
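
The equivalence in question can be checked exhaustively on a small model. The
sketch below is an illustration under stated assumptions, not a statement about
GCC's internals: a mask register is modelled as an 8-bit unsigned value
(`kmask8`, a made-up alias), the conversion to `int` is modelled as zero
extension, and the operation is compared before and after the conversion. In
this model AND, OR and XOR commute with the conversion, while NOT does not,
because negating the widened value also sets the upper bits.

```cpp
#include <cassert>
#include <cstdint>

// Model of an 8-bit mask register value (assumption for illustration).
using kmask8 = std::uint8_t;

// True if "operate on masks, then widen" equals "widen, then operate on
// ints" for every pair of 8-bit mask values.
template <typename MaskOp, typename IntOp>
bool binop_commutes_with_widening(MaskOp mask_op, IntOp int_op) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            const kmask8 kx = static_cast<kmask8>(x);
            const kmask8 ky = static_cast<kmask8>(y);
            const int in_mask_domain = static_cast<int>(mask_op(kx, ky));
            const int in_int_domain =
                int_op(static_cast<int>(kx), static_cast<int>(ky));
            if (in_mask_domain != in_int_domain) return false;
        }
    }
    return true;
}

inline bool or_commutes() {
    return binop_commutes_with_widening(
        [](kmask8 a, kmask8 b) { return static_cast<kmask8>(a | b); },
        [](int a, int b) { return a | b; });
}

inline bool and_commutes() {
    return binop_commutes_with_widening(
        [](kmask8 a, kmask8 b) { return static_cast<kmask8>(a & b); },
        [](int a, int b) { return a & b; });
}

inline bool xor_commutes() {
    return binop_commutes_with_widening(
        [](kmask8 a, kmask8 b) { return static_cast<kmask8>(a ^ b); },
        [](int a, int b) { return a ^ b; });
}

// Counterexample: ~(int)x sets bits 8..31, while (int)(mask-not x) does not.
inline bool not_commutes() {
    for (unsigned x = 0; x < 256; x++) {
        const kmask8 kx = static_cast<kmask8>(x);
        const int in_mask_domain = static_cast<int>(static_cast<kmask8>(~kx));
        const int in_int_domain = ~static_cast<int>(kx);
        if (in_mask_domain != in_int_domain) return false;
    }
    return true;
}

// Usage: the results this model gives.
inline void check_model() {
    assert(or_commutes() && and_commutes() && xor_commutes());
    assert(!not_commutes());
}
```

So for the OR sequence discussed in this bug the delayed conversion is safe in
the model; an operation such as NOT would need an explicit truncation to the
mask width.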

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2022-01-31 Thread wojciech_mula at poczta dot onet.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

--- Comment #6 from Wojciech Mula  ---
Hongtao, thank you for your patch and for pinging back! I checked the code from
this issue against version 11.2.0 (Debian 11.2.0-14), but there are still
KMOVQs before any bit ops are performed. Here is the output from `gcc -O3
-march=icelake-server -S`:

vpcmpub $0, .LC0(%rip), %zmm0, %k0
vpcmpub $0, .LC1(%rip), %zmm0, %k1
vpcmpub $0, .LC2(%rip), %zmm0, %k2
kmovq   %k0, %rcx
kmovq   %k1, %rax
orq     %rcx, %rax
kmovq   %k2, %rdx
orq     %rdx, %rax
ret