Hi Changsheng,
Thank you for providing so much detailed information. I am glad to see that RISC-V V is gradually maturing.

I ran a simple experiment with GCC, and it can now generate vectorized code automatically. However, I found an unusual instruction, vlseg4e32.v, in the output, and after consulting the RISC-V V documentation I was left with many doubts. For example, are the destination registers (v8, v9, v10, v11), as in the document, or (v8, v10, v12, v14), as in the GCC output? In addition, many details of the assembly instructions still exist only in fragmented slide decks from different users and companies, without a unified official document.

These issues may lead to repeated code rework in the future, so I still do not recommend accepting RISC-V V now. If possible, please help the RISC-V community improve the RISC-V V ISA documentation. I will be pleased to accept RISC-V V as one of the target platforms in the future.

Regards,
Chen

Code

    int test1(int *x, int N)
    {
        int sum = 0;
        if (__builtin_expect(N % 4, 0))
        {
            for(int i = 0; i < N; i+=4)
            {
                sum += (x[i+0] + x[i+1] + x[i+2] + x[i+3]);
            }
        }
        else
            __builtin_unreachable();
        return sum;
    }

GCC output

    test1(int*, int):
            ble     a1,zero,.L4
            vsetvli a5,zero,e32,m2,ta,ma
            addiw   a4,a1,-1
            vmv.v.i v4,0
            srliw   a4,a4,2
            addiw   a4,a4,1
    .L3:
            vsetvli a5,a4,e32,m2,tu,ma
            vlseg4e32.v     v8,(a0)
            slli    a3,a5,4
            sub     a4,a4,a5
            add     a0,a0,a3
            vadd.vv v2,v10,v8
            vadd.vv v2,v2,v12
            vadd.vv v2,v2,v14
            vadd.vv v4,v4,v2
            bne     a4,zero,.L3
            vsetvli a5,zero,e32,m2,ta,ma
            vmv.s.x v1,zero
            vredsum.vs      v4,v4,v1
            vmv.x.s a0,v4
            ret
    .L4:
            li      a0,0
            ret

RISC-V Document

7.8.2. Vector Strided Segment Loads and Stores

Vector strided segment loads and stores move contiguous segments where each segment is separated by the byte-stride offset given in the rs2 GPR argument.

Note: Negative and zero strides are supported.
    # Format
    vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm     # Strided segment loads
    vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm    # Strided segment stores

    # Examples
    vsetvli a1, t0, e8, ta, ma
    vlsseg3e8.v v4, (x5), x6    # Load bytes at addresses x5+i*x6 into v4[i],
                                #  and bytes at addresses x5+i*x6+1 into v5[i],
                                #  and bytes at addresses x5+i*x6+2 into v6[i].

    # Examples
    vsetvli a1, t0, e32, ta, ma
    vssseg2e32.v v2, (x5), x6   # Store words from v2[i] to address x5+i*x6
                                #  and words from v3[i] to address x5+i*x6+4

At 2025-07-07 15:24:16, wu.changsh...@sanechips.com.cn wrote:

Hi Chen,

Thank you for your previous feedback. I'd like to add some information about RISC-V Vector V1.0, and I hope you can reconsider x265 support for the RISC-V architecture.

1. The RISC-V community considers Vector V1.0 a stable version. RISC-V Vector V1.0 was officially approved and released in 2021. The server profile RVA23, released in October 2024, also specifies Vector V1.0, and The RISC-V Instruction Set Manual Volume published the same year adopts Vector V1.0 as well.

2. Many chip manufacturers already support the RISC-V Vector Extension V1.0, for example the already released SiFive P670/P470, Andes NX27V, Alibaba C920, and SpaceMIT X100 CPUs. In the next year or two, many more vendors will launch chips supporting Vector V1.0.

3. GCC introduced experimental RISC-V Vector support in GCC 12 (May 2022) and officially supported RISC-V Vector V1.0 in GCC 14 (May 2024).

4. The Linux kernel merged support for RISC-V Vector V1.0 in June 2023 and shipped it in the 6.6 LTS release.

5. Our company already plans to deploy RISC-V servers in data centers, with x265 video encoding being one of the key business scenarios. We will continue contributing RISC-V architecture patches.

RISC-V has attracted widespread attention and strong investment, leading to rapid development. I believe it will become another mainstream architecture following x86 and Arm.
RISC-V is now commercially viable and deserves adoption by the x265 community.

Best Wishes!

Changsheng Wu
M: +86 13776570034
E: wu.changsh...@sanechips.com.cn
SANECHIPS TECHNOLOGY CO.,LTD.

Original
From: chen <chenm...@163.com>
To: 吴昌盛0318004250;
Cc: x265-devel@videolan.org <x265-devel@videolan.org>; mah...@multicorewareinc.com <mah...@multicorewareinc.com>; pavan.ta...@multicorewareinc.com <pavan.ta...@multicorewareinc.com>; 沈显来0318003851; 袁佳0318004243; 吴昌盛0318004250;
Date: 2025-07-07 03:28
Subject: Re:[x265] [PATCH] RISCV64: add copy_cnt assembly optimization

Hi Changsheng,

Thanks for the patches.

However, I don't think RISC-V Extension-V is stable enough yet:

    v1.0 frozen in September 2021
    v1.1 public review in May 2023
    no further updates as of July 2025

Also, most instructions have no behavior description. For example, vredsum.vs in the patch:

    vredsum.vs vd, vs2, vs1, vm   # vd[0] = sum( vs1[0] , vs2[*] )

I can only guess that it means vd[0] = vs1[0] + sum(vs2[*]).

Another example is vlse8.v, which I guess is equivalent to x86 PSHUFB or ARM VTBL.

The examples above are only my guesses. I have not been able to confirm my understanding over the past couple of years, and there are too many similar problems in RISC-V Extension-V.

So I suggest not integrating or implementing RISC-V patches until the specification becomes stable enough.

Regards,
Chen

2025-07-06 10:08:25, wu.changsh...@sanechips.com.cn

From 7562e3a834a6a5ea76ab1b97acf915e095646cd5 Mon Sep 17 00:00:00 2001
From: Changsheng Wu <wu.changsh...@sanechips.com.cn>
Date: Sat, 5 Jul 2025 23:09:14 +0800
Subject: [PATCH] RISCV64: add copy_cnt assembly optimization

TestBench test result:
    copy_cnt[4x4]   | 1.34x |  123.12 |   165.06
    copy_cnt[8x8]   | 2.64x |  214.07 |   564.26
    copy_cnt[16x16] | 3.96x |  563.83 |  2232.00
    copy_cnt[32x32] | 7.44x | 2144.80 | 15954.42
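
Two of the instruction questions raised in this thread can be sketched as small scalar C models. This is only an illustration, not an official reference: the function names are hypothetical, and the sketch assumes the V 1.0 rules that each segment field occupies its own register group of EMUL registers and that an unmasked vredsum.vs with vl > 0 adds vs1[0] to the sum of the active elements of vs2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any toolchain): number of the first
 * vector register holding segment field `field` when the instruction names
 * `vd` and each field occupies a group of `emul` registers (1, 2, 4 or 8). */
unsigned segment_field_reg(unsigned vd, unsigned field, unsigned emul)
{
    return vd + field * emul;
}

/* Hypothetical scalar model of an unmasked vredsum.vs at SEW=32.
 * Returns the new vd[0] = vs1[0] + vs2[0] + ... + vs2[vl-1], wrapping
 * modulo 2^32. Assumption: vl > 0. */
uint32_t model_vredsum_vs(const uint32_t *vs2, uint32_t vs1_0, size_t vl)
{
    uint32_t acc = vs1_0;
    for (size_t i = 0; i < vl; i++)
        acc += vs2[i];
    return acc;
}

/* Self-check against the register numbers that appear in this thread. */
void model_self_check(void)
{
    /* vlseg4e32.v v8,(a0) under e32,m2: fields start at v8, v10, v12, v14. */
    assert(segment_field_reg(8, 1, 2) == 10);
    assert(segment_field_reg(8, 3, 2) == 14);
    /* vlsseg3e8.v v4,(x5),x6 under LMUL=1: fields land in v4, v5, v6. */
    assert(segment_field_reg(4, 2, 1) == 6);

    /* 10 + (1 + 2 + 3 + 4) */
    const uint32_t vs2[4] = {1, 2, 3, 4};
    assert(model_vredsum_vs(vs2, 10, 4) == 20);
}
```

Under these assumptions, both register lists quoted in the thread fit the same rule: consecutive registers when LMUL is 1, and every second register when the vsetvli selects m2, as in the GCC output.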
_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel