On 6. Jul 23 20:39, Wu, Fei wrote:
> On 5/29/2023 11:29 AM, Wu, Fei wrote:
> > On 5/28/2023 1:06 AM, Petr Pavlu wrote:
> >> On 21. Apr 23 17:25, Jojo R wrote:
> >>> We consider to add RVV/Vector [1] feature in valgrind, there are some
> >>> challenges.
> >>> RVV like ARM's SVE [2] programming model, it's scalable/VLA, that
> >>> means the vector length is agnostic.
> >>> ARM's SVE is not supported in valgrind :(
> >>>
> >>> There are three major issues in implementing RVV instruction set in
> >>> Valgrind as following:
> >>>
> >>> 1. Scalable vector register width VLENB
> >>> 2. Runtime changing property of LMUL and SEW
> >>> 3. Lack of proper VEX IR to represent all vector operations
> >>>
> >>> We propose applicable methods to solve 1 and 2. As for 3, we explore
> >>> several possible but maybe imperfect approaches to handle different
> >>> cases.
> >>>
> I did a very basic prototype for vlen Vector-IR, particularly on RISC-V
> Vector (RVV):
>
> * Define new iops such as Iop_VAdd8/16/32/64, the difference from
>   existing SIMD version is that no element number is specified like
>   Iop_Add8x32
>
> * Define new IR type Ity_VLen along side existing types such as Ity_I64,
>   Ity_V256
>
> * Define new class HRcVecVLen in HRegClass for vlen vector registers
>
> The real length is embedded in both IROp and IRType for vlen ops/types,
> it's runtime-decided and already known when handling insn such as vadd,
> this leads to more flexibility, e.g. backend can issue extra vsetvl if
> necessary.
>
> With the above, RVV instruction in the guest can be passed from
> frontend, to memcheck, to the backend, and generate the final RVV insn
> during host isel, a very basic testcase has been tested.
>
> Now here comes to the complexities:
>
> 1. RVV has the concept of LMUL, which groups multiple (or partial)
>    vector registers, e.g. when LMUL==2, v2 means the real v2+v3. This
>    complicates the register allocation.
>
> 2. RVV uses the "implicit" v0 for mask, its content must be loaded to
>    the exact "v0" register instead of any other ones if host isel wants
>    to leverage RVV insn, this implicitness in ISA requires more
>    explicitness in Valgrind implementation.
>
> For #1 LMUL, a new register allocation algorithm for it can be added,
> and it will be great if someone is willing to try it, I'm not sure how
> much effort it will take. The other way is splitting it into multiple
> ops which only takes one vector register, taking vadd for example, 2
> vadd will run with LMUL=1 for one vadd with LMUL=2, this is still okay
> for the widening insn, most of the arithmetic insns can be covered in
> this way. The exception could be register gather insn vrgather, which we
> can consult other ways for it, e.g. scalar or helper.
>
> For #2 v0 mask, one way is to handle the mask in the very beginning at
> guest_riscv64_toIR.c, similar to what AVX port does:
>
> a) Read the whole dest register without mask
> b) Generate unmasked result by running op without mask
> c) Applying mask to a,b and generate the final dest
>
> by doing this, insn with mask is converted to non-mask ones, although
> more insns are generated but the performance should be acceptable. There
> are still exceptions, e.g. vadc (Add-with-Carry), v0 is not used as mask
> but as carry, but just as mentioned above, it's okay to use other ways
> for a few insns. Eventually, we can pass v0 mask down to the backend if
> it's proved a better solution.
>
> This approach will introduce a bunch of new vlen Vector IRs, especially
> the arithmetic IRs such as vadd, my goal is for a good solution which
> takes reasonable time to reach usable status, yet still be able to
> evolve and generic enough for other vector ISA. Any comments?

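For reference, the two workarounds quoted above (splitting an LMUL=2 vadd into per-register LMUL=1 ops, and turning a masked insn into the a/b/c read, compute, blend sequence) can be modelled in plain scalar C. This is only a sketch under stated assumptions, not Valgrind code: `VLEN_BYTES`, `vadd8_unmasked` and `masked_vadd8_group` are hypothetical names, registers are modelled as byte arrays, SEW is fixed at 8, vl and tail handling are ignored, and masked-off elements are assumed to keep the old destination value (mask-undisturbed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model parameter: bytes per vector register (VLENB). */
enum { VLEN_BYTES = 4 };

/* Step b: unmasked 8-bit add over ONE register, i.e. an LMUL=1 op. */
static void vadd8_unmasked(uint8_t *res, const uint8_t *a, const uint8_t *b)
{
    for (size_t i = 0; i < VLEN_BYTES; i++)
        res[i] = (uint8_t)(a[i] + b[i]);
}

/*
 * Masked vadd over a register group of `lmul` registers, expressed as
 * `lmul` independent LMUL=1 operations. The group is modelled as one
 * contiguous array of lmul * VLEN_BYTES elements. v0 holds one mask bit
 * per element, with element e using bit e of the mask register.
 */
static void masked_vadd8_group(uint8_t *vd, const uint8_t *vs2,
                               const uint8_t *vs1, const uint8_t *v0,
                               unsigned lmul)
{
    assert(lmul >= 1);

    for (unsigned r = 0; r < lmul; r++) {
        size_t base = (size_t)r * VLEN_BYTES;
        uint8_t old[VLEN_BYTES];
        uint8_t res[VLEN_BYTES];

        /* a) Read the whole dest register without mask. */
        for (size_t i = 0; i < VLEN_BYTES; i++)
            old[i] = vd[base + i];

        /* b) Generate the unmasked result. */
        vadd8_unmasked(res, vs2 + base, vs1 + base);

        /* c) Apply the mask to a and b to form the final dest. */
        for (size_t i = 0; i < VLEN_BYTES; i++) {
            size_t e = base + i;                      /* element index in group */
            int active = (v0[e / 8] >> (e % 8)) & 1;  /* mask bit for element e */
            vd[e] = active ? res[i] : old[i];
        }
    }
}
```

Note that each split op has to pick its mask bits at an offset into v0 (`base` above), so the second LMUL=1 op does not see the same mask bits as the first one. An insn like vadc, where v0 is a carry input rather than a mask, does not fit this blend pattern.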
Could you please share a repository with your changes or send them to me
as patches? I have a few questions but I think it might be easier for me
first to see the actual code.

Thanks,
Petr