On 7/11/2023 7:28 PM, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
>> On  6. Jul 23 20:39, Wu, Fei wrote:
>>> On 5/29/2023 11:29 AM, Wu, Fei wrote:
>>>> On 5/28/2023 1:06 AM, Petr Pavlu wrote:
>>>>> On 21. Apr 23 17:25, Jojo R wrote:
>>>>>> We are considering adding the RVV/Vector [1] feature to Valgrind, but
>>>>>> there are some challenges.
>>>>>> RVV, like ARM's SVE [2] programming model, is scalable/VLA, which means
>>>>>> the code is agnostic to the vector length.
>>>>>> ARM's SVE is not supported in Valgrind :(
>>>>>>
>>>>>> There are three major issues in implementing the RVV instruction set in
>>>>>> Valgrind:
>>>>>>
>>>>>> 1. Scalable vector register width VLENB
>>>>>> 2. Runtime changing property of LMUL and SEW
>>>>>> 3. Lack of proper VEX IR to represent all vector operations
>>>>>>
>>>>>> We propose applicable methods to solve 1 and 2. As for 3, we explore
>>>>>> several possible but maybe imperfect approaches to handle different cases.
>>>>>>
>>> I built a very basic prototype of a vlen Vector-IR, targeting RISC-V
>>> Vector (RVV) in particular:
>>>
>>> * Define new iops such as Iop_VAdd8/16/32/64. Unlike the existing SIMD
>>> iops such as Iop_Add8x32, they do not specify the number of elements.
>>>
>>> * Define new IR type Ity_VLen alongside existing types such as Ity_I64,
>>> Ity_V256
>>>
>>> * Define new class HRcVecVLen in HRegClass for vlen vector registers
>>>
>>> The real length is embedded in both IROp and IRType for vlen ops/types.
>>> It is decided at runtime and is already known when handling an insn such
>>> as vadd. This gives more flexibility, e.g. the backend can issue an extra
>>> vsetvl if necessary.
>>>
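A minimal sketch of the idea of carrying a runtime-decided length inside the op value itself, instead of fixing the lane count in the op name as Iop_Add8x32 does. Only the names Iop_VAdd8/16/32/64 come from the message above. The bit layout, the base op numbers, and the helper names are hypothetical and are not taken from the patch.

```python
# Hypothetical base opcodes for the length-free ops named in the message.
# The numeric values are made up for illustration only.
IOP_VADD8, IOP_VADD16, IOP_VADD32, IOP_VADD64 = 0x100, 0x101, 0x102, 0x103

VL_SHIFT = 16  # assumed layout: low 16 bits = base op, upper bits = length


def make_vlen_op(base_op, vl):
    """Pack the element count known at translation time into the op value."""
    if not 0 <= base_op < (1 << VL_SHIFT):
        raise ValueError("base op does not fit in the low bits")
    if vl < 0:
        raise ValueError("vl must be non-negative")
    return base_op | (vl << VL_SHIFT)


def split_vlen_op(op):
    """Recover (base_op, vl), e.g. so a backend could decide to emit vsetvl."""
    return op & ((1 << VL_SHIFT) - 1), op >> VL_SHIFT
```

With such a layout, the same base op with two different lengths yields two distinct op values, and a later stage can recover the length with split_vlen_op without any extra side table.
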
>>> With the above, an RVV instruction in the guest can be passed from the
>>> frontend, through memcheck, to the backend, which generates the final RVV
>>> insn during host isel. I have tested this with a very basic testcase.
>>>
>>> Now for the complexities:
>>>
>>> 1. RVV has the concept of LMUL, which groups multiple (or partial)
>>> vector registers, e.g. when LMUL==2, v2 means the real v2+v3. This
>>> complicates the register allocation.
>>>
>>> 2. RVV uses the "implicit" v0 for the mask. Its content must be loaded
>>> into exactly the "v0" register, not any other one, if host isel wants to
>>> leverage RVV insns. This implicitness in the ISA requires more
>>> explicitness in the Valgrind implementation.
>>>
>>> For #1 LMUL, a new register allocation algorithm could be added, and it
>>> would be great if someone is willing to try it; I'm not sure how much
>>> effort it will take. The other way is to split the op into multiple ops
>>> that each take only one vector register. Taking vadd as an example, one
>>> vadd with LMUL=2 becomes two vadds with LMUL=1. This still works for the
>>> widening insns, so most of the arithmetic insns can be covered this way.
>>> The exception could be the register gather insn vrgather, for which we
>>> can fall back to other ways, e.g. scalar code or a helper.
>>>
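The splitting described above can be sketched as a small semantic model: one vadd on a register group is replaced by one LMUL=1 vadd per register in the group. This is only an illustration, not the code from the patch. The register file is modelled as a list of lists, the function names are hypothetical, and elements beyond vl are assumed to be left untouched.

```python
def vadd_lmul1(regs, vd, vs2, vs1, vl, sew):
    """Single-register vadd: add the first vl elements, wrapping at sew bits."""
    wrap = (1 << sew) - 1
    for i in range(vl):
        regs[vd][i] = (regs[vs2][i] + regs[vs1][i]) & wrap


def vadd_grouped(regs, vd, vs2, vs1, vl, sew, lmul):
    """Model vadd with LMUL=lmul as up to lmul separate LMUL=1 vadds."""
    elems_per_reg = len(regs[0])
    for k in range(lmul):
        # Number of active elements that fall into the k-th register.
        part_vl = min(max(vl - k * elems_per_reg, 0), elems_per_reg)
        if part_vl == 0:
            break
        vadd_lmul1(regs, vd + k, vs2 + k, vs1 + k, part_vl, sew)


# Example: 4 elements per register, SEW=8, LMUL=2, vl=6.
regs = [[0] * 4 for _ in range(8)]
regs[2], regs[3] = [1, 2, 3, 4], [5, 6, 7, 8]      # group v2+v3
regs[4], regs[5] = [10, 20, 30, 253], [1, 1, 1, 1]  # group v4+v5
regs[7] = [9, 9, 9, 9]                              # old upper half of dest
vadd_grouped(regs, vd=6, vs2=2, vs1=4, vl=6, sew=8, lmul=2)
```

Because vadd is element-wise, each partial op only touches its own register of the group. That is also why an insn such as vrgather, which can read any element of the source group, does not fit this scheme.
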
>>> For #2 v0 mask, one way is to handle the mask at the very beginning, in
>>> guest_riscv64_toIR.c, similar to what the AVX port does:
>>>
>>> a) Read the whole dest register without mask
>>> b) Generate unmasked result by running op without mask
>>> c) Apply the mask to a and b to generate the final dest
>>>
>>> By doing this, insns with a mask are converted to unmasked ones. More
>>> insns are generated, but the performance should be acceptable. There
>>> are still exceptions, e.g. vadc (Add-with-Carry), where v0 is used not as
>>> a mask but as a carry; as mentioned above, it's okay to use other ways
>>> for a few insns. Eventually, we can pass the v0 mask down to the backend
>>> if that proves to be a better solution.
>>>
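Steps a) to c) above can be sketched as a small semantic model of a masked vadd. This is an illustration only, not the frontend code. The function names are hypothetical, v0 is passed as a plain integer whose bit i is the mask for element i, and inactive elements are assumed to keep their old value.

```python
def read_mask_bits(v0, vl):
    """Return the mask as a list: element i is governed by bit i of v0."""
    return [(v0 >> i) & 1 for i in range(vl)]


def masked_vadd(vd, vs2, vs1, v0, sew):
    """Masked vadd built from an unmasked add plus a final blend."""
    wrap = (1 << sew) - 1
    old = list(vd)                                         # a) read whole dest
    unmasked = [(x + y) & wrap for x, y in zip(vs2, vs1)]  # b) unmasked result
    mask = read_mask_bits(v0, len(old))
    # c) pick the new value where the mask bit is set, the old one otherwise
    return [new if m else keep for keep, new, m in zip(old, unmasked, mask)]


result = masked_vadd([9, 9, 9, 9], [1, 2, 3, 4], [10, 20, 30, 40], 0b0101, 8)
```

Here elements 0 and 2 are active, so result is [11, 9, 33, 9]. An insn such as vadc does not fit this model, because there v0 supplies a carry input rather than selecting elements.
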
>>> This approach will introduce a bunch of new vlen Vector IRs, especially
>>> arithmetic IRs such as vadd. My goal is a good solution that takes a
>>> reasonable time to reach usable status, yet is still able to evolve and
>>> is generic enough for other vector ISAs. Any comments?
>>
>> Could you please share a repository with your changes or send them to me
>> as patches? I have a few questions but I think it might be easier for me
>> first to see the actual code.
>>
> Please see the attachment. It's a very raw version, just to verify the
> idea. The mask is not added yet but is expected to be done as mentioned
> above. It's based on commit 71272b2529 on your branch; patch 0013 is the key.
> 
Hi Petr,

Have you taken a look? Any comments?

Thanks,
Fei.

> btw, I will set up a repository, but it will take a few days to pass the
> internal process.
> 
> Thanks,
> Fei.
> 
>> Thanks,
>> Petr



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
