On 6/1/2023 7:13 PM, LATHUILIERE Bruno via Valgrind-developers wrote:
> 
> -------- Courriel original --------
> Objet: Re: [Valgrind-developers] RFC: support scalable vector model / riscv 
> vector
> Date: 2023-05-29 05:29
> De: "Wu, Fei" <fei2...@intel.com>
> À: Petr Pavlu <petr.pa...@dagobah.cz>, Jojo R <rjie...@gmail.com>
> Cc: pa...@sourceware.org, yunhai....@alibaba-inc.com, 
> valgrind-develop...@lists.sourceforge.net,
> valgrind-users@lists.sourceforge.net, zhaomingxin....@alibaba-inc.com
> 
>> On 5/28/2023 1:06 AM, Petr Pavlu wrote:
>>> On 21. Apr 23 17:25, Jojo R wrote:
>>>> The last remaining big issue is 3, for which we introduce some
>>>> ad-hoc approaches. We summarize these approaches into the following
>>>> three types:
>>>>
>>>> 1. Break down a vector instruction to scalar VEX IR ops.
>>>> 2. Break down a vector instruction to fixed-length VEX IR ops.
>>>> 3. Use dirty helpers to realize vector instructions.
>>>
>>> I would also look at adding new VEX IR ops for scalable vector 
>>> instructions. In particular, if it could be shown that RVV and SVE can 
>>> use the same new ops, that would make a good argument for adding them.
>>>
>>> Perhaps interesting is if such new scalable vector ops could also 
>>> represent fixed operations on other architectures, but that is just me 
>>> thinking out loud.
>>>
>> It's a good idea to consolidate all vector/SIMD handling; the challenge 
>> is to verify its feasibility and to speed up the adaptation, as it is 
>> likely to take more effort and a longer time. Is there anyone with 
>> knowledge or experience of other ISAs such as AVX/SVE on Valgrind who can 
>> share the pain and gain, or should we do a quick prototype?
>>
>> Thanks,
>> Fei.
> 
> Hi,
> 
> I don't know if my experience is what you expect, but I will try to 
> share it.

Hi Bruno,

Thank you for sharing this; it's definitely worth reading.

> I'm the main developer of a Valgrind tool called Verrou (URL: 
> https://github.com/edf-hpc/verrou ), which currently only works on the 
> x86_64 architecture.
> From the user's point of view, Verrou makes it possible to estimate the 
> effect of floating-point rounding-error propagation (if you are interested 
> in the subject, there are documentation and publications). 
> 
It looks interesting, good job.

> From the Valgrind tool developer's point of view, we need to replace every 
> floating-point operation (fpo) with our own modified fpo implemented as C++ 
> functions. Each C++ function takes 1, 2 or 3 floating-point input values 
> and produces one floating-point output value. 
> 
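If I understand the mechanism correctly, the replacement could be modeled in plain C roughly like this (the function names are hypothetical, not your actual code, and a plain add stands in for the modified fpo):

```c
/* Hypothetical model of a Verrou-style replacement: a floating-point op
   (here a double add) is routed through a C function instead of being
   executed natively, so the tool gets a hook on every operation. */
typedef double (*binop_fn)(double, double);

static double vr_add_double(double a, double b)
{
    /* the real tool would perturb the rounding / the result here;
       this sketch just performs the plain operation */
    return a + b;
}

/* In the instrumented IR, the add op would be rewritten into a call to
   the helper instead of a native add. */
static double run_replaced_add(double a, double b)
{
    binop_fn op = vr_add_double;
    return op(a, b);
}
```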
Do you use libvex_BackEnd() to translate the insns to the host, e.g.
host_riscv64_isel.c to select the host insns? Is there any difference in
the processing flow between Verrou and Memcheck?

> As we have to replace all VEX fpos, the way SSE and AVX are handled has 
> consequences for us. For each kind of fpo 
> (add,sub,mul,div,sqrt)x(float,double), we have to replace the VEX ops for 
> the following variants: scalar, SSE low lane, SSE, AVX. It is painful but 
> possible via code generation. Thanks to the multiple VEX ops it is possible 
> to select only one type of instruction (this can be useful to 1. get a 
> speed-up, 2. know whether floating-point errors come from scalar or vector 
> instructions).
> 
> On the other hand, for fma operations (madd,msub)x(float,double) we have 
> less work to do, as Valgrind does the un-vectorisation for us, but it is 
> then impossible to selectively instrument scalar or vector ops.

As these insns are un-vectorised, are there any other issues besides
points 1 (performance) and 2 (original type) mentioned above? I want to
make sure whether there is any risk in the un-vectorisation design, e.g.
when the vector length is large, such as a 2k vlen on RVV.

> One could think that the multiple VEX ops would enable performance 
> improvements via vectorisation of the C++ calls, but this is not currently 
> possible (at least to my knowledge). Indeed, with the Valgrind API I don't 
> know how to get the floating-point values from the register without 
> applying un-vectorisation: to get the values from an AVX register, I emit 
> an awful sequence of Iop_V256to64_0, Iop_V256to64_1, Iop_V256to64_2, 
> Iop_V256to64_3 for the 2 arguments. As it is not possible to make an 
> IRStmt_Dirty call to a function with 9 args (9 = 2*4 + 1: 2 for a binary 
> operation, 4 for the vector length and 1 for the result), I make a first 
> call to copy the 4 values of the first argument somewhere, then a second 
> one to perform the 4 C++ calls.
> Due to the algorithm inside the C++ calls it could be tricky to vectorise, 
> but I didn't even try because of the Iop_V256to64_* sequence.
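To check my understanding of that two-call workaround, here is how I would model it in plain C (the helper names are hypothetical stand-ins for your dirty helpers, and a plain add stands in for the C++ fpo):

```c
#include <assert.h>

/* Stash for the first V256 argument: the first dirty call copies its
   four 64-bit lanes here, since a single call cannot take all 9 args. */
static double stash[4];

/* Model of the first dirty call: save the 4 lanes of argument A. */
static void helper_save_arg(double a0, double a1, double a2, double a3)
{
    stash[0] = a0; stash[1] = a1; stash[2] = a2; stash[3] = a3;
}

/* Model of the second dirty call: combine the stashed lanes with the
   lanes of argument B and perform the 4 scalar calls, writing the
   result lanes to res. */
static void helper_apply(double b0, double b1, double b2, double b3,
                         double *res)
{
    double b[4] = { b0, b1, b2, b3 };
    for (int lane = 0; lane < 4; lane++)
        res[lane] = stash[lane] + b[lane]; /* stand-in for the C++ fpo */
}
```

If that matches what you do, the extra memory round-trip through the stash is exactly the overhead the Iop_V256to64_* sequence forces on you.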

For Memcheck, the process, put simply, is as follows:
    toIR -> instrumentation -> backend isel

If the vector insn is split into scalars at the toIR stage, just as I
did in this series, the advantage looks obvious: I only need to deal
with this single stage and can leverage the existing code to handle the
scalar version. The disadvantage is that it might lose some
opportunities to optimize, e.g.:
* toIR - it introduces extra temp variables for the generated scalars
* instrumentation - for Memcheck, the key is to trace the V+A bits
instead of the real results of the ops; the ideal case is that the V+A
bits of the whole vector can be checked together w/o breaking it into
scalars
* backend isel - the ideal case is to use host vector insns for guest
vector insns, but I'm not sure how much effort that will take.
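To make the trade-off concrete, approach 1 (the toIR split) can be modeled in plain C roughly as below; VLEN and the names are hypothetical, and each loop iteration stands for one extra IR temp plus one scalar op:

```c
#include <stdint.h>

#define VLEN 4 /* hypothetical number of 64-bit lanes in the guest vector */

/* Model of splitting a guest vector add into VLEN scalar ops at toIR:
   each iteration corresponds to one generated IR temp plus one scalar
   Iop_Add64, which is what inflates both the IR and the per-lane V-bit
   tracking that Memcheck then has to instrument. */
static void vadd_split(const uint64_t *a, const uint64_t *b, uint64_t *dst)
{
    for (int lane = 0; lane < VLEN; lane++)
        dst[lane] = a[lane] + b[lane];
}
```

With a large vlen (e.g. 2k bits, i.e. 32 lanes of 64 bits), this one guest insn already expands into 32 scalar ops plus their temps, which is where my performance concern above comes from.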

> In my dreams, I would like an Iop_ that converts a V256 or V128 value to 
> an aligned pointer to the floating-point args. 
> 
> So, I don't know if my experience can be useful to you, but if someone has 
> a better solution for my needs, it will be useful at least ... to me :)
> 
Thank you again for sharing this. I hope the discussion can help both of
us, and others.

Best regards,
Fei.

> Best regards,
> Bruno Lathuilière
> 
> 
> 
> 
> 
> This message and any attachments (the 'Message') are intended solely for the 
> addressees. The information contained in this Message is confidential. Any 
> use of information contained in this Message not in accord with its purpose, 
> any dissemination or disclosure, either whole or partial, is prohibited 
> except formal approval.
> 
> If you are not the addressee, you may not copy, forward, disclose or use any 
> part of it. If you have received this message in error, please delete it and 
> all copies from your system and notify the sender immediately by return 
> message.
> 
> E-mail communication cannot be guaranteed to be timely secure, error or 
> virus-free.
> 
> 
> 
> _______________________________________________
> Valgrind-developers mailing list
> valgrind-develop...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
