On 6/1/2023 7:13 PM, LATHUILIERE Bruno via Valgrind-developers wrote:
>
> -------- Original Message --------
> Subject: Re: [Valgrind-developers] RFC: support scalable vector model / riscv vector
> Date: 2023-05-29 05:29
> From: "Wu, Fei" <fei2...@intel.com>
> To: Petr Pavlu <petr.pa...@dagobah.cz>, Jojo R <rjie...@gmail.com>
> Cc: pa...@sourceware.org, yunhai....@alibaba-inc.com,
> valgrind-develop...@lists.sourceforge.net,
> valgrind-users@lists.sourceforge.net, zhaomingxin....@alibaba-inc.com
>
>> On 5/28/2023 1:06 AM, Petr Pavlu wrote:
>>> On 21. Apr 23 17:25, Jojo R wrote:
>>>> The last remaining big issue is 3, for which we introduce some
>>>> ad-hoc approaches. We summarize these approaches into three types,
>>>> as follows:
>>>>
>>>> 1. Break down a vector instruction into scalar VEX IR ops.
>>>> 2. Break down a vector instruction into fixed-length VEX IR ops.
>>>> 3. Use dirty helpers to realize vector instructions.
>>>
>>> I would also look at adding new VEX IR ops for scalable vector
>>> instructions. In particular, if it could be shown that RVV and SVE
>>> can use the same new ops, that would make a good argument for adding
>>> them.
>>>
>>> Perhaps also interesting is whether such new scalable vector ops
>>> could represent fixed-length operations on other architectures, but
>>> that is just me thinking out loud.
>>>
>> It's a good idea to consolidate all vector/simd handling together; the
>> challenge is to verify its feasibility and to speed up the adaptation
>> process, as it's expected to take more effort and a longer time. Could
>> anyone with knowledge or experience of other ISAs such as AVX/SVE on
>> Valgrind share the pain and the gain? Or we could do a quick
>> prototype.
>>
>> Thanks,
>> Fei.
>
> Hi,
>
> I don't know if my experience is the one you expect, but nevertheless I
> will try to share it.
Hi Bruno,

Thank you for sharing this, it's definitely worth reading.

> I'm the main developer of a valgrind tool called verrou (url:
> https://github.com/edf-hpc/verrou ) which currently only works with the
> x86_64 architecture.
> From the user's point of view, verrou makes it possible to estimate the
> effect of floating-point rounding-error propagation (if you are
> interested in the subject, there are documentation and publications).

It looks interesting, good job.

> From the valgrind tool developer's point of view, we need to replace
> all floating-point operations (fpo) with our own modified fpo
> implemented as C++ functions. One C++ function has 1, 2 or 3
> floating-point input values and one floating-point output value.

Do you use libvex_BackEnd() to translate the insn to the host, e.g.
host_riscv64_isel.c to select the host insn? Is there any difference in
the processing flow between verrou and memcheck?

> As we have to replace all VEX fpo, the way we handle SSE and AVX has
> consequences for us. For each kind of fpo
> (add,sub,mul,div,sqrt)x(float,double), we have to replace the VEX op
> for the following variants: scalar, SSE low lane, SSE, AVX. It is
> painful but possible via code generation. Thanks to the multiple VEX
> ops it is possible to select only one type of instruction (which can be
> useful to 1- get a speedup, 2- know whether floating-point errors come
> from scalar or vector instructions).
>
> On the other hand, for fma operations (madd,msub)x(float,double) we
> have less work to do, as valgrind does the un-vectorisation for us, but
> it is impossible to selectively instrument scalar or vector ops.

As these insns are un-vectorised, are there any other issues besides the
1 (performance) and 2 (original type) mentioned above? I want to make
sure whether there is any risk in the un-vectorisation design, e.g. when
the vector length is large, such as a 2K vlen on RVV.
> We could think that the multiple VEX ops would enable performance
> improvements via vectorisation of the C++ calls, but that is not
> currently possible (at least to my knowledge). Indeed, with the
> valgrind API I don't know how to get the floating-point values in the
> register without applying un-vectorisation: to get the values in the
> AVX register, I do an awful sequence of Iop_V256to64_0, Iop_V256to64_1,
> Iop_V256to64_2, Iop_V256to64_3 for the 2 arguments. As it is not
> possible to do an IRStmt_Dirty call with a function with 9 args
> (9 = 2*4 + 1: 2 operands of a binary operation, 4 for the vector
> length, and 1 for the result), I do a first call to copy the 4 values
> of the first arg somewhere, then a second one to perform the 4 C++
> calls.
> Due to the algorithm inside the C++ calls it could be tricky to
> vectorise, but I didn't even try because of the sequence of
> Iop_V256to64_*.

For memcheck, the process is as follows, to put it simply:

    toIR -> instrumentation -> Backend isel

If the vector insn is split into scalars at the toIR stage, just as I
did in this series, the advantage looks obvious, as I only need to deal
with this single stage and can leverage the existing code to handle the
scalar version. The disadvantage is that it might lose some
opportunities to optimize, e.g.:

* toIR - it introduces extra temp variables for the generated scalars.
* instrumentation - for memcheck, the key is to trace the V+A bits
  instead of the real results of the ops; the ideal case is that the
  V+A bits of the whole vector can be checked together without breaking
  it into scalars.
* Backend isel - the ideal case is to use a vector insn on the host for
  a guest vector insn, but I'm not sure how much effort it will take to
  achieve this.

> In my dreams I would like an Iop_ to convert a V256 or V128 type to an
> aligned pointer to the floating-point args.
>
> So, I don't know if my experience can be useful for you, but if someone
> has a better solution to my needs it will be useful at least ... to me
> :)

Thank you again for sharing this.
I hope the discussion can help both of us, and others.

Best regards,
Fei.

> Best regards,
> Bruno Lathuilière
> _______________________________________________
> Valgrind-developers mailing list
> valgrind-develop...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users