On 11. Jul 23 19:28, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
> > On 6. Jul 23 20:39, Wu, Fei wrote:
> >> [...]
> >>
> >> This approach will introduce a bunch of new vlen Vector IRs, especially
> >> the arithmetic IRs such as vadd, my goal is for a good solution which
> >> takes reasonable time to reach usable status, yet still be able to
> >> evolve and generic enough for other vector ISA. Any comments?
Personally, this looks to me like the right direction. Supporting scalable vector extensions in Valgrind as a first-class citizen would be my preferred choice. I think it is something that will be needed to handle Arm SVE and RISC-V RVV well. On the other hand, it is likely the most complex approach and could take time to iron out.

> > Could you please share a repository with your changes or send them to me
> > as patches? I have a few questions but I think it might be easier for me
> > first to see the actual code.
> >
> Please see attachment. It's a very raw version to just verify the idea,
> mask is not added but expected to be done as mentioned above, it's based
> on commit 71272b2529 on your branch, patch 0013 is the key.

Thanks for sharing this code.

The previous discussions and this series introduce a new concept: translating client code per some CPU state. That is what I spent most of my time thinking about. I can see that it is indeed necessary for RVV. In particular, this "versioning" of translations allows Valgrind IR to statically express the element type of each vector operation, i.e. that it is an operation on I32, F64, ... An alternative would be to try to express the type dynamically in IR. That should still be somewhat manageable in the toIR frontend, but I have a hard time seeing how it would work for the instrumentation and codegen.

Versioning should work well for RVV translations because I expect that most RVV loops will consist of a vsetvli instruction (with a static vtype) followed by some actual vector operations. Such a block then requires only one translation. This is, however, true only if translations are versioned just per vtype, not per vl. If I understood correctly, the patches version them per vl too, but it isn't clear to me conceptually whether this is really necessary. For instance, I think VAdd8 could look as follows:

  VAdd8(<len>, <in1>, <in2>, <flags?>)

where <len> is something like IRExpr_Get(OFFB_VL, Ity_I64).
Another problem I noticed is that blocks containing no RVV instructions are also versioned. Consider the following:

  while (true) {
      // (1) some RVV code which can set vtype to different values
      // (2) a large chunk of non-RVV code
  }

The code in (2) will currently get a separate, identical translation for each value that (1) leaves in vtype.

In general, I think the concept of allowing translations per some CPU state could be useful in other cases and for other architectures too. For RISC-V, it could be beneficial for floating-point operations. I expect that regular RISC-V FP code will have instructions encoded with rm=DYN and will always be executed with frm=RNE. The current approach is that the toIR frontend generates IR which reads the rounding mode from frm and remaps it to Valgrind's representation; the codegen then does the opposite. The idea here is that the frontend would know the actual rounding mode and could create IR which uses this mode directly, for instance, AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen then doesn't need to know how to handle any dynamic rounding modes, as they become static.

I plan to look further into this series. Specifically, I'd like to have a stab at adding some basic support for Arm SVE to get a better understanding of whether this is generic enough.

Thanks,
Petr

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users