On 11. Jul 23 19:28, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
> > On  6. Jul 23 20:39, Wu, Fei wrote:
> >> [...]
> >> 
> >> This approach will introduce a bunch of new vlen Vector IRs, especially
> >> the arithmetic IRs such as vadd, my goal is for a good solution which
> >> takes reasonable time to reach usable status, yet still be able to
> >> evolve and generic enough for other vector ISA. Any comments?

This personally looks to me like the right direction. Supporting scalable
vector extensions in Valgrind as a first-class citizen would be my
preferred choice. I think it is something that will be needed to handle
Arm SVE and RISC-V RVV well. On the other hand, it is likely the most
complex approach and could take time to iron out.

> > Could you please share a repository with your changes or send them to me
> > as patches? I have a few questions but I think it might be easier for me
> > first to see the actual code.
> > 
> Please see attachment. It's a very raw version to just verify the idea,
> mask is not added but expected to be done as mentioned above, it's based
> on commit 71272b2529 on your branch, patch 0013 is the key.

Thanks for sharing this code. The previous discussions and this series
introduce a new concept of translating client code per some CPU state.
That is what I spent the most time thinking about.

I can see it is indeed necessary for RVV. In particular, this
"versioning" of translations allows the Valgrind IR to statically
express the element type of each vector operation, i.e. that it is an
operation on I32, F64, ... An alternative would be to try to express the
type dynamically in the IR. That should still be somewhat manageable in
the toIR frontend, but I have a hard time seeing how it would work for
the instrumentation and codegen.

The versioning should work well for RVV translations because my
expectation is that most RVV loops will consist of a call to vsetvli
(with a static vtype), followed by some actual vector operations. Such
a block then requires only one translation.

This is, however, true only if translations are versioned just per
vtype, without vl. If I understood correctly, the patches version them
per vl too, but it isn't clear to me conceptually whether this is really
necessary.

For instance, I think VAdd8 could then look as follows:
VAdd8(<len>, <in1>, <in2>, <flags?>), where <len> is something like
IRExpr_Get(OFFB_VL, Ity_I64).
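
To make that a bit more concrete, here is a rough sketch of how the toIR
frontend might build such an expression. Iop_VAdd8 and OFFB_VL are only
placeholder names, and vs1/vs2 stand for IRTemps holding the source
operands:

   /* Variable-length add: the active vector length is read dynamically
      from the guest state instead of being baked into the translation. */
   IRExpr* vl  = IRExpr_Get(OFFB_VL, Ity_I64);
   IRExpr* res = IRExpr_Triop(Iop_VAdd8, vl, mkexpr(vs1), mkexpr(vs2));

That way a translation would only depend on vtype, while vl stays a
run-time input.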

Another problem I noticed is that blocks containing no RVV
instructions are also versioned. Consider the following:
while (true) {
   // (1) some RVV code which can set vtype to different values
   // (2) a large chunk of non-RVV code
}

The code in (2) will currently get multiple identical translations, one
for each vtype value left behind by (1).

In general, I think the concept of allowing translations per some CPU
state could be useful in other cases and for other architectures too.
For RISC-V, it could be beneficial for floating-point operations. My
expectation is that regular RISC-V FP code will have instructions
encoded with rm=DYN and always executed with frm=RNE. The current
approach is that the toIR frontend generates IR which reads the rounding
mode from frm and remaps it to Valgrind's representation. The codegen
then does the opposite. The idea here is that the frontend would know
the actual rounding mode and could create IR which carries this mode
directly, for instance AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen
then doesn't need to know how to handle any dynamic rounding modes
because they become static.
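
For illustration, a rough sketch of the two variants; the dynamic-read
helper is only a placeholder and a/b stand for IRTemps holding the
operands:

   /* Today: the rounding mode is read from frm at run time and remapped
      to Valgrind's encoding before being handed to the FP op. */
   IRExpr* rm_dyn = get_ir_rounding_mode_from_frm();  /* placeholder */
   IRExpr* r1 = IRExpr_Triop(Iop_AddF64, rm_dyn, mkexpr(a), mkexpr(b));

   /* With per-frm versioning: the frontend knows frm=RNE for this
      translation and can emit the rounding mode as a constant. */
   IRExpr* r2 = IRExpr_Triop(Iop_AddF64,
                             IRExpr_Const(IRConst_U32(Irrm_NEAREST)),
                             mkexpr(a), mkexpr(b));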

I plan to look further into this series. Specifically, I'd like to take
a stab at adding some basic support for Arm SVE to get a better
understanding of whether this approach is generic enough.

Thanks,
Petr
