Re: [Valgrind-developers] binary code optimization

Nicholas Nethercote Mon, 10 Mar 2008 13:52:27 -0700

On Sun, 9 Mar 2008, Nuno Lopes wrote:

> I'm currently attending a Virtual Execution Environment class at the
> university this semester. For this class I need to do a project (with 2
> colleagues) about something related with the class (VMs, etc..). The project
> duration is about 1 month.
> My idea (still pending the teacher's authorization) was to do something with
> valgrind, more specifically implement some binary code optimization over the
> VEX IR. Do you think this is feasible in the given time frame? And which
> optimizations are currently implemented and which ones do you think are
> interesting to implement?
>
> We are also open to other ideas (as long as they fit in the class subject).
> Of course we'll give back the code when we finish the project.


I think it's certainly feasible, and a nice idea for a project.

http://www.valgrind.org/docs/valgrind2007.pdf has some info on optimisations 
performed in section 3.7.  Note especially that there are two optimisation 
phases.  You'll want to read the surrounding sections (and probably the 
whole paper) to make sense of it.

VEX/priv/ir/iropt.c is the main file controlling Vex optimisation, I 
believe.  It has this comment near the top:

    Transformation order
    ~~~~~~~~~~~~~~~~~~~~

    There are three levels of optimisation, controlled by
    vex_control.iropt_level.  Define first:

    "Cheap transformations" are the following sequence:
       * Redundant-Get removal
       * Redundant-Put removal
       * Constant propagation/folding
       * Dead code removal
       * Specialisation of clean helper functions
       * Dead code removal

    "Expensive transformations" are the following sequence:
       * CSE
       * Folding of add/sub chains
       * Redundant-GetI removal
       * Redundant-PutI removal
       * Dead code removal

    Then the transformations are as follows, as defined by
    vex_control.iropt_level:

    Level 0:
       * Flatten into atomic form.

    Level 1: the following sequence:
       * Flatten into atomic form.
       * Cheap transformations.

    Level 2: the following sequence
       * Flatten into atomic form.
       * Cheap transformations.
       * If block contains any floating or vector types, CSE.
       * If block contains GetI or PutI, Expensive transformations.
       * Try unrolling loops.  Three possible outcomes:
         - No effect: do nothing more.
         - Unrolled a loop, and block does not contain GetI or PutI:
           Do: * CSE
               * Dead code removal
         - Unrolled a loop, and block contains GetI or PutI:
           Do: * Expensive transformations
               * Cheap transformations

The comments in VEX/pub/libvex_ir.h is the best documentation for Vex.

As the Valgrind paper says, these are fairly heavyweight optimisations for a 
binary translation system.  You could try to do some more compiler-style 
optimisations, but I think the scope for improvement there is not so great. 
But I could be wrong -- Julian, what do you think?

A very interesting project you could be try would be to implement chaining 
for Vex -- the Valgrind paper (above) talks about this.  (Section 2.3.6 of 
http://www.valgrind.org/docs/phd2004.pdf discusses the old implementation of 
chaining that was in pre-Vex Valgrind -- you can see the actual 
implementation in Valgrind 2.4.1 at 
http://www.valgrind.org/downloads/old.html.)

Another interesting one would be to try to stage the compiler somehow -- 
recompile hot code fragments more aggressively, as frameworks like Pin does.

Both of those would be quite challenging, though, because they would require 
changing how Valgrind manages its translated code blocks.  Just doing new 
intra-block optimisations would be easier because you'd just be adding more 
optimisation passes (but less likely to achieve interesting results, as I 
said).

An important question you should consider early is what your goals are. 
More specifically, are you trying to speed up Valgrind's code when no 
instrumentation is present?  In one way, that's the obvious thing to do, but 
it's also the least interesting thing to do (the paper talks about this in 
section 5.4).  A more interesting thing is to speed up any of the real 
existing tools, especially Memcheck, since that's the most widely used.

As for giving the code you develop back, there's a complication -- Julian is 
the sole author and thus owns the copyright for the Vex part of Valgrind. 
This means that accepting external contributions for Vex is more complicated 
than the rest of Valgrind.  He might have more to add.

If you have more questions, feel free to ask.

Nick

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Valgrind-developers mailing list
Valgrind-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-developers

Re: [Valgrind-developers] binary code optimization

Reply via email to