On Monday 03 December 2007 20:29, Dave Nomura wrote:
> So you tracked these uninitialized values down to the strxxx
> functions defined in ld.so, and Valgrind normally intercepts these calls
> because Memcheck can't handle the sort of code that is generated for
> these routines?

Correct.

> Is it possible to teach Memcheck to deal with these optimizations?
>
> Steve Munroe, the author of those optimized strxxx functions, tells me
> that the kinds of optimizations done for these routines are going to
> start appearing in other library routines, and possibly in generated
> object code so the problem is going to become more pervasive.

You're in the land of difficult tradeoffs.  A lot of effort has 
already been applied here.

All these optimised, vectorised (effectively) string ops rely on two
techniques:

(1) using properties of carry-chain propagation in addition/subtraction
    so as to find out whether any byte in a word is zero, and if so
    which one

(2) reading (traditional C-style zero-terminated) strings using 
    aligned word reads, rather than byte reads
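
For concreteness, here is a minimal C sketch of the two techniques. It is
not the glibc or ld.so code; the names has_zero_byte and word_strlen are
made up for illustration, and a 32-bit word is assumed. The subtraction
borrows out of any byte that is zero, which sets that byte's top bit in
the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x01010101u   /* 0x01 in every byte */
#define HIGHS 0x80808080u   /* 0x80 in every byte */

/* Technique (1): nonzero iff at least one byte of w is 0x00.
   Subtracting 1 from each byte borrows through a zero byte and sets
   its top bit; "& ~w" discards bytes whose top bit was already set. */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Technique (2): hypothetical strlen that scans with aligned 4-byte reads. */
static size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Step bytewise until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Aligned word reads. The last read may cover bytes that lie after
       the terminator, i.e. possibly outside the allocated block. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);   /* typically compiled to one load */
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* Find which byte of the final word is the terminator. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The final word read is the one that matters for what follows: it stays
inside one aligned word, so it cannot fault, but it may still touch bytes
the program never allocated.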


(1) fools Memcheck's normal handling of definedness tracking for
    adds/subtracts, causing it to believe the result of the add/subtract
    is completely undefined, when it isn't really.  In fact Memcheck
    can and sometimes does generate a more exact interpretation, which
    does handle this case correctly.

    The problem is deciding when to apply it.  The standard analysis 
    costs about 3 insns in the generated code, and the exact analysis
    more than 10 insns (+ more registers).  Applying the expensive case
    throughout would cause significant slowdowns to the 99.99% of code
    fragments for which the standard handling is perfectly adequate.

(2) causes Memcheck to report invalid address errors for the partial
    word loads covering the zero terminating bytes at the end of
    strings.  You can stop it complaining about this by giving
    --partial-loads-ok=yes, but that could cause genuine errors to
    be missed.  Said flag is not enabled by default.

    I realise that (2) is "perfectly safe" in that the word-sized loads
    are naturally aligned and so cannot possibly cause any page faults 
    that would not otherwise occur.  Nevertheless, any way you slice it,
    ISO C/C++ says that reading memory outside of allocated blocks
    counts as undefined behaviour (IIUC), and that's precisely what 
    Memcheck aims to report.
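
To make the cost difference in (1) concrete, here is a simplified C model
of the two ways of propagating definedness through a 32-bit add. It is a
sketch, not Memcheck's source: the function names are hypothetical, and it
assumes one shadow bit per data bit, with 1 meaning "undefined". The cheap
rule assumes a carry out of any undefined bit could disturb every bit to
its left. The exact rule adds the smallest and largest values the operands
could take and marks only the bits where those two sums differ.

```c
#include <stdint.h>

/* Cheap rule: OR the shadow words, then smear undefinedness leftwards
   from the lowest undefined bit (v | -v). Roughly three operations. */
static uint32_t vbits_add_cheap(uint32_t va, uint32_t vb)
{
    uint32_t v = va | vb;
    return v | (0u - v);
}

/* Exact rule: a and b are the operand values, va and vb their shadow
   words. Undefined bits are forced to 0 for the minimum and to 1 for
   the maximum; result bits that differ between the two sums, plus the
   undefined input bits themselves, are undefined. */
static uint32_t vbits_add_exact(uint32_t a, uint32_t va,
                                uint32_t b, uint32_t vb)
{
    uint32_t a_min = a & ~va, a_max = a | va;
    uint32_t b_min = b & ~vb, b_max = b | vb;
    return (va | vb) | ((a_min + b_min) ^ (a_max + b_max));
}
```

Take the zero-byte test's subtraction of 0x01010101, written as adding
0xFEFEFEFF, applied to the word 0x616200?? whose least significant byte is
undefined (shadow 0x000000FF). The cheap rule reports all 32 bits as
undefined. The exact rule reports 0x000001FF, leaving the upper bytes
defined, at the price of several extra operations and temporaries.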


We have never claimed that Memcheck is suitable for code compiled at
-O2 and above.  -O is the max recommended level.  I would advocate the
following:

* do not allow gcc to inline stringops at -O, only at -O2 and above

* do not strip all symbol names off ld.so


In short there's a conflict between optimising the hell out of stringops
and having enough visibility for reliable debugging.  Given the above
constraints I don't see how you can have your cake and eat it.

Note that none of the above is PPC specific -- it also applies to
x86/amd64.  I'm not sure why these problems appear more acute on ppc
-- it may be some interaction between the carry-chain propagation
games and the fact that ppc is big-endian.

J

_______________________________________________
Valgrind-developers mailing list
Valgrind-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-developers
