OK - to follow up on a blast from the past: On Tue, 11 Dec 2007, Patrick Smears wrote:
> Hi all,
>
>> Sorry not to have been very helpful on this so far.
>>
>> I can't claim to have great insight into the robustness aspects
>> of the NON_SIMD_CALL mechanism. However, looking at your original
>> message, it occurs to me that your prospects of that working are
>> improved if you ensure that the NON_SIMD_CALL'd code (test_func)
>> does not refer to any global variables, and certainly does not call
>> any libc or other functions (printf et al). Any kind of entanglement
>> with libc or dynamic linking is likely to have a bad outcome, for
>> gnarly reasons which we've grappled with a lot in the distant past.
>>
>> If you simply make test_func be a wrapper around a LOCK-prefixed
>> instruction and literally nothing else, your prospects might improve
>> (or not, YMMV :-) Worth a try, I'd say.
>>
>> AIUI the lock-prefixed insns (etc) are actually the only things that
>> you absolutely can't run on the simulator, right?
>
> That's certainly my understanding. The LOCK instructions are buried
> several function layers down from where they'd most conveniently be
> wrapped, amongst a number of libc calls, so extracting them isn't
> trivial (alas), but certainly possible.
>
> I'll put my thinking cap on about this and the other suggestions -
> thanks to all who have responded! - and try to figure out what will
> work best for this particular situation. If I come up with anything
> that might be of general use, I'll report back...

OK, for the record, here is what I did.

To recap for anyone who missed the original discussion: the problem was that I wanted to run Valgrind on a process that communicates with a number of other processes via shared memory, and uses synchronisation primitives (mutexes etc.) held in that shared memory. The synchronisation primitives rely on certain assembler instructions that perform atomic operations (e.g. read a location, and if its value is equal to register A, set it to register B, without letting anyone change it between the comparison with A and the store of B). Because of the way Valgrind works (simulating instructions), the atomic instructions get broken up into separate loads and stores, so they are no longer atomic (i.e. another process can change the location in question after it has been compared with register A but before it is set to register B). The upshot is that running under Valgrind introduces race conditions that do not otherwise exist - and in my case these led to the process becoming deadlocked :-(.

A number of potential solutions were suggested on the mailing list (thanks!), but the one I decided to go with was the one from Julian quoted above: replace each atomic instruction with a function that does nothing but execute that atomic instruction, and call that function via one of the VALGRIND_NON_SIMD_CALL*() macros. These run the function on the 'real' CPU - rather than simulating it - so the atomicity is preserved. Of course, this means that no tracking is done of memory/cache/whatever touched by the atomic instruction, but that is acceptable in this application (and probably most others).

Now, the locking primitives used by the application under test are (wrappers round wrappers round) the standard pthread_*() calls, which are provided (on Linux) by glibc, in the form of the NPTL libpthread library[1]. So the easiest[2] thing to do was to modify the NPTL routines to use VALGRIND_NON_SIMD_CALL*() wherever they execute atomic instructions. It turns out that this isn't too hard: all the pthread_* synchronisation routines in NPTL are built on top of a simpler, mutex-like primitive - the "lowlevellock" - so only the implementation of that (and the atomic instructions it uses) needs to be modified.
(In fact, the claim that only the lowlevellock needs changing is not totally true - some calls have hand-coded optimised assembler routines for specific CPUs - but those can be disabled, leaving a simpler C implementation built on lowlevellocks.)

This turned out to work beautifully: no more deadlocks, but plenty of bugs found and fixed in the software under test :-)

In case anyone wants to use this, I have made the relevant files available. I started with the RPM for the RHEL4 version of the C library (since that was what was running on the servers in question - though it is rather long in the tooth now). The original source RPM, a .spec file that will make the modifications, plus the necessary files that it will copy in, are at http://valgrind.smears.org/

If anyone has any questions about how this all works, I'll do my best to answer them...

Patrick

[1] There is also the older LinuxThreads pthread implementation, and indeed other C libraries, but the system I needed to use was running NPTL, so that is what I modified.

[2] For some definition of 'easiest'.

------------------------------------------------------------------------------
_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
