OK - to follow up on a blast from the past:

On Tue, 11 Dec 2007, Patrick Smears wrote:

> Hi all,
>
>> Sorry not to have been very inputful on this so far.
>> 
>> I can't claim to have great insight into the robustness aspects
>> of the NON_SIMD_CALL mechanism.  However, looking at your original message,
>> it occurs to me your prospects of that working are improved if you
>> ensure that the NON_SIMD_CALL'd code (test_func) does not refer to any
>> global variables, and certainly does not refer to any libc or other
>> functions (printf et al).  Any kind of entanglement with libc or
>> dynamic linking is likely to have a bad outcome, for gnarly reasons
>> which we've grappled with a lot in the distant past.
>> 
>> If you simply make test_func be a wrapper around a LOCK-prefixed
>> instruction and literally nothing else, your prospects might improve
>> (or not, YMMV :-)  Worth a try, I'd say.
>> 
>> AIUI the lock-prefixed insns (etc) are actually the only things that
>> you absolutely can't run on the simulator, right?
>
> That's certainly my understanding. The LOCK instructions are buried several 
> function layers down from where they'd most conveniently be wrapped, amongst 
> a number of libc calls, so extracting them isn't trivial (alas), but 
> certainly possible.
>
> I'll put my thinking cap on about this and the other suggestions - thanks to 
> all who have responded! - and try to figure out what will work best for this 
> particular situation. If I come up with anything that might be of general 
> use, I'll report back...

OK, for the record, here is what I did. To recap for anyone who missed the 
original discussion, the problem was that I wanted to run Valgrind on a 
process that communicates with a number of other processes via shared 
memory, and uses synchronisation primitives (mutexes etc) in that shared 
memory. The synchronisation primitives rely on using certain assembler 
instructions that perform atomic operations (e.g. read a location, and if 
its value is equal to register A, set it to register B, but don't let 
anyone change it between comparing it to A and setting it to B). Because 
of the way Valgrind works (simulating instructions), the atomic
instructions get broken up into separate loads/stores, meaning that 
they're no longer atomic (i.e. it's possible for someone to change the 
location in question after it was compared to register A, but before it 
was set to register B).

The upshot of this is that race conditions are introduced when running 
under Valgrind that do not otherwise exist - and this leads to the process 
becoming deadlocked :-(.

A number of potential solutions were suggested on the mailing list 
(thanks!), but the one I decided to go with was the one from Julian quoted 
above - to replace each atomic instruction with a function that just uses 
that atomic instruction, and then call it using one of the 
VALGRIND_NON_SIMD_CALL*() macros (which call the function on the 'real' 
CPU - rather than simulating it - and so you get the atomicity). Of 
course, this means that no tracking is done of memory/cache/whatever used 
by the atomic instruction, but this is acceptable in this application (and 
probably most others).

Now, the locking primitives used by the application under test are 
(wrappers round wrappers round) the standard pthread_*() calls, which are 
provided (on Linux) by glibc, in the form of the nptl libpthread 
library[1]. So the easiest[2] thing to do was to modify the nptl routines 
to use VALGRIND_NON_SIMD_CALL*() when using atomic instructions.

It turns out that this isn't too hard - all the pthread_* synchronisation 
routines in NPTL are built on top of a simpler, mutex-like primitive - the 
"lowlevellock" - so only the implementation of that (and the atomic 
instructions used) needs to be modified. (In fact that's not totally true 
- some calls have hand-coded optimised assembler routines for specific 
CPUs - but those can be disabled, leaving a simpler C implementation in 
terms of lowlevellocks.)

This turned out to work beautifully - no more deadlocks, but plenty of 
bugs found/fixed in the software under test :-)

In case anyone wants to use this, I have made the relevant files 
available. I started with the RPM for the RHEL4 version of the C library 
(since that was what was running on the servers in question - though it is 
rather long in the tooth now). The original source RPM, a .spec file that 
will make the modifications, plus the necessary files that it will copy 
in, I've placed at http://valgrind.smears.org/

If anyone has any questions about how this all works etc I'll do my best 
to answer them...

Patrick

[1] There also exists the older LinuxThreads pthread implementation, and 
indeed other C libraries, but the system I needed to use was using nptl, 
so that was what I used.

[2] For some definition of 'easiest'

------------------------------------------------------------------------------
_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
