David, Julian,
Thank you both for the insight.  This helps.

Julian:
I had inferred from a conversation with a friend, not from any official
documentation, that there would be some mechanism by which each node
didn't produce its own output.  Since an MPI environment makes no
guarantees about how nodes are connected, the only way to accomplish
this would be for Valgrind to issue calls into the MPI libraries
itself, so I had thought the claim was a bit odd.

Finding the addresses of the network buffers and forcing MPI to fall
back to "traditional" syscalls for TCP/IP are both options we will look
into in the future.  For now, however, the most important thing for me
is to find a way to organize the massive amount of output from many
nodes (currently more than 60k lines).
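
As a first pass at taming that output, the plan is to give each rank
its own log file, keyed on the rank rather than the pid.  A rough
sketch of what I have in mind (the variable name OMPI_COMM_WORLD_RANK
is, as far as I know, an Open MPI detail; MPICH-derived stacks export
PMI_RANK instead, so adjust for your launcher):

  # One log file per MPI rank; Valgrind expands %q{VAR} in --log-file
  # to the contents of the environment variable VAR.
  mpirun -np 16 valgrind \
      --log-file=vg.%q{OMPI_COMM_WORLD_RANK}.log ./my_app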

David:
Thanks for the script (and to Ms. Pittman as well).  This looks very
helpful.  The only possible issue, I think, is guaranteeing that each
process has a distinct pid, which I don't imagine is guaranteed for a
job spread over many nodes; still, I expect such collisions will be
uncommon.  (I've also jotted down your MVAPICH2 configure suggestion
as a P.S. below.)
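
If pid collisions do turn out to matter, one workaround is to key the
files on the MPI rank instead, which is unique by construction, and to
emit XML if that is what the merge script expects.  A sketch, assuming
an MPICH-style launcher that exports PMI_RANK and a Valgrind recent
enough (3.5 or later, I believe) to have --xml-file:

  # Rank-keyed XML output, one file per process, for later merging.
  mpirun -np 16 valgrind --xml=yes \
      --xml-file=vg.%q{PMI_RANK}.xml ./my_app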

Sincerely,
~Josh


Dave Goodell wrote:
> On Apr 28, 2010, at 1:36 AM, Julian Seward wrote:
>
>   
>> On Wednesday 28 April 2010, Joshua R. Tepper wrote:
>>     
>>> Hi all,
>>> I am trying to use Valgrind to debug an MPI application, but things
>>> don't seem to work.  I understand that Valgrind explicitly
>>> implements wrapper functions for MPI calls and uses the PMPI
>>> interface.  To the best of my understanding, when called in the
>>> context of MPI, Valgrind should somehow check MPI calls,
>>>       
>> yes
>>     
>>> avoid giving "garbage" output from the underlying MPI
>>> libraries,
>>>       
>> no (but see below)
>>     
>>> and suppress the printing of a separate output for each node.
>>>       
>> no (how did you infer that?)
>>     
>
> I'm not entirely sure what you are expecting in terms of output, but
> you might try Ashley Pittman's vg_xmlmerge.pl script.  I've never
> used it, but I believe that it merges valgrind output and removes
> duplications for a parallel job.
>
> http://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg01162.html
>
>   
>> Re the garbage, debugging MPI apps is problematic because the NIC
>> I/O and control buffers are mapped directly into memory, and memcheck
>> doesn't have a way to detect state changes in them.
>>
>> One option is to use the --ignore-ranges= flag, if you can figure out
>> what the relevant NIC addresses are.  Another is to tell your MPI
>> stack to not map cards into memory, but just to use TCP/IP via normal
>> syscalls to communicate.  That is (or at least, used to be) possible
>> with OpenMPI with the mpirun args "--mca btl tcp,self", for example.
>>
>> If you are using OpenMPI you might want to ask the OpenMPI devs
>> for advice.  They are pretty Memcheck-aware, afaik.
>>     
>
> You should be able to build a TCP version of MVAPICH2 by passing
> "--with-device=ch3:sock" to configure.  While you are doing that, you
> should probably also include "--enable-g=dbg,meminit" to avoid some
> messages about passing uninitialized buffers to certain syscalls.
>
> You may also want to post your message to
> mvapich-disc...@cse.ohio-state.edu to see if the OSU folks have any
> specific suggestions when using IB.  At the very least it might be a
> gentle reminder for them to make MVAPICH2 play nicely with Valgrind
> in the future (if it doesn't right now).
>
> -Dave
>
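
P.S. For my own notes, Dave's MVAPICH2 suggestion would look roughly
like the following (the flags are from his mail; running this from the
top of an MVAPICH2 source tree is my assumption):

  # Build MVAPICH2 against plain TCP sockets, with memory init enabled
  # to quiet spurious uninitialized-buffer warnings under Memcheck.
  ./configure --with-device=ch3:sock --enable-g=dbg,meminit
  make && make install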
