Hi Julian,

Thank you for your response.  Here is some additional information.  I
apologize for the length of this message; I'm trying to provide
everything relevant to the problem.

I think I'm following the instructions as documented in the MPI section on
the valgrind web page.  However, the fact that I don't see the valgrind
banner when I run an MPI application suggests that I'm missing something.
The only messages I get are those from the MPI wrappers from valgrind.

Detailed responses to your questions are in line below.

Thank you very much for any information you can provide!

On 01/21/2013 03:49 PM, Raghu Reddy wrote:
>>  I was wondering if anyone has successfully used valgrind with MPI 
>>  applications on SGI systems with MPT?
>
> I don't know about on SGI w/ MPT (whatever MPT is).  But for sure in
> general on MPI, it works.

The SGI implementation of MPI is called MPT (Message Passing Toolkit).  It
is simply another implementation of MPI and conforms to the MPI-2
standard.

>> Using a non-MPI program (the simple example from the valgrind website
>> tutorial) works exactly as documented.  However, an MPI hello world 
>> example with the same error does not point out the error, even though 
>> there are messages from the MPI wrappers.
>
> Does your MPI hello world test work as expected (with -DBUG) if you remove
> the MPI specifics and just run it as an ordinary executable?

Without Valgrind:
=============

If I compile my MPI example code with -DBUG (without linking with valgrind)
and launch it with the MPI launcher (on the SGI systems it is not possible
to run an MPI program without using the MPI launcher), the program runs to
completion even though it has a bug (the complete code was included in the
original message; I wasn't sure if it was appropriate to include it again
for completeness):

r31i2n2% mpicc -DBUG -g -O0 -o hello_mpi_c hello_mpi_c.c
r31i2n2% mpiexec_mpt -np 4 ./hello_mpi_c
Hello from rank 0 out of 4; procname = r31i2n2
Print something 0
Hello from rank 1 out of 4; procname = r31i2n2
Print something 0
Hello from rank 2 out of 4; procname = r31i2n2
Print something 0
Hello from rank 3 out of 4; procname = r31i2n2
Print something 0
r31i2n2%

If I strip out all MPI and run it as a serial program, it also runs to
completion (even though there is a bug):

r31i2n2% m mem-bug.c
  #include <stdlib.h>

  void f(void)
  {
     int* x = malloc(10 * sizeof(int));
     x[10] = 0;        // problem 1: heap block overrun
  }                    // problem 2: memory leak -- x not freed

  int main(void)
  {
     f();
     return 0;
  }
r31i2n2%
r31i2n2% icc -o mem-bug -debug mem-bug.c
r31i2n2%
r31i2n2% ./mem-bug
r31i2n2%

With valgrind:
==========

When the serial program with no MPI is launched with valgrind, valgrind
does point to the error and works as expected:

r31i2n2% /contrib/valgrind/valgrind-3.8.1/bin/valgrind ./mem-bug
==9806== Memcheck, a memory error detector
==9806== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==9806== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==9806== Command: ./mem-bug
==9806==
==9806== Invalid write of size 4
==9806==    at 0x40051E: f (mem-bug.c:6)
==9806==    by 0x40052E: main (mem-bug.c:11)
==9806==  Address 0x5a70068 is 0 bytes after a block of size 40 alloc'd
==9806==    at 0x4C278FE: malloc (vg_replace_malloc.c:270)
==9806==    by 0x400508: f (mem-bug.c:5)
==9806==    by 0x40052E: main (mem-bug.c:11)
==9806==
==9806==
==9806== HEAP SUMMARY:
==9806==     in use at exit: 40 bytes in 1 blocks
==9806==   total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==9806==
==9806== LEAK SUMMARY:
==9806==    definitely lost: 40 bytes in 1 blocks
==9806==    indirectly lost: 0 bytes in 0 blocks
==9806==      possibly lost: 0 bytes in 0 blocks
==9806==    still reachable: 0 bytes in 0 blocks
==9806==         suppressed: 0 bytes in 0 blocks
==9806== Rerun with --leak-check=full to see details of leaked memory
==9806==
==9806== For counts of detected and suppressed errors, rerun with: -v
==9806== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
r31i2n2%
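
To spell out the arithmetic behind the "0 bytes after a block of size 40"
line above: with 4-byte ints, malloc(10 * sizeof(int)) gives a 40-byte
block, and x[10] starts at byte offset 40, which is the first byte past the
end of the block.  A small sketch of that calculation (bytes_past_block is
just a name I made up for illustration, it is not part of valgrind):

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes past the end of a block of
 * nelems elements an access to element idx begins.  A negative result
 * means the access starts inside the block. */
long bytes_past_block(size_t nelems, size_t idx, size_t elem_size)
{
    return (long)(idx * elem_size) - (long)(nelems * elem_size);
}
```

For the test program, bytes_past_block(10, 10, sizeof(int)) is 0, which
matches what Memcheck reports, while index 9 (the last valid element) gives
a negative value.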

But the problem is I am unable to get valgrind to point out the problem in
the MPI code.  The output from that run is included below (if it is all
right, I will include the source code also):

r31i2n2% m hello_mpi_c.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int ierr, myid, npes;
   int len;
   char name[MPI_MAX_PROCESSOR_NAME];

   ierr = MPI_Init(&argc, &argv);
#ifdef MACROTEST
#define MACROTEST 10
#endif
   ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
   ierr = MPI_Comm_size(MPI_COMM_WORLD, &npes);
   ierr = MPI_Get_processor_name( name, &len );

     printf("Hello from rank %d out of %d; procname = %s\n", myid, npes, name);

#ifdef MACROTEST
     printf("Test Macro: %d\n", MACROTEST);
#endif
#ifdef BUG
     {
       int* x = (int*)malloc(10 * sizeof(int));
       x[10] = 0;        // problem 1: heap block overrun
       printf("Print something %d\n",x[10]);
     }                    // problem 2: memory leak -- x not freed
#endif

   ierr = MPI_Finalize();

}
r31i2n2% mpicc -DBUG -g -O0 -o hello_mpi_c hello_mpi_c.c /contrib/valgrind/valgrind-3.8.1/lib/valgrind/libmpiwrap-amd64-linux.so
r31i2n2%
r31i2n2% env MPIWRAP_DEBUG=verbose mpiexec_mpt -np 1 /contrib/valgrind/valgrind-3.8.1/bin/valgrind ./hello_mpi_c
valgrind MPI wrappers  9993: Active for pid 9993
valgrind MPI wrappers  9993: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers  9993: enter PMPI_Init
valgrind MPI wrappers  9993: enter PMPI_Init_thread
valgrind MPI wrappers  9993:  exit PMPI_Init (err = 0)
valgrind MPI wrappers  9993: enter PMPI_Comm_rank
valgrind MPI wrappers  9993:  exit PMPI_Comm_rank (err = 0)
valgrind MPI wrappers  9993: enter PMPI_Comm_size
valgrind MPI wrappers  9993:  exit PMPI_Comm_size (err = 0)
valgrind MPI wrappers  9993: enter PMPI_Get_processor_name
Hello from rank 0 out of 1; procname = r31i2n2
Print something 0
valgrind MPI wrappers  9993: enter PMPI_Finalize
valgrind MPI wrappers  9993:  exit PMPI_Finalize (err = 0)
r31i2n2%
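
Since I don't see the banner, one thing I could try is to have the program
itself report whether it is running under valgrind, using the
RUNNING_ON_VALGRIND client request macro from valgrind/valgrind.h.  This is
only a sketch: the helper names under_valgrind and report_valgrind are my
own, and falling back to 0 when the header cannot be found is an assumption
I added so the file still compiles on machines without valgrind installed.

```c
#include <stdio.h>

/* Pull in valgrind.h only when the compiler can find it
 * (__has_include is available in recent GCC and clang). */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

/* Assumption: without the header, report "not under valgrind". */
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical helper: 1 if running under valgrind, else 0. */
int under_valgrind(void)
{
    return RUNNING_ON_VALGRIND != 0;
}

/* Hypothetical helper: print the result to stderr, tagged with the MPI rank. */
void report_valgrind(int rank)
{
    fprintf(stderr, "rank %d: under valgrind = %d\n", rank, under_valgrind());
}
```

Calling report_valgrind(myid) right after MPI_Comm_rank in hello_mpi_c.c
would show, per rank, whether the process launched by mpiexec_mpt is
actually being run by valgrind.  I assume the header lives under the
include directory of the same install prefix, so the compile line would
need a matching -I option.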

-----Original Message-----
From: Julian Seward [mailto:jsew...@acm.org] 
Sent: Tuesday, January 29, 2013 4:33 AM
To: Raghu Reddy
Cc: Valgrind-users@lists.sourceforge.net
Subject: Re: [Valgrind-users] Is it possible to use valgrind with MPI
applications (with SGI MPT)?


On 01/21/2013 03:49 PM, Raghu Reddy wrote:
> I was wondering if anyone has successfully used valgrind with MPI 
> applications on SGI systems with MPT?

I don't know about on SGI w/ MPT (whatever MPT is).  But for sure in general
on MPI, it works.

> Using a non-MPI program (the simple example from the valgrind website
> tutorial) works exactly as documented.  However, an MPI hello world 
> example with the same error does not point out the error, even though 
> there are messages from the MPI wrappers.

Does your MPI hello world test work as expected (with -DBUG) if you remove
the MPI specifics and just run it as an ordinary executable?

J


