Hi,

I'm trying to get TotalView to work using OpenMPI with a simple
1-processor test program. I have tried building it using both OpenMPI
1.1.4 and 1.2.3, with the -g option. This is on two RedHat EL4 systems,
one a 32-bit system, and one a 64-bit system. Each executable is built
on its own system. I then use the command:

mpirun -tv -np 1 /path/to/my/MPI/test/program

or

totalview mpirun -a -np 1 /path/to/my/MPI/test/program

Following the OpenMPI docs
(http://www.open-mpi.org/faq/?category=running#run-with-tv), TV
starts mpirun (actually orterun) and then reports that it can't find my
main program, as shown below in the output from the 32-bit system:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> totalview mpirun -a -np 1 /path/to/my/MPI/test/program
Linux x86 TotalView 8.1.0-0
Copyright 2007 by TotalView Technologies, LLC. ALL RIGHTS RESERVED.
Copyright 1999-2007 by Etnus, LLC.
Copyright 1999 by Etnus, Inc.
Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
Copyright 1989-1996 by BBN Inc.
Reading symbols for process 1, executing "mpirun"
Library /usr/local/openmpi/1.1.4/intel/i386/bin/orterun, with 2 asects,
was linked at 0x08048000, and initially loaded at 0x90000000
WARNING: Invalid .gnu_debuglink checksum for file
'/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/bin/orterun.debug' is
3fb29221, expected fa794855
Mapping 3031 bytes of ELF string data from
'/usr/local/openmpi/1.1.4/intel/i386/bin/orterun'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/usr/local/openmpi/1.1.4/intel/i386/bin/orterun'...done
Library /usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0, with 2
asects, was linked at 0x00000000, and initially loaded at 0x90022d00
WARNING: Invalid .gnu_debuglink checksum for file
'/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0.0.0.debug'
is d24f7322, expected 2e59816b
Mapping 32483 bytes of ELF string data from
'/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0'...done
Library /usr/lib/libtorque.so.0, with 2 asects, was linked at
0x00456000, and initially loaded at 0x900b0500
Mapping 6778 bytes of ELF string data from
'/usr/lib/libtorque.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/usr/lib/libtorque.so.0'...done
Library /usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0, with 2
asects, was linked at 0x00000000, and initially loaded at 0x900ea400
WARNING: Invalid .gnu_debuglink checksum for file
'/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0.0.0.debug'
is 4a2fe1c5, expected 17575d23
Mapping 9597 bytes of ELF string data from
'/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0'...done
Library /lib/libnsl.so.1, with 2 asects, was linked at 0x04a92000, and
initially loaded at 0x9012a900
Mapping 3146 bytes of ELF string data from '/lib/libnsl.so.1'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libnsl.so.1'...done
Library /lib/libutil.so.1, with 2 asects, was linked at 0x00343000, and
initially loaded at 0x9013ee00
Mapping 407 bytes of ELF string data from '/lib/libutil.so.1'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libutil.so.1'...done
Library /lib/tls/libm.so.6, with 2 asects, was linked at 0x00bbb000, and
initially loaded at 0x90140800
Mapping 1996 bytes of ELF string data from '/lib/tls/libm.so.6'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/tls/libm.so.6'...done
Library /lib/libgcc_s.so.1, with 2 asects, was linked at 0x00456000, and
initially loaded at 0x90161700
Mapping 1403 bytes of ELF string data from '/lib/libgcc_s.so.1'...done
Indexing 1404 bytes of DWARF '.eh_frame' symbols from
'/lib/libgcc_s.so.1'...done
Library /lib/tls/libpthread.so.0, with 2 asects, was linked at
0x00cd1000, and initially loaded at 0x90168800
Mapping 4402 bytes of ELF string data from
'/lib/tls/libpthread.so.0'...done
Indexing 2272 bytes of DWARF '.eh_frame' symbols from
'/lib/tls/libpthread.so.0'...done
Library /lib/tls/libc.so.6, with 2 asects, was linked at 0x00a88000, and
initially loaded at 0x90178400
Mapping 20760 bytes of ELF string data from '/lib/tls/libc.so.6'...done
Indexing 16648 bytes of DWARF '.eh_frame' symbols from
'/lib/tls/libc.so.6'...done
Library /lib/libdl.so.2, with 2 asects, was linked at 0x00bb5000, and
initially loaded at 0x902a2000
Mapping 481 bytes of ELF string data from '/lib/libdl.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libdl.so.2'...done
Library /opt/intel/fc/9.1.040/lib/libimf.so, with 2 asects, was linked
at 0x00000000, and initially loaded at 0x902a3c00
Mapping 38346 bytes of ELF string data from
'/opt/intel/fc/9.1.040/lib/libimf.so'...done
Library /opt/intel/fc/9.1.040/lib/libirc.so, with 2 asects, was linked
at 0x00000000, and initially loaded at 0x904e0900
Mapping 12223 bytes of ELF string data from
'/opt/intel/fc/9.1.040/lib/libirc.so'...done
Library /lib/ld-linux.so.2, with 2 asects, was linked at 0x00a6f000, and
initially loaded at 0x9000d600
Mapping 390 bytes of ELF string data from '/lib/ld-linux.so.2'...done
Indexing 348 bytes of DWARF '.eh_frame' symbols from
'/lib/ld-linux.so.2'...done
**************************************
Automatically starting orterun
**************************************
Library /lib/libnss_nis.so.2, with 2 asects, was linked at 0x00000000,
and initially loaded at 0x90520c00
Mapping 1974 bytes of ELF string data from '/lib/libnss_nis.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libnss_nis.so.2'...done
Library /lib/libnss_files.so.2, with 2 asects, was linked at 0x00000000,
and initially loaded at 0x90528b00
Mapping 2020 bytes of ELF string data from
'/lib/libnss_files.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libnss_files.so.2'...done

[the following is the output of my program]
rank = 0, size = 1, sysname = Linux, nodename = adroit, release =
2.6.9-42.0.10.ELsmp, version = #1 SMP Tue Feb 27 07:12:58 EST 2007,
machine = i686

Could not find the user's main function.
Check TV::Private::main_names in tvdinit.tvd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output from the 64-bit system is essentially the same, except that
the checksum warnings don't appear, and the paths are different. 

tvdinit.tvd is located under the totalview root directory in
linux-x86/lib (32-bit system) or linux-x86-64/lib (64-bit system), and
appears to contain the correct main program name ("main"), which 'nm' is
able to display.
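
For reference, the 'nm' check mentioned above can be sketched roughly as
follows. This is only an illustration: has_main is a made-up helper name,
the path is the same placeholder used earlier in this message, and the
grep pattern assumes nm's usual default output layout (address, symbol
type, name).

```shell
# has_main is a hypothetical helper, not part of TotalView or OpenMPI.
# It succeeds if nm lists a defined text (code) symbol named "main".
# Assumes nm's default "address type name" output layout.
has_main() {
    nm "$1" 2>/dev/null | grep -q ' T main$'
}

# Placeholder path, as used elsewhere in this message.
if has_main /path/to/my/MPI/test/program; then
    echo "main is defined"
else
    echo "main not found (or file not readable)"
fi
```

A stripped executable would also land in the "not found" branch, since
nm has no symbol table to read in that case.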

If I don't source openmpi-totalview.tcl in .tvdrc, then I still get the
two "Could not find the user's main function" lines shown above, but the
program doesn't start. If I tell TV to run it, then it runs to the end.
In either case, I never see any source or assembly code, so I can't set
any breakpoints.

Yet I can debug it under TotalView this way (i.e., via mpirun) when
building with and using MPICH, and I can debug it under TV with OpenMPI
if I run my program directly (i.e., not using mpirun or mpiexec).

Any idea why the main program can't be found when running under mpirun?
Does OpenMPI need to be built with either --enable-debug or
--enable-mem-debug? The "configure --help" output says the former is not
for general MPI users, and I'm unclear about the latter.
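
In case it helps anyone reproduce this, one way to see how an existing
installation was configured is to filter the output of ompi_info. The
sketch below assumes that the ompi_info belonging to the installation in
question is first in PATH; show_build_debug_info is just an illustrative
name, and the grep is deliberately loose because the exact wording of
the lines may differ between versions.

```shell
# show_build_debug_info is a hypothetical helper name.
# Prints any debug-related lines from ompi_info, or a short notice
# if ompi_info is missing or prints nothing that matches.
show_build_debug_info() {
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info 2>/dev/null | grep -i 'debug' \
            || echo "no debug-related lines in ompi_info output"
    else
        echo "ompi_info not found in PATH"
    fi
}

show_build_debug_info
```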

Thanks,
       Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University
