Hi,

I'm trying to get TotalView to work with OpenMPI on a simple 1-process test program. I have built the program with the -g option against both OpenMPI 1.1.4 and 1.2.3, on two RedHat EL4 systems, one 32-bit and one 64-bit. Each executable is built on the system it runs on. I then use either of these commands:
    mpirun -tv -np 1 /path/to/my/MPI/test/program

or

    totalview mpirun -a -np 1 /path/to/my/MPI/test/program

as described in the OpenMPI docs (http://www.open-mpi.org/faq/?category=running#run-with-tv). TV starts mpirun (actually, orterun) and then states that it can't find my main program, as shown below in the output from the 32-bit system:

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> totalview mpirun -a -np 1 /path/to/my/MPI/test/program
Linux x86 TotalView 8.1.0-0
Copyright 2007 by TotalView Technologies, LLC. ALL RIGHTS RESERVED.
Copyright 1999-2007 by Etnus, LLC.
Copyright 1999 by Etnus, Inc.
Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
Copyright 1989-1996 by BBN Inc.
Reading symbols for process 1, executing "mpirun"
Library /usr/local/openmpi/1.1.4/intel/i386/bin/orterun, with 2 asects, was linked at 0x08048000, and initially loaded at 0x90000000
WARNING: Invalid .gnu_debuglink checksum for file '/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/bin/orterun.debug' is 3fb29221, expected fa794855
Mapping 3031 bytes of ELF string data from '/usr/local/openmpi/1.1.4/intel/i386/bin/orterun'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/usr/local/openmpi/1.1.4/intel/i386/bin/orterun'...done
Library /usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0, with 2 asects, was linked at 0x00000000, and initially loaded at 0x90022d00
WARNING: Invalid .gnu_debuglink checksum for file '/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0.0.0.debug' is d24f7322, expected 2e59816b
Mapping 32483 bytes of ELF string data from '/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/usr/local/openmpi/1.1.4/intel/i386/lib/liborte.so.0'...done
Library /usr/lib/libtorque.so.0, with 2 asects, was linked at 0x00456000, and initially loaded at 0x900b0500
Mapping 6778 bytes of ELF string data from '/usr/lib/libtorque.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/usr/lib/libtorque.so.0'...done
Library /usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0, with 2 asects, was linked at 0x00000000, and initially loaded at 0x900ea400
WARNING: Invalid .gnu_debuglink checksum for file '/usr/lib/debug/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0.0.0.debug' is 4a2fe1c5, expected 17575d23
Mapping 9597 bytes of ELF string data from '/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0'...done
Library /lib/libnsl.so.1, with 2 asects, was linked at 0x04a92000, and initially loaded at 0x9012a900
Mapping 3146 bytes of ELF string data from '/lib/libnsl.so.1'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/libnsl.so.1'...done
Library /lib/libutil.so.1, with 2 asects, was linked at 0x00343000, and initially loaded at 0x9013ee00
Mapping 407 bytes of ELF string data from '/lib/libutil.so.1'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/libutil.so.1'...done
Library /lib/tls/libm.so.6, with 2 asects, was linked at 0x00bbb000, and initially loaded at 0x90140800
Mapping 1996 bytes of ELF string data from '/lib/tls/libm.so.6'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/tls/libm.so.6'...done
Library /lib/libgcc_s.so.1, with 2 asects, was linked at 0x00456000, and initially loaded at 0x90161700
Mapping 1403 bytes of ELF string data from '/lib/libgcc_s.so.1'...done
Indexing 1404 bytes of DWARF '.eh_frame' symbols from '/lib/libgcc_s.so.1'...done
Library /lib/tls/libpthread.so.0, with 2 asects, was linked at 0x00cd1000, and initially loaded at 0x90168800
Mapping 4402 bytes of ELF string data from '/lib/tls/libpthread.so.0'...done
Indexing 2272 bytes of DWARF '.eh_frame' symbols from '/lib/tls/libpthread.so.0'...done
Library /lib/tls/libc.so.6, with 2 asects, was linked at 0x00a88000, and initially loaded at 0x90178400
Mapping 20760 bytes of ELF string data from '/lib/tls/libc.so.6'...done
Indexing 16648 bytes of DWARF '.eh_frame' symbols from '/lib/tls/libc.so.6'...done
Library /lib/libdl.so.2, with 2 asects, was linked at 0x00bb5000, and initially loaded at 0x902a2000
Mapping 481 bytes of ELF string data from '/lib/libdl.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/libdl.so.2'...done
Library /opt/intel/fc/9.1.040/lib/libimf.so, with 2 asects, was linked at 0x00000000, and initially loaded at 0x902a3c00
Mapping 38346 bytes of ELF string data from '/opt/intel/fc/9.1.040/lib/libimf.so'...done
Library /opt/intel/fc/9.1.040/lib/libirc.so, with 2 asects, was linked at 0x00000000, and initially loaded at 0x904e0900
Mapping 12223 bytes of ELF string data from '/opt/intel/fc/9.1.040/lib/libirc.so'...done
Library /lib/ld-linux.so.2, with 2 asects, was linked at 0x00a6f000, and initially loaded at 0x9000d600
Mapping 390 bytes of ELF string data from '/lib/ld-linux.so.2'...done
Indexing 348 bytes of DWARF '.eh_frame' symbols from '/lib/ld-linux.so.2'...done
**************************************
Automatically starting orterun
**************************************
Library /lib/libnss_nis.so.2, with 2 asects, was linked at 0x00000000, and initially loaded at 0x90520c00
Mapping 1974 bytes of ELF string data from '/lib/libnss_nis.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/libnss_nis.so.2'...done
Library /lib/libnss_files.so.2, with 2 asects, was linked at 0x00000000, and initially loaded at 0x90528b00
Mapping 2020 bytes of ELF string data from '/lib/libnss_files.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from '/lib/libnss_files.so.2'...done
[the following is the output of my program]
rank = 0, size = 1, sysname = Linux, nodename = adroit, release = 2.6.9-42.0.10.ELsmp, version = #1 SMP Tue Feb 27 07:12:58 EST 2007, machine = i686
Could not find the user's main function.
Check TV::Private::main_names in tvdinit.tvd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output from the 64-bit system is essentially the same, except that the checksum warnings don't appear and the paths are different.

tvdinit.tvd is located under the TotalView root directory in linux-x86/lib (32-bit system) or linux-x86-64/lib (64-bit system), and appears to contain the correct main program name ("main"), which 'nm' is able to display in my executable.

If I don't source openmpi-totalview.tcl in .tvdrc, I still get the last 2 lines above, but the program doesn't start automatically. If I then tell TV to run it, it runs to the end. In either case, I never see any source or assembly code, so I can't set any breakpoints.

Yet I can debug the program under TotalView this way (i.e., via mpirun) when I build and run it with MPICH, and I can debug it under TV with OpenMPI if I run my program directly (i.e., without mpirun or mpiexec).

Any idea why the main program can't be found when running under mpirun? Does OpenMPI need to be built with either --enable-debug or --enable-mem-debug? The "configure --help" output says the former is not for general MPI users, and I'm unclear about the latter.

Thanks,
       Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University