Thanks. Adding FCFLAGS="-mismatch -w" allowed openmpi-1.1a9r10177 to build this time, and I am able to run simple test problems on the cluster. However, I am unable to run the example problems that come with the NAG Parallel library, which we have in addition to the NAG f95 compiler. So I installed mpich1 with mx support, and with it I was able to cleanly compile and run the NAG Parallel library sample problems. The NAG Parallel library itself was built as described here: <http://www.nag.com/doc/inun/fd03/l6ad9/in.html>.

For example, I can successfully compile a sample problem from the Parallel library with Open MPI like this:

mpif77 f07aefpe.f -L/opt/nag/fdl6a03d9/lib -lnagmpi -lnagfls -lacml -dcfuns -mismatch -w

The compilation gives one warning: "Unrecognised option -pthread passed to ld". When I try to run the binary, I get the error message output shown below. I have attached config.log, config.out and make.out from my build of Open MPI in case that helps. Since the examples run with mpich1 and not with Open MPI, I am assuming this is an Open MPI problem and not a problem with the NAG compiler or Parallel library?

# /opt/openmpi/openmpi-1.1a9r10177/bin/mpirun -np 2 a.out
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xf3
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xf3
[0] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libopal.so.0 [0x2aaaaaeef3fa]
[1] func:/lib/libpthread.so.0 [0x2aaaab9697a0]
[2] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libmpi.so.0(MPI_Comm_size+0x58) [0x2aaaaac33458]
[3] func:a.out [0x41dec8]
[4] func:a.out [0x417eef]
[5] func:a.out [0x404a0c]
[6] func:/lib/libc.so.6(__libc_start_main+0xda) [0x2aaaaba8e4ca]
[7] func:a.out [0x4025aa]
*** End of error message ***
[0] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libopal.so.0 [0x2aaaaaeef3fa]
[1] func:/lib/libpthread.so.0 [0x2aaaab9687a0]
[2] func:/opt/openmpi/openmpi-1.1a9r10177/lib/libmpi.so.0(MPI_Comm_size+0x58) [0x2aaaaac33458]
[3] func:a.out [0x41dec8]
[4] func:a.out [0x417eef]
[5] func:a.out [0x404a0c]
[6] func:/lib/libc.so.6(__libc_start_main+0xda) [0x2aaaaba8d4ca]
[7] func:a.out [0x4025aa]
*** End of error message ***

Any ideas greatly appreciated,
-Doug

---------- Forwarded message ----------
List-Post: users@lists.open-mpi.org
Date: Fri, 2 Jun 2006 17:53:03 -0400
From: Brock Palen

I was able to build OMPI (1.1a9r10177) with NAG f95 5.0(414) without any problems. To configure it, be sure to use:

FCFLAGS='-mismatch -w'

That is the only really big change. I did use a prefix path to PBS (for tm), and I also use Portland for both my C and C++ compilers. Here is my full configure line; it is most likely useless to you, but something in it may make sense to you:

./configure --prefix=/home/software/rhel4/openmpi-1.1a8-nag --with-tm=/home/software/torque-2.0.0p8/ FC=/afs/engin.umich.edu/caen/rhel_4/nag/bin/f95 F77=/afs/engin.umich.edu/caen/rhel_4/nag/bin/f95 FCFLAGS="-mismatch -w" CC=pgcc CXX=pgCC

One thing I found: you can't have -O3 in FCFLAGS, or your mpif90 will segfault.

Currently, though, we have problems with OMPI with NAG, so if some devs have some insight into this problem, that would be a help. Here's the problem: the package builds fine, but on execution the following error is given:

-bash-3.00$ mpirun -np 2 SWMF.exe
[nyx-login.engin.umich.edu:06116] *** An error occurred in MPI_Comm_rank
[nyx-login.engin.umich.edu:06116] *** on communicator MPI_COMM_WORLD
[nyx-login.engin.umich.edu:06116] *** MPI_ERR_COMM: invalid communicator
[nyx-login.engin.umich.edu:06116] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 additional process aborted (not shown)

I know there were some similar messages on the list earlier. Is this a known problem? If so, is a fix in the works? And last, is there a timeline for such a fix?

Brock

ompi-output.tar.gz
Description: GNU Zip compressed data
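
Both failures above occur in the first communicator queries (MPI_Comm_size in one report, MPI_Comm_rank in the other), so one way to narrow things down might be a bare Fortran 77 program that makes only those calls and is built with the same wrapper, without the NAG Parallel library. The script below is only a sketch and not something from the thread. The file name commsize.f, the work directory, and the MPIF77 and MPIRUN variables are hypothetical names chosen for illustration. The -mismatch -w flags are the ones quoted above and assume the NAG compiler sits behind the wrapper.

```shell
#!/bin/sh
# Sketch: write a minimal MPI Fortran 77 test and, if an MPI wrapper is
# available, build and run it on 2 processes.
# Assumptions (not from the thread): work directory, file name, and the
# MPIF77 / MPIRUN override variables are hypothetical.
workdir="${TMPDIR:-/tmp}/ompi_commsize_check"
src="$workdir/commsize.f"
mkdir -p "$workdir"

# Fixed-form Fortran: statements start in column 7.
cat > "$src" <<'EOF'
      program commsize
      implicit none
      include 'mpif.h'
      integer ierr, nprocs, rank
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      print *, 'rank ', rank, ' of ', nprocs
      call MPI_FINALIZE(ierr)
      end
EOF

MPIF77="${MPIF77:-mpif77}"
MPIRUN="${MPIRUN:-mpirun}"

if command -v "$MPIF77" >/dev/null 2>&1; then
    # Same compiler flags as in the message above (NAG-specific).
    "$MPIF77" -mismatch -w -o "$workdir/commsize" "$src" &&
        "$MPIRUN" -np 2 "$workdir/commsize" ||
        echo "build or run failed; see messages above"
else
    echo "$MPIF77 not found; wrote $src only"
fi
```

If this small program also fails in MPI_Comm_size, the NAG Parallel library is probably not involved. If it runs, how the library was built against MPI would be the next thing to look at.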