Hello, I am attempting to port Sandia's DAKOTA code from MVAPICH to the default OpenMPI/Intel environment on Sandia's thunderbird cluster. I can successfully build DAKOTA in the default tbird software environment, but I'm having runtime problems when DAKOTA attempts to make a system call. Typical output looks like:
[0,1,1][btl_openib_component.c:897:mca_btl_openib_component_progress] from an64 to: an64 error polling HP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 5714048 opcode 0

I'm attaching a tarball containing output from `ompi_info --all` as well as two simple sample programs with output to demonstrate the problem behavior. I built them in the default tbird MPI environment (openmpi-1.1.2-ofed-intel-9.1) with

    mpicc mpi_syscall.c -i_dynamic -o mpi_syscall
    mpicc mpi_nosyscall.c -i_dynamic -o mpi_nosyscall

where `which mpicc` = /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpicc

The latter has no system call and runs fine on two processors, whereas the former gives the openib error (not in the attached output, though dumped to the screen). The problem exists regardless of whether -i_dynamic is included.

I am executing from within an interactive 2-processor job using

    /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpiexec -> orterun

I know some OpenMPI developers have access to thunderbird for testing, but if you require additional information on the build or runtime environment, please advise and I will attempt to send it along.

Note: Both programs run fine with MVAPICH on tbird, and with OpenMPI or MPICH on my Linux x86_64 SMP workstation.

Thanks,
Brian

----------------------------------------
Brian M. Adams, PhD (bria...@sandia.gov)
Optimization and Uncertainty Estimation
Sandia National Laboratories
P.O. Box 5800, Mail Stop 1318
Albuquerque, NM 87185-1318
Voice: 505-284-8845, FAX: 505-284-2518

Attachment:
ompi_tbird_system.tgz