Hello,

I am attempting to port Sandia's DAKOTA code from MVAPICH to the default
OpenMPI/Intel environment on Sandia's thunderbird cluster.  I can
successfully build DAKOTA in the default tbird software environment, but
I'm having runtime problems when DAKOTA attempts to make a system call.
Typical output looks like:

[0,1,1][btl_openib_component.c:897:mca_btl_openib_component_progress]
from an64 to: an64 error polling HP CQ with status LOCAL LENGTH ERROR
status number 1 for wr_id 5714048 opcode 0

I'm attaching a tarball containing output from `ompi_info --all` as well
as two simple sample programs with output to demonstrate the problem
behavior.  I built them in the default tbird MPI environment
(openmpi-1.1.2-ofed-intel-9.1) with 

  mpicc mpi_syscall.c -i_dynamic -o mpi_syscall
  mpicc mpi_nosyscall.c -i_dynamic -o mpi_nosyscall

where `which mpicc` returns

  /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpicc

The latter has no system call and runs fine on two processors, whereas
the former gives the openib error (dumped to the screen, though not
captured in the attached output).  The problem exists regardless of
whether -i_dynamic is included.  I am executing from within an
interactive 2-processor job using

  /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpiexec -> orterun
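
The attached sources are not reproduced in this message.  As a rough
sketch only, and assuming the "system call" in question goes through the
C library's system(), the call at issue could look something like the
following.  The helper name run_command is hypothetical and is not taken
from mpi_syscall.c; no MPI calls are shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the attached tarball): run a shell
 * command through system() and return its exit code, or -1 if the
 * command could not be launched or did not exit normally. */
int run_command(const char *cmd)
{
    int status;

    assert(cmd != NULL);

    fflush(NULL);               /* flush stdio buffers before the shell is spawned */
    status = system(cmd);       /* system() forks and execs a shell for cmd */
    if (status == -1)
        return -1;              /* the child process could not be created */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                  /* the shell was terminated by a signal */
}
```

In a reproducer of this kind, such a helper would presumably be called
on each rank somewhere between MPI_Init and MPI_Finalize, for example as
run_command("echo hello").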

I know some OpenMPI developers have access to thunderbird for testing,
but if you require additional information on the build or runtime
environment, please advise and I will attempt to send it along. 

Note:  Both programs run fine with MVAPICH on tbird, and with OpenMPI or
MPICH on my Linux x86_64 SMP workstation.

Thanks,
Brian
----------------------------------------
Brian M. Adams, PhD (bria...@sandia.gov) 
Optimization and Uncertainty Estimation 
Sandia National Laboratories 
P.O. Box 5800, Mail Stop 1318 
Albuquerque, NM 87185-1318
Voice: 505-284-8845, FAX: 505-284-2518




Attachment: ompi_tbird_system.tgz