On Thu, 7 Aug 2008, Panagiotis Kritikakos wrote:

Hi,

I think I have a bug on the Scientific Linux desktop relating to the
OpenMPI installation. It appears there is a condition which
prevents any MPI application from completing. Below you can see perhaps the simplest MPI program that can be written and still link against the MPI library:

x.F90

program main

 use mpi

 implicit none

 integer : :  ierror

  call mpi_init(ierror)
  call mpi_finalize(ierror)

end program main

On my sl51 (32-bit) boxes mpif90 objects to the 'integer : : ierror' line. Maybe my compiler is feeling odd...

Replacing it with 'integer ierror' lets it compile for me. (The standard free-form spelling is 'integer :: ierror' -- in free source form blanks are significant, so ': :' with a space inside is not a valid '::' token, which is presumably what the compiler is objecting to.)

I've compiled it with mpif90 -o x x.F90. Compilation goes fine, but when I try to execute the resulting x file, I receive the following error:

libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,0,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
librdmacm:  couldn't read ABI version.
librdmacm:  assuming: 4
libibverbs: Fatal: couldn't read uverbs ABI version.
CMA: unable to open /dev/infiniband/rdma_cm

If I run the result I get no hang, but then I don't get much useful output either :-)

$ mpirun ./x
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host unfair.damtp.cam.ac.uk was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host unfair.damtp.cam.ac.uk was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

In case it matters that box has:

$ rpm -q sl-release gcc-gfortran openmpi openmpi-devel
sl-release-5.1-2.i386
gcc-gfortran-4.1.2-14.el5.i386
openmpi-1.2.3-4.el5.i386
openmpi-devel-1.2.3-4.el5.i386

The warnings should not be a problem; they are just complaining that the high-performance interconnect options that were compiled in cannot be found, and the system will fall back to lower-performance defaults.
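If the warnings themselves are a nuisance, OpenMPI lets you exclude those transports explicitly with an MCA parameter. A sketch, assuming the standard BTL component names (openib, udapl) from a stock 1.2-era build:

```shell
# Exclude the InfiniBand and uDAPL transports for one run
mpirun --mca btl ^openib,udapl ./x

# Or make the exclusion permanent for your user by adding it
# to OpenMPI's per-user MCA parameter file
mkdir -p ~/.openmpi
echo 'btl = ^openib,udapl' >> ~/.openmpi/mca-params.conf
```

The '^' prefix means "everything except these", so the shared-memory and TCP transports are still available.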

strace gives a lot of output but shows the program is stalling with:

futex(0x26049c, FUTEX_WAIT, 2, NULL

It looks like some kind of deadlock is occurring, probably relating to the threads involved or shared memory.
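One way to see where it is actually stuck is to attach a debugger to the stalled process and dump the backtraces of every thread. Just a sketch, assuming gdb is installed and the hung binary is still called x:

```shell
# Attach gdb to the newest process named "x", print every
# thread's backtrace in batch mode, then detach
gdb -batch -p "$(pgrep -n x)" -ex 'thread apply all bt'
```

If one thread is in a mutex/condition wait and another is inside a shared-memory setup routine, that would point at the shared-memory BTL rather than the Fortran code.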

Debugging MPI problems is always a bit of a nightmare. BTW, how many processors were you running it on?

I'm still wary of updating our systems to the (newer) openmpi from sl52, as the package maintainers have switched from alternatives (which I sort of understand) to mpi-selector (which I don't)...

I have compiled up a personal version of OpenMPI removing the OpenIB support:

./configure --prefix=/opt/local/openmpi --without-openib

This works (once I worked out that it was linking against the old
libraries at runtime) and the correct output (nothing) is produced.
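For anyone hitting the same old-libraries-at-runtime problem: the usual way to make a locally built tree win over the system install is to put it first on the search paths. A sketch, using the --prefix from the configure line above:

```shell
# Put the locally built OpenMPI ahead of the system install
export PATH=/opt/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/local/openmpi/lib:${LD_LIBRARY_PATH:-}

# Check that the wrapper and the runtime libraries now come
# from the new tree rather than /usr
which mpif90
ldd ./x | grep -i mpi
```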

 -- Jon
