I've also had someone run into the endpoint-busy problem. I never figured it out; I just increased the default endpoints on MX-10G from 8 to 16 to make the problem go away. Here's the actual command and error before setting the endpoints to 16. The versions are MX 1.2.1 with OMPI 1.2.3:

node1:~/taepic tae$ mpirun --hostfile hostmx10g -byslot -mca btl self,sm,mx -np 12 test_beam_injection test_beam_injection.inp -npx 12 > out12
[node2:00834] mca_btl_mx_init: mx_open_endpoint() failed with status=20
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.11 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 PML add procs failed
 --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------


Warner Yuen
Scientific Computing Consultant
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859
Fax: 408.715.0133


On Jul 10, 2007, at 7:53 AM, users-requ...@open-mpi.org wrote:

------------------------------

Message: 2
Date: Tue, 10 Jul 2007 09:19:42 -0400
From: Tim Prins <tpr...@open-mpi.org>
Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <4693876e.4070...@open-mpi.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

SLIM H.A. wrote:
> Dear Tim
>
>> So, you should just be able to run:
>> mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>> ompi_machinefile ./cpi
>
> I tried
>
> node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
> ompi_machinefile ./cpi
>
> I put in a sleep call to keep it running for some time and to monitor
> the endpoints. None of the 4 were open, it must have used tcp.

No, this is not possible. With this command line it will not use tcp.
Are you launching on more than one machine? If the procs are all on one
machine, then it will use the shared memory component to communicate
(sm), although the endpoints should still be opened.

Just to make sure, you did put the sleep between MPI_Init and MPI_Finalize?
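
For reference, a minimal sketch of such a test program (the 60-second sleep duration and the printf text are illustrative, not from the original thread) — the MX endpoints are opened during MPI_Init and released at MPI_Finalize, so the sleep must sit between the two calls for the endpoints to be observable:

```c
/* endpoint_hold.c - keep an MPI job alive so open MX endpoints can
 * be inspected from another shell while the ranks sleep. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);            /* BTL/MTL endpoints opened here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("rank %d of %d holding endpoints open\n", rank, size);
    fflush(stdout);

    sleep(60);                         /* window to check endpoint usage */

    MPI_Finalize();                    /* endpoints released here */
    return 0;
}
```

Compile with mpicc and launch with the same mpirun line as above; while the ranks sleep, the endpoint state can be checked on each node.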

