I've also had someone run into the endpoint busy problem. I never
figured it out; I just increased the default endpoints on the MX-10G
cards from 8 to 16 to make the problem go away. Here's the actual
command and error from before raising the endpoint count to 16. The
versions are MX 1.2.1 with OMPI 1.2.3:
node1:~/taepic tae$ mpirun --hostfile hostmx10g -byslot -mca btl self,sm,mx \
    -np 12 test_beam_injection test_beam_injection.inp -npx 12 > out12
[node2:00834] mca_btl_mx_init: mx_open_endpoint() failed with status=20
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.11 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
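For anyone hitting the same mx_open_endpoint() status=20 (busy) failure,
the workaround described above was to raise the MX endpoint limit from 8
to 16. Roughly, that looks like the sketch below; note that the module
parameter name and the verification step are assumptions from memory,
not something confirmed in this thread, so check your MX release notes
and the output of mx_info for the actual knob on your installation:

```shell
# Sketch only: raise the per-NIC MX endpoint limit from the default 8 to 16.
# ASSUMPTION: the MX kernel module accepts an endpoint-count parameter at
# load time (shown here as mx_max_endpoints); verify the exact name against
# your MX driver's documentation before using this.
rmmod mx
modprobe mx mx_max_endpoints=16

# mx_info reports NIC capabilities, including the endpoint limit, so it can
# be used to confirm the new setting took effect.
mx_info
```

With 12 ranks per node and only 8 endpoints available, some ranks cannot
open an endpoint at all, which matches the "unable to reach ... for MPI
communication" errors above.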
Warner Yuen
Scientific Computing Consultant
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859
Fax: 408.715.0133
On Jul 10, 2007, at 7:53 AM, users-requ...@open-mpi.org wrote:
------------------------------
Message: 2
Date: Tue, 10 Jul 2007 09:19:42 -0400
From: Tim Prins <tpr...@open-mpi.org>
Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <4693876e.4070...@open-mpi.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
SLIM H.A. wrote:
> Dear Tim
>
>> So, you should just be able to run:
>> mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>> ompi_machinefile ./cpi
>
> I tried
>
> node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
> ompi_machinefile ./cpi
>
> I put in a sleep call to keep it running for some time and to monitor
> the endpoints. None of the 4 were open, it must have used tcp.
No, this is not possible: with this command line it will not use tcp.
Are you launching on more than one machine? If the procs are all on one
machine, then they will use the shared memory component (sm) to
communicate, although the endpoints should still be opened.

Just to make sure: did you put the sleep between MPI_Init and
MPI_Finalize?
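For reference, the kind of probe program being discussed, a minimal
sketch assuming a standard Open MPI installation (the 60-second sleep
length and the printf are my own additions, not from the thread), would
look like:

```c
/* Keep the job alive between MPI_Init and MPI_Finalize so the MX
 * endpoints stay open long enough to inspect from another shell
 * (e.g. with mx_info or mx_endpoint_info, whichever your MX install
 * provides). Build with mpicc, launch with mpirun. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);              /* BTL endpoints are opened here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("rank %d is up; sleeping 60s for endpoint monitoring\n", rank);
    fflush(stdout);
    sleep(60);                           /* inspection window */

    MPI_Finalize();                      /* endpoints are torn down here */
    return 0;
}
```

If the sleep is placed before MPI_Init instead, no endpoints will have
been opened yet, which would also explain seeing none in use.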