Sorry, I forgot to mention that I did get my MPI app working with:
mpirun --mca oob_tcp_dynamic_ipv4_ports 46100-46117 --mca btl_tcp_port_min_v4 46118 --mca btl_tcp_port_range_v4 17
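(If I’m reading those parameters correctly, that pins the daemons’ out-of-band connections to ports 46100-46117 and gives the TCP BTL 17 ports starting at 46118, i.e. 46118-46134.)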
But it’s not safe just to hard-code those port ranges in case someone else uses those ports, or I want to run the
Hi,
Thanks for the explanation. I’m trying to restrict the port range because, if I don’t, mpiexec doesn’t function reliably.
With 2 hosts it always works; then as you add hosts it becomes more and more likely to fail, until by 16 hosts it almost always fails.
“Fails” here means that mpiexec
Let me briefly explain how MPI jobs start. mpirun launches a set of daemons,
one per node. Each daemon has a "phone home" address passed to it on its cmd
line. It opens a port (obtained from its local OS) and connects back to the
port provided on its cmd line. This establishes a connection back to mpirun.
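In plain socket terms the pattern looks roughly like this (an illustrative sketch with made-up cmd line arguments, not OMPI’s actual code):

/* Illustrative "phone home" sketch - not OMPI's actual code.
 * Assume the parent's IPv4 address and port arrive as argv[1]
 * and argv[2] (made-up arguments for this example). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <mpirun-addr> <mpirun-port>\n", argv[0]);
        return 1;
    }

    /* The local OS assigns an ephemeral source port at connect time. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in home;
    memset(&home, 0, sizeof(home));
    home.sin_family = AF_INET;
    home.sin_port = htons((uint16_t)atoi(argv[2]));
    if (inet_pton(AF_INET, argv[1], &home.sin_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", argv[1]);
        return 1;
    }

    /* Connect back to the port given on the cmd line. */
    if (connect(fd, (struct sockaddr *)&home, sizeof(home)) < 0) {
        perror("connect");
        return 1;
    }
    /* ...the daemon and mpirun now have a control channel... */
    close(fd);
    return 0;
}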
No firewall between nodes in the cluster.
OMPI may be asking the local host for available ports, but is it checking that those ports are also available on all the other hosts it’s going to run on?
On 18 Mar 2021, at 15:57, Ralph Castain via users <users@lists.open-mpi.org> wrote:
Hmmm...then you have something else going on. By default, OMPI will ask the
local OS for an available port and use it. You only need to specify ports when
working thru a firewall.
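Asking the local OS for a port just means binding to port 0 and reading back what the kernel chose - roughly like this (illustrative only, not OMPI’s code):

/* Illustrative only - not OMPI's code. Binding to port 0 asks
 * the kernel for any free port; getsockname() reads back the
 * port it chose. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);   /* 0 = let the OS pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&addr, &len);
    printf("OS assigned port %d\n", ntohs(addr.sin_port));
    close(fd);
    return 0;
}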
Do you have firewalls on this cluster?
On Mar 18, 2021, at 8:55 AM, Sendu Bala <s...@sanger.ac.uk> wrote:
Yes, that’s the trick. I’m going to have to check port usage on all hosts and pick suitable ranges just in time - and hope I don’t hit a race condition with other users of the cluster.
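Something like this hypothetical probe (run on each candidate host) is what I have in mind - try to bind every port in a range and only use the range if all the binds succeed. Of course a port that’s free at probe time can be taken before mpirun gets to it, which is exactly the race I’m worried about:

/* Hypothetical just-in-time probe: returns 1 if every port in
 * [lo, hi] can currently be bound on this host, 0 otherwise.
 * Inherently racy - a port that is free now may be taken by
 * another job before mpirun actually uses it. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int range_is_free(int lo, int hi)
{
    for (int port = lo; port <= hi; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return 0;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons((uint16_t)port);

        int ok = (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0);
        close(fd);
        if (!ok) return 0;   /* in use (or not permitted) on this host */
    }
    return 1;   /* whole range looked free at probe time */
}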
Does mpiexec not have this kind of functionality built in? When I use it with
no port options set (pure
Hard to say - unless there is some reason not to, why not make the range large enough that it isn’t an issue? You may have to experiment a bit, as there is nothing to guarantee that other processes aren’t occupying those regions.
On Mar 18, 2021, at 2:13 AM, Sendu Bala <s...@sanger.ac.uk> wrote:
Thanks, that made it work when I was running “true” as a test, but then my real MPI app failed with:
[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen]
bind() failed: no port available in the range [46107..46139]
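(Presumably each MPI process on a node needs its own listening port from that range, so the range has to be at least as large as the number of ranks per node - and anything another job already has bound in 46107-46139 shrinks it further.)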