Re: [OMPI users] [External] Help with MPI and macOS Firewall

2021-03-18 Thread Gilles Gouaillardet via users
Matt,

you can either

mpirun --mca btl self,vader ...

or

export OMPI_MCA_btl=self,vader
mpirun ...

you may also add
btl = self,vader
in your /etc/openmpi-mca-params.conf
and then simply

mpirun ...
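
To confirm the selection took effect, you can ask the btl framework to
report what it loaded (a quick check, assuming any small MPI binary is
at hand):

mpirun --mca btl self,vader --mca btl_base_verbose 100 -np 2 ./a.out

The verbose output should mention only the self and vader components.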

Cheers,

Gilles

On Fri, Mar 19, 2021 at 5:44 AM Matt Thompson via users wrote:
>
> Prentice,
>
> Ooh. The first one seems to work. The second one apparently is not liked by 
> zsh and I had to do:
> ❯ mpirun -mca btl '^tcp' -np 6 ./helloWorld.mpi3.exe
> Compiler Version: GCC version 10.2.0
> MPI Version: 3.1
> MPI Library Version: Open MPI v4.1.0, package: Open MPI 
> mathomp4@gs6101-parcel.local Distribution, ident: 4.1.0, repo rev: v4.1.0, 
> Dec 18, 2020
>
> Next question: is this:
>
> OMPI_MCA_btl='self,vader'
>
> the right environment variable translation of that command-line option?
>
> On Thu, Mar 18, 2021 at 3:40 PM Prentice Bisbal via users wrote:
>>
>> OpenMPI should only be using shared memory on the local host automatically, 
>> but maybe you need to force it.
>>
>> I think
>>
>> mpirun -mca btl self,vader ...
>>
>> should do that.
>>
>> or you can exclude tcp instead
>>
>> mpirun -mca btl ^tcp
>>
>> See
>>
>> https://www.open-mpi.org/faq/?category=sm
>>
>> for more info.
>>
>> Prentice
>>
>> On 3/18/21 12:28 PM, Matt Thompson via users wrote:
>>
>> All,
>>
>> This isn't specifically an Open MPI issue, but as that is the MPI stack I 
>> use on my laptop, I'm hoping someone here might have a possible solution. (I 
>> am pretty sure something like MPICH would trigger this as well.)
>>
>> Namely, my employer recently did something somewhere so that now *any* MPI 
>> application I run will throw popups like this one:
>>
>> https://user-images.githubusercontent.com/4114656/30962814-866f3010-a44b-11e7-9de3-9f2a3b0229c0.png
>>
>> though for me it's asking about "orterun" and "helloworld.mpi3.exe", etc. I 
>> essentially get one-per-process.
>>
>> If I had sudo access, I suppose I could just keep clicking "Allow" for every 
>> program, but I don't and I compile lots of programs with different names.
>>
>> So, I was hoping maybe an Open MPI guru out there knew of an MCA thing I 
>> could use to avoid them? This is all isolated on-my-laptop MPI I'm doing, so 
>> at most an "mpirun --oversubscribe -np 12" or something. It'll never go over 
>> my network to anything, etc.
>>
>> --
>> Matt Thompson
>>“The fact is, this is about us identifying what we do best and
>>finding more ways of doing less of it better” -- Director of Better Anna 
>> Rampton
>
>
>
> --
> Matt Thompson
>“The fact is, this is about us identifying what we do best and
>finding more ways of doing less of it better” -- Director of Better Anna 
> Rampton


Re: [OMPI users] [External] Help with MPI and macOS Firewall

2021-03-18 Thread Matt Thompson via users
Prentice,

Ooh. The first one seems to work. The second one apparently is not liked by
zsh and I had to do:
❯ mpirun -mca btl '^tcp' -np 6 ./helloWorld.mpi3.exe
Compiler Version: GCC version 10.2.0
MPI Version: 3.1
MPI Library Version: Open MPI v4.1.0, package: Open MPI
mathomp4@gs6101-parcel.local Distribution, ident: 4.1.0, repo rev: v4.1.0,
Dec 18, 2020
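
(An aside on the zsh complaint: zsh treats an unquoted ^ as a glob
operator when EXTENDED_GLOB is enabled, so the caret has to be quoted;
'^tcp' or \^tcp both pass the literal string through to mpirun.)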

Next question: is this:

OMPI_MCA_btl='self,vader'

the right environment variable translation of that command-line option?
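
(Gilles confirms this in his reply above: any MCA parameter <name> can
equivalently be set as the environment variable OMPI_MCA_<name>, e.g.

export OMPI_MCA_btl=self,vader
mpirun --oversubscribe -np 6 ./helloWorld.mpi3.exe
)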

On Thu, Mar 18, 2021 at 3:40 PM Prentice Bisbal via users <users@lists.open-mpi.org> wrote:

> OpenMPI should only be using shared memory on the local host
> automatically, but maybe you need to force it.
>
> I think
>
> mpirun -mca btl self,vader ...
>
> should do that.
>
> or you can exclude tcp instead
>
> mpirun -mca btl ^tcp
>
> See
>
> https://www.open-mpi.org/faq/?category=sm
>
> for more info.
>
> Prentice
>
> On 3/18/21 12:28 PM, Matt Thompson via users wrote:
>
> All,
>
> This isn't specifically an Open MPI issue, but as that is the MPI stack I
> use on my laptop, I'm hoping someone here might have a possible solution.
> (I am pretty sure something like MPICH would trigger this as well.)
>
> Namely, my employer recently did something somewhere so that now *any* MPI
> application I run will throw popups like this one:
>
>
> https://user-images.githubusercontent.com/4114656/30962814-866f3010-a44b-11e7-9de3-9f2a3b0229c0.png
>
> though for me it's asking about "orterun" and "helloworld.mpi3.exe", etc.
> I essentially get one-per-process.
>
> If I had sudo access, I suppose I could just keep clicking "Allow" for
> every program, but I don't and I compile lots of programs with different
> names.
>
> So, I was hoping maybe an Open MPI guru out there knew of an MCA thing I
> could use to avoid them? This is all isolated on-my-laptop MPI I'm doing,
> so at most an "mpirun --oversubscribe -np 12" or something. It'll never go
> over my network to anything, etc.
>
> --
> Matt Thompson
>“The fact is, this is about us identifying what we do best and
>finding more ways of doing less of it better” -- Director of Better
> Anna Rampton
>
>

-- 
Matt Thompson
   “The fact is, this is about us identifying what we do best and
   finding more ways of doing less of it better” -- Director of Better Anna
Rampton


Re: [OMPI users] [External] Re: Error initialising an OpenFabrics device.

2021-03-18 Thread Cunningham, Brendan via users
(sorry, formatting got munged)

Try adding:

--mca btl_openib_warn_no_device_params_found 0 --mca btl_openib_allow_ib true

to your mpirun line to suppress this warning.

> -Original Message-
> From: Cunningham, Brendan
> Sent: Thursday, March 18, 2021 3:40 PM
> To: 'Open MPI Users' 
> Cc: Prentice Bisbal 
> Subject: RE: [OMPI users] [External] Re: Error initialising an OpenFabrics
> device.
> 
> I believe this is an expected warning in the OMPI 4.0.x series as the openib
> BTL is being deprecated
> (https://www.open-mpi.org/software/ompi/major-changes.php)
> 
> Try adding:
>   --mca btl_openib_warn_no_device_params_found 0 --mca btl_openib_allow_ib true
> to suppress this warning.
> 
> This issue (https://github.com/open-mpi/ompi/issues/6300) may be
> relevant.
> 
> > -Original Message-
> > From: users  On Behalf Of Prentice
> > Bisbal via users
> > Sent: Thursday, March 18, 2021 3:28 PM
> > To: users@lists.open-mpi.org
> > Cc: Prentice Bisbal 
> > Subject: Re: [OMPI users] [External] Re: Error initialising an
> > OpenFabrics device.
> >
> > >   If you disable it with -mtl ^openib the warning goes away.
> > And the performance of openib goes away right along with it.
> >
> > Prentice
> >
> > On 3/13/21 5:43 PM, Heinz, Michael William via users wrote:
> > > I’ve begun getting this annoyingly generic warning, too. It appears
> > > to be
> > coming from the openib provider. If you disable it with -mtl ^openib
> > the warning goes away.
> > >
> > > Sent from my iPad
> > >
> > >> On Mar 13, 2021, at 3:28 PM, Bob Beattie via users <users@lists.open-mpi.org> wrote:
> > >>
> > >> Hi everyone,
> > >>
> > >> To be honest, as an MPI / IB noob, I don't know if this falls under
> > >> OpenMPI or Mellanox.
> > >>
> > >> Am running a small cluster of HP DL380 G6/G7 machines.
> > >> Each runs Ubuntu server 20.04 and has a Mellanox ConnectX-3 card,
> > connected by an IS dumb switch.
> > >> When I begin my MPI program (snappyHexMesh for OpenFOAM) I get
> an
> > error reported.
> > >> The error doesn't stop my programs or appear to cause any problems,
> > >> so
> > this request for help is more about delving into the why.
> > >>
> > >> OMPI is compiled from source using v4.0.3, which is the default
> > >> version for Ubuntu 20.04. This compiles and works. I did this because
> > >> I wanted to understand the compilation process whilst using a known
> > >> working OMPI version.
> > >>
> > >> The Infiniband part is the Mellanox MLNXOFED installer v4.9-0.1.7.0
> > >> and I install that with --dkms --without-fw-update --hpc
> > >> --with-nfsrdma
> > >>
> > >> The actual error reported is:
> > >> Warning: There was an error initialising an OpenFabrics device.
> > >>Local host: of1
> > >>Local device: mlx4_0
> > >>
> > >> Then shortly after:
> > >> [of1:1015399] 19 more processes have sent help message
> > >> help-mpi-btl-openib.txt / error in device init
> > >> [of1:1015399] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> > >> all help / error messages
> > >>
> > >> Adding this MCA parameter to the mpirun line simply gives me 20 or
> > >> so
> > copies of the first warning.
> > >>
> > >> Any ideas anyone ?
> > >> Cheers,
> > >> Bob.


Re: [OMPI users] [External] Re: Error initialising an OpenFabrics device.

2021-03-18 Thread Cunningham, Brendan via users
I believe this is an expected warning in the OMPI 4.0.x series as the openib
BTL is being deprecated (https://www.open-mpi.org/software/ompi/major-changes.php)

Try adding:

--mca btl_openib_warn_no_device_params_found 0 --mca btl_openib_allow_ib true

to suppress this warning.

This issue (https://github.com/open-mpi/ompi/issues/6300) may be relevant.
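
(To make these settings persistent rather than per-invocation, the same
parameters can go in the MCA params file; a sketch, assuming the default
system-wide location under the install prefix:

# in <prefix>/etc/openmpi-mca-params.conf
btl_openib_warn_no_device_params_found = 0
btl_openib_allow_ib = true
)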

> -Original Message-
> From: users  On Behalf Of Prentice
> Bisbal via users
> Sent: Thursday, March 18, 2021 3:28 PM
> To: users@lists.open-mpi.org
> Cc: Prentice Bisbal 
> Subject: Re: [OMPI users] [External] Re: Error initialising an OpenFabrics
> device.
> 
> >   If you disable it with -mtl ^openib the warning goes away.
> And the performance of openib goes away right along with it.
> 
> Prentice
> 
> On 3/13/21 5:43 PM, Heinz, Michael William via users wrote:
> > I’ve begun getting this annoyingly generic warning, too. It appears to be
> coming from the openib provider. If you disable it with -mtl ^openib the
> warning goes away.
> >
> > Sent from my iPad
> >
> >> On Mar 13, 2021, at 3:28 PM, Bob Beattie via users <users@lists.open-mpi.org> wrote:
> >>
> >> Hi everyone,
> >>
> >> To be honest, as an MPI / IB noob, I don't know if this falls under
> >> OpenMPI or Mellanox.
> >>
> >> Am running a small cluster of HP DL380 G6/G7 machines.
> >> Each runs Ubuntu server 20.04 and has a Mellanox ConnectX-3 card,
> connected by an IS dumb switch.
> >> When I begin my MPI program (snappyHexMesh for OpenFOAM) I get an
> error reported.
> >> The error doesn't stop my programs or appear to cause any problems, so
> this request for help is more about delving into the why.
> >>
> >> OMPI is compiled from source using v4.0.3, which is the default
> >> version for Ubuntu 20.04. This compiles and works. I did this because I
> >> wanted to understand the compilation process whilst using a known
> >> working OMPI version.
> >>
> >> The Infiniband part is the Mellanox MLNXOFED installer v4.9-0.1.7.0
> >> and I install that with --dkms --without-fw-update --hpc
> >> --with-nfsrdma
> >>
> >> The actual error reported is:
> >> Warning: There was an error initialising an OpenFabrics device.
> >>Local host: of1
> >>Local device: mlx4_0
> >>
> >> Then shortly after:
> >> [of1:1015399] 19 more processes have sent help message
> >> help-mpi-btl-openib.txt / error in device init
> >> [of1:1015399] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> >> all help / error messages
> >>
> >> Adding this MCA parameter to the mpirun line simply gives me 20 or so
> copies of the first warning.
> >>
> >> Any ideas anyone ?
> >> Cheers,
> >> Bob.


Re: [OMPI users] [External] Help with MPI and macOS Firewall

2021-03-18 Thread Prentice Bisbal via users
Open MPI should automatically use only shared memory on the local host,
but maybe you need to force it.


I think

mpirun -mca btl self,vader ...

should do that.

or you can exclude tcp instead

mpirun -mca btl ^tcp

See

https://www.open-mpi.org/faq/?category=sm

for more info.

Prentice

On 3/18/21 12:28 PM, Matt Thompson via users wrote:

All,

This isn't specifically an Open MPI issue, but as that is the MPI 
stack I use on my laptop, I'm hoping someone here might have a 
possible solution. (I am pretty sure something like MPICH would 
trigger this as well.)


Namely, my employer recently did something somewhere so that now *any* 
MPI application I run will throw popups like this one:


https://user-images.githubusercontent.com/4114656/30962814-866f3010-a44b-11e7-9de3-9f2a3b0229c0.png 



though for me it's asking about "orterun" and "helloworld.mpi3.exe", 
etc. I essentially get one-per-process.


If I had sudo access, I suppose I could just keep clicking "Allow" for 
every program, but I don't and I compile lots of programs with 
different names.


So, I was hoping maybe an Open MPI guru out there knew of an MCA thing 
I could use to avoid them? This is all isolated on-my-laptop MPI I'm 
doing, so at most an "mpirun --oversubscribe -np 12" or something. 
It'll never go over my network to anything, etc.


--
Matt Thompson
   “The fact is, this is about us identifying what we do best and
   finding more ways of doing less of it better” -- Director of Better 
Anna Rampton


Re: [OMPI users] [External] Re: Error initialising an OpenFabrics device.

2021-03-18 Thread Prentice Bisbal via users

  If you disable it with -mtl ^openib the warning goes away.

And the performance of openib goes away right along with it.

Prentice
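
(If the goal is to keep InfiniBand performance while dropping the
deprecated openib BTL, the major-changes page linked earlier in this
digest points at UCX; a hedged sketch, assuming Open MPI was built with
UCX support:

mpirun --mca pml ucx -np 20 ./my_app
)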

On 3/13/21 5:43 PM, Heinz, Michael William via users wrote:

I’ve begun getting this annoyingly generic warning, too. It appears to be 
coming from the openib provider. If you disable it with -mtl ^openib the 
warning goes away.

Sent from my iPad


On Mar 13, 2021, at 3:28 PM, Bob Beattie via users wrote:

Hi everyone,

To be honest, as an MPI / IB noob, I don't know if this falls under OpenMPI
or Mellanox.

Am running a small cluster of HP DL380 G6/G7 machines.
Each runs Ubuntu server 20.04 and has a Mellanox ConnectX-3 card, connected by 
an IS dumb switch.
When I begin my MPI program (snappyHexMesh for OpenFOAM) I get an error 
reported.
The error doesn't stop my programs or appear to cause any problems, so this 
request for help is more about delving into the why.

OMPI is compiled from source using v4.0.3, which is the default version for
Ubuntu 20.04. This compiles and works. I did this because I wanted to
understand the compilation process whilst using a known working OMPI version.

The Infiniband part is the Mellanox MLNXOFED installer v4.9-0.1.7.0 and I 
install that with --dkms --without-fw-update --hpc --with-nfsrdma

The actual error reported is:
Warning: There was an error initialising an OpenFabrics device.
   Local host: of1
   Local device: mlx4_0

Then shortly after:
[of1:1015399] 19 more processes have sent help message help-mpi-btl-openib.txt 
/ error in device init
[of1:1015399] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help 
/ error messages

Adding this MCA parameter to the mpirun line simply gives me 20 or so copies of 
the first warning.

Any ideas anyone ?
Cheers,
Bob.
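
(Brendan's reply earlier in this digest suggests silencing the warning
rather than chasing it; a hedged sketch, assuming a 20-rank OpenFOAM
run where snappyHexMesh takes its usual -parallel flag:

mpirun -np 20 --mca btl_openib_warn_no_device_params_found 0 \
       --mca btl_openib_allow_ib true snappyHexMesh -parallel
)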


[OMPI users] Help with MPI and macOS Firewall

2021-03-18 Thread Matt Thompson via users
All,

This isn't specifically an Open MPI issue, but as that is the MPI stack I
use on my laptop, I'm hoping someone here might have a possible solution.
(I am pretty sure something like MPICH would trigger this as well.)

Namely, my employer recently did something somewhere so that now *any* MPI
application I run will throw popups like this one:

https://user-images.githubusercontent.com/4114656/30962814-866f3010-a44b-11e7-9de3-9f2a3b0229c0.png

though for me it's asking about "orterun" and "helloworld.mpi3.exe", etc. I
essentially get one-per-process.

If I had sudo access, I suppose I could just keep clicking "Allow" for
every program, but I don't and I compile lots of programs with different
names.

So, I was hoping maybe an Open MPI guru out there knew of an MCA thing I
could use to avoid them? This is all isolated on-my-laptop MPI I'm doing,
so at most an "mpirun --oversubscribe -np 12" or something. It'll never go
over my network to anything, etc.
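
(The replies in this thread converge on restricting Open MPI's MPI
traffic to the loopback and shared-memory transports; a sketch, assuming
Open MPI 4.x where the shared-memory BTL is named vader:

mpirun --mca btl self,vader --oversubscribe -np 12 ./helloworld.mpi3.exe

The same restriction can be made permanent via OMPI_MCA_btl=self,vader
or an openmpi-mca-params.conf entry, per Gilles' reply above.)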

-- 
Matt Thompson
   “The fact is, this is about us identifying what we do best and
   finding more ways of doing less of it better” -- Director of Better Anna
Rampton


Re: [OMPI users] How do you change ports used? [EXT]

2021-03-18 Thread Ralph Castain via users
Hmmm...then you have something else going on. By default, OMPI will ask the
local OS for an available port and use it. You only need to specify ports when
working through a firewall.

Do you have firewalls on this cluster?


On Mar 18, 2021, at 8:55 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Yes, that’s the trick. I’m going to have to check port usage on all hosts and 
pick suitable ranges just-in-time - and hope I don’t hit a race condition with 
other users of the cluster.

Does mpiexec not have this kind of functionality built in? When I use it with 
no port options set (pure default), it just doesn’t function (I’m guessing 
because it chose “bad” or in-use ports).



On 18 Mar 2021, at 14:11, Ralph Castain via users <users@lists.open-mpi.org> wrote:

Hard to say - unless there is some reason, why not make it large enough to not 
be an issue? You may have to experiment a bit as there is nothing to guarantee 
that other processes aren't occupying those regions.



On Mar 18, 2021, at 2:13 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Thanks, it made it work when I was running “true” as a test, but then my real 
MPI app failed with:

[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen]
 bind() failed: no port available in the range [46107..46139]
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[48139,1],1]) is on host: node-12-6-2
  Process 2 ([[48139,1],0]) is on host: node-5-8-2
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.


This was when running with 16 cores, so I thought a 32-port range would be
fine. Is this telling me I have to make it a 33-port range, have different
ranges for oob and btl, or that some other unrelated software is using some
ports in my range?


(I changed my range from my previous post, because using that range resulted in 
the issue I posted about here before, where mpirun just does nothing for 5mins 
and then terminates itself, without any error messages.)


Cheers,
Sendu.


On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:

What you are missing is that there are _two_ messaging layers in the system. 
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one 
unspecified. You need to add

oob_tcp_dynamic_ipv4_ports = 46207-46239

or whatever range you want to specify

Note that if you want the btl/tcp layer to use those other settings (e.g., 
keepalive_time), then you'll need to set those as well. The names of the 
variables may not match between the layers - you'll need to use ompi_info to 
find the names and params available for each layer.
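
A combined sketch, assuming the chosen ranges are actually free on every
host (giving each layer its own range avoids them competing for the same
ports):

mpirun --mca btl_tcp_port_min_v4 46207 --mca btl_tcp_port_range_v4 32 \
       --mca oob_tcp_dynamic_ipv4_ports 46240-46271 ...

and to list the names and parameters each layer accepts:

ompi_info --param btl tcp --level 9
ompi_info --param oob tcp --level 9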


On Mar 16, 2021, at 2:43 AM, Vincent via users <users@lists.open-mpi.org> wrote:

On 09/03/2021 11:23, Sendu Bala via users wrote:
When using mpirun, how do you pick which ports are used?

I've tried:

mpirun --mca btl_tcp_port_min_v4 46207  --mca btl_tcp_port_range_v4 32 --mca 
oob_tcp_keepalive_time 45 --mca oob_tcp_max_recon_attempts 20 --mca 
oob_tcp_retry_delay  1 --mca oob_tcp_keepalive_probes 20 --mca 
oob_tcp_keepalive_intvl 10 true

And also setting similar things in openmpi/etc/openmpi-mca-params.conf :

btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 32
oob_tcp_keepalive_time = 45
oob_tcp_max_recon_attempts = 20
oob_tcp_retry_delay = 1
oob_tcp_keepalive_probes = 20
oob_tcp_keepalive_intvl = 10

But when the process is running:

ss -l -p -n | grep "pid=57642,"
tcp  LISTEN  0  128  127.0.0.1:58439  0.0.0.0:*  users:(("mpirun",pid=57642,fd=14))
tcp  LISTEN  0  128  0.0.0.0:36253    0.0.0.0:*  users:(("mpirun",pid=57642,fd=17))

What am I doing wrong, and how do I get it to use my desired ports (and other 
settings above)?


Hello

Could this be related to some recently resolved bug ?
What version are you running ?
Having a look at https://github.com/open-mpi/ompi/issues/8304 could possibly
be useful?


Regards

Vincent.


Re: [OMPI users] How do you change ports used? [EXT]

2021-03-18 Thread Sendu Bala via users
Yes, that’s the trick. I’m going to have to check port usage on all hosts and 
pick suitable ranges just-in-time - and hope I don’t hit a race condition with 
other users of the cluster.

Does mpiexec not have this kind of functionality built in? When I use it with 
no port options set (pure default), it just doesn’t function (I’m guessing 
because it chose “bad” or in-use ports).



On 18 Mar 2021, at 14:11, Ralph Castain via users <users@lists.open-mpi.org> wrote:

Hard to say - unless there is some reason, why not make it large enough to not 
be an issue? You may have to experiment a bit as there is nothing to guarantee 
that other processes aren't occupying those regions.



On Mar 18, 2021, at 2:13 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Thanks, it made it work when I was running “true” as a test, but then my real 
MPI app failed with:

[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen]
 bind() failed: no port available in the range [46107..46139]
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[48139,1],1]) is on host: node-12-6-2
  Process 2 ([[48139,1],0]) is on host: node-5-8-2
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.


This was when running with 16 cores, so I thought a 32-port range would be
fine. Is this telling me I have to make it a 33-port range, have different
ranges for oob and btl, or that some other unrelated software is using some
ports in my range?


(I changed my range from my previous post, because using that range resulted in 
the issue I posted about here before, where mpirun just does nothing for 5mins 
and then terminates itself, without any error messages.)


Cheers,
Sendu.


On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:

What you are missing is that there are _two_ messaging layers in the system. 
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one 
unspecified. You need to add

oob_tcp_dynamic_ipv4_ports = 46207-46239

or whatever range you want to specify

Note that if you want the btl/tcp layer to use those other settings (e.g., 
keepalive_time), then you'll need to set those as well. The names of the 
variables may not match between the layers - you'll need to use ompi_info to 
find the names and params available for each layer.


On Mar 16, 2021, at 2:43 AM, Vincent via users <users@lists.open-mpi.org> wrote:

On 09/03/2021 11:23, Sendu Bala via users wrote:
When using mpirun, how do you pick which ports are used?

I've tried:

mpirun --mca btl_tcp_port_min_v4 46207  --mca btl_tcp_port_range_v4 32 --mca 
oob_tcp_keepalive_time 45 --mca oob_tcp_max_recon_attempts 20 --mca 
oob_tcp_retry_delay  1 --mca oob_tcp_keepalive_probes 20 --mca 
oob_tcp_keepalive_intvl 10 true

And also setting similar things in openmpi/etc/openmpi-mca-params.conf :
btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 32
oob_tcp_keepalive_time = 45
oob_tcp_max_recon_attempts = 20
oob_tcp_retry_delay = 1
oob_tcp_keepalive_probes = 20
oob_tcp_keepalive_intvl = 10

But when the process is running:

ss -l -p -n | grep "pid=57642,"
tcp  LISTEN  0  128  127.0.0.1:58439  0.0.0.0:*  users:(("mpirun",pid=57642,fd=14))
tcp  LISTEN  0  128  0.0.0.0:36253    0.0.0.0:*  users:(("mpirun",pid=57642,fd=17))

What am I doing wrong, and how do I get it to use my desired ports (and other 
settings above)?


Hello

Could this be related to some recently resolved bug ?
What version are you running ?
Having a look at https://github.com/open-mpi/ompi/issues/8304 could possibly
be useful?

Regards

Vincent.







Re: [OMPI users] How do you change ports used? [EXT]

2021-03-18 Thread Ralph Castain via users
Hard to say - unless there is some reason, why not make it large enough to not 
be an issue? You may have to experiment a bit as there is nothing to guarantee 
that other processes aren't occupying those regions.



On Mar 18, 2021, at 2:13 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Thanks, it made it work when I was running “true” as a test, but then my real 
MPI app failed with:

[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen]
 bind() failed: no port available in the range [46107..46139]
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[48139,1],1]) is on host: node-12-6-2
  Process 2 ([[48139,1],0]) is on host: node-5-8-2
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.


This was when running with 16 cores, so I thought a 32-port range would be
fine. Is this telling me I have to make it a 33-port range, have different
ranges for oob and btl, or that some other unrelated software is using some
ports in my range?


(I changed my range from my previous post, because using that range resulted in 
the issue I posted about here before, where mpirun just does nothing for 5mins 
and then terminates itself, without any error messages.)


Cheers,
Sendu.


On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:

What you are missing is that there are _two_ messaging layers in the system. 
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one 
unspecified. You need to add

oob_tcp_dynamic_ipv4_ports = 46207-46239

or whatever range you want to specify

Note that if you want the btl/tcp layer to use those other settings (e.g., 
keepalive_time), then you'll need to set those as well. The names of the 
variables may not match between the layers - you'll need to use ompi_info to 
find the names and params available for each layer.


On Mar 16, 2021, at 2:43 AM, Vincent via users <users@lists.open-mpi.org> wrote:

On 09/03/2021 11:23, Sendu Bala via users wrote:
When using mpirun, how do you pick which ports are used?

I've tried:

mpirun --mca btl_tcp_port_min_v4 46207  --mca btl_tcp_port_range_v4 32 --mca 
oob_tcp_keepalive_time 45 --mca oob_tcp_max_recon_attempts 20 --mca 
oob_tcp_retry_delay  1 --mca oob_tcp_keepalive_probes 20 --mca 
oob_tcp_keepalive_intvl 10 true

And also setting similar things in openmpi/etc/openmpi-mca-params.conf :

btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 32
oob_tcp_keepalive_time = 45
oob_tcp_max_recon_attempts = 20
oob_tcp_retry_delay = 1
oob_tcp_keepalive_probes = 20
oob_tcp_keepalive_intvl = 10

But when the process is running:

ss -l -p -n | grep "pid=57642,"
tcp  LISTEN  0  128  127.0.0.1:58439  0.0.0.0:*  users:(("mpirun",pid=57642,fd=14))
tcp  LISTEN  0  128  0.0.0.0:36253    0.0.0.0:*  users:(("mpirun",pid=57642,fd=17))

What am I doing wrong, and how do I get it to use my desired ports (and other 
settings above)?


Hello

Could this be related to some recently resolved bug ?
What version are you running ?
Having a look at https://github.com/open-mpi/ompi/issues/8304 could possibly
be useful?


Regards

Vincent.




Re: [OMPI users] How do you change ports used? [EXT]

2021-03-18 Thread Sendu Bala via users
Thanks, it made it work when I was running “true” as a test, but then my real 
MPI app failed with:

[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen]
 bind() failed: no port available in the range [46107..46139]
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[48139,1],1]) is on host: node-12-6-2
  Process 2 ([[48139,1],0]) is on host: node-5-8-2
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.


This was when running with 16 cores, so I thought a 32-port range would be
fine. Is this telling me I have to make it a 33-port range, have different
ranges for oob and btl, or that some other unrelated software is using some
ports in my range?


(I changed my range from my previous post, because using that range resulted in 
the issue I posted about here before, where mpirun just does nothing for 5mins 
and then terminates itself, without any error messages.)
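
(One way to rule out the last possibility before committing to a range:
a sketch, assuming Linux hosts with iproute2, run on each node:

ss -tan | awk 'NR>1 {print $4}' | grep -oE '[0-9]+$' | sort -n | uniq

Any port from the candidate range that shows up in this list is already
taken.)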


Cheers,
Sendu.


On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:

What you are missing is that there are _two_ messaging layers in the system. 
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one 
unspecified. You need to add

oob_tcp_dynamic_ipv4_ports = 46207-46239

or whatever range you want to specify

Note that if you want the btl/tcp layer to use those other settings (e.g., 
keepalive_time), then you'll need to set those as well. The names of the 
variables may not match between the layers - you'll need to use ompi_info to 
find the names and params available for each layer.


On Mar 16, 2021, at 2:43 AM, Vincent via users <users@lists.open-mpi.org> wrote:

On 09/03/2021 11:23, Sendu Bala via users wrote:
When using mpirun, how do you pick which ports are used?

I've tried:

mpirun --mca btl_tcp_port_min_v4 46207  --mca btl_tcp_port_range_v4 32 --mca 
oob_tcp_keepalive_time 45 --mca oob_tcp_max_recon_attempts 20 --mca 
oob_tcp_retry_delay  1 --mca oob_tcp_keepalive_probes 20 --mca 
oob_tcp_keepalive_intvl 10 true

And also setting similar things in openmpi/etc/openmpi-mca-params.conf :
btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 32
oob_tcp_keepalive_time = 45
oob_tcp_max_recon_attempts = 20
oob_tcp_retry_delay = 1
oob_tcp_keepalive_probes = 20
oob_tcp_keepalive_intvl = 10

But when the process is running:

ss -l -p -n | grep "pid=57642,"
tcp  LISTEN  0  128  127.0.0.1:58439  0.0.0.0:*  users:(("mpirun",pid=57642,fd=14))
tcp  LISTEN  0  128  0.0.0.0:36253    0.0.0.0:*  users:(("mpirun",pid=57642,fd=17))

What am I doing wrong, and how do I get it to use my desired ports (and other 
settings above)?


Hello

Could this be related to some recently resolved bug ?
What version are you running ?
Having a look at https://github.com/open-mpi/ompi/issues/8304 could possibly
be useful?

Regards

Vincent.



