Re: [OMPI users] Disable network interface selection

2018-07-09 Thread Jeff Squyres (jsquyres) via users
Can you send the full verbose output with "--mca btl_base_verbose 100"?


> On Jul 4, 2018, at 4:36 PM, carlos aguni  wrote:
> 
> Hi Gilles. 
> 
> Thank you for your reply! :)
> I'm now using a compiled version of OpenMPI 3.0.2 and all seems to work fine 
> now.
> Running `mpirun -n 3 -host c01,c02,c03 hostname` i get:
> c01
> c02
> c03
> 
> `mpirun -n 2 -host c01,c02 hostname`:
> c02
> c01
> 
> `mpirun -n 2 -host c01,c03 hostname`:
> c01
> c03
> 
> Which is expected.
> 
> Now when I run a MPI_Spawn it prints out a warning message which refers to it 
> getting the wrong IP.
> Check the command. I'll highlight some verbose.
> `mpirun -n 1 --machinefile con_c03_hostfile --mca oob_base_verbose 10 
> con_c03`:
> Hello world from processor c01, rank 0 out of 2 processors
> Im the spawned rank 0
> Hello world from processor c03, rank 1 out of 2 processors
> [[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from 
> c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024: Network is 
> unreachable
> 
> [c03:06355] pml_ob1_sendreq.c:235 FATAL
> 
> Verbose below:
> [c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 
> connections
> [c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 
> connections
> [c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 
> connections
> [c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 
> connections
> [c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 
> connections
> 
> Is there a way to suppress it?
> 
> My env is as described below:
> c01
> ens8 10.0.0.1/24
> ens9 172.16.0.1/24
> eth0 172.21.1.136/24
> 
> c02
> eth0 10.0.0.2/24
> 
> c03
> ens8 192.168.0.1/24
> eth1 172.16.0.2/24
> 
> c04
> eth0 192.168.0.2/24
> 
> Regards,
> Carlos.
> 
> On Sun, Jul 1, 2018 at 9:01 PM, Gilles Gouaillardet  wrote:
> Carlos,
> 
> 
> Open MPI 3.0.2 has been released, and it contains several bug fixes, so I do
> 
> encourage you to upgrade and try again.
> 
> 
> 
> if it still does not work, can you please run
> 
> mpirun --mca oob_base_verbose 10 ...
> 
> and then compress and post the output ?
> 
> 
> out of curiosity, would
> 
> mpirun --mca routed_radix 1 ...
> 
> work in your environment ?
> 
> 
> once we can analyze the logs, we should be able to figure out what is going 
> wrong.
> 
> 
> Cheers,
> 
> Gilles
> 
> On 6/29/2018 4:10 AM, carlos aguni wrote:
> Just realized my email wasn't sent to the archive.
> 
> On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni wrote:
> 
> Hi!
> 
> Thank you all for your reply Jeff, Gilles and rhc.
> 
> Thank you Jeff and rhc for clarifying to me some of the openmpi's
> internals.
> 
> >> FWIW: we never send interface names to other hosts - just dot
> addresses
> > Should have clarified - when you specify an interface name for the
> MCA param, then it is the interface name that is transferred as
> that is the value of the MCA param. However, once we determine our
> address, we only transfer dot addresses between ourselves
> 
> If only dot addresses are sent to the hosts then why doesn't
> openmpi use the default route like `ip route get `
> instead of choosing a random one? Is it an expected behaviour? Can
> it be changed?
> 
> Sorry. As Gilles pointed out I forgot to mention which openmpi
> version I was using. I'm using openmpi 3.0.0 gcc 7.3.0 from
> openhpc. Centos 7.5.
> 
> > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
> 
> I cannot just exclude that interface cause after that I want to
> add another computer that's on a different network. And this is
> where things get messy :( I cannot just include and exclude
> networks cause I have different machines on different networks.
> This is what I want to achieve:
> 
>         compute01           compute02        compute03
> ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
> ens8    10.0.0.228/24       172.21.1.128/24  ---
> ens9    172.21.1.155/24     ---              ---
> 
> So I'm in compute01 MPI_spawning another process on compute02 and
> compute03.
> With both MPI_Spawn and `mpirun -n 3 -host
> compute01,compute02,compute03 hostname`
> 
> Then when I include the mca parameters I get this:
> `mpirun --oversubscribe --allow-run-as-root -n 3 --mca
> oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24
>  -host
> compute01,compute02,compute03 hostna

Re: [OMPI users] Disable network interface selection

2018-07-04 Thread carlos aguni
Hi Gilles.

Thank you for your reply! :)
I'm now using a compiled version of OpenMPI 3.0.2 and all seems to work
fine now.
Running `mpirun -n 3 -host c01,c02,c03 hostname` I get:
c01
c02
c03

`mpirun -n 2 -host c01,c02 hostname`:
c02
c01

`mpirun -n 2 -host c01,c03 hostname`:
c01
c03

Which is expected.

Now when I run an MPI_Spawn, it prints a warning message which suggests it is
picking the wrong IP.
Here is the command, followed by some of the verbose output.
`mpirun -n 1 --machinefile con_c03_hostfile --mca oob_base_verbose 10
con_c03`:
Hello world from processor c01, rank 0 out of 2 processors
Im the spawned rank 0
Hello world from processor c03, rank 1 out of 2 processors
[[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024: Network is unreachable

[c03:06355] pml_ob1_sendreq.c:235 FATAL

Verbose below:
[c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 connections

Is there a way to suppress it?

My env is as described below:
*c01*
ens8 10.0.0.1/24
ens9 172.16.0.1/24
eth0 172.21.1.136/24

*c02*
eth0 10.0.0.2/24

*c03*
ens8 192.168.0.1/24
eth1 172.16.0.2/24

*c04*
eth0 192.168.0.2/24
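
A quick way to see which networks c01 and c03 actually have in common is to parse the `oob:tcp:init` lines from the verbose output and intersect the subnets. The following is only a sketch: the log lines are the ones quoted in this message (rejoined where the mail client wrapped them), the /24 prefix length is taken from the interface listing above, and the function names are made up for illustration.

```python
import ipaddress
import re
from collections import defaultdict

# Log lines copied from this message; nothing here is generated by Open MPI.
LOG = """\
[c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 connections
"""

# "[host:pid] ... oob:tcp:init adding <addr> to our list ..."
LINE = re.compile(
    r"^\[(?P<host>[^:\]]+):\d+\].*oob:tcp:init adding (?P<addr>\S+) to our list"
)


def addresses_by_host(log_text):
    """Group the addresses reported by each host (hypothetical helper)."""
    hosts = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            hosts[match.group("host")].append(
                ipaddress.ip_address(match.group("addr"))
            )
    return dict(hosts)


def shared_networks(hosts, prefixlen=24):
    """Return the networks in which every host has at least one address."""
    per_host = [
        {ipaddress.ip_network(f"{addr}/{prefixlen}", strict=False) for addr in addrs}
        for addrs in hosts.values()
    ]
    return set.intersection(*per_host) if per_host else set()


if __name__ == "__main__":
    for net in shared_networks(addresses_by_host(LOG)):
        print(net)
```

With the lines above, the only network both hosts share is 172.16.0.0/24, while the failing connection targets 10.0.0.1, which c03 has no interface for. That suggests it may be worth trying `--mca btl_tcp_if_include 172.16.0.0/24`, though that is an untested guess.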

Regards,
Carlos.

On Sun, Jul 1, 2018 at 9:01 PM, Gilles Gouaillardet 
wrote:

> Carlos,
>
>
> Open MPI 3.0.2 has been released, and it contains several bug fixes, so I
> do
>
> encourage you to upgrade and try again.
>
>
>
> if it still does not work, can you please run
>
> mpirun --mca oob_base_verbose 10 ...
>
> and then compress and post the output ?
>
>
> out of curiosity, would
>
> mpirun --mca routed_radix 1 ...
>
> work in your environment ?
>
>
> once we can analyze the logs, we should be able to figure out what is
> going wrong.
>
>
> Cheers,
>
> Gilles
>
> On 6/29/2018 4:10 AM, carlos aguni wrote:
>
>> Just realized my email wasn't sent to the archive.
>>
>> On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni wrote:
>>
>> Hi!
>>
>> Thank you all for your reply Jeff, Gilles and rhc.
>>
>> Thank you Jeff and rhc for clarifying to me some of the openmpi's
>> internals.
>>
>> >> FWIW: we never send interface names to other hosts - just dot
>> addresses
>> > Should have clarified - when you specify an interface name for the
>> MCA param, then it is the interface name that is transferred as
>> that is the value of the MCA param. However, once we determine our
>> address, we only transfer dot addresses between ourselves
>>
>> If only dot addresses are sent to the hosts then why doesn't
>> openmpi use the default route like `ip route get `
>> instead of choosing a random one? Is it an expected behaviour? Can
>> it be changed?
>>
>> Sorry. As Gilles pointed out I forgot to mention which openmpi
>> version I was using. I'm using openmpi 3.0.0 gcc 7.3.0 from
>> openhpc. Centos 7.5.
>>
>> > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>>
>> I cannot just exclude that interface cause after that I want to
>> add another computer that's on a different network. And this is
>> where things get messy :( I cannot just include and exclude
>> networks cause I have different machines on different networks.
>> This is what I want to achieve:
>>
>>         compute01           compute02        compute03
>> ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
>> ens8    10.0.0.228/24       172.21.1.128/24  ---
>> ens9    172.21.1.155/24     ---              ---
>>
>> So I'm in compute01 MPI_spawning another process on compute02 and
>> compute03.
>> With both MPI_Spawn and `mpirun -n 3 -host
>> compute01,compute02,compute03 hostname`
>>
>> Then when I include the mca parameters I get this:
>> `mpirun --oversubscribe --allow-run-as-root -n 3 --mca
>> oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24
>>  -host
>> compute01,compute02,compute03 hostname`
>> WARNING: An invalid value was given for oob_tcp_if_include. This
>> value will be ignored.
>> ...
>> Message:Did not find interface matching this subnet
>>
>> This would all work if it were to use the system's internals like
>> `ip route`.
>>
>> Best regards,
>> Carlos.
>>
>>
>>
>>

Re: [OMPI users] Disable network interface selection

2018-07-01 Thread Gilles Gouaillardet

Carlos,


Open MPI 3.0.2 has been released, and it contains several bug fixes, so I do
encourage you to upgrade and try again.

If it still does not work, can you please run

mpirun --mca oob_base_verbose 10 ...

and then compress and post the output?

Out of curiosity, would

mpirun --mca routed_radix 1 ...

work in your environment?

Once we can analyze the logs, we should be able to figure out what is
going wrong.



Cheers,

Gilles

On 6/29/2018 4:10 AM, carlos aguni wrote:

Just realized my email wasn't sent to the archive.

On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni wrote:


Hi!

Thank you all for your reply Jeff, Gilles and rhc.

Thank you Jeff and rhc for clarifying to me some of the openmpi's
internals.

>> FWIW: we never send interface names to other hosts - just dot
addresses
> Should have clarified - when you specify an interface name for the
MCA param, then it is the interface name that is transferred as
that is the value of the MCA param. However, once we determine our
address, we only transfer dot addresses between ourselves

If only dot addresses are sent to the hosts then why doesn't
openmpi use the default route like `ip route get `
instead of choosing a random one? Is it an expected behaviour? Can
it be changed?

Sorry. As Gilles pointed out I forgot to mention which openmpi
version I was using. I'm using openmpi 3.0.0 gcc 7.3.0 from
openhpc. Centos 7.5.

> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

I cannot just exclude that interface cause after that I want to
add another computer that's on a different network. And this is
where things get messy :( I cannot just include and exclude
networks cause I have different machines on different networks.
This is what I want to achieve:

        compute01           compute02        compute03
ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
ens8    10.0.0.228/24       172.21.1.128/24  ---
ens9    172.21.1.155/24     ---              ---

So I'm in compute01 MPI_spawning another process on compute02 and
compute03.
With both MPI_Spawn and `mpirun -n 3 -host
compute01,compute02,compute03 hostname`

Then when I include the mca parameters I get this:
`mpirun --oversubscribe --allow-run-as-root -n 3 --mca
oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24
 -host
compute01,compute02,compute03 hostname`
WARNING: An invalid value was given for oob_tcp_if_include. This
value will be ignored.
...
Message:    Did not find interface matching this subnet

This would all work if it were to use the system's internals like
`ip route`.

Best regards,
Carlos.




___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users



Re: [OMPI users] Disable network interface selection

2018-07-01 Thread carlos aguni
Just realized my email wasn't sent to the archive.

On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni  wrote:

> Hi!
>
> Thank you all for your reply Jeff, Gilles and rhc.
>
> Thank you Jeff and rhc for clarifying to me some of the openmpi's
> internals.
>
> >> FWIW: we never send interface names to other hosts - just dot addresses
> > Should have clarified - when you specify an interface name for the MCA
> param, then it is the interface name that is transferred as that is the
> value of the MCA param. However, once we determine our address, we only
> transfer dot addresses between ourselves
>
> If only dot addresses are sent to the hosts then why doesn't openmpi use
> the default route like `ip route get ` instead of choosing a
> random one? Is it an expected behaviour? Can it be changed?
>
> Sorry. As Gilles pointed out I forgot to mention which openmpi version I
> was using. I'm using openmpi 3.0.0 gcc 7.3.0 from openhpc. Centos 7.5.
>
> > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>
> I cannot just exclude that interface cause after that I want to add
> another computer that's on a different network. And this is where things
> get messy :( I cannot just include and exclude networks cause I have
> different machines on different networks.
> This is what I want to achieve:
>
>         compute01           compute02        compute03
> ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
> ens8    10.0.0.228/24       172.21.1.128/24  ---
> ens9    172.21.1.155/24     ---              ---
>
> So I'm in compute01 MPI_spawning another process on compute02 and
> compute03.
> With both MPI_Spawn and `mpirun -n 3 -host compute01,compute02,compute03
> hostname`
>
> Then when I include the mca parameters I get this:
> `mpirun --oversubscribe --allow-run-as-root -n 3 --mca oob_tcp_if_include
> 10.0.0.0/24,192.168.100.0/24 -host compute01,compute02,compute03 hostname`
> WARNING: An invalid value was given for oob_tcp_if_include.  This value
> will be ignored.
> ...
> Message:Did not find interface matching this subnet
>
> This would all work if it were to use the system's internals like `ip
> route`.
>
> Best regards,
> Carlos.
>

Re: [OMPI users] Disable network interface selection

2018-07-01 Thread carlos aguni
Hi!

Thank you all for your reply Jeff, Gilles and rhc.

Thank you Jeff and rhc for clarifying to me some of the openmpi's internals.

>> FWIW: we never send interface names to other hosts - just dot addresses
> Should have clarified - when you specify an interface name for the MCA
param, then it is the interface name that is transferred as that is the
value of the MCA param. However, once we determine our address, we only
transfer dot addresses between ourselves

If only dot addresses are sent to the hosts then why doesn't openmpi use
the default route like `ip route get ` instead of choosing a
random one? Is it an expected behaviour? Can it be changed?

Sorry. As Gilles pointed out I forgot to mention which openmpi version I
was using. I'm using openmpi 3.0.0 gcc 7.3.0 from openhpc. Centos 7.5.

> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

I cannot just exclude that interface, because later I want to add another
computer that's on a different network. And this is where things get messy
:( I cannot simply include and exclude networks, because I have different
machines on different networks.
This is what I want to achieve:

        compute01           compute02        compute03
ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
ens8    10.0.0.228/24       172.21.1.128/24  ---
ens9    172.21.1.155/24     ---              ---

So I'm in compute01 MPI_spawning another process on compute02 and compute03.
With both MPI_Spawn and `mpirun -n 3 -host compute01,compute02,compute03
hostname`

Then when I include the mca parameters I get this:
`mpirun --oversubscribe --allow-run-as-root -n 3 --mca oob_tcp_if_include
10.0.0.0/24,192.168.100.0/24 -host compute01,compute02,compute03 hostname`
WARNING: An invalid value was given for oob_tcp_if_include.  This value
will be ignored.
...
Message:Did not find interface matching this subnet

This would all work if it were to use the system's internals like `ip
route`.
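
One possible reading of the "Did not find interface matching this subnet" warning is that each host checks the include list against its own interfaces, and not every host has an address in every listed subnet. The sketch below only checks that arithmetic with Python's `ipaddress` module. The host table is the one from this message, read as one row per interface and one column per host (an assumption about the flattened layout), and whether this is what actually triggers the warning in Open MPI 3.0.0 is also an assumption.

```python
import ipaddress

# Addresses per host, taken from the table in this message.
HOSTS = {
    "compute01": ["192.168.100.104/24", "10.0.0.228/24", "172.21.1.155/24"],
    "compute02": ["10.0.0.227/24", "172.21.1.128/24"],
    "compute03": ["192.168.100.105/24"],
}

# Value passed to oob_tcp_if_include in the failing command.
INCLUDE = ["10.0.0.0/24", "192.168.100.0/24"]


def unmatched_subnets(hosts, include):
    """Return {host: [subnets in `include` with no local address in them]}."""
    wanted = [ipaddress.ip_network(subnet) for subnet in include]
    result = {}
    for host, cidrs in hosts.items():
        local = [ipaddress.ip_interface(cidr).ip for cidr in cidrs]
        missing = [str(net) for net in wanted if not any(ip in net for ip in local)]
        if missing:
            result[host] = missing
    return result


if __name__ == "__main__":
    for host, missing in unmatched_subnets(HOSTS, INCLUDE).items():
        print(f"{host}: no interface in {', '.join(missing)}")
```

Under these assumptions only compute01 matches both subnets; compute02 has nothing in 192.168.100.0/24 and compute03 has nothing in 10.0.0.0/24.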

Best regards,
Carlos.

On Sat, Jun 23, 2018 at 12:27 AM, r...@open-mpi.org  wrote:

>
>
> On Jun 22, 2018, at 8:25 PM, r...@open-mpi.org wrote:
>
>
>
> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Carlos,
>
> By any chance, could
>
> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>
> work for you ?
>
> Which Open MPI version are you running ?
>
>
> IIRC, subnets are internally translated to interfaces, so that might be an
> issue if
> the translation if made on the first host, and then the interface name is
> sent to the other hosts.
>
>
> FWIW: we never send interface names to other hosts - just dot addresses
>
>
> Should have clarified - when you specify an interface name for the MCA
> param, then it is the interface name that is transferred as that is the
> value of the MCA param. However, once we determine our address, we only
> transfer dot addresses between ourselves
>
>
>
>
> Cheers,
>
> Gilles
>
> On Saturday, June 23, 2018, carlos aguni  wrote:
>
>> Hi all,
>>
>> I'm trying to run a code on 2 machines that has at least 2 network
>> interfaces in it.
>> So I have them as described below:
>>
>>         compute01           compute02
>> ens3    192.168.100.104/24  10.0.0.227/24
>> ens8    10.0.0.228/24       172.21.1.128/24
>> ens9    172.21.1.155/24     ---
>>
>> Issue is. When I execute `mpirun -n 2 -host compute01,compute02 hostname`
>> on them what I get is the correct output after a very long delay..
>>
>> What I've read so far is that OpenMPI performs a greedy algorithm on each
>> interface that timeouts if it doesn't find the desired IP.
>> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
>> that I can run commands like:
>> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host
>> compute01,compute02 hosname`
>> But this configuration doesn't reach the other host(s).
>> In the end I sometimes I get the same timeout.
>>
>> So is there a way to let it to use the system's default route?
>>
>> Regards,
>> Carlos.
>>

Re: [OMPI users] Disable network interface selection

2018-07-01 Thread carlos aguni
Hi!

Thank you all for your reply Jeff, Gilles and rhc.

Thank you Jeff and rhc for clarifying to me some of the openmpi's internals.

>> FWIW: we never send interface names to other hosts - just dot addresses
> Should have clarified - when you specify an interface name for the MCA
param, then it is the interface name that is transferred as that is the
value of the MCA param. However, once we determine our address, we only
transfer dot addresses between ourselves

If only dot addresses are sent to the hosts then why doesn't openmpi use
the default route like `ip route get ` instead of choosing a
random one? Is it an expected behaviour? Can it be changed?

Sorry. As Gilles pointed out I forgot to mention which openmpi version I
was using. I'm using openmpi 3.0.0 gcc 7.3.0 from openhpc. Centos 7.5.

> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

I cannot just exclude that interface cause after that I want to add another
computer that's on a different network. And this is where things get messy
:( I cannot just include and exclude networks cause I have different
machines on different networks.
This is what I want to achieve:

        compute01           compute02        compute03
ens3    192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
ens8    10.0.0.228/24       172.21.1.128/24  ---
ens9    172.21.1.155/24     ---              ---

So I'm in compute01 MPI_spawning another process on compute02 and compute03.
With both MPI_Spawn and `mpirun -n 3 -host compute01,compute02,compute03
hostname`

Then when I include the mca parameters I get this:
`mpirun --oversubscribe --allow-run-as-root -n 3 --mca oob_tcp_if_include
10.0.0.0/24,192.168.100.0/24 -host compute01,compute02,compute03 hostname`
WARNING: An invalid value was given for oob_tcp_if_include.  This value
will be ignored.
...
Message:Did not find interface matching this subnet

This would all work if it were to use the system's internals like `ip
route`.

Best regards,
Carlos.

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org


> On Jun 22, 2018, at 8:25 PM, r...@open-mpi.org wrote:
> 
> 
> 
>> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>> 
>> Carlos,
>> 
>> By any chance, could
>> 
>> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>> 
>> work for you ?
>> 
>> Which Open MPI version are you running ?
>> 
>> 
>> IIRC, subnets are internally translated to interfaces, so that might be an 
>> issue if
>> the translation if made on the first host, and then the interface name is 
>> sent to the other hosts.
> 
> FWIW: we never send interface names to other hosts - just dot addresses

Should have clarified - when you specify an interface name for the MCA param, 
then it is the interface name that is transferred as that is the value of the 
MCA param. However, once we determine our address, we only transfer dot 
addresses between ourselves


> 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Saturday, June 23, 2018, carlos aguni wrote:
>> Hi all, 
>> 
>> I'm trying to run a code on 2 machines that has at least 2 network 
>> interfaces in it.
>> So I have them as described below:
>> 
>>         compute01           compute02
>> ens3    192.168.100.104/24  10.0.0.227/24
>> ens8    10.0.0.228/24       172.21.1.128/24
>> ens9    172.21.1.155/24     ---
>> 
>> Issue is. When I execute `mpirun -n 2 -host compute01,compute02 hostname` on 
>> them what I get is the correct output after a very long delay..
>> 
>> What I've read so far is that OpenMPI performs a greedy algorithm on each 
>> interface that timeouts if it doesn't find the desired IP.
>> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
>> that I can run commands like:
>> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host
>> compute01,compute02 hosname`
>> But this configuration doesn't reach the other host(s).
>> In the end I sometimes I get the same timeout.
>> 
>> So is there a way to let it to use the system's default route?
>> 
>> Regards,
>> Carlos.

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org


> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet wrote:
> 
> Carlos,
> 
> By any chance, could
> 
> mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
> 
> work for you ?
> 
> Which Open MPI version are you running ?
> 
> 
> IIRC, subnets are internally translated to interfaces, so that might be an 
> issue if
> the translation if made on the first host, and then the interface name is 
> sent to the other hosts.

FWIW: we never send interface names to other hosts - just dot addresses

> 
> Cheers,
> 
> Gilles
> 
> On Saturday, June 23, 2018, carlos aguni wrote:
> Hi all, 
> 
> I'm trying to run a code on 2 machines that has at least 2 network interfaces 
> in it.
> So I have them as described below:
> 
>         compute01           compute02
> ens3    192.168.100.104/24  10.0.0.227/24
> ens8    10.0.0.228/24       172.21.1.128/24
> ens9    172.21.1.155/24     ---
> 
> Issue is. When I execute `mpirun -n 2 -host compute01,compute02 hostname` on 
> them what I get is the correct output after a very long delay..
> 
> What I've read so far is that OpenMPI performs a greedy algorithm on each 
> interface that timeouts if it doesn't find the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host
> compute01,compute02 hosname`
> But this configuration doesn't reach the other host(s).
> In the end I sometimes I get the same timeout.
> 
> So is there a way to let it to use the system's default route?
> 
> Regards,
> Carlos.

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread Gilles Gouaillardet
Carlos,

By any chance, could

mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

work for you ?

Which Open MPI version are you running ?


IIRC, subnets are internally translated to interfaces, so that might be an
issue if the translation is made on the first host, and then the interface
name is sent to the other hosts.
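
To illustrate the concern: with the addresses from the quoted table, the same subnet maps to a different interface name on each host, so a name resolved on one host would not necessarily be valid on another. This is only a sketch with Python's `ipaddress` module; the dictionary layout and the function name are hypothetical, and it says nothing about how Open MPI actually performs the translation.

```python
import ipaddress

# Interface name -> address, per host, as read from the quoted table.
IFACES = {
    "compute01": {
        "ens3": "192.168.100.104/24",
        "ens8": "10.0.0.228/24",
        "ens9": "172.21.1.155/24",
    },
    "compute02": {
        "ens3": "10.0.0.227/24",
        "ens8": "172.21.1.128/24",
    },
}


def interface_for_subnet(host_ifaces, subnet):
    """Return the first local interface whose address lies in `subnet`, or None."""
    net = ipaddress.ip_network(subnet)
    for name, cidr in host_ifaces.items():
        if ipaddress.ip_interface(cidr).ip in net:
            return name
    return None


# Resolve the same subnet independently on each host.
names = {
    host: interface_for_subnet(ifaces, "10.0.0.0/24")
    for host, ifaces in IFACES.items()
}

if __name__ == "__main__":
    print(names)
```

Here 10.0.0.0/24 resolves to ens8 on compute01 but to ens3 on compute02, which is why the translation would have to happen separately on each host.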

Cheers,

Gilles

On Saturday, June 23, 2018, carlos aguni  wrote:

> Hi all,
>
> I'm trying to run a code on 2 machines that has at least 2 network
> interfaces in it.
> So I have them as described below:
>
>         compute01           compute02
> ens3    192.168.100.104/24  10.0.0.227/24
> ens8    10.0.0.228/24       172.21.1.128/24
> ens9    172.21.1.155/24     ---
>
> Issue is. When I execute `mpirun -n 2 -host compute01,compute02 hostname`
> on them what I get is the correct output after a very long delay..
>
> What I've read so far is that OpenMPI performs a greedy algorithm on each
> interface that timeouts if it doesn't find the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host
> compute01,compute02 hosname`
> But this configuration doesn't reach the other host(s).
> In the end I sometimes I get the same timeout.
>
> So is there a way to let it to use the system's default route?
>
> Regards,
> Carlos.
>

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread Jeff Squyres (jsquyres) via users
On Jun 22, 2018, at 7:36 PM, carlos aguni  wrote:
> 
> I'm trying to run a code on 2 machines that has at least 2 network interfaces 
> in it.
> So I have them as described below:
> 
>         compute01           compute02
> ens3    192.168.100.104/24  10.0.0.227/24
> ens8    10.0.0.228/24       172.21.1.128/24
> ens9    172.21.1.155/24     ---
> 
> Issue is. When I execute `mpirun -n 2 -host compute01,compute02 hostname` on 
> them what I get is the correct output after a very long delay..
> 
> What I've read so far is that OpenMPI performs a greedy algorithm on each 
> interface that timeouts if it doesn't find the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection) 
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host 
> compute01,compute02 hosname`
> But this configuration doesn't reach the other host(s).

There are actually 2 different uses of TCP in Open MPI: the MPI communications 
and the runtime communications.

In your scenario, the MPI communications should probably "just figure it out" 
(since you have 2 interfaces on the same subnets on each machine).  It can do 
this because the runtime has already been established, and -- for lack of a 
longer explanation -- it can do very speedy discovery and interface matching.

But the runtime has nothing else to refer to, and it has to do its own 
discovery with no prior knowledge of anything.  This is where the timeouts come 
in.

What you described above -- setting oob_tcp_if_include to the 10.0.0.0/24 
network -- *should* work.  It's a little surprising that it does not.

Can you run with:

mpirun -np 2 --mca oob_tcp_if_include 10.0.0.0/24 --mca oob_base_verbose 100 
-host compute01,compute02 hostname

And see what it shows us?
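
Since the two TCP uses are configured separately, a launcher script may want to restrict both at once. The sketch below builds such a command line in Python. `oob_tcp_if_include` and `oob_base_verbose` come from this thread, while using `btl_tcp_if_include` as the MPI-layer counterpart is my assumption based on the TCP FAQ linked above. The helper name is hypothetical, and the script only prints the command; it does not run it.

```python
import shlex


def mpirun_command(subnet, hosts, program, nprocs, verbose=None):
    """Build an mpirun argv that pins both TCP layers to one subnet."""
    argv = [
        "mpirun", "-np", str(nprocs),
        "--mca", "oob_tcp_if_include", subnet,  # runtime (out-of-band) traffic
        "--mca", "btl_tcp_if_include", subnet,  # MPI traffic over TCP (assumed counterpart)
    ]
    if verbose is not None:
        argv += ["--mca", "oob_base_verbose", str(verbose)]
    argv += ["-host", ",".join(hosts), program]
    return argv


cmd = mpirun_command(
    "10.0.0.0/24", ["compute01", "compute02"], "hostname", 2, verbose=100
)

if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting mistakes if the command is later passed to `subprocess.run`.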

> In the end I sometimes I get the same timeout.
> 
> So is there a way to let it to use the system's default route?

Yes and no.  The problem is that in HPC environments, the default IP route is 
not always in the same direction as the nodes on which you're trying to run 
(i.e., there's a zillion different ways to set up the IP networking, and Open 
MPI users tend to use a lot of different ones...).
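
For reference, this is roughly what "ask the system's routing table" looks like from user space. It is a sketch of the general sockets technique, not of anything Open MPI does; the function name and the default port are arbitrary choices for illustration.

```python
import socket


def source_address_for(peer, port=9):
    """Return the local IPv4 address the kernel would use to reach `peer`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket only selects a route; no packet is sent.
        sock.connect((peer, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    print(source_address_for("127.0.0.1"))
```

The answer reflects only the routing table of the host that asks. It raises `OSError` when there is no route to the peer, and it cannot tell whether the peer would pick a matching interface on its side.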

-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Disable network interface selection

2018-06-22 Thread carlos aguni
Hi all,

I'm trying to run a code on 2 machines that each have at least 2 network
interfaces.
So I have them as described below:

        compute01           compute02
ens3    192.168.100.104/24  10.0.0.227/24
ens8    10.0.0.228/24       172.21.1.128/24
ens9    172.21.1.155/24     ---

The issue is this: when I execute `mpirun -n 2 -host compute01,compute02 hostname`
on them, I get the correct output, but only after a very long delay.

From what I've read so far, OpenMPI runs a greedy algorithm over each
interface, which times out if it doesn't find the desired IP.
Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
that I can run commands like:
`$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -n 2 -host
compute01,compute02 hostname`
But this configuration doesn't reach the other host(s).
In the end I sometimes get the same timeout.

So is there a way to let it use the system's default route?

Regards,
Carlos.