Re: [OMPI users] Enforcing specific interface and subnet usage

2018-07-01 Thread Maksym Planeta
Sorry for the late response. I just wanted to let you know that I found
another workaround, unrelated to the method we discussed here.



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users



--
Regards,
Maksym Planeta

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-19 Thread r...@open-mpi.org
The OMPI cmd line converts "--mca ptl_tcp_remote_connections 1" to
OMPI_MCA_ptl_tcp_remote_connections, which is not recognized by PMIx. PMIx is
looking for PMIX_MCA_ptl_tcp_remote_connections. The only way to set PMIx MCA
params for the code embedded in OMPI is to put them in your environment.
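
A minimal sketch of that approach, assuming a POSIX shell: export the
PMIx-prefixed name before calling mpirun. The variable name is the one given
above; the `env | grep` check is only illustrative, and whether the variable
also has to be set on the remote nodes is not covered in this thread.

```shell
# Set the PMIx MCA parameter through the environment, because
# "--mca ptl_tcp_remote_connections 1" only yields the OMPI_MCA_ name.
export PMIX_MCA_ptl_tcp_remote_connections=1

# Confirm that child processes (such as mpirun) would inherit it.
env | grep '^PMIX_MCA_ptl_tcp_remote_connections='

# Then launch as before, without the ptl_tcp_remote_connections flag, e.g.:
#   $BIN/mpirun --mca pml ob1 --mca btl tcp,self \
#       --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe \
#       -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'
```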



Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-19 Thread Gilles Gouaillardet
What exactly is the issue you are facing?

You also need to force the subnet used by oob/tcp:
mpirun --mca oob_tcp_if_include 10.233.0.0/19 ...
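
Combined with the btl/tcp setting from the original command, that might look
like the sketch below. It only prints the command line instead of running it;
./a.out is a placeholder application, and the subnet and host names are taken
from the original post.

```shell
# Subnet that both MPI traffic and runtime traffic should be restricted to.
SUBNET='10.233.0.0/19'

# btl_tcp_if_include limits MPI point-to-point traffic (btl/tcp);
# oob_tcp_if_include limits the out-of-band runtime traffic (oob/tcp).
MCA_ARGS="--mca pml ob1 --mca btl tcp,self"
MCA_ARGS="$MCA_ARGS --mca btl_tcp_if_include $SUBNET"
MCA_ARGS="$MCA_ARGS --mca oob_tcp_if_include $SUBNET"

# Print the resulting command; ./a.out is a hypothetical MPI program.
echo "mpirun $MCA_ARGS -np 4 --oversubscribe -H ib1n,ib2n ./a.out"
```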

IIRC, Open MPI might discard addresses from a bridge interface, but I do not
remember exactly whether that affects btl/tcp, oob/tcp, both, or neither by
default.

 Cheers,

Gilles


Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-19 Thread Maksym Planeta

But what about the remote connections parameter? Why is it not set?



--
Regards,
Maksym Planeta




Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread r...@open-mpi.org
I’m not entirely sure I understand what you are trying to do. The 
PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx 
server (i.e., the OMPI daemon on that node). This is always done over the 
loopback device since it is a purely local connection that is never used for 
MPI messages.

I’m sure that the tcp/btl is using your indicated subnet as that would be used 
for internode messages.
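
To check this on a given node, the URI can be split with plain shell parameter
expansion. This is a sketch only: it assumes the value has the shape seen in
the original post (an identifier, a semicolon, then tcp4://address:port), which
is inferred from that output and not from the PMIx documentation.

```shell
# Sample value copied from the output in the original post.
uri='2659516416.2;tcp4://127.0.0.1:46777'

endpoint=${uri#*;}          # drop everything up to the first ';'
hostport=${endpoint#*://}   # drop the "tcp4://" scheme prefix
host=${hostport%:*}         # address part before the last ':'
port=${hostport##*:}        # port part after the last ':'

# IPv4 loopback addresses all start with 127.
case $host in
    127.*) kind=loopback ;;
    *)     kind=non-loopback ;;
esac

echo "address=$host port=$port kind=$kind"
```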



[OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread Maksym Planeta
Hello,

I want to force Open MPI to use TCP and, in particular, a specific subnet.
Unfortunately, I can't manage to do that.

Here is what I tried:

$BIN/mpirun --mca pml ob1 --mca btl tcp,self \
    --mca ptl_tcp_remote_connections 1 \
    --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe \
    -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'

I expected a list of IP addresses in the 10.233.0.0/19 subnet, but instead I
get this:

2659516416.2;tcp4://127.0.0.1:46777
2659516416.2;tcp4://127.0.0.1:46777
2659516416.1;tcp4://127.0.0.1:45055
2659516416.1;tcp4://127.0.0.1:45055

Could you help me debug this problem?

Addresses in the desired subnet are available on the nodes:

$BIN/mpirun --mca pml ob1 --mca btl tcp,self \
    --mca ptl_tcp_remote_connections 1 \
    --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe \
    -H ib1n,ib2n ip addr show dev br0

This returns a set of bridge interfaces that look like this:

9: br0:  mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
    inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
       valid_lft forever preferred_lft forever
    inet 10.233.0.82/19 scope global br0
       valid_lft forever preferred_lft forever
    inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global deprecated mngtmpaddr dynamic
       valid_lft 59528sec preferred_lft 0sec
    inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
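
As a sanity check on the output above, the membership of 10.233.0.82 in
10.233.0.0/19 can be verified with shell arithmetic. This is a small sketch in
POSIX shell; the helper names ip_to_int and in_subnet are made up for this
example and are not part of Open MPI or iproute2.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (e.g. 10.233.0.0/19).
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_subnet 10.233.0.82 10.233.0.0/19 && echo "10.233.0.82 is in 10.233.0.0/19"
in_subnet 141.76.49.17 10.233.0.0/19 || echo "141.76.49.17 is not in 10.233.0.0/19"
```

A /19 mask keeps the top 19 bits, so every address from 10.233.0.0 through
10.233.31.255 matches, while the 141.76.49.17 address on the same bridge does
not.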


What is more puzzling is that if I attach a debugger at
opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around line 500,
I see that mca_ptl_tcp_component.remote_connections is false. This means that
the way I set the component parameter has no effect.

-- 
Regards,
Maksym Planeta


