What exactly is the issue you are facing?

You also need to force the subnet used by oob/tcp:
mpirun --mca oob_tcp_if_include 10.233.0.0/19 ...
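
For reference, here is a minimal sketch of what the full command line could look like with the subnet restriction applied to both btl/tcp and oob/tcp. It only prints the command (a dry run) so it can be checked before being run on the cluster. The subnet and host names are taken from your message; `hostname` is just a placeholder application I picked for illustration, and I have not tested this on your setup.

```shell
#!/bin/sh
# Dry-run sketch: build the mpirun command line and print it instead of running it.
SUBNET="10.233.0.0/19"   # subnet from the original command line
HOSTS="ib1n,ib2n"        # hosts from the original command line

# Restrict both the MPI traffic (btl/tcp) and the runtime wire-up (oob/tcp)
# to the same subnet.
MCA_ARGS="--mca pml ob1 --mca btl tcp,self"
MCA_ARGS="$MCA_ARGS --mca btl_tcp_if_include $SUBNET"
MCA_ARGS="$MCA_ARGS --mca oob_tcp_if_include $SUBNET"

# 'hostname' is only a placeholder application (an assumption, not from the thread).
CMD="mpirun $MCA_ARGS -np 4 --oversubscribe -H $HOSTS hostname"
echo "$CMD"
```

Once the printed command looks right, it can be run directly, replacing `hostname` with the real application.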

IIRC, Open MPI might discard addresses from a bridge interface, but I do
not remember exactly whether that affects btl/tcp, oob/tcp, both, or
neither by default.

Cheers,

Gilles

On Tuesday, June 19, 2018, Maksym Planeta <mplan...@os.inf.tu-dresden.de>
wrote:

> Hello,
>
> I want to force Open MPI to use TCP and, in particular, a specific
> subnet. Unfortunately, I can't manage to do that.
>
> Here is what I tried:
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca
> ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np
> 4  --oversubscribe -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'
>
> The expected result would be a list of IP addresses in 10.233.0.0 subnet,
> but instead I get this:
>
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.1;tcp4://127.0.0.1:45055
> 2659516416.1;tcp4://127.0.0.1:45055
>
> Could you help me to debug this problem somehow?
>
> The IP addresses in the desired subnet are available:
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self  --mca
> ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np
> 4  --oversubscribe -H ib1n,ib2n ip addr show dev br0
>
> Returns a set of bridges looking like:
>
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
>     inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
>        valid_lft forever preferred_lft forever
>     inet 10.233.0.82/19 scope global br0
>        valid_lft forever preferred_lft forever
>     inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global deprecated mngtmpaddr dynamic
>        valid_lft 59528sec preferred_lft 0sec
>     inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed
>        valid_lft forever preferred_lft forever
> <three others are similar>
>
> What is even more puzzling is that if I attach a debugger at
> opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around
> line 500, I see that mca_ptl_tcp_component.remote_connections is false.
> This suggests that the component parameters I set are being ignored.
>
> --
> Regards,
> Maksym Planeta
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
