What exactly is the issue you are facing?

You also need to force the subnet used by oob/tcp:

mpirun --mca oob_tcp_if_include 10.233.0.0/19 ...
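If the job is launched from a script, one way to keep the two settings in sync is to build the command line from a single subnet value. This is only a minimal sketch: `build_mpirun_cmd` is a made-up helper (not part of Open MPI), the host names and subnet are the ones from the original post, and it only prints the command rather than running it.

```python
import shlex

SUBNET = "10.233.0.0/19"  # subnet from the original post


def build_mpirun_cmd(subnet, hosts, nprocs, program):
    """Hypothetical helper: assemble an mpirun command line that restricts
    both btl/tcp (MPI traffic) and oob/tcp (runtime traffic) to one subnet."""
    mca = {
        "pml": "ob1",
        "btl": "tcp,self",
        "btl_tcp_if_include": subnet,
        "oob_tcp_if_include": subnet,
    }
    cmd = ["mpirun"]
    for key, value in mca.items():
        cmd += ["--mca", key, value]
    cmd += ["-np", str(nprocs), "--oversubscribe", "-H", ",".join(hosts)]
    return cmd + list(program)


cmd = build_mpirun_cmd(SUBNET, ["ib1n", "ib2n"], 4, ["hostname"])
print(shlex.join(cmd))  # print only; pass cmd to subprocess.run() to launch
```

Whether setting both parameters is enough on a bridge interface is a separate question, see below.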
iirc, Open MPI might discard addresses from a bridge interface, but I do not remember exactly whether this affects btl/tcp, oob/tcp, both, or neither by default.

Cheers,

Gilles

On Tuesday, June 19, 2018, Maksym Planeta <mplan...@os.inf.tu-dresden.de> wrote:

> Hello,
>
> I want to force OpenMPI to use TCP and in particular use a particular
> subnet. Unfortunately, I can't manage to do that.
>
> Here is what I try:
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca
> ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np
> 4 --oversubscribe -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'
>
> The expected result would be a list of IP addresses in 10.233.0.0 subnet,
> but instead I get this:
>
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.1;tcp4://127.0.0.1:45055
> 2659516416.1;tcp4://127.0.0.1:45055
>
> Could you help me to debug this problem somehow?
>
> The IP addresses are completely available in the desired subnet
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca
> ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np
> 4 --oversubscribe -H ib1n,ib2n ip addr show dev br0
>
> Returns a set of bridges looking like:
>
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
> group default qlen 1000
>     link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
>     inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
>        valid_lft forever preferred_lft forever
>     inet 10.233.0.82/19 scope global br0
>        valid_lft forever preferred_lft forever
>     inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global
> deprecated mngtmpaddr dynamic
>        valid_lft 59528sec preferred_lft 0sec
>     inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed
>        valid_lft forever preferred_lft forever
> <three others are similar>
>
> What is more boggling is that if I attach with a debugger at
> opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around
> line 500 I see that mca_ptl_tcp_component.remote_connections is false.
> This means that the way I set up component parameters is ignored.
>
> --
> Regards,
> Maksym Planeta
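To check a run quickly, the reported URIs can be tested against the wanted subnet. A minimal sketch, assuming the `id;tcp4://address:port` layout seen in the output above; the helper name `uri_in_subnet` is made up for illustration and is not a PMIx API:

```python
import ipaddress

SUBNET = ipaddress.ip_network("10.233.0.0/19")  # subnet from the post


def uri_in_subnet(uri, subnet=SUBNET):
    """Return True if the address in 'id;tcp4://addr:port' lies in subnet."""
    endpoint = uri.split("://", 1)[1]   # 'addr:port'
    addr = endpoint.rsplit(":", 1)[0]   # strip the port
    return ipaddress.ip_address(addr) in subnet


# Two of the URIs reported in the original post
reported = [
    "2659516416.2;tcp4://127.0.0.1:46777",
    "2659516416.1;tcp4://127.0.0.1:45055",
]
for uri in reported:
    verdict = "in subnet" if uri_in_subnet(uri) else "NOT in subnet"
    print(uri, "->", verdict)

# The 10.233.0.82 address that br0 carries does lie in the wanted subnet,
# so the address itself is usable; the question is why it is not selected.
print(ipaddress.ip_address("10.233.0.82") in SUBNET)
```

This only handles the `tcp4` form shown in the post; an IPv6 URI would need different parsing.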
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users