I use a shared-memory system, and for my MPI algorithm I set the IP address of every node to 127.0.0.1 in some_hostfile and execute the program with "mpirun --machinefile some_hostfile -np 4 prog-name". I think the sm btl is switched on by default. Will this help in your case? I am not sure, but you may want to give it a try if you haven't already, Bill.
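In case it helps, here is roughly what I mean. This is only a sketch of my own setup, not something I have verified against Bill's: the hostfile contents, the slots count, and the restriction to the sm and self btls are my assumptions, and I have not checked whether any of this changes which interface the OOB layer binds to.

    # some_hostfile -- point every slot at the loopback address
    127.0.0.1 slots=4

    # run everything on the local node; the MPI layer can additionally be
    # restricted to shared memory and self so the tcp btl is never selected
    mpirun --machinefile some_hostfile -np 4 --mca btl self,sm ./prog-name

(There is also a short note at the bottom of this message about the interface-include flags George mentioned.)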
-Sarang.

Quoting Brian Barrett <bbarr...@lanl.gov>:

> Bill -
>
> This is a known issue in all released versions of Open MPI. I have a
> patch that hopefully will fix this issue in 1.2.3. It's currently
> waiting on people in the Open MPI team to verify I didn't do
> something stupid.
>
> Brian
>
> On May 29, 2007, at 9:59 PM, Bill Saphir wrote:
>
> > George,
> >
> > This is one of the things I tried, and setting the oob interface did
> > not work, with the error message below.
> >
> > Also, per this thread:
> > http://www.open-mpi.org/community/lists/users/2007/05/3319.php
> > I believe it is oob_tcp_include, not oob_tcp_if_include. The latter
> > is silently ignored in 1.2, as far as I can tell.
> >
> > Interestingly, telling the MPI layer to use lo0 (or to not use tcp at
> > all) works fine. But when I try to do the same for the OOB layer, it
> > complains. The full error is:
> >
> > [mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address ''
> > returned for selected oob interfaces.
> > [mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c
> > at line 1196
> >
> > mpirun actually hangs at this point and no processes are spawned. I
> > have to ^C to stop it. I see this behavior on both Mac OS and on
> > Linux with 1.2.2.
> >
> > Bill
> >
> > George Bosilca wrote:
> >> There are 2 sets of sockets: one for the oob layer and one for the
> >> MPI layer (at least if TCP support is enabled). Therefore, in order
> >> to achieve what you're looking for you should add to the command line
> >> "--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
> >>
> >> On May 29, 2007, at 3:58 PM, Bill Saphir wrote:
> >
> > ----- original message below ---
> >
> >> We have run into the following problem:
> >>
> >> - start up Open MPI application on a laptop
> >> - disconnect from network
> >> - application hangs
> >>
> >> I believe that the problem is that all sockets created by Open MPI
> >> are bound to the external network interface. For example, when I
> >> start up a 2-process MPI job on my Mac (no hosts specified), I get
> >> the following tcp connections. 192.168.5.2 is an address on my LAN.
> >>
> >> tcp4  0  0  192.168.5.2.49459  192.168.5.2.49463  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49463  192.168.5.2.49459  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49462  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49462  192.168.5.2.49456  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49460  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49460  192.168.5.2.49456  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49458  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49458  192.168.5.2.49456  ESTABLISHED
> >>
> >> Since this application is confined to a single machine, I would like
> >> it to use 127.0.0.1, which will remain available as the laptop moves
> >> around. I am unable to force it to bind sockets to this address,
> >> however.
> >>
> >> Some of the things I've tried are:
> >> - explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
> >> - turning off the tcp btl (--mca btl ^tcp) and other variations
> >>   (--mca btl self,sm)
> >> - using --mca oob_tcp_include lo0
> >>
> >> The first two have no effect. The last one results in an error
> >> message of:
> >> [myhost.local:05830] [0,0,0] mca_oob_tcp_init: invalid address ''
> >> returned for selected oob interfaces.
> >>
> >> Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
> >>
> >> As a side question -- I'm curious what all of these tcp connections
> >> are used for. As I increase the number of processes, it looks like
> >> there are 4 sockets created per MPI process, without using the tcp
> >> btl. Perhaps stdin/out/err + control?
> >>
> >> Bill
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
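P.S. Coming back to George's suggestion above, here is the combination I would retest once Brian's fix is released. This is only a sketch: per Bill's report, the OOB parameter in the 1.2 series is spelled oob_tcp_include (not oob_tcp_if_include), and it currently fails with the "invalid address ''" error, so treat this as the thing to try again under 1.2.3. The interface name lo0 is for Mac OS; on Linux the loopback interface is usually called lo.

    # restrict both the OOB layer and the TCP BTL to the loopback interface
    mpirun -np 2 \
        --mca oob_tcp_include lo0 \
        --mca btl_tcp_if_include lo0 \
        ./prog-name

    # afterwards, check that the job's sockets really landed on loopback
    netstat -an | grep 127.0.0.1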