Never mind, I found the answer in:
http://www.open-mpi.org/community/lists/users/2005/10/0250.php
"-mca btl_tcp_if_exclude eth1"
-Galen
+
Galen Arnold, consulting group--system engineer
National Center for Supercomputing Applications
1205 W. Clark St. (217) 244-3473
Urbana, IL 61801 arno...@ncsa.uiuc.edu
On Thu, 12 Apr 2007, Galen Arnold wrote:
What was the fix last time? [Open MPI 1.2 is in action below.]
[arnoldg@honest3 mpi]$ !mpirun
mpirun --mca btl self,sm,tcp -np 4 -machinefile hosts allall_openmpi_icc 50 50 1000
[honest1][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
mpirun: killing job...
mpirun noticed that job rank 0 with PID 21119 on node honest1 exited on signal 15 (Terminated).
3 additional processes aborted (not shown)
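Applied to the failing command above, the exclusion would look roughly like the sketch below. errno=113 on Linux is EHOSTUNREACH (no route to host), which fits a TCP BTL trying an interface the other nodes cannot reach. Assumptions: eth1 is that interface on these nodes; the parameter is spelled btl_tcp_if_exclude, as in the Open MPI documentation; lo is listed as well because setting the parameter replaces the default exclude list, which covers loopback. The script only prints the command so it can be checked before running; EXCLUDE_IF and CMD are names made up for this sketch.

```shell
#!/bin/sh
# Interfaces the TCP BTL should skip. eth1 comes from the post;
# lo is added on the assumption that overriding the parameter
# drops the default loopback exclusion.
EXCLUDE_IF="lo,eth1"

# Same invocation as the failing run, plus the exclude parameter.
CMD="mpirun --mca btl self,sm,tcp --mca btl_tcp_if_exclude $EXCLUDE_IF -np 4 -machinefile hosts allall_openmpi_icc 50 50 1000"

# Dry run: print the command instead of launching the job.
echo "$CMD"
```

If the printed line looks right for the cluster, paste it at the prompt, or replace the echo with eval "$CMD".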
Troy and I talked about this off-list and resolved that the issue was
with the TCP setup on the nodes.
But it is worth noting that we had previously fixed a bug in the TCP
setup in 1.0.2 with respect to the SEGVs that Troy was seeing -- hence,
when he tested the 1.0.3 prerelease tarballs, there were no SEGVs.