Hello all. I could really use help figuring out why mpirun is hanging, as detailed in my message from yesterday, 16 July. Since there has been no response, please allow me to give a short summary.

- Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2; bash 3.2.15 is the default shell
- Open MPI is installed to /usr/local, which is in the path for non-interactive sessions
- Systems are AMD64, using Ethernet as the interconnect, on a private IP network

mpirun hangs whenever I invoke any process running on a remote node. It runs a job fine if I invoke it so that it only runs on the local node. Ctrl+C never successfully cancels an mpirun job; I have to use kill -9.

I'm asking for help figuring out what steps mpirun has taken, and how I can find out where things are getting stuck or crashing. What could be happening on the remote nodes? What debugging steps can I take?

Without MPI running, the cluster is of no use, so I would really appreciate some help here.
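
One check that might narrow this down: by default mpirun starts its daemon (orted) on each remote node over rsh/ssh, so a non-interactive ssh login needs to work without prompting and needs to find orted in its path. Below is a minimal sketch in Python of such a check, not a definitive diagnostic. It assumes ssh is the launcher in use and that passwordless ssh is intended; the host names node01 and node02 and the function names are hypothetical.

```python
import subprocess


def build_check_command(host, program="orted"):
    """Build an ssh command that looks up `program` in the remote non-interactive PATH."""
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout keeps an unreachable node from blocking for long.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, "which", program]


def check_remote(host, run=subprocess.run):
    """Return a short status string describing what happened on `host`."""
    cmd = build_check_command(host)
    try:
        result = run(cmd, capture_output=True, text=True, timeout=15)
    except subprocess.TimeoutExpired:
        return "timeout"          # the ssh session itself hung
    except OSError:
        return "ssh-unavailable"  # no ssh client on the local node
    if result.returncode == 255:
        return "ssh-failed"       # ssh reports its own errors with status 255
    if result.returncode != 0:
        return "orted-not-found"  # login worked, but orted is not in the PATH
    return "ok: " + result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical host names; replace with the cluster's real nodes.
    for host in ["node01", "node02"]:
        print(host, check_remote(host))
```

A result of "timeout" or "ssh-failed" would point at the ssh setup rather than Open MPI, while "orted-not-found" would suggest the non-interactive path on that node does not include the Open MPI install.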