Brian Barrett wrote:

On Wed, 2006-06-28 at 09:43 -0400, Patrick Jessee wrote:
Hello. I've tracked down the source of the previously reported startup problem with Open MPI 1.1. On startup, it fails with the messages:

mca_oob_tcp_accept: accept() failed with errno 9.
   :

This didn't happen with 1.0.2.

The trigger for this behavior is if standard input happens to be closed before calling mpirun. In this particular case, mpirun was being started by a wrapper Bourne shell script that had standard input closed. It's fairly easy to reproduce. Interestingly, the problem is not seen if standard input is opened from an arbitrary device such as /dev/null.
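For anyone hitting the same thing, here is a minimal sketch of a launcher-side workaround, assuming the wrapper can be written in Python rather than Bourne shell. It checks whether file descriptor 0 is open and, if not, attaches it to /dev/null before mpirun is started. The helper name `ensure_stdin_open` is hypothetical and not part of Open MPI. (On Linux, errno 9 is EBADF, "bad file descriptor", which fits a closed descriptor being involved.)

```python
import os


def ensure_stdin_open(devnull=os.devnull):
    """Attach fd 0 to /dev/null if it is closed.

    Returns True if fd 0 had to be reopened, False if it was already open.
    Hypothetical helper for illustration; not an Open MPI API.
    """
    try:
        os.fstat(0)          # raises OSError (EBADF) if fd 0 is closed
        return False
    except OSError:
        pass

    fd = os.open(devnull, os.O_RDONLY)
    if fd != 0:
        # Some other low descriptor was free; move the new one onto fd 0.
        os.dup2(fd, 0)
        os.close(fd)
    # Descriptors created by os.open() are non-inheritable by default,
    # so mark fd 0 inheritable to make it survive an exec of mpirun.
    os.set_inheritable(0, True)
    return True


# Example use in a launcher (the mpirun arguments are placeholders):
#   ensure_stdin_open()
#   os.execvp("mpirun", ["mpirun", "-np", "4", "./a.out"])
```

In a shell wrapper the equivalent would be to redirect standard input from /dev/null on the mpirun command line instead of leaving it closed.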

This is the first MPI with which we've seen this behavior, and it didn't happen with 1.0.2 so something must have been introduced in 1.1. Perhaps 1.1 makes some assumptions about the state of the standard file descriptors.

Hopefully this feedback is helpful to someone in resolving the problem.

Yup, in order to fix some other things with standard input that users
rightly were complaining about, we changed some standard input handling
between 1.0.2 and 1.1. My recommendation is to just tie it to /dev/null
instead.  We're unlikely to fix this issue in the near future.

Thanks for the reply. We can work around the issue in the near future; however, this seems like a restriction/assumption that could possibly be addressed in Open MPI in the long run. (It's easy to work around or avoid once you know what the restriction is, but tracking down the problem takes some time.) Anyway, perhaps it could be placed on a todo list so it doesn't get lost. I'd be happy to provide any additional information if needed.

Regards,

Patrick

