Axel, thanks! As far as NFS is concerned, we do have quite a heavy load on the system. I've seen a few jobs terminate prematurely with MPI errors for no apparent reason, using the code from CVS from mid-August '05.
I have not seen the same thing with a very recent version, so I'd like to think that if there was a problem, it was somehow fixed in the meantime. My question was prompted by the head node being unavailable for more than an hour, so expecting MPI to behave nicely through such a long outage is probably not reasonable.

Kostya

--- Axel Kohlmeyer <akohlmey at vitae.cmm.upenn.edu> wrote:

> this handled by the MPI _implementation_. you may want to try a
> different package (there are several others besides MPICH which is
> IMNSHO somewhat clunky , see e.g.:
> http://www.lam-mpi.org/mpi/implementations/shortlist.php ),
> or hack your local installation to increase the delay before
> there is a timeout and the library considers the cpu0 process
> as dead.
