On Jun 29, 2010, at 9:35 PM, 王睿 wrote:

> Thanks for the feedback. More below:
>
> Is there any MPI implementation that meets the following requirements?
>
> 1. It doesn't terminate the whole job when a node dies.
>
> 2. It allows a spare node to replace the dead node and take over its work.
>
> As far as I know, FT-MPI meets both requirements, but it hasn't been updated
> since 2004. Open MPI is said to combine several projects, including FT-MPI,
> but so far it only provides checkpoint/restart as a form of fault tolerance.

I know that the UT people have been working on such things over the past few years, but I don't know the current status.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/