Hi!
I'm using Open MPI 1.3 on ten nodes connected by Gigabit Ethernet, running Red Hat Linux (x86_64).

I ran a test in which I simply killed the orted process. The job then hung for a long time; after 2 to 3 hours I killed it myself.

I have the following questions:

     When the network fails, a host fails, or an orted daemon is killed by accident, how long does it take a running MPI job to notice the failure and exit?

     Does Open MPI support a heartbeat mechanism? If not, how can I detect such a failure quickly so that the MPI job does not hang?
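To clarify what I mean by a heartbeat: each node periodically reports that it is alive, and any node that stays silent for longer than a timeout is declared failed. Below is a minimal Python sketch of that idea. The class `HeartbeatMonitor`, the `FakeClock` helper, the node names and the 10-second timeout are all hypothetical; this is not an Open MPI API, and I don't know whether Open MPI has anything equivalent.

```python
import time


class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector (not part of Open MPI).

    Each peer is expected to send a heartbeat at least every `timeout`
    seconds; a peer whose last heartbeat is older than that is suspected dead.
    """

    def __init__(self, peers, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        now = clock()
        # Treat every peer as alive at start-up.
        self.last_seen = {peer: now for peer in peers}

    def heartbeat(self, peer):
        # Record that `peer` was heard from just now.
        self.last_seen[peer] = self.clock()

    def failed_peers(self):
        # Peers silent for longer than `timeout` are reported as failed.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
mon = HeartbeatMonitor(["node1", "node2", "node3"], timeout=10.0, clock=clock)
clock.t = 5.0
mon.heartbeat("node1")  # node1 and node2 report in at t=5
mon.heartbeat("node2")
clock.t = 12.0          # node3 has been silent since t=0
suspected = mon.failed_peers()
print(suspected)  # ['node3']
```

With something like this, a hang such as the one in my test would be detected within roughly one timeout period instead of hours.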

Thanks a lot!
