Hello everybody, I'm a beginner in the Open MPI world. Maybe it's a simple problem, but I cannot figure out what is happening...
My setup uses 4 hosts in total, all with static IP addresses. I cannot launch more than about 1500 *mpirun* instances at almost the same time (fewer than 1000 always works). When it fails, I get many "*ssh_exchange_identification: connection closed by remote host*" errors.

----------------------------------------------------------------------
My Open MPI version: 1.6.2
----------------------------------------------------------------------

I use a simple bash script to run my Open MPI program (named openMPI_test). Here is the script:

    for (( index=0; index<2000; index++ ))
    do
        (time mpirun --hostfile my_hostfile openMPI_test &) >> file 2>&1
    done

P.S. 1: The "my_hostfile" file lists the IP addresses of the 4 hosts.

P.S. 2: The "openMPI_test" program asks each host to do the same thing, that is:

    if(rank == 0){
        for(i=0 ; i<65535 ; i++)
            temp = i/(i+1);
    }
    else if(rank == 1){
        for(i=0 ; i<65535 ; i++)
            temp = i/(i+1);
    }
    else if(rank == 2){
        for(i=0 ; i<65535 ; i++)
            temp = i/(i+1);
    }
    else if(rank == 3){
        for(i=0 ; i<65535 ; i++)
            temp = i/(i+1);
    }

----------------------------------------------------------------------

At first I thought I had a system problem, so I tried modifying my /etc/ssh/sshd_config file. I set MaxSessions to 65535 and MaxStartups to 65535, but sadly it still didn't work :((

I'm not sure whether there are some settings or limits in Open MPI, or whether I have another system problem.

I really hope someone can help me. Thank you all very much!

Best Wishes,
Jen