Hello everybody,

I'm a beginner in the Open MPI world.
Maybe it's a simple problem, but I cannot figure out what is happening...

My situation:
I use 4 hosts in total, all with static IP addresses.
I can't run *mpirun* more than about 1500 times at almost the same time
(it always works with fewer than 1000 runs).
I get many "*ssh_exchange_identification: connection closed by remote host*"
errors.

--------------------------------------------------------------------------------------------------------------------------
My Open MPI version: 1.6.2
--------------------------------------------------------------------------------------------------------------------------
I use a simple bash script to run my Open MPI program (named
openMPI_test).
Here is the script:

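#!/bin/bash
# Start 2000 mpirun jobs in the background; all output (including the
# report from `time` and any ssh errors) is appended to "file"
for (( index=0; index<2000; index++ ))
do
    (time mpirun --hostfile my_hostfile openMPI_test &) >> file 2>&1
done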
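The loop above starts all 2000 mpirun processes at once, and each of them opens ssh connections to the remote hosts in the hostfile at about the same moment. One way to check whether the failures depend on how many launches are in flight is to cap the concurrency. The C sketch below does that with fork/execvp/wait; run_throttled and reap_one are hypothetical helper names, and the cap you pass in is an assumption to experiment with, not a known Open MPI or sshd limit. Unlike the script, it does not record the `time` output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for one child and count it as failed if it did
 * not exit with status 0. Returns 1 if a child was reaped, 0 if none remain. */
static int reap_one(int *failures)
{
    int status;
    pid_t pid;
    do {
        pid = wait(&status);
    } while (pid < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    if (pid < 0)
        return 0;                           /* ECHILD: no children left */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        (*failures)++;
    return 1;
}

/* Hypothetical helper: run the NULL-terminated command argv `total` times,
 * keeping at most `max_parallel` copies running at once.
 * Returns how many runs failed (fork error, exec error, signal, or non-zero exit). */
static int run_throttled(char *const argv[], int total, int max_parallel)
{
    assert(argv != NULL && argv[0] != NULL);
    assert(total >= 0 && max_parallel > 0);

    int running = 0, failures = 0;
    for (int started = 0; started < total; started++) {
        /* At the cap: wait for one run to finish before starting the next. */
        if (running == max_parallel) {
            if (reap_one(&failures))
                running--;
            else
                running = 0;                /* nothing left to wait for */
        }
        pid_t pid = fork();
        if (pid < 0) {                      /* fork failed: count it and go on */
            perror("fork");
            failures++;
            continue;
        }
        if (pid == 0) {                     /* child: become the command */
            execvp(argv[0], argv);
            perror("execvp");               /* only reached if exec failed */
            _exit(127);
        }
        running++;
    }
    /* Reap the runs that are still in flight. */
    while (running > 0 && reap_one(&failures))
        running--;
    return failures;
}
```

For example, `char *cmd[] = {"mpirun", "--hostfile", "my_hostfile", "openMPI_test", NULL};` followed by `run_throttled(cmd, 2000, 100)` would start the same 2000 runs with at most 100 alive at a time (100 is an arbitrary example value). If the ssh errors disappear at a lower cap, that points at the number of simultaneous logins rather than at openMPI_test itself.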


p.s.1 "my_hostfile" lists the IP addresses of the 4 hosts.
p.s.2 "openMPI_test" makes each host do the same thing, i.e.:
          if (rank == 0)      { for (i = 0; i < 65535; i++) temp = i/(i+1); }
          else if (rank == 1) { for (i = 0; i < 65535; i++) temp = i/(i+1); }
          else if (rank == 2) { for (i = 0; i < 65535; i++) temp = i/(i+1); }
          else if (rank == 3) { for (i = 0; i < 65535; i++) temp = i/(i+1); }
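Because all four branches run the same loop, the if/else chain can collapse into a single function. The sketch below leaves out the MPI calls so it compiles on its own; in the real program the rank would come from MPI_Comm_rank(MPI_COMM_WORLD, &rank). busy_work is a hypothetical name, and temp is assumed to be a double: if i and temp are both int, i/(i+1) is integer division and is always 0.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-rank work in openMPI_test.
 * Every rank 0..3 runs the identical loop, so rank is only checked,
 * not used to pick different work. */
static double busy_work(int rank)
{
    assert(rank >= 0 && rank <= 3);   /* the original only handles ranks 0..3 */
    volatile double temp = 0.0;       /* volatile forces each store, so the loop runs at run time */
    for (int i = 0; i < 65535; i++)
        temp = (double)i / (i + 1);   /* floating-point division, not integer division */
    return temp;
}
```

Every rank returns the same value (65534/65535, from the last iteration), which is a quick way to confirm that the ranks really do identical work.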
--------------------------------------------------------------------------------------------------------------------------

At first, I thought I had a system problem,
so I tried modifying my /etc/ssh/sshd_config file.
I set MaxSessions to 65535 and MaxStartups to 65535,
but sadly it still didn't work :((

I'm not sure whether Open MPI has some setting or limit for this,
or whether I just have another system problem.

I really hope someone can help me.
Thank you all very, very much!!



Best Wishes,
Jen
