I did try setting both MaxSessions and MaxStartups to 200; unfortunately it did not help, and I still got the same errors as before.
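One thing worth ruling out is that the new values never took effect. sshd reads sshd_config only at startup, so it has to be reloaded after an edit (on Debian, typically with `/etc/init.d/ssh reload`). For most keywords sshd also uses the first value it finds, so an earlier uncommented MaxSessions line would override one added at the end of the file. The helper below is a hypothetical sketch (the name `check_sshd_limits` is made up) that prints the first MaxSessions and MaxStartups values set in a config file. It assumes whitespace-separated `Keyword value` lines and ignores `Match` blocks.

```shell
#!/bin/sh
# Hypothetical helper: print the first MaxSessions / MaxStartups values set
# in an sshd_config file (defaults to /etc/ssh/sshd_config).
# Assumptions: "Keyword value" lines separated by whitespace; Match blocks
# and "Keyword=value" syntax are not handled.
check_sshd_limits() {
  conf="${1:-/etc/ssh/sshd_config}"
  awk '
    { key = tolower($1) }         # keywords are case-insensitive
    (key == "maxsessions" || key == "maxstartups") && !(key in seen) {
      seen[key] = $2              # sshd uses the first value it finds
    }
    END {
      if ("maxsessions" in seen) print "MaxSessions", seen["maxsessions"]
      else print "MaxSessions unset (default)"
      if ("maxstartups" in seen) print "MaxStartups", seen["maxstartups"]
      else print "MaxStartups unset (default)"
    }
  ' "$conf"
}
```

Usage: `check_sshd_limits` on each node, or `check_sshd_limits /path/to/sshd_config` for another file. It only inspects the file; it cannot tell whether the running daemon has been reloaded since the edit.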
> Date: Sat, 14 Apr 2012 12:58:49 -0400
> From: Tim Miller <btamil...@gmail.com>
> Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
> <CAMsSzSBxTv4u1MLE=ZGMc73N2+k6fkj3-KP_PQB2H=p+yvz...@mail.gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> This may or may not be related, but I've had similar issues on RHEL
> 6.x and clones when using the SSH job launcher and running more than
> 10 processes per node. It sounds like you're only distributing 6
> processes per node, so it doesn't sound like your problem, but you
> might want to check your hostfile and make sure you're not
> oversubscribing one of the nodes. The trick I've found to launch more
> than 10 processes per node via SSH is to set MaxSessions to some number
> higher than 10 in /etc/ssh/sshd_config (I choose 100, somewhat
> arbitrarily).
>
> Assuming you're using the SSH launcher on an RHEL 6 derivative, you
> might give this a try. It's an SSH issue, not an OpenMPI one.
>
> Regards,
> Tim
>
> On Thu, Apr 12, 2012 at 9:04 AM, Seyyed Mohtadin Hashemi
> <haa...@gmail.com> wrote:
> > Hello,
> >
> > I have a very peculiar problem: I have a micro cluster with three nodes
> > (18 cores total). The nodes are clones of each other, are connected to a
> > frontend via Ethernet, and all run Debian Squeeze. When I run parallel
> > jobs I can use up to "-np 10"; if I go any further, the job crashes. I
> > have primarily done tests with GROMACS (because that is what I will be
> > running) but have also used OSU Micro-Benchmarks 3.5.2.
> >
> > For a simple parallel job I use:
> >
> >   path/mpirun --hostfile path/hostfile -np XX -d --display-map path/mdrun_mpi -s path/topol.tpr -o path/output.trr
> >
> > (path is global.) For "-np XX" less than or equal to 10 it works; however,
> > as soon as I use 11 or more, the whole thing crashes.
> > The terminal dump is attached to this mail: when_working.txt is for
> > "-np 10", when_crash.txt is for "-np 12", and OpenMPI_info.txt is the
> > output from
> >
> >   path/mpirun --bynode --hostfile path/hostfile --tag-output ompi_info -v ompi full --parsable
> >
> > I have tried OpenMPI from v1.4.2 all the way up to the v1.5.5 beta, and
> > all yield the same result.
> >
> > The output files are from a new install I did today: I formatted all
> > nodes, started from a fresh minimal install of Squeeze,
> > ran "apt-get install gromacs gromacs-openmpi", and installed all
> > dependencies. Then I ran two jobs using the parameters described above.
> > I also ran one with the OSU benchmarks (data not included), and it also
> > crashed with "-np" larger than 10.
> >
> > I hope somebody can help figure out what is wrong and how I can fix it.
> >
> > Best regards,
> > Mohtadin
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
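Tim's suggestion to check the hostfile for oversubscription can be made concrete with a small helper that totals the slots declared per host, so the total can be compared against the -np value. This is a hypothetical sketch (`hostfile_slots` is a made-up name). It assumes one `hostname slots=N` entry per line, treats `#` as starting a comment, ignores other keywords such as `max_slots`, and counts a host listed without `slots=` as a single slot. That default is an assumption; check the mpirun man page for your Open MPI version. For the three-node, 18-core cluster described above, declaring 6 slots per node should give a total of 18, which is above -np 12.

```shell
#!/bin/sh
# Hypothetical helper: report slots per host and the total declared in an
# Open MPI hostfile, so the total can be compared against -np.
# Assumptions: "hostname slots=N" lines, "#" starts a comment, and a host
# listed without slots= counts as one slot.
hostfile_slots() {
  awk '
    { sub(/#.*/, "") }            # drop comments (re-splits the fields)
    NF == 0 { next }              # skip blank lines
    {
      n = 1                       # assumed default when slots= is absent
      for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) n = substr($i, 7) + 0
      slots[$1] += n              # a host may appear on several lines
      total += n
    }
    END {
      for (h in slots) print h, slots[h] | "sort"
      close("sort")               # wait for sorted host lines first
      print "total", total + 0
    }
  ' "$1"
}
```

Usage: `hostfile_slots path/hostfile`. If the total is below the -np value, or one host carries more slots than it has cores, the job is oversubscribing a node.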