Prentice Bisbal wrote:
Is there a limit on how many MPI processes can run on a single host?
I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:
mpirun -np X foo
When he tries to run it with a large number of processes (X = 256, 512), the
program fails, and I can reproduce this with a simple "Hello, World"
program:
$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)
I've done some testing and found that X must be below 155 for this program
to work. Is this a bug, part of the standard, or a design/implementation
decision?

One possible issue is the per-process limit on open file descriptors. The error
message should be pretty helpful and descriptive, but perhaps you're
using an older version of OMPI. If this is your problem, one workaround
is something like this:
unlimit descriptors
mpirun -np 256 mpihello
though the syntax depends on what shell you're running (the above is
csh/tcsh; in sh or bash the counterpart is the ulimit -n builtin). Another
workaround is to set the MCA parameter opal_set_max_sys_limits to 1.
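
For what it's worth, here is a rough sketch of both workarounds for a bash user. It assumes the hard descriptor limit on the host is high enough for the job, and that mpihello sits in the current directory (that path is my assumption, adjust as needed). The mpirun line only fires if both mpirun and ./mpihello are actually present.

```shell
#!/bin/bash
# Show the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft"
echo "hard limit: $hard"

# Workaround 1: raise the soft limit to the hard limit for this shell and
# everything it launches (roughly what "unlimit descriptors" does in csh).
ulimit -Sn "$hard"
echo "new soft limit: $(ulimit -Sn)"

# Workaround 2: pass the MCA parameter on the mpirun command line so that
# Open MPI tries to raise the limits itself.
# ./mpihello is an assumed location for the test program from the question.
if command -v mpirun >/dev/null 2>&1 && [ -x ./mpihello ]; then
    mpirun --mca opal_set_max_sys_limits 1 -np 256 ./mpihello
fi
```

If the hard limit itself is too low, neither step will help, and an administrator would have to raise it on the host first.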