Prentice Bisbal wrote:

Is there a limit on how many MPI processes can run on a single host?

I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:

mpirun -np X foo

When he tries to run it with a large number of processes (X = 256, 512),
the program fails, and I can reproduce this with a simple "Hello, World"
program:

$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

I've done some testing and found that X must be less than 155 for this
program to work.
Is this a bug, part of the standard, or a design/implementation decision?

One possible cause is the limit on the number of open file descriptors. In that case the error message should be pretty helpful and descriptive, but perhaps you're using an older version of OMPI. If this is your problem, one workaround is something like this:

unlimit descriptors
mpirun -np 256 mpihello

though the syntax depends on what shell you're running (unlimit is csh/tcsh syntax; bash and sh use ulimit -n instead). Another workaround is to set the MCA parameter opal_set_max_sys_limits to 1.
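
For bash or sh users, the two workarounds might look roughly like the sketch below. It assumes the descriptor limit really is the cause, and that the hard limit on the host is high enough for the job; neither has been confirmed for the failure reported above. The mpirun lines are commented out so the snippet can be run on a machine without Open MPI installed.

```shell
# Inspect the current soft and hard limits on open file descriptors.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "descriptor limits: soft=$soft_limit hard=$hard_limit"

# Workaround 1: raise the soft limit to the hard limit for this shell and
# anything launched from it (the bash/sh counterpart of "unlimit descriptors").
# Some systems refuse certain values, so report a failure rather than abort.
ulimit -Sn "$hard_limit" || echo "could not raise the soft limit"

# Then launch as before:
# mpirun -np 256 mpihello

# Workaround 2: leave the shell limits alone and set the MCA parameter
# on the mpirun command line instead:
# mpirun --mca opal_set_max_sys_limits 1 -np 256 mpihello
```

If the hard limit itself is too low, raising the soft limit will not be enough, and the hard limit would have to be increased by an administrator.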
