Eugene Loh wrote:
> Prentice Bisbal wrote:
> 
>> Is there a limit on how many MPI processes can run on a single host?
>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it on a large number of processes (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X must be < 155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>  
>>
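The threshold test described above (finding the largest -np that still works) can be done as a bisection rather than by trying values one at a time. This is a minimal bash sketch, not the procedure actually used. It assumes that mpirun exits with a non-zero status when the job aborts, that the run at `lo` processes succeeds, and that the run at `hi` fails. `try_np` and `largest_working_np` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeeds (status 0) if a run with $1 processes works.
# Assumes mpirun returns non-zero when the job is aborted.
try_np() {
  mpirun -np "$1" mpihello > /dev/null 2>&1
}

# Hypothetical helper: bisect for the largest working -np.
# Invariant: try_np "$lo" succeeds and try_np "$hi" fails.
largest_working_np() {
  local lo=$1 hi=$2 mid
  while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    if try_np "$mid"; then
      lo=$mid
    else
      hi=$mid
    fi
  done
  echo "$lo"
}
```

For example, `largest_working_np 1 256` needs about eight mpirun launches instead of one per candidate value.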
> One possible issue is the limit on the number of open file descriptors.  The error
> message should be pretty helpful and descriptive, but perhaps you're
> using an older version of OMPI.  If this is your problem, one workaround
> is something like this:
> 
> unlimit descriptors
> mpirun -np 256 mpihello

Looks like I'm not allowed to set that as a regular user:

$ ulimit -n 2048
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Since I am the admin, I could change that elsewhere, but I'd rather not
do that system-wide unless absolutely necessary.
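A likely reason for the error: `unlimit descriptors` is csh/tcsh syntax, and for an unprivileged user it lifts only the soft limit, up to the hard limit. In bash, `ulimit -n 2048` given without `-S` or `-H` tries to set both the soft and the hard limit, and raising the hard limit above its current value needs privilege, presumably because 2048 is above the hard limit on this host. The bash sketch below raises only the soft limit, which a regular user may do. Whether the hard limit is high enough for 256 local processes is an assumption that depends on how the host is configured.

```shell
#!/usr/bin/env bash
# Report the current open-file limits for this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# Raise only the soft limit, up to the hard limit; no privilege needed.
ulimit -Sn "$hard"
echo "soft limit now $(ulimit -Sn)"

# Processes started from this shell (e.g. mpirun) inherit the new limit:
# mpirun -np 256 mpihello
```

If the job still fails after this, the hard limit itself is probably too low, and raising it does require an administrator change.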

> 
> though I guess the syntax depends on what shell you're running.  Another
> is to set the MCA parameter opal_set_max_sys_limits to 1.

That didn't work either:

$ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
