Ralph Castain wrote:
> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
> 
>>
>> Ralph Castain wrote:
>>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>>>
>>>> Eugene Loh wrote:
>>>>> Prentice Bisbal wrote:
>>>>>> Eugene Loh wrote:
>>>>>>
>>>>>>> Prentice Bisbal wrote:
>>>>>>>
>>>>>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>>>>>
>>>>> Depending on which OMPI release you're using, I think you need something
>>>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>>>> I don't know for sure that that's the problem here.
>>>>>
>>>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>>>> consider updating as a first step?  Among other things, newer OMPIs will
>>>>> generate a much clearer error message if the descriptor limit is the
>>>>> problem.
>>>> While 1.2.8 might be "a while ago", upgrading software just because it's
>>>> "old" is not a valid argument.
>>>>
>>>> I can install the latest version of OpenMPI, but it will take a little
>>>> while.
>>> Maybe not because it is "old", but Eugene is correct. The old versions of 
>>> OMPI required more file descriptors than the newer versions.
>>>
>>> That said, you'll still need a minimum of 4x the number of procs on the 
>>> node even with the latest release. I suggest talking to your sys admin 
>>> about getting the limit increased. It sounds like it has been set 
>>> unrealistically low.
>>>
>>>
>> I *am* the system admin! ;)
>>
>> The file descriptor limit is the default for RHEL, 1024, so I would not
>> characterize it as "unrealistically low". I assume someone with much
>> more knowledge of OS design and administration than I came up with this
>> default, so I'm hesitant to change it without good reason. If there were
>> a good reason, I'd have no problem changing it. I have read that setting
>> it to more than 8192 can lead to system instability.
> 
> Never heard that, and most HPC systems have it set a great deal higher 
> without trouble.

I just read that the other day. Not sure where, though. Probably a forum
posting somewhere. I'll take your word for it that it's safe to increase
if necessary.
> 
> However, the choice is yours. If you have a large SMP system, you'll 
> eventually be forced to change it or severely limit its usefulness for MPI. 
> RHEL sets it that low arbitrarily as a way of saving memory by keeping the fd 
> table small, not because the OS can't handle it.
> 
> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
> fd's are required for socket-based communications and to forward I/O.

Thanks, Ralph, that's exactly the answer I was looking for - where this
limit was coming from.

I can see how on a large SMP system the fd limit would have to be
increased. In normal circumstances, my cluster nodes should never have
more than 8 MPI processes running at once (per node), so I shouldn't be
hitting that limit on my cluster.
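Just to put numbers on it: taking Eugene's earlier estimate of roughly 4
to 7 descriptors per process (plus a few) at face value, 256 processes
already collide with the stock 1024 limit, and the np < ~155 cutoff I
measured is consistent with the high end of that estimate. A quick
back-of-the-envelope check:

```shell
np=256
limit=1024                  # stock RHEL default for 'ulimit -n'
low=$(( 4 * np ))           # low-end estimate: ~4 fds per process
high=$(( 7 * np ))          # high-end estimate: ~7 fds per process
echo "np=$np needs roughly ${low}-${high} fds; limit is ${limit}"
# At ~7 fds per process, 7 * 150 = 1050 > 1024, which matches the
# observation that runs only succeed for np below about 155.
echo "7 * 150 = $(( 7 * 150 ))"
```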

> 
> 
>> This is admittedly an unusual situation - in normal use, no one would ever
>> want to run that many processes on a single system - so I don't see any
>> justification for modifying that setting.
>>
>> Yesterday I spoke to the researcher who originally asked me about this limit -
>> he just wanted to know what the limit was, and doesn't actually plan to
>> do any "real" work with that many processes on a single node, rendering
>> this whole discussion academic.
>>
>> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
>> test it yet. I'll post the results of testing here.
>>
>>>>>>>> I have a user trying to test his code on the command-line on a single
>>>>>>>> host before running it on our cluster like so:
>>>>>>>>
>>>>>>>> mpirun -np X foo
>>>>>>>>
>>>>>>>> When he tries to run it with a large number of processes (X = 256, 512), the
>>>>>>>> program fails, and I can reproduce this with a simple "Hello, World"
>>>>>>>> program:
>>>>>>>>
>>>>>>>> $ mpirun -np 256 mpihello
>>>>>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>>>>>> exited on signal 15 (Terminated).
>>>>>>>> 252 additional processes aborted (not shown)
>>>>>>>>
>>>>>>>> I've done some testing and found that X must be < 155 for this program to work.
>>>>>>>> Is this a bug, part of the standard, or design/implementation decision?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> One possible issue is the limit on the number of descriptors.  The error
>>>>>>> message should be pretty helpful and descriptive, but perhaps you're
>>>>>>> using an older version of OMPI.  If this is your problem, one workaround
>>>>>>> is something like this:
>>>>>>>
>>>>>>> unlimit descriptors
>>>>>>> mpirun -np 256 mpihello
>>>>>>>
>>>>>> Looks like I'm not allowed to set that as a regular user:
>>>>>>
>>>>>> $ ulimit -n 2048
>>>>>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>>>>>
>>>>>> Since I am the admin, I could change that elsewhere, but I'd rather not
>>>>>> do that system-wide unless absolutely necessary.
>>>>>>
>>>>>>> though I guess the syntax depends on what shell you're running.  Another
>>>>>>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>>>>>>>
>>>>>> That didn't work either:
>>>>>>
>>>>>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>>>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>>>>> exited on signal 15 (Terminated).
>>>>>> 252 additional processes aborted (not shown)
>>
>> -- 
>> Prentice Bisbal
>> Linux Software Support Specialist/System Administrator
>> School of Natural Sciences
>> Institute for Advanced Study
>> Princeton, NJ
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

