Hi,

On 02.04.2013 at 13:22, Duke Nguyen wrote:

> On 4/1/13 9:20 PM, Ralph Castain wrote:
>> It's probably the same problem - try running "mpirun -npernode 1 -tag-output 
>> ulimit -a" on the remote nodes and see what it says. I suspect you'll find 
>> that they aren't correct.
> 
> Somehow I could not run the command you suggested:
> 
> $ qsub -l nodes=4:ppn=8 -I
> qsub: waiting for job 481.biobos to start
> qsub: job 481.biobos ready
> 
> $ /usr/local/bin/mpirun -npernode 1 -tag-output ulimit -a
> --------------------------------------------------------------------------
> mpirun was unable to launch the specified application as it could not find an 
> executable:

`ulimit` is a shell builtin:

$ type ulimit
ulimit is a shell builtin

It should work with:

$ /usr/local/bin/mpirun -npernode 1 -tag-output  sh -c "ulimit -a"
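
The same trick can also be used to raise the limit at launch time, without touching
the node configuration first. A rough sketch only (untested; "./my_app" is just a
placeholder for your executable):

$ /usr/local/bin/mpirun -np 32 sh -c "ulimit -s unlimited && exec ./my_app"

Note that a non-root process can only raise its soft limit up to the hard limit
already set on the nodes.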

-- Reuti


> Executable: ulimit
> Node: node0108.biobos
> 
> while attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
> 
> But anyway, I figured out the reason. Yes, it is the cluster nodes that did 
> not pick up the new ulimit settings (our system is diskless with Warewulf, so 
> basically we have to update the VNFS and reboot all nodes before they can 
> run with the new settings).
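
Side note: the limits discussed in this thread would typically end up as entries
like the following in the image's /etc/security/limits.conf. This is only a sketch;
the exact values, and whether you scope them to a group instead of "*", are up to you.

*   soft   stack     unlimited
*   hard   stack     unlimited
*   soft   memlock   unlimited
*   hard   memlock   unlimited
*   soft   nofile    32768
*   hard   nofile    32768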
> 
> Thanks for all the help :)
> 
> D.
> 
>> 
>> BTW: the "-tag-output" option marks each line of output with the rank of 
>> the process. Since all the outputs will be interleaved, this will help you 
>> identify what came from each node.
>> 
>> 
>> On Mar 31, 2013, at 11:30 PM, Duke Nguyen <duke.li...@gmx.com> wrote:
>> 
>>> On 3/31/13 12:20 AM, Duke Nguyen wrote:
>>>> I should really have asked earlier. Thanks for all the help.
>>> I think I was excited too soon :). Increasing the stack size does help if I run 
>>> a job on a dedicated server. Today I modified the cluster settings 
>>> (/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a 
>>> different job with 4 nodes/8 cores each (nodes=4:ppn=8), but I still get the 
>>> mpirun error. My ulimit now reads:
>>> 
>>> $ ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 8271027
>>> max locked memory       (kbytes, -l) unlimited
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 32768
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) unlimited
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 8192
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>> 
>>> Any other advice???
>>> 
>>>> On 3/30/13 10:28 PM, Ralph Castain wrote:
>>>>> FWIW: there is an MCA param that helps with such problems:
>>>>> 
>>>>>         opal_set_max_sys_limits
>>>>>                  "Set to non-zero to automatically set any system-imposed 
>>>>> limits to the maximum allowed",
>>>>> 
>>>>> At the moment, it only sets the limits on the number of open files and the 
>>>>> maximum size of a file we can create. Easy enough to add the stack size, though 
>>>>> as someone pointed out, it has some negatives as well.
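
If you want to try that parameter, it should be something along these lines
(untested, and "./my_app" is just a placeholder):

$ mpirun --mca opal_set_max_sys_limits 1 -np 32 ./my_app

or put "opal_set_max_sys_limits = 1" into $HOME/.openmpi/mca-params.conf so it
applies to every run.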
>>>>> 
>>>>> 
>>>>> On Mar 30, 2013, at 7:35 AM, Gustavo Correa <g...@ldeo.columbia.edu> 
>>>>> wrote:
>>>>> 
>>>>>> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
>>>>>> 
>>>>>>> On 3/30/13 8:20 PM, Reuti wrote:
>>>>>>>> On 30.03.2013 at 13:26, Tim Prince wrote:
>>>>>>>> 
>>>>>>>>> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>>>>>>>>>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>>>>>>>>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>>>>>>>>>>> I do not know about your code but:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1) did you check stack limitations? Typically Intel Fortran codes 
>>>>>>>>>>>> need a large amount of stack when the problem size increases.
>>>>>>>>>>>> Check ulimit -a
>>>>>>>>>>> This is the first time I've heard of stack limitations. Anyway, ulimit -a gives
>>>>>>>>>>> 
>>>>>>>>>>> $ ulimit -a
>>>>>>>>>>> core file size          (blocks, -c) 0
>>>>>>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>>>> pending signals                 (-i) 127368
>>>>>>>>>>> max locked memory       (kbytes, -l) unlimited
>>>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>>>> open files                      (-n) 1024
>>>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>>>> stack size              (kbytes, -s) 10240
>>>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>>>> max user processes              (-u) 1024
>>>>>>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>>> 
>>>>>>>>>>> So the stack size is 10MB??? Does this create a problem? How do I 
>>>>>>>>>>> change this?
>>>>>>>>>> I ran "ulimit -s unlimited" to make the stack size unlimited, and 
>>>>>>>>>> the job ran fine!!! So it looks like the stack limit was the problem. 
>>>>>>>>>> Questions are:
>>>>>>>>>> 
>>>>>>>>>> * how do I set this automatically (and permanently)?
>>>>>>>>>> * should I set all other ulimits to be unlimited?
>>>>>>>>>> 
>>>>>>>>> In our environment, the only solution we found is to have mpirun run 
>>>>>>>>> a script on each node which sets ulimit (as well as environment 
>>>>>>>>> variables, which are more convenient to set there than on the mpirun 
>>>>>>>>> command line) before starting the executable. We had expert recommendations 
>>>>>>>>> against this but no other working solution. It seems unlikely that 
>>>>>>>>> you would want to remove any limits that work at their defaults.
>>>>>>>>> An unlimited stack size is not truly unlimited in practice; it may still 
>>>>>>>>> be capped by a system limit or by the implementation. As we run up to 120 threads per 
>>>>>>>>> rank and many applications have threadprivate data regions, the ability 
>>>>>>>>> to run without considering the stack limit is the exception rather than 
>>>>>>>>> the rule.
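
A minimal sketch of such a wrapper (the file name is made up; add whatever limits
and environment variables your application needs):

#!/bin/sh
# wrapper.sh: raise per-process limits on the executing node, then start the real binary
ulimit -s unlimited
exec "$@"

launched as e.g. "mpirun -np 32 ./wrapper.sh ./my_app".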
>>>>>>>> Even if I were the only user on a cluster of machines, I would 
>>>>>>>> define this in the queuing system to set the limits for the job.
>>>>>>> Sorry if I don't get this correctly, but do you mean I should set this 
>>>>>>> using Torque/Maui (our queuing manager) instead of the system itself 
>>>>>>> (/etc/security/limits.conf and /etc/profile.d/)?
>>>>>> Hi Duke
>>>>>> 
>>>>>> We do both.
>>>>>> Set memlock and stacksize to unlimited, and increase the maximum number of 
>>>>>> open files in the pbs_mom script in /etc/init.d, and do the same in 
>>>>>> /etc/security/limits.conf.
>>>>>> This may be an overzealous "belt and suspenders" policy, but it works.
>>>>>> As everybody else said, a small stack size is a common cause of 
>>>>>> segmentation faults in large codes.
>>>>>> Basically all codes that we run here have this problem, with too many 
>>>>>> automatic arrays, structures, etc. in functions and subroutines.
>>>>>> But also a small memlock limit is trouble for OFED/InfiniBand, and the small 
>>>>>> (default) max number of open file handles may hit the limit easily if many 
>>>>>> programs (or poorly written programs) are running on the same node.
>>>>>> The default Linux distribution limits don't seem to be tailored for HPC, 
>>>>>> I guess.
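
To illustrate the pbs_mom part: the idea is simply to raise the limits in the init
script before the daemon starts, so every job it spawns inherits them. A sketch, with
values taken from this thread:

# near the top of /etc/init.d/pbs_mom, before pbs_mom is started
ulimit -s unlimited    # stack size
ulimit -l unlimited    # max locked memory, needed for OFED/InfiniBand
ulimit -n 32768        # open file handles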
>>>>>> 
>>>>>> I hope this helps,
>>>>>> Gus Correa