Are you using in any of your /etc/pam.d/ configuration files? 
That would enforce the limits in /etc/security/limits.conf for all users, and
those limits are usually unlimited for root. Root is almost always allowed to do
things bad enough to crash the machine or run it out of resources. If the
/etc/pam.d/sshd file has in it, that’s probably where the unlimited setting for
root is coming from.
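Here is a rough sketch of how limits.conf entries get resolved for a given login, which shows why root usually ends up unlimited. It is a simplified model and not pam_limits' real code. It handles only exact-user, @group and * domains, ignores /etc/security/limits.d ordering and uid ranges, and treats the name "root" as uid 0. It follows the limits.conf(5) rule that group and wildcard limits are not applied to root. The function name hard_as_limit_kb is hypothetical.

```python
from typing import Optional, Set


def hard_as_limit_kb(conf_text: str, user: str, groups: Set[str]) -> Optional[int]:
    """Simplified model: return the hard 'as' (address space, KB) limit that
    limits.conf-style lines would give `user`, or None if unlimited/unset.

    Assumptions (not pam_limits' exact algorithm): only user, @group and *
    domains are handled; a user line beats a @group line, which beats *;
    among equal-priority lines the later one wins; group and wildcard lines
    are skipped for root, per limits.conf(5)."""
    best_rank = 0  # 3 = exact user, 2 = @group, 1 = wildcard, 0 = nothing yet
    best_value: Optional[int] = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # <domain> <type> <item> <value>
        domain, ltype, item, value = fields
        if item != "as" or ltype not in ("hard", "-"):  # "-" sets soft and hard
            continue
        if domain == user:
            rank = 3
        elif domain.startswith("@") and domain[1:] in groups and user != "root":
            rank = 2
        elif domain == "*" and user != "root":
            rank = 1
        else:
            continue
        if rank >= best_rank:
            best_rank = rank
            best_value = None if value in ("unlimited", "infinity") else int(value)
    return best_value
```

With a file containing `* hard as 72089600`, an ordinary user gets 72089600 KB. Root gets None: no entry from the file applies, so root keeps its default, which is usually unlimited.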


Bill Barth, Ph.D., Director, HPC        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445

On 4/15/18, 1:26 PM, "slurm-users on behalf of Mahmood Naderan" wrote:

    I have actually disabled the swap partition (!) because the system gets
    into a really bad state and, in my experience, I have to go into the room
    and reset the affected machine (!). Otherwise I have to wait a long time
    for it to get back to normal.
    When I ssh to the node as root, ulimit -a reports unlimited virtual
    memory. So it seems that root has an unlimited value while regular users
    have a limited one.
    On Sun, Apr 15, 2018 at 10:26 PM, Ole Holm Nielsen wrote:
    > Hi Mahmood,
    > It seems your compute node is configured with this limit:
    > virtual memory          (kbytes, -v) 72089600
    > So when the batch job tries to set a higher limit (ulimit -v 82089600)
    > than the system permits (72089600), this must surely get rejected, as you
    > have discovered!
    > You may want to reconfigure your compute nodes' limits, for example by
    > setting the virtual memory limit to "unlimited" in your configuration. If
    > the nodes have a very small RAM + swap size, you might then get
    > Out Of Memory errors...
    > /Ole
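Ole's diagnosis can also be checked from inside a job before it calls ulimit. Below is a minimal Python sketch that assumes Linux, where bash's `ulimit -v` corresponds to RLIMIT_AS and counts in 1024-byte units. Note that `ulimit -a` prints soft limits by default. An unprivileged process is blocked only when it tries to raise its limit above the hard limit (`ulimit -Hv`), so the sketch reads the hard limit. The helper names current_vmem_hard_kib and request_fits are hypothetical. Root (CAP_SYS_RESOURCE) is not bound by this check.

```python
import resource
from typing import Optional

KIB = 1024  # bash's `ulimit -v` uses 1024-byte units; RLIMIT_AS is in bytes


def current_vmem_hard_kib() -> Optional[int]:
    """Hard address-space limit of this process in KiB, or None if unlimited."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if hard == resource.RLIM_INFINITY else hard // KIB


def request_fits(requested_kib: int, hard_kib: Optional[int]) -> bool:
    """True if an unprivileged `ulimit -v requested_kib` stays within hard_kib."""
    return hard_kib is None or requested_kib <= hard_kib


if __name__ == "__main__":
    hard = current_vmem_hard_kib()
    print("hard vmem limit:", "unlimited" if hard is None else f"{hard} kbytes")
    print("ulimit -v 82089600 would fit:", request_fits(82089600, hard))
```

For the numbers in this thread, request_fits(82089600, 72089600) is False. Any request at or below 72089600 fits.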
