Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Mahmood Naderan
Great. Thank you very much. It passed the problematic point.

On Tue, Apr 17, 2018, 19:24 Ole Holm Nielsen wrote:
> On 04/17/2018 04:38 PM, Mahmood Naderan wrote:
> > That parameter is used in slurm.conf. Should I modify that only on the
> > head node? Or all nodes? Then should I restart slurm

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Ole Holm Nielsen
On 04/17/2018 04:38 PM, Mahmood Naderan wrote:
> That parameter is used in slurm.conf. Should I modify that only on the
> head node? Or all nodes? Then should I restart slurm processes?

Yes, definitely! I collected the detailed instructions here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configurat

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Mahmood Naderan
That parameter is used in slurm.conf. Should I modify that only on the head node? Or all nodes? Then should I restart slurm processes?

Regards,
Mahmood

On Tue, Apr 17, 2018 at 4:18 PM, Chris Samuel wrote:
> On Tuesday, 17 April 2018 7:23:40 PM AEST Mahmood Naderan wrote:
>
>> [hamid@rocks7 ca

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Chris Samuel
On Tuesday, 17 April 2018 7:23:40 PM AEST Mahmood Naderan wrote:
> [hamid@rocks7 case1_source2]$ scontrol show config | fgrep VSizeFactor
> VSizeFactor            = 110 percent

Great, I think that's the cause of the limit you are seeing.

VSizeFactor  Memory specifications
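Not part of the original message, but a minimal sketch of the arithmetic behind Chris's diagnosis: with `VSizeFactor = 110 percent`, Slurm derives the job's virtual memory limit by scaling the job's real memory allocation. The 64000 MB figure is taken from later messages in this thread.

```shell
# Illustrative calculation: VSizeFactor=110 means the job's virtual
# memory limit is 110% of its real memory allocation.
mem_kb=65536000     # 64000 MB allocation, in kbytes
vsizefactor=110     # percent, as reported by `scontrol show config`

vmem_kb=$(( mem_kb * vsizefactor / 100 ))
echo "$vmem_kb"     # 72089600 kbytes, matching the limit seen in the job
```

This reproduces exactly the 72089600 kbyte virtual memory limit reported elsewhere in the thread.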

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Mahmood Naderan
See

[hamid@rocks7 case1_source2]$ scontrol show config | fgrep VSizeFactor
VSizeFactor            = 110 percent

Regards,
Mahmood

On Tue, Apr 17, 2018 at 12:51 PM, Chris Samuel wrote:
> On Tuesday, 17 April 2018 5:08:09 PM AEST Mahmood Naderan wrote:
>
>> So, UsePAM has not been set. So, slu

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Chris Samuel
On Tuesday, 17 April 2018 5:08:09 PM AEST Mahmood Naderan wrote:
> So, UsePAM has not been set. So, slurm shouldn't limit anything. Is
> that correct? However, I see that slurm limits the virtual memory size

What does this say?

scontrol show config | fgrep VSizeFactor

-- Chris Samuel : htt

Re: [slurm-users] ulimit in sbatch script

2018-04-17 Thread Mahmood Naderan
Hi Bill,

Sorry for the late reply. As I grepped for pam_limits.so, I see

[root@rocks7 ~]# grep -r pam_limits.so /etc/pam.d/
/etc/pam.d/sudo:session       required     pam_limits.so
/etc/pam.d/runuser:session    required     pam_limits.so
/etc/pam.d/sudo-i:session     required     pam_limit

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
Specifying --mem to Slurm only tells it to find a node that has that much memory, not to enforce a limit, as far as I know. That node has that much, so it finds it. You probably want to enable UsePAM and set up the pam.d slurm files and /etc/security/limits.conf to keep users under the 64000MB physical me
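A minimal sketch of the kind of /etc/security/limits.conf entries Bill describes, assuming the `as` (address space) item is used to cap virtual memory. The domains and values below are illustrative assumptions, not taken from the thread:

```
# /etc/security/limits.conf -- illustrative sketch only.
# <domain>  <type>  <item>    <value>    ("as" = address space, in kbytes)
*           hard    as        72089600
*           hard    memlock   65536000
```

Entries here only take effect for sessions that pass through pam_limits.so, which is why Bill pairs this with UsePAM and the pam.d slurm files.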

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Mahmood Naderan
Bill,

The thing is that both the user and root see unlimited virtual memory when they ssh directly to the node. However, when a job is submitted, the user's limits change. That means Slurm modifies something. The script is

#SBATCH --job-name=hvacSteadyFoam
#SBATCH --output=hvacSteadyFoam.log
#SBATCH --n
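A sketch of the kind of check being described here. The script in the thread is truncated after "--n", so every directive below past --output is an assumption; the point is that printing `ulimit -a` from inside the job shows the limits Slurm actually imposes, for comparison with a plain ssh session on the same node:

```shell
# Sketch only: directives beyond --output are assumed, not from the thread.
cat > slurm_script.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hvacSteadyFoam
#SBATCH --output=hvacSteadyFoam.log
# Print the limits the job actually runs under; compare with `ulimit -a`
# from an interactive ssh login on the same node.
ulimit -a
EOF
```

Submit with `sbatch slurm_script.sh` and compare the `ulimit -a` lines in hvacSteadyFoam.log against the interactive session.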

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
Mahmood, sorry to presume. I meant to address the root user and your ssh to the node in your example. At our site, we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm and slurm.pam files both contain pam_limits.so, so it could be that way for you, too. I.e. Slurm could be setting the l
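A sketch of the pam.d setup Bill describes at his site, paired with `UsePAM=1` in slurm.conf. The module names and line order are assumptions; check your distribution's Slurm packaging for the exact files:

```
# /etc/pam.d/slurm -- illustrative sketch, not Bill's actual file.
account    required    pam_slurm.so
session    required    pam_limits.so
```

With pam_limits.so in the session stack, tasks launched through Slurm would pick up the limits from /etc/security/limits.conf, which would explain limits differing between ssh logins and batch jobs.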

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Mahmood Naderan
Excuse me... I think the problem is not pam.d. How do you interpret the following output?

[hamid@rocks7 case1_source2]$ sbatch slurm_script.sh
Submitted batch job 53
[hamid@rocks7 case1_source2]$ tail -f hvacSteadyFoam.log
max memory size         (kbytes, -m) 65536000
open files

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Mahmood Naderan
BTW, the memory size of the node is 64GB.

Regards,
Mahmood

On Sun, Apr 15, 2018 at 10:56 PM, Mahmood Naderan wrote:
> I actually have disabled the swap partition (!) since the system goes
> really bad and based on my experience I have to enter the room and
> reset the affected machine (!). Oth

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
Are you using pam_limits.so in any of your /etc/pam.d/ configuration files? That would enforce /etc/security/limits.conf for all users; the limits there are usually unlimited for root. Root’s almost always allowed to do stuff bad enough to crash the machine or run it out of resources. If the /etc/pam.d

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Mahmood Naderan
I actually have disabled the swap partition (!) since the system gets really bad when it swaps and, in my experience, I have to enter the room and reset the affected machine (!). Otherwise I have to wait a long time for it to get back to normal. When I ssh to the node as the root user, ulimit -a says u

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Ole Holm Nielsen
Hi Mahmood,

It seems your compute node is configured with this limit:

virtual memory          (kbytes, -v) 72089600

So when the batch job tries to set a higher limit (ulimit -v 82089600) than permitted by the system (72089600), this must surely get rejected, as you have discovered!

You may
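Ole's point can be sketched with the exact values quoted in the thread: a non-privileged process cannot raise its limit above the configured hard limit, so the script's request is bound to fail.

```shell
# Values are the ones quoted in the thread; the comparison mirrors why the
# batch script's `ulimit -v 82089600` is rejected on this node.
node_limit_kb=72089600    # virtual memory hard limit configured on the node
requested_kb=82089600     # what the batch script's `ulimit -v` asks for

if [ "$requested_kb" -gt "$node_limit_kb" ]; then
  echo "rejected: cannot raise the limit above $node_limit_kb kbytes"
fi
```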