Hi, I'm having some trouble with resource allocation: based on my understanding of the documentation and the way I applied it in the config file, I expect behavior that does not happen.
Here is the relevant excerpt from the config file:

    SchedulerType=sched/backfill
    SchedulerParameters=bf_continue,bf_interval=45,bf_resolution=90,max_array_tasks=1000
    #SchedulerAuth=
    #SchedulerPort=
    #SchedulerRootFilter=
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_Memory
    FastSchedule=1
    ...
    NodeName=cn_burebista Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=256000 State=UNKNOWN
    PartitionName=main_compute Nodes=cn_burebista Shared=YES Default=YES MaxTime=76:00:00 State=UP

According to the above, I have the backfill scheduler enabled, with CPUs and memory configured as consumable resources. I have 56 CPUs and 256GB of RAM in my resource pool. I would expect the backfill scheduler to try to allocate resources so as to fill as many of the cores as possible when multiple jobs ask for more resources than are available. In my case I have the following queue:

    JOBID PARTITION     NAME     USER ST   TIME  NODES NODELIST(REASON)
     2361 main_comp training mcetatea PD   0:00      1 (Resources)
     2356 main_comp skrf_ori   jhanca  R  58:41      1 cn_burebista
     2357 main_comp skrf_ori   jhanca  R  44:13      1 cn_burebista

Jobs 2356 and 2357 are asking for 16 CPUs each, and job 2361 is asking for 20 CPUs, i.e. 52 CPUs in total. As seen above, job 2361 (which was started by a different user) is marked as pending due to lack of resources, although there are plenty of CPUs and plenty of memory available.
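To spell out the arithmetic behind my expectation, here is a back-of-the-envelope check (my own calculation in plain bash, not Slurm output). It covers both the raw CPU totals and, as I understand Slurm's behavior with ThreadsPerCore=2, the rounding of each CPU request up to whole cores:

```shell
# Back-of-the-envelope check (plain bash, not Slurm output).
# Assumption: with ThreadsPerCore=2, Slurm hands out whole cores, so a
# CPU request is rounded up to a multiple of the threads per core.
threads_per_core=2
total_cpus=56                       # 2 sockets * 14 cores * 2 threads
total_cores=$(( total_cpus / threads_per_core ))

cores_needed() {                    # round a CPU request up to whole cores
  echo $(( ($1 + threads_per_core - 1) / threads_per_core ))
}

requested=$(( 16 + 16 + 20 ))       # jobs 2356, 2357, 2361
cores=$(( $(cores_needed 16) + $(cores_needed 16) + $(cores_needed 20) ))

echo "CPUs:  $requested of $total_cpus requested"
echo "Cores: $cores of $total_cores needed"
```

Either way I count it, the three jobs should fit on the node (52 of 56 CPUs, or 26 of 28 cores), which is why the pending state surprises me.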
"scontrol show nodes cn_burebista" gives me the following: NodeName=cn_burebista Arch=x86_64 CoresPerSocket=14 CPUAlloc=32 CPUErr=0 CPUTot=56 CPULoad=21.65 AvailableFeatures=(null) ActiveFeatures=(null) Gres=(null) NodeAddr=cn_burebista NodeHostName=cn_burebista Version=16.05 OS=Linux RealMemory=256000 AllocMem=64000 FreeMem=178166 Sockets=2 Boards=1 State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A BootTime=2018-03-09T12:04:52 SlurmdStartTime=2018-03-20T10:35:50 CapWatts=n/a CurrentWatts=0 LowestJoules=0 ConsumedJoules=0 ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s I'm going through the documentation again and again but I cannot figure out what am I doing wrong ... Why do I have the above situation? What should I change to my config to make this work? scontrol show -dd job <jobid> shows me the following: JobId=2361 JobName=training_carlib UserId=mcetateanu(1000) GroupId=mcetateanu(1001) MCS_label=N/A Priority=4294901726 Nice=0 Account=(null) QOS=(null) JobState=PENDING Reason=Resources Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 RunTime=00:00:00 TimeLimit=3-04:00:00 TimeMin=N/A SubmitTime=2018-03-27T10:30:38 EligibleTime=2018-03-27T10:30:38 StartTime=2018-03-28T10:27:36 EndTime=2018-03-31T14:27:36 Deadline=N/A PreemptTime=None SuspendTime=None SecsPreSuspend=0 Partition=main_compute AllocNode:Sid=zalmoxis:23690 ReqNodeList=(null) ExcNodeList=(null) NodeList=(null) SchedNodeList=cn_burebista NumNodes=1 NumCPUs=20 NumTasks=1 CPUs/Task=20 ReqB:S:C:T=0:0:*:* TRES=cpu=20,node=1 Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=* MinCPUsNode=20 MinMemoryNode=0 MinTmpDiskNode=0 Features=(null) Gres=(null) Reservation=(null) OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier/train_classifier.sh WorkDir=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier 
       StdErr=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier/training_job_2383.out
       StdIn=/dev/null
       StdOut=/home/mcetateanu/workspace/CarLib/src/_out

I also changed my config to specify the number of CPUs exactly, rather than letting Slurm compute the CPUs from Sockets, CoresPerSocket, and ThreadsPerCore. The two jobs that I am trying to run show the following output from "scontrol show -dd job <jobid>", but the one asking for 20 CPUs is still pending due to lack of resources:

       NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16 ReqB:S:C:T=0:0:*:*
       TRES=cpu=16,mem=32000M,node=1
       Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
         Nodes=cn_burebista CPU_IDs=0-15 Mem=32000
       MinCPUsNode=16 MinMemoryCPU=2000M MinTmpDiskNode=0

       NumNodes=1 NumCPUs=20 NumTasks=1 CPUs/Task=20 ReqB:S:C:T=0:0:*:*
       TRES=cpu=20,node=1
       Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*

Thank you

-------------------------------------------------------------------------------------------
Marius Cetateanu
Senior Embedded Software Engineer
Engineering Department 1, Driver & Embedded
Sony Depthsensing Solutions
Tel: +32 (0)28992171
email: marius.cetate...@sony.com

Sony Depthsensing Solutions
11 Boulevard de la Plaine, 1050 Brussels, Belgium
________________________________________
From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf of slurm-users-requ...@lists.schedmd.com [slurm-users-requ...@lists.schedmd.com]
Sent: Sunday, April 15, 2018 9:02 PM
To: slurm-users@lists.schedmd.com
Subject: slurm-users Digest, Vol 6, Issue 21

Today's Topics:

   1. Re: ulimit in sbatch script (Mahmood Naderan)
   2. Re: ulimit in sbatch script (Bill Barth)
   3. Re: ulimit in sbatch script (Mahmood Naderan)
   4. Re: ulimit in sbatch script (Mahmood Naderan)
   5. Re: ulimit in sbatch script (Bill Barth)

----------------------------------------------------------------------

Message: 1
Date: Sun, 15 Apr 2018 22:56:01 +0430
From: Mahmood Naderan <mahmood...@gmail.com>
To: ole.h.niel...@fysik.dtu.dk, Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] ulimit in sbatch script

I actually have disabled the swap partition (!)
since the system goes really bad, and in my experience I then have to enter the room and reset the affected machine (!). Otherwise I have to wait a long time for it to get back to normal.

When I ssh to the node as root, "ulimit -a" says virtual memory is unlimited. So it seems that root has an unlimited value while users have a limited one.

Regards,
Mahmood

On Sun, Apr 15, 2018 at 10:26 PM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
> Hi Mahmood,
>
> It seems your compute node is configured with this limit:
>
>   virtual memory          (kbytes, -v) 72089600
>
> So when the batch job tries to set a higher limit (ulimit -v 82089600) than
> permitted by the system (72089600), this must surely get rejected, as you
> have discovered!
>
> You may want to reconfigure your compute nodes' limits, for example by
> setting the virtual memory limit to "unlimited" in your configuration. If
> the nodes have a very small RAM + swap space size, you might encounter
> Out Of Memory errors...
>
> /Ole

------------------------------

Message: 2
Date: Sun, 15 Apr 2018 18:31:08 +0000
From: Bill Barth <bba...@tacc.utexas.edu>
To: Slurm User Community List <slurm-users@lists.schedmd.com>, "ole.h.niel...@fysik.dtu.dk" <ole.h.niel...@fysik.dtu.dk>
Subject: Re: [slurm-users] ulimit in sbatch script

Are you using pam_limits.so in any of your /etc/pam.d/ configuration files? That would be enforcing /etc/security/limits.conf for all users; those limits are usually unlimited for root, who is almost always allowed to do things bad enough to crash the machine or run it out of resources. If the /etc/pam.d/sshd file has pam_limits.so in it, that's probably where the unlimited setting for root is coming from.

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

------------------------------

Message: 3
Date: Sun, 15 Apr 2018 23:01:32 +0430
From: Mahmood Naderan <mahmood...@gmail.com>
To: ole.h.niel...@fysik.dtu.dk, Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] ulimit in sbatch script

BTW, the memory size of the node is 64GB.

Regards,
Mahmood
------------------------------

Message: 4
Date: Sun, 15 Apr 2018 23:11:20 +0430
From: Mahmood Naderan <mahmood...@gmail.com>
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] ulimit in sbatch script

Excuse me... I think the problem is not pam.d. How do you interpret the following output?
    [hamid@rocks7 case1_source2]$ sbatch slurm_script.sh
    Submitted batch job 53
    [hamid@rocks7 case1_source2]$ tail -f hvacSteadyFoam.log
    max memory size         (kbytes, -m) 65536000
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) 72089600
    file locks                      (-x) unlimited
    ^C
    [hamid@rocks7 case1_source2]$ squeue
      JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
         53   CLUSTER hvacStea hamid  R  0:27     1 compute-0-3
    [hamid@rocks7 case1_source2]$ ssh compute-0-3
    Warning: untrusted X11 forwarding setup failed: xauth key data not generated
    Last login: Sun Apr 15 23:03:29 2018 from rocks7.local
    Rocks Compute Node
    Rocks 7.0 (Manzanita)
    Profile built 19:21 11-Apr-2018
    Kickstarted 19:37 11-Apr-2018
    [hamid@compute-0-3 ~]$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 256712
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    [hamid@compute-0-3 ~]$

As you can see, the log file where I put "ulimit -a" before the main command says virtual memory is limited. However, when I log in to the node, it says unlimited!

Regards,
Mahmood
------------------------------

Message: 5
Date: Sun, 15 Apr 2018 19:02:48 +0000
From: Bill Barth <bba...@tacc.utexas.edu>
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] ulimit in sbatch script

Mahmood, sorry to presume. I meant to address the root user and your ssh to the node in your example. At our site we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm and slurm.pam files both contain pam_limits.so, so it could be that way for you too. I.e. Slurm could be setting the limits for job scripts for your users, while for root SSHes the limits are being set by PAM through another config file. Also, root's limits are potentially set differently by PAM (in /etc/security/limits.conf) or by the kernel at boot time.

Finally, users should be careful using ulimit in their job scripts, because it can only change the limits of that shell script's process, not limits across nodes. Your job script appears to apply to only one node, but if users want different limits for jobs that span nodes, they may need other Slurm features to apply them across all the nodes their job uses (cgroups, perhaps?).

Best,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
End of slurm-users Digest, Vol 6, Issue 21
******************************************