Hi,

I was running jobs on a single node and recently added two more. The first node can run only 4 jobs at a time, so I added two more nodes that can run 8 jobs each, and I have confirmed that the new nodes are up and running. However, when I submit a job it runs only on the old node; it runs on the new nodes only when I ask for three nodes with 'sbatch -N3 job.sh'. Previously I submitted my jobs with 'sbatch -J jobname job.sh'.

How can I tell sbatch to run a job on a particular node, or have the job placed automatically on a free node? Since we added the two nodes, once the maximum number of jobs has been submitted to the old node, it no longer queues any new jobs unless I specify that the job be shared across all three nodes.

I have pasted some more info below:
sinfo shows me:

    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    debug*    up    infinite  3     idle  bio-linux,bio-linuxnode2,node01

And scontrol show nodes:

    [batsiraim@bio-linux ~]$ scontrol show nodes
    NodeName=bio-linux Arch=x86_64 CoresPerSocket=4
       CPUAlloc=0 CPUErr=0 CPUTot=4 CPULoad=0.00 Features=(null)
       Gres=(null)
       NodeAddr=xxxx1 NodeHostName=bio-linux Version=15.08
       OS=Linux RealMemory=28000 AllocMem=0 FreeMem=5083 Sockets=1 Boards=1
       State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
       BootTime=2017-02-01T05:36:22 SlurmdStartTime=2017-06-18T08:24:56
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    NodeName=bio-linuxnode2 Arch=x86_64 CoresPerSocket=4
       CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00 Features=(null)
       Gres=(null)
       NodeAddr=xxxx2 NodeHostName=bio-linuxnode2 Version=15.08
       OS=Linux RealMemory=3800 AllocMem=0 FreeMem=2278 Sockets=1 Boards=1
       State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A
       BootTime=2017-05-10T04:59:26 SlurmdStartTime=2017-06-18T08:25:50
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    NodeName=node01 Arch=x86_64 CoresPerSocket=4
       CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01 Features=(null)
       Gres=(null)
       NodeAddr=xxxxx3 NodeHostName=node01 Version=15.08
       OS=Linux RealMemory=3800 AllocMem=0 FreeMem=827 Sockets=1 Boards=1
       State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A
       BootTime=2017-05-24T04:51:53 SlurmdStartTime=2017-06-18T08:25:38
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Thanks in advance.

Regards,
Batsirai
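
P.S. In case it helps when reading the node listing above, here is a minimal sketch that tallies the CPUTot value reported for each node. It is only an illustration: the helper name cpus_per_node is hypothetical, the sample text is trimmed from the scontrol output in this message, and treating one CPU as one job slot is my assumption rather than something Slurm guarantees.

```python
import re

# Sample trimmed from the `scontrol show nodes` output in this message;
# only the fields used below are kept.
SAMPLE = """\
NodeName=bio-linux Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=4 CPULoad=0.00
NodeName=bio-linuxnode2 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00
NodeName=node01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.01
"""


def cpus_per_node(text):
    """Return {node name: CPUTot} parsed from scontrol-style key=value text.

    Hypothetical helper for illustration; not part of Slurm.
    """
    totals = {}
    current = None
    for key, value in re.findall(r"(\w+)=(\S+)", text):
        if key == "NodeName":
            # A NodeName field starts a new node record.
            current = value
        elif key == "CPUTot" and current is not None:
            totals[current] = int(value)
    return totals


totals = cpus_per_node(SAMPLE)
for node, cpus in totals.items():
    print(f"{node}: {cpus} CPUs")
# Assumes one CPU per job, which depends on the cluster configuration.
print("total:", sum(totals.values()))
```

With the values pasted above this gives 4, 8 and 8 CPUs, i.e. 20 in total, which matches the 4 and 8 jobs per node I described.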
