Hi Benjamin,

Here is my configuration file:

CompleteWait=60
SlurmdUser=root
ControlMachine=mymachine
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
AuthType=auth/munge
SlurmctldPidFile=/var/run/slurmctl.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurm
SlurmctldLogFile=/var/log/slurmctl/controller.log
SlurmdLogFile=/var/log/slurmd/node.log
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
ClusterName=mycluster
SelectType=select/cons_res
SelectTypeParameters=CR_CPU,CR_LLN
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePort=6819
AccountingStorageHost=other
JobAcctGatherType=jobacct_gather/linux
AccountingStorageEnforce=safe
MaxJobCount=1000000
SlurmctldDebug=1
SlurmdDebug=1
SallocDefaultCommand="srun --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"
NodeName=imperial-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1
NodeName=silver-node[01-28] CPUs=32 RealMemory=128 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
NodeName=ocean-node01 CPUs=16 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
NodeName=ocean-node[02-05] CPUs=32 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=loma-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
PartitionName=all Nodes=imperial-node[01-10],silver-node[01-28],ocean-node[01-05],loma-node[01-10] STATE=UP Default=YES AllocNodes=vax,prime
PartitionName=imperial Nodes=imperial-node[01-10] Default=NO MaxTime=INFINITE State=UP AllocNodes=vax,prime
PartitionName=silver Nodes=silver-node[01-28] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime
PartitionName=ocean Nodes=ocean-node[01-05] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime
PartitionName=loma Nodes=loma-node[01-10] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime
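
A rough sanity check on the node definitions above, sketched in Python. It
assumes that each node's CPUs value is meant to equal
Sockets x CoresPerSocket x ThreadsPerCore; that is an assumption about the
intent of this configuration, not a Slurm requirement. The helper names
(parse_node_line, node_count, check_nodes) are made up for illustration, and
the node lines are copied from the configuration with wrapped lines joined.

```python
import re

# Node definitions copied from the configuration above (wrapped lines joined).
NODE_LINES = """\
NodeName=imperial-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1
NodeName=silver-node[01-28] CPUs=32 RealMemory=128 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
NodeName=ocean-node01 CPUs=16 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
NodeName=ocean-node[02-05] CPUs=32 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=loma-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
"""


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict (hypothetical helper)."""
    return dict(field.split("=", 1) for field in line.split())


def node_count(name):
    """Count hosts in a name like 'ocean-node[02-05]'.

    Only a single 'lo-hi' bracket range is handled; a name without
    brackets counts as one host.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", name)
    if match is None:
        return 1
    return int(match.group(2)) - int(match.group(1)) + 1


def check_nodes(text):
    """Return (name, host_count, declared_cpus, topology_product) per NodeName line."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("NodeName="):
            continue
        fields = parse_node_line(line)
        product = (
            int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"])
        )
        rows.append(
            (fields["NodeName"], node_count(fields["NodeName"]), int(fields["CPUs"]), product)
        )
    return rows


if __name__ == "__main__":
    for name, count, cpus, product in check_nodes(NODE_LINES):
        status = "ok" if cpus == product else "MISMATCH"
        print(f"{name}: {count} node(s), CPUs={cpus}, product={product} [{status}]")
```

A mismatch would only flag a line worth a second look; it would not by itself
explain why a job is pending.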


As you can see, it looks like I should be treating all resources on a per-CPU 
basis. I’m not sure what is wrong. Is there a command I can run on a queued 
job to check whether it is waiting on an available CPU? For me, the jobs only 
say “Resources”, but can you get more information about which resources?
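
One possible way to inspect this, as a minimal sketch rather than a definitive
answer: squeue can print each pending job's reason (%r) and requested CPU count
(%C), and the result can be tallied per reason. The function names and the
sample text below are hypothetical; the sample is not output from this cluster.

```python
import subprocess
from collections import Counter

# squeue options: -h drops the header, -t PENDING keeps pending jobs only,
# -o sets the format: %i = job ID, %r = reason, %C = CPUs requested.
SQUEUE_CMD = ["squeue", "-h", "-t", "PENDING", "-o", "%i|%r|%C"]

# Hypothetical sample text in the same 'id|reason|cpus' layout (not real output).
SAMPLE = "1001|Resources|24\n1002|Priority|1\n1003|Priority|1\n"


def parse_pending(text):
    """Turn 'id|reason|cpus' lines into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, reason, cpus = line.split("|")
        jobs.append({"job_id": job_id, "reason": reason, "cpus": int(cpus)})
    return jobs


def summarize(jobs):
    """Count pending jobs per reason string."""
    return Counter(job["reason"] for job in jobs)


def query_pending():
    """Run squeue on a host with Slurm installed and parse its output."""
    result = subprocess.run(SQUEUE_CMD, check=True, capture_output=True, text=True)
    return parse_pending(result.stdout)


if __name__ == "__main__":
    # Uses the sample text; swap in query_pending() on a real cluster.
    for reason, count in summarize(parse_pending(SAMPLE)).items():
        print(f"{reason}: {count} job(s)")
```

For a single job, `scontrol show job <jobid>` prints the full job record,
including its Reason field and the resources it requested.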

Jordan


> On Jan 16, 2016, at 7:34 AM, Benjamin Redling <[email protected]> 
> wrote:
> 
> 
> Hello Jordan,
> 
> On 2016-01-16 01:21, Jordan Willis wrote:
>> If my partition is used up according to the node configuration, but still 
>> has available CPUs, is there a way to let in a user who only has a task 
>> that takes 1 CPU on that node?
>> 
>> For instance here is my partition:
>> 
>> NODELIST    NODES PARTITION  STATE  NODES(A/I) CPUS       CPUS(A/I/O/T)   MEMORY
>> loma-node[     38 all*       mix    38/0       16+        981/171/0/1152  64+
>> 
>> 
>> According to the nodes, nothing is idle, but there are 171 available 
>> CPUs. Does anyone know what’s going on? When a new user asks for 1 task, why 
>> can’t they get on one of those free CPUs? What should I change in my 
>> configuration?
> 
> Without seeing your configuration, that's just guesswork.
> Are you using "select/linear" and "Shared=NO"?
> 
> Apart from that you might want to see the column "Resulting Behavior" to
> get an idea what you have to check in your config:
> http://slurm.schedmd.com/cons_res_share.html
> 
> Regards,
> Benjamin
> -- 
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
