Hi Benjamin,

Here is my configuration file:
CompleteWait=60
SlurmdUser=root
ControlMachine=mymachine
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
AuthType=auth/munge
SlurmctldPidFile=/var/run/slurmctl.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurm
SlurmctldLogFile=/var/log/slurmctl/controller.log
SlurmdLogFile=/var/log/slurmd/node.log
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
ClusterName=mycluster
SelectType=select/cons_res
SelectTypeParameters=CR_CPU,CR_LLN
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePort=6819
AccountingStorageHost=other
JobAcctGatherType=jobacct_gather/linux
AccountingStorageEnforce=safe
MaxJobCount=1000000
SlurmctldDebug=1
SlurmdDebug=1
SallocDefaultCommand="srun --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

NodeName=imperial-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1
NodeName=silver-node[01-28] CPUs=32 RealMemory=128 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
NodeName=ocean-node01 CPUs=16 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
NodeName=ocean-node[02-05] CPUs=32 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeName=loma-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2

PartitionName=all Nodes=imperial-node[01-10],silver-node[01-28],ocean-node[01-05],loma-node[01-10] STATE=UP Default=YES AllocNodes=vax,prime
PartitionName=imperial Nodes=imperial-node[01-10] Default=NO MaxTime=INFINITE State=UP AllocNodes=vax,prime
PartitionName=silver Nodes=silver-node[01-28] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime
PartitionName=ocean Nodes=ocean-node[01-05] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime
PartitionName=loma Nodes=loma-node[01-10] Default=NO MaxTime=INFINITE STATE=UP AllocNodes=vax,prime

As you can see, it looks like I should be treating all resources on a per-CPU basis. I’m not sure what is wrong.
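
One thing that can be ruled out quickly is an inconsistency inside the node definitions themselves. The following is only a rough sketch, not a Slurm tool: the helper names (`parse_node_line`, `expected_cpus`, `mismatches`) are made up for illustration, and it assumes that CPUs is meant to equal Sockets x CoresPerSocket x ThreadsPerCore on every node line.

```python
# Node definitions copied from the configuration above.
NODE_LINES = [
    "NodeName=imperial-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1",
    "NodeName=silver-node[01-28] CPUs=32 RealMemory=128 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1",
    "NodeName=ocean-node01 CPUs=16 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1",
    "NodeName=ocean-node[02-05] CPUs=32 RealMemory=256 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2",
    "NodeName=loma-node[01-10] CPUs=24 RealMemory=64 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2",
]


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' node line into a dict of strings."""
    return dict(field.split("=", 1) for field in line.split())


def expected_cpus(node):
    """CPU count implied by the topology fields (assumption: threads count as CPUs)."""
    return (
        int(node["Sockets"])
        * int(node["CoresPerSocket"])
        * int(node["ThreadsPerCore"])
    )


def mismatches(lines):
    """Return the NodeName of every line whose CPUs value disagrees with its topology."""
    bad = []
    for line in lines:
        node = parse_node_line(line)
        if int(node["CPUs"]) != expected_cpus(node):
            bad.append(node["NodeName"])
    return bad


if __name__ == "__main__":
    # An empty list means every node line is internally consistent.
    print(mismatches(NODE_LINES))
```

For the five node lines above this prints an empty list, so under that assumption the CPU counts agree with the declared topology and the cause is more likely elsewhere.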

Is there a command I can run to check whether the queued jobs are waiting on an available CPU? For me, the reason shown is only “Resources”. Is there a way to get more detail about which resources?

Jordan

> On Jan 16, 2016, at 7:34 AM, Benjamin Redling <[email protected]> wrote:
>
> Hello Jordan,
>
> On 2016-01-16 01:21, Jordan Willis wrote:
>> If my partition is used up according to the node configuration, but still
>> has available CPUs, is there a way to allow a user who only has a task
>> that takes 1 CPU on that node?
>>
>> For instance here is my partition:
>>
>> NODELIST    NODES  PARTITION  STATE  NODES(A/I)  CPUS  CPUS(A/I/O/T)   MEMORY
>> loma-node[  38     all*       mix    38/0        16+   981/171/0/1152  64+
>>
>> According to the nodes, there is nothing idling, but there are 171 available
>> CPUs. Does anyone know what’s going on? When a new user asks for 1 task, why
>> can’t they get on one of those free CPUs? What should I change in my
>> configuration?
>
> Without seeing your configuration that's just guesswork.
> Are you using "select/linear" and "Shared=NO"?
>
> Apart from that you might want to see the column "Resulting Behavior" to
> get an idea what you have to check in your config:
> http://slurm.schedmd.com/cons_res_share.html
>
> Regards,
> Benjamin
> --
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
