Hello,
Perhaps someone else has already found a solution to my problem? I upgraded to version 14.11.6 yesterday and today find that Slurm refuses to run more than one job per node. Earlier versions could cram sixteen one-core jobs onto a 16-core node, or sixty-four one-core jobs onto a 64-core node.

Here is the partition for the 64-core system (the full configuration is included at the end):

PartitionName=halvan Nodes=h1 Default=YES Shared=NO DefaultTime=00:00:01 MaxTime=14400 MaxNodes=1 State=UP

I would prefer to run as many jobs as possible on the system, e.g. two 32-core jobs instead of only one.

Cheers,
-- Lennart Karlsson, UPPMAX, Uppsala University, Sweden

ControlMachine=halvan-q
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
EnforcePartLimits=YES
Epilog=/etc/slurm/slurm.epilog
PrologSlurmctld=/etc/slurm/slurmctld.prolog
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
JobRequeue=0
MaxJobCount=100000
MpiDefault=none
ProctrackType=proctrack/cgroup
Prolog=/etc/slurm/slurm.prolog
PropagateResourceLimits=RSS
ReturnToService=0
SallocDefaultCommand="/usr/bin/srun -n1 -N1 --pty --preserve-env --mpi=none -Q $SHELL"
SchedulerParameters=kill_invalid_depend,default_queue_depth=5000,bf_window=28800,max_job_bf=5000,bf_interval=120
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
StateSaveLocation=/usr/local/slurm-state
SwitchType=switch/none
TaskPlugin=task/cgroup
TaskProlog=/etc/slurm/slurm.taskprolog
TopologyPlugin=topology/none
TmpFs=/scratch
TrackWCKey=yes
TreeWidth=20
UsePAM=1
HealthCheckInterval=1800
HealthCheckProgram=/etc/slurm/slurm.healthcheck
InactiveLimit=0
KillWait=600
MessageTimeout=60
ResvOverRun=UNLIMITED
MinJobAge=65533
SlurmctldTimeout=300
SlurmdTimeout=2400
Waittime=0
FastSchedule=1
MaxMemPerCPU=32768
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityCalcPeriod=5
PriorityUsageResetPeriod=MONTHLY
PriorityFavorSmall=NO
PriorityMaxAge=14-0
PriorityWeightAge=20160
PriorityWeightFairshare=10000
PriorityWeightJobSize=104
PriorityWeightPartition=0
PriorityWeightQOS=300000
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=milou-q
AccountingStoragePort=7031
AccountingStorageType=accounting_storage/slurmdbd
ClusterName=halvan
DebugFlags=NO_CONF_HASH
JobCompLoc=/etc/slurm/slurm_jobcomp_logger
JobCompType=jobcomp/script
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=DEFAULT Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN TmpDisk=100000
NodeName=h1 Sockets=8 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=2064400 Feature=mem2048GB,noswitch0,usage_mail Weight=4
PartitionName=all Nodes=h1 Shared=EXCLUSIVE DefaultTime=00:00:01 MaxTime=14400 State=DOWN
PartitionName=halvan Nodes=h1 Default=YES Shared=NO DefaultTime=00:00:01 MaxTime=14400 MaxNodes=1 State=UP
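
P.S. One experiment I am considering, in case it helps anyone reproduce or poke at this: allowing over-subscription explicitly on the partition. The Shared=YES:<max> form is taken from the slurm.conf man page; the :64 limit below is my own guess to match the 64 cores, and I have not yet verified that this restores the old behaviour on 14.11.6:

PartitionName=halvan Nodes=h1 Default=YES Shared=YES:64 DefaultTime=00:00:01 MaxTime=14400 MaxNodes=1 State=UP

After editing slurm.conf one would run "scontrol reconfigure" and then check the result with "scontrol show partition halvan".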
