Brian,

Try setting a default memory per CPU (DefMemPerCPU) in the partition definition. Later versions of SLURM (>= 14.11.6?) require this value to be set; otherwise each job is allocated all of the memory on its node, which leaves nothing for a second job to share.
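As a rough sketch, the partition line in slurm.conf might look something like the following. The 2048 MB figure is only a placeholder and not a value from this thread, so pick something that fits your nodes' memory and core counts. The other settings are copied from the scontrol output and config lines in the original message.

```
# slurm.conf (sketch only; the DefMemPerCPU value is an assumed placeholder, in MB)
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
PartitionName=debug Nodes=compute[45-49] Default=YES Shared=FORCE:4 DefMemPerCPU=2048
```

After editing the file, "scontrol reconfigure" should make the daemons re-read it. Individual jobs can also ask for their own amount with sbatch's --mem-per-cpu option instead of relying on the default.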
HTH,
John DeSantis

2016-01-26 15:20 GMT-05:00 Andrus, Brian Contractor <[email protected]>:
> All,
>
> I am in the process of transitioning from Torque to Slurm.
> So far it is doing very well, especially handling arrays.
>
> Now I have one array job that is running across several nodes, but only
> using some of the node resources. I would like to have slurm start sharing
> the nodes so some of the array jobs will start where there are unused
> resources.
>
> I ran a scontrol update to force sharing and see the partition did change:
>
> #scontrol show partitions
> PartitionName=debug
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=compute[45-49]
>    Priority=1 RootOnly=NO ReqResv=NO Shared=FORCE:4 PreemptMode=OFF
>    State=UP TotalCPUs=280 TotalNodes=5 SelectTypeParameters=N/A
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> But it is not starting job 416_37 on any node as I would expect.
>
> #squeue
>            JOBID PARTITION     NAME   USER ST     TIME NODES NODELIST(REASON)
>  416_[37-1013%6]     debug slurm_ar  user1 PD     0:00     1 (Resources)
>           416_36     debug slurm_ar  user1  R    35:46     1 compute49
>           416_35     debug slurm_ar  user1  R  1:47:25     1 compute46
>           416_33     debug slurm_ar  user1  R  7:30:50     1 compute45
>           416_32     debug slurm_ar  user1  R  7:38:39     1 compute47
>           416_31     debug slurm_ar  user1  R  8:53:26     1 compute48
>
> In my config, I have:
> SelectType = select/cons_res
> SelectTypeParameters = CR_CORE_MEMORY
>
> What am I missing to get more than one job to run on a node?
>
> Thanks in advance,
>
> Brian Andrus
