To expand upon Martin's reply, two SLURM plugins (sched/gang and
select/cons_res) currently construct a bitmap, with one bit per core on
the entire system, when the slurmctld daemon starts. If nodes register
with more resources than configured, the bitmaps within those plugins
would need to be rebuilt. The logic to rebuild those bitmaps does not
currently exist, although it would be possible to add. I will update the
slurm.conf man page section on FastSchedule to clarify this.
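
As a rough illustration of why a larger-than-configured registration is awkward, here is a toy Python model of a system-wide core bitmap that is sized once at startup. This is only a sketch and not SLURM's actual C code: the class CoreBitmap, its fits() method, and the node names and core counts are all hypothetical.

```python
class CoreBitmap:
    """Toy system-wide bitmap: one bit per configured core, sized once at startup."""

    def __init__(self, configured_cores):
        # configured_cores: node name -> core count taken from the config file
        self.offsets = {}
        total = 0
        for node, cores in configured_cores.items():
            # Each node owns a contiguous slice: (first bit, number of bits)
            self.offsets[node] = (total, cores)
            total += cores
        self.bits = [False] * total

    def fits(self, node, registered_cores):
        """Return True if the registered core count fits the node's slice."""
        _, configured = self.offsets[node]
        return registered_cores <= configured


# Hypothetical configuration: two nodes defined with one core each
configured = {"node1": 1, "node2": 1}
bitmap = CoreBitmap(configured)

print(len(bitmap.bits))          # 2
print(bitmap.fits("node1", 24))  # False
```

In this model node2's slice begins immediately after node1's, so giving node1 more bits would shift every later node's offset. That is the sense in which the whole bitmap would have to be rebuilt.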
Moe Jette
SchedMD
Quoting [email protected]:
Hi Andrew,
I can use select/cons_res with FastSchedule=0 successfully on 2.4. Note
that the processor count in your node definition must match the actual
hardware. From the slurm.conf man page for FastSchedule=0:
Base scheduling decisions upon the actual configuration of each individual
node except that the node's processor count in SLURM's configuration must
match the actual hardware configuration if SchedulerType=sched/gang or
SelectType=select/cons_res are configured
Regards,
Martin Perry
Bull Phoenix
From: Andrew Punnett <[email protected]>
Sent by: [email protected]
Date: 11/02/2011 10:55 PM
Please respond to: [email protected]
To: [email protected]
Subject: [slurm-dev] Problems using select/cons_res with fastschedule=0
Hi,
Is it possible to use FastSchedule=0 with the Consumable Resources Plugin?
When I set 'SelectType=select/cons_res' and 'FastSchedule=0' in my
slurm.conf, SLURM does not detect the correct number of
Procs/Sockets/Cores/Threads for my client nodes. The SLURM log on the
client nodes contains the following:
---
[2011-11-03T17:02:47] slurmd version 2.2.4 started
[2011-11-03T17:02:47] slurmd started on Thu 03 Nov 2011 17:02:47 +1300
[2011-11-03T17:02:47] Procs=1 Sockets=1 Cores=1 Threads=1 Memory=64555 TmpDisk=3842 Uptime=87404
[2011-11-03T17:07:03] Node configuration differs from hardware
Procs=1:24(hw) Sockets=1:2(hw)
CoresPerSocket=1:12(hw) ThreadsPerCore=1:1(hw)
---
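
The mismatch message in the log above pairs each configured value with the detected one, in the form name=configured:hardware(hw). A minimal Python sketch, assuming that format and assuming the four field names can be used directly as node parameters in slurm.conf, could turn the message into a node definition that matches the hardware. The helper names parse_mismatch and node_line are hypothetical, and the message is joined onto one line here.

```python
import re


def parse_mismatch(line):
    """Return {field: (configured, hardware)} from a 'differs from hardware' message."""
    pairs = re.findall(r"(\w+)=(\d+):(\d+)\(hw\)", line)
    return {name: (int(cfg), int(hw)) for name, cfg, hw in pairs}


def node_line(node_name, values):
    """Build a slurm.conf node definition from the hardware-detected values."""
    fields = " ".join(f"{name}={hw}" for name, (_, hw) in values.items())
    return f"NodeName={node_name} {fields}"


# Message text copied from the slurmd log, joined onto a single line
log = ("Node configuration differs from hardware "
       "Procs=1:24(hw) Sockets=1:2(hw) "
       "CoresPerSocket=1:12(hw) ThreadsPerCore=1:1(hw)")

values = parse_mismatch(log)
print(node_line("ctcp001", values))
# NodeName=ctcp001 Procs=24 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1
```

The resulting line is only a starting point for the node definition; it should be checked against the actual hardware before it goes into slurm.conf.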
The function 'validate_node_specs' in 'node_mgr.c' seems to be the
culprit as it always overrides the values provided by the hardware
detection when the 'cons_res' flag is set.
Without the Consumable Resources Plugin enabled, but with
FastSchedule=0 set, the correct number of Procs/Sockets/Cores/Threads
is detected and shown by 'scontrol show node ctcp001'.
Thanks,
Andy
--
Andrew Punnett <[email protected]>
---
Centre for Theoretical Chemistry and Physics (CTCP),
Bldg. 40, Massey University (Albany Campus),
Private Bag 102 904, Auckland 0745,
NEW ZEALAND
---
Phone +64 (0)9 414 0800 ext. 9886
http://ctcp.massey.ac.nz/~punnett